Table 3

Characteristics of the studies on objective structured clinical examination (OSCE) and direct observation of practical skills (DOPS) in other specialities

| Tool | OSCE | OSCE | OSCE | DOPS [34] | DOPS [35] |
|---|---|---|---|---|---|
| Competence assessed | Clinical skills | Communication skills | Knowledge | Clinical skills | Overall competence |
| Number of studies | 11 | 4 | 2 | 1 | 1 |
| Current practice or research purpose | Current practice (N=5) [17–19, 28, 29]; Research purpose (N=6) [20–24, 30] | Current practice (N=1) [25]; Research purpose (N=3) [23, 26, 27] | Current practice (N=1) [28]; Research purpose (N=1) [24] | Current practice | Research purpose |
| Comparator tool* | In-training examination (ITE) (N=4) [18–21]; Oral exam (N=2) [17, 23]; Written exam (N=2) [28, 29]; National board exam (N=1) [19]; Resident performance ratings (RPR) (N=1) [21]; Faculty global evaluation (N=1) [18]; None (N=3) [22, 24, 30] | Oral exam (N=1) [23]; OSCE clinical scores (N=1) [25]; None (N=2) [26, 27] | None (N=2) [24, 28] | None | Case-based discussion, (mini) assessment of clinical expertise, peer assessment tool, patient satisfaction questionnaire, case conference, journal club presentation |
| Sample size, median (IQR)† | 43 (21–74) | 29 (12–69) | 58 [24] and 147 [28] | 110 | 600 |
| Specialties | General practice (GP) (N=3) [18, 28, 29]; Paediatrics (N=3) [20, 21, 24]; Anaesthesiology (N=3) [17, 23, 30]; Internal medicine (N=1) [19]; Emergency medicine (N=1) [22] | GP (N=2) [25, 26]; Respiratory medicine (N=1) [27]; Anaesthesiology (N=1) [23] | GP (N=1) [28]; Paediatrics (N=1) [24] | Internal medicine | Psychiatry |
| External control population | Medical students (N=1) [21]; GP specialists (N=1) [29]; None (N=9) [17–20, 22–24, 28, 30] | Internal medicine residents and pulmonology attending physicians (N=1) [27]; None (N=3) [23, 25, 26] | None (N=2) [24, 28] | None | None |
| Study duration, median (IQR) | Reported (N=8) [17–20, 22, 24, 28, 29]: 80 (4.5–156) weeks; Not reported (N=3) [21, 23, 30] | Reported (N=1) [25]: 4 weeks; Not reported (N=3) [23, 26, 27] | 4 [24] and 6 [28] weeks | 156 weeks | 36 weeks |
| Internal consistency | Cronbach's α=0.12–0.99 (N=8) [18–24, 28]; Intraclass correlation coefficient=0.40 (N=1) [30]; Correlation coefficient r=0.27–0.32 (N=1) [17] | Cronbach's α=0–0.98 (N=4) [23, 25–27] | Cronbach's α=0.70–0.99 (N=2) [24, 28] | ≥0.8 | Not performed |
| Inter-rater reliability | r=0.26–0.95 (N=4) [17, 18, 23, 30] | r=0.26–0.88 (N=4) [23, 25–27] | N=0 | 0.83–0.87‡ | Not performed |
| Test–retest reliability/intrarater reliability | r=0.62–0.72 (N=1) [24] | N=0 | r=0.62–0.72 (N=1) [24] | Not performed | Not performed |
| Intermethod (or parallel forms) reliability/concurrent validity | OSCE versus ITE: no correlation (N=1) [19]; OSCE versus ITE: r=0.30–0.71 (N=3) [18, 20, 21]; OSCE versus oral exam: r=0.14–0.54 (N=2) [17, 23]; OSCE versus written exam: r=0.54 (N=1) [29]; OSCE versus national board exam: no correlation (N=1) [19]; OSCE versus RPR: r=0.41 (N=1) [21]; OSCE versus faculty global evaluation: r=0.03–0.51 (N=1) [18] | OSCE versus oral exam: r=0.52–0.53 (N=1) [23]; No correlation between OSCE clinical and communication scores (N=1) [25] | OSCE versus written exam: r=0.22 (N=1) [28] | Not performed | None of the tools correlated with one another (all r<0.65) |
| Feasibility | Approximately 100 OSCE case presentations needed to reach a g coefficient of 0.8 (N=1) [19]; 10–12 OSCE case presentations, with three to four judges each rating all cases, needed to reach a g coefficient of 0.8 (N=1) [30] | A g coefficient of 0.8 can feasibly be achieved with 14 OSCE case presentations (N=1) [27] | N=0 | Not performed | A reliability of 0.8 can feasibly be achieved only with case-based discussion and (mini) assessment of clinical expertise |
| Predictive validity | N=0 | N=0 | N=0 | The ratings predicted performance on the national certifying examination | Not performed |
| Risk of bias | Low (N=6) [17–20, 28, 29]; Unclear (N=4) [21–23, 30]; High (N=1) [24] | Low (N=3) [25–27]; Unclear (N=1) [23]; High (N=0) | Low (N=1) [28]; Unclear (N=0); High (N=1) [24] | Low | Low |
  • *Studies may use more than one comparator tool. For details of individual studies, see online supplemental table S2.

  • †Where fewer than three studies contributed, individual values are shown.

  • ‡Calculated with the method described by James LR et al. J Appl Psychol. 1984;69:85–98.

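  • N indicates the number of studies; the other numbers listed with each entry are reference citations. g coefficient, generalisability coefficient; r, correlation coefficient.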
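Two rows of the table rest on standard reliability arithmetic: Cronbach's α (internal consistency) and the number of cases needed to reach a coefficient of 0.8 (feasibility). The Python sketch below illustrates both under simplifying assumptions. The function names (`cronbach_alpha`, `cases_needed`, `projected_reliability`) and the score matrix are hypothetical and are not taken from any included study. The case projection uses the Spearman–Brown prophecy formula, which treats cases as the only source of measurement error. The g coefficients reported in the studies come from generalisability-theory decision studies that can also model judges and other facets, so this is an approximation rather than a reproduction of their calculations.

```python
import math
from collections.abc import Sequence
from statistics import pvariance


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a candidates x items matrix (one row per candidate).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    Population variances are used; sample variances give the same ratio.
    """
    k = len(scores[0])  # number of items, e.g. OSCE stations (hypothetical)
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two candidates")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def projected_reliability(single_case: float, n_cases: float) -> float:
    """Spearman-Brown: reliability of a score averaged over n parallel cases."""
    return n_cases * single_case / (1 + (n_cases - 1) * single_case)


def cases_needed(single_case: float, target: float = 0.8) -> int:
    """Smallest number of parallel cases whose projected reliability reaches target.

    Solves the Spearman-Brown formula for n. Rounding to 9 decimals before
    ceil guards against floating-point noise such as 12.000000000000002.
    """
    if not (0 < single_case < 1 and 0 < target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    n = target * (1 - single_case) / (single_case * (1 - target))
    return max(1, math.ceil(round(n, 9)))


if __name__ == "__main__":
    # Hypothetical scores: 4 candidates rated on 3 stations.
    scores = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2]]
    print(f"alpha = {cronbach_alpha(scores):.3f}")
    for r1 in (0.25, 0.04):  # hypothetical single-case reliabilities
        n = cases_needed(r1)
        print(f"single-case reliability {r1}: {n} cases -> "
              f"projected reliability {projected_reliability(r1, n):.3f}")
```

Run as a script, this prints α ≈ 0.939 for the invented matrix and shows that about 12 cases are needed at a single-case reliability of 0.25 versus about 96 at 0.04, illustrating how strongly the required number of cases depends on the reliability of a single case.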