Tool | OSCE | OSCE | OSCE | DOPS (ref 34) | DOPS (ref 35) |
---|---|---|---|---|---|
Competence assessed | Clinical skills | Communication skills | Knowledge | Clinical skills | Overall competence |
Number of studies | 11 | 4 | 2 | 1 | 1 |
Current practice or research purpose | Current practice (N=5)17–19 28 29; Research purpose (N=6)20–24 30 | Current practice (N=1)25; Research purpose (N=3)23 26 27 | Current practice (N=1)28; Research purpose (N=1)24 | Current practice | Research purpose |
Comparator tool* | Intraining examination (ITE) (N=4)18–21; Oral exam (N=2)17 23; Written exam (N=2)28 29; National board exam (N=1)19; Resident performance ratings (RPR) (N=1)21; Faculty global evaluation (N=1)18; None (N=3)22 24 30 | Oral exam (N=1)23; OSCE clinical scores (N=1)25; None (N=2)26 27 | None (N=2)24 28 | None | Case-based discussion, (mini) assessment of clinical expertise, peer assessment tool, patient satisfaction questionnaire, case conference, journal club presentation |
Sample size, median (IQR)† | 43 (21–74) | 29 (12–69) | 58 (ref 24) and 147 (ref 28) | 110 | 600 |
Specialties | General practice (GP) (N=3)18 28 29; Paediatrics (N=3)20 21 24; Anaesthesiology (N=3)17 23 30; Internal medicine (N=1)19; Emergency medicine (N=1)22 | GP (N=2)25 26; Respiratory medicine (N=1)27; Anaesthesiology (N=1)23 | GP (N=1)28; Paediatrics (N=1)24 | Internal medicine | Psychiatry |
External control population | Medical students (N=1)21; GP specialists (N=1)29; None (N=9)17–20 22–24 28 30 | Internal medicine residents and pulmonology attending physicians (N=1)27; None (N=3)23 25 26 | None (N=2)24 28 | None | None |
Study duration, median (IQR) | Reported (N=8)17–20 22 24 28 29: 80 (4.5–156) weeks; Not reported (N=3)21 23 30 | Reported (N=1)25: 4 weeks; Not reported (N=3)23 26 27 | 4 (ref 24) and 6 (ref 28) weeks | 156 weeks | 36 weeks |
Internal consistency | Cronbach’s α=0.12–0.99 (N=8)18–24 28; Intraclass correlation coefficient=0.40 (N=1)30; r correlation coefficient=0.27–0.32 (N=1)17 | Cronbach’s α=0–0.98 (N=4)23 25–27 | Cronbach’s α=0.70–0.99 (N=2)24 28 | ≥0.8 | Not performed |
Inter-rater reliability | r=0.26–0.95 (N=4)17 18 23 30 | r=0.26–0.88 (N=4)23 25–27 | N=0 | 0.83–0.87‡ | Not performed |
Test–retest reliability/intrarater reliability | r=0.62–0.72 (N=1)24 | N=0 | r=0.62–0.72 (N=1)24 | Not performed | Not performed |
Intermethod (or parallel forms) reliability/concurrent validity | OSCE versus ITE: no correlation (N=1)19; OSCE versus ITE: r=0.30–0.71 (N=3)18 20 21; OSCE versus oral exam: r=0.14–0.54 (N=2)17 23; OSCE score versus written exam: r=0.54 (N=1)29; OSCE versus National board exam: no correlation (N=1)19; OSCE versus RPR: r=0.41 (N=1)21; OSCE versus faculty global evaluation: r=0.03–0.51 (N=1)18 | OSCE versus oral exam: r=0.52–0.53 (N=1)23; No correlation between clinical and communication scores of OSCE (N=1)25 | OSCE versus written exam: r=0.22 (N=1)28 | Not performed | None of the tools correlate with each other (all r values <0.65) |
Feasibility | Approximately 100 case presentations through OSCE to reach a g coefficient of 0.8 (N=1)19; 10–12 case presentations through OSCE, with three to four judges each rating all the cases, to achieve a g coefficient of 0.8 (N=1)30 | A g coefficient of 0.8 can be feasibly achieved with 14 case presentations through OSCE (N=1)27 | N=0 | Not performed | A reliability of 0.8 can feasibly be achieved only with case-based discussion and (mini) assessment of clinical expertise |
Predictive validity | N=0 | N=0 | N=0 | The ratings predicted performance on the national certifying examination | Not performed |
Risk of bias | Low (N=6)17–20 28 29; Unclear (N=4)21–23 30; High (N=1)24 | Low (N=3)25–27; Unclear (N=1)23; High (N=0) | Low (N=1)28; Unclear (N=0); High (N=1)24 | Low | Low |
*Studies may use more than one comparator tool. For details, see online supplemental table S2 displaying individual studies.
†If there are fewer than three studies, individual values are shown.
‡Calculation method developed in James LR et al. J Appl Psychol. 1984;69:85–98.
Numbers (N) indicate the number of studies.