Objective To summarise the literature on the assessment of competences in postgraduate medical training.
Methods A systematic literature review was performed within a EULAR taskforce on the assessment of competences in rheumatology training and other related specialities (July 2019). Two searches were performed: one search for rheumatology and one for related medical specialities. Two reviewers independently identified eligible studies and extracted data on assessment methods. Risk of bias was assessed using the medical education research study quality instrument.
Results Of 7335 articles in rheumatology and 2324 reviews in other specialities, 5 and 31 original studies were included, respectively. Studies in rheumatology were at variable risk of bias and explored only direct observation of practical skills (DOPS) and objective structured clinical examinations (OSCEs). OSCEs, including clinical, laboratory and imaging stations, performed best, with a good to very good internal consistency (Cronbach’s α=0.83–0.92), and intrarater reliability (r=0.80–0.95). OSCEs moderately correlated with other assessment tools: r=0.48 vs rating by programme directors; r=0.2–0.44 vs multiple-choice questionnaires; r=0.48 vs DOPS. In other specialities, OSCEs on clinical skills had a good to very good inter-rater reliability and OSCEs on communication skills demonstrated a good to very good internal consistency. Multisource feedback and the mini-clinical evaluation exercise showed good feasibility and internal consistency (reliability), but other data on validity and reliability were conflicting.
Conclusion Despite consistent data on competence assessment in other specialities, evidence in rheumatology is scarce and conflicting. Overall, OSCEs seem an appropriate tool to assess the competence of clinical skills and correlate well with other assessment strategies. DOPS, multisource feedback and the mini-clinical evaluation exercise are feasible alternatives.
- Outcome Assessment
- Quality Indicators
- Health Care
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
To date, a wide range of assessment tools are available to evaluate different educational domains in medical training. Knowledge is the priority in early years of medical training, and students are mainly trained and assessed in theoretical concepts. As training progresses, focus is shifted to the acquisition of complex medical competences, integrating knowledge with skills and attitudes to produce a positive, observable behaviour. Careful planning of assessment strategies is crucial in order to assess, not only knowledge but also competence or performance, particularly at a speciality training level.1 2
Both trainers and trainees benefit from assessments. Feedback motivates learners and identifies areas for improvement (formative assessment). Validated methods of assessment of competence can also measure the effectiveness of a teaching programme in achieving its objectives.2 3 In addition, the outcomes of assessments may be used to compare performances across centres, and to ensure that a trainee has attained an agreed standard (summative assessment).3 Assessment tools used for both formative and summative assessment need to be valid (ie, measure what they are supposed to measure), reliable (ie, consistent and applicable in different contexts) and feasible (ie, easy to carry out with the available resources).4
Assessment can be performed in a real-life setting while trainees are in their working environment (workplace-based assessment) or in a dedicated setting (simulation-based assessment). Workplace-based assessment includes, among others, the direct observation of practical skills (DOPS) and the mini-clinical evaluation exercise. Conversely, simulation-based assessment may include oral exams, written tests and the objective structured clinical examination (OSCE), which reproduces a patient encounter.5
However, assessment tools are used in heterogeneous ways across European countries with different times, methods and overarching strategies.6 7 Furthermore, strategies are planned to assess curricular competencies, and some competences or skills are considered core in one European country, but optional in others, and hence are not assessed. The lack of a harmonised strategy across Europe is a major unmet need in this field. In the era of large-scale movement of specialists across European countries, a pan-European approach to training and, eventually, to assessment of competences is advisable. This would ensure that when one country certifies a doctor as a rheumatology specialist, he/she has acquired the same core of knowledge, practical skills and other, previously agreed-upon, competences, at a given standard, independently of the country in which he/she has trained. However, specific recommendations for such an approach are currently lacking.
This systematic literature review (SLR) was developed to inform the EULAR taskforce responsible for developing the points to consider (PtC) for the assessment of competences in rheumatology training. Specifically, the SLR aims to summarise the available information on competence assessment methods and strategies within postgraduate medical training in both rheumatology and other related specialities.
An SLR was conducted. The steering group of the EULAR task force to develop PtC for the assessment of competences in rheumatology training outlined the scope of the literature search, according to the Population, Instrument of interest and Measurement properties of interest approach following the Outcome Measures in Rheumatology (OMERACT) methodology.8 The population consisted of medical doctors in speciality training (also referred to as trainees or fellows) in rheumatology or other related specialities. Instruments of interest included any assessment strategy or method, while the measurement properties of interest were validity, discrimination (including both reliability and sensitivity to change) and feasibility; at least one of them was required to be reported.
Two separate searches (online supplemental text S1 and S2) were performed, one for studies in rheumatology and another for SLRs in other medical specialities (online supplemental text S3). The searches were performed in MEDLINE, Embase, The Cochrane Database of Systematic Reviews, CENTRAL, DARE, HTA Database, NHS EED, CINAHL, ERIC, Web of Science, PsycINFO, PubMed Health (discontinued on 31 October 2018), Epistemonikos and Index to Theses up to July 2019. The PubMed Similar Articles tool was also used, and a crosscheck of the top 12 scientific journals in medical education was performed. Individual original research studies were selected in the rheumatology search. Conversely, in the medical specialities search, SLRs were retrieved and original studies were subsequently extracted from the retrieved SLRs, using the same inclusion criteria as those in rheumatology.
Study selection, data collection and assessment of risk of bias
Two reviewers (AA and AN) independently assessed titles and abstracts according to predetermined inclusion/exclusion criteria following the OMERACT methodology (online supplemental text S4), followed by full-text review. The agreement between reviewers, calculated with Cohen’s kappa, was 0.93. Discrepancies were resolved by consensus. Data on study characteristics, the investigated assessment method and the included measurement(s) of interest were extracted. Risk of bias was assessed using the medical education research study quality instrument.9 This instrument was developed to measure the quality of experimental and observational studies in medical education (eg, sampling strategy, validity of the assessment instrument and appropriateness of data analysis) (online supplemental table S1). In the absence of a validated cut-off value, we classified individual studies based on the medical education research study quality instrument scores as low (≥12), unclear (≥10 but <12) or high (<10) risk of bias. Similarly, we classified the validity/discrimination (including reliability and sensitivity to change)/feasibility as fair to moderate or good to very good.10 11 Studies were too heterogeneous to allow any form of pooling; therefore, descriptive results are presented. Table 1 shows the abbreviation, full name, short description and setting of all the assessment tools mentioned in the manuscript.
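The score-based classification described above can be written as a small function. The following is a minimal sketch in Python, assuming only the cut-offs stated in the text (low: ≥12; unclear: ≥10 but <12; high: <10). The function name and the example scores are hypothetical and are not taken from the included studies.

```python
def classify_risk_of_bias(quality_score: float) -> str:
    """Map a quality-instrument score to a risk-of-bias category.

    Cut-offs follow the text: >=12 low, >=10 and <12 unclear, <10 high.
    A higher quality score therefore means a lower risk of bias.
    """
    if quality_score >= 12:
        return "low"
    if quality_score >= 10:
        return "unclear"
    return "high"


# Hypothetical scores, for illustration only.
example_scores = [13.5, 11.0, 9.5, 12.0, 10.0]
labels = [classify_risk_of_bias(score) for score in example_scores]
print(list(zip(example_scores, labels)))
```

Because the boundaries are inclusive at 12 and at 10, a study scoring exactly 12 is classified as low risk and one scoring exactly 10 as unclear risk.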
The search in rheumatology yielded 7335 articles, of which 20 were selected for detailed review; 5 met the inclusion criteria. The search in other specialities yielded 2324 SLRs; 46 were selected for detailed review, of which 36 met the inclusion criteria. Individual studies included in these SLRs totalled 2211, of which 347 were selected. After deduplication, 278 underwent full-text evaluation. Ultimately, 31 studies met the inclusion criteria in related medical specialities.
Assessment of competences in rheumatology
The five eligible studies on the assessment of competences in rheumatology were at variable risk of bias (two low, two unclear and one high) and explored only two methods: DOPS (one study) and OSCE (four studies)12–16 (table 2). Rheumatology OSCEs have been used to assess clinical skills,12–14 communication,13 professionalism13 and practical skills on musculoskeletal ultrasound.15 In particular, the latter included stations with healthy subjects and patients with rheumatic diseases, and trainees were blinded to whether the joint to be examined was abnormal or normal. In the abnormal stations, gouty arthritis, synovitis and erosive arthritis were represented. The other OSCEs encompassed different clinical scenarios such as rheumatoid arthritis and systemic lupus erythematosus along with laboratory (eg, synovial fluid analysis) and imaging (eg, synovial bone radiography) stations. Conflicting evidence on internal consistency, intermethod and inter-rater reliability was reported in these five studies.
One study at low risk of bias demonstrated that a rheumatology OSCE including a mix of clinical, laboratory and imaging stations showed a good to very good internal consistency (Cronbach’s α=0.83–0.92), intrarater reliability (correlation coefficient=0.80–0.95) and construct validity.12 A fair to moderate correlation (r=0.44–0.52) between OSCEs and other assessment tools, including DOPS, rating by programme directors and written exam was also found.12–14 16 In the specific ultrasound OSCE, the assessment of normal joint more reliably discriminated examinees from the ultrasonography experts (control population) than the evaluation of pathologic joints; inter-rater reliability was also better for normal joint assessment stations.15 The OSCE scores of normal joint stations correlated with the scores of a written multiple-choice questionnaire exam; both the overall scores of the OSCE and the multiple-choice questionnaire exam showed a poor discrimination of performing examinees from the faculty. The study on DOPS provided evidence only for feasibility16 reporting that 14 forms per resident over the time frame of a month provide a reliable estimate. None of the studies on OSCE provided evidence on feasibility.
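For readers less familiar with the internal consistency statistic quoted above, Cronbach’s α for an examination with k stations is k/(k−1) multiplied by (1 − sum of the per-station score variances / variance of the total scores). The following is a minimal sketch in Python of that standard formula. The function name and the station scores are hypothetical and do not come from any of the reviewed studies.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a table of scores.

    `scores` holds one row per examinee and one column per station (item).
    Sample variances (n - 1 denominator) are used throughout.
    """
    n_examinees = len(scores)
    k = len(scores[0])
    if n_examinees < 2 or k < 2:
        raise ValueError("need at least two examinees and two stations")
    # Variance of each station's scores across examinees.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of each examinee's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 trainees scored on 4 OSCE stations.
station_scores = [
    [7, 8, 6, 7],
    [5, 6, 5, 4],
    [9, 9, 8, 9],
    [4, 5, 3, 4],
    [6, 7, 6, 6],
]
alpha = cronbach_alpha(station_scores)
print(f"Cronbach's alpha: {alpha:.2f}")
```

In this invented example the stations rank the trainees almost identically, so α is close to 1; stations that rank trainees inconsistently would pull the value down.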
Assessment of competences in other medical specialities
Studies in related specialities were more heterogeneous in terms of assessment tools and both type and comprehensiveness of the analysis. The 31 eligible studies were at variable risk of bias (15 low, 14 unclear and 2 high) and explored different methods, including OSCE, DOPS, multisource feedback, mini-clinical evaluation exercise and patient satisfaction questionnaires. Online supplemental tables S2 and S3 show the information of individual studies.
Simulation-based assessment (OSCE)
As far as simulation-based assessment is concerned, evidence on internal consistency of an OSCE assessing clinical skills was conflicting. The majority of studies at low risk of bias reported a fair to moderate internal consistency (Cronbach’s α=0.12–0.69),17–20 while most studies at high risk of bias reported a good to very good internal consistency (Cronbach’s α=0.8–0.98)21–24 (table 3). Nevertheless, the majority of studies exploring inter-rater reliability agreed that OSCEs assessing clinical skills have a good to very good inter-rater reliability (r=0.60–0.95).17 18 22 23 Conversely, OSCEs assessing communication skills consistently demonstrated a good to very good internal consistency (Cronbach’s α=0.7–0.98),23 25 26 while evidence on inter-rater reliability was conflicting.23 25–27 However, OSCE scores poorly correlated with those of other assessment tools such as oral exams,17 23 written exams,28 29 in-training examinations,18–20 assessment by staff and peers25 or the American Board of Internal Medicine evaluation form.19 Finally, with regard to feasibility, 10–14 OSCE stations would provide a reliable estimate of both clinical30 and communication skills.27 Simulation-based assessment can also rely on the use of standardised patient encounters to evaluate clinical and communication skills. One study at unclear risk of bias demonstrated that, in standardised patient encounters, non-verbal communication was most closely associated with patient satisfaction, with a good to very good internal consistency.31 Although scores on clinical skills obtained in the setting of standardised patient encounters poorly correlated with those of the American Board of Internal Medicine,32 one study at low risk of bias exploring patient satisfaction as an indicator of the trainee’s clinical skills demonstrated that this assessment is feasible and has a good internal consistency but a poor inter-rater reliability.33
Workplace-based assessment (DOPS, mini-assessed clinical encounter, case-based discussion, mini-clinical evaluation exercise, multisource feedback)
With regard to workplace-based assessment, one study at low risk of bias reported that DOPS showed a good to very good internal consistency (Cronbach’s α≥0.8), inter-rater reliability (r=0.83–0.87) and a good prediction of the American Board of Internal Medicine certifying examination scores in internal medicine.34 Conversely, in the field of psychiatry, DOPS did not correlate with any other assessment tool investigated such as the mini-assessed clinical encounter or the case-based discussion and was also less feasible (table 3).35
Three studies at low risk of bias provided evidence of a good to very good internal consistency (Cronbach’s α=0.65–0.90) and feasibility of the mini-clinical evaluation exercise,36–38 which also showed a good correlation with other assessment tools such as the American Board of Internal Medicine monthly evaluation form,37 or the Royal College of Physicians and Surgeons of Canada Comprehensive Examination in Internal Medicine38 (table 4). Likewise, most studies on multisource feedback reported a good to very good internal consistency (Cronbach’s α=0.65–0.90), feasibility and inter-rater reliability.39–42 However, results on validity and reliability of these two tools were conflicting.
Online supplemental table S4 displays the remaining studies on written exams and script concordance test.
Our SLR has shown a large heterogeneity in the strategies and methods used for assessment of competences in the training of rheumatology and other medical specialities. Specifically, evidence in rheumatology is scarce with all studies published over the last 10 years, while in other specialities including internal medicine and paediatrics, research on such educational matters has been ongoing for at least 25 years.21 43 This study attempted to overcome the lack of data in the field of rheumatology by exploring other related medical specialities. However, many of the investigated tools were speciality specific (eg, OSCEs with intubation simulators for anaesthesiology trainees44) and therefore non-relevant to the field of rheumatology; studies exploring these tools were excluded. Furthermore, a crucial aspect emerging from the SLR is the difficulty of comparing studies. It is challenging to explore the same tools even within the same speciality due to the heterogeneity of the specific instruments and the context of their application, the measurement properties evaluated and the data analysis. A wide variety of statistical methods have been employed to determine the properties of interest of the investigated assessment strategies, and in some cases, the analysis provided is insufficient or not robust. For example, despite being rejected as an adequate measure of inter-rater reliability, some studies continue to report the percentage of agreement between raters instead of an intraclass correlation coefficient or kappa.45 46
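The concern about percentage agreement can be illustrated numerically: when almost all trainees pass, two raters agree most of the time by chance alone, and Cohen’s kappa corrects for this. The following is a minimal sketch in Python using the standard kappa formula, (observed agreement − chance agreement)/(1 − chance agreement). The function names and the pass/fail ratings are hypothetical, invented for illustration only.

```python
from collections import Counter


def percent_agreement(rater_1: list[str], rater_2: list[str]) -> float:
    """Proportion of cases on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)


def cohen_kappa(rater_1: list[str], rater_2: list[str]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    n = len(rater_1)
    observed = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(
        counts_1[c] * counts_2[c] for c in set(rater_1) | set(rater_2)
    ) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 20 trainees: each rater fails a different trainee.
rater_a = ["pass"] * 18 + ["fail", "pass"]
rater_b = ["pass"] * 18 + ["pass", "fail"]

print(f"Percentage agreement: {percent_agreement(rater_a, rater_b):.0%}")
print(f"Cohen's kappa: {cohen_kappa(rater_a, rater_b):.2f}")
```

In this invented example the raters agree on 18 of 20 trainees, yet they never agree on who should fail, and kappa falls to about zero. This is why percentage agreement alone can overstate inter-rater reliability.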
Overall, with regard to rheumatology training, the SLR provides enough evidence only for OSCEs, as other assessment tools were not sufficiently investigated. Initially developed to address the unreliability of traditional strategies of clinical assessment, such as the long case discussion,47 OSCEs are extensively used in undergraduate medical training.48 The key concepts behind OSCEs are standardisation and generalisability, as all candidates deal with the same clinical tasks, to be completed in the same time frame and in the same environment, and are scored through a structured checklist. OSCE stations can be designed and tailored for specific purposes and should always be closely aligned to the relevant training curriculum, in order to demonstrate construct validity. In rheumatology training, current evidence suggests that OSCEs including clinical, laboratory and imaging stations are the most reliable and demonstrate good content validity.12 14 In most of the studies, not only in rheumatology but also in other specialities, clinical OSCE scores moderately correlated with those of other assessment tools such as rating by programme directors and written exams.12–20 23 25 28 29 This probably reflects that the methods measure different dimensions of competence; hence, a combination of different (complementary) tools is advisable to obtain an overall perspective on the trainee. The lack of correlation between an imaging-specific OSCE and a written exam suggests that the latter may identify physicians who can recognise abnormalities on a static image but who might not yet be skilled enough to obtain the relevant images independently.15 OSCEs can be applied beyond assessment of clinical skills, as they allow assessment of non-clinical competences, such as communication or professionalism.
Medical speciality curricula highlight that, upon training completion, specialists should have acquired skills beyond clinical knowledge, and these competences can be difficult to assess. Communication skills for appropriate interaction with patients, caregivers and colleagues are included, among others, in the European, American and Canadian frameworks for the competences that doctors should have. In Europe, the European Union of Medical Specialists defines professional competence as ‘the habitual and judicious use of communication, knowledge, technical skills, clinical reasoning, emotions, values, and reflection in daily practice for the benefit of the individual and community being served’.49
With regard to workplace-based assessment, the only study exploring DOPS in rheumatology provided evidence of feasibility but did not explore any other relevant features of this assessment method.16 Furthermore, the authors acknowledged that only 10% of the evaluations reflected a true, direct observation, with 80% resulting from case discussion with the trainee and 10% from a review of the written notes. Therefore, the resulting feasibility should be considered with caution, especially because two additional studies in other specialities failed to prove its feasibility.34 35 Evidence from other specialities such as internal medicine and physical medicine/rehabilitation underpins the reliability and feasibility of the mini-clinical evaluation exercise and multisource feedback,36–42 but no data on these methods are available in rheumatology. The similarities between these two specialities and rheumatology suggest that both the mini-clinical evaluation exercise and multisource feedback might be successfully employed in rheumatology to assess clinical and broader generic competences, respectively. Multisource feedback has the potential to contribute to the professional development of trainees. It provides a comprehensive trainee overview resulting from observation over a long period, under natural circumstances, and may include people not responsible for formal judgements about trainees. The lack of correlation between raters involved in multisource feedback is considered a strength of the assessment method as different raters (eg, nurses, patients and programme directors) focus on different skills and attitudes.41
Although the need to tackle the assessment of competences in rheumatology postgraduate training has been highlighted for the past 20 years,5 it still remains an unmet need. Despite a consistent body of evidence on assessment of competences in other specialities, data in rheumatology are scarce and conflicting. Owing to their good to very good internal consistency, intrarater reliability, construct validity and moderate correlation with other assessment strategies, OSCEs with clinical, laboratory and imaging stations appear an appropriate tool to assess clinical competences in rheumatology. Based on evidence from other specialities, DOPS, multisource feedback and the mini-clinical evaluation exercise are feasible alternatives.
Considering the increasing use of technology in the medical field, one could envisage that at least for some assessment tools and strategies, online-based platforms may represent a good alternative and become the routine. In several universities across Europe, digital portfolios are already implemented in rheumatology training,50 and the scoring of recordings may replace direct observation of trainees performing a certain procedure, either in the workplace or in an OSCE station, as already shown in emergency medicine and paediatrics.22 51 Furthermore, the availability of imaging software may facilitate the assessment of MRI or radiography readings at distance. Finally, yet importantly, the recent coronavirus pandemic has dramatically increased the use of remote teaching and assessment and probably set the stage for major changes in the way education will be delivered in the future.
In conclusion, the results of the present SLR underscore the gap in evidence between rheumatology and other medical specialities and highlight the need to develop recommendations to harmonise strategies and methods for the assessment of competences across Europe. This SLR informs the ongoing initiative to formulate EULAR PtC for the assessment of competences in rheumatology training.
What is already known about this subject?
Assessment of competences in postgraduate training is highly heterogeneous across Europe with different times, methods and overarching strategies, and an overview of available evidence is lacking.
What does this study add?
Evidence on assessment of competences in rheumatology training is scarce, but the available studies agree that objective structured clinical examination with clinical, laboratory and imaging stations may be an appropriate tool for this purpose.
Data from other medical specialities point out that direct observation of practical skills, multisource feedback and the mini-clinical evaluation exercise may be feasible alternatives in rheumatology.
How might this impact on clinical practice?
Evidence-based recommendations to harmonise assessment of competences in rheumatology training are needed.
Contributors All authors contributed and finally approved the current manuscript.
Funding This work was funded by European League Against Rheumatism.
Competing interests None declared.
Patient consent for publication Not required.
Ethics approval Not applicable.
Provenance and peer review Not commissioned; externally peer reviewed.
Data availability statement All data relevant to the study are included in the article or uploaded as online supplemental information.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.