Abstract
Objective Reliable interpretation of imaging findings is essential for the diagnosis of axial spondyloarthritis (axSpA) and requires a high level of experience. We investigated experience-dependent differences in diagnostic accuracy for X-ray (XR), MRI and CT.
Methods This post hoc analysis included 163 subjects with low back pain. Eighty-nine patients had axSpA, and 74 patients had other conditions (mechanical, degenerative or non-specific low back pain). Final diagnoses were established by an experienced rheumatologist before the reading sessions. Nine blinded readers (divided into three groups with different levels of experience) scored the XR, CT and MRI of the sacroiliac joints for the presence versus absence of axSpA. Parameters for diagnostic performance were calculated using contingency tables. Differences in diagnostic performance between the reader groups were assessed using the McNemar test. Inter-rater reliability was assessed using Fleiss kappa.
Results Diagnostic performance was highest for the most experienced reader group, except for XR. In the inexperienced and semi-experienced groups, diagnostic performance was highest for CT&MRI (78.5% and 85.3%, respectively). In the experienced group, MRI showed the highest performance (85.9%). The greatest difference in diagnostic performance was found for MRI between the inexperienced and experienced groups (76.1% vs 85.9%, p=0.001). Inter-rater agreement was best for CT in the experienced group (κ=0.87).
Conclusion Differences exist in the learnability of the imaging modalities for axSpA diagnosis. MRI requires more experience, while CT is more suitable for inexperienced readers. However, diagnosis relies on both clinical and imaging information.
- magnetic resonance imaging
- inflammation
- low back pain
Data availability statement
Data are available on reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
WHAT IS ALREADY KNOWN ON THIS TOPIC
The diagnostic accuracy in assessing imaging findings depends on the radiologists’ or rheumatologists’ experience in axial spondyloarthritis (axSpA) imaging.
WHAT THIS STUDY ADDS
The level of experience and the imaging modality have an impact on diagnostic accuracy.
Learnability differs across imaging modalities, with MRI being the most and X-ray the least teachable modality.
CT is the most accessible modality for less experienced readers.
HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY
Continuous training of young radiologists and rheumatologists in axSpA imaging is crucial for accurate early diagnosis of axSpA and, thus, for improved patient outcomes.
Training can significantly improve diagnostic accuracy, especially for MRI.
Introduction
Imaging is of great importance in the diagnosis of axial spondyloarthritis (axSpA) as well as in therapy monitoring.1 2 Patients typically present with low back pain in young adulthood,3 but clinically, axSpA may be difficult to differentiate from other causes of back pain.4 5 The disease primarily affects the sacroiliac joints (SIJs) and the spine.6 If untreated, axSpA may lead to loss of mobility and a significant worsening of quality of life.7 For this reason, early and reliable detection of axSpA with imaging techniques is crucial for diagnosis.8 The diagnosis of axSpA is difficult clinically9 and equally challenging with imaging,10 as imaging findings may be inconclusive or prone to misinterpretation, especially in the early phase of the disease.11–13
In clinical routine, an X-ray (XR) examination of the SIJs is usually performed as the initial imaging test to detect structural changes, such as erosions and ankyloses, associated with axSpA.1 However, structural lesions are absent or undetectable with this imaging modality in non-radiographic axSpA,11 and radiographs of the SIJs are notoriously difficult to interpret due to the complex anatomy of the joints. Reliable detection of inflammation, such as bone marrow oedema and other lesions, in early disease is only possible with MRI.14 Moreover, knowledge of key differential diagnoses, such as mechanical or degenerative SIJ disease, is also essential in evaluating these images since bone marrow oedema can also occur in mechanically induced conditions of the SIJs.15 In this respect, it is noteworthy that CT, the gold standard for detecting structural lesions, has recently gained attention as an imaging modality for patients with inconclusive MRI findings or contraindications to MRI or when MRI is not available.16–19
The interpretation of imaging findings in axSpA requires a high level of radiological experience, which is one of the factors contributing to the notoriously long delay between symptom onset and diagnosis.20 On the other hand, the uncritical use of classification criteria for diagnosis and the misinterpretation of magnetic resonance images lead to a considerable number of false-positive diagnoses in current clinical practice.12 Accessibility, costs, patient burden, diagnostic accuracy and reliability of imaging tests are critical factors in deciding which imaging test to choose at a specific step of the diagnostic process.19 21
This study aims to investigate the diagnostic accuracy in axSpA imaging of readers with different experience levels and to identify differences in their interpretation of XR, MRI and CT examinations in order to elucidate which imaging modality or combination of modalities yields the highest diagnostic accuracy and reliability.
Materials and methods
Subjects
This study was designed as a post hoc analysis of two prospective cohorts of patients with chronic low back pain and suspected or diagnosed axSpA: the SacroIliac joint MAgnetic resonance imaging and Computed Tomography study with 110 patients22 and the Virtual Non-Calcium Susceptibility-Weighted Imaging study with 72 patients.23 For the present analysis, patients were divided into two groups based on the clinical diagnosis established by the local expert rheumatologists, who considered all imaging findings in conjunction with clinical and laboratory results: a group of patients with a diagnosis of axSpA and a control group of patients with other diagnoses (eg, degenerative or mechanical back pain or non-specific low back pain). Patients with incomplete imaging or clinical data were excluded from the analysis (figure 1).
Readers and scoring system
XR, MRI and CT of the SIJs were anonymised prior to the reading sessions. In a virtual session, all readers were introduced to the reading methods and scoring system before starting the reading process. In the first reading session, nine readers scored XR, MRI (oblique-coronal T1-weighted and short-tau inversion recovery (STIR) sequences) and CT separately for the presence or absence of axSpA using a dichotomous score (‘axSpA’ or ‘non-axSpA’). In a separate reading session, the readers scored the combinations XR&MRI and CT&MRI. The readers differed in their years of experience in musculoskeletal imaging and were divided into three groups based on experience level: an inexperienced reader group (low-XP) of three medical research students (CS, FR and DD) with 0 to 1 year of experience, a semi-experienced reader group (intermediate-XP) of radiologists with 3–8 years of experience (STU, JG and KZ), and an experienced group (high-XP) of senior physicians (two radiologists (TD and IE) and one senior rheumatologist (DP)) with 12–17 years of experience in musculoskeletal imaging. The readers were blinded to clinical information and other imaging data.
Statistical analysis
Statistical analysis was performed using GraphPad Prism (V.9.4.1 for MacOS, GraphPad Software, La Jolla, California, USA) and Microsoft Excel (V.16.65 for MacOS). Parameters of diagnostic accuracy for the different imaging modalities (XR, CT and MRI) and their combinations (XR&MR and CT&MR) were calculated from contingency tables, separately for the three reader groups: sensitivity, specificity, positive/negative predictive values and likelihood ratios (LR+/LR−), with confidence intervals calculated using the Wilson/Brown method. The McNemar test was used to assess differences in the proportions of correct and incorrect answers between the reader groups, separately for each imaging modality and combination. Furthermore, Fleiss kappa (κ) was calculated to quantify the agreement among the three readers within each reader group and was interpreted according to Landis and Koch.24 A p value <0.05 was considered statistically significant.
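The contingency-table parameters and the McNemar comparison described above can be outlined in a short Python sketch. This is a minimal, hypothetical illustration, not the authors' workflow (they used GraphPad Prism and Excel). It computes the parameters from one reader group's 2×2 table against the clinical diagnosis, uses the plain Wilson score interval (Prism's Wilson/Brown method is a related variant, so its bounds may differ slightly), and implements two common McNemar variants on the discordant pairs. The names `Table2x2`, `wilson_interval`, `diagnostic_metrics`, `mcnemar_exact` and `mcnemar_chi2` are illustrative, and the example counts are invented. The text does not state which McNemar variant was used or how the three readers of a group were reduced to one correct/incorrect call per patient, so both remain assumptions.

```python
from dataclasses import dataclass
from math import comb, erfc, inf, sqrt


@dataclass
class Table2x2:
    """One reader group's calls versus the clinical reference diagnosis."""
    tp: int  # read as axSpA, clinically axSpA
    fp: int  # read as axSpA, clinically non-axSpA
    fn: int  # read as non-axSpA, clinically axSpA
    tn: int  # read as non-axSpA, clinically non-axSpA


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple[float, float]:
    """Plain Wilson score interval for a proportion (about 95% with the default z)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(t: Table2x2) -> dict[str, float]:
    """Sensitivity, specificity, predictive values, likelihood ratios, accuracy."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        # Likelihood ratios are reported as infinite when the denominator is zero.
        "lr_pos": sens / (1 - spec) if spec < 1 else inf,
        "lr_neg": (1 - sens) / spec if spec > 0 else inf,
        "accuracy": (t.tp + t.tn) / (t.tp + t.fp + t.fn + t.tn),
    }


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p value (binomial test on the discordant pairs).

    b: patients read correctly by group A but not group B; c: the reverse.
    """
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2**n
    return min(1.0, 2 * tail)


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """McNemar chi-square with continuity correction and its p value (1 df)."""
    n = b + c
    if n == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / n  # clamped at 0, so b == c gives 0
    return stat, erfc(sqrt(stat / 2))  # P(chi-square with 1 df > stat)


if __name__ == "__main__":
    # Invented counts for illustration only (not study data): 89 axSpA, 74 controls.
    table = Table2x2(tp=70, fp=14, fn=19, tn=60)
    for name, value in diagnostic_metrics(table).items():
        print(f"{name}: {value:.3f}")
    print("sensitivity 95% CI:", wilson_interval(table.tp, table.tp + table.fn))
    print("McNemar exact p for b=12, c=3:", mcnemar_exact(12, 3))
```

Because the cohort was enriched for axSpA (89 of 163 patients), predictive values computed this way reflect that sample prevalence and would shift in a population with a different prevalence, whereas sensitivity, specificity and likelihood ratios do not depend directly on prevalence.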
Results
Subjects
A total of 182 subjects underwent eligibility assessment, of whom 19 were excluded due to insufficient clinical or imaging data. Thus, 163 subjects with low back pain (82 women; mean age 38 years (SD 10.6; range 19–62); mean duration of low back pain 80.6 months (SD 89.9)) were included in the analysis. Prior to the reading sessions, 89 patients had been diagnosed with axSpA by the local rheumatologist, while 74 patients had been diagnosed with other conditions (56 with degenerative or mechanical SIJ disease and 18 with non-specific low back pain) (figure 2). In patients with axSpA, the mean Bath Ankylosing Spondylitis Disease Activity Index was 4.6 (SD 1.8) and the mean C reactive protein level was 7.8 mg/dL (SD 12.0). HLA-B27 was positive in 60.2% of the patients. Further clinical details are available in our earlier publication.13
Image reading
Overall, 815 imaging datasets were anonymised before scoring. Diagnostic accuracy was higher in the high-XP than in the intermediate-XP group for all modalities except XR, but the differences were not statistically significant. Diagnostic accuracies of the intermediate-XP and high-XP groups also did not differ significantly in the combined evaluation of XR&MR (84.0%) and CT&MR (85.3%). In the low-XP and intermediate-XP groups, the highest diagnostic accuracy was found for CT&MR (78.5% and 85.3%, respectively), while the high-XP readers achieved the highest accuracy for MRI (85.9%). Figure 2 summarises the results of the contingency table analyses; the contingency tables are presented in online supplemental table 1. The most significant difference in diagnostic accuracy was found for MRI between low-XP and high-XP (p=0.001), followed by XR&MR (p=0.009) (figure 3). Examples of the image reading across the different reader groups are shown in figure 4.
Regarding inter-rater reliability, kappa values for CT were higher than those for the other modalities and combinations in all three reader groups. Agreement between the readers was best for CT in high-XP (κ=0.87), followed by CT in intermediate-XP (κ=0.79). The low-XP readers achieved only fair agreement, except for CT (κ=0.50) and CT&MR (κ=0.41), for which agreement was moderate.
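The Fleiss kappa values above, and their interpretation according to Landis and Koch, can be reproduced in outline with a short Python sketch. It assumes that every patient received exactly one dichotomous call ('axSpA' or 'non-axSpA') from each of the three readers of a group; the functions `fleiss_kappa` and `landis_koch` and the example calls are hypothetical, not the authors' code or data. Assigning a value that lies exactly on a band boundary (for example 0.40) to the lower band is also an assumption, because the bands are usually quoted as 0.21–0.40, 0.41–0.60 and so on.

```python
from collections import Counter


def fleiss_kappa(ratings: list[list[str]]) -> float:
    """Fleiss kappa for a fixed number of raters per subject.

    ratings[i] lists the category each reader assigned to patient i,
    e.g. ["axSpA", "non-axSpA", "axSpA"] for three readers.
    """
    if not ratings:
        raise ValueError("at least one subject is required")
    n_subjects, n_raters = len(ratings), len(ratings[0])
    if n_raters < 2 or any(len(r) != n_raters for r in ratings):
        raise ValueError("each subject needs the same number (>= 2) of ratings")
    counts = [Counter(r) for r in ratings]
    categories = set().union(*counts)
    # Mean observed agreement: proportion of agreeing rater pairs per subject.
    p_bar = sum(
        (sum(cnt[c] ** 2 for c in categories) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ) / n_subjects
    # Expected chance agreement from the overall category proportions.
    total = n_subjects * n_raters
    p_e = sum((sum(cnt[c] for cnt in counts) / total) ** 2 for c in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating has the same category")
    return (p_bar - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal label using the Landis and Koch bands (upper bounds inclusive)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Invented calls of three readers for four patients (not study data).
    calls = [
        ["axSpA", "axSpA", "axSpA"],
        ["non-axSpA", "non-axSpA", "axSpA"],
        ["non-axSpA", "non-axSpA", "non-axSpA"],
        ["axSpA", "axSpA", "non-axSpA"],
    ]
    k = fleiss_kappa(calls)
    print(f"kappa = {k:.2f} ({landis_koch(k)})")
```

With these bands, the reported values map to 'almost perfect' (κ=0.87), 'substantial' (κ=0.79) and 'moderate' (κ=0.50 and κ=0.41) agreement.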
Discussion
In this study, we investigated differences in the diagnostic accuracy achieved by three groups of readers with different levels of experience when interpreting imaging findings typical of axSpA in the SIJs. Our results show that both the level of experience and the imaging modality (XR, CT, MRI) have an impact on the diagnostic accuracy achieved by a reader. As expected, the experienced readers had the highest diagnostic accuracy in all modalities except XR. Across all three reader groups, the highest overall accuracy was achieved for CT, whereas the accuracy of MRI interpretation varied most markedly with experience level. Furthermore, adding CT to MRI resulted in the highest diagnostic accuracy in both the inexperienced and the semi-experienced reader groups. Interestingly, this combined approach had no additive value for the three highly experienced readers, suggesting that the benefit of adding CT is more pronounced in less experienced readers. This finding underscores the potential of complementary imaging modalities to enhance diagnostic accuracy, particularly in less specialised centres. These insights contribute to optimising the diagnostic approach in the assessment of axSpA.
Choosing the appropriate imaging modality is essential for the diagnosis of axSpA, which requires knowledge of the strengths and weaknesses of the different imaging modalities.20 In clinical practice, the assessability and reliability of radiological findings are crucial for selecting the best modality in a given situation. Interpreting these findings requires special training and experience, which is why the diagnosis is often delayed outside of specialised axSpA centres. This underlines the importance of continuous training of young and inexperienced radiologists in axSpA imaging using established and new methods.25–27 Our findings may help address these shortcomings, as they show that, besides the reader's experience, the choice of modality has an important impact on overall diagnostic accuracy. However, the diagnosis cannot be established by imaging alone; clinical features must also be considered, as even the highly experienced readers achieved a maximum accuracy of 85.9%.
Since its introduction in axSpA imaging, MRI has gained importance for early diagnosis and is currently being discussed as the first-line imaging modality whenever axSpA is suspected. However, our results suggest that accurate interpretation of MRI depends heavily on the reader’s experience and requires more training than CT. Furthermore, MRI is far more expensive than CT and XR and is less widely available, which may limit its usability, particularly in outpatient practices among radiologists without specific experience in axSpA imaging. CT seems to be easier to interpret, especially for beginners, resulting in similar diagnostic accuracy but better inter-rater reliability, which makes it more suitable for inexperienced readers. The disadvantage of radiation exposure in CT is largely mitigated by low-dose CT protocols, which expose the patient to radiation doses comparable to XR. Diagnostic accuracy for XR increased only marginally from the low-XP to the intermediate-XP and high-XP groups. Still, the performance and reliability of XR are inferior to those of the cross-sectional imaging techniques at all experience levels, further underlining that XR is difficult to interpret and particularly unsuited for inexperienced readers. Past exercises to improve the interpretation of XR images of the SIJs showed poor agreement even among recognised experts in the disease.28 With XR, there is a risk of misinterpreting imaging findings and missing subtle alterations, and this has not improved over the decades, despite a broader understanding of cross-sectional imaging techniques.
Our results show MRI to be the imaging modality with the best learnability. Hence, training and experience can lead to a significant improvement in diagnostic accuracy when MRI is used,29 while for XR the impact of training seems to be limited. Indeed, previous studies have reported the limited reproducibility of XR in axSpA imaging,30 with large interobserver and intraobserver variation.31 Several studies report advantages of CT and MRI over XR in axSpA diagnosis.32 However, inexperienced readers perform more consistently well in detecting typical axSpA-related abnormalities on CT, which makes CT an attractive imaging modality for non-specialised centres.
One limitation of our study is that only SIJ images were used in the reading sessions; the effect of spine imaging on diagnostic accuracy therefore remains to be investigated. Furthermore, we did not separately investigate the influence of known clinical data on diagnostic accuracy. The reader sample included one rheumatologist with extensive scientific experience in axSpA imaging but no specific training in radiology. As inter-rater agreement was high, we believe that experience is more important than profession and that this did not have a major impact on our data; a comparison of rheumatologists and radiologists was not intended in this study. Using the final clinical diagnosis as the reference standard poses a potential risk of bias due to circular reasoning, because some of the readers might have participated in the clinical diagnostic process; however, the reading sessions were separate from the clinical diagnostic process, and thorough anonymisation was implemented. In the absence of a widely accepted reference standard in axSpA studies, an expert consensus could potentially establish a reference for imaging positivity. However, in this context, alignment with the final clinical outcome was deemed more important than alignment with imaging criteria alone. Our assessment also does not allow for a differentiated evaluation of the experience-dependent influence of individual inflammatory or structural lesions on the image evaluations. However, it seems reasonable that interpreting the large number of detectable, especially inflammatory, lesions on MRI requires more experience than the simpler structural assessment with CT or XR. Lastly, experience in axSpA imaging is difficult to measure accurately, as years of experience may not correlate perfectly with the number of scans read.
In conclusion, our findings suggest that the diagnostic accuracy in interpreting SIJ images of patients with suspected axSpA varies with the reader's experience level and the chosen imaging modality. Notably, the impact of experience on diagnostic accuracy is most pronounced for MRI. Even inexperienced readers with a short duration of training, however, demonstrate good diagnostic accuracy when interpreting CT. Finally, the diagnosis of axSpA relies on both clinical and imaging information.
Ethics statements
Patient consent for publication
Ethics approval
The institutional ethics review board approved all investigations prior to study commencement (EA1/0886/16; EA1/073/10). Participants gave informed consent to participate in the study before taking part.
Acknowledgments
The authors thank Ms Bettina Herwig for language editing. The authors thank the Berlin Institute of Health for personal funding (STU, JR and TD) and providing essential infrastructure for data collection. This research project was supported by the Assessment of Spondyloarthritis international Society (ASAS) (research grant for KZ).
References
Supplementary materials
Supplementary Data
This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.
Footnotes
Twitter @BiesenRobert, @mprotopopov, @ProftDr
Contributors FR: design of scoring system, image scoring, data evaluation, article draft, critical revision of the manuscript for important intellectual content. KZ, IE, JG, DD, CS: image scoring, critical revision of the manuscript for important intellectual content. RB, HH, VR, JR, MP, FP, K-GAH: patient acquisition, data collection, critical revision of the manuscript for important intellectual content. DP: patient acquisition, data evaluation, critical revision of the manuscript for important intellectual content. TD: image scoring, data evaluation, critical revision of the manuscript for important intellectual content. STU: conception and design of the study, design of scoring system, image scoring, data evaluation, statistical calculations, article draft, critical revision of the manuscript for important intellectual content. STU acts as the guarantor of the study.
Funding This research project was funded by the Assessment of Spondyloarthritis international Society (ASAS) and the Berlin Institute of Health. The funding sources were not involved in study design, in the collection, analysis and interpretation of data, in the writing of the report or in the decision to submit the paper for publication.
Competing interests KZ reports funding (research grant) from the Assessment of Spondyloarthritis international Society (ASAS) during the conduct of this study. IE reports personal fees from AbbVie, Eli Lilly and Novartis. RB reports personal fees from AstraZeneca, Galapagos, GlaxoSmithKline, Medac and Novartis. HH reports grants from Sobi and personal fees from AbbVie, Novartis, Pfizer, Roche and UCB outside the submitted work. JR is a participant in the BIH-Charité Clinician Scientist Programme funded by the Charité—Universitätsmedizin Berlin and the Berlin Institute of Health. FP reports grants and personal fees from Novartis, Lilly and UCB, as well as personal fees from AbbVie, Amgen, BMS, Hexal, Janssen, MSD, Pfizer and Roche. K-GAH reports personal fees from AbbVie, MSD, Pfizer and Novartis and is also the co-founder of BerlinFlame. DP reports grants and personal fees from AbbVie, Eli Lilly, MSD, Novartis and Pfizer and personal fees from Bristol-Myers Squibb, Roche, UCB, Biocad, GlaxoSmithKline and Gilead outside the submitted work. TD reports personal fees from MSD, Novartis and Eli Lilly and reports funding from the Berlin Institute of Health (BIH) during the conduct of this study. STU reports funding from BIH during the conduct of this study (Junior Digital Clinician Scientist Program). All other authors report no funding.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.