Objective The objective of this study was to evaluate the reliability of recognising structural lesions (erosions, fatty lesions, ankylosis) on MRI of the sacroiliac joints (MRI-SIJ) in clinical practice, compared to a central reading, in patients with possible recent-onset axial spondyloarthritis (axSpA).
Methods Patients aged 18–50 years with recent (<3 years) and chronic (≥3 months) inflammatory back pain suggestive of axSpA were included in the DEvenir des Spondyloarthrites Indifférenciées Récentes (DESIR) cohort. MRI-SIJ structural lesions were scored by non-trained local readers and by two trained central readers. Local readers scored each SIJ as normal, doubtful or definite lesions. Central readers scored each type of lesion separately. The central reading (mean of the two central readers’ scores) was the external standard. Agreement (κ) was calculated first between local (3 definitions of a positive MRI-SIJ) and central readings (9 definitions), and then between the two central readers.
Results 664/708 patients with complete available images were included. Agreements between local and central readings were overall ‘fair’, except when considering at least 2 or 3 fatty lesions and at least 3 erosions and/or fatty lesions where agreement was ‘moderate’. Agreement between central readers was similar. MRI-SIJ was positive for 52.6% of patients according to central reading (at least 1 structural lesion) and for 35.4% of patients according to local reading (at least unilateral ‘doubtful‘ or ‘definite’ structural lesions).
Conclusions Agreement on a positive structural MRI-SIJ was fair to moderate between local and central readings, as well as between central readers. The reliability improved when fatty lesions were considered.
Trial registration number NCT01648907.
What is already known about this subject?
Recent data suggest that structural lesions on MRI of the sacroiliac joints (MRI-SIJ) may be used instead of structural damage on conventional radiographs in the classification of patients.1
What does this study add?
The reliability of both non-trained and trained readers in recognising structural lesions of the SIJ on MRI is fair to moderate in patients with recent inflammatory back pain suggestive of spondyloarthritis.
Reliability between local and central readings and between central readers is better when considering at least two fatty lesions.
How might this impact on clinical practice?
Structural lesions are difficult to identify on MRI-SIJ and need to be better defined to be used for axial spondyloarthritis diagnosis in clinical practice.
Diagnosing axial spondyloarthritis (axSpA) at an early stage of the disease is a challenge in clinical practice. Progression of axSpA can lead to irreversible structural damage and impair quality of life. There is growing evidence that early treatment may change the outcome in patients with axSpA.2 ,3 The Assessment of SpondyloArthritis international Society (ASAS) developed classification criteria relevant for early axSpA.4 According to these criteria, patients with at least one SpA feature may be classified as having axSpA if they show structural lesions on X-ray of the sacroiliac joints (X-SIJ) or active sacroiliitis on MRI.5 Although only inflammatory lesions (bone marrow oedema (BME) and osteitis) have been selected in the ASAS criteria, structural lesions (erosions, fatty lesions, sclerosis, ankylosis) are also visible on MRI of the sacroiliac joints (MRI-SIJ), and have been described by the ASAS/OMERACT MRI group.6 The European League Against Rheumatism recommends taking into account inflammatory lesions as well as structural lesions when diagnosing axSpA.7 Radiographic structural lesions of the SIJ are still assessed by the same grading, first described in the New York criteria.8 However, structural lesions on X-SIJ have proved difficult to identify and discriminate consistently across observers,9 and training does not seem to improve this.10 These findings were recently confirmed in patients with recent-onset chronic inflammatory back pain (IBP) of the DEvenir des Spondyloarthrites Indifférenciées Récentes; Outcomes in patients with Recent-onset Undifferentiated Spondyloarthritis (DESIR) cohort.11 Furthermore, radiographic sacroiliitis may appear several years after the onset of symptoms,12 delaying the diagnosis of ankylosing spondylitis (AS).13 MRI-SIJ could be an alternative to X-SIJ, especially in early axSpA.
It has been recently demonstrated in the DESIR cohort that MRI-SIJ could be reliably used instead of, or in addition to, X-SIJ to assess structural lesions without significant changes in the classification according to ASAS axSpA criteria.1 Previous studies have shown a good reliability of MRI-SIJ to detect structural lesions.14–16 However, in these previous studies, MRI-SIJ were scored by trained readers, blinded for clinical information, while in daily practice scoring is performed by local radiologists or rheumatologists, who have access to clinical data. The ability of these local non-trained readers (ie, not specifically trained for the assessment of structural lesions on MRI-SIJ, but trained as a radiologist or rheumatologist) in various centres to detect structural lesions on MRI-SIJ, instead of a centralised reading, is unknown. The main objective of this study was to evaluate the reliability of non-trained investigators in recognising structural lesions on MRI-SIJ of patients with recent inflammatory back pain, suggestive of axSpA in the DESIR cohort, compared to trained central readers. To put the data in context, the agreements between the two central readers are also presented as well as agreements between ‘MRI-SIJ structural lesions’, according to the local reading, and ‘radiographic sacroiliitis’ according to the local and central reading.
This study is a cross-sectional study of baseline data of the multicentre prospective longitudinal DESIR cohort.17
All patients of the DESIR cohort with available MRI-SIJ and X-SIJ at baseline were included. The DESIR cohort has already been described extensively.17 In short, patients aged between 18 and 50 years and suffering from recent (<3 years) and chronic (>3 months) IBP (in the buttocks, lumbar or thoracic spine) fulfilling either the Calin18 or the Berlin criteria19 were recruited in 25 regional centres in France. Symptoms had to be suggestive of axSpA according to the local investigator's assessment, with a score ≥5/10 (0=not suggestive; 10=very suggestive of axSpA).17 In total, 708 patients were included between December 2007 and April 2010. The baseline database was locked on 30 October 2012. The cohort was approved by an ethics committee and complied with good clinical practices. A detailed description of the organisation of the cohort, centres, protocol and collected data is available at http://www.lacohortedesir.fr/desir-in-english/. This study was approved by the scientific committee of the DESIR cohort. All patients signed informed consent.
Baseline demographic parameters including age and gender, clinical parameters and laboratory test results including acute phase reactants and human leucocyte antigen (HLA)-B27 antigen status were collected using a standardised case report form.
MRI-SIJ assessments were performed on a 1 or 1.5 T MRI machine, using coronal oblique T1-weighted fast spin echo and short tau inversion recovery (STIR) sequences, with 12–15 slices of 4 mm thickness. MRI-SIJ structural lesions were scored first by the local reader and subsequently by two central readers.
The local readers were radiologists or rheumatologists in each centre with possible access to clinical data. They did not participate in any training session and were instructed to score each SIJ in 3 grades, according to the presence/absence of structural lesions (defined as typical sclerosis, erosions, bony bridges or ankylosis). Grade 0 corresponded to normal, grade 1 to doubtful and grade 2 to definite structural lesions.
The central readers first participated in a calibration session (see below). The scoring was mainly based on the T1-weighted sequence, but the readers had free access to the STIR sequence. The central readers were blinded to clinical and other imaging data and to the local readers' results. We used the adapted SpondyloArthritis Research Consortium of Canada scoring system by Weber et al20 to assess MRI-SIJ. This method is based on the assessment of lesions (present vs absent) on six consecutive slices through the SIJ, in its cartilaginous compartment (ie, its anteroinferior portion), starting on the slice on which at least 1 cm of the cartilage is visible, from anterior to posterior. Each SIJ is divided into four quadrants. Fatty lesions were only marked present by the readers if appearing with a distinct border and a homogeneous pattern. The readers only marked fatty lesions and erosions present if observed on at least two consecutive slices, resulting in a maximum score of 40 per lesion per patient (20 per SIJ). Ankylosis was considered present if seen on a single slice, with a maximum score of 24 per patient (note that ankylosis always involves two quadrants, either upper iliac and upper sacral or lower iliac and lower sacral quadrants). The presence of sclerosis was not taken into account for this study, since the value of including sclerosis in structural scores has been debated: most scores did not include it14 ,20–22 and when it has been included, sclerosis usually showed low reliability.23 The mean score of the two central readers was used for each lesion to obtain the central reading score; no adjudicator was used.
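The maximum scores quoted above can be checked with a short calculation. The sketch below is one reading of the scoring rule, not the authors' code: it assumes that a lesion seen on at least two consecutive slices is counted once per pair of adjacent slices (five pairs in six slices), and that ankylosis is counted once per slice for each half (upper or lower) of each SIJ. The function and parameter names are hypothetical.

```python
def max_two_slice_lesion_score(n_slices=6, quadrants_per_sij=4, n_sij=2):
    """Upper bound for fatty lesions or erosions.

    Assumption: one point per quadrant per pair of adjacent slices,
    so six slices give five scorable pairs.
    """
    adjacent_pairs = n_slices - 1
    return adjacent_pairs * quadrants_per_sij * n_sij


def max_ankylosis_score(n_slices=6, halves_per_sij=2, n_sij=2):
    """Upper bound for ankylosis.

    Assumption: one point per slice for each half of the joint, because
    ankylosis always spans an iliac and a sacral quadrant together.
    """
    return n_slices * halves_per_sij * n_sij


if __name__ == "__main__":
    print(max_two_slice_lesion_score(n_sij=1))  # per SIJ
    print(max_two_slice_lesion_score())         # per patient
    print(max_ankylosis_score())                # per patient
```

Under these assumptions the three values are 20, 40 and 24, matching the maxima stated in the text.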
Several definitions were used to define a positive MRI-SIJ for structural lesions by local and central readers, at both the SIJ and patient levels (table 1).
Definition of a positive MRI-SIJ for structural lesions
Concerning the local reading, the MRI-SIJ was considered positive if at least one of the two SIJs was scored ‘doubtful or abnormal’, that is, ‘at least unilateral grade 1’ (definition 1). In sensitivity analyses, we also considered: ‘at least unilateral grade 2’ (definition 2), and ‘bilateral grade 2’ (definition 3) as a positive MRI-SIJ (table 1).
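The three local-reading definitions can be written as a small decision rule. This is a minimal sketch for illustration only; the function name and argument layout are assumptions, while the grade coding (0 normal, 1 doubtful, 2 definite) follows the text.

```python
def local_mri_positive(left_grade, right_grade, definition=1):
    """Return True if the local MRI-SIJ reading is positive.

    Grades: 0 = normal, 1 = doubtful, 2 = definite structural lesions.
    definition 1: at least unilateral grade 1
    definition 2: at least unilateral grade 2
    definition 3: bilateral grade 2
    """
    grades = (left_grade, right_grade)
    if any(g not in (0, 1, 2) for g in grades):
        raise ValueError("local grades must be 0, 1 or 2")
    if definition == 1:
        return max(grades) >= 1
    if definition == 2:
        return max(grades) >= 2
    if definition == 3:
        return min(grades) >= 2
    raise ValueError("definition must be 1, 2 or 3")


if __name__ == "__main__":
    # A patient with one doubtful and one normal joint is positive
    # only under the least stringent definition.
    for d in (1, 2, 3):
        print(d, local_mri_positive(1, 0, definition=d))
```

Each definition is strictly more stringent than the previous one, so any patient positive under definition 3 is also positive under definitions 1 and 2.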
For the central reading, in the absence of a generally accepted definition of a positive structural MRI-SIJ, we considered the following definition: ‘presence of ≥1 erosion or ≥1 fatty lesion or ≥1 ankylosis in at least one SIJ’ (definition 1). We also considered eight additional definitions (table 1).
X-SIJs were performed in the anteroposterior view. Similar to MRI-SIJ, baseline X-SIJ were read by the local investigators, and later by two trained central readers. The procedure has been previously described in detail.11 In short, the local readers scored each SIJ according to a method derived from the modified New York criteria (mNY):24 grades 2 and 3 were pooled in one combined grade 2. Thus, grade 0 corresponded to normal, grade 1 to doubtful, grade 2 to definite sacroiliitis and grade 3 to ankylosis. The central readers used the mNY criteria scoring method, in which sacroiliitis is defined as grade ≥2 bilaterally or grades 3–4 unilaterally. In case of disagreement, an experienced radiologist served as the adjudicator. We considered several definitions of a positive X-SIJ for local and central readings (table 1).
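The radiographic rule applied by the central readers (grade ≥2 bilaterally or grades 3–4 unilaterally) can be expressed as follows. This is a sketch of the rule as stated in the text, on the original 0–4 mNY grading; it does not apply to the pooled local 0–3 scale, and the function name is hypothetical.

```python
def mny_radiographic_sacroiliitis(left_grade, right_grade):
    """Radiographic sacroiliitis per the mNY rule quoted in the text.

    Grades run from 0 to 4 per joint. Positive if both joints are
    grade >= 2, or if at least one joint is grade 3 or 4.
    """
    grades = (left_grade, right_grade)
    if any(g not in range(5) for g in grades):
        raise ValueError("mNY grades must be integers from 0 to 4")
    bilateral_grade_2 = min(grades) >= 2
    unilateral_grade_3 = max(grades) >= 3
    return bilateral_grade_2 or unilateral_grade_3


if __name__ == "__main__":
    print(mny_radiographic_sacroiliitis(2, 2))  # bilateral grade 2
    print(mny_radiographic_sacroiliitis(3, 0))  # unilateral grade 3
    print(mny_radiographic_sacroiliitis(2, 1))  # neither condition met
```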
Definition of a patient with radiographic sacroiliitis
For the local reading, radiographic sacroiliitis was defined as at least one SIJ scored ‘grade 2 DESIR’ (definition 1). As sensitivity analyses, we also considered two other definitions (table 1). For the central reading, sacroiliitis was defined according to the mNY criteria (table 1).
Training of the central readers
The central readers of the MRI-SIJ and X-SIJ participated in calibration sessions. The calibration process with the same readers has been extensively described previously.11 ,25 Once agreement between the central readers was moderate to good (κ=0.4–1.0) for the various types of structural lesions on MRI-SIJ, the two central readers started reading all available baseline MRI-SIJ of the DESIR cohort.
Inter-rater agreements were calculated using Cohen's κ,26 ,27 and positive and negative percent agreements (PPA/NPA).28 All κs were interpreted according to the standards proposed by Landis and Koch; values<0 indicated no agreement, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial and 0.81–1 almost perfect agreement.27
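For readers less familiar with these statistics, the sketch below computes Cohen's κ from a 2×2 agreement table and maps it to the Landis and Koch categories listed above. The analyses in this study were run in Stata, so this is an illustration only. The exact PPA/NPA formula is not given in the text; the version here uses the proportions of specific agreement, 2a/(2a+b+c) and 2d/(2d+b+c), which is an assumption. The cell counts in the example are invented.

```python
def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table.

    a: both readings positive, d: both negative,
    b: first positive / second negative, c: first negative / second positive.
    """
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    if expected == 1:
        return float("nan")  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1 - expected)


def specific_agreement(a, b, c, d):
    """Positive and negative agreement (assumed definition, see lead-in)."""
    ppa = 2 * a / (2 * a + b + c)
    npa = 2 * d / (2 * d + b + c)
    return ppa, npa


def landis_koch(kappa):
    """Label a kappa value using the cut-offs quoted in the text."""
    if kappa < 0:
        return "no agreement"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


if __name__ == "__main__":
    # Hypothetical counts, not taken from the DESIR data.
    k = cohen_kappa(50, 10, 15, 25)
    print(round(k, 2), landis_koch(k))  # 0.47 moderate
    print(specific_agreement(50, 10, 15, 25))
```

Reporting the specific agreement indices alongside κ is useful because κ alone depends on the prevalence of positive readings.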
Central reading was assessed by the mean score of central readers. For sensitivity analyses, we also considered lesions recognised by both central readers and lesions scored by only central reader 1 or only central reader 2.
For the primary objective, agreement was calculated between the local reading and the central reading, which served as an external standard, regarding the different definitions of a positive MRI-SIJ. Next, we investigated which type of lesions (reported by the central reading) contributed most to the global assessment of the local readers (3 definitions). As a sensitivity analysis, agreement at the SIJ level was also assessed. Then we calculated the agreement between central readers regarding the nine definitions of a positive structural MRI-SIJ with regard to the number of each structural lesion (erosion, fatty lesions and ankylosis) with various cut-off values (≥1 to ≥5) based on a patient's level. Prevalence of each type of lesion (score≥1) was also assessed.
Finally, agreement was calculated between the various definitions of a positive MRI-SIJ and a positive X-SIJ, first for the local reading and second for the central reading as the external standard. Sensitivity and specificity of MRI-SIJ compared to X-SIJ were calculated, with the X-SIJ as the external standard.
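Sensitivity and specificity of MRI-SIJ against X-SIJ as the external standard follow directly from paired patient-level classifications. The sketch below is illustrative only; the function name and the example data are hypothetical.

```python
def sensitivity_specificity(mri_positive, xray_positive):
    """Sensitivity and specificity of MRI-SIJ with X-SIJ as the reference.

    Both arguments are equal-length sequences of booleans, one per patient.
    Returns None for a measure whose denominator is zero.
    """
    if len(mri_positive) != len(xray_positive):
        raise ValueError("both readings must cover the same patients")
    pairs = list(zip(mri_positive, xray_positive))
    tp = sum(1 for m, x in pairs if m and x)
    fn = sum(1 for m, x in pairs if not m and x)
    tn = sum(1 for m, x in pairs if not m and not x)
    fp = sum(1 for m, x in pairs if m and not x)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


if __name__ == "__main__":
    # Invented readings for six patients.
    mri = [True, True, False, False, True, False]
    xray = [True, False, False, False, True, True]
    print(sensitivity_specificity(mri, xray))
```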
Analyses were performed using Stata SE V.12.
Complete MRI-SIJ baseline data were available in 664 patients. Their mean age was 34.9 (SD 8.7) years and mean symptom duration 24 (SD 22.3) months; 309 (46.5%) patients were men and 388 (51.4%) were HLA-B27 positive; 409/582 (70.2%) fulfilled the ASAS criteria for axSpA.25
Prevalence of structural lesions scored by central readers
On the basis of the mean scores of the two central readers, 52.6% of patients had at least one structural lesion including erosions (32.2%), fatty lesions (29.2%) and ankylosis (24.2%) (table 2). When looking at the lesions agreed on by both central readers, 29.1% of patients had at least one structural lesion including erosions (9.3%), fatty lesions (14.6%) and ankylosis (5.2%). For each type of lesion, central reader 1 scored systematically fewer lesions than central reader 2.
Agreement between local reading and central reading in recognising structural lesions on MRI-SIJ
The κ coefficients, PPA and NPA, between the local reading and the central reading for each definition are shown in table 3 (complete data in supplementary table 1). Local and central readings were concordant for 66.3% of patients according to the presence/absence of abnormalities (definitions 1). Overall, agreements were better for definitions considering at least two or at least three fatty lesions and at least three erosions and/or fatty lesions in central reading and if a more stringent definition was considered in local reading. According to the local reading (definition 1), 35.4% of patients had a positive MRI-SIJ for structural lesions and according to the central reading (definition 1), 52.6% of patients had a positive MRI-SIJ for structural lesions; 8.3% of patients with a positive MRI-SIJ according to the local reading were not recognised by the central reading and 25.5% of patients with a negative MRI-SIJ according to the local reading were marked positive by the central reading.
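The percentages in this paragraph are enough to rebuild an approximate 2×2 table for definition 1 of both readings. The sketch below assumes that the 8.3% and 25.5% figures are percentages of all patients (not of the local-positive or local-negative subgroups); under that assumption the reconstructed central-positive margin matches the reported 52.6%. The resulting κ is a back-of-the-envelope value for illustration and is not a figure taken from table 3.

```python
def reconstruct_two_by_two(local_pos, local_only, central_only):
    """Rebuild the 2x2 table in percentage points of all patients.

    local_pos: % positive on local reading
    local_only: % local positive / central negative
    central_only: % local negative / central positive
    """
    both_pos = local_pos - local_only
    both_neg = 100.0 - both_pos - local_only - central_only
    return both_pos, local_only, central_only, both_neg


def kappa_from_percentages(a, b, c, d):
    """Cohen's kappa from cells expressed as percentages summing to 100."""
    observed = (a + d) / 100.0
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / 100.0 ** 2
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    a, b, c, d = reconstruct_two_by_two(35.4, 8.3, 25.5)
    print(round(a + c, 1))   # central-positive margin, 52.6 as reported
    print(round(a + d, 1))   # observed agreement, 66.2
    print(round(kappa_from_percentages(a, b, c, d), 2))  # 0.33
```

The reconstructed observed agreement (66.2%) differs from the reported 66.3% only by rounding, and the approximate κ of 0.33 falls in the "fair" range, consistent with the overall description of agreement.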
Agreement between the two central readers for various definitions of a positive MRI-SIJ for structural lesions
Agreement between the two central readers was ‘slight’ to ‘fair’ (κ=0.19–0.29) when considering definitions including at least one structural lesion (fatty lesion, erosion or ankylosis) (definition 1) or only erosions (definitions 2, 3 and 4; table 4). For definitions considering at least two or at least three fatty lesions and at least three erosions and/or fatty lesions, agreement was better but still only moderate (κ=0.44–0.51). With an increasing number of fatty lesions (from ≥1 to ≥5 fatty lesions), agreement increased (κ from 0.40 to 0.59) (supplementary table 4). Conversely, with an increasing number of erosions (from ≥1 to ≥5 erosions), agreement decreased (κ from 0.22 to 0.07) (supplementary table 5). For definition 8, which includes only ankylosis, agreement was ‘slight’.
At the SIJ level, agreements between local and central readings and between the two central readers were similar between the left and right SIJ (data not shown).
Comparing local reading and each central reader separately, central reader 1 had lower agreement for definitions 3 and 4 (≥2 or ≥3 erosions) than the mean of both central readers (supplementary tables 2 and 3).
Agreement between a positive MRI-SIJ for structural lesions by local reading and radiographic sacroiliitis
In total, 649 patients had complete data for this analysis (ie, local reading of MRI-SIJ and local and central readings of X-SIJ). About 21.1% of patients had AS according to mNY criteria in the central reading. Compared to the radiographic local reading, agreements were ‘moderate’ at best (from 0.38 to 0.55) when considering the two less stringent radiographic definitions (definitions 1 and 2) (table 5). Compared to the radiographic central reading, at the patient level, κs were moderate for the various MRI-SIJ definitions (from 0.40 to 0.51). In both comparisons, MRI-SIJ sensitivity was better for a less stringent MRI-SIJ definition and specificity was better for a more stringent MRI-SIJ definition.
In patients with recent chronic IBP suggestive of axSpA, the agreement between readings of structural lesions on MRI-SIJ by two trained central readers and by local readers is ‘moderate’ at best. The agreement between the local reading and the various definitions of the central reading was always low (‘slight’ or ‘fair’) except for the definitions considering at least ‘2’ or at least ‘3’ fatty lesions or ‘at least 3 erosions and/or fatty lesions’, where the agreement reached a ‘moderate’ level. These results show that the reliability of the routine reading, in this context, is overall not good. Although local readers did not independently score each type of lesion, these results also suggest that local readers may recognise fatty lesions better than other types of structural lesions. However, we have to keep in mind that the local and central scoring methods differed, which may have affected the assessment of agreement.
Agreement between central readers is overall not very different from agreement between local and central readings. Surprisingly, only a ‘fair’ agreement between the two central readers was found for the first definition, although it was the loosest definition (‘presence of at least one erosion or at least one fat deposition, or presence of partial or total ankylosis’) and the lowest agreement was found for definition 8 regarding the presence of ankylosis. Despite the calibration sessions of the readers, reader 1 systematically scored fewer lesions than reader 2, in particular with regard to erosions. Thus, reader 1 was more specific and reader 2 more sensitive, leading to a low agreement between the two readers. Reader 2 may have considered physiological abnormalities as structural lesions (especially for erosions), which might question the validity of the calibration process. However, the same two central readers also scored BME lesions in the DESIR cohort with a similar calibration method and obtained a much better κ (0.73).25 Moreover, despite this limitation, considering the agreements between local reading and each central reader separately, the agreement for erosions was better with reader 2 than with reader 1 and agreement for fatty lesions was similar. This suggests that the fairly low agreement may be due to the general difficulty in distinguishing structural lesions from physiological abnormalities rather than (only) to the calibration and reading methods. Moreover, global agreement for structural lesions (including erosions, fatty lesions and ankylosis) seems better in studies including a high proportion of patients with AS (κ=0.84),14 or when the median symptom duration is longer (κ=0.76–0.80).29 Puhakka et al23 included patients with a short symptom duration (median disease duration=19 months) excluding patients with AS, and found an agreement similar to our study (κ=0.42).
Contrary to erosions and ankylosis, fatty lesions contributed most to a better agreement between central readers and between central and local MRI readings. The consideration of at least two or, even better, three fatty lesions increased the reliability between readers. Those results are in agreement with the findings of a recent study30 which defined a cut-off of at least three fatty lesions or 5 fatty lesions and/or erosions to define a positive MRI-SIJ with <5% of patients without axSpA fulfilling this definition. In previous studies, agreement seemed usually better for erosions in AS and for fatty lesions in recent axSpA.14 ,16 ,23 ,31 ,32 Weber et al33 suggested that assessment of fatty lesions in the SIJ may have a diagnostic utility in early axSpA with doubtful erosions or BME, but that only fatty lesions with a distinct border or a homogeneous pattern (and not subchondral lesions) should be considered. In our study, we only marked fatty lesions if they had a distinct border and a homogeneous pattern. Recently, a T1-weighted opposed-phase gradient-echo (opGE) sequence has shown promising results and may improve reliability in detecting erosions.34
Prevalence of structural lesions was low in our study compared to most of the previously published studies. This difference can be partly explained by the lower proportion of patients with AS in our study (21.1% according to mNY criteria by central readers, and 70.2% of axSpA according to the ASAS criteria25), whereas other studies included 60.7–73.5% of patients with AS.15 ,32 ,35 As expected, the prevalence of structural lesions is higher in patients with AS than in patients with non-radiographic axSpA.29 ,30 ,32 Part of these results can be explained by the pathophysiology of axSpA: fatty lesions appear to follow resolution of inflammation and of erosions in axSpA.36 ,37
Previous studies have compared MRI central readings to X-ray central readings with heterogeneous results (Se=49–84% and Sp=61–98%).12 ,19 Agreements between MRI-SIJ local readings and X-SIJ local readings and between MRI-SIJ local readings and X-SIJ central readings in our study were moderate. These results confirm that structural lesions are difficult to assess by MRI in clinical practice (local reading) when compared to the gold standard (X-SIJ central reading) and to what is currently performed in clinical practice (local X-SIJ reading). However, the low reliability of X-SIJ readings, which has been confirmed in the DESIR cohort,11 may interfere with the current analysis. Furthermore, this analysis was limited by the use of non-validated definitions for the X-SIJ local reading.
A limitation of this study was the absence of comparison with CT. CT is known to be reliable for the assessment of bony structures, including erosions, sclerosis and ankylosis, and is a better external standard than radiographs.38 ,39 However, CT was not added to the DESIR protocol because of time, cost and radiation dose.
It should also be noted that both local and central readers had access to the STIR sequences and that the presence of BME lesions may have biased their scoring by influencing the global appraisal. However, this is also the situation in which the MRI-SIJ will be used in clinical practice. Moreover, the local readers also might have had access to the radiographs while scoring the MRI-SIJ, while the central readers were completely blinded for the other imaging modality.
Finally, this study may have been limited by the restriction of structural lesions scoring to their presence on two consecutive slices (which is the rule for BME lesions), contrary to previous studies in which their presence on a single slice was often sufficient. This may have hampered the comparison with previous studies.
This study also has several strengths. The DESIR cohort has included a large number of patients in whom a standardised MRI-SIJ was systematically performed at baseline. Previous studies reporting on this topic mostly included <100 patients with axSpA.14 ,16 ,23 ,29 ,31 ,40 The DESIR patients had recent disease onset with symptoms suggestive of axSpA at baseline, but the diagnosis was not necessarily confirmed. The assessment of structural lesions on MRI-SIJ is of particular interest in this type of patients to assist in diagnosing axSpA. Results from previous studies assessing structural lesions in patients with a confirmed diagnosis and long-standing disease14 ,29 ,31 cannot be extrapolated to this category of patients.
In conclusion, the reliability of non-trained readers as well as trained readers to recognise structural lesions on MRI-SIJ in patients with recent IBP suggestive of SpA is overall ‘fair’ to ‘moderate’. Fatty lesions may contribute most to a positive structural MRI-SIJ assessment by non-trained and trained readers.
The DESIR cohort is conducted under the control of Assistance Publique–Hôpitaux de Paris via the Clinical Research Unit Paris-Centre and under the umbrella of the French Society of Rheumatology and INSERM (Institut National de la Santé et de la Recherche Médicale). The database management is performed within the department of epidemiology and biostatistics (Professor Jean-Pierre Daurès, D.I.M., Nîmes, France). The authors wish to thank the 25 participating centres and their investigators: Pr Maxime Dougados (Paris—Cochin B), Pr André Kahan (Paris—Cochin A), Pr Olivier Meyer (Paris—Bichat), Pr Pierre Bourgeois (Paris—La Pitié-Salpetrière), Pr Francis Berenbaum (Paris—Saint Antoine), Pr Pascal Claudepierre (Créteil), Pr Maxime Breban (Boulogne-Billancourt), Dr Bernadette Saint-Marcoux (Aulnay-sous-Bois), Pr Philippe Goupille (Tours), Pr Jean-Francis Maillefert (Dijon), Dr Xavier Puéchal (Le Mans), Pr Daniel Wendling (Besançon), Pr Bernard Combe (Montpellier), Pr Liana Euller-Ziegler (Nice), Pr Philippe Orcel (Paris—Lariboisière), Pr Pierre Lafforgue (Marseille), Dr Patrick Boumier (Amiens), Pr Jean-Michel Ristori (Clermont-Ferrand), Dr Nadia Mehsen (Bordeaux), Pr Damien Loeuille (Nancy), Pr René-Marc Flipo (Lille), Pr Alain Saraux (Brest), Pr Corinne Miceli (Le Kremlin Bicêtre), Pr Alain Cantagrel (Toulouse), Pr Olivier Vittecoq (Rouen).
- Received April 25, 2016.
- Revision received September 8, 2016.
- Accepted September 12, 2016.
CJ and RRV contributed equally to this study.
Contributors PCP drafted the study design. CJ drafted the manuscript with important contributions from PCP, DvdH, RvdB and RRV. RRV and RvdB analysed and interpreted the data. RvdB, GL, FT and MR read the X-ray and/or MRI. AR, DL, AF, MD and PCP participated in the data collection. All authors approved the final manuscript.
Funding This specific work was not funded, but the DESIR cohort is financially supported by unrestricted grants from both the French Society of Rheumatology and Pfizer France. Neither funding source had any role in designing the study; collecting, analysing or interpreting the data; writing the manuscript; or deciding to submit the manuscript for publication.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.