Original article
Validity of the rheumatoid arthritis MRI score applied to the forefeet using the OMERACT filter: a systematic literature review
  1. Yousra J Dakkak1,
  2. Désirée M van der Heijde1,
  3. Monique Reijnierse2 and
  4. Annette H M van der Helm-van Mil1
  1. 1 Department of Rheumatology, Leiden University Medical Centre, Leiden, The Netherlands
  2. 2 Department of Radiology, Leiden University Medical Centre, Leiden, The Netherlands
  1. Correspondence to Yousra J Dakkak; y.j.dakkak{at}


Abstract

Objective MRI depicts inflammation and structural damage in rheumatoid arthritis (RA). The validity of MRI-scoring of wrist-joints and metacarpophalangeal-joints according to the RA MRI score (RAMRIS) has been demonstrated. The Outcomes in Rheumatology Clinical Trials (OMERACT) RAMRIS Working Group recently called for validation of the RAMRIS of the metatarsophalangeal (MTP)-joints. Therefore, a systematic literature review was performed to test whether the RAMRIS applied to the MTP-joints meets the OMERACT Filter of Truth, Discrimination and Feasibility.

Methods Medical literature databases up to January 2018 were systematically reviewed for studies reporting on RAMRIS applied to MRI of the MTP-joints in RA. To be included, an article had to contain at least one MRI-feature (synovitis, bone marrow oedema (BME), tenosynovitis, erosion, joint space narrowing (JSN)) and one item from the OMERACT Filter: Truth (face, content and construct validity), Discrimination (test-retest reliability, ability to discriminate in trials, longitudinal construct validity and thresholds of meaning) and Feasibility.

Results Of the 749 retrieved studies, 13 were included, of which 9 provided data on construct validity, 4 on discrimination (3 on reliability, 2 on longitudinal construct validity and 1 on ability to discriminate in trials) and none on feasibility. Construct validity was suggested for BME and erosions, but lacking for synovitis, tenosynovitis and JSN. Data for discrimination remain to be developed for all outcomes.

Conclusion According to the OMERACT Filter, the validity of the RAMRIS of the forefeet is insufficient in different aspects. A research agenda was determined.

  • rheumatoid arthritis
  • magnetic resonance imaging
  • forefoot
  • systematic review

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key messages

What is already known about this subject?

  • The rheumatoid arthritis (RA) MRI score (RAMRIS) has been developed and validated for scoring MRI of the metacarpophalangeal-joints and wrists for research purposes.

  • The validation of the RAMRIS applied to the forefeet is unknown.

What does this study add?

  • Current evidence on the RAMRIS applied to the forefeet was insufficient according to the Outcomes in Rheumatology Clinical Trials criteria.

How might this impact on clinical practice?

  • A research agenda was determined.


Introduction

In rheumatoid arthritis (RA), conventional radiographs are traditionally used to assess structural damage as an outcome measure for the efficacy of treatment in trials. However, radiographs do not assess disease activity, and with improving treatment strategies, structural damage has become less common.1 Therefore, MRI has been increasingly used in studies in recent years, as it detects damage with greater sensitivity than radiography and in addition provides information about joint inflammation.

As part of The Outcomes in Rheumatology Clinical Trials (OMERACT), an RA MRI Working Group was set up that developed an RA MRI scoring system (RAMRIS) to standardise MRI-scoring for research-purposes.2 This working group prioritised the wrist and metacarpophalangeal-(MCP)-joints due to their frequent involvement in RA and the large amount of MRI data on these joints.2 However, studies in early RA have revealed MRI findings to be as prevalent in the metatarsophalangeal-(MTP)-joints as in the MCPs and wrists.3–5 In addition, radiographic studies have shown that erosive change occurs more commonly in the feet than in the hands and also earlier.6–8 The RAMRIS has not been described for the MTPs by the working group. There seems to be a paradox between the notion that the feet are commonly affected in RA, and the priority given to MRI studies of hands over feet. The MRI in arthritis working group has recognised this, as their recently published updated recommendations call for validation of the existing RAMRIS in other joints than the hands, such as the MTPs.9

The OMERACT-group has developed a Filter that provides guidelines for the development and validation of outcome measures for use in clinical trials.10–12 This systematic literature review set out to assess whether the existing RAMRIS, when applied to the MTP-joints in patients with RA, meets the OMERACT Filter 2.1 of Truth, Discrimination and Feasibility. More specifically, the aim was to determine whether there is currently enough evidence to conclude that the RAMRIS applied to MRI of the MTP-joints is a valid outcome measure for assessing joint inflammation and structural damage in RA, and subsequently to identify a research agenda for the OMERACT criteria that still require study for the MTP-joints.


Methods

Identification of studies

In cooperation with a medical librarian (JWS), a systematic literature search was performed to obtain all manuscripts reporting on studies that performed MRI of the MTP-joints in patients with RA. Medical literature databases (PubMed, Embase, Web of Science, COCHRANE, Emcare, Academic Search Premier and ScienceDirect) were searched up to January 2018, using all variations of the following key words: ‘foot’, ‘rheumatoid arthritis’ and ‘MRI’ (see online supplementary file 1 for the exact search strings). Animal studies, reviews, abstracts, letters to the editor and publications in languages other than English were excluded from the search.

Inclusion criteria

All retrieved titles were screened; if deemed relevant, abstracts were reviewed, and finally full-text articles were read, all by one reviewer (YJD). A random sample of 75 titles (10% of the titles identified by the literature search) was also reviewed by a second reviewer (AHM), resulting in a similar selection of titles. Therefore, further extraction was done by a single reviewer. Uncertainties arising during the reviewing process were discussed with AHM and resolved by consensus. Retrieved studies were excluded if they reported on diseases other than RA, were not scored according to the RAMRIS applied to the foot or did not analyse data of the MTP-joints separately.

Studies reporting on the RAMRIS-measures in the MTP-joints in patients with RA were evaluated and categorised according to the following OMERACT Filter items: Truth (subdivided into face, content and construct), Discrimination (subdivided into test-retest reliability, ability to discriminate in randomised controlled trials (RCTs)/comparative studies, longitudinal construct validity and thresholds of meaning) and Feasibility, of which a detailed description is provided elsewhere.10 11 Face and content validity of the Truth aspect were not part of the systematic literature search, as they are considered subjective measures and in the past have been assessed by consensus.11 Hence, the assumption was made that MRI captures the intended pathophysiological feature for synovitis, bone marrow oedema (BME), erosions and joint space narrowing (JSN). For face and content validity of tenosynovitis, on the other hand, anatomic atlases were explored to study whether the tendons have a sheath, and thus whether MRI is able to capture this intended pathophysiological feature. A PICO (participants, interventions, comparisons, outcomes) was made per OMERACT-item, which resulted in the following inclusion criteria per OMERACT-item:

  • Construct validity was evaluated in studies comparing one or more RAMRIS-features to conventional radiography, ultrasound, CT, histology, physical examination and symptoms. Both cross-sectional and longitudinal studies were included.

  • Test-retest reliability was evaluated in studies comparing one or more RAMRIS-features in the same patients, either by the same performer over time or by different performers at the same time point. Both cross-sectional and longitudinal studies were included.

  • Ability to discriminate in RCTs/comparative studies (sensitivity to change) was evaluated in studies that measured change of one or more RAMRIS-features over time, either in RCTs if available or from non-randomised studies or cohorts.

  • Longitudinal construct validity (responsiveness) was evaluated in studies that measured change in one or more RAMRIS-features compared with change in other instruments like conventional radiography, ultrasound, CT, histology, physical examination and symptoms. This differs from ‘Construct validity’ by the fact that it includes MRI at different time points rather than at one point in time.

  • Thresholds of meaning was evaluated in studies that rate patients with a certain response to therapy (eg, minimum important difference or patient acceptable symptom state) as measured by a change in one or more RAMRIS-features.

  • Feasibility was evaluated in studies describing the cost, patient and responder burden, equipment needs, sensitivity of content and overall ease of use of one or more RAMRIS-features. Both cross-sectional and longitudinal studies were included.

Studies that fulfilled the requirements for at least one of these items were included in this review.

After screening and inclusion, the articles were weighted by two reviewers (YJD and AHM) according to predefined criteria. First, to select high-quality MRI-data, the MRI had to be performed at ≥1.5 Tesla (T) and to use gadolinium contrast-enhancement (CE) for synovitis and tenosynovitis. Second, the results were further weighted by the presence of replication studies. For the final conclusion, the validity of an OMERACT criterion was considered to be suggested if there were ≥2 studies with high-quality MRI-data of which at least two-thirds had a uniform conclusion (meaning that the associations found in the studies point in the same direction, either a positive association or a negative/absent association); this type of validity was categorised as ‘++’ or ‘– –’ depending on the direction of the effect. If only one article was available with regard to the type of validity, it was categorised as ‘+’ or ‘–’ according to the direction of the association, and the conclusion was drawn that limited data possibly suggested the validity to be present but that more studies were needed for a more definite conclusion. If the high-quality MRI-articles were not uniform in their conclusion, or if articles were available but the MRI-data were not deemed to be of high quality, it was categorised as ‘+/-’, meaning data were available but insufficient for a conclusion. If there were no articles available at all, it was categorised as ‘?’, meaning that there were no data available.
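The weighting rule described above can be read as a small decision procedure. The sketch below is only an illustration of that reading, not the authors' tooling: the function name `grade_validity` and its parameters are hypothetical, ASCII symbols stand in for the typographic ones, and it assumes that lower-quality studies only affect the category when no high-quality study is available (the text does not spell out that case).

```python
def grade_validity(n_positive: int, n_negative: int, n_low_quality: int = 0) -> str:
    """Categorise one OMERACT item for one MRI feature.

    n_positive / n_negative: high-quality MRI studies (>=1.5 T, with contrast
    where required) that did / did not find an association.
    n_low_quality: studies available but not meeting the MRI quality criteria.
    Assumption: low-quality studies matter only if no high-quality study exists.
    """
    n_high = n_positive + n_negative
    if n_high == 0:
        # Either only lower-quality data exist, or nothing at all.
        return "+/-" if n_low_quality > 0 else "?"
    if n_high == 1:
        # A single study: limited data, direction of the association only.
        return "+" if n_positive == 1 else "-"
    # Two or more studies: at least two-thirds must point the same way.
    # Integer arithmetic avoids floating-point issues at exactly 2/3.
    if 3 * n_positive >= 2 * n_high:
        return "++"
    if 3 * n_negative >= 2 * n_high:
        return "--"
    return "+/-"


# Examples using counts reported later in the results:
print(grade_validity(4, 2))                    # BME construct validity
print(grade_validity(1, 1))                    # synovitis construct validity
print(grade_validity(1, 0))                    # one reliability study
print(grade_validity(0, 0, n_low_quality=2))   # only lower-quality studies
print(grade_validity(0, 0))                    # no studies
```

Using integer comparison (`3 * n_positive >= 2 * n_high`) keeps the boundary case of exactly two-thirds, such as four of six studies, on the ‘++’ side, in line with the ‘≥’ in the stated rule.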

The MRI-features studied were synovitis, BME and tenosynovitis as measures of inflammation, and erosions and JSN as measures of structural damage. An updated RAMRIS has recently been published that now besides synovitis, BME and erosions also includes tenosynovitis and JSN.9 Considering the short time interval between this systematic literature search and the publication of this updated RAMRIS definition, the systematic search included the updated RAMRIS and the old RAMRIS-definition for synovitis, BME and erosions, for tenosynovitis the score according to Haavardsholm et al and for JSN the score that was proposed by Ostergaard.2 13 14

Statistical analyses

Due to the heterogeneity of the studies and the difference in outcome measures that were used, it was not possible to perform a meta-analysis. Therefore, we chose to perform a descriptive review.


Results

Literature flow

After removing duplicate references, 749 unique references were identified (figure 1). After reviewing 115 abstracts and 41 full-text articles, 13 articles were included in the review. Of the included studies, nine fulfilled the inclusion criteria for evaluation of the construct aspect of Truth,15–23 four on Discrimination (of which three on test-retest reliability,5 24 25 two on longitudinal construct validity5 25 and one on the ability to discriminate26) and none for Feasibility.
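As a quick consistency check on these counts, the citation numbers given in the paragraph above can be grouped by OMERACT item and tallied. This is a minimal sketch: the variable names are illustrative, and the sets contain only the reference numbers quoted in the text.

```python
# Reference numbers as cited in the text for each OMERACT item.
construct = set(range(15, 24))     # refs 15-23: construct aspect of Truth
reliability = {5, 24, 25}          # test-retest reliability
longitudinal = {5, 25}             # longitudinal construct validity
trial_discrimination = {26}        # ability to discriminate

# A study may address more than one Discrimination item, so take the union.
discrimination = reliability | longitudinal | trial_discrimination
included = construct | discrimination

print(len(construct), len(discrimination), len(included))
```

The union shows why three, two and one Discrimination studies add up to four rather than six: references 5 and 25 each contribute to two items.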

Figure 1

Overview of literature research.

Study characteristics

The 13 included studies are depicted in tables 1–3, grouped according to the category of the OMERACT Filter that is addressed: table 1 addresses articles on construct (aspect of Truth), table 2 on test-retest reliability (aspect of Discrimination) and table 3 on longitudinal construct validity and clinical trial discrimination (also aspects of Discrimination). These tables also depict the studies that fulfilled the inclusion criteria but were considered to have lower MRI-quality as described in the methods and thus were not part of the final conclusions drawn. Overall, the studies included a small number of patients, with a median number of 39. There were no RCTs; nine studies were cross-sectional and four were longitudinal. MRI was most often performed on a 1.5 T scanner (n=7).

Table 1

Summary of previous studies on ‘Truth’: construct

Table 2

Summary of previous studies on ‘discrimination’: test-retest reliability

Table 3

Summary of previous studies on ‘discrimination’: longitudinal construct validity and clinical trial discrimination

The OMERACT Filter was applied to each MRI-measure separately. These results will be discussed per MRI-feature and are summarised in table 4.

Table 4

Validity of MRI measures of the forefeet in their assessment of inflammation and structural damage in RA


Synovitis

The RAMRIS defines MRI-detected synovitis as an area in the synovial compartment that shows above-normal postgadolinium enhancement of a thickness greater than the width of the normal synovium.2 As specified in the methods, face and content validity for MRI-detected synovitis were considered to be evident (‘++’ in table 4).

Regarding construct validity, three studies were deemed to have MRI-data of high quality.16 20 22 Synovitis was associated with plantar plate pathology, which represents a failure of the ligamentous system and displacement of the plantar plates of the forefoot leading to malformation.22 When patients were followed for up to 24 months, MRI-detected synovitis was not associated with radiographic damage.20 Correlations were found between MRI-detected synovitis in MTP-1 and a decrease in movement of the same joint.16 Although this is a positive correlation, decreased range of motion in MTP-1 is not typical for RA and also occurs in, for instance, osteoarthritis. Therefore, this study was not considered in the final conclusion. Overall, one of the two remaining studies found a positive association; therefore, no uniform conclusion can be drawn on construct validity for MRI-detected synovitis (‘+/-’ in table 4).

Test-retest reliability has been confirmed in one study that has shown excellent inter-reader and intrareader reliability for scoring synovitis (‘+’).24 For longitudinal construct validity and the ability to discriminate in trials, the two studies available did not meet the quality criteria for MRI (‘+/-’).5 26 There were no data regarding thresholds of meaning or feasibility (‘?’).

Bone marrow oedema

Also referred to as osteitis, BME is defined as a lesion within the trabecular bone, with ill-defined margins and signal characteristics consistent with increased water content.2 Again, face and content validity were deemed present (‘++’).

Construct validity was examined in six studies that met the quality criteria for the MRI-protocol.15 18–22 MRI-detected BME was associated with clinical swelling, with pain and CRP and with plantar plate pathology.18 19 22 When MRI-detected BME in MTP-joints was assessed histologically, it was associated with the severity of osteitis.19 One study found an association for BME with the development of MRI-detected erosions a year later and reported that erosions were unlikely to develop in the absence of BME.20 Two studies did not find an association of BME with Disease Activity Score of 28 joints (DAS-28), Health Assessment Questionnaire (HAQ) or walking disability.15 21 Overall, four out of six studies found a positive association, therefore the evidence suggested construct validity to be present (‘++’).

Test-retest reliability has been investigated in one study that showed good inter-reader and intrareader reliability (‘+’).24 For the ability of MRI to discriminate in trials and for longitudinal construct validity, only studies of lower quality were available (‘+/-’).5 26 There were no data on thresholds of meaning or feasibility (‘?’).


Tenosynovitis

Tenosynovitis is defined as tendon sheath fluid, sheath thickening and enhancement after intravenous contrast injection seen in two consecutive axial slices.13 Thus, for face and content validity, it is essential for the imaged tendon to have a sheath. The anatomy books that were consulted gave contradictory information. Regarding the extensor tendons, some portrayed a sheath to be absent, whereas other anatomic atlases did not portray the extensor tendons at the MTP-region. At the flexor tendons of the MTP-joints, some resources portrayed a sheath to be present, some did not, and in some cases the region of interest was not depicted.27–30 Thus, it remains unclear whether a sheath is present around the tendons, and if so what its exact localisation is. This is relevant, as it questions the nature of the inflammation observed. Thus, face and content validity of tenosynovitis at the level of the MTP-joints could not be established (‘?’). In addition, no studies concerning tenosynovitis at the MTP-joints have been performed to date (‘?’ for all OMERACT-elements).


Erosions

An MRI-detected bone erosion is defined as a sharply marginated bone lesion, with correct juxta-articular localisation and typical signal characteristics, visible in two planes with a cortical break in at least one plane.2 Face and content validity for erosions were assumed to be present (‘++’).

Five studies that met the criteria for the MRI-protocol were assessed for construct validity.15 16 20–22 MRI-detected erosions were associated with plantar plate pathology and with decreased movement of MTP-1.16 22 Also, MRI-detected erosions were associated with the development of radiographic erosions 1 year later, and in the absence of MRI-detected erosions radiographic erosions were unlikely to occur.20 Two studies showed no correlation of MRI-detected erosions with DAS-28, HAQ or with walking disability.15 21 Overall, three out of five studies found a positive association, therefore the evidence suggested construct validity to be present (‘++’).

With regard to Discrimination, reliability was excellent in one study showing high inter-reader and intrareader reliability (‘+’).24 For longitudinal construct validity and ability to discriminate, the studies found did not meet the criteria for the MRI-protocol (‘+/-’).5 25 26 Thresholds of meaning and feasibility have not been studied (‘?’).

Joint space narrowing

JSN is defined as reduced joint space width compared with normal, as assessed in a slice perpendicular to the joint surface.14 There is no reason to believe that face and content validity for JSN in the forefeet are not applicable (‘++’). As of yet, there are no studies on JSN in the forefeet in RA, therefore further validation is required (‘?’).


Discussion

This is the first systematic literature review (SLR) to assess the status of the development or application of the RAMRIS for the feet. From the limited evidence available, a foot RAMRIS-score would be useful to evaluate the impact of inflammation and damage in the feet on long-term outcomes. Based on the results, as presented in table 4, construct validity is suggested for BME and erosions, but lacking for synovitis, tenosynovitis and JSN. Data for discrimination remain to be developed. A research agenda was formulated for the further evaluation of the validity of MRI of the forefoot, as presented in box 1.

Box 1

Research agenda for the further validation of MRI of the forefeet

  • Evaluate whether MRI of the metatarsophalangeal (MTP)-joints performs better than MRI of the metacarpophalangeals (MCPs) and wrists in predicting radiographic and/or functional outcomes, and if it could be used solely.

  • Study whether MRI of the MTP-joints has additional value to MRI of the MCPs and wrists in predicting radiographic and/or functional outcomes.

  • Assess the value of MRI of MTP-1.

  • Determine the longitudinal construct validity, clinical trial discrimination and thresholds of meaning for all MRI measures.

  • Evaluate the feasibility for all MRI measures.

  • Assess the anatomy of sheaths of tendons adjacent to MTP-joints for face and content validity.

  • Assess the association with clinical parameters for tenosynovitis and joint space narrowing for construct validity.

  • Assess the predictive value of all MRI features with radiographic and/or functional outcomes.

  • Assess the test-retest reliability of tenosynovitis and joint space narrowing and replicate this for synovitis, bone marrow oedema and erosions.

  • Assess validity in early rheumatoid arthritis (RA) versus established RA.

In contrast to previous OMERACT Filters, Filter 2.1 that was applied in this review does not distinguish between construct and criterion validity. Criterion validity assumes that the comparator instrument is a ‘gold standard’, for example, histology for MRI-detected BME.10 In addition, previous studies have subdivided criterion validity further into concurrent validity and predictive validity, where concurrent validity assumed that the comparator instrument is a gold standard and predictive validity looked at a later status of, for example, a radiographic or functional outcome.31 For the current study, all these different types of measures were taken together as ‘construct’. As a result, there were sufficient studies to conclude that construct validity was present for BME and erosions. However, when one looks at predictive validity alone, for example, data are actually insufficient to draw conclusions on this aspect, as only one of the nine studies included in construct looked at the predictive value.20 This was taken into consideration when formulating the research agenda. Determining the possible added value of MRI of the MTP-joints to that of MRI of the wrist and MCPs, and assessing the value of MRI of MTP-1, are also subjects for future research.

When studying the predictive accuracy of forefoot MRI in future studies, it might be relevant to include data from healthy controls in the definition of disease-related MRI-features. Previous studies demonstrated that inclusion of information obtained from symptom-free persons from the general population increased the specificity and accuracy of MRI and that this finding also applies to the use of MRI of the MTP-joints.32 33

An updated version of the RAMRIS was recently published that now includes definitions of JSN and tenosynovitis, in addition to the BME, synovitis and erosions covered by the previous version. For tenosynovitis the updated RAMRIS shows subtle differences compared with the score developed by Haavardsholm et al that is generally used in MRI of wrist and MCPs.9 13 No validity data on the updated RAMRIS have been published to date. However, as the definitions of synovitis, BME and erosions did not change, it is assumed that the observed results are also valid for the updated RAMRIS. No conclusions on the validity of tenosynovitis and JSN in the forefeet according to the older definitions or according to the updated RAMRIS could be drawn as there were no data available. Besides adding tenosynovitis and JSN, the updated RAMRIS also included novel recommendations on the scan-protocol, such as a slice thickness of ≤2 mm in high-quality MRI units. However, it also states that the recommendations are not intended to be exclusive, but rather to provide common standards. In this review, the quality of the MRI-protocol was only weighted regarding field strength and CE; more detailed aspects such as slice thickness and plane of scanning were not included in the evaluation.

The 13 included studies were heterogeneous in various aspects, such as the scan-protocols that were used and the type of patients with RA that were included (early vs established RA). Most studies mention the use of a coil, varying from dedicated extremity coils19 to knee or wrist coils.17 While most studies scanned all MTP-joints, some made a selection.16 20 22 Not all studies used CE, potentially decreasing the reliability of synovitis scores.34 Different field strengths were used, with a wide range of 0.2–3 T, of which 1.5 T was most common. Lower-field MRI (<1.5 T) generates lower imaging quality, which may influence the interpretability of the data. Therefore, for the final conclusions of this study, included studies were weighted depending on the use of CE for synovitis, an MRI field strength of ≥1.5 T and the presence of replication studies. The importance of the presence of replication studies is underlined by the small number of patients included in the studies. Although these weighting criteria (summarised in table 4) are arbitrary, they served to give a more critical appraisal of the data and enhance the interpretability of the results. Future studies should focus on ACR core set measures for their outcomes to make comparison of studies more attainable. Finally, studies showing a negative correlation could potentially have remained unpublished, leading to a reporting bias that may have affected the results of this review.

In conclusion, although data were lacking for synovitis, tenosynovitis and JSN, the data suggested truth of the RAMRIS of the forefeet for MRI-detected BME and erosions. In contrast, data on discrimination and feasibility are either lacking or require validation. Thus, the validity of applying the RAMRIS to the forefeet is still insufficient in several aspects. Awareness of the gaps in the OMERACT Filter criteria prior to including MRI of the MTP-joints in the RAMRIS and its implementation in trials is essential for optimal interpretation of results obtained.


Acknowledgements

The authors thank JW Schoones (JWS) for performing the systematic literature search.



  • Contributors All authors made a substantial contribution to the conception and design of the work. YJD and AHMvdHvM drafted the manuscript. DMvdH and MR revised the manuscript critically for important intellectual content. All authors approved the final version of the manuscript.

  • Funding This work was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Starting grant, agreement No. 714312) and by the Dutch Arthritis Foundation.

  • Disclaimer The funding sources had no role in the design and conduct of the study; collection, management, analysis and interpretation of the data; preparation, review or approval of the manuscript or decision to submit the manuscript for publication.

  • Competing interests None declared.

  • Patient consent Not required.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.
