Introduction Use of prediction matrices of risk of rapid radiographic progression (RRP) for early rheumatoid arthritis (RA) in clinical practice could help to better rationalise the first line of treatment. Before use, they must be validated in populations that have not participated in their construction. The main objective is to use the ESPOIR cohort to validate the performance of 3 matrices (ASPIRE, BEST and SONORA) to predict patients at high risk of RRP at 1 year of disease despite initial treatment with methotrexate (MTX).
Methods We selected from the ESPOIR cohort 370 patients receiving MTX or leflunomide (LEF) for ≥3 months within the first year of follow-up. Patients were assessed clinically every 6 months, and structural damage progression seen on radiography was measured by the van der Heijde-modified Sharp score (vSHS) at 1 year. RRP was defined as an increase in the vSHS≥5 points during the first year.
Results At 1 year, the mean vSHS score was 1.7±5.0 and 41 patients had RRP. The ASPIRE matrix had only moderate validity in the ESPOIR population, with area under the receiver operating characteristic curve (AUC) <0.7. The AUCs for the BEST and SONORA matrices were 0.73 and 0.76, respectively. Presence of rheumatoid factor (RF) or anti-citrullinated protein antibodies (ACPAs) and initial structural damage were always predictive of RRP at 1 year. Disease Activity Score in 28 joints (DAS28) and C reactive protein (ASPIRE threshold) were not associated with RRP.
Conclusions Matrices to identify patients at risk of RRP tested in the ESPOIR cohort seem to perform moderately. There is no matrix that shows clearly superior performance.
What is already known about this subject?
Rapid radiographic progressor matrices can identify early rheumatoid arthritis patients with high risk of structural damage progression despite initiation of methotrexate.
What does this study add?
The present study validates the good performance of the 3 matrices (BEST, SONORA and ASPIRE) in another population of early RA patients.
How might this impact on clinical practice?
Rapid radiographic progression matrices are valuable tools to optimise the care of early RA patients by choosing more aggressive therapy than MTX alone.
During the past decade, the standard of care for early rheumatoid arthritis (RA) has evolved greatly, combining early referral to the rheumatologist to hasten the RA diagnosis,1–3 rapid initiation of disease-modifying anti-rheumatic drugs (DMARDs) within the ‘window of opportunity’1 ,2 and tight control of disease activity based on regular DMARD adaptation according to the ‘treat to target’ strategy.2 ,4–6
The choice of the first DMARD may be important for RA prognosis and has been addressed by many trials and guidelines. Synthetic DMARDs (sDMARDs) such as methotrexate (MTX) or leflunomide (LEF) are the most recommended as ‘anchor’ therapies because they can be combined with other synthetic or biological DMARDs (bDMARDs) according to the response to initial monotherapy.2 ,7–9 Additionally, some studies suggested that use of sDMARD triple therapy or bDMARDs as first-line agents could be efficacious to rapidly achieve remission and block structural damage.2 ,9–13 However, safety concerns and the high costs of biologics appear to limit their use for early RA. Several economic evaluations reported that incremental cost-effectiveness ratios of biologics as first-line treatment for early RA are usually high and largely surpass the generally accepted thresholds.14 ,15
If biological agents are not recommended for all patients with early RA, they may still be of interest for patients with poor prognostic factors and features of severe disease, for whom MTX is most likely insufficient.2 ,5 Several trials have shown substantial structural progression in a small subset of patients despite rapid initiation of MTX therapy, which led to the development of the concept of rapid radiographic progression (RRP) to identify such patients. RRP is defined as structural damage progression by an increase in the van der Heijde-modified Sharp score (vSHS) ≥5 points over 1 year; the cut-off of 5 points corresponds to the destruction of one small joint and to the usually reported smallest detectable difference (SDD).16–18 The rationale for RRP patient identification has been validated in two different studies. In the BEST trial, patients with RRP during the first year of follow-up showed increased functional limitations and structural damage progression over 8 years of follow-up, despite a tightly controlled therapeutic strategy.18 These results were consistent with another study of the ESPOIR cohort, in which patients with RRP during the first year in the cohort—with a definition slightly different from the previous one—showed increased structural damage progression during the second and third years in the cohort.19
Patient risk stratification with regard to RRP has become highly important and has been addressed by the development of prediction matrices to quantify the risk of RRP at 1 year of follow-up. RRP matrices, based on the same methodology used in cardiology or osteoporosis,20 ,21 were developed within randomised controlled trials (RCTs)16 ,22 ,23 or observational cohorts24–26 and have involved different sets of baseline characteristics to calculate, at the patient level, the probability of a given patient showing RRP on a 1 year radiograph despite MTX therapy (ie, the probability of being a radiographic MTX inadequate responder). The matrices differ by the nature of the components they use, their thresholds and how they take into account structural damage. These tools have rarely been validated in populations different from those for which they were developed. The only validations were performed with the BRASS cohort, which included patients with established RA,27 and in a cohort of patients with DMARD-naïve early RA recruited at the Department of Rheumatology of the University Hospitals Leuven.25 ,28 The substantial differences between established and early RA resulted in low performance of the three tested matrices (ASPIRE, BEST and SWEFOT). Therefore, additional validation was needed (a larger cohort and an additional matrix, SONORA), which is possible with the ESPOIR cohort. The ESPOIR cohort enrolled patients with early arthritis from the community (with or without unfavourable prognostic factors), most receiving MTX or LEF as first-line treatment.
In this study, we aimed to test the performance and validity of the different published RRP prediction matrices in patients with early RA from the ESPOIR cohort.
We aimed to assess the performance of different prediction matrices that were developed to identify patients with early RA at substantial risk of rapid structural damage progression, defined as an increase in the vSHS≥5 points between baseline and the 1 year follow-up visit.
Previously published matrices
The patient characteristics in the tested matrices are in online supplementary materials (table 1). Three matrices were developed with RCT populations, all testing the efficacy of a combination of MTX+infliximab versus MTX monotherapy in patients with early RA: ASPIRE,16 ,29 SWEFOT23 ,30 and BEST.22 ,31 From these studies, we used only the matrix developed for patients of the MTX arm. All trials used the same definition of RRP: an increase in the vSHS≥5 points within the first year after treatment initiation, which theoretically corresponds to the destruction of one joint based on vSHS scoring,17 to five new small erosions in five different joints, or to a combination of progression in erosion lesions and joint space narrowing. In addition, we used the SONORA matrix based on data for a North American observational cohort of 994 patients with early RA who received an sDMARD.24 In that study, the RRP definition was an increase in the vSHS≥3.54 points, which was the SDD. We also used the original ESPOIR matrix as well as a modified version (mESPOIR matrix) developed with the same methodology.26 Whereas the original ESPOIR matrix assessed structural damage qualitatively (‘typical RA erosion’, yes or no), the mESPOIR matrix used a quantitative vSHS-based assessment of structural damage, categorised in three classes (<5 points, 5–14 points, >14 points). For these two matrices, RRP was defined as an increase in the vSHS≥5 points.
Characteristics of patients with early rheumatoid arthritis (RA) and variables used for the tested matrices.
The protocol for the ESPOIR cohort study was approved by the Ethics Committee of Montpellier University Hospital, France. All patients gave their signed informed consent to participate in the cohort. Between December 2002 and March 2005, 813 patients with possible RA who were referred by rheumatologists and general practitioners to one of 14 regional centres were included in the ESPOIR cohort.32 ,33 Inclusion criteria were age 18–70 years, more than two swollen joints for >6 weeks and <6 months, suspected or confirmed diagnosis of RA, and not taking any DMARDs or steroids except for <2 weeks before enrolment. During the first year, patients were followed every 6 months. At each time point, data were collected on disease activity by the Disease Activity Score in 28 joints (DAS28),34 functional ability by the Health Assessment Questionnaire (HAQ),35 radiography of the hand and foot (anteroposterior views) and therapeutic regimen. Treatment strategies were not protocol-based in the ESPOIR cohort, and patients received usual care by their rheumatologist.
This study involved data for ESPOIR patients with an RA diagnosis according to their rheumatologist and initiation of a first sDMARD with demonstrated structural efficacy, such as MTX or LEF, for at least 3 months during the first year of follow-up in the cohort. Patients who initially received MTX started the drug 27.2±15.1 weeks after disease onset. Those who initially received LEF started it 30.1±18.1 weeks after disease onset.26 Treatment duration during the first year was 37.2±12.3 (39.4) weeks for patients receiving MTX and 39.6±13.5 (43.1) weeks for patients receiving LEF. No statistical difference between RRP+ and RRP− patients was found for treatment duration. Some patients also received an insufficient dosage of another treatment (such as another sDMARD or a bDMARD);26 this point was not considered in the analysis. As detailed elsewhere,26 radiographs were read pairwise by a well-trained investigator blinded to clinical evaluation (intraclass correlation coefficient 0.99, SDD 0.966).6 Structural damage was assessed qualitatively by the presence of typical RA erosions, based on their location and aspect, and quantified by the vSHS.36 ,37 RRP was defined as change in vSHS (ΔvSHS) ≥5 at 1 year.16 ,18 ,22 ,26
Baseline characteristics and disease evolution were described for all patients and by RRP status by mean±SD (median) or frequency (percentage) as appropriate (table 1). Baseline characteristics and disease evolution were compared between RRP+ and RRP− by Mann-Whitney U test (for numerical data) and Fisher's exact test (for categorical data) (table 1). To test the relevance of RRP predictors used with the previous cohorts, we used Fisher's exact test for univariate analysis, then logistic regression analysis to determine predictors of RRP as the outcome variable (table 2).
The performance of previously published RRP matrices (BEST, SONORA, ASPIRE) with patients with early RA from the ESPOIR cohort was tested by several statistics (table 3). The SWEFOT matrix was not tested due to lack of information about the model parameters estimation available in the reference article. Two models were used for each previously published matrix. The first was based on only the published values of the different estimated parameters (intercept and regression coefficients). The second was recalibrated (with a new estimate of the intercept, the regression coefficients remaining the same) on a subsample of the ESPOIR cohort population, representing one-third of the total population, to take into account a possible systematic error of prediction. Then their performance was estimated on the remaining two-thirds of the population.
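The recalibration step described above (a new intercept, with the published regression coefficients held fixed) amounts to fitting a logistic model in which the published linear predictor is treated as an offset. The following is a minimal sketch and not the authors' R code: the function names, the Newton solver and the toy values are assumptions made for illustration, and each linear predictor stands for the sum of published coefficients multiplied by a patient's covariates.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate_intercept(linear_predictors, outcomes, tol=1e-10, max_iter=100):
    """Maximum-likelihood intercept when the slopes are kept fixed.

    linear_predictors: per-patient sum of (published coefficient * covariate),
                       WITHOUT the published intercept (hypothetical input).
    outcomes:          1 for RRP+, 0 for RRP-.
    """
    events = sum(outcomes)
    if events == 0 or events == len(outcomes):
        raise ValueError("need at least one case and one non-case")
    intercept = 0.0
    for _ in range(max_iter):
        probs = [sigmoid(intercept + lp) for lp in linear_predictors]
        score = sum(y - p for y, p in zip(outcomes, probs))  # first derivative of log-likelihood
        info = sum(p * (1.0 - p) for p in probs)             # minus the second derivative
        step = score / info                                  # Newton update
        intercept += step
        if abs(step) < tol:
            break
    return intercept


# Toy recalibration subsample (made-up values, not ESPOIR data).
lp_subsample = [-0.5, 0.0, 0.3, 0.8, 1.2, -1.0, 0.1, 0.6, 1.5, -0.2]
y_subsample = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
new_intercept = recalibrate_intercept(lp_subsample, y_subsample)

# After recalibration, the mean predicted risk equals the observed RRP rate
# in the subsample used for the fit.
mean_predicted = sum(sigmoid(new_intercept + lp) for lp in lp_subsample) / len(lp_subsample)
observed_rate = sum(y_subsample) / len(y_subsample)
print(round(new_intercept, 3), round(mean_predicted, 3), observed_rate)
```

Only the intercept moves, so the ranking of patients (and therefore the AUC) is unchanged; what the step corrects is a systematic over-prediction or under-prediction of risk.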
The likelihood of the fit of the models was assessed by the Bayesian information criterion. The overall significance of the model was assessed by computation of the Nagelkerke pseudo-R2. The calibration of the model, that is, the degree of agreement between the predicted and observed probabilities, was assessed by the Hosmer-Lemeshow goodness-of-fit test, which compares the expected and observed event rates in subgroups of the population (deciles); a significant p value indicated that the model did not fit the observed data. The discriminatory property was tested by the area under the receiver operating characteristic curve (AUC) of sensitivity versus 1-specificity as a predictive model to identify cases (RRP+) and non-cases (RRP−). The 95% CI of the AUC was estimated with a bootstrap procedure based on 1000 replications. The mean predicted probability was calculated for cases (RRP+) and non-cases (RRP−). The discriminatory ability of the models was tested pairwise for each model relative to the ESPOIR model by the integrated discrimination improvement (IDI) statistic, calculated as $\mathrm{IDI} = (\bar{p}_{\text{cases}} - \bar{p}_{\text{controls}})_{\text{new model}} - (\bar{p}_{\text{cases}} - \bar{p}_{\text{controls}})_{\text{original ESPOIR model}}$, where $\bar{p}$ denotes the average predicted probability in the given group. A positive value indicated that the new model provided an improvement over the original ESPOIR model. The p values were calculated as described by Pencina et al.36 For each model, the observed probability of RRP was plotted for groups of participants by predicted probability of RRP. The groups were defined by cut-offs for quartiles of predicted probability of RRP; however, owing to a large number of equivalent values, the construction of four groups led to large differences in the number of participants in each group. The plots represent the observed proportion of RRP and the 95% CI by group.
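The discrimination statistics described above can be illustrated with a short sketch. It is not the authors' R code: the function names and the toy probabilities are hypothetical, the AUC uses the Mann-Whitney formulation, and the percentile rule for the bootstrap CI is an assumption, since the text only states that 1000 replications were used.

```python
import random


def auc(probs, labels):
    """AUC as the probability that a random case outranks a random non-case
    (Mann-Whitney formulation; ties count one half)."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    controls = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_ci(probs, labels, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC (the percentile rule is an assumption)."""
    rng = random.Random(seed)
    n = len(probs)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both cases and non-cases
            stats.append(auc([probs[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper


def discrimination_slope(probs, labels):
    """Mean predicted probability in cases minus that in non-cases."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    controls = [p for p, y in zip(probs, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def idi(new_probs, ref_probs, labels):
    """Integrated discrimination improvement of a new model over a reference model."""
    return discrimination_slope(new_probs, labels) - discrimination_slope(ref_probs, labels)


# Toy data (made up, not ESPOIR data): 1 = RRP+, 0 = RRP-.
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
new_probs = [0.05, 0.05, 0.10, 0.10, 0.20, 0.20, 0.20, 0.40, 0.40, 0.60]
ref_probs = [0.10, 0.10, 0.10, 0.15, 0.15, 0.20, 0.15, 0.20, 0.25, 0.30]

print("AUC:", auc(new_probs, labels))
print("95% CI:", bootstrap_auc_ci(new_probs, labels))
print("IDI vs reference:", idi(new_probs, ref_probs, labels))
```

A positive IDI means the new model separates the mean predicted risk of cases and non-cases more widely than the reference model does; the significance test of Pencina et al is not reproduced here.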
All analyses involved the use of R 2.15.2 for Windows (R Foundation, Vienna, Austria).
The selected patients have been described elsewhere.26 Briefly, 370 patients (45.5%) from the initial ESPOIR cohort started MTX (n=335, mean dose 17.5 mg/week) or LEF (n=35, mean dose 20 mg/week) as first-line DMARDs, referred to as ‘sDMARD-treated patients’. No statistical difference was found for the treatment dose between RRP+ and RRP− patients. Their main characteristics are given in table 1. Among the 126 patients (34.1%) with disease progression (ie, ΔvSHS≥1 (SDD)), 41 had RRP (ie, ΔvSHS≥5), representing 11.1% of all sDMARD-treated patients and 32.5% of patients with disease progression. At baseline, disease duration was significantly longer for RRP+ than RRP− patients. IgM rheumatoid factor (RF) positivity and anti-citrullinated protein antibody (ACPA) positivity were significantly more frequent among RRP+ patients, and the baseline vSHS was significantly higher. We did not find any difference in the proportion of patients who fulfilled the 2010 classification criteria for RA.
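As a quick arithmetic check, the proportions quoted in this paragraph follow directly from the reported counts (813 enrolled, 370 sDMARD-treated, 126 with progression, 41 with RRP). The snippet below is only an illustrative sketch; the variable and function names are not from the article.

```python
# Counts as reported in the text.
enrolled = 813      # patients included in the ESPOIR cohort
treated = 370       # started MTX (335) or LEF (35) as first-line DMARD
progressed = 126    # change in vSHS >= 1 (SDD)
rrp = 41            # change in vSHS >= 5


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(pct(treated, enrolled))     # share of the cohort that was sDMARD-treated
print(pct(progressed, treated))   # share of treated patients with any progression
print(pct(rrp, treated))          # share of treated patients with RRP
print(pct(rrp, progressed))       # share of progressors who had RRP
```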
Relevance of the determinants
The RRP predictors identified in the previously published matrices were tested in the ESPOIR sDMARD population (table 2). Substantial differences were noted in terms of the statistical significance of these predictors. The presence of RF—or ACPAs, when tested—and initial structural damage (qualitatively or quantitatively assessed) were always predictive of RRP at 1 year. However, DAS28 was not associated with RRP-positive status, as compared with what was observed in the SONORA cohort. C reactive protein (CRP) level and RRP-positive status were associated significantly with the ASPIRE matrix thresholds, but the association did not persist with the other matrices.
Likelihood of models with the ESPOIR cohort
According to the Bayesian information criterion, not surprisingly, the most likely model was the mESPOIR model (table 3). Considering only models not developed on the ESPOIR population, SONORA appeared to be the most likely model. We found the same results when we analysed the pseudo-R2, with the highest pseudo-R2 (0.2) for the mESPOIR model. The Hosmer-Lemeshow test indicates the fit of the model to the data: all models had a poor fit except for the two ESPOIR models, which was predictable, and the ASPIRE CRP model.
Performance of models with the ESPOIR cohort
The discriminative power of the matrices, determined by receiver operating characteristic (ROC) analysis, is given in figure 1 and table 3. The BEST and SONORA matrices showed interesting discriminative capacities with the ESPOIR cohort, with AUC values of 0.73 and 0.76, respectively—close to that of the original ESPOIR matrix. The ASPIRE CRP matrix showed only fair discriminative capacity (0.62) and the ASPIRE erythrocyte sedimentation rate (ESR) matrix only low capacity, with an AUC close to 0.5.
In pairwise comparisons by the integrated discrimination improvement test (table 3), no model compared with the ESPOIR model showed a significantly larger difference in predicted probabilities between cases and non-cases, which suggests that none discriminated significantly better between cases and non-cases.
The discriminative power was analysed graphically by the evolution of the observed proportion of RRP according to the proportion of RRP predicted by the matrix model, stratified by quartiles (figures 2 and 3). As a general rule, a clear gradient should be observed between these two measurements, and as observed in figure 3, this is the case for all matrices with the ESPOIR population, except for the two ASPIRE matrices. Whatever the quartile of predicted proportion of RRP estimated by the BEST matrix, the observed proportion of RRP in the ESPOIR population was always relatively higher.
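The grouping behind such plots can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: the function name is hypothetical, the quartile cut-offs are taken as empirical quantiles (order statistics, no interpolation), and the toy data are made up. Because a matrix yields only a few distinct predicted probabilities, ties at a cut-off can leave groups of very unequal size, or even empty.

```python
from bisect import bisect_left


def observed_by_quartile(probs, labels):
    """Group patients by quartile cut-offs of predicted risk and return
    (group size, observed RRP proportion) for each of the four groups."""
    ordered = sorted(probs)
    n = len(ordered)
    # Empirical quantile cut-offs at 25%, 50% and 75% (assumed rule).
    cuts = [ordered[(k * n + 3) // 4 - 1] for k in (1, 2, 3)]
    groups = [[] for _ in range(4)]
    for p, y in zip(probs, labels):
        # Values equal to a cut-off fall in the lower group.
        groups[bisect_left(cuts, p)].append(y)
    return [(len(g), sum(g) / len(g) if g else None) for g in groups]


# Toy matrix output with many tied predictions (made up, not ESPOIR data).
probs = [0.05] * 6 + [0.10] * 2 + [0.30] * 2 + [0.60] * 2
labels = [0] * 6 + [0, 0] + [0, 1] + [1, 1]

for size, observed in observed_by_quartile(probs, labels):
    print(size, observed)
```

In this toy case half of the patients share the lowest predicted risk, so the first group absorbs them all and the second group is empty, which mirrors the unequal group sizes mentioned in the methods.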
We assessed the performance of three matrix risk models to predict RRP at 1 year in patients with early RA despite csDMARD therapy. No matrix clearly showed superior performance compared with the others. All matrices could identify cases (RRP-positive) and non-cases, but the ability was lower for matrices developed in a clinical trial population, such as ASPIRE and BEST, than those developed in a clinic-based RA population such as SONORA.
One of the major strengths of the study is the use of the ESPOIR population data,32 ,33 an early arthritis cohort in which most of the participants had early RA according to the 1987 American College of Rheumatology (ACR) or 2010 ACR/European League Against Rheumatism (ACR/EULAR) criteria.37 For this study, we focused on patients receiving MTX within the first year of the disease, to be consistent with the trials that were used for matrix development. We also included patients receiving LEF, a drug of similar efficacy, which is recommended along with MTX as a first-line agent by all national or international clinical practice guidelines.2 ,9 For these reasons, the ESPOIR sample perfectly corresponds to the target population of an RRP matrix, that is, patients with early RA receiving sDMARDs in real-life settings for whom assessing whether such a therapy could be suboptimal in terms of structural damage progression is highly relevant clinically. Compared with a previously published matrix validation study performed in an established RA population,27 our study appears more relevant. One other recent study showed only fair performance of the matrices in a single hospital-based early RA cohort; the small sample (n=74) and the low number of RRP+ patients could explain the difference of their results and ours.28
Several statistics were used to compare models. There are two major types of parameters: those assessing the fit and those assessing the discriminative power. If the ESPOIR and mESPOIR matrices are set aside, since they were developed on the same data as the validation cohort, the most adequate model was the SONORA model, both according to the fit criteria and according to the discriminative properties (AUC 0.76).
The performance of the matrices varied in the ESPOIR population. Such variations might be explained by several factors. First, the study population was an unselected early RA population with a minimal level of disease activity and no specific severity characteristics or predictors required;32 ,33 this situation resulted in a favourable outcome overall for most of the patients in the ESPOIR cohort.38 As a consequence, the radiographic structural progression in the ESPOIR population was lower (mean 1.6 vSHS points) than in the MTX arms for the SWEFOT, ASPIRE or BEST populations (mean 2.7, 3.7 and 5, respectively).16 ,31 ,39
Another source of variation was the nature and framing of the tested RRP determinants. Some just reflect disease parameters at a certain point in time, such as clinical or biological markers of inflammation, while others reflect more intrinsic disease characteristics, such as erosion or autoantibody status. As expected, the predictive performance of the latter at baseline is better than that of the former. Although ESR or CRP level, immunological status (presence of RF or ACPA) and baseline radiographic findings were always present in the matrices, other variables were not, especially swollen joint count. Smoking status was present only in the SWEFOT matrix;23 this determinant was not predictive of RRP in the ESPOIR cohort and has even been associated with a more favourable outcome and remission.40 The variable framing is also a source of variation. Baseline radiographic characteristics were expressed qualitatively (erosive disease or not) in some matrix tools. Although interesting from a clinical perspective, this framing raises the question of the definition and reproducibility of the notions of erosive disease or ‘typical RA erosion’, as mentioned in the 1987 ACR criteria. Recently, van der Heijde et al41 proposed a definition of erosive disease in early RA based on the number of joints with erosive changes, namely ≥3 joints. The quantitative expression of structural damage (number of erosions or vSHS value) has been used in other matrix tools and appeared to be highly predictive of further structural damage, with better AUC value and fit than matrices involving qualitative information. This outcome was clearly expected according to the paradigm ‘who has eroded joints will have eroded joints’. However, such quantitative assessment is not performed in daily care, which limits the applicability of these RRP matrices in daily practice.
In addition, the threshold used for each continuous variable is questionable. Several methods have been used, mainly based on variable distributions, so the proposed matrices are too closely tailored to the patient population in which they were conceived. This is particularly true for the matrices developed in an RCT population. Another issue that limits the applicability of these RRP matrices in daily practice is that X-ray damage is not even seen in 70% of MTX-treated patients; because X-ray damage is perhaps not the most relevant outcome for patients, it could be argued that RRP is not a relevant outcome to try to predict and that we should instead focus on the prediction of clinical response. However, variables found to predict clinical response to MTX treatment overlap with those associated with rapid progression. Furthermore, about 10% of patients developed RRP when treated with csDMARD monotherapy and hence could benefit from more intensive therapy, such as combination DMARDs or a bDMARD. Thus, RRP is an important hallmark of aggressive or severe RA.
Potentially interesting determinants have not been tested in the matrices developed to date. Proteomic or genomic biomarkers, such as serum interleukin 6, metalloproteinases, shared epitope or possibly their combination, were not included, although several studies demonstrated their ability to predict disease severity.19 Recently, a for-profit company proposed a commercial kit of serum biomarkers associated with a patented specific algorithm, showing high correlation with disease activity and severity.42 ,43 Although interesting at the group level and for clinical research, the applicability and cost-effectiveness of such a marketed biomarker kit remain to be confirmed.44
Finally, no matrix included early treatment response. The ‘treat to target’ principle5 has been widely acknowledged and is included in many clinical practice guidelines.2 This situation is reinforced by the results of several trials involving dynamic therapeutic schemes (step-up trials), which reported similar efficacy of MTX, combination therapies or biologics for early RA.30 ,45 ,46 Several studies have shown that early therapeutic response (ie, at 3 or 6 months) is highly predictive of further treatment success or failure. The use of matrices could help avoid MTX failure and be included in future guidelines.
Matrices have limitations. The logistic regression models from which they are derived provide estimates of the probability of RRP at 1 year conditional on a combination of predictors for every single cell of the matrix; however, since RRP is less frequent with modern therapies, the top-right cells, corresponding to the most at-risk categories, are usually sparsely populated, which results in wide CIs of the estimates. Such lack of power deserves further research.
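The effect of sparse cells on the precision of a risk estimate can be shown with a simple proportion interval. The sketch below uses the Wilson score interval as an illustrative choice; the article does not state how the cell-level CIs were computed, and the function name and the cell counts are hypothetical.

```python
import math


def wilson_interval(events, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 for about 95% coverage)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Same observed RRP proportion (40%) in a sparse cell and in a well-populated cell.
for events, n in [(2, 5), (40, 100)]:
    low, high = wilson_interval(events, n)
    print(f"{events}/{n}: width of 95% CI = {high - low:.2f}")
```

With identical observed proportions, the interval for the five-patient cell is several times wider than for the hundred-patient cell, which is the lack of power the paragraph refers to.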
Matrices seem important to consider in optimising the care of early RA by identifying patients who will respond poorly to MTX. An alternative option could be biologic agents; however, their place as first-line RA therapy has been ruled out by several economic evaluations reporting high incremental cost-effectiveness ratios largely above the generally accepted thresholds.15 ,16 Yet another option could be a combination of sDMARDs.9–11 Matrix tools could be an adequate option to identify patients with early RA at risk of inadequate response to MTX, who could benefit from more aggressive therapies such as biologic agents, although for now the predicting performance was modest. Although differences between matrices are observed, this external validation study does not allow one to state that one matrix is superior to the others.
The authors thank the French rheumatologists who referred their patients to the ESPOIR cohort in the following rheumatology departments: Amiens (P. Fardellone, P Boumier), Bordeaux (T. Schaeverbeke), Brest (A. Saraux), Lille (R.M. Flipo), Montpellier (B. Combe), Paris-Bicêtre (X. Mariette), Paris-Bichat (O. Meyer), Paris-Cochin (M. Dougados), Paris-La Pitié (B. Fautrel), Paris-St Antoine (F. Berenbaum), Rouen (X. Le Loët, O. Vittecoq), Strasbourg (J. Sibilia), Toulouse (A. Cantagrel) and Tours (P. Goupille). They are grateful to N. Rincheval for data management and expert monitoring; to S. Martin for performing all the centralised assays of CRP, IgM rheumatoid factors and anti-CCP antibodies; and to S. Harvard and L. Smales for translation and copyediting. An unrestricted grant from Merck Sharp and Dohme (MSD) was allocated for the first 5 years of the ESPOIR cohort study. Two additional grants from the INSERM were obtained to support part of the biological database. The French Society for Rheumatology, Abbott, Amgen and Wyeth also supported the ESPOIR cohort study. They also thank Drs or Prs N Vastesaeger, G Trape, CR Allaart, K Visser, J Smolen, D Aletaha, P Verschueren, C Bombardier, R Van Vollenhoven, S Saevarsdottir and S Lillegraven for thoughtful discussions about the RRP prediction matrices.
- Received January 5, 2016.
- Revision received March 7, 2016.
- Accepted April 16, 2016.
Contributors BG has carried out the analysis and interpretation of data and is one of the two main contributors involved in drafting the manuscript. BF has made substantial contributions to the conception, design and acquisition of data and is one of the two main contributors involved in drafting of the manuscript. BC, XLL, AS and FG have made substantial contributions to the conception and design, acquisition of data, and carried out a critical revision of the paper. All the authors read and approved the final manuscript.
Funding Institutional support: unrestricted grants from GERCER (Groupe d'Etudes et de Recherches Cliniques En Rhumatologie), Paris, France, and Merck/MSD, USA.
Competing interests None declared.
Patient consent Obtained.
Ethics approval Ethics Committee of Montpellier University Hospital, France.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.