
Systematic review of patient-reported outcome measures (PROMs) for assessing disease activity in rheumatoid arthritis
  1. Jos Hendrikx1,2,
  2. Marieke J de Jonge1,
  3. Jaap Fransen2,
  4. Wietske Kievit3 and
  5. Piet LCM van Riel1
  1. Department of IQ Healthcare, Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands
  2. Department of Rheumatology, Radboud University Medical Center, Nijmegen, The Netherlands
  3. Department for Health Evidence, Radboud University Medical Center, Radboud Institute for Health Sciences, Nijmegen, The Netherlands
  1. Correspondence to Dr Jos Hendrikx; Jos.Hendrikx{at}


Patient assessment of disease activity in rheumatoid arthritis (RA) may be useful in clinical practice, offering a patient-friendly, location-independent, time-efficient and cost-efficient means of monitoring the disease. The objective of this study was to identify patient-reported outcome measures (PROMs) to assess disease activity in RA and to evaluate the measurement properties of these measures. Systematic literature searches were performed in the PubMed and EMBASE databases to identify articles reporting on the clinimetric development or evaluation of PROM-based instruments to monitor disease activity in patients with RA. Two reviewers independently selected articles for review and assessed their methodological quality based on the Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) recommendations. A total of 424 abstracts were retrieved for review. Of these, 56 were selected for full-text review, and 34 articles, presenting 17 different PROMs, were finally included. The instruments identified were: Rheumatoid Arthritis Disease Activity Index (RADAI), RADAI-5, Patient-based Disease Activity Score (PDAS) I & II, Patient-derived Disease Activity Score with 28-joint counts (Pt-DAS28), Patient-derived Simplified Disease Activity Index (Pt-SDAI), Global Arthritis Score (GAS), Patient Activity Score (PAS) I & II, Routine Assessment of Patient Index Data (RAPID) 2–5, Patient Reported Outcome-index (PRO-index) continuous (C) & majority (M), and Patient Reported Outcome CLinical ARthritis Activity (PRO-CLARA). The quality of reports varied from poor to good. Typically, 5 of the 10 clinimetric domains were covered in the validations of the different instruments. The quality and extent of clinimetric validation varied among PROMs of RA disease activity. The Pt-DAS28, RADAI, RADAI-5 and RAPID 3 had the strongest and most extensive validation. 
The measurement properties least reported and in need of more evidence were: reliability, measurement error, cross-cultural validity and interpretability of measures.

  • Rheumatoid Arthritis
  • Patient perspective
  • Outcomes research

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See:


Summary points

What is already known about this subject?

  • In recent years, many patient-reported outcome measures (PROMs) have been developed to measure disease activity in rheumatoid arthritis (RA), but information on these measures is spread over numerous reports.

What does this study add?

  • This study provides an overview of the available PROMs for measuring disease activity in RA, the measurement properties that have been assessed, and the level of evidence from the validation efforts.

  • Of all patient-reported outcome measures in this review, Patient-derived Disease Activity Score with 28-joint counts (Pt-DAS28), Rheumatoid Arthritis Disease Activity Index (RADAI), RADAI-5 and Routine Assessment of Patient Index Data (RAPID) 3 had the strongest and most extensive validation.

  • The measurement properties least reported and in need of more evidence are: reliability, measurement error, cross-cultural validity and interpretability of measures.

How might this impact on clinical practice?

  • Physicians should be cautious when interpreting patient-reported outcome measures for disease activity and when comparing results of these instruments across different countries.


Traditionally, the monitoring of rheumatoid arthritis (RA) in clinical trials and treat-to-target strategies has been based on indices such as the Disease Activity Score (DAS), Disease Activity Score with 28-Joint Counts (DAS28), Clinical Disease Activity Index (CDAI) or Simplified Disease Activity Index (SDAI), which involve formal joint counts performed by trained professionals.1–3 Formal joint counts, though valued for the information they provide, have been criticised as too time-consuming for daily practice. With an increasing focus on patient-centred care, rising healthcare costs and accompanying decreases in resources, patient-reported outcome measures (PROMs) might offer a patient-friendly, location-independent, time-efficient and cost-efficient means of monitoring chronic diseases such as RA. PROM research in rheumatology spans more than 30 years, during which various measures have been developed.4–8 These cover a broad spectrum of health domains and provide useful information, from the patient's perspective, on the effectiveness of therapies tested in clinical trials. The Health Assessment Questionnaire Disability Index (HAQ), Rheumatoid Arthritis Disease Activity Index (RADAI) and Routine Assessment of Patient Index Data (RAPID) are well-known examples of PROMs for RA that are used in trials as well as in practice.2 ,5 ,6 More recently, patient-reported measures mirroring their ‘physician-based’ counterparts, such as the Patient-derived Disease Activity Score with 28-joint counts (Pt-DAS28) or the Patient-derived Simplified Disease Activity Index (Pt-SDAI), have also been developed.9–13 Until now, information about the measurement properties of all these patient-reported disease activity measures has been spread over numerous reports, hindering the comparison and choice of PROMs to monitor RA disease activity.

In order to understand how we can make good use of PROMs in daily practice, the first step needed is to have an overview of instruments suited to this task. Second, the level of evidence for the various measurement properties of each PROM has to be determined in order to make recommendations for clinical use. The objective of this study was therefore to identify PROMs to assess disease activity in RA and to evaluate their measurement properties.


The Consensus-based Standards for the selection of health Measurement Instruments (COSMIN) were applied in this systematic review.14–18 The first step in the methodology recommended by COSMIN is the development of a search strategy. This strategy combines five elements: a construct search, a population search, an instrument search, a validated PubMed filter for measurement properties and an exclusion filter.19 To retrieve as many PROM-based instruments as possible, the search strategy emphasised sensitivity rather than specificity. PubMed and EMBASE were searched for articles published between January 1994 and May 2014. Studies were eligible for inclusion if they met the following criteria: English language; publication in an international peer-reviewed journal; an adult RA population; and a focus on either the clinimetric properties of PROM-based instruments (ie, without a formal joint count by a professional) aimed at capturing disease activity, or the association between PROM-based instruments and disease activity measures. PROMs specifically addressing disease activity, rather than PROMs measuring other consequences of disease, were targeted in order to collect a set of measures that is comparable with respect to construct validity. The search strategy was refined with MeSH terms, keywords and free-text words until a test set of 11 target publications covering different PROM-based instruments was fully retrieved.9 ,10 ,12 ,13 ,20–26 A full specification of the search strategies is presented in online supplementary appendices I and II.
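The way the five search elements fit together, and the check that the strategy retrieves every test-set publication, can be sketched as follows. This is an illustrative sketch only: all search terms below are hypothetical placeholders (the real strategies are in supplementary appendices I and II), the measurement-property block merely stands in for the validated PubMed filter, and the PubMed date-range syntax is an assumption to verify against current PubMed help.

```python
# Hypothetical placeholder terms; NOT the strategies used in this review.
CONSTRUCT = ['"disease activity"[tiab]', '"severity of illness index"[MeSH Terms]']
POPULATION = ['"arthritis, rheumatoid"[MeSH Terms]', '"rheumatoid arthritis"[tiab]']
INSTRUMENT = ['"self report"[MeSH Terms]', '"patient reported"[tiab]', 'questionnaire*[tiab]']
# Stand-in for the validated PubMed filter for measurement properties.
MEASUREMENT_FILTER = ['"reproducibility of results"[MeSH Terms]', 'validity[tiab]', 'reliability[tiab]']
EXCLUSION = ['"case reports"[Publication Type]', 'comment[Publication Type]']


def or_block(terms):
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(blocks, exclusion, date_range="1994/01/01:2014/05/31[dp]"):
    """AND the element blocks together, add a date limit, then NOT the exclusion filter."""
    included = " AND ".join(or_block(terms) for terms in blocks)
    return f"({included} AND {date_range}) NOT {or_block(exclusion)}"


def recall_on_target_set(retrieved_ids, target_ids):
    """Share of known target publications retrieved by a query (1.0 = fully covered)."""
    target = set(target_ids)
    return len(target & set(retrieved_ids)) / len(target)


query = build_query([CONSTRUCT, POPULATION, INSTRUMENT, MEASUREMENT_FILTER], EXCLUSION)
print(query)
```

Refining the strategy then amounts to adding synonyms to the blocks and rerunning until `recall_on_target_set` returns 1.0 for the 11 target publications; the record identifiers passed to it would be PMIDs in practice.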


The second step of the review process involved independent evaluation by two assessors (WK and JH) of abstracts found by the search strategies. The selection criteria were as follows:

Inclusion criterion:

  • The article describes psychometric/clinimetric development or evaluation of a PROM-based instrument, without a formal joint count, for assessing disease activity in RA.

Exclusion criteria:

  • 1. The article describes the above specifically for a juvenile population.

  • 2. The article describes the above specifically for a population other than RA.

  • 3. The article only describes results already presented in earlier articles.

Any discordance in abstract selection was discussed in a consensus meeting. Two assessors (MJdJ and JH) then read the full text of the remaining articles as a final check of eligibility.
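The screening rule and the flagging of disagreements for the consensus meeting can be expressed as a small decision function. This is a minimal sketch: the record structure and field names are hypothetical, and each boolean stands for a reviewer's judgement on one criterion rather than anything computed automatically.

```python
from dataclasses import dataclass


@dataclass
class ScreenedAbstract:
    """One reviewer's judgement of a record (hypothetical field names)."""
    record_id: str
    evaluates_prom_disease_activity_ra: bool  # inclusion criterion
    juvenile_population_only: bool            # exclusion criterion 1
    non_ra_population_only: bool              # exclusion criterion 2
    results_previously_published: bool        # exclusion criterion 3


def is_eligible(rec: ScreenedAbstract) -> bool:
    """Eligible only if the inclusion criterion holds and no exclusion criterion applies."""
    if not rec.evaluates_prom_disease_activity_ra:
        return False
    return not (rec.juvenile_population_only
                or rec.non_ra_population_only
                or rec.results_previously_published)


def discordant_records(reviewer_1, reviewer_2):
    """IDs of records on which two independent reviewers reach different decisions."""
    decisions_2 = {rec.record_id: is_eligible(rec) for rec in reviewer_2}
    return [rec.record_id for rec in reviewer_1
            if is_eligible(rec) != decisions_2[rec.record_id]]


# Hypothetical judgements for two records.
reviewer_1 = [ScreenedAbstract("r1", True, False, False, False),
              ScreenedAbstract("r2", True, True, False, False)]
reviewer_2 = [ScreenedAbstract("r1", True, False, False, False),
              ScreenedAbstract("r2", True, False, False, False)]
print(discordant_records(reviewer_1, reviewer_2))  # ['r2']
```

Only the records returned by `discordant_records` need discussion; records on which both reviewers agree pass straight to the next step.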

In the third step of the review, two assessors (MJdJ and JH) independently checked the methodological quality of each included study using the COSMIN checklists, with a four-point rating scale ranging from poor to excellent.18 Each of the 10 possible measurement properties was scored in a separate box containing 5–18 items on quality aspects of that property (eg, sample size, description of missing items or statistical method used). The guidance for rating each item of the reported measurement properties was followed, and any discordance in scores between the assessors was resolved in a second consensus meeting. As recommended, the final overall rating for each measurement property described in each study was determined by taking the lowest rating of any item in the respective box (‘worst score counts’). Additionally, the second lowest score was reported to show whether a single low-rated item determined the overall score.
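The ‘worst score counts’ rule, together with the second lowest score reported alongside it, reduces to sorting item ratings on the ordinal four-point scale. A minimal sketch, assuming that the second lowest score means the rating of the next-lowest item (so two poor items yield poor/poor, signalling that more than one item drove the result); the function name is hypothetical.

```python
# COSMIN four-point scale, ordered from lowest to highest.
COSMIN_SCALE = ("poor", "fair", "good", "excellent")


def box_rating(item_ratings):
    """Return (overall, second_lowest) for one COSMIN box.

    overall: lowest rating of any item in the box ('worst score counts').
    second_lowest: rating of the next-lowest item, showing whether a single
    item drove the overall score (None if the box has only one item).
    """
    ranked = sorted(item_ratings, key=COSMIN_SCALE.index)
    second = ranked[1] if len(ranked) > 1 else None
    return ranked[0], second


print(box_rating(["good", "excellent", "poor", "good"]))  # ('poor', 'good')
```

In the example, one poor item pulls the whole box down to poor, while the second lowest score of good shows that the rest of the box was rated considerably higher.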

Finally, the study characteristics and clinimetric data were extracted from the included studies (see table 1 and online supplementary appendix III).27–30 Several conventions have been proposed for interpreting the statistical measures reported in such studies. According to Nunnally and Bernstein,31 a Cronbach's α of 0.8 is sufficient for research purposes, and a value of 0.9 is recommended when individual decisions are based on specific test scores. As a rule of thumb, Hinkle et al proposed the following categorisation for correlation coefficients: 0.1–0.29 no or negligible correlation, 0.30–0.49 low correlation, 0.50–0.69 moderate correlation, 0.70–0.89 high correlation and 0.9–1.0 very strong correlation.32 For Cohen's κ as a measure of agreement, several categorisations have been proposed, but they can largely be summarised as: <0.4 poor, 0.4–0.6 fair/moderate, 0.60–0.80 substantial/good and 0.80–1.00 excellent/almost perfect.33–36 According to Swets, area under the curve (AUC) values from 0.5 to 0.7 represent poor accuracy, values from 0.7 to 0.9 moderate accuracy and values above 0.9 high accuracy.37 For the overview of measurement properties across the included studies (table 2), the following values were considered positive indicators of the respective measurement property: Cronbach's α ≥0.80, correlation coefficients ≥0.60, Cohen's κ ≥0.60 and AUC ≥0.70. Since there is no guidance on categorising the magnitude of measurement error, we considered the measurement error positive if it was on par with or smaller than that of similar physician-reported measures (eg, DAS28 or SDAI) reported in the same study. 
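The interpretation rules above can be collected into a few lookup functions, plus a computation of Cohen's κ itself. This is a sketch under stated assumptions: the cut-offs are taken from the paragraph above, but the handling of boundary values (eg, whether κ=0.60 counts as substantial) is our assumption, as is judging correlations by their magnitude; the function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same items")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1 - p_expected)


def hinkle_category(r):
    """Hinkle et al rule of thumb, applied to the magnitude of the coefficient."""
    r = abs(r)
    if r >= 0.9:
        return "very strong"
    if r >= 0.7:
        return "high"
    if r >= 0.5:
        return "moderate"
    if r >= 0.3:
        return "low"
    return "negligible"


def kappa_category(k):
    if k >= 0.8:
        return "excellent/almost perfect"
    if k >= 0.6:
        return "substantial/good"
    if k >= 0.4:
        return "fair/moderate"
    return "poor"


def swets_auc_category(auc):
    if auc > 0.9:
        return "high"
    if auc >= 0.7:
        return "moderate"
    return "poor"


# Cut-offs for a positive indicator in the overview of measurement properties.
POSITIVE_THRESHOLDS = {"cronbach_alpha": 0.80, "correlation": 0.60, "kappa": 0.60, "auc": 0.70}


def is_positive(statistic, value):
    """True if the value meets the cut-off (correlations judged by magnitude, an assumption)."""
    if statistic == "correlation":
        value = abs(value)
    return value >= POSITIVE_THRESHOLDS[statistic]


kappa = cohens_kappa(["inc", "inc", "exc", "exc"], ["inc", "exc", "exc", "exc"])
print(kappa, kappa_category(kappa), is_positive("kappa", kappa))  # 0.5 fair/moderate False
```

In the usage example the raters agree on 3 of 4 items (observed agreement 0.75), but half of that agreement is expected by chance, so κ is only 0.5, which falls short of the 0.60 cut-off.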
The overall quality and consistency of evidence for the measurement properties of each instrument (evaluated over multiple studies, shown in online supplementary appendix III) was summarised using a method originally proposed by the Cochrane Back Review Group that has since been used by others (table 3).20–23 Depending on whether one or more studies of fair, good or excellent methodological quality are available, and on the consistency of findings across studies, the overall level of evidence ranges from unknown to strong (table 3).
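Table 3 is not reproduced in this text, so the sketch below assumes the rating rules commonly used with this method in COSMIN-based reviews: strong evidence from consistent findings in multiple studies of good quality or in one study of excellent quality; moderate evidence from consistent findings in multiple studies of fair quality or in one study of good quality; limited evidence from one study of fair quality; conflicting when findings disagree; and unknown when only poor-quality studies are available. The exact wording of table 3 may differ, and the input format is hypothetical.

```python
def level_of_evidence(studies):
    """Summarise evidence for one measurement property of one instrument.

    `studies` is a list of (methodological_quality, finding) tuples, where the
    quality is 'poor', 'fair', 'good' or 'excellent' and the finding is '+'
    (positive) or '-' (negative). Rules are an assumption, see lead-in.
    """
    usable = [(q, f) for q, f in studies if q != "poor"]
    if not usable:
        return "unknown"  # only poor-quality studies (or none at all)
    findings = {f for _, f in usable}
    if len(findings) > 1:
        return "conflicting"
    sign = findings.pop()
    qualities = [q for q, _ in usable]
    good_or_better = sum(q in ("good", "excellent") for q in qualities)
    if "excellent" in qualities or good_or_better >= 2:
        level = "strong"
    elif good_or_better == 1 or len(qualities) >= 2:
        level = "moderate"  # one good study, or multiple fair studies
    else:
        level = "limited"  # a single fair study
    return f"{level} ({sign})"


print(level_of_evidence([("fair", "+"), ("fair", "+"), ("poor", "-")]))  # moderate (+)
```

In the usage example the poor-quality study is set aside, and the two consistent fair-quality studies give moderate positive evidence. This illustrates why the instruments with the most publications do not automatically reach the strongest level.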

Table 1

Characteristics of the study population of included studies

Table 2

Overall levels of evidence of measurement properties per instrument across all included studies

Table 3

Levels of evidence for the overall quality of measurement properties per instrument across all included studies20–23


The search strategy retrieved 358 articles in PubMed and 275 articles in EMBASE. The two searches overlapped by 32%, leaving 424 articles to be reviewed (figure 1). Independent assessment of the abstracts yielded 94% concordance, and consensus was reached after discussing the remaining abstracts. Discordance arose mostly over whether an article was aimed at validating PROM-based instruments intended to measure disease activity. After consensus, 56 abstracts were included for full-text review and 368 were excluded.
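The concordance figure and the screening flow are simple arithmetic, sketched below. The reviewer decisions in the usage line are hypothetical; the flow counts are those reported in the text and figure 1.

```python
def percent_agreement(decisions_1, decisions_2):
    """Percentage of records on which two reviewers made the same decision."""
    if len(decisions_1) != len(decisions_2):
        raise ValueError("both reviewers must rate the same records")
    agree = sum(d1 == d2 for d1, d2 in zip(decisions_1, decisions_2))
    return 100 * agree / len(decisions_1)


# Screening flow as reported in the text and figure 1.
screened_abstracts = 424
excluded_on_abstract = 368
full_text_reviewed = screened_abstracts - excluded_on_abstract
excluded_on_full_text = 22
included_articles = full_text_reviewed - excluded_on_full_text
print(full_text_reviewed, included_articles)  # 56 34

# Hypothetical decisions ('i' = include, 'e' = exclude) for three abstracts.
print(round(percent_agreement(list("iie"), list("iee")), 1))  # 66.7
```

Percentage agreement does not correct for agreement expected by chance; Cohen's κ, described in the methods, does.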

Figure 1

Search results PubMed/EMBASE, overlap, exclusion based on abstract review, exclusion based on full review.

Of the 56 articles that were retrieved for full-text review, 22 were excluded. Reasons for exclusion were as follows: the reported instrument was not specifically developed to assess disease activity; the reported instrument was not PROM based; the article reviewed results of earlier publications; the report did not focus on a clinimetric evaluation or the report did not provide subgroup analyses for the RA subpopulation.

The 34 included articles described the following instruments: Pt-SDAI, Patient-derived Disease Activity Score with 28-joint counts (Pt-DAS28), Patient-Based Disease Activity (PDAS) I, PDAS II, RADAI, RADAI-5, Patient Activity Score (PAS) I, PAS II, RAPID 3, Routine Assessment of Patient Index Data 4-Patient Joint Count (RAPID 4-PtJC), Routine Assessment of Patient Index Data 4 Medical Doctor Joint Count (RAPID 4-MDJC), RAPID 5, Patient Reported Outcome-index (PRO-index) majority (M), continuous (C), Patient Reported Outcome CLinical ARthritis Activity (PRO-CLARA).9–13 ,20–26 ,38–59 An overview of the basic study characteristics is given in table 1. Most reports focused on 2 or 3 of the 10 possible measurement properties (see online supplementary appendix III). Aspects of validity and responsiveness were evaluated most frequently, whereas interpretability, cross-cultural validity, content validity, measurement error and reliability were seldom or never investigated. The quality of individual studies ranged from poor to excellent. The most frequent reasons for poor scores were: not reporting missing items, not reporting how missing items were dealt with, and a poor choice of statistical measures.

Levels of evidence across multiple studies for each of the 17 instruments are shown in table 2. Overall, most instruments had limited or moderate levels of evidence for 3–5 of the 10 possible measurement properties. The four instruments with the most extensive validation and the strongest levels of evidence were Pt-DAS28, RADAI, RADAI-5 and RAPID 3.


In this study, PROM-based instruments for disease activity in RA were identified and their measurement properties were systematically reviewed based on the COSMIN method.14–19 ,27 ,29 ,60 There is a large body of research on patient-reported outcomes, with inconsistent use of the terms describing patient-reported outcome measures and the different aspects of clinimetric validation. Considerable work has been carried out to validate several PROM-based instruments for capturing disease activity, though none of the identified instruments has good-quality validation studies covering all clinimetric domains (table 2). All the information gathered in this review will be taken up in the European League Against Rheumatism (EULAR) Outcomes Measures Library (OML) in order to create an openly accessible database of PROM-based measures, which can be updated as new information becomes available.61

The first part of this review involved identifying reports that described the clinimetric/psychometric evaluation of PROM-based instruments to assess disease activity in RA. The two search strategies (PubMed and EMBASE) each yielded a substantial number of unique candidate articles, with 32% overlap in search hits (see figure 1 and online supplementary appendices I and II). This demonstrates the value of not restricting searches to a single major bibliographic database.

In the second review round, 56 reports were included for full-text evaluation of PROM-based instruments and their measurement properties. Most candidates did not meet the inclusion criterion that the article describe a clinimetric/psychometric evaluation of a PROM-based instrument. This was to be expected, as the search strategies (see online supplementary appendices I and II) were developed with a focus on sensitivity rather than specificity, owing to the lack of consistent terminology for describing clinimetric evaluations and PROMs in the literature.16 ,19 Notably, the Rheumatoid Arthritis Impact of Disease (RAID) instrument was not selected for this systematic review, because it was designed to capture ‘patients’ perception of the impact of the disease on domains of health’.62 This covers a broader construct of health, including, for example, emotional well-being, than the more ‘biologically’ oriented clinical indices of disease activity.63 Additionally, the RAID focuses on the impact of disease, which can be moderated by coping. These differences set the RAID apart from the instruments specifically focusing on RA disease activity, at which this review was aimed, and make it less comparable, especially with regard to the assessment of validity.

The third part of this review involved rating the level of evidence for each measurement property reported in the 34 articles selected in step 2. As recommended by COSMIN, the level of evidence was determined by the lowest score of all quality items for each measurement property. Almost all articles failed to report the number of missing items or did not describe how missing data were handled, which reduced the quality rating of the evidence (see online supplementary appendix III). To evaluate an instrument adequately, it is important to know whether certain items are often missing and why. Failure to report missing data and their handling is not restricted to clinimetric evaluations: it was reported as one of the 14 most frequent review comments given in the Annals of the Rheumatic Diseases between 2006 and 2014.64 We encourage authors and journals to place more emphasis on clear reporting of the occurrence and handling of missing data in the methods and results sections of reports. Another aspect that was not clearly reported was the measurement model of the instrument. The COSMIN guide differentiates between reflective and formative models.65 Reflective models consist of items that are manifestations of the same underlying construct. Also known as effect indicators, these items are expected to be highly correlated and interchangeable. Formative models consist of items that together form the construct. These items need not be correlated, so internal consistency is not relevant. Judging by their content, most instruments are probably based on formative models. Since the measurement models were not clearly described, we scored internal consistency as suggested by the COSMIN guidelines. Note, however, that these scores are most likely not relevant to the reported instruments and should therefore not be taken into account when judging their clinimetric quality. 
Of further note, some authors chose to refer to earlier reports for the description of the study population, a practice we would not advise.23 ,47–50 This prevents readers from adequately judging the reported instruments, as the diversity of the study population can strongly influence the evaluation of measurement properties.
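Why internal consistency matters for reflective but not for formative models can be illustrated with simulated data. This is a hypothetical simulation, not data from any reviewed instrument: reflective items are generated as one shared latent trait plus noise, while the formative stand-in uses independent components that would only be summed into an index (much as a composite score combines distinct components). Cronbach's α is computed with its standard formula, α = k/(k−1) × (1 − Σ item variances / variance of the total score).

```python
import random
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of k item-score lists over the same n respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_variances / statistics.variance(totals))


def simulate_items(reflective, n=500, k=4, seed=1):
    """Simulate k item scores for n hypothetical respondents.

    reflective=True: each item = shared latent trait + noise (effect indicators).
    reflective=False: independent components that are merely combined into an
    index (a stand-in for a formative model).
    """
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    if reflective:
        return [[z + rng.gauss(0, 0.5) for z in latent] for _ in range(k)]
    return [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]


print(round(cronbach_alpha(simulate_items(reflective=True)), 2))
print(round(cronbach_alpha(simulate_items(reflective=False)), 2))
```

With these settings the reflective items should give α close to its theoretical value of about 0.94, while the independent components give α near zero. A low α therefore says little about an index built from components that are not expected to correlate.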

In the fourth and final step of the review, all available evidence for each instrument was compiled into table 2. Applying the classification method in table 3 shows that the PROMs that have been published on most, such as RADAI or RAPID 3, do not automatically have the ‘best’ scores with regard to evidence. This is due to the quality of the individual validation studies or to conflicting findings across studies. Furthermore, we make no explicit judgement on which are the ‘best’ scores, because that depends on the purpose for which the instruments are to be used. Some physicians might be willing to trade accuracy for ease of use, while others might not. We therefore provide an overview of the evidence available for these measures and leave the choice of instrument to the reader, who will be the best judge of the intended use.

To the best of our knowledge, this is the first systematic review based on the COSMIN method for PROM-based instruments assessing disease activity in patients with RA. It adds to earlier reviews of physician-based/professional-based instruments for disease activity assessment.2 ,6 We limited the review to studies published in the past 20 years, because the instruments they describe are the most likely to be relevant to current clinical trials and daily practice. The search strategy was designed for sensitivity: it combined a validated PubMed filter with many free-text terms and identified all 11 test-set articles plus 23 other relevant validation studies. This strengthens our belief that the search strategy was indeed sensitive; however, some validation studies may still have been missed because of the high heterogeneity of terms used in the literature. To ensure the uptake of evidence concerning the clinimetric evaluation of PROMs, we recommend that authors choose keywords for the title, abstract and keyword section carefully and use the consistent terminology suggested by COSMIN.16 As part of the EULAR OML strategy, authors of the identified instruments will be contacted and encouraged to provide any evidence that the search strategy might have missed, to further enhance sensitivity. The OML will be periodically updated with new evidence by refining and rerunning the search strategies (see online supplementary appendices I and II).

The clinical implications of this systematic review can be deduced from table 2. Until now, most effort has clearly gone into the measurement properties concerning validity (hypothesis testing, criterion validity, responsiveness). Other clinimetric domains, such as reliability and interpretability, need more evidence. If, for instance, the measurement error or minimal important change of a measure is not well known, this impedes its use: without these measurement properties, physicians cannot judge whether differences in scores are due to chance or whether they are truly important to their patients. In addition, evaluations of cross-cultural validity and direct comparison studies are needed to make instrument scores comparable across studies and countries. Without evidence of formal validation of an instrument in the language of their choice, physicians should be cautious about using it, comparing scores or generalising the results of clinical studies that used instruments in languages other than their patients'.
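To show how known measurement error would let a physician judge whether a change in score exceeds chance, the sketch below uses one common clinimetric approach; it is not a method reported by the reviewed studies. It derives the standard error of measurement (SEM) from a score's SD and a test–retest intraclass correlation (ICC), and the individual-level smallest detectable change as SDC = 1.96 × √2 × SEM. All input values are hypothetical. The SDC addresses only chance variation; whether a change matters to the patient is a separate question answered by the minimal important change.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), using the test-retest reliability of the score."""
    return sd * math.sqrt(1 - icc)


def smallest_detectable_change(sem, z=1.96):
    """Individual-level SDC = z * sqrt(2) * SEM (both measurements carry error)."""
    return z * math.sqrt(2) * sem


def exceeds_measurement_error(score_change, sdc):
    """True if an observed within-patient change is larger than the SDC."""
    return abs(score_change) > sdc


# Hypothetical values, not taken from any study in this review.
sem = standard_error_of_measurement(sd=1.2, icc=0.85)
sdc = smallest_detectable_change(sem)
print(round(sem, 2), round(sdc, 2))  # 0.46 1.29
print(exceeds_measurement_error(-0.9, sdc))  # False
```

Under these assumed values, a drop of 0.9 points in an individual patient could still be measurement noise, which is exactly the judgement that becomes impossible when measurement error has not been reported.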

In conclusion, this systematic review of PROM-based instruments identified 17 measures aimed at monitoring disease activity in RA. The quality and extent of clinimetric validation varied among reports. The measurement properties least reported and in need of more evidence were: reliability, measurement error, cross-cultural validity and interpretability of measures. In general, the Pt-DAS28, RADAI, RADAI-5 and RAPID 3 had the strongest and most extensive validation. We hope this systematic review will aid professionals in the choice of PROM-based tools for disease activity assessment in RA. It is a first step in enhancing standardisation and clinimetric evaluation of these measures for disease activity in RA, and ultimately for supporting their use in clinical trials and daily practice.



  • Twitter Follow Jos Hendrikx at @joshendrikx

  • Contributors JH, JF, WK and PLvR drafted the initial research idea and methods. JH developed the search strategy. JH and WK reviewed all abstracts for inclusion. JH and MJdJ reviewed all full-text articles and extracted the results for publication. All authors declare having read and made a substantial contribution to the final manuscript.

  • Funding This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.