Original research
Validity and score interpretation of the 12-item Psoriatic Arthritis Impact of Disease: an analysis of pooled data from two phase 3 trials of bimekizumab in patients with psoriatic arthritis
  1. Laure Gossec1,2,
  2. Ana-Maria Orbai3,
  3. Laura C Coates4,
  4. Dafna D Gladman5,
  5. Alexis Ogdie6,
  6. Christopher G Pelligra7,
  7. Valérie Ciaravino8,
  8. Barbara Ink9,
  9. Vanessa Taieb8,
  10. Jérémy Lambert8 and
  11. Maarten de Wit10
  1. INSERM, Institut Pierre Louis d’Epidémiologie et de Santé Publique, Sorbonne Université, Paris, France
  2. Rheumatology Department, AP-HP, Pitié-Salpêtrière Hospital, Paris, France
  3. Division of Rheumatology, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
  4. Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, University of Oxford, Oxford, UK
  5. Schroeder Arthritis Institute, Krembil Research Institute, Toronto Western Hospital, University Health Network, University of Toronto, Toronto, Ontario, Canada
  6. Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, USA
  7. Evidera, Medellín, Colombia
  8. UCB Pharma, Colombes, France
  9. UCB Pharma, Slough, UK
  10. Patient Research Partner, Stichting Tools, Amsterdam, Netherlands

Correspondence to Professor Laure Gossec; laure.gossec{at}aphp.fr

Abstract

Objectives To investigate psychometric performance of the 12-item Psoriatic Arthritis Impact of Disease (PsAID-12) total and individual item scores in patients with psoriatic arthritis (PsA) and to estimate score change thresholds and scores corresponding to different levels of symptom/impact severity.

Methods Data up to week 16 from 1252 patients with active PsA enrolled in two randomised controlled trials of bimekizumab (BE OPTIMAL (NCT03895203) and BE COMPLETE (NCT03896581)) were used to assess construct validity (correlations with other patient-reported outcomes), known-groups validity (based on Minimal Disease Activity index, Disease Activity Index for Psoriatic Arthritis and Psoriatic Arthritis Disease Activity Score), reliability (Cronbach’s alpha and intraclass correlation coefficients (ICCs)) and responsiveness (sensitivity to change). Clinically meaningful within-patient improvement thresholds were estimated by anchor-based and distribution-based analyses, and symptom/impact severity thresholds were estimated by receiver operating characteristic curve analyses.

Results The mean (SD) PsAID-12 total score at baseline was 4.19 (1.94). PsAID-12 scores demonstrated good convergent validity and good known-groups validity. Internal consistency reliability (Cronbach’s alpha 0.95) and test–retest reliability (ICC ≥ 0.70) were also good. Responsiveness was acceptable (correlations ≥0.30 for most scores). Improvement thresholds were estimated at 1.5–2 points for the PsAID-12 total score and 2 or 3 points for item scores. Thresholds for different levels of symptom/impact severity could be derived for most PsAID-12 items.

Conclusions The PsAID-12 demonstrated robust psychometric properties in a large sample of patients with active PsA, supporting its use as a fit-for-purpose patient-reported outcome in this population. Furthermore, thresholds for score interpretation were derived.

  • Arthritis, Psoriatic
  • Patient Reported Outcome Measures
  • Outcome Assessment, Health Care

Data availability statement

Data are available on reasonable request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

WHAT IS ALREADY KNOWN ON THIS TOPIC

  • The 12-item Psoriatic Arthritis Impact of Disease (PsAID-12) is a patient-reported outcome (PRO) measure for assessing the symptoms and impacts of psoriatic arthritis (PsA). It has received provisional endorsement for use in trials from the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA) and Outcome Measures in Rheumatology (OMERACT).

  • Several moderate-sized studies have validated psychometric properties of the PsAID-12 total score in patients with PsA. However, additional validation of the PsAID-12 total score and assessment of the psychometric properties of individual PsAID-12 items are needed.

  • Furthermore, clinically meaningful within-patient improvement thresholds (cut-off values for change) would facilitate interpretation of this PRO measure.

WHAT THIS STUDY ADDS

  • Through analysis of data for more than 1200 patients in the BE OPTIMAL and BE COMPLETE randomised controlled trials, this study confirmed the good psychometric properties of the PsAID-12 total score.

  • Psychometric properties of the individual PsAID-12 items were generally acceptable. In particular, PsAID-12 pain, fatigue and skin problems item scores showed good convergent validity, known-groups validity, test–retest reliability and responsiveness.

  • Clinically meaningful within-patient improvement thresholds were established using anchor-based methods recommended by the US Food and Drug Administration. These thresholds were 2 or 3 points for individual PsAID-12 items and 1.5–2 points for the PsAID-12 total score.

HOW THIS STUDY MIGHT AFFECT RESEARCH, PRACTICE OR POLICY

  • This new information on the robust psychometric properties of PsAID-12 item and total scores in a large sample of patients with active PsA demonstrates that the PsAID-12 is fit-for-purpose to capture treatment effects in this patient population. The study provides evidence highlighted by GRAPPA-OMERACT as being necessary for full endorsement of the PsAID-12 as a recommended quality-of-life outcome measure in the PsA core measurement set.

  • The clinically meaningful within-patient improvement thresholds estimated in this study can be used to evaluate treatment effects in clinical trials and can help physicians to determine whether their patients are responding to treatment.

Introduction

The physical and psychological manifestations of psoriatic arthritis (PsA) have a major impact on patients’ well-being, limiting activities and causing emotional distress.1 Patients with PsA frequently report pain, fatigue, sleep problems, difficulties at work, withdrawal from social activities, problems interacting with family members and intimacy issues.2 3

Many of the symptoms and impacts of PsA are best assessed by patients themselves. The 12-item Psoriatic Arthritis Impact of Disease (PsAID-12) is a patient-reported outcome (PRO) measure for assessing the symptoms and impact of PsA. The PsAID-12 was developed with patients and covers the main aspects important to them.4 It is provisionally endorsed by the Group for Research and Assessment of Psoriasis and Psoriatic Arthritis (GRAPPA)-Outcome Measures in Rheumatology (OMERACT) community to capture health-related quality of life in PsA clinical trials.5

The PsAID-12 total score was previously validated in moderate-scale studies,4 6–10 which were mostly observational. However, the individual PsAID-12 items have not been subject to psychometric validation, which is needed for them to be used individually. Because they represent different aspects of PsA, and because qualitative research has shown that they are some of the most frequent and disruptive symptoms of the disease,11 the PsAID-12 items relating to pain, skin problems and fatigue could be considered as priorities.

Thresholds defining clinically meaningful within-patient change in symptoms and different levels of symptom/impact severity (including the level at which patients consider themselves well12) are needed to support the interpretation of PRO scores and score changes. Such thresholds are important for fully assessing patients’ status and properly interpreting treatment effects in clinical trials. Earlier studies estimated the minimum clinically meaningful within-patient improvement threshold for the PsAID-12 total score.4 13 However, those estimates were derived solely from receiver operating characteristic (ROC) curves and did not report a range of estimates depending on the anchor used in the analysis, whereas the US Food and Drug Administration (FDA) recommends anchor-based methods supplemented by empirical cumulative distribution function and probability density function curves to evaluate a range of thresholds.14–16

The aims of this study were to confirm the psychometric performance of the PsAID-12 total score in patients with PsA; to investigate the psychometric performance of individual PsAID-12 items, in particular the pain, fatigue and skin problems items; and to estimate clinically meaningful within-patient change thresholds and disease symptom/impact severity thresholds.

Methods

Study design

This study was a post hoc analysis of pooled data from two previously reported phase 3 clinical trials of bimekizumab, a monoclonal IgG1 antibody that selectively inhibits IL-17F in addition to IL-17A, in patients with active PsA.17 18 In the BE OPTIMAL trial (NCT03895203), patients with PsA who were biological disease-modifying antirheumatic drug-naïve were randomised 3:2:1 to subcutaneous bimekizumab (160 mg every 4 weeks (Q4W)), placebo or an active reference arm (adalimumab 40 mg every 2 weeks).17 In the BE COMPLETE trial (NCT03896581), patients with PsA and a prior inadequate response or intolerance to tumour necrosis factor-α inhibitor treatment were randomised 2:1 to bimekizumab (160 mg Q4W) or placebo.18 For both trials, blinded data from the 16-week, double-blind, placebo-controlled treatment period were used. Data for patients who received placebo and active treatment were pooled and analysed together. Patients with major depression were ineligible to participate in the trials.

PRO measures and other assessments

The PRO measure of interest was the PsAID-12, which comprises 12 questions, each answered on a 0–10 numeric rating scale. A higher score indicates a worse impact of PsA.19 The PsAID-12 was completed in the BE OPTIMAL and BE COMPLETE trials at baseline, week 4, week 12 (BE COMPLETE only) and week 16. It was administered using an electronic device, which did not allow for partially completed questionnaires. Other assessments, performed at various time points including baseline and week 16, included American College of Rheumatology (ACR) response,20 21 Health Assessment Questionnaire-Disability Index (HAQ-DI),22 Patient’s Assessment of Arthritis Pain,21 Patient’s Global Assessment of PsA (PGA-PsA),21 Psoriatic Arthritis Quality of Life (PsAQoL),23 Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-Fatigue),24 Short Form 36-item Health Survey (SF-36),25 Disease Activity Index for Psoriatic Arthritis (DAPSA),26 Psoriatic Arthritis Disease Activity Score (PASDAS)27 and Minimal Disease Activity (MDA) index.28–30

Statistical analyses

Psychometric analyses were conducted using data from observed cases for all randomised patients who had at least one non-missing PsAID-12 assessment in the period from baseline to week 16. Treatment allocation was not considered in any of the analyses. Descriptive statistics were calculated for PsAID-12 scores and scores for other outcome measures. PsAID-12 scores were further analysed for normality of distribution using the Shapiro-Wilk test.

For PsAID-12 item and total scores, the available data rate was calculated as the number of patients with a non-missing score at a given assessment divided by the number of randomised patients who were expected to complete the questionnaire (ie, were still receiving study drug) or had a non-missing score at that assessment. The completion rate was calculated as the number of patients with a non-missing score at a given assessment divided by the total number of randomised patients.
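The two rates differ only in their denominators. The following is a minimal sketch of that distinction; the function names and the counts in the usage note are hypothetical and are not taken from the trial data.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def available_data_rate(n_scored: int, n_expected_or_scored: int) -> float:
    """Patients with a non-missing score, relative to those expected to
    complete the questionnaire (or who had a non-missing score) at that visit."""
    return rate(n_scored, n_expected_or_scored)


def completion_rate(n_scored: int, n_randomised: int) -> float:
    """Patients with a non-missing score, relative to all randomised patients."""
    return rate(n_scored, n_randomised)
```

With illustrative counts of 95 scored patients, 96 expected and 100 randomised, the available data rate (95/96) exceeds the completion rate (95/100), because patients who discontinued leave the first denominator but not the second.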

Construct validity and reliability

To assess convergent and divergent validity, Spearman rank correlation coefficients were computed for the associations between PsAID-12 scores and scores for selected outcome measures (PsAQoL total score, FACIT-Fatigue, PGA-PsA, and SF-36 physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, mental health, physical component summary (PCS), and mental component summary (MCS)). Correlations were interpreted as |r|<0.3=weak, 0.3≤|r|<0.5=moderate and |r|≥0.5=strong.31 We hypothesised that, compared with other outcome measures, PsAID-12 pain would be more strongly correlated with SF-36 bodily pain and that PsAID-12 fatigue would be more strongly correlated with FACIT-Fatigue and SF-36 vitality. As skin problems were not assessed directly by the other PROs, it was hypothesised that this PsAID-12 item would generally have weaker correlations than the other PsAID-12 items. PsAID-12 total score was hypothesised to be more strongly correlated with PGA-PsA, PsAQoL total score and SF-36 general health than with other outcome measures.
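As an illustration of this step, the sketch below computes a Spearman rank correlation (the Pearson correlation of average ranks) and applies the interpretation bands stated above. It is a plain-Python sketch with hypothetical function names, not the software used in the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlation_strength(r):
    """Apply the bands used in the text: <0.3 weak, 0.3 to <0.5 moderate, >=0.5 strong."""
    magnitude = abs(r)
    if magnitude < 0.3:
        return "weak"
    if magnitude < 0.5:
        return "moderate"
    return "strong"
```

The classification uses the absolute value, so a negative correlation (for example, between a PsAID-12 score, where higher is worse, and an SF-36 score, where higher is better) is graded by its magnitude.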

Known-groups validity was assessed by analysis of variance (ANOVA) comparing mean PsAID-12 scores between groups of patients categorised based on MDA index (MDA, very low disease activity (VLDA) and non-MDA), DAPSA (≤4=remission, >4 to 14=low disease activity, >14 to 28=moderate disease activity and >28=high disease activity)26 and PASDAS (≤1.9=remission, >1.9 to <3.2=low disease activity, 3.2 to <5.4=moderate disease activity and ≥5.4=high disease activity).32
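The DAPSA and PASDAS cut-offs above can be written as two small classification functions. This is a sketch with hypothetical function names; note that the boundary handling differs between the two indices (for example, a DAPSA of exactly 14 is low disease activity, whereas a PASDAS of exactly 3.2 is moderate).

```python
def dapsa_category(dapsa: float) -> str:
    """Classify a DAPSA score using the cut-offs quoted in the text."""
    if dapsa <= 4:
        return "remission"
    if dapsa <= 14:
        return "low"
    if dapsa <= 28:
        return "moderate"
    return "high"


def pasdas_category(pasdas: float) -> str:
    """Classify a PASDAS score using the cut-offs quoted in the text."""
    if pasdas <= 1.9:
        return "remission"
    if pasdas < 3.2:
        return "low"
    if pasdas < 5.4:
        return "moderate"
    return "high"
```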

To assess internal consistency reliability, item-total correlations were calculated as the Spearman rank correlation coefficient for a given PsAID-12 item score with the PsAID-12 total score derived without that item (ie, the corrected total score). |r|≥0.30 was considered evidence of acceptable internal consistency reliability.33 Internal consistency reliability was further assessed by calculating Cronbach’s alpha after deletion of individual items.34 A standardised alpha coefficient ≥0.70 was regarded as acceptable.35
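For readers unfamiliar with the statistic, the sketch below computes Cronbach's alpha and its value after deletion of each item. It uses the raw (covariance-based) form of alpha as a simplifying assumption, whereas the study reports a standardised alpha; the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Raw Cronbach's alpha.

    items: list of k lists, each holding one item's scores for the same
    n patients in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn (needs k >= 3)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]
```

If dropping an item leaves alpha essentially unchanged, that item is neither degrading nor solely driving the internal consistency of the scale.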

Test–retest reliability was assessed in stable patients (those with the same DAPSA category at week 12 and week 16) in BE COMPLETE (the BE OPTIMAL trial did not include a PsAID-12 assessment at week 12). The intraclass correlation coefficient (ICC) was calculated using two-way mixed-effect ANOVA models that included time (week) as a fixed effect. An ICC value >0.70 was considered acceptable.36
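The text does not state which ICC form was derived from the two-way mixed-effect model. As one plausible reading, the sketch below computes the single-measurement consistency ICC, often written ICC(3,1), from the two-way ANOVA mean squares; this choice and the function name are assumptions.

```python
def icc_consistency(scores):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    scores: one row per patient, one column per assessment
    (for example, week 12 and week 16).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need at least 2 patients and 2 complete assessments")
    grand = sum(map(sum, scores)) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between time points
    ms_rows = ss_rows / (n - 1)
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Because time is treated as a fixed effect, a uniform shift in scores between the two visits does not lower this coefficient; only changes in the relative ordering or spacing of patients do.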

Responsiveness

To assess potential anchors for the responsiveness (sensitivity to change) analysis, Spearman correlation coefficients were calculated for the associations between changes from baseline to week 16 in PsAID-12 scores and changes from baseline to week 16 in other outcomes (PGA-PsA, DAPSA and PASDAS scores), as well as ACR response at week 16 (absolute value since ACR response is already a change). |r|≥0.3 was considered an acceptable association for including the anchor in the responsiveness analysis.37 38 The above composite anchors were chosen as they include measures assessing the key features of PsA and are widely used and validated in this disease.39 However, the chosen anchors reflect disease activity rather than symptom levels, so the analysis groups together different concepts.

Clinically meaningful within-patient change thresholds

In accordance with FDA guidance,14 clinically meaningful within-patient change thresholds were derived by a mixed approach combining anchor-based and distribution-based analyses. In the anchor-based analysis, only outcome measures with |r|≥0.3 in the responsiveness analysis were considered as potential anchors. For the included anchors (ACR response and PASDAS), patients were divided into different response groups, and descriptive statistics were calculated for PsAID-12 score changes from baseline to week 16 within these response groups. The descriptive statistics (particularly the mean and median PsAID-12 score changes) were used as a basis for clinically meaningful within-patient change thresholds. Effect sizes were calculated as the mean change from baseline to week 16 divided by the overall baseline SD. In addition, empirical cumulative distribution function and probability density function curves of changes in PsAID-12 scores from baseline to week 16 were used in the selection of the final clinically meaningful within-patient change thresholds.
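The effect size defined above divides the mean change within a response group by the SD of baseline scores in the whole sample, not by the SD within that group. A minimal sketch, with a hypothetical function name and toy values in the usage note:

```python
from statistics import mean, stdev


def effect_size(group_changes, overall_baseline_scores):
    """Mean change (week 16 minus baseline) in one response group,
    divided by the sample SD of baseline scores across all patients."""
    return mean(group_changes) / stdev(overall_baseline_scores)
```

For example, a group whose scores all fell by 2 points, in a sample whose baseline scores are 2, 4 and 6 (SD 2), has an effect size of -1; the negative sign reflects improvement, since lower PsAID-12 scores are better.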

In the distribution-based analysis, SE of measurement (SEM) and 0.5 SD were calculated using baseline data.35 40 The estimates from the anchor- and distribution-based analyses were triangulated to determine a threshold for meaningful within-patient change. Ranges for the thresholds were provided based on the point estimates from the anchor-based and distribution-based estimates, with greater weight given to the anchor-based estimates than the distribution-based estimates, as recommended in FDA guidance.14
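The two distribution-based quantities can be sketched as follows. The text does not give the SEM formula; the conventional definition, baseline SD multiplied by the square root of one minus a reliability coefficient, is assumed here, and the function names and input values are illustrative only.

```python
from math import sqrt


def standard_error_of_measurement(baseline_sd: float, reliability: float) -> float:
    """Conventional SEM: SD * sqrt(1 - reliability), with reliability in [0, 1].

    Which reliability coefficient the study used (for example, the ICC)
    is an assumption, not something stated in the text.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return baseline_sd * sqrt(1 - reliability)


def half_sd(baseline_sd: float) -> float:
    """Half of the baseline SD, a common distribution-based benchmark."""
    return 0.5 * baseline_sd
```

Both quantities depend only on the spread and reliability of the scores, not on any external criterion of benefit, which is why they are given less weight than the anchor-based estimates.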

Patient symptom/impact severity thresholds

For each PsAID-12 score, thresholds for different levels of symptom/impact severity (ie, to interpret absolute PsAID-12 score at a given time) were derived using pooled data from the baseline, week 4 and week 16 assessments. First, patients’ disease activity was separately categorised as ‘remission’, ‘low disease activity’, ‘moderate disease activity’ or ‘high disease activity’ based on published thresholds for the DAPSA and PASDAS.26 32 Then, PsAID-12 scores were assessed for their ability to classify patients in each of the different DAPSA and PASDAS disease activity groups in ROC analyses. For example, to determine the threshold to distinguish patients with no symptoms/impact from patients with low symptom/impact severity on the PsAID-12, the optimal PsAID-12 item score was determined to correctly classify patients into the remission group versus the combined low/moderate/high disease activity group defined by DAPSA. This was repeated for each of the thresholds (ie, combined remission/low disease activity vs combined moderate/high disease activity group by DAPSA to distinguish low symptoms/impact severity from moderate symptoms/impact severity on the PsAID-12; combined remission/low disease activity/moderate disease activity group vs high disease activity group by DAPSA to distinguish moderate symptoms/impact severity from high symptoms/impact severity on the PsAID-12). The optimal symptom/impact severity threshold was the PsAID-12 score that maximised Youden’s index (sensitivity+specificity–1).41 An area under the curve (AUC) value of ≥0.70 was interpreted as evidence of satisfactory accuracy in differentiating patients.42 The above analysis that used severity groups on the DAPSA was repeated using severity groups on the PASDAS.
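The threshold selection described above can be sketched as a search over candidate cut-offs for the one that maximises Youden's index. The sketch assumes that a score above the cut-off predicts membership of the higher disease activity group; the function name and the toy data in the usage note are hypothetical.

```python
def youden_threshold(scores, in_higher_group):
    """Return (cut_off, J) maximising J = sensitivity + specificity - 1.

    scores: PsAID-12 scores, one per observation.
    in_higher_group: booleans, True if the observation belongs to the
    higher disease activity group for this comparison.
    A score > cut_off is treated as predicting the higher group.
    """
    positives = sum(in_higher_group)
    negatives = len(in_higher_group) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both groups must be represented")
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(scores)):
        true_pos = sum(1 for s, pos in zip(scores, in_higher_group) if pos and s > cut)
        true_neg = sum(1 for s, pos in zip(scores, in_higher_group) if not pos and s <= cut)
        j = true_pos / positives + true_neg / negatives - 1
        if j > best_j:  # keep the lowest cut-off when several tie
            best_cut, best_j = cut, j
    return best_cut, best_j
```

With scores 0 to 5 and the three highest belonging to the higher group, the function returns a cut-off of 2 with J equal to 1, meaning scores of 2 or less fall in the lower group. Repeating the search for each of the three dichotomies yields the three boundaries between the four severity levels.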

Patient and public involvement

Patients were not directly involved in the design or conduct of this study, which used data from previous trials. However, interpretation of the data and writing of this manuscript benefited from input from an author and patient research-partner (MdW).

Results

Patient characteristics

A total of 1252 patients completed at least one PsAID-12 assessment and were included in the analysis. Patients had a mean age of 49.3 years at baseline (table 1). Just over half (53.0%) were female. Disease activity was high. However, most patients (84.2%) had a Psoriasis Area and Severity Index score of <10 and thus had mild or moderate psoriasis. Nearly three-quarters of patients (73.4%) had previously received one or more conventional disease-modifying antirheumatic drugs.

Table 1

Patient demographics and baseline disease characteristics for 1252 patients with active PsA

Available data and completion rates

The available data rate for the PsAID-12 was 99.9% at baseline, 99.4% at week 4 and 99.8% at week 16. The completion rate was 99.9% at baseline, 98.3% at week 4 and 96.2% at week 16.

Distributions of PsAID-12 item scores

The mean (SD) PsAID-12 total score at baseline, on a 0–10 scale, was 4.19 (1.94) (online supplemental table 1). Mean (SD) baseline scores for PsAID-12 pain, fatigue and skin problems were 5.56 (2.26), 4.83 (2.53) and 4.68 (2.80), respectively. Compared with baseline, mean (SD) scores for PsAID-12 total score (2.65 (2.02)), pain (3.76 (2.51)), fatigue (3.32 (2.56)) and skin problems (2.60 (2.60)) were lower at week 16.

Floor and ceiling effects were not observed for PsAID-12 total score, pain, fatigue or skin problems at baseline or week 16. By contrast, strong floor effects were observed for anxiety/fear/uncertainty (scored as 0 by 32.6% of patients at baseline and 47.1% at week 16), embarrassment/shame (31.3%/52.2%), social participation (30.6%/46.9%) and depression (56.9%/65.1%) (online supplemental table 2).

Convergent and divergent validity

For PsAID-12 pain, fatigue, skin problems and total score, correlations with other outcome measures were generally stronger at week 16 than at baseline (table 2 and online supplemental tables 3,4). PsAID-12 pain showed strong correlations with SF-36 bodily pain and moderate to strong correlations with most other outcome measures at baseline and week 16 (table 2). The exceptions were SF-36 role-emotional (weak correlations at baseline and week 16), SF-36 mental health (weak correlation at baseline) and SF-36 MCS (weak correlations at baseline and week 16). PsAID-12 pain was more strongly correlated with SF-36 bodily pain than with most other outcome measures.

Table 2

Correlations between PsAID-12 pain, fatigue, skin problems, and total score and scores for other patient-reported outcomes

PsAID-12 fatigue showed moderate to strong correlations with all outcome measures at baseline and week 16 and was more strongly correlated with FACIT-Fatigue and SF-36 vitality than with most other outcome measures.

Generally, correlations for PsAID-12 skin problems were weaker than those for other PsAID-12 items, including PsAID-12 pain and fatigue. At baseline, PsAID-12 skin problems showed weak correlations with FACIT-Fatigue and all SF-36 outcomes. At week 16, PsAID-12 skin problems had moderate to strong correlations with all outcomes except for SF-36 role-emotional, SF-36 mental health and SF-36 MCS. Correlations with PsAQoL total score were moderate at both baseline and week 16.

The PsAID-12 total score showed moderate to strong correlations with all outcome measures at baseline and week 16. Among the strongest correlations were those with PsAQoL total score, FACIT-Fatigue, PGA-PsA, SF-36 physical functioning, SF-36 role-physical, SF-36 bodily pain and SF-36 PCS.

For the other PsAID-12 items, most correlations with other outcome measures were moderate or strong, especially at week 16 (online supplemental table 4). Of note, PsAID-12 embarrassment/shame had weaker correlation with other outcome measures than other PsAID-12 items.

Known-groups validity

PsAID-12 pain, fatigue, skin problems and total score showed known-groups validity based on MDA index and DAPSA and PASDAS scores at week 16 (figure 1 and online supplemental table 5). Mean and median scores were consistently higher (worse) for patients with higher disease activity than for those with lower disease activity or remission. Known-groups validity was also demonstrated for the other PsAID-12 items based on week 16 data (online supplemental table 6) and for all PsAID-12 items and the PsAID-12 total score based on baseline data (online supplemental tables 7,8). All distributions of PsAID-12 scores significantly differed by known group, as shown by the p values from the Kruskal-Wallis test (p<0.001).

Figure 1

Box plots of PsAID-12 scores for known groups based on disease activity. Data are shown as the mean (diamonds), median (horizontal lines in boxes), first and third quartiles (lower and upper edges of boxes), and minimum and maximum (excluding outliers: lower and upper external horizontal bars). Outliers are represented by circles. A lower score indicates a less severe symptom or impact. MDA, minimal disease activity; PsAID-12, Psoriatic Arthritis Impact of Disease-12; VLDA, very low disease activity.

Internal consistency reliability

Item-total correlations at week 16 ranged from 0.51 to 0.90—above the prespecified threshold of 0.30 (table 3). This indicated a high degree of consistency among the PsAID-12 items. Internal consistency reliability was confirmed by an overall standardised Cronbach’s alpha of 0.95 for the PsAID-12 total score at week 16 (table 3). This was substantially higher than the prespecified threshold of 0.70. Removing individual PsAID-12 items had little or no effect on Cronbach’s alpha, suggesting that there was no item redundancy. Similar results were obtained at baseline (online supplemental table 9).

Table 3

Internal consistency reliability of the PsAID-12, as assessed using week 16 data

Test–retest reliability

PsAID-12 pain, fatigue, skin problems and total score all showed acceptable test–retest reliability (ICC ≥ 0.70) (table 4). Test–retest reliability was also acceptable for the other PsAID-12 items (online supplemental table 10).

Table 4

Test–retest reliability (n=263): pain, fatigue, skin problems and PsAID-12 total score

Responsiveness

For PsAID-12 pain, fatigue, skin problems and total score, correlations of score changes from baseline to week 16 with ACR response and with changes from baseline to week 16 in DAPSA, PASDAS and PGA-PsA were all ≥0.3, indicating acceptable responsiveness (table 5). PsAID-12 anxiety/fear/uncertainty, embarrassment/shame, social participation and depression showed low responsiveness based on ACR response and DAPSA (|r|<0.3). Embarrassment/shame and social participation showed acceptable responsiveness based on PASDAS and PGA-PsA; anxiety/fear/uncertainty and depression showed low responsiveness based on these anchors. The other PsAID-12 items (work/leisure activities, functional capacity, discomfort, sleep disturbance and coping) showed acceptable responsiveness based on all four anchors. Effect sizes for changes from baseline to week 16 in PsAID-12 scores by ACR response (≥70%; ≥50% to <70%; ≥20% to <50%; <20%) and PASDAS improvement categories (three levels, two levels and one level of improvement, no change and one level of worsening) are presented in online supplemental tables 11,12. The higher the level of ACR response or of disease activity improvement assessed with the PASDAS, the larger the effect sizes for the changes in PsAID-12 scores, whereas effect sizes were small in patients with an ACR response <20% or with no change in disease activity assessed with the PASDAS.

Table 5

Responsiveness analysis of correlations between PsAID-12 score changes from baseline to week 16 and changes for other outcomes during the same time interval, as well as ACR response at week 16

Clinically meaningful within-patient improvement thresholds

Based on the clear separation between response groups for each selected anchor in the empirical cumulative distribution function curves (figure 2 and online supplemental figure 1) and probability density function curves (online supplemental figures 2,3), and on their generally higher correlations than those for DAPSA in the responsiveness analysis, ACR response and PASDAS were chosen as appropriate anchors for deriving clinically meaningful within-patient improvement thresholds. PGA-PsA was not used because a clinically meaningful within-patient change for this measure has not yet been clearly established. Clinically meaningful within-patient worsening thresholds were not derived due to the small number of patients who experienced worsening on the PASDAS (n=33).

Figure 2

Empirical cumulative distribution function curves of changes in PsAID-12 scores from baseline to week 16 by ACR response category. (A) PsAID-12 pain. (B) PsAID-12 fatigue. (C) PsAID-12 skin problems. (D) PsAID-12 total score. A three-level improvement was defined as an ACR response of ≥70%, a two-level improvement as an ACR response of 50% to <70%, a one-level improvement as an ACR response of 20% to <50%, and no change as an ACR response of <20%. ACR, American College of Rheumatology; PsAID-12, Psoriatic Arthritis Impact of Disease-12.

Median changes from baseline to week 16 in patients with an ACR response of 20% to <50% at week 16 were −2 for pain, fatigue and skin problems, and −1.95 for PsAID-12 total score (online supplemental table 13). Mean changes were −2.11 for pain, −1.67 for fatigue, −2.57 for skin problems and −1.86 for PsAID-12 total score. Median changes from baseline to week 16 in patients with a 1-level improvement in PASDAS from baseline to week 16 were −2 for pain, −1 for fatigue, −2 for skin problems and −1.35 for PsAID-12 total score. Mean changes were −1.91 for pain, −1.50 for fatigue, −2.17 for skin problems and −1.62 for PsAID-12 total score.

In a distribution-based analysis, SEM was 0.82 for pain, 1.07 for fatigue, 1.37 for skin problems and 0.68 for PsAID-12 total score; 0.5 SD was 1.13 for pain, 1.26 for fatigue, 1.40 for skin problems and 0.97 for PsAID-12 total score.

Triangulation of the various estimates indicated that the clinically meaningful within-patient improvement thresholds should be a 2-point or 3-point decrease for PsAID-12 pain, fatigue and skin problems and a 1.5–2 point decrease for the PsAID-12 total score, with a higher threshold indicating greater stringency. Similar analyses for PsAID-12 work/leisure activities, functional capacity, discomfort, sleep disturbance and coping gave clinically meaningful within-patient improvement thresholds of 2 or 3 points. Improvement thresholds were not estimated for the anxiety/fear/uncertainty, embarrassment/shame, social participation and depression items.
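In practice, such a threshold is applied to each patient's own change score. The sketch below shows one way to do this, treating improvement as a decrease from baseline; the function name and the example scores in the test values are hypothetical.

```python
def meets_improvement_threshold(baseline: float, week16: float, threshold: float) -> bool:
    """True if the score fell by at least `threshold` points.

    Lower PsAID-12 scores indicate less impact, so improvement is
    baseline minus follow-up. Per the text, `threshold` would be 2 or 3
    for an item score and 1.5 or 2 for the total score, with the higher
    value being the more stringent choice.
    """
    return (baseline - week16) >= threshold
```

A patient whose pain item falls from 6 to 4 meets the 2-point threshold but not the 3-point one.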

Patient symptom/impact severity thresholds for PsAID-12 scores

Derived from ROC analyses (online supplemental table 14), the threshold values shown in table 6 distinguish different levels of symptom/impact severity from the patient perspective. For PsAID-12 pain, a score of ≤2 was associated with remission/no symptoms/no impact, a score of 3 with low symptom/impact severity, a score of 4 with moderate symptom/impact severity and a score of ≥5 with high symptom/impact severity. The corresponding values for PsAID-12 fatigue and skin problems were 0 or 1, 2, 3 or 4 and ≥5. For PsAID-12 total score, a score of ≤1.15 was associated with remission/no symptoms/no impact, >1.15 to 1.95 with low symptom/impact severity, >1.95 to 3.60 with moderate symptom/impact severity and >3.60 with high symptom/impact severity. Symptom/impact severity thresholds were not derived for PsAID-12 anxiety/fear/uncertainty, embarrassment/shame and depression because AUC values from the ROC analyses were less than the prespecified threshold of 0.70.
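The thresholds quoted in this paragraph can be applied as a simple lookup. The sketch below covers only the four scores whose cut-offs are given in the text; the data structure and function name are hypothetical.

```python
# Upper bounds (inclusive) of the remission, low and moderate bands,
# as quoted in the text; anything above the last bound is "high".
UPPER_BOUNDS = {
    "pain": (2, 3, 4),
    "fatigue": (1, 2, 4),
    "skin problems": (1, 2, 4),
    "total": (1.15, 1.95, 3.60),
}
LEVELS = ("remission/no symptoms/no impact", "low", "moderate", "high")


def severity_level(score_name: str, score: float) -> str:
    """Map a PsAID-12 score (0-10) to its symptom/impact severity level."""
    if not 0 <= score <= 10:
        raise ValueError("PsAID-12 scores range from 0 to 10")
    for bound, level in zip(UPPER_BOUNDS[score_name], LEVELS):
        if score <= bound:
            return level
    return LEVELS[-1]
```

Applied to the mean total scores reported earlier, 4.19 at baseline falls in the high band and 2.65 at week 16 in the moderate band.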

Table 6

PsAID-12 item and total scores for different disease activity states

Discussion

PRO measures are valuable tools for assessing symptoms and impacts; like other outcome measures, they need to be validated to ensure scores reflect what is intended to be measured, and threshold values are very useful to aid their interpretation. In this analysis of data from more than 1200 patients with active PsA, conducted in accordance with FDA guidance, measurement properties of the PsAID-12 total score were shown to be robust. Moreover, we were able to validate PsAID-12 items, and to estimate threshold values corresponding to clinically meaningful within-patient improvements for most PsAID-12 item scores and the PsAID-12 total score. These ‘change thresholds’ can be used to interpret changes in PsAID-12 scores over time for a given individual. Furthermore, we derived thresholds for PsAID-12 scores to define different levels of symptom/impact severity from the patient perspective, which correspond to different levels of disease activity on composite scores. These ‘state thresholds’ can be used to interpret absolute PsAID-12 scores at a given time point. Our findings confirm the validity of the PsAID-12 total score, and in particular its responsiveness, and are expected to facilitate its interpretation. They also open up the possibility of using individual PsAID-12 items to facilitate patient management by maximising information use from this measure. Importantly, this study provides the requisite randomised controlled trial (RCT) evidence and threshold derivation for the PsAID-12 to progress from provisional to full endorsement by GRAPPA-OMERACT as a recommended quality-of-life outcome measure in the PsA core measurement set to be used in observational and interventional clinical trials.5

Distributional properties of the PsAID-12 were as expected, with most items showing no floor effects. PsAID-12 item and total scores demonstrated good convergent, divergent and known-groups validity and moderate-to-good test–retest reliability. Internal-consistency reliability analyses indicated strong consistency among the PsAID-12 items and no item redundancy. Responsiveness was established for the PsAID-12 total score and individual PsAID-12 items, except for the anxiety/fear/uncertainty, embarrassment/shame, social participation and depression items.

The results of this study are consistent with a preliminary psychometric evaluation of the PsAID-12 total score using data from 474 patients with PsA.4 Values for construct validity (Spearman correlations with other instruments), internal-consistency reliability (Cronbach’s alpha: 0.95 in this study, 0.93–0.94 in the preliminary validation study) and test–retest reliability (ICC: 0.88 in this study, 0.95 in the preliminary validation study) were similar in the two studies. Moreover, responsiveness was acceptable in both studies. Importantly, this study confirms the responsiveness of the PsAID-12 and its items, when the concept of responsiveness of PRO measures has led to debate in the past.43 In an analysis based on prospectively collected data from 129 patients with PsA attending a specialist rheumatology hospital in the UK, the PsAID-12 total score had a Cronbach’s alpha of 0.95 and an ICC of 0.91.13 As in this study, the PsAID-12 total score was strongly correlated with PsAQoL, FACIT-Fatigue and PGA scores. Additional validation studies have reported similar findings.6–10

A 2-point decrease in PsAID-12 pain, fatigue or skin problems score was identified as the lower bound for a clinically meaningful within-patient improvement, with a 3-point decrease indicating a marked within-patient improvement (large to very large effect size). These results are consistent with the responder definition for the Fatigue numeric rating scale derived from the clinical trial programme of ixekizumab in PsA, which demonstrated a 3-point decrease as a large improvement and a meaningful score change estimated between 2 and 4 points.44 For the PsAID-12 total score, the corresponding score changes were 1.5-point and 2-point decreases. Threshold estimates were based on mean and median PsAID-12 score changes in groups of patients who experienced an ACR response of 20% to <50% or a one-level improvement on the PASDAS. These two anchors had shown acceptable responsiveness and demonstrated separation between response groups in empirical cumulative distribution function and probability density function curves of change in PsAID-12 scores. The threshold estimates were supported by distribution-based estimates, which were lower than all anchor-based estimates.
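
To make the use of these change thresholds concrete, the following minimal sketch in Python classifies a within-patient score change against the values given above. The names are hypothetical, the category labels are shorthand for this example, and the code is an illustration rather than the analysis performed in the study.

```python
# (lower bound for a clinically meaningful improvement,
#  threshold for a marked improvement), in points of score decrease,
# as reported in the text.
CHANGE_THRESHOLDS = {
    "pain": (2.0, 3.0),
    "fatigue": (2.0, 3.0),
    "skin problems": (2.0, 3.0),
    "total": (1.5, 2.0),
}


def classify_improvement(measure: str, baseline: float, follow_up: float) -> str:
    """Classify a within-patient change in a PsAID-12 score.

    Lower PsAID-12 scores indicate less impact, so improvement is a decrease.
    """
    meaningful, marked = CHANGE_THRESHOLDS[measure]
    decrease = baseline - follow_up
    if decrease >= marked:
        return "marked improvement"
    if decrease >= meaningful:
        return "clinically meaningful improvement"
    return "below threshold"


if __name__ == "__main__":
    # Hypothetical patient: total score falls from 5.0 at baseline to 3.5.
    print(classify_improvement("total", 5.0, 3.5))
```

The sketch treats the thresholds as inclusive lower bounds, which is one reasonable reading of "a 2-point decrease ... was identified as the lower bound".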

The threshold values for the PsAID-12 total score are lower than the estimate of 3 points from the previous psychometric evaluation by Gossec et al.4 This difference is likely due to differences in study design and populations, and in the methodology and anchors used. In the clinical study data sets, no patient-derived anchor with an established clinically meaningful change threshold, such as a patient global impression of severity or of change, was available; instead, composite outcome measures such as the PASDAS and DAPSA, for which there is evidence of what constitutes a clinically meaningful within-patient change, were used as anchors. This is a limitation of the current study, since the anchors reflect disease activity on composite scores rather than patient perceptions. Thus, the cut-off values proposed herein offer a way to superimpose two different but overlapping concepts: disease activity and patient-perceived disease impact. Furthermore, using well-validated and widely used composite measures as anchors offers an original approach to a problem often encountered when defining cut-off values, namely how to validate the reference standards used as anchors. The timing of the assessments for the clinically meaningful within-patient improvement analysis also differed between the two studies: 16 weeks in the present analysis vs 10–16 weeks in the previous study. Similarly, our symptom/impact severity thresholds are lower than similar thresholds reported from a cross-sectional analysis of 144 patients with PsA attending two tertiary rheumatology centres in Italy,7 although this difference may reflect the use of a different methodological approach and a different patient population.

The floor effects and psychometric performance limitations observed for PsAID-12 anxiety/fear/uncertainty, embarrassment/shame, social participation and depression were likely due to formal exclusion of patients with depression from the BE OPTIMAL and BE COMPLETE trials. It should be noted that three of these four PsAID-12 items (embarrassment/shame, social participation and depression) were excluded from an alternative 9-item version of the PsAID because their inclusion did not improve psychometric performance of the PsAID.4 Although these items were ranked as less important than pain, fatigue and skin problems, they were included in the PsAID-12 because they are important for a minority of people with PsA4 45 and should be evaluated and addressed in clinical practice.

This study has strengths and limitations. The analysis of convergent and divergent validity did not include a dedicated skin-related outcome measure, which limited our ability to assess convergent validity of the PsAID-12 skin problems item. Moreover, as noted above, the estimation of PsAID-12 symptom/impact severity thresholds did not include the MDA index or another measure that includes skin symptoms, but instead used the DAPSA and PASDAS. However, the significance of this limitation when using the estimated thresholds for patients with PsA may depend on the patients’ particular disease characteristics. A further limitation is the use of data from phase 3 trial populations who were required to meet certain eligibility criteria. This skewed the sample towards patients with active moderate-to-severe PsA and may limit extrapolation of the findings to the broader population of patients with PsA. Conversely, the multinational nature of the BE OPTIMAL and BE COMPLETE trials is a strength of the study since PsAID-12 validation can be considered in different languages. However, no cross-country comparisons were made. Additionally, data from RCTs with both active and placebo arms allowed for wider variation in PsAID-12 score changes and disease activity to be captured in the analyses, thus potentially resulting in more accurate estimates than those that might have been obtained from a non-interventional study.

In conclusion, the PsAID-12 total score and individual PsAID items capturing disease concepts important to patients with PsA demonstrated robust psychometric properties in a large sample of the target patient population. The results support use of the PsAID-12 as a fit-for-purpose PRO to inform assessment of treatment effects in patients with active PsA. Future studies should evaluate the utility of individual PsAID-12 items for monitoring and managing patients with PsA and should explore PsAID-12 symptom/impact severity thresholds maximising sensitivity (instead of Youden’s index), which may be more useful in clinical practice.
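
For readers less familiar with the criterion mentioned above, Youden's index is J = sensitivity + specificity - 1, and the selected cut-off is the one that maximises J. The following minimal sketch in Python shows the idea on made-up toy data; the function name and data are hypothetical, it assumes that higher scores indicate the more severe state, and it is not the authors' analysis code.

```python
def youden_threshold(scores, is_severe):
    """Return (cutoff, J) maximising Youden's J = sensitivity + specificity - 1.

    A score >= cutoff is classified as the more severe state.
    `is_severe` holds the reference (anchor) classification as booleans.
    """
    positives = sum(1 for y in is_severe if y)
    negatives = len(is_severe) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both states must be present in the reference data")

    best = None
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, is_severe) if y and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, is_severe) if not y and s < cutoff)
        j = true_pos / positives + true_neg / negatives - 1
        # Keep the first (lowest) cut-off in case of ties.
        if best is None or j > best[1]:
            best = (cutoff, j)
    return best


if __name__ == "__main__":
    # Toy data only: six scores with a perfectly separating reference state.
    toy_scores = [1, 2, 3, 4, 5, 6]
    toy_states = [False, False, False, True, True, True]
    print(youden_threshold(toy_scores, toy_states))
```

A sensitivity-maximising alternative, as suggested for future work, would replace the quantity being maximised while keeping the same loop over candidate cut-offs.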

Data availability statement

Data are available on reasonable request.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and these studies were done in accordance with the Declaration of Helsinki and the International Conference on Harmonisation Guidance for Good Clinical Practice. Ethics approval was obtained from the relevant institutional review boards at participating sites. Participants gave informed consent to participate in the study before taking part.

Acknowledgments

The authors thank the patients and the investigators and their teams who took part in this study. Portions of this work were previously presented at ISPOR 2023 (7–10 May 2023, Boston, Massachusetts, USA) and at EULAR 2023 (31 May–3 June, Milan, Italy). The authors acknowledge Stephen Gilliver, PhD, of Evidera, Sweden for providing medical writing support based on the authors’ input and direction; Bethan Taylor, BA, Costello Medical, UK, for publication coordination; Dr Martin Bauer, MD, FNWC, UCB Pharma, Monheim am Rhein, Germany for his work in the BKZ in PsA program; and Heather Edens, PhD, UCB Pharma, Smyrna, GA, USA for publication coordination and editorial assistance. Medical writing support, publication coordination, and editorial assistance were provided in accordance with Good Publication Practice (GPP) 2022 guidelines (https://www.ismpp.org/gpp-2022).

References

Footnotes

  • Twitter @drlauracoates

  • Contributors Substantial contributions to study conception and design: LG, CGP, JL, VC, BI and VT; substantial contributions to analysis and interpretation of the data: LG, LCC, AO, A-MO, MdW, CGP, JL, VC, BI, VT and DDG; drafting the article or revising it critically for important intellectual content and final approval of the version of the article to be published: LG, LCC, AO, A-MO, MdW, CGP, JL, VC, BI, VT and DDG; guarantor: LG

  • Funding This study, as well as medical writing support, publication coordination, editorial assistance, and journal submission fees, were funded by UCB Pharma, Brussels, Belgium.

  • Competing interests LG: Grants from AbbVie, Biogen, Eli Lilly, Novartis, Sandoz and UCB Pharma; consulting fees from AbbVie, Amgen, BMS, Celltrion, Galapagos, Janssen, Eli Lilly, MSD, Novartis, Pfizer, Sandoz and UCB Pharma. A-MO: Research grants to Johns Hopkins University from AbbVie, Amgen and Janssen; consulting fees from BMS, Janssen, Sanofi and UCB Pharma. LCC: Grants/research support from AbbVie, Amgen, Celgene, Eli Lilly, Gilead, Janssen, Novartis, Pfizer and UCB Pharma; consultant for AbbVie, Amgen, BMS, Boehringer Ingelheim, Celgene, Domain, Eli Lilly, Galapagos, Gilead, Janssen, Moonlake Pharma, Novartis, Pfizer, and UCB Pharma; speaking fees from AbbVie, Amgen, Biogen, Celgene, Eli Lilly, Galapagos, Gilead, GSK, Janssen, medac, Novartis, Pfizer, and UCB Pharma. DDG: Consultant and/or received grant support from AbbVie, Amgen, BMS, Celgene, Eli Lilly, Gilead, Galapagos, Janssen, Novartis, Pfizer, and UCB Pharma. AO: Grant/research support from AbbVie, Amgen, Janssen, Novartis and Pfizer; consultant for AbbVie, Amgen, BMS, Celgene, CorEvitas, Eli Lilly, GSK, Gilead, Janssen, Novartis, Pfizer, Takeda and UCB Pharma. CGP: Employed by Evidera, a part of Thermo Fisher Scientific, which received funding for this research from UCB Pharma. VC, VT and JL: Employees and stockholders of UCB Pharma. BI: Employee of UCB Pharma; shareholder of AbbVie, GlaxoSmithKline and UCB Pharma. MdW: Over the last five years Stichting Tools has received fees for lectures or consultancy provided by MdW from Celgene, Eli Lilly, Janssen-Cilag, Pfizer and UCB Pharma.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.