Objective To determine minimal clinically important differences (MCIDs) for improvement and worsening in various health dimensions in knee osteoarthritis under conservative therapy.
Methods Health, symptoms and function were assessed by the generic Short Form 36 and the condition-specific Western Ontario and McMaster Universities Osteoarthritis Index in n=190 patients with knee osteoarthritis before and after comprehensive rehabilitation intervention (3-month follow-up). By means of construct-specific transition questions, MCIDs were defined as the difference between the ‘slightly better/worse’ and the ‘almost equal’ transition response categories according to the ‘mean change method’. The bivariate MCIDs were adjusted for sex, age and baseline score to obtain adjusted MCIDs by multivariate linear regression. They were further standardised as (baseline) effect sizes (ESs), standardised response means (SRMs) and standardised mean differences (SMDs) and compared with the minimal detectable change with 95% confidence (MDC95).
Results Multivariate, adjusted MCIDs for improvement ranged from 2.89 to 16.24 score points (scale 0–100), corresponding to ES=0.14 to 0.63, SRM=0.17 to 0.61 and SMD=0.18 to 0.72. The matching results for worsening were –5.80 to –12.68 score points, ES=–0.30 to –0.56, SRM=–0.35 to –0.52 and SMD=–0.35 to –0.58. Almost all MCIDs were larger than the corresponding MDC95s.
Conclusions This study presents MCIDs quantified according to different methods over a comprehensive range of health dimensions. In most health dimensions, multivariate adjustment led to higher symmetry between the MCID levels of improvement and worsening. MCIDs expressed as standardised effect sizes (ES, SRM, SMD) and adjusted by potential confounders facilitate generalisation to the results of other studies.
- knee osteoarthritis
- effect size
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
What is already known about this subject?
Changes in health or symptoms should not only be detectable by statistical significance tests; they also have to be perceived by the person affected. This led to the development of minimal clinically important differences (MCIDs).
In knee osteoarthritis, there are no data on MCIDs beyond pain and general physical function.
What does this study add?
The present study extended the determination of MCIDs to overall and leg-specific function, standing/walking, mobility/stiffness, physical role performance, activity/vitality, social functioning, affective/mental health, and general health perception in knee osteoarthritis.
The MCIDs were quantified by three different effect sizes and adjusted by confounders.
How might this impact on clinical practice?
Multidimensional, generalisable MCIDs enable us to rate the clinical impact of measured effects in any study of knee osteoarthritis, with effect sizes specifically adjusted according to the type of study design.
Minimal clinically important differences (MCIDs) play an increasingly important role in evidence-based medical practice and outcome measurement.1–5 Changes in health or symptoms should not only be detected by statistical significance tests; they also have to be perceived by the person affected. The patient’s perspective of health is integral to understanding health outcomes.1 3 5 The subjective perception of outcome effects is the key element of the MCID concept.
In clinical trials, every outcome difference becomes statistically significant provided the sample size is large enough, as has been demonstrated.5 However, we can safely assume that, despite reaching statistical significance, a very small effect difference in a very large sample will not be subjectively perceptible to the person affected, that is, it is not ‘clinically important’. In contrast to the concept of statistical effect significance, a measured effect that is larger than or equal to the MCID indicates that the patients in that setting subjectively perceive their improvements as beneficial.
It has become accepted practice to use ‘anchor’-based estimates to determine the MCID because the patient’s viewpoint is the key characteristic and predictor for patient-rated outcome measures.1 3 5 6 Anchor-based methods use an external indicator, the ‘transition item’, to assess changes in health status; the transition item asks patients to rate any change in their health between baseline and a specific follow-up point.1 3 5 7 8 Today, the most important and most frequently used anchor-based method is the ‘mean change method’ originated by Redelmeier and Lorig in 1993.8 The MCID for improvement, for example, equals the mean score difference between baseline and follow-up on an instrument (eg, for pain) in the ‘slightly better’ transition response group minus the corresponding mean in the ‘almost equal’ group.
While a number of studies have been published assessing MCIDs in knee osteoarthritis (for example: 3 9–13), none, to our knowledge, has examined MCIDs in health dimensions beyond pain and function in general. The present study seeks to fill this gap by extending the determination of MCIDs to other, specific functional abilities and psychosocial domains. The evaluated MCIDs will provide future studies dealing with therapy effects in knee osteoarthritis with a basis for comparison of their measured effects. Furthermore, this is the first report on the application of our recently proposed methodology using multivariate adjustment by potential confounders to minimise bias in the estimated MCID. This method maximises the generalisability of the estimated MCID levels to other testing settings.5
Using construct-specific transition questions (anchors), this evaluation study aimed to determine the MCIDs for improvement and worsening in patients with knee osteoarthritis in the following dimensions: generic and condition-specific pain, overall function, standing and walking, mobility/stiffness, physical role performance, activity/vitality, social functioning, affective/mental health and general health perception.
Patients and data collection
The data derive from the ‘Zurzach Osteoarthritis Study’, an observational, prospective cohort study, which examined health and quality of life before and after comprehensive rehabilitation for hip and knee osteoarthritis.14 All patients fulfilled the American College of Rheumatology criteria for knee osteoarthritis and signed written informed consent to participate.15 Clinical outcome data were earlier reported for controlled short-term effects in knee patients.14 16
The rehabilitation intervention consisted chiefly of active therapeutic entities (individual and group, land-based and water-based physiotherapy) and passive modalities (massage, packs, educational measures, coping instructions), which have been previously described in detail.14 For this study, all patients with knee osteoarthritis treated in hospital (2–3 weeks’ inpatient stay) and ambulatory (6–9 weeks’ outpatient sessions) settings were included. Data were collected at the start of the intervention (baseline) and at follow-up 3 months later, when all therapies had been completed (n=190).
MCIDs were determined on the basis of two instruments: the well-known and best-tested generic Medical Outcomes Study 36-Item Short Form Health Survey (SF-36), version 1, and the condition-specific Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC).17–22 In the medical literature, the SF-36 is the most widely used and most important patient-rated outcome instrument for healthy people and any health-affecting condition.6 10–14 The WOMAC is the most frequently used specific tool for hip and knee.6 9–14 Both instruments require license payments. While the manual of the SF-36 can be obtained for unlimited use for approximately US$300 (as can the German version and manual), the use of the WOMAC is limited by the number of applications (persons and follow-ups).17 19 However, all the WOMAC items are included in the Knee injury and Osteoarthritis Outcome Score (KOOS), which is available free of charge, and the German WOMAC version 1 used in this study is in the public domain.6 20
The SF-36 consists of 36 items distributed over eight scales, four physical and four psychosocial. The WOMAC is a 24-item instrument with three subscales: five items for pain, two for stiffness and 17 for function. By Rasch analysis, four functional factor subdimensions can be further defined.21 22 Of those, the WOMAC standing and walking factor was included in the present study. The complex rationale for the construction and validation of this factor dimension has been published before.20–22
Table 2 summarises the 11 construct domains of the SF-36 and WOMAC scales used in our analysis and the corresponding transition item anchors, with their specific contents. Seven of the eight scales of the SF-36 were included (role emotional was omitted) together with the three original WOMAC subscales and the additional WOMAC factor standing/walking.
The WOMAC factor standing/walking comprises one pain and one function item for standing together with one pain and one function item for walking on flat surface, that is, four items in total. SF-36 role physical consists of four items: limits in the amount of time spent on work or other activities, accomplished less than would have liked, limits in the kind of work or other activities, and difficulties performing work or other activities. SF-36 social functioning has two items: extent and time of interference of physical or emotional problems with normal social activities.
All analyses were performed using the statistical software package IBM SPSS V.23.0 for Windows. For all steps described below, an exemplary calculation is outlined in the online supplementary appendix 1.
Descriptive data at baseline and score differences to 3-month follow-up were compiled using arithmetic means and metric SD. For ease of comparison, all scores were scaled from 0=worst (most pain, no function, worst health) to 100=best (no pain, best function, best health).14 17
Assessment of appropriateness for determining the MCID
As a preliminary analysis to test a scale’s appropriateness for calculation of the MCID, floor and ceiling proportions were determined for each scale, with the floor being the percentage of subjects with the minimum score=0, or worst health, and the ceiling the percentage with the maximum score=100, or best health. A high floor at baseline means that many patients can experience no further deterioration to the follow-up. Vice versa, a high ceiling phenomenon at baseline indicates that many patients cannot further improve. Both the floor and the ceiling affect the ability to calculate changes between baseline and follow-up on a closed scale (0–100) and, consequently, also the quantification of valid MCIDs.
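The floor and ceiling proportions described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example data are hypothetical, and the scale limits 0 and 100 are taken from the scoring convention stated in the text.

```python
def floor_ceiling_rates(scores, minimum=0.0, maximum=100.0):
    """Return (floor %, ceiling %) for a list of scale scores.

    Floor   = percentage of subjects at the minimum score (worst health).
    Ceiling = percentage of subjects at the maximum score (best health).
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    n_floor = sum(1 for s in scores if s == minimum)
    n_ceiling = sum(1 for s in scores if s == maximum)
    return 100.0 * n_floor / n, 100.0 * n_ceiling / n


# Hypothetical sample: 125 of 190 subjects at the minimum score.
example_scores = [0.0] * 125 + [50.0] * 65
floor_pct, ceiling_pct = floor_ceiling_rates(example_scores)
print(round(floor_pct, 1), round(ceiling_pct, 1))
```

A high floor percentage flags a scale on which many patients cannot deteriorate further, and a high ceiling percentage one on which many cannot improve.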
In addition, Spearman rank correlation coefficients between the five transition ratings (much worse, slightly worse, almost equal, slightly better, much better) and the score differences (baseline to follow-up) were calculated. ‘Revicki’s criterion’ was applied1 13: that is, a proposed correlation of ≥0.30 to prove a sufficient level of construct convergence between the scale and the anchor.
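As an illustration of this check, the following sketch computes a Spearman rank correlation (Pearson correlation of average ranks, so that ties are handled) and compares it with the 0.30 threshold. The numeric coding of the transition ratings (1=much worse to 5=much better) and all function names are assumptions made for the example; the study itself used SPSS.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks.

    Undefined (division by zero) if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def meets_criterion(transition, score_change, threshold=0.30):
    """True if the anchor and the score change correlate at >= threshold."""
    return spearman_rho(transition, score_change) >= threshold
```

With the assumed coding, a positive correlation means that better transition ratings go together with larger score improvements.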
The content of the transition item questions was construct specific for all scales (see table 2). For the constructs of function and role physical (role limitations due to physical health), two transition items were combined (table 2): walking plus dressing and personal hygiene; household chores plus work. At least one of the items had to be rated ‘slightly better’ for the combined item to score ‘slightly better’. The same applied to the rating ‘slightly worse’. However, the response to both items had to be ‘almost equal’ for the rating ‘almost equal’ to be attributed.
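The combination rule for two transition items can be written out as a small function. This is a sketch only: the function name is hypothetical, and the text does not say how conflicting answers (one item ‘slightly better’, the other ‘slightly worse’) were handled, so the sketch returns None for that case and for any other combination not covered by the rule.

```python
def combine_transition(item_a, item_b):
    """Combine two transition ratings into one, following the stated rule.

    Returns None where the rule in the text does not determine a rating
    (an assumption of this sketch, not a documented procedure).
    """
    ratings = {item_a, item_b}
    better = "slightly better" in ratings
    worse = "slightly worse" in ratings
    if better and worse:
        return None  # conflicting answers: not specified in the text
    if better:
        return "slightly better"  # at least one item 'slightly better'
    if worse:
        return "slightly worse"  # at least one item 'slightly worse'
    if ratings == {"almost equal"}:
        return "almost equal"  # both items 'almost equal'
    return None  # e.g. 'much better' plus 'almost equal'


print(combine_transition("slightly better", "almost equal"))
```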
Bivariate, unadjusted MCIDs
Bivariate, unadjusted MCIDs were calculated according to the ‘mean change method’.5 8 The mean score change or difference of the transition category ‘slightly better’ minus that of the transition category ‘almost equal’ defined the MCID for improvement. By analogy, the mean score change and difference of the transition category ‘slightly worse’ minus that of the transition category ‘almost equal’ defined the MCID for worsening. A positive MCID reflects improvement, a negative MCID worsening.
The 95% CI of the MCID was calculated as MCID ± t × square root of (1/n1 + 1/n2) × pooled SD.23 The pooled SD is equal to $\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}$, where the ‘slightly better’ group had n1=number of patients and s1=SD of the score changes (baseline to follow-up), and the ‘almost equal’ group had correspondingly n2 and s2. The product of the multiplier ‘square root of (1/n1 + 1/n2)’ and the pooled SD gives the SE of the MCID. The t-value comes from Student’s t distribution with two-sided type I error of 0.05 and df=n1+n2–2 and can be calculated on the internet.24
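The mean change method and its CI can be sketched as follows. The sketch assumes the standard pooled-SD formula for two independent groups and takes the t-value as an input (looked up for df=n1+n2–2), since the text obtains it externally; the function name and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bivariate_mcid(changes_slight, changes_equal, t_value):
    """Unadjusted MCID by the mean change method.

    changes_slight: score changes (follow-up minus baseline) of the
                    'slightly better' (or 'slightly worse') group
    changes_equal:  score changes of the 'almost equal' group
    t_value:        two-sided 5% critical value of Student's t for
                    df = n1 + n2 - 2 (supplied by the user)
    Returns (mcid, pooled_sd, se, (ci_low, ci_high)).
    """
    n1, n2 = len(changes_slight), len(changes_equal)
    s1, s2 = stdev(changes_slight), stdev(changes_equal)
    mcid = mean(changes_slight) - mean(changes_equal)
    # Assumed standard pooled SD of two independent groups.
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    se = sqrt(1 / n1 + 1 / n2) * pooled_sd
    return mcid, pooled_sd, se, (mcid - t_value * se, mcid + t_value * se)


# Hypothetical score changes; t = 2.776 is the critical value for df = 4.
mcid, pooled_sd, se, ci = bivariate_mcid([10, 12, 14], [2, 4, 6], 2.776)
print(mcid, pooled_sd, round(se, 3), ci)
```

Passing the ‘slightly worse’ group as the first argument yields the (negative) MCID for worsening by the same calculation.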
Multivariate, adjusted MCIDs
In order to obtain a less biased, more generalisable estimate of the MCID, multivariate, confounder-adjusted MCIDs were calculated using the regression model as follows5: score change (baseline to follow-up; in score points)=b1×transition group (binary coded: 1=‘slightly better’, 0=‘almost equal’)+b2×baseline score (in score points)+b3×sex (m/f)+b4×age (in years)+constant term. The coefficient b1 is equal to the adjusted MCID and the SE of b1 is the SE of the MCID. This adjusted, multivariable MCID can also be divided by the specific SD to obtain adjusted effect sizes as described above. The interval for 95% confidence can be calculated by ±t×SE of the MCID.23 24
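A minimal sketch of this regression, written in plain Python so that it is self-contained, is given below. It fits the model by ordinary least squares via the normal equations; the coding of sex as 0/1, the tuple layout of the input and all function names are assumptions of the example, and the study itself used SPSS.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(i == j) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols(X, y):
    """Ordinary least squares; returns (coefficients, standard errors)."""
    n, p = len(X), len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(bj * xj for bj, xj in zip(b, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual variance
    se = [(sigma2 * inv[i][i]) ** 0.5 for i in range(p)]
    return b, se


def adjusted_mcid(records):
    """records: tuples (score_change, group, baseline, sex, age), with
    group 1='slightly better', 0='almost equal' and sex coded 0/1
    (assumed coding). Returns (b1, SE of b1), ie, adjusted MCID and its SE."""
    X = [[g, base, sex, age, 1.0] for _, g, base, sex, age in records]
    y = [change for change, *_ in records]
    b, se = ols(X, y)
    return b[0], se[0]
```

The 95% CI then follows as b1 ± t × SE of b1, with the t-value supplied as before. For the MCID for worsening, the ‘slightly worse’ group would be coded 1 instead.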
Standardisation of MCIDs by effect sizes
All MCIDs were further standardised, that is, divided by specific SD to obtain a parameter belonging to the family of ‘effect sizes’.5 23 This results in a dimensionless parameter that is less biased, especially by different baseline scores, and is more generalisable to the findings of other studies.5 A positive effect size (ES) reflects improvement, a negative ES a worsening of health.
The following three most important and frequently used parameters were calculated: (1) The (baseline) ES according to Kazis et al, which is the crude MCID divided by the baseline SD of the whole sample.25 (2) The standardised response mean (SRM) according to Liang, which is the crude MCID divided by the SD of the score differences of the whole sample.26 (3) The standardised mean difference (SMD) according to Borenstein, which is the crude MCID divided by the pooled SD of the ‘slightly better’ transition category and the ‘almost equal’ transition category (see above) and multiplied by a correction factor J to reduce bias for small sample sizes.23
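The three standardisations can be illustrated as follows. The text does not spell out the correction factor J; the sketch assumes the usual small-sample correction J = 1 − 3/(4 × df − 1) with df = n1 + n2 − 2, and the function name and example values are hypothetical.

```python
def effect_sizes(mcid, sd_baseline, sd_change, pooled_sd, n1, n2):
    """Standardise an MCID (in score points) as ES, SRM and SMD.

    sd_baseline: SD of the baseline scores of the whole sample
    sd_change:   SD of the score differences of the whole sample
    pooled_sd:   pooled SD of the two transition groups
    n1, n2:      sizes of the two transition groups
    """
    df = n1 + n2 - 2
    # Assumed form of the small-sample correction factor J.
    j = 1 - 3 / (4 * df - 1)
    return {
        "ES": mcid / sd_baseline,
        "SRM": mcid / sd_change,
        "SMD": j * mcid / pooled_sd,
    }


# Hypothetical values, for illustration only.
print(effect_sizes(8.0, 20.0, 16.0, 10.0, n1=30, n2=40))
```

Because J is slightly below 1, the SMD is shrunk a little relative to the uncorrected ratio, and the shrinkage vanishes as the groups become larger.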
MDC95 for comparison with MCID
For each scale, the intraclass correlation coefficients (ICCs) were sourced from the literature.20 21 27 The ICC measures test–retest reliability and is needed to determine the minimum detectable changes with 95% confidence (MDC95). The MDC95 reflects the measurement error of two different measurements on the same scale.25 If an MCID is larger than the MDC95, it reflects a difference (between baseline and follow-up) larger than a difference that may occur due to an error of measurement with 95% confidence. The MDC95 is calculated by the t-values multiplied by the SE of measurement (SEM), which is equal to the SE of the MCID multiplied by the square root of (1−ICC).25 The MDC95 is a CI which applies to single parameter estimates (the MCID) but not to CIs of a parameter.25
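Read literally, the calculation described above can be sketched as follows. The function names are hypothetical, the t-value is again supplied by the user, and comparing the absolute value of the MCID with the MDC95 (so that negative MCIDs for worsening are handled) is an assumption of this sketch.

```python
from math import sqrt


def mdc95(se_mcid, icc, t_value):
    """MDC95 as described in the text: t x SEM,
    with SEM = SE of the MCID x square root of (1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    sem = se_mcid * sqrt(1.0 - icc)
    return t_value * sem


def exceeds_mdc95(mcid, mdc):
    """True if the MCID lies beyond the measurement error (assumed rule:
    compare the absolute MCID, so worsening is treated symmetrically)."""
    return abs(mcid) > mdc


# Hypothetical values, for illustration only.
limit = mdc95(se_mcid=2.0, icc=0.75, t_value=2.0)
print(limit, exceeds_mdc95(-6.0, limit))
```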
Setting and descriptive data
The sample consisted of n=190 patients with knee osteoarthritis with complete baseline and follow-up data (tables 1–3). Of those, 149 (78.1%) were women. The mean age was 66.1 years and the SD 10.2 years (table 1).
The construct overlap (correlation) was ≥0.30 for all scales, with the exception of SF-36 role physical, vitality and social functioning, which failed ‘Revicki’s criterion’ (table 2).1 However, those three scales were kept in the analysis for the sake of completeness; they are marked in lighter font. The scores at baseline (see also table 3) showed a very high floor rate for SF-36 role physical (65.8%), reflecting n=125/190 subjects with score 0 (worst health); no other scale had a floor phenomenon (table 2). Ceiling (ie, 100=best health) percentage scores were as follows: 23.7% (n=45) for SF-36 social functioning, 5.4% (n=10) for role physical and 3.7% (n=7) for WOMAC stiffness, but were very low or absent on all the other scales. The test–retest ICCs were found in three studies of hip and knee osteoarthritis.20 21 27
Baseline and follow-up scores for the whole sample (n=190) and within the three transition groups relevant for the MCID are shown in table 3. All baseline score levels were around the middle of the scale 0–100, except SF-36 bodily pain, physical functioning and role physical (16.58), reflecting relatively great pain and low physical function and role (table 2). This means that changes between the baseline and follow-up were detectable in both directions, that is, improvements (positive differences) and worsening (negative differences). The mean score at the 3-month follow-up can be calculated by adding the change to the baseline score, for example, for SF-36 general health: 53.49+(–1.80)=51.69.
Consistent with the anchor rating, all changes showed improvements in the ‘slightly better’ category. In the ‘almost equal’ category, on average most changes showed improvements (maximum on SF-36 bodily pain, mean=+8.31), with the exception of WOMAC stiffness and SF-36 general health (table 3). In the ‘slightly worse’ group, 8/11 scales showed worsening, except SF-36 bodily pain, role physical and vitality.
Minimal clinically important differences
In the domain of pain, MCIDs for improvement ranged from 7.09 to 10.41 score points and those for worsening from –4.26 to –7.07; symmetry (improvement and worsening) was higher in the adjusted parameters: 7.09 and 8.19 versus –7.07 and –6.08 (table 4). In the domain of function, WOMAC function showed relatively large MCIDs for improvement (14.48 bivariate and 11.25 multivariate) compared with SF-36 physical functioning (4.23 and 3.81), whereas the levels for worsening were comparable (–5.38 to –7.29). Consistent with these findings, the MCIDs for the WOMAC factor standing/walking were large (improvement: 10.20, 5.93; worsening: –10.62, –12.87). WOMAC stiffness showed very large MCIDs for improvement (20.24 bivariate, 16.24 multivariate) and lower levels for worsening (–6.21, –9.91). Moderate and comparable MCIDs were found for SF-36 mental health (improvement: 5.23, 2.89 vs worsening: –4.07, –8.94) and SF-36 general health (6.00, 7.15 vs –6.53, –5.80).
The three scales with low correlations (<0.30) with the transition item or having high floor/ceiling effects, namely SF-36 role physical, SF-36 vitality and SF-36 social functioning, showed partly lopsided results (eg, SF-36 social functioning, multivariate: +2.57 for improvement, –18.50 for worsening). Moreover, most of those three MCIDs were lower than the corresponding MDC95.
The MCIDs of all the other scales were mostly much higher than the corresponding MDC95, for example, WOMAC function improvement, bivariate: 14.48 versus 3.05; the very few exceptions included the bivariate MCID for worsening of WOMAC stiffness (table 4).
Expressed as ES, SRM and SMD, 12/21 of the bivariate and 8/21 of the multivariate MCIDs for improvement were in the range of small effect levels of 0.30–0.50 (table 4). This was also the case for 10/21 (bivariate) and 13/21 (multivariate) MCIDs for worsening. The highest effect sizes were for improvement in WOMAC function (bivariate up to SMD 0.93).
This study presents the MCIDs for improvement and worsening in various health dimensions in patients with knee osteoarthritis. The MCIDs were quantified by the classical mean change method (unadjusted, bivariate) and adjusted for confounders by multivariate linear regression modelling. The MCIDs were expressed as raw score changes/differences in score points and in the number of SD by effect sizes, namely the (baseline) ES according to Kazis (applicable to pilot studies, which have only baseline data), the SRM according to Liang (for longitudinal cohort studies without a control group) and the SMD according to Borenstein (for randomised controlled trials).5 23 25 26 Valid MCID estimates could be determined for pain, function, standing/walking, stiffness, mental health and general health. All multivariate MCIDs and almost all bivariate MCIDs were higher than the corresponding MDC95, that is, beyond the instruments’ error of measurement with 95% confidence.25
The adjusted, multivariate MCIDs were more evenly balanced between improvement and worsening than the crude, bivariate levels (eg, WOMAC pain: +7.09/–7.07 vs +8.74/–6.09 score points; SF-36 bodily pain: +8.19/–6.08 vs +10.41/–4.26). The same is true for WOMAC function and WOMAC stiffness but not for the other scales. However, there was little difference between the multivariate and bivariate MCIDs in most cases.
In other words, the bivariate MCID is a good estimate of the multivariate MCID. Nevertheless, the adjusted, multivariate MCID is to be preferred for application to the results of other studies.5 28 29 The variables sex, age and baseline score substantially confound health changes, but since they are available in every study setting, adjustment can be made for them. Adjustment provides more valid and generalisable MCIDs, which can be transferred and applied to different settings and patient groups for the purpose of assessing the clinical relevance of the outcome effects measured. If an effect measured in a specific study is larger than or equal to the MCID, this means that the patients in that setting subjectively perceive their improvements as beneficial both on average and at group level.
Our MCIDs are in line with those of a recent multidisciplinary outpatient programme for patients with knee osteoarthritis (OA) (table 5).13 In that study, the MCIDs for improvement on the function scale of the KOOS, which is identical to the WOMAC function scale, were 8.93 score points for the transition question regarding ‘walking on level ground’ and 7.64 for the transition question regarding ‘my knee in general’. The corresponding MCIDs for worsening were –6.57 and –4.00. MCIDs for patients with knee OA after 6 weeks’ treatment with non-specific, non-steroidal anti-inflammatory drugs were calculated for the WOMAC total score (sum of 5 pain, 2 stiffness, 17 function items) and resulted in an MCID for improvement of 11 and for worsening of −16 score points.12
In an earlier study, detailed MCIDs for the SF-36 and WOMAC were calculated 6 months after patients had undergone total knee replacement surgery. They showed score point differences comparable with the bivariate MCIDs of our study on most scales, ranging from 0.12 to 11.66 for improvement and from –4.39 to 10.62 for worsening (table 5).10
On half of the scales, our MCIDs expressed as ESs were in the range of 0.30 to 0.50, a generalisable range according to the literature.1 However, it seems that the MCID increased with the responsiveness of the scale. In other words, a highly responsive/sensitive scale measured relatively large differences between the baseline and follow-up until the change reached the level subjectively measurable by the transition rating. This was especially true for WOMAC function improvement, which is more responsive than SF-36 physical functioning (multivariate, adjusted SMD=+0.72 vs +0.21).9 30 31 Furthermore, the threshold at which improvement in stiffness became subjectively perceptible was high, and much higher than that for deterioration (multivariate SMD=+0.72 vs –0.39). Finally, subjectively perceived changes of rehabilitation in mental and general health were, as expected, small, which is reflected in the relatively small MCIDs.14
The MCIDs of SF-36 role physical, vitality and social functioning are presented but for various reasons do not appear to provide valid estimates: low construct convergence to the anchor (correlations <0.30), high floor or high ceiling effects, and MCID < MDC95 (or > for worsening).1 5 MCID data of those scales were presented for completeness and interest, and to give an idea of their levels, but caution should be used in applying them for comparison with the effect data of other studies.
One strength of our study is that it is the first to examine MCIDs in health dimensions beyond pain and general function in patients with knee OA. The anchor rating was not global but construct specific, which improved the specificity of the MCIDs. Construct convergence to the anchor rating and floor/ceiling effects were taken into account to rate the validity of the MCID estimate for each scale. All MCIDs were related to the corresponding MDC95s. Quantification of the MCID by ESs improves their generalisability and the application of the results to testing studies, especially by the SMD for randomised controlled trials.5 Multivariate adjustment further improved the generalisability to enable comparison with other study results.5 28 29
The most important limitation is that the setting examined conservative therapy of knee osteoarthritis and it may not be possible to generalise the MCID to knee surgery.
This study was supported by the Zurzach Rehabilitation Foundation SPA, Bad Zurzach, Switzerland. The authors thank Elizabeth Kyrke for the linguistic editing of the paper.
Contributors FA planned the study, analysed and interpreted the data, and wrote the draft study report. TB, as a clinical expert, helped in the literature search and contributed to interpreting the data and to putting them into a clinical context. SL helped to plan and to carry out the study, collected the data and helped to interpret them. AA provided the resources for carrying out the study and helped in planning the study and in interpreting the data. JA helped to interpret the data, to put them into a clinical context and to finalise the study report.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent Obtained.
Ethics approval EK AG 2008/026, Switzerland.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.