Original article
A taxonomy for responsiveness

https://doi.org/10.1016/S0895-4356(01)00407-3

Abstract

Responsiveness is quickly becoming a critical criterion for the selection of outcome measures in studies of treatment effectiveness, economic appraisals, and other program evaluations. Statistical characteristics, specifically “large effect sizes,” are often felt to indicate the relative worth of one instrument over another. However, debates about their meaning led the present authors to propose a taxonomy for responsiveness based on the context of the study concerned. The three axes underlying the classification system relate to: who is being analyzed (individuals or groups); which scores are being contrasted (over time or at one point in time); and the type of change being quantified (for example, observed change or important change). It is concluded that responsiveness should be considered a highly contextualized attribute of an instrument rather than a static property, and should be described only in that way. A questionnaire could thus be described as being “responsive to” a given category in the new taxonomy.

Introduction

Clinicians often use indexes, instruments or questionnaires to evaluate their patients over time, and they must select which one(s) to use. The most appropriate measure must be sensible, reliable, valid and (if used to evaluate change over time) it must also be “responsive” [1, 2]. Responsiveness is defined as the ability of an instrument to accurately detect change when it has occurred [3, 4] and is usually quantified by a statistical or numeric score, such as an effect size statistic [4, 5, 6, 7, 8, 9] or a standardized response mean. The question arises, however, of whether such scores provide clinicians with enough information about the usefulness of an instrument in its intended application.
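
As a rough illustration of the two statistics named above, the following Python sketch computes an effect size (mean change divided by the standard deviation of the baseline scores) and a standardized response mean (mean change divided by the standard deviation of the change scores) for one group measured twice. The scores and function names are hypothetical, and these are commonly used formulations, not necessarily the ones applied in any study cited here.

```python
from statistics import mean, stdev


def change_scores(baseline, follow_up):
    """Per-patient change: follow-up score minus baseline score."""
    return [after - before for before, after in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the SD of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline, follow_up):
    """Mean change divided by the SD of the change scores."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical questionnaire scores for five patients (illustration only).
baseline = [40, 50, 60, 70, 80]
follow_up = [50, 55, 75, 80, 85]

es = effect_size(baseline, follow_up)
srm = standardized_response_mean(baseline, follow_up)
print(f"effect size = {es:.2f}, standardized response mean = {srm:.2f}")
```

The same data yield quite different values for the two statistics because the denominators differ, which is one reason a single number says little without knowing what was quantified.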

In many ways, the interpretation of responsiveness statistics is analogous to the interpretation of P-values in treatment trials, where too much emphasis is often placed on the magnitude of the numeric estimate, such as a P-value of 0.05 or less, and too little attention is paid to the nature and meaning of the change being quantified. Which patients are being compared? How long was the follow-up? Which treatments are involved? The answers to these questions provide the context of the trial and are essential for interpreting the results (the P-value). Similarly, with responsiveness, the magnitude of the effect size statistic alone is unlikely to provide enough information to aid in its interpretation. The context of the study of responsiveness (i.e., the nature of the change that the study is set up to measure) must be considered before interpreting the magnitude. What had the patients experienced to be considered “changed” in the study of responsiveness? What change scores are being quantified: the change in treatment over control, or the change in the treatment group alone? Is the change quantified change in one patient or change in a group of patients? To date, attention has focused primarily on finding measures with the largest responsiveness statistics or determining whether a measure can produce a “large” effect size statistic (greater than 0.80, according to Cohen [10]) and is therefore “responsive.” But, like the P-value, these statistics on their own lack any context and therefore often lead to the assumption that responsiveness, once established, is a static, context-free attribute of the questionnaire. We feel this assumption has fueled the numerous discussions about the interpretation of these statistics.
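
The dependence on context can be made concrete with a small sketch. Using hypothetical change scores for a treated group and a control group, the Python code below quantifies (a) the change in the treatment group alone and (b) the change in treatment over control, each standardized by a standard deviation of change scores. The pooled-SD formulation used for (b) is an assumption chosen for illustration, not a method taken from this article.

```python
from math import sqrt
from statistics import mean, stdev, variance


def within_group_statistic(changes):
    """Change in one group alone: mean change / SD of change."""
    return mean(changes) / stdev(changes)


def between_group_statistic(treated, control):
    """Change in treatment over control: difference in mean change
    divided by the pooled SD of change (an illustrative assumption)."""
    n_t, n_c = len(treated), len(control)
    pooled_var = ((n_t - 1) * variance(treated)
                  + (n_c - 1) * variance(control)) / (n_t + n_c - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Hypothetical change scores on the same questionnaire (illustration only).
treated_changes = [10, 5, 15, 10, 5]
control_changes = [4, 0, 8, 6, 2]

within = within_group_statistic(treated_changes)
between = between_group_statistic(treated_changes, control_changes)
print(f"treatment group alone: {within:.2f}")
print(f"treatment over control: {between:.2f}")
```

The instrument and the patients are identical in both calculations; only the change being quantified differs, and so does the resulting statistic.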

It is argued here that discussing the context of the measurement of responsiveness (i.e., the nature of the change that the study is set up to measure) rather than the magnitude of the statistic is more likely to advance the already protracted debate [9, 11, 12, 13, 14, 15, 16, 17] over the interpretability [18] of responsiveness statistics. This approach looks deeper than the statistic's numeric value to “what” is actually being quantified. As Michell suggests, “protracted controversy suggests that the disagreement lies much deeper than the arguments hitherto presented imply” [19, p. 398]. The debate over the interpretation of responsiveness statistics is not just of methodological or academic interest, but has direct implications for how we assess patients and how we decide if treatments have truly made them better. Without a clear understanding of responsiveness statistics, a meaningless change could be misconstrued as clinically significant when it is merely statistically significant [20, 21]; alternatively, a small gain in mobility might be statistically insignificant [22, 23] but be considered by the patient to be a very important improvement. As Jenkinson [24] suggests, “the results of health status measures could not simply be misleading, but actually harmful.”

We suggest that there are three core topics in the debates over responsiveness, ordered here according to the frequency with which we found them in the literature: first, the interpretation of the statistic (i.e., is the change relevant or important?) [11, 13, 21, 25, 26, 27, 28, 29, 30, 31, 32, 33]; second, methodological issues (i.e., how should studies of responsiveness be designed? [7, 17, 29, 31, 33] and how should the property be quantified and analyzed? [4, 7, 9, 17, 27, 34, 35, 36]); and finally, the conceptual and definitional issues (i.e., how is responsiveness conceptualized? what is being quantified? [17, 36, 37]). Although the conceptual and definitional issues have received the least attention, they may be the most fundamental factors and are probably critical to making sense of the data (interpretability). It is these issues that are the focus of this article.

The literature contains many definitions of responsiveness (Table 1), and the differences between them are instructive. Most authors agree that responsiveness involves the ability of a measure to detect change, but there are wide variations in opinion about the nature of the change that is being detected. For example, in 1977 Guyatt et al. [38] defined responsiveness as “the ability to detect change, specifically, important change, in the way patients are feeling, even if those changes are small,” thus focusing on individual feelings and fine discrimination. In 1994, Testa et al. took a broader view, defining responsiveness as “the ability to detect meaningful treatment effects” [22], whereas Anderson and Chernoff [39], in 1993, said it was the “ability to detect important changes in disease activity over time” (emphasis added). Differences between the definitions are critical, as each reflects a distinct type of change being quantified in a given analysis of responsiveness, and thus a distinct category of responsiveness. We suggest that many parts of the conceptual and definitional debate could be resolved by allowing these to stand as distinct types of responsiveness, each depending on the nature of the change described within the study. Earlier we used deBruin et al.'s [3] definition of responsiveness. This was chosen as our operational definition because it did not specify the nature of the change but did specify that there needed to be some determination that the change had occurred. This definition encompasses all the types of change specified in the others.

The purpose of the present article is to propose a classification system that reflects the context of the measurement of responsiveness, where the context is defined by specific attributes of the change designed into a study of responsiveness. This means that different categories or types of responsiveness can be defined in terms of the attributes of the particular change being quantified.

Section snippets

Literature review

Articles discussing the theory of responsiveness were gathered. The initial source was the authors' personal files. In addition, Murawski and Miederhoff [9] (who published a review of the literature on responsiveness up to 1994) shared their list of 324 references and their search strategy. Their approach was to seek all articles using health status measures through an electronic search and through a hand-review of the more than 20,000 abstracts, to find those which dealt with

Building a taxonomy of responsiveness

Several articles concerning responsiveness have discussed different aspects of the nature of the change being studied and how they relate to interpretation of resultant statistics. Three groups of articles were identified: those that discussed individual-level versus group-level assessment of change [13, 18, 35, 41, 45, 46, 47]; those that considered the contrast of between-person difference versus within-person change [13, 14, 34, 40, 48, 49, 50]; and those that addressed different types of change

Discussion

This article has revealed the need to consider responsiveness a context-specific attribute, where the nature of the change designed into the study partially defines that context. It has also created a taxonomy of responsiveness to define the nature of the change in the study. We suggest that this proposed taxonomy reconciles many of the debates and discussions in the literature [14, 21, 36, 46, 101] by locating the nature of change (category of change) within a matrix defined by three axes: Who

Acknowledgements

The authors thank Dr. Matthew Murawski for sharing the results of his literature review on responsiveness up to 1994 and Dr. Valerie Tarasuk for her significant contribution to the writing and review of this manuscript. Dr. Beaton was supported by a PhD fellowship (health research) from the Medical Research Council of Canada and by the Institute for Work & Health while this research was done. Dr. Wright is the R.B. Salter Chair of Surgical Research and a Medical Research Council of Canada

References (121)

  • D.A Redelmeier et al. Assessing the minimal important difference in symptoms: a comparison of two techniques. J Clin Epidemiol (1996)
  • R Buchbinder et al. Classification systems of soft tissue disorders of the neck and upper limb: do they satisfy methodological guidelines? J Clin Epidemiol (1996)
  • S Wood-Dauphinee. Assessing quality of life in clinical research: from where have we come and where are we going? J Clin Epidemiol (1999)
  • E.F Juniper et al. Determining a minimal important change in a disease-specific quality of life questionnaire. J Clin Epidemiol (1994)
  • R Jaeschke et al. Measurement of health status. Ascertaining the minimal clinically important difference. Control Clin Trials (1989)
  • C.R MacKenzie et al. Can the sickness impact profile measure change? An example of scale assessment. J Chronic Dis (1986)
  • R.A Deyo et al. Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance. J Chronic Dis (1986)
  • J.I Williams et al. How should health status measures be assessed? Cautionary notes on procrustean frameworks. J Clin Epidemiol (1992)
  • A.R Feinstein. Twentieth century paradigms that threaten both scientific and humane medicine in the twenty-first century. J Clin Epidemiol (1996)
  • H.M Jacobs et al. The evaluation of changes in functional health status in patients with abdominal complaints. J Clin Epidemiol (1996)
  • M.S Lachs. The more things change. J Clin Epidemiol (1993)
  • L Christensen et al. A method of assessing change in a single subject: an alteration of the RC index. Behav Ther (1986)
  • P Ravaud et al. Assessing smallest detectable change over time in continuous structural outcome measures: application to radiological change in knee osteoarthritis. J Clin Epidemiol (1999)
  • N.S Jacobson et al. Psychotherapy outcome research: methods for reporting variability and evaluating clinical significance. Behav Ther (1984)
  • K.W Wyrwich et al. Further evidence supporting standard error of measurement based criterion for identifying meaningful intra-individual change in health-related quality of life. J Clin Epidemiol (1999)
  • G Stucki et al. Interpretation of change scores in ordinal clinical scales and health status measures: the whole may not be equal to the sum of the parts. J Clin Epidemiol (1996)
  • C.V van Walraven et al. Surveying physicians to determine the minimal important difference: implications for sample-size calculation. J Clin Epidemiol (1999)
  • D.E Beaton et al. Assessing the reliability and responsiveness of five shoulder questionnaires. J Shoulder Elbow Surg (1998)
  • R.D Hays et al. Psychometric consideration in evaluating health-related quality of life measures. Qual Life Res (1993)
  • J.G Wright et al. A comparison of different indices of responsiveness. J Clin Epidemiol (1998)
  • J Cohen. Things I have learned so far. Am Psychologist (1990)
  • L.E Kazis et al. Effect sizes for interpreting changes in health status. Med Care (1989)
  • J.N Katz et al. Comparative measurement sensitivity of short and longer health status instruments. Med Care (1992)
  • M.M Murawski et al. On the generalizability of statistical expressions of health related quality of life instrument responsiveness: a data synthesis. Qual Life Res (1998)
  • J Cohen. Statistical power analysis for the behavioral sciences (1988)
  • J.L Donovan et al. Assessing the need for health status measures. J Epidemiol Comm Health (1993)
  • R.A Deyo et al. Strategies for improving and expanding the application of health status measures in clinical settings: a researcher-developer viewpoint. Med Care (1992)
  • J Nunnally. The study of change in evaluation research: principles concerning measurement, experimental design and analysis
  • A Leplege et al. The problem of quality of life in medicine. JAMA (1997)
  • G.H Guyatt et al. Health status, quality of life, and the individual. JAMA (1994)
  • J Michell. Measurement scales and statistics: a clash of paradigms. Psychol Bull (1986)
  • Redelmeier DA, Goldstein R, Min ST, Hyland RH. Spirometry and dyspnea in patients with COPD: when small differences...
  • I McDowell et al. Development standards for health measures. J Health Serv Res Policy (1996)
  • M Testa et al. Methods for quality of life studies. Annu Rev Public Health (1994)
  • L.J Cronbach et al. How we should measure “change”: or should we? Psychol Bull (1970)
  • L.E Braitman. Confidence intervals assess both clinical significance and statistical significance. Ann Intern Med (1991)
  • Fortin PR, Stucki G, Katz JN. Measuring relevant change: an emerging challenge in rheumatic clinical trials. Arthritis...
  • M.H Liang. Evaluating measurement responsiveness. J Rheumatol (1995)
  • M Drummond et al. Clinical importance, statistical significance and the assessment of economic and quality-of-life outcomes. Health Economics (1993)