Original article
A taxonomy for responsiveness
Introduction
Clinicians often use indexes, instruments or questionnaires to evaluate their patients over time, and they must select which one(s) to use. The most appropriate measure must be sensible, reliable, valid and (if used to evaluate change over time) it must also be “responsive” 1, 2. Responsiveness is defined as the ability of an instrument to accurately detect change when it has occurred 3, 4 and is usually quantified by a statistical or numeric score, such as an effect size statistic 4, 5, 6, 7, 8, 9 or a standardized response mean. The question arises, however, of whether such scores provide clinicians with enough information about the usefulness of an instrument in its intended application.
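For orientation, the two statistics named above are commonly written as follows. The notation is introduced here for illustration and is not taken from the article: $\bar{d}$ is the mean change score across $n$ patients, $s_{\text{baseline}}$ is the standard deviation of the baseline scores, and $s_{d}$ is the standard deviation of the change scores. Other variants exist in the literature, so this is a sketch of one common convention rather than a definitive formulation.

```latex
\[
\bar{d} = \frac{1}{n}\sum_{i=1}^{n}\left(x_{i,\text{follow-up}} - x_{i,\text{baseline}}\right),
\qquad
\mathrm{ES} = \frac{\bar{d}}{s_{\text{baseline}}},
\qquad
\mathrm{SRM} = \frac{\bar{d}}{s_{d}}
\]
```

Both are unitless ratios: the numerator is the same mean change, and only the choice of denominator differs.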
In many ways, the interpretation of statistics of responsiveness is analogous to the interpretation of P-values used in treatment trials, where too much emphasis is often placed on the magnitude of the numeric estimate, such as a P-value of 0.05 or less, with too little attention paid to the nature and meaning of the change being quantified. Which patients are being compared? How long was the follow-up? Which treatments are involved? The answers to these questions provide the context of the trial and are essential for interpreting the results (the P-value). Similarly, with responsiveness, the magnitude of the effect size statistic alone is unlikely to provide enough information to aid in its interpretation. The context of the study of responsiveness (i.e., the nature of the change that the study is set up to measure) must be considered before interpreting the magnitude. What had the patients experienced to be considered “changed” in the study of responsiveness? What change scores are being quantified: the change in treatment over control or the change in the treatment group alone? Is the change being quantified a change in one patient or a change in a group of patients? To date, attention has focused primarily on finding measures with the largest responsiveness statistics or determining if a measure can produce a “large” effect size statistic, greater than .80, according to Cohen [10], and therefore is “responsive.” But, like the P-value, these statistics on their own lack any context and therefore often lead to the assumption that responsiveness, once established, is a static, context-free attribute of the questionnaire, an assumption we feel has fueled the numerous discussions about interpretation of these statistics.
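The dependence on context can be illustrated with a minimal Python sketch. All scores below are invented for illustration, and the function names and the "net" statistic (difference in mean change divided by a pooled baseline SD) are assumptions made for this example, one of several possible conventions rather than a method described in the article. The same hypothetical instrument yields a very different number depending on whether the change in the treatment group alone or the change in treatment over control is quantified.

```python
from math import sqrt
from statistics import mean, stdev, variance


def change_scores(baseline, follow_up):
    """Per-patient change: follow-up minus baseline."""
    return [f - b for b, f in zip(baseline, follow_up)]


def effect_size(baseline, follow_up):
    """Mean change divided by the sample SD of the baseline scores."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def net_effect_size(t_base, t_follow, c_base, c_follow):
    """Treatment-over-control change, standardized by a pooled baseline SD.

    Assumption for this sketch: equal group sizes, so the pooled variance
    is the simple average of the two baseline variances.
    """
    net_change = mean(change_scores(t_base, t_follow)) - mean(
        change_scores(c_base, c_follow)
    )
    pooled_sd = sqrt((variance(t_base) + variance(c_base)) / 2)
    return net_change / pooled_sd


# Hypothetical scores on a 0-100 scale (higher = better); not real data.
treated_baseline = [40, 45, 50, 55, 60]
treated_follow_up = [55, 58, 66, 68, 78]
control_baseline = [42, 44, 51, 54, 59]
control_follow_up = [54, 55, 64, 65, 72]

within_es = effect_size(treated_baseline, treated_follow_up)
net_es = net_effect_size(
    treated_baseline, treated_follow_up, control_baseline, control_follow_up
)

print(f"Change in treatment group alone: ES = {within_es:.2f}")
print(f"Change in treatment over control: ES = {net_es:.2f}")
```

With these invented numbers the within-group statistic falls well above the .80 benchmark while the treatment-over-control statistic falls well below it, even though the instrument and the patients are the same. Which value is relevant depends on the question the study of responsiveness was set up to answer.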
It is argued here that discussing the context of the measurement of responsiveness (i.e., the nature of the change that the study is set up to measure) rather than the magnitude of the statistic is more likely to advance the already protracted debate 9, 11, 12, 13, 14, 15, 16, 17 over the interpretability [18] of responsiveness statistics. This approach looks deeper than the statistic's numeric value to “what” is actually being quantified. As Michell suggests, “protracted controversy suggests that the disagreement lies much deeper than the arguments hitherto presented imply” [19, (p. 398)]. The debate over the interpretation of responsiveness statistics is not just of methodological or academic interest, but has direct implications for how we assess patients and how we decide if treatments have truly made them better. Without a clear understanding of responsiveness statistics, a meaningless change could be misconstrued as clinically significant when it is merely statistically significant 20, 21; alternatively, a small gain in mobility might be statistically insignificant 22, 23 but be considered by the patient as being a very important improvement. As Jenkinson [24] suggests, “the results of health status measures could not simply be misleading, but actually harmful.”
We suggest that there are three core topics in the debates over responsiveness, ordered according to the frequency with which we found them in the literature: first, the interpretation of the statistic (i.e., is the change relevant or important?) 11, 13, 21, 25, 26, 27, 28, 29, 30, 31, 32, 33; second, methodological issues (i.e., how should studies of responsiveness be designed? 7, 17, 29, 31, 33 and how should the property be quantified and analyzed? 4, 7, 9, 17, 27, 34, 35, 36); and finally, the conceptual and definitional issues (i.e., how is responsiveness conceptualized? what is being quantified? 17, 36, 37). Although the conceptual and definitional issues have received the least attention, they may be the most fundamental factors and are probably critical to making sense of the data (interpretability). It is these issues that are the focus of this article.
The literature contains many definitions of responsiveness (Table 1), and the differences between them are instructive. Most authors agree that responsiveness involves the ability of a measure to detect change, but there are wide variations in opinion about the nature of the change that is being detected. For example, in 1977 Guyatt et al. [38] defined responsiveness as “the ability to detect change, specifically, important change, in the way patients are feeling, even if those changes are small,” thus focusing on individual feelings and fine discrimination. In 1994, Testa et al. took a broader view, defining responsiveness as “the ability to detect meaningful treatment effects” [22], whereas Anderson and Chernoff [39] in 1993 said it was the “ability to detect important changes in disease activity over time” (emphasis added). Differences between the definitions are critical, as each reflects a distinct type of change being quantified in a given analysis of responsiveness, and thus a distinct type or category of responsiveness. We suggest that many parts of the conceptual and definitional debate could be resolved by allowing these to stand as distinct types of responsiveness, each depending on the nature of the change described within the study. Earlier we used deBruin et al.'s [3] definition of responsiveness. This was chosen as our operational definition because it did not specify the nature of the change but did specify that there needed to be some determination that the change had occurred. This definition encompasses all the types of change specified in the others.
The purpose of the present article is to propose a classification system that reflects the context of the measurement of responsiveness where the context is defined by specific attributes of the change designed into a study of responsiveness. This suggests that different categories or types of responsiveness can be defined in terms of the attributes of the particular change being quantified.
Literature review
Articles discussing the theory of responsiveness were gathered. The initial source was based on the personal files of the authors. In addition, Murawski and Miederhoff [9] (who published a review of the literature on responsiveness up to 1994) shared their list of 324 references and search strategy. Their approach was to seek all articles using health status measures through an electronic search and through a hand-review of the more than 20,000 abstracts, to find those which dealt with
Building a taxonomy of responsiveness
Several articles concerning responsiveness have discussed different aspects of the nature of the change being studied and how they relate to interpretation of resultant statistics. Three groups of articles were identified: those that discussed individual-level versus group-level assessment of change 13, 18, 35, 41, 45, 46, 47; those that considered the contrast of between-person difference versus within-person change 13, 14, 34, 40, 48, 49, 50; and those that addressed different types of change
Discussion
This article has revealed the need to consider responsiveness a context-specific attribute, where the nature of the change designed into the study partially defines that context. It has also created a taxonomy of responsiveness to define the nature of the change in the study. We suggest that this proposed taxonomy reconciles many of the debates and discussions in the literature 14, 21, 36, 46, 101 by locating the nature of change (category of change) within a matrix defined by three axes: Who
Acknowledgements
The authors thank Dr. Matthew Murawski for sharing the results of his literature review on responsiveness up to 1994 and Dr. Valerie Tarasuk for her significant contribution to the writing and review of this manuscript. Dr. Beaton was supported by a PhD fellowship (health research) from the Medical Research Council of Canada and by the Institute for Work & Health while this research was done. Dr. Wright is the R.B. Salter Chair of Surgical Research and a Medical Research Council of Canada
References (121)
- et al.
Measuring change over time: assessing the usefulness of evaluative instruments
J Chronic Dis
(1987)
- et al.
Assessing the responsiveness of a functional status measure: the Sickness Impact Profile versus the SIP68
J Clin Epidemiol
(1997)
- et al.
Evaluating changes in health status: reliability and responsiveness of five generic health status measures in workers with musculoskeletal disorders
J Clin Epidemiol
(1997)
- et al.
On the debate over methods for estimating the clinically important difference
J Clin Epidemiol
(1996)
Issues in the use of change scores in randomized trials
J Clin Epidemiol
(1989)
- et al.
Methodological problems in the retrospective computation of responsiveness to change: the lesson of Cronbach
J Clin Epidemiol
(1997)
The ethics of attribution: the case of health care outcome indicators
Soc Sci Med
(1998)
Evaluating the efficacy of medical treatment: possibilities and limitations
Soc Sci Med
(1995)
- et al.
Can there be a more patient-centred approach to determining clinically important effect sizes for randomized treatment trials?
J Clin Epidemiol
(1994)
The minimal important difference: who's to say what is important?
J Clin Epidemiol
(1996)
Assessing the minimal important difference in symptoms: a comparison of two techniques
J Clin Epidemiol
Classification systems of soft tissue disorders of the neck and upper limb: do they satisfy methodological guidelines?
J Clin Epidemiol
Assessing quality of life in clinical research: from where have we come and where are we going?
J Clin Epidemiol
Determining a minimal important change in a disease-specific quality of life questionnaire
J Clin Epidemiol
Measurement of health status. Ascertaining the minimal clinically important difference
Control Clin Trials
Can the sickness impact profile measure change? An example of scale assessment
J Chronic Dis
Assessing the responsiveness of functional scales to clinical change: an analogy to diagnostic test performance
J Chronic Dis
How should health status measures be assessed? Cautionary notes on procrustean frameworks
J Clin Epidemiol
Twentieth century paradigms that threaten both scientific and humane medicine in the twenty-first century
J Clin Epidemiol
The evaluation of changes in functional health status in patients with abdominal complaints
J Clin Epidemiol
The more things change
J Clin Epidemiol
A method of assessing change in a single subject: an alteration of the RC index
Behav Ther
Assessing smallest detectable change over time in continuous structural outcome measures: application to radiological change in knee osteoarthritis
J Clin Epidemiol
Psychotherapy outcome research: methods for reporting variability and evaluating clinical significance
Behav Ther
Further evidence supporting standard error of measurement based criterion for identifying meaningful intra-individual change in health-related quality of life
J Clin Epidemiol
Interpretation of change scores in ordinal clinical scales and health status measures: the whole may not be equal to the sum of the parts
J Clin Epidemiol
Surveying physicians to determine the minimal important difference: implications for sample-size calculation
J Clin Epidemiol
Assessing the reliability and responsiveness of five shoulder questionnaires
J Shoulder Elbow Surg
Psychometric consideration in evaluating health-related quality of life measures
Qual Life Res
A comparison of different indices of responsiveness
J Clin Epidemiol
Things I have learned so far
Am Psychologist
Effect sizes for interpreting changes in health status
Med Care
Comparative measurement sensitivity of short and longer health status instruments
Med Care
On the generalizability of statistical expressions of health related quality of life instrument responsiveness: a data synthesis
Qual Life Res
Statistical power analysis for the behavioral sciences
Assessing the need for health status measures
J Epidemiol Comm Health
Strategies for improving and expanding the application of health status measures in clinical settings: a researcher-developer viewpoint
Med Care
The study of change in evaluation research: principles concerning measurement, experimental design and analysis
The problem of quality of life in medicine
JAMA
Health status, quality of life, and the individual
JAMA
Measurement scales and statistics: a clash of paradigms
Psychol Bull
Development standards for health measures
J Health Serv Res Policy
Methods for quality of life studies
Annu Rev Public Health
How we should measure “change”—or should we?
Psychol Bull
Confidence intervals assess both clinical significance and statistical significance
Ann Intern Med
Evaluating measurement responsiveness
J Rheumatol
Clinical importance, statistical significance and the assessment of economic and quality-of-life outcomes
Health Economics