Handling missing data issues in clinical trials for rheumatic diseases
Introduction
In the last few decades, statistical methodology for analyzing missing data has advanced considerably. Despite these advances, data from many clinical trials in rheumatic diseases continue to be analyzed using elementary methods for handling missingness. The simplest and most overused of these is the intention-to-treat (ITT) analysis with the last observation carried forward (LOCF). Reliance upon LOCF is typically justified by its supposedly conservative statistical properties; however, this justification has recently been called into question (see Section 4.1). Because missing data problems are ubiquitous and can affect the conclusions of clinical trials, there is increasing interest in other disciplines, ranging from clinical epidemiology to political science, in addressing missing data issues in statistical inference; see, for example, Heitjan [1], King et al. [2], Gadbury et al. [3], Donders et al. [4] and Belin [5].
The main aim of this paper is to inform researchers in rheumatic diseases about a variety of modern statistical concepts and methods for handling missing data in clinical trials, including the assumptions required for each method to be valid and the pros and cons of each method. Throughout, we use data from patients with diffuse cutaneous systemic sclerosis (dcSSc) in a clinical trial to illustrate the concepts and demonstrate the analysis techniques. We also make an effort to include a bibliography of recent expository papers on handling missing data issues from statistical journals.
The second aim is to describe software for analyzing different types of missing data. Data may be missing in ways that may or may not depend on observed and/or unobserved data. When the missingness depends on the unobserved data, it is harder to perform a valid analysis and there is virtually no standard statistical program for analyzing such data.
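The distinction drawn above can be made concrete with a small simulation. The sketch below is illustrative only: the function name, the score distributions and the dropout probabilities are assumptions chosen for the example and are not taken from the dcSSc trial. It generates a fully observed baseline score and a follow-up score, then deletes follow-up values under three mechanisms: independent of all data (commonly termed missing completely at random, MCAR), dependent only on the observed baseline (missing at random, MAR), and dependent on the unobserved follow-up value itself (missing not at random, MNAR).

```python
import random
import statistics


def simulate(n=5000, seed=1):
    """Compare complete-case means under three hypothetical missingness mechanisms."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        baseline = rng.gauss(20, 5)            # always observed
        followup = baseline + rng.gauss(0, 3)  # may go missing
        rows.append((baseline, followup))

    def keep(p_missing):
        # True when the follow-up value is observed
        return rng.random() >= p_missing

    # MCAR: every follow-up value has the same 30% chance of being missing
    mcar = [y for _, y in rows if keep(0.30)]
    # MAR: missingness depends only on the observed baseline
    mar = [(b, y) for b, y in rows if keep(0.50 if b > 20 else 0.10)]
    # MNAR: missingness depends on the follow-up value that goes unobserved
    mnar = [y for _, y in rows if keep(0.50 if y > 20 else 0.10)]

    # Under MAR, stratifying on the observed baseline and reweighting by the
    # full-sample stratum sizes removes most of the complete-case bias.
    adjusted = 0.0
    for in_stratum in (lambda b: b > 20, lambda b: b <= 20):
        weight = sum(in_stratum(b) for b, _ in rows) / n
        stratum_mean = statistics.mean(y for b, y in mar if in_stratum(b))
        adjusted += weight * stratum_mean

    return {
        "full": statistics.mean(y for _, y in rows),
        "MCAR": statistics.mean(mcar),
        "MAR naive": statistics.mean(y for _, y in mar),
        "MAR adjusted": adjusted,
        "MNAR": statistics.mean(mnar),
    }


for name, value in simulate().items():
    print(f"{name}: {value:.2f}")
```

In this toy setting one would expect the complete-case mean to sit close to the full-data mean under MCAR, the MAR bias to largely disappear after adjusting for the observed baseline, and the MNAR bias to persist because the information needed to correct it is exactly what is missing.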
In the next section, we briefly describe the two-arm multi-center randomized clinical trial using patients with dcSSc. In Section 3 we review different types of missing data and how we may distinguish one from the other in practice. Section 4 presents common statistical methods for handling missing data, and in Section 5, we demonstrate many of the analyses for the dcSSc clinical trial data. We compare the results from the various analytical strategies and show how inconsistent results may be resolved. Conclusions are offered in Section 6.
The bovine collagen trial
Throughout the manuscript we refer to data from a recently concluded NIH-sponsored randomized oral bovine collagen trial for dcSSc patients [6] to illustrate the differences in results that can arise using different analytic approaches, although the actual trial had pre-specified analyses.
A total of 168 eligible patients with dcSSc were enrolled in this multi-center, phase II, double-blind, placebo-controlled trial. Patients were randomized to receive oral native collagen at a dose of 500 μg/day or
Types of missing data and missing data mechanisms
Each statistical method for analyzing missing data has its own assumptions, strengths and weaknesses. It is important to use an appropriate method because the choice of method and the proportion of missing values in a data set can lead to different statistical inferences. For example, Barzi and Woodward [7] studied missing data for serum cholesterol in 28 cohort studies and reported that if fewer than 10% of values were missing, many of the commonly used methods resulted in similar conclusions. If
Methods for handling of missing data
In this section, we review different ways of handling missing values and methods of analysis. Many of these methods are widely used in the statistical literature and increasingly in other disciplines as well.
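As a point of reference for the methods reviewed here, the LOCF approach mentioned in the Introduction can be sketched in a few lines. This is a minimal illustration, assuming each patient's scores are stored in visit order with None marking a missed visit; the function name and the example values are hypothetical and are not trial data.

```python
from typing import List, Optional


def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward over later missing visits.

    Missing values before the first observation stay missing, because
    there is nothing to carry forward.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical scores at four visits; the patient drops out after visit 2
patient = [28.0, 25.0, None, None]
print(locf(patient))  # [28.0, 25.0, 25.0, 25.0]
```

The sketch makes the underlying assumption visible: after dropout the patient's score is treated as frozen at its last observed value, which is a strong assumption whenever scores would be expected to keep changing over the remaining visits.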
Analysis of the bovine collagen trial data
We now consider both the determination of the missingness structure and the application of a range of analysis techniques to the bovine collagen data described in Section 2. The easiest way to check whether the missing completely at random (MCAR) assumption holds is to graph certain features of the data. For example, in the bovine collagen clinical trial, if data were MCAR, one would expect the mean trajectory of the modified Rodnan skin score (MRSS) of patients who remained in the study throughout the trial (completers) to be similar to that of patients who
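The graphical check described above amounts to computing per-visit means separately for completers and for patients with at least one missing visit. A minimal sketch, assuming each patient's scores are stored as a list in visit order with None for missed visits; the function name and the toy numbers are hypothetical and are not trial data:

```python
from statistics import mean
from typing import Dict, List, Optional

Trajectory = List[Optional[float]]


def mean_trajectories(patients: List[Trajectory]) -> Dict[str, List[Optional[float]]]:
    """Per-visit mean score for completers and dropouts, using observed values only."""
    groups: Dict[str, List[Trajectory]] = {"completers": [], "dropouts": []}
    for traj in patients:
        complete = all(v is not None for v in traj)
        groups["completers" if complete else "dropouts"].append(traj)

    n_visits = len(patients[0])
    result: Dict[str, List[Optional[float]]] = {}
    for name, members in groups.items():
        per_visit: List[Optional[float]] = []
        for t in range(n_visits):
            seen = [traj[t] for traj in members if traj[t] is not None]
            # None when no patient in the group was observed at this visit
            per_visit.append(mean(seen) if seen else None)
        result[name] = per_visit
    return result


# Hypothetical scores at three visits for four patients
toy = [
    [30.0, 26.0, 22.0],
    [28.0, 24.0, 20.0],
    [34.0, 33.0, None],
    [36.0, None, None],
]
print(mean_trajectories(toy))
# {'completers': [29.0, 25.0, 21.0], 'dropouts': [35.0, 33.0, None]}
```

Plotting the two lists against visit number gives the kind of graph described in the text. In this toy example the dropouts start higher and improve less before leaving, a pattern that would argue against MCAR.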
Summary
In this article, we have described missing data issues, statistical terminology and data analytical strategies for analyzing a dcSSc clinical trial with missing values. It should be clear that missingness greatly complicates data analysis. The old adage to collect data assiduously is worth repeating here because it is the best and most powerful defense against missing data. At a minimum, this means that all patients need to be followed to end-point irrespective of whether they are on
Acknowledgements
All authors are partly supported by a Senior Investigator Grant Award from the Scleroderma Foundation in 2006 and 2007. We gratefully thank the Scleroderma Foundation for the award.
References (68)
- et al. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol (2006)
Missing data: what a little can do, and what researchers can do in response. Am J Ophthalmol (2009)
Ruminations on the intent-to-treat principle. Control Clin Trials (2000)
- et al. A reanalysis of a longitudinal Scleroderma clinical trial using informative dropout models. J Stat Plann Infer (2007)
What can be done about missing data? Approaches to imputation. Am J Public Health (1997)
- et al. Analyzing incomplete political science data: an alternative algorithm for multiple imputation. Am Political Sci Rev (2001)
- et al. Modern statistical methods for handling missing repeated measurements in obesity trial data: beyond LOCF. Obes Rev (2003)
- et al. A Multicenter, randomized, double-blind, placebo-controlled trial of oral type I collagen treatment in patients with diffuse cutaneous systemic sclerosis: I. Oral type I collagen does not improve skin in all patients, but may improve skin in late-phase disease. Arthritis Rheum (2008)
- et al. Imputations of missing values in practice: results from imputations of serum cholesterol in 28 cohort studies. Am J Epidemiol (2004)
- et al. Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models. Am Stat (2007)
Inference and missing data. Biometrika
Distinguishing “Missing at random” and “missing completely at random”. Am Stat
Can one assess whether missing data are missing at random in medical studies? Stat Meth Med Res
Testing for random dropouts in repeated measurement data. Biometrics
Testing for random drop-outs in repeated measurement data. Biometrics
Last observation carry-forward and last observation analysis (Letter to the Editor). Stat Med
What is meant by intention to treat analysis? Survey of published randomized controlled trials. Br Med J
Intention-to-treat principle. Can Med Assoc J
Intent-to-treat analysis for longitudinal studies with drop-outs. Biometrics
The application of the principle of intention-to-treat to the analysis of clinical trials. Drug Inf J
Intention to treat: who should use it? Br J Cancer
Analyzing incomplete longitudinal clinical trial data. Biostatistics
Last observation analysis in ANOVA and ANCOVA. Stat Sin
Last observation carry-forward and last observation analysis. Stat Med
Size of the treatment effect on cognition of cholinesterase inhibition in Alzheimer's disease. J Neurol Neurosurg Psychiatry
A review of hot deck imputation for survey non-response. Int Stat Rev
Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med
Statistical analysis with missing data
Analysis of incomplete multivariate data
Multiple imputation: a primer. Stat Meth Med Res
Missing data in clinical studies
Multiple imputation in health-care databases: an overview and some applications. Stat Med
Using multiple imputation to incorporate cases with missing items in a mental health services study
Multiple imputation in practice: comparison of software packages for regression models with missing variables. Am Stat