Handling missing data issues in clinical trials for rheumatic diseases

https://doi.org/10.1016/j.cct.2010.09.001

Abstract

Missing data are ubiquitous in clinical trials for rheumatic diseases, and it is important to accommodate them using appropriate statistical techniques. We review some of the basic considerations for missing data and survey a range of statistical techniques for analysis of longitudinal clinical trial data with missingness. Using clinical trial data from patients with diffuse systemic sclerosis, we show that different approaches to handling missing data can lead to different conclusions on the efficacy of the treatment. We then suggest how such discrepancies might be addressed. In particular, we emphasize that the commonly used method in rheumatic clinical trials of carrying the last observation forward to impute missing values should not be the primary analysis. We review software for analyzing different types of missing data and discuss our freely available software library for analyzing the more difficult but more realistic situation when the probability of dropout or missing data may depend on the unobserved missing value.

Introduction

In the last few decades, statistical methodology for analyzing missing data has advanced considerably. Despite these advances, data from many clinical trials in rheumatic diseases continue to be analyzed using elementary methods for handling missingness. The simplest and most overused method of dealing with missing data is the intention-to-treat (ITT) analysis with the last observation carried forward (LOCF). Reliance upon LOCF is typically justified by its supposedly conservative statistical properties; however, this justification has recently been called into question (see Section 4.1). Because missing data problems are ubiquitous and can affect the conclusions of clinical trials, there is increasing interest in other disciplines, ranging from clinical epidemiology to political science, in addressing missing data issues in statistical inference; see, for example, Heitjan [1], King et al. [2], Gadbury et al. [3], Donders et al. [4] and Belin [5].
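As a concrete illustration of what LOCF does, the following minimal Python sketch fills each missing visit with the most recent observed value. The function name `locf` and the example scores are hypothetical and are not taken from any trial data.

```python
from typing import List, Optional


def locf(values: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward over missing visits (None).

    Visits before the first observation stay None, because there is
    nothing to carry forward yet.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical scores at four visits; the patient drops out after visit 2.
scores = [28.0, 24.0, None, None]
print(locf(scores))
```

After dropout the imputed trajectory is flat, so LOCF implicitly assumes that the patient's score would not have changed had follow-up continued.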

The main aim of this paper is to inform researchers in rheumatic diseases about a variety of modern statistical concepts and methods for handling missing data in clinical trials, including the assumptions required for each method to be valid and the pros and cons of each method. Throughout, we use data from patients with diffuse cutaneous systemic sclerosis (dcSSc) in a clinical trial to illustrate the concepts and demonstrate the analysis techniques. We also include a bibliography of recent expository papers on handling missing data from statistical journals.

The second aim is to describe software for analyzing different types of missing data. Data may be missing in ways that may or may not depend on observed and/or unobserved data. When the missingness depends on the unobserved data, it is harder to perform a valid analysis, and there is virtually no standard statistical program for analyzing such data.
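The distinction between these kinds of missingness can be made concrete with a small simulation. The sketch below is only illustrative: it assumes two visits, a first score `y1` that is always observed and a second score `y2` that may be missing, and it uses made-up means, variances and dropout probabilities. It compares the mean of the observed `y2` values when missingness is unrelated to the data (MCAR), depends on the observed `y1` (commonly called MAR), or depends on the unobserved `y2` itself (commonly called MNAR or nonignorable).

```python
import math
import random


def simulate(n=20000, seed=0):
    """Simulate hypothetical (y1, y2) score pairs; y2 has true mean 22."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = rng.gauss(25.0, 8.0)
        y2 = y1 - 3.0 + rng.gauss(0.0, 4.0)
        rows.append((y1, y2))
    return rows


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def observed_mean_y2(rows, miss_prob, seed=1):
    """Mean of y2 among rows where y2 was not deleted by miss_prob."""
    rng = random.Random(seed)
    kept = [y2 for (y1, y2) in rows if rng.random() >= miss_prob(y1, y2)]
    return sum(kept) / len(kept)


rows = simulate()
true_mean = sum(y2 for _, y2 in rows) / len(rows)

# MCAR: every y2 has the same chance of being missing.
mcar_mean = observed_mean_y2(rows, lambda y1, y2: 0.3)
# MAR: missingness depends only on the observed first-visit score.
mar_mean = observed_mean_y2(rows, lambda y1, y2: logistic((y1 - 25.0) / 4.0))
# MNAR: missingness depends on the value that goes unobserved.
mnar_mean = observed_mean_y2(rows, lambda y1, y2: logistic((y2 - 22.0) / 4.0))

print(f"true mean of y2:         {true_mean:.2f}")
print(f"observed mean under MCAR: {mcar_mean:.2f}")
print(f"observed mean under MAR:  {mar_mean:.2f}")
print(f"observed mean under MNAR: {mnar_mean:.2f}")
```

In this toy setting the observed-case mean is close to the truth only under MCAR. Under MAR the bias can in principle be corrected using the observed `y1`, whereas under MNAR the observed data alone do not identify the correction, which is why such data are harder to analyze.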

In the next section, we briefly describe the two-arm multi-center randomized clinical trial using patients with dcSSc. In Section 3 we review different types of missing data and how we may distinguish one from the other in practice. Section 4 presents common statistical methods for handling missing data, and in Section 5, we demonstrate many of the analyses for the dcSSc clinical trial data. We compare the results from the various analytical strategies and show how inconsistent results may be resolved. Conclusions are offered in Section 6.


The bovine collagen trial

Throughout the manuscript we refer to data from a recently concluded NIH-sponsored randomized oral bovine collagen trial for dcSSc patients [6] to illustrate the differences in results that can arise using different analytic approaches, although the actual trial had pre-specified analyses.

A total of 168 eligible patients with dcSSc were enrolled in this multi-center, phase II, double-blind, placebo-controlled trial. Patients were randomized to receive oral native collagen at a dose of 500 μg/day or

Types of missing data and missing data mechanisms

Each statistical method for analyzing missing data has its assumptions, weaknesses and strengths. It is important to use an appropriate method because different methods and the amount of missing values in the data sets may lead to different statistical inferences. For example, Barzi and Woodward [7] studied missing data for serum cholesterol in 28 cohort studies and reported that if fewer than 10% of values were missing, many of the commonly used methods resulted in similar conclusions. If

Methods for handling of missing data

In this section, we review different ways of handling missing values and methods of analysis. Many of these methods are widely used in the statistical literature and increasingly in other disciplines as well.

Analysis of the bovine collagen trial data

We now consider both determination of the missingness structure and the application of a range of analysis techniques to the bovine collagen data described in Section 2. The easiest way to determine if the assumptions of MCAR hold is to graph certain features of the data. For example, in the bovine collagen clinical trial, if data were MCAR, one would expect that the mean trajectory of MRSS of patients who remained in the study throughout the trial (completers) would be similar to patients who
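The graphical check described above, comparing completers with patients who dropped out, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the patient identifiers and scores are invented, `None` marks a missed visit, and the function name `mean_trajectories` is hypothetical. The per-visit means it returns are the quantities one would plot.

```python
from typing import Dict, List, Optional, Tuple

Trajectory = List[Optional[float]]


def mean_trajectories(
    data: Dict[str, Trajectory], n_visits: int
) -> Tuple[Trajectory, Trajectory]:
    """Return per-visit mean scores for completers and for dropouts.

    A completer has an observed score at every visit; anyone else is
    treated as a dropout. A visit with no observed scores yields None.
    """
    completers = [s for s in data.values() if all(v is not None for v in s)]
    dropouts = [s for s in data.values() if any(v is None for v in s)]

    def visit_means(group: List[Trajectory]) -> Trajectory:
        means: Trajectory = []
        for t in range(n_visits):
            observed = [s[t] for s in group if s[t] is not None]
            means.append(sum(observed) / len(observed) if observed else None)
        return means

    return visit_means(completers), visit_means(dropouts)


# Invented skin scores at four visits for four hypothetical patients.
scores = {
    "A": [30.0, 26.0, 22.0, 20.0],
    "B": [28.0, 25.0, 23.0, 21.0],
    "C": [32.0, 33.0, None, None],
    "D": [34.0, 35.0, 36.0, None],
}
completer_means, dropout_means = mean_trajectories(scores, n_visits=4)
print("completers:", completer_means)
print("dropouts:  ", dropout_means)
```

If the two mean trajectories differ markedly before dropout, as they do in this invented example, that is evidence against MCAR; similar trajectories are consistent with MCAR but do not prove it.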

Summary

In this article, we have described missing data issues, statistical terminology and data analytical strategies for analyzing a dcSSc clinical trial with missing values. It should be clear that missingness greatly complicates data analysis. The old adage to collect data assiduously is worth repeating here because it is the best and most powerful defense for dealing with missing data. At a minimum, this means that all patients need to be followed to end-point irrespective of whether they are on

Acknowledgements

All authors are partly supported by a Senior Investigator Grant Award from the Scleroderma Foundation in 2006 and 2007. We gratefully thank the Scleroderma Foundation for the award.

References (68)

  • A.R. Donders et al.

    Review: a gentle introduction to imputation of missing values

    J Clin Epidemiol

    (2006)
  • T.R. Belin

    Missing data: what a little can do, and what researchers can do in response

    Am J Ophthalmol

    (2009)
  • C.B. Begg

    Ruminations on the intent-to-treat principle

    Control Clin Trials

    (2000)
  • W.J. Boscardin et al.

    A reanalysis of a longitudinal Scleroderma clinical trial using informative dropout models

    J Stat Plann Infer

    (2007)
  • D.F. Heitjan

    What can be done about missing data? Approaches to imputation

    Am J Public Health

    (1997)
  • G. King et al.

    Analyzing incomplete political science data: an alternative algorithm for multiple imputation

    Am Political Sci Rev

    (2001)
  • G.L. Gadbury et al.

    Modern statistical methods for handling missing repeated measurements in obesity trial data: beyond LOCF

    Obes Rev

    (2003)
  • A.E. Postlethwaite et al.

    A multicenter, randomized, double-blind, placebo-controlled trial of oral type I collagen treatment in patients with diffuse cutaneous systemic sclerosis: I. Oral type I collagen does not improve skin in all patients, but may improve skin in late-phase disease

    Arthritis Rheum

    (2008)
  • F. Barzi et al.

    Imputations of missing values in practice: results from imputations of serum cholesterol in 28 cohort studies

    Am J Epidemiol

    (2004)
  • N.J. Horton et al.

    Much ado about nothing: a comparison of missing data methods and software to fit incomplete data regression models

    Am Stat

    (2007)
  • D.B. Rubin

    Inference and missing data

    Biometrika

    (1976)
  • D.F. Heitjan et al.

    Distinguishing “Missing at random” and “missing completely at random”

    Am Stat

    (1996)
  • R.F. Potthoff et al.

    Can one assess whether missing data are missing at random in medical studies?

    Stat Meth Med Res

    (2006)
  • P.J. Diggle

    Testing for random dropouts in repeated measurement data

    Biometrics

    (1989)
  • M. Ridout

    Testing for random drop-outs in repeated measurement data

    Biometrics

    (1991)
  • J. Carpenter et al.

    Last observation carry-forward and last observation analysis (Letter to the Editor)

    Stat Med

    (2004)
  • S. Hollis et al.

    What is meant by intention to treat analysis? Survey of published randomized controlled trials

    Br Med J

    (1999)
  • V.M. Montori et al.

    Intention-to-treat principle

    Can Med Assoc J

    (2001)
  • R.J.A. Little et al.

    Intent-to-treat analysis for longitudinal studies with drop-outs

    Biometrics

    (1996)
  • D. Gillings et al.

    The application of the principle of intention-to-treat to the analysis of clinical trials

    Drug Inf J

    (1991)
  • J.A. Lewis et al.

    Intention to treat: who should use it?

    Br J Cancer

    (1993)
  • G. Molenberghs et al.

    Analyzing incomplete longitudinal clinical trial data

    Biostatistics

    (2004)
  • B. Cheng et al.

    Last observation analysis in ANOVA and ANCOVA

    Stat Sin

    (2005)
  • J. Shao et al.

    Last observation carry-forward and last observation analysis

    Stat Med

    (2004)
  • K. Rockwood

    Size of the treatment effect on cognition of cholinesterase inhibition in Alzheimer's disease

    J Neurol Neurosurg Psychiatry

    (2004)
  • R.R. Andridge et al.

    A review of hot deck imputation for survey non-response

    Int Stat Rev

    (2010)
  • S. van Buuren et al.

    Multiple imputation of missing blood pressure covariates in survival analysis

    Stat Med

    (1999)
  • R.J.A. Little et al.

    Statistical analysis with missing data

    (1987)
  • J.L. Schafer

    Analysis of incomplete multivariate data

    (1997)
  • J.L. Schafer

    Multiple imputation: a primer

    Stat Meth Med Res

    (1999)
  • G. Molenberghs et al.

    Missing data in clinical studies

    (2007)
  • D.B. Rubin et al.

    Multiple imputation in health-care databases: an overview and some applications

    Stat Med

    (1991)
  • T.R. Belin

    Using multiple imputation to incorporate cases with missing items in a mental health services study

  • N.J. Horton et al.

    Multiple imputation in practice: comparison of software packages for regression models with missing variables

    Am Stat

    (2001)