Missing data
From TrialTree Wiki
Missing data in RCTs can compromise the integrity of trial results. It can introduce bias, reduce statistical power, and threaten the overall validity of the conclusions if not appropriately handled.
Impact of Missing Data
Missing data affect a trial in three ways. Bias arises when missingness is related to treatment allocation or to the outcome, because the participants who remain no longer represent the groups as randomized. Statistical power falls because the effective sample size shrinks. Validity suffers when the chosen analysis relies on assumptions about the missingness that do not hold.
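The loss of power can be illustrated with a small calculation. The sketch below uses a normal approximation for a two-arm comparison of means; the function name, the standardized effect size of 0.4, and the arm sizes are illustrative assumptions, not values from any particular trial.

```python
from statistics import NormalDist

def two_arm_power(n_per_arm, effect_size, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means.

    effect_size is the true mean difference divided by the common SD.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    noncentrality = effect_size * (n_per_arm / 2) ** 0.5
    # Ignores the negligible chance of rejecting in the wrong direction.
    return z.cdf(noncentrality - z_crit)

power_planned = two_arm_power(100, 0.4)   # all outcomes observed
power_observed = two_arm_power(80, 0.4)   # 20% of outcomes missing per arm
print(f"planned: {power_planned:.2f}, with 20% missing: {power_observed:.2f}")
```

Under these assumptions, losing 20% of outcomes in each arm lowers power from roughly 0.81 to roughly 0.72, even if the missingness introduces no bias at all.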
Types of Missing Data
Understanding the mechanism behind missing data is critical to selecting the appropriate handling strategy. Data can be missing completely at random (MCAR), where the probability of missingness is unrelated to any observed or unobserved data, such as when records are lost due to a technical error. In this case, an analysis of the complete cases remains unbiased, but power is reduced.
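Data can also be missing at random (MAR), where the missingness is related to observed data but, once those observed data are taken into account, not to the missing values themselves. For instance, older participants may be more likely to miss follow-up visits, yet among participants of the same age, those who miss visits have outcomes similar to those who attend. MAR can often be addressed using statistical techniques like multiple imputation, provided the variables that predict missingness are included in the analysis.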
The most challenging scenario is when data are missing not at random (MNAR), meaning the missingness depends on the unobserved values. For example, participants who experience worsening symptoms may drop out. Because the observed data alone cannot distinguish MNAR from MAR, MNAR poses a high risk of bias and usually requires sensitivity analyses to assess its potential impact.
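The three mechanisms can be compared in a small simulation. The sketch below is illustrative only: the covariate called age, the missingness probabilities, and the function name are assumptions chosen to make the contrast visible. It compares the mean of the observed outcomes with the mean of all outcomes, without any adjustment.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=0):
    """Return (mean of all outcomes, mean of observed outcomes) for one mechanism."""
    rng = random.Random(seed)
    all_outcomes, observed = [], []
    for _ in range(n):
        age = rng.gauss(0, 1)             # standardized covariate, always observed
        y = 0.8 * age + rng.gauss(0, 1)   # outcome, correlated with age
        if mechanism == "MCAR":
            p_missing = 0.3                       # unrelated to any data
        elif mechanism == "MAR":
            p_missing = 0.6 if age > 0 else 0.1   # depends on observed age only
        elif mechanism == "MNAR":
            p_missing = 0.6 if y > 0 else 0.1     # depends on the outcome itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_outcomes.append(y)
        if rng.random() >= p_missing:
            observed.append(y)
    return statistics.fmean(all_outcomes), statistics.fmean(observed)

for mechanism in ("MCAR", "MAR", "MNAR"):
    full_mean, cc_mean = simulate(mechanism)
    print(f"{mechanism}: all data {full_mean:+.3f}, complete cases {cc_mean:+.3f}")
```

In this setup the complete-case mean stays close to the full-data mean only under MCAR. Under MAR the unadjusted complete-case mean is also shifted, because age drives both the outcome and the missingness; the difference from MNAR is that this shift can be corrected using the observed ages.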
Strategies to Prevent Missing Data
Designing trials to minimize participant burden, such as using shorter forms and fewer visits, can reduce missing data. Electronic data capture systems and automated reminders can also help ensure completeness. Retention strategies like regular communication, flexible scheduling, and remote follow-up further support complete data collection.
It is important to define how missing data will be handled in the Statistical Analysis Plan (SAP) before the trial begins. Using standardized case report forms (CRFs) and piloting procedures can enhance data completeness and consistency.
Methods for Handling Missing Data
Several approaches exist to handle missing data. Complete Case Analysis (CCA) involves analyzing only those participants with full data. It is straightforward but may lead to bias unless data are MCAR.
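The Last Observation Carried Forward (LOCF) method fills in missing values with a participant's last recorded observation. Although it retains sample size, it assumes that the outcome stays unchanged after the last observed visit, which is rarely plausible and can bias estimates in either direction. It also treats the carried-forward values as if they had been measured, which understates uncertainty.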
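As a minimal sketch of the mechanics, LOCF on one participant's visit series can be written as follows. The function name and the example values are hypothetical, and missing visits are represented by `None`.

```python
def locf(values):
    """Fill None entries with the most recent observed value.

    Gaps before the first observation stay None, because there is
    nothing to carry forward.
    """
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

visits = [8.1, 7.6, None, None]   # hypothetical series, last two visits missed
print(locf(visits))               # [8.1, 7.6, 7.6, 7.6]
```

The filled series is flat after the second visit, which makes the "no change over time" assumption explicit.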
Mean or median imputation is easy to implement but reduces variability and underestimates standard errors. Multiple Imputation (MI), on the other hand, predicts missing values based on observed data, creates several completed datasets, analyzes each one, and combines the results (typically using Rubin's rules). It preserves variability, reflects the uncertainty about the imputed values, and is appropriate under MAR.
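Two points from this paragraph can be sketched in code: how mean imputation shrinks variability, and how estimates from several imputed datasets are combined with Rubin's rules. The function names and numbers are illustrative assumptions, and the imputation step itself is not shown; the pooling function only takes per-dataset estimates and their variances as input.

```python
import math
import statistics

def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]

def pool_rubin(estimates, variances):
    """Combine estimates from m imputed datasets using Rubin's rules.

    Returns the pooled estimate and its standard error.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average sampling variance
    between = statistics.variance(estimates)    # variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Mean imputation: the spread of the filled data is smaller than the observed spread.
raw = [4.0, 6.0, None, None]
sd_observed = statistics.stdev([v for v in raw if v is not None])
sd_imputed = statistics.stdev(mean_impute(raw))

# Rubin's rules on five hypothetical treatment-effect estimates.
pooled, se = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(f"SD observed {sd_observed:.2f}, SD after mean imputation {sd_imputed:.2f}")
print(f"pooled estimate {pooled:.2f}, pooled SE {se:.3f}")
```

The pooled standard error is larger than the square root of the average within-imputation variance (0.2 here), because the between-imputation term adds the uncertainty due to the missing values.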
Inverse Probability Weighting (IPW) adjusts for missing data by weighting complete cases according to their likelihood of being observed. This method also addresses MAR but relies on a well-specified model of missingness.
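A minimal sketch of IPW follows, assuming the probability of being observed depends only on a single observed category (here a hypothetical age group). In practice the missingness model is usually a regression on several covariates; the stratum-based estimate below is a simplification chosen to keep the example self-contained, and all names and numbers are illustrative.

```python
import random
import statistics

def simulate_records(n=20000, seed=1):
    """Simulated (stratum, outcome-or-None) records plus the full outcomes."""
    rng = random.Random(seed)
    records, full = [], []
    for _ in range(n):
        older = rng.random() < 0.5
        y = rng.gauss(2.0 if older else 0.0, 1.0)
        p_missing = 0.6 if older else 0.1   # MAR: depends on age group only
        full.append(y)
        observed_y = None if rng.random() < p_missing else y
        records.append(("older" if older else "younger", observed_y))
    return records, full

def ipw_mean(records):
    """Inverse-probability-weighted mean of the observed outcomes."""
    totals, seen = {}, {}
    for stratum, y in records:
        totals[stratum] = totals.get(stratum, 0) + 1
        if y is not None:
            seen[stratum] = seen.get(stratum, 0) + 1
    # Estimated probability of being observed within each stratum.
    p_observed = {s: seen.get(s, 0) / totals[s] for s in totals}
    weighted_sum = weight_total = 0.0
    for stratum, y in records:
        if y is not None:
            weight = 1.0 / p_observed[stratum]
            weighted_sum += weight * y
            weight_total += weight
    if weight_total == 0:
        raise ValueError("no observed outcomes to weight")
    return weighted_sum / weight_total

records, full = simulate_records()
full_mean = statistics.fmean(full)
cc_mean = statistics.fmean(y for _, y in records if y is not None)
weighted_mean = ipw_mean(records)
print(f"all data {full_mean:.3f}, complete cases {cc_mean:.3f}, IPW {weighted_mean:.3f}")
```

In this simulation the complete-case mean is pulled toward the younger group, which is over-represented among the observed, while the weighted mean recovers the full-data mean. The correction works only because the missingness model (age group) matches how the data were generated.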
Mixed-effects models can analyze incomplete data by accounting for within-subject correlation and are effective under MAR. For MNAR, sensitivity analyses are required. These include worst-case scenario assumptions or pattern mixture models to evaluate how different assumptions about missing data affect conclusions.
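One simple form of sensitivity analysis in the pattern mixture family is a delta adjustment: missing outcomes in the treatment arm are assumed to differ from the observed treatment-arm mean by a shift `delta`, and the analysis is repeated over a range of shifts to find where the conclusion changes. The sketch below computes point estimates only and assumes higher values are better; the function name and data are hypothetical, and a full analysis would also propagate uncertainty, for example through multiple imputation.

```python
import statistics

def delta_adjusted_effect(treat_obs, n_treat_missing, control_obs, delta):
    """Difference in means (treatment minus control) under a delta shift.

    Missing treatment-arm outcomes are set to the observed treatment mean
    plus delta. Missing control outcomes are assumed to equal the observed
    control mean, so they leave the control mean unchanged.
    """
    treat_mean = statistics.fmean(treat_obs)
    n_treat = len(treat_obs) + n_treat_missing
    shifted_mean = treat_mean + delta * n_treat_missing / n_treat
    return shifted_mean - statistics.fmean(control_obs)

treat_obs = [5.0, 6.0, 7.0, 6.0]     # one further treatment participant missing
control_obs = [4.0, 5.0, 6.0, 5.0]

for delta in (0, -1, -2, -3, -4, -5):
    effect = delta_adjusted_effect(treat_obs, 1, control_obs, delta)
    print(f"delta {delta:+d}: estimated effect {effect:.2f}")
```

With these numbers the estimated effect falls from 1.00 at `delta = 0` to 0.00 at `delta = -5`. Whether a shift that large is clinically plausible is the judgment the sensitivity analysis is meant to inform.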
Regulatory Guidance
Regulatory agencies emphasize proper planning and handling of missing data. The ICH E9 guideline and its E9(R1) addendum on estimands and sensitivity analysis advocate for pre-specified strategies. The FDA and EMA recommend including missing data methods in the SAP, and the CONSORT statement calls for transparent reporting of how much data were missing and how this was addressed.
Example Applications
In diabetes trials, multiple imputation is frequently used for missing outcomes like HbA1c. COVID-19 vaccine trials have relied on sensitivity analyses to account for dropout. In psychiatric studies, mixed-effects models have helped manage incomplete follow-up data.
Conclusion
Effective prevention and handling of missing data are essential for preserving the validity and credibility of RCT findings. A clear understanding of the underlying missing data mechanism, coupled with proactive trial design and appropriate analytical strategies, ensures that the results remain robust and interpretable.
Bibliography
- Little RJA, Rubin DB. Statistical Analysis with Missing Data. 3rd ed. Wiley; 2019. A foundational text covering theoretical and applied aspects of handling missing data.
- National Research Council. The Prevention and Treatment of Missing Data in Clinical Trials. Washington, DC: National Academies Press; 2010. Available from: https://www.ncbi.nlm.nih.gov/books/NBK209904/
- White IR, Horton NJ, Carpenter J, Pocock SJ. Strategy for intention to treat analysis in randomised trials with missing outcome data. BMJ. 2011;342:d40.
- Jakobsen JC, Gluud C, Wetterslev J, Winkel P. When and how should multiple imputation be used for handling missing data in randomised clinical trials – a practical guide with flowcharts. BMC Medical Research Methodology. 2017;17:162.
- Sterne JAC, White IR, Carlin JB, et al. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338:b2393.
Adapted for educational use. Please cite relevant trial methodology sources when using this material in research or teaching.