Presented at the 2019 ASA Symposium on Data Science and Statistics in Bellevue, Washington on May 31, 2019.
Background: Defining baseline characteristics for covariate-adjusted analyses to increase study power is not new. However, multifactorial, heterogeneous diseases, including Amyotrophic Lateral Sclerosis (ALS), Alzheimer’s Disease (AD), Parkinson’s Disease (PD), and Huntington’s Disease (HD), make it difficult to define baseline covariates that substantially improve study power. We developed a methodology for training machine-learning (ML) models on historical clinical trial patient data to produce a single prediction value that serves as a covariate in a trial’s statistical analysis. We have adapted this methodology across disease areas. We have also developed a rigorous audit methodology, based on best practices in biostatistics, so that these new methods can be shared more easily in a field where rigorous vetting of new technologies is critical to adoption.
Objectives: To demonstrate through clinical trial simulation:
- A rigorous methodology for preparing analysis datasets for ML modeling
- A practical application of ML models to traditional biostatistical analysis
- A scalable approach applicable to multiple heterogeneous disease areas that lack a suitable covariate
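
To illustrate why a single prognostic prediction used as a covariate can raise study power, the following is a minimal simulation sketch in Python. It is not the authors' pipeline. It assumes a two-arm trial with 1:1 randomization, a continuous outcome with unit within-arm variance, and a hypothetical `score` variable standing in for the ML model's prediction, with within-arm correlation `prognostic_r` to the outcome. The function names, effect size, sample size, and correlation are all illustrative assumptions.

```python
import numpy as np


def ols_t_stat(X, y, coef_index):
    """Return the t statistic for one coefficient of an ordinary least squares fit."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # covariance of the coefficients
    return beta[coef_index] / np.sqrt(cov[coef_index, coef_index])


def simulate_power(n_trials=2000, n_patients=200, effect=0.3,
                   prognostic_r=0.7, seed=0):
    """Estimate power of unadjusted vs. covariate-adjusted analyses by simulation.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    crit = 1.96  # normal approximation to the two-sided 5% critical value
    hits_unadj = 0
    hits_adj = 0
    ones = np.ones(n_patients)
    for _ in range(n_trials):
        # 1:1 randomization to control (0) or treatment (1)
        treat = rng.permutation(np.repeat([0.0, 1.0], n_patients // 2))
        # Stand-in for the ML model's single prediction value per patient
        score = rng.standard_normal(n_patients)
        noise = rng.standard_normal(n_patients)
        outcome = (effect * treat
                   + prognostic_r * score
                   + np.sqrt(1.0 - prognostic_r ** 2) * noise)

        X_unadj = np.column_stack([ones, treat])
        X_adj = np.column_stack([ones, treat, score])
        hits_unadj += int(abs(ols_t_stat(X_unadj, outcome, 1)) > crit)
        hits_adj += int(abs(ols_t_stat(X_adj, outcome, 1)) > crit)
    return hits_unadj / n_trials, hits_adj / n_trials


if __name__ == "__main__":
    power_unadj, power_adj = simulate_power()
    print(f"unadjusted power: {power_unadj:.2f}")
    print(f"adjusted power:   {power_adj:.2f}")
```

Under these assumptions, the adjusted analysis should reject the null more often than the unadjusted one, because the score absorbs outcome variance that is unrelated to treatment and so shrinks the standard error of the treatment effect.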
Authors: Albert A. Taylor, Danielle Beaulieu, Dustin Pierce, Andrew Conklin, Jonavelle Cuerdo, Mike Keymer, David L. Ennist