Methods for Missing and Auxiliary Data in Clinical Trials (Grant Cycle 1)

Led by: Marie Davidian, PhD

Randomized clinical trials are the primary mechanism by which new cancer therapies are tested for efficacy and evaluated for regulatory approval. Three developments present both challenges and opportunities for the interpretation of these studies: the advent of novel biomarkers and emerging genomic technologies that may yield important new baseline predictors of primary clinical outcomes; the increasing emphasis on analyses of longitudinal progression of markers such as measures of quality of life; and the routine complications of missing information and subject drop-out. This project focuses on developing new methods that exploit prognostic auxiliary information and provide frameworks for analyses in the presence of missing data, with the goal of notably increasing the strength and impact of the inferences possible from current cancer clinical trials.

This will be achieved through four aims:

Aim 1. Improving efficiency of inferences using auxiliary covariates.

It is well known that exploiting prognostic baseline auxiliary information can improve the efficiency of primary analyses of clinical trials; however, such analyses are controversial because of the temptation to choose the analysis that leads to the most dramatic treatment effect. New methods for such "covariate adjustment" that can circumvent this issue and improve on existing approaches are being studied and refined. Regression modeling of the relationship between outcome and auxiliary covariates is the foundation of such adjustment, and a major focus is to establish the most appropriate approaches for developing these models and for accounting for the uncertainty inherent in model selection. Another objective is to provide methods that can be used when key auxiliary information is missing for some subjects.

Aim 2. Improving efficiency of inferences and longitudinal analyses in the presence of drop-out.

The methods for covariate adjustment to improve efficiency are being extended to the situation where some subjects drop out of a trial before the clinical outcome of interest is ascertained. Efficient methods will also be developed for longitudinal analysis of measures such as quality of life and biomarkers in the presence of drop-out; these methods are intended to provide protection against incorrect statistical modeling assumptions made in such analyses.
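As a point of reference for how drop-out can distort an analysis, the following minimal sketch contrasts a complete-case mean with a simple inverse-probability-weighted (IPW) mean. It is a standard textbook device, not the efficient or robust methods this aim develops. It assumes that drop-out depends only on a single categorical baseline stratum, and the record layout and function names are hypothetical.

```python
from collections import defaultdict

def complete_case_mean(records):
    """Mean outcome among subjects whose outcome was observed."""
    observed = [y for _, y in records if y is not None]
    return sum(observed) / len(observed)

def ipw_mean(records):
    """Inverse-probability-weighted mean.

    records: list of (stratum, outcome) pairs, with outcome None for
    subjects who dropped out. The probability of being observed is
    estimated within each stratum as n_observed / n_total; a stratum
    with no observed outcomes contributes nothing, which would bias
    the estimate.
    """
    counts = defaultdict(lambda: [0, 0])   # stratum -> [n_total, n_observed]
    for stratum, y in records:
        counts[stratum][0] += 1
        if y is not None:
            counts[stratum][1] += 1
    total = 0.0
    for stratum, y in records:
        if y is not None:
            n_total, n_observed = counts[stratum]
            total += y / (n_observed / n_total)
    return total / len(records)

# Hypothetical data: stratum "B" has higher outcomes and more drop-out.
records = (
    [("A", 10.0)] * 4
    + [("B", 20.0)] * 2
    + [("B", None)] * 2
)

if __name__ == "__main__":
    print("complete-case mean:", complete_case_mean(records))
    print("IPW mean:          ", ipw_mean(records))
```

In this toy data set the complete-case mean is about 13.3, while the weighted mean is 15.0, the value that would have been obtained had the two missing stratum "B" outcomes resembled the observed ones.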

Aim 3. Diagnostic measures for joint models for longitudinal and survival data in the presence of non-ignorably missing data.

Cancer trials may involve studies of the association between longitudinal markers and clinical outcomes such as relapse-free survival or death, and a popular framework for analysis is that of joint models for the longitudinal data and the time-to-event outcome. Methods are being developed and studied for assessing the correctness of the joint models used for this purpose and for assessing the influence of particular observations on the fit of the model, in settings where some of the data used to develop the model may be missing.

Aim 4. Inference methods for sensitivity analyses of missing data.

Taking appropriate account of missing data sometimes requires unverifiable assumptions about why the data are missing. These assumptions are incorporated in models that therefore cannot be checked against the data, so analyses may be predicated on incorrect models, leading to misleading inferences. A popular strategy in practice is to undertake a sensitivity analysis, in which one inspects how inferences vary across multiple competing postulated models. However, it is not clear how to synthesize the results across models formally. Rigorous inferential methods for this purpose are being developed; they explore a range of plausible models simultaneously in order to formalize the evaluation of the sensitivity of inferences.
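The informal practice described above, inspecting how an estimate changes across postulated missing-data models, can be sketched as follows. This is only an illustration of a simple "delta-adjustment" device, not the formal inferential methods under development: each postulated model assumes that the mean of the unobserved outcomes differs from the observed mean by a fixed shift delta. The function names and the grid of deltas are hypothetical.

```python
def delta_adjusted_mean(observed, n_missing, delta):
    """Overall mean if missing outcomes average (observed mean + delta).

    delta = 0 corresponds to assuming the missing outcomes resemble
    the observed ones; nonzero delta encodes an unverifiable departure.
    """
    n = len(observed) + n_missing
    mean_obs = sum(observed) / len(observed)
    return mean_obs + (n_missing / n) * delta

def sensitivity_range(observed, n_missing, deltas):
    """Estimate under each postulated delta, plus the spanned range."""
    estimates = {d: delta_adjusted_mean(observed, n_missing, d) for d in deltas}
    return estimates, (min(estimates.values()), max(estimates.values()))

if __name__ == "__main__":
    # Hypothetical data: three observed outcomes and one drop-out.
    estimates, (low, high) = sensitivity_range([1.0, 2.0, 3.0], 1, [-2.0, 0.0, 2.0])
    for delta, est in estimates.items():
        print(f"delta={delta:+.1f}  estimate={est:.2f}")
    print("range:", low, "to", high)
```

With these toy inputs the estimate moves from 1.5 to 2.5 as delta goes from -2 to 2. The sketch reports only that spread; how to draw a single formal inference from such a family of models is the open question this aim addresses.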