Accounting for control mislabeling in case-control biomarker studies

Mattias Rantalainen; Chris C Holmes

doi:10.1021/pr200507b

J Proteome Res. 2011 Dec 2;10(12):5562-7. doi: 10.1021/pr200507b. Epub 2011 Nov 8.

Authors

Mattias Rantalainen¹, Chris C Holmes

Affiliation

¹ Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, United Kingdom.

Abstract

In biomarker discovery studies, uncertainty in case and control labels is often overlooked. If label uncertainty is ignored, model parameters and the predictive risk can become biased, sometimes severely. The most common situation is when the control set contains an unknown number of undiagnosed, or future, cases. This has a marked impact whenever the model needs to be well calibrated, e.g., when the prediction performance of a biomarker panel is evaluated. Failing to account for class label uncertainty may lead to underestimation of classification performance and to biased parameter estimates. It can also affect meta-analyses that combine evidence from multiple studies. Using a simulation study, we outline how conventional statistical models can be modified to address class label uncertainty, leading to well-calibrated prediction performance estimates and reduced bias in meta-analysis. We focus on the problem of mislabeled control subjects in case-control studies, i.e., when some of the control subjects are undiagnosed cases, although the procedures we report are generic. Uncertainty in control status is particularly common in biomarker discovery studies in genomic and molecular epidemiology, where control subjects are often sampled from the general population, for which the expected disease incidence rate is well established.
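The underestimation of classification performance described above can be illustrated with a minimal Python sketch. It assumes a single biomarker that is N(0, 1) in true controls and N(delta, 1) in all true cases, and that each recorded control is an undiagnosed case with a known probability pi (for instance, taken from the population incidence rate). Under these assumptions a diagnosed case and a hidden case are exchangeable, so the AUC computed against the recorded labels is AUC_obs = (1 - pi) * AUC_true + pi / 2, which can be inverted for AUC_true. This simple mixture correction and the function names (simulate, empirical_auc, corrected_auc) are illustrative assumptions, not the model-based procedure reported in the paper.

```python
import math
import random


def empirical_auc(pos, neg):
    """Mann-Whitney estimate of P(score_pos > score_neg); assumes no tied scores."""
    ranked = sorted([(s, 1) for s in pos] + [(s, 0) for s in neg])
    neg_below = 0  # number of negatives seen so far, i.e. with a lower score
    wins = 0
    for _, is_pos in ranked:
        if is_pos:
            wins += neg_below
        else:
            neg_below += 1
    return wins / (len(pos) * len(neg))


def corrected_auc(auc_observed, pi):
    """Invert AUC_obs = (1 - pi) * AUC_true + pi / 2, pi = fraction of controls that are cases."""
    return (auc_observed - 0.5 * pi) / (1.0 - pi)


def simulate(n_cases, n_controls, pi, delta, seed=0):
    """Hypothetical setup: biomarker ~ N(0, 1) in true controls, N(delta, 1) in all true cases."""
    rng = random.Random(seed)
    cases = [rng.gauss(delta, 1.0) for _ in range(n_cases)]
    controls, hidden_case = [], []
    for _ in range(n_controls):
        is_case = rng.random() < pi  # undiagnosed case recorded as a control
        controls.append(rng.gauss(delta if is_case else 0.0, 1.0))
        hidden_case.append(is_case)
    return cases, controls, hidden_case


if __name__ == "__main__":
    pi, delta = 0.2, 1.0
    cases, controls, hidden = simulate(2000, 2000, pi, delta, seed=1)
    true_controls = [x for x, h in zip(controls, hidden) if not h]

    auc_naive = empirical_auc(cases, controls)        # recorded labels taken at face value
    auc_oracle = empirical_auc(cases, true_controls)  # only computable in a simulation
    auc_fixed = corrected_auc(auc_naive, pi)
    auc_theory = 0.5 * (1.0 + math.erf(delta / 2.0))  # Phi(delta / sqrt(2))
    print(f"naive {auc_naive:.3f}  corrected {auc_fixed:.3f}  "
          f"oracle {auc_oracle:.3f}  theory {auc_theory:.3f}")
```

In a real study only the naive AUC and an external estimate of pi would be available; the oracle AUC can be computed here only because the simulation records each control's true status. The correction relies on hidden cases having the same biomarker distribution as diagnosed cases; if their signal is weaker (for example, earlier-stage disease), the corrected value will overstate the true AUC.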

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Bias*
Biomarkers / analysis
Biomarkers / chemistry*
Case-Control Studies*
Computer Simulation
Humans
Logistic Models
Meta-Analysis as Topic
ROC Curve
Reproducibility of Results
Risk Factors
Uncertainty

Substances

Biomarkers
