Biomarker evaluation under imperfect nested case-control design

Xuan Wang; Yingye Zheng; Majken Karoline Jensen; Zeling He; Tianxi Cai

doi:10.1002/sim.9012

Stat Med. 2021 Aug 15;40(18):4035-4052. doi: 10.1002/sim.9012. Epub 2021 Apr 29.

Authors

Xuan Wang¹, Yingye Zheng², Majken Karoline Jensen³, Zeling He¹, Tianxi Cai¹,⁴

Affiliations

¹ Department of Biostatistics, Harvard University, Boston, Massachusetts, USA.
² Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.
³ Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
⁴ Department of Biomedical Informatics, Harvard University, Boston, Massachusetts, USA.

Abstract

The nested case-control (NCC) design has been widely adopted as a cost-effective sampling design for biomarker research. Under the NCC design, markers are measured only for the NCC subcohort, which consists of all cases and a fraction of the controls selected randomly from the matched risk sets of the cases. Robust methods for evaluating the prediction performance of risk models have been derived under the inverse probability weighting framework. The probabilities of samples being included in the NCC cohort can be calculated from the study design or estimated non-parametrically, as in previous studies. In practical settings where the sampling does not entirely follow the study design or depends on many factors, neither strategy works well: the design-based calculation suffers from model mis-specification, and the nonparametric estimate suffers from the curse of dimensionality. In this paper, we propose an alternative strategy that estimates the sampling probabilities with a varying coefficient model, which strikes a balance between robustness and the curse of dimensionality. The complex correlation structure induced by repeated finite risk set sampling causes standard resampling procedures for variance estimation to fail. We propose a perturbation resampling procedure that provides valid interval estimation for the proposed estimators. Simulation studies show that the proposed method performs well in finite samples. We apply the proposed method to the Nurses' Health Study II to develop and evaluate prediction models that use clinical biomarkers for cardiovascular risk.

Keywords: finite population sampling; inverse probability weighting; nonparametric smoothing; resampling; risk prediction.
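
To make the inverse probability weighting idea in the abstract concrete, the following is a minimal sketch and not the authors' estimator. It assumes a textbook NCC design in which each case draws m controls at random, without replacement, from the other subjects still at risk at its event time, with no additional matching. Under that assumption a control's design-based inclusion probability is one minus the product, over all case times at which the control was at risk, of (1 - m / (n_i - 1)), where n_i is the size of the risk set. The function names `design_inclusion_prob` and `ipw_mean` are hypothetical, and the varying coefficient model and perturbation resampling proposed in the paper are not implemented here.

```python
import numpy as np


def design_inclusion_prob(time, event, m):
    """Design-based NCC inclusion probabilities (illustrative assumption).

    time  : follow-up time for each cohort member
    event : 1 for a case, 0 for a control
    m     : number of controls drawn per case
    Cases are always included, so their probability is 1.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    case_times = time[event == 1]
    # Size of the risk set at each case time (subjects with time >= t_i).
    at_risk = np.array([(time >= t).sum() for t in case_times])
    # Chance that a given at-risk subject is drawn as a control for case i.
    frac = np.minimum(1.0, m / np.maximum(at_risk - 1, 1))
    p = np.ones(len(time))
    for j in np.flatnonzero(event == 0):
        # Control j is eligible at every case time up to its own follow-up.
        eligible = case_times <= time[j]
        p[j] = 1.0 - np.prod(1.0 - frac[eligible])
    return p


def ipw_mean(values, sampled, p):
    """Weighted mean over sampled subjects with weights 1 / p.

    values may be NaN for unsampled subjects, whose markers are unmeasured.
    """
    sampled = np.asarray(sampled, dtype=float)
    p = np.asarray(p, dtype=float)
    # Weight is 1/p for sampled subjects and 0 otherwise.
    w = np.divide(sampled, p, out=np.zeros_like(sampled), where=sampled > 0)
    v = np.where(sampled > 0, np.asarray(values, dtype=float), 0.0)
    return float(np.sum(w * v) / np.sum(w))


# Toy cohort: cases at times 1 and 3, controls at times 2 and 4, m = 1.
p = design_inclusion_prob([1, 2, 3, 4], [1, 0, 1, 0], m=1)
print(p)
```

In this toy cohort the control followed to time 2 is eligible only at the first case time, where it is one of three candidates, so its inclusion probability is 1/3. The control followed to time 4 is the only candidate at the second case time, so its probability is 1. When the actual sampling departs from the design, as the abstract discusses, probabilities computed this way would be mis-specified.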

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Biomarkers
Case-Control Studies*
Cohort Studies
Epidemiologic Studies
Humans
Probability

Substances

Biomarkers
