In the era of big data, the growing availability of data has made combining different data sources to obtain more accurate estimates a popular topic. However, data integration is often hindered by heterogeneity in data forms across studies. In this paper, we focus on a setting in survival analysis where the primary study has a continuous time-to-event outcome and complete covariate measurements, while the external study has an outcome observed only at regular intervals and measures only a subset of the covariates. To incorporate the external information while accounting for the different data forms, we posit working models and obtain informative weights by empirical likelihood, which are then used to construct a weighted estimator in the main analysis. We establish theory showing that the new estimator is more efficient than conventional ones and that this advantage is robust to misspecification of the working models, as confirmed in our simulation studies. To assess its utility, we apply our method, using data from the National Alzheimer's Coordinating Center to improve the analysis of the Alzheimer's Disease Neuroimaging Initiative Phase 1 study.
Keywords: data integration; dementia and Alzheimer’s disease; empirical likelihood; interval censoring; survival analysis.
© The Author(s) 2025. Published by Oxford University Press on behalf of The International Biometric Society.