Family data are useful for estimating disease risk in carriers of specific genotypes of a given gene (penetrance). Penetrance is frequently estimated assuming that relatives' phenotypes are independent, given their genotypes for the gene of interest. This assumption is unrealistic when multiple shared risk factors contribute to disease risk. In this setting, the phenotypes of relatives are correlated even after adjustment for the genotypes of any one gene (residual correlation). Many methods have been proposed to address this problem, but their performance has not been evaluated systematically. In simulations, we generated genotypes for a rare allele (frequency 0.35%) of moderate penetrance and a common allele (frequency 15%) of low penetrance, and then generated correlated disease survival times using the Clayton-Oakes copula model. We ascertained families using both population and clinic designs. We then compared the estimates from several methods with the optimal estimates obtained from the model used to generate the data. We found that penetrance estimates for common low-risk genotypes were more robust to model misspecification than those for rare, moderate-risk genotypes. For the latter, penetrance estimates obtained by ignoring residual disease correlation had large biases. Also biased were estimates based only on families that segregate the risk allele. In contrast, a method for accommodating phenotype correlation by assuming the presence of genetic heterogeneity performed nearly optimally, even when the survival data were coded as binary outcomes. We conclude that penetrance estimates that accommodate residual phenotype correlation (even only approximately) outperform those that ignore it, and that coding censored survival outcomes as binary does not substantially increase the mean-square error of the estimates, provided the censoring is not extensive.
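
As an illustration of the data-generating step, the following minimal sketch draws correlated survival times for a pair of relatives under a Clayton-Oakes model, using the standard shared gamma frailty construction. It is not the simulation code used in the study. The function names, the exponential marginal hazard, and the values of `theta` and `rate` are assumptions chosen for the example, and the sketch omits genotypes, family structure, ascertainment, and censoring. Under this model, Kendall's tau for the pair equals theta / (theta + 2), which gives a simple check on the output.

```python
import math
import random


def clayton_oakes_pair(theta, rate, rng):
    """Draw one pair of survival times with Clayton-Oakes dependence.

    Assumptions (illustrative only): both relatives share the same
    exponential marginal survival function S(t) = exp(-rate * t), and
    theta > 0 is the copula association parameter.
    """
    # Shared frailty: Gamma with shape 1/theta and scale 1.
    w = rng.gammavariate(1.0 / theta, 1.0)
    times = []
    for _ in range(2):
        e = rng.expovariate(1.0)
        # Survival probability at the event time is (1 + e/w)^(-1/theta).
        # Inverting S(t) = exp(-rate * t) gives t = log(1 + e/w) / (theta * rate).
        times.append(math.log1p(e / w) / (theta * rate))
    return times[0], times[1]


def kendall_tau(pairs):
    """Kendall's tau computed directly from concordant and discordant pairs."""
    n = len(pairs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
            total += (s > 0) - (s < 0)
    return 2.0 * total / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(2010)  # arbitrary seed for reproducibility
    theta = 2.0                # hypothetical value; implies tau = 0.5
    sample = [clayton_oakes_pair(theta, 0.01, rng) for _ in range(1000)]
    print("empirical tau:", kendall_tau(sample))
    print("theoretical tau:", theta / (theta + 2.0))
```

Setting `theta` close to zero approaches independence between relatives, which corresponds to the assumption that the residual-correlation methods are meant to relax.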
© 2010 Wiley-Liss, Inc.