We describe a hierarchical regression modeling approach to selecting a subset of markers from the first stage of a genomewide association scan to carry forward to subsequent stages for testing on an independent set of subjects. Rather than simply selecting the most significant marker-disease associations at some cutoff chosen to maximize the cost efficiency of a multistage design, we propose a prior model for the true noncentrality parameters of these associations, composed of a large mass at zero and a continuous distribution of nonzero values. The prior probability of a nonzero value and its prior mean can be functions of various covariates characterizing each marker, such as its location relative to genes or evolutionarily conserved regions, or prior linkage or association data. We propose to carry forward the markers with the top-ranked posterior expectations of the noncentrality parameters for confirmation in later stages of a genomewide scan. We compare the statistical performance of this approach with traditional p-value ranking in simulation studies. We show that ranking by posterior expectations performs better at selecting true positive associations than a simple ranking of p-values if at least some of the prior covariates have predictive value.
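To make the ranking idea concrete, the following is a minimal sketch of the posterior expectation under one simple instance of such a prior. It is illustrative only and rests on assumptions that are not taken from the text above: each marker's test statistic `z` is taken to be normal with unit variance around its noncentrality parameter, the continuous part of the prior is taken to be normal with mean `mu` and variance `tau2`, and the prior probability `pi` and prior mean `mu` are treated as already known for each marker rather than estimated from covariates. The function names are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior_expectation(z, pi, mu, tau2):
    """Posterior mean of the noncentrality parameter given statistic z.

    Assumed model (an illustration, not necessarily the paper's exact form):
      z | lam ~ N(lam, 1)
      lam = 0            with prior probability 1 - pi
      lam ~ N(mu, tau2)  with prior probability pi
    """
    # Marginal likelihood of z under each mixture component.
    null_part = (1.0 - pi) * normal_pdf(z, 0.0, 1.0)
    alt_part = pi * normal_pdf(z, mu, 1.0 + tau2)
    # Posterior probability that the noncentrality parameter is nonzero.
    w = alt_part / (alt_part + null_part)
    # Posterior mean of lam given that it is nonzero (normal-normal update).
    cond_mean = (tau2 * z + mu) / (1.0 + tau2)
    return w * cond_mean


def rank_markers(z_scores, pis, mus, tau2):
    """Return marker indices ordered by decreasing posterior expectation.

    This sketch ranks by the signed value, so it assumes associations of
    interest lie in the positive direction.
    """
    scores = [posterior_expectation(z, p, m, tau2)
              for z, p, m in zip(z_scores, pis, mus)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Two hypothetical markers: the first has the larger statistic but a very
# small prior probability of association; the second has a slightly smaller
# statistic but a much larger prior probability.
order = rank_markers([3.2, 3.0], pis=[0.001, 0.2], mus=[0.0, 0.0], tau2=1.0)
print(order)
```

In this toy example the second marker is ranked first even though its test statistic is smaller, which is the sense in which informative prior covariates can reorder markers relative to a pure p-value ranking.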
© 2007 Wiley-Liss, Inc.