Two-stage analysis for selecting fixed numbers of features in omics association studies

Stat Med. 2019 Jul 20;38(16):2956-2971. doi: 10.1002/sim.8150. Epub 2019 Mar 31.

Abstract

One of the main roles of omics-based association studies with high-throughput technologies is to screen relevant molecular features, such as genetic variants, genes, and proteins, from a large pool of candidate features based on their associations with the phenotype of interest. Typically, screened features are subject to validation studies using more established or conventional assays, where the number of evaluable features is relatively limited, so that only a fixed number of features may be measurable by these assays. Such a limitation necessitates narrowing the feature set down to a fixed size after an initial screening analysis via multiple testing, in which adjustment for multiplicity is made. We propose a two-stage screening approach to control the false discovery rate (FDR) for the fixed-size feature set that is subject to validation studies, rather than for the feature set from the initial screening analysis. From the feature set selected in the first stage at a relaxed FDR level, the most statistically significant fraction of features is selected first. From the remaining feature set, features are selected based on biological consideration only, without regard to any statistical information, which allows the FDR level to be evaluated for the finally selected fixed-size feature set. Improvement of power under the proposed two-stage screening approach is also discussed. Simulation experiments based on parametric models and real microarray datasets demonstrated a substantial increase in the number of screened features available for biological consideration compared with the standard screening approach, allowing for more extensive and in-depth biological investigations in omics association studies.
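The selection steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which multiple-testing procedure is used in the first stage, so the Benjamini-Hochberg step-up rule is assumed here, and the function names, the parameters (relaxed_level, final_size, n_statistical), the biological_priority list, and the example p-values are all hypothetical. The sketch covers only the selection mechanics; how the FDR of the final fixed-size set is evaluated is not specified in the abstract and is not attempted.

```python
from typing import Dict, List, Sequence


def benjamini_hochberg(pvalues: Dict[str, float], level: float) -> List[str]:
    """Features rejected by the Benjamini-Hochberg step-up rule, most significant first.

    Assumption: BH stands in for the (unspecified) first-stage FDR procedure.
    """
    ranked = sorted(pvalues.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0  # largest rank i such that p_(i) <= i * level / m
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank * level / m:
            cutoff = rank
    return [name for name, _ in ranked[:cutoff]]


def two_stage_select(
    pvalues: Dict[str, float],
    relaxed_level: float,
    final_size: int,
    n_statistical: int,
    biological_priority: Sequence[str],
) -> List[str]:
    """Hypothetical sketch of a fixed-size, two-stage feature selection."""
    if not 0 <= n_statistical <= final_size:
        raise ValueError("n_statistical must lie between 0 and final_size")

    # Stage 1: screen at a relaxed FDR level.
    stage1 = benjamini_hochberg(pvalues, relaxed_level)
    if len(stage1) < final_size:
        raise ValueError("first stage selected fewer features than final_size")

    # Stage 2a: keep the most statistically significant features.
    top = stage1[:n_statistical]

    # Stage 2b: fill the remaining slots from the rest of the stage-1 set,
    # ordered only by the externally supplied biological priority
    # (p-values are deliberately not consulted here).
    remaining = set(stage1[n_statistical:])
    biological = [f for f in biological_priority if f in remaining]
    biological = biological[: final_size - n_statistical]
    if len(top) + len(biological) < final_size:
        raise ValueError("biological_priority does not cover enough features")
    return top + biological


# Made-up p-values and priority list, for illustration only.
example_pvalues = {
    "g1": 0.001, "g2": 0.004, "g3": 0.012,
    "g4": 0.03, "g5": 0.04, "g6": 0.6,
}
example_selected = two_stage_select(
    example_pvalues,
    relaxed_level=0.2,
    final_size=4,
    n_statistical=2,
    biological_priority=["g6", "g5", "g3", "g4"],
)
print(example_selected)
```

In this toy example, g6 ranks highest biologically but is not eligible because it does not pass the first-stage screen; the biological picks come only from the features that passed the screen and were not already taken on statistical grounds.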

Keywords: biological and statistical significance; false discovery rate; feature screening; omics association studies; two-stage methods.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Biometry / methods*
  • Computer Simulation
  • Data Interpretation, Statistical
  • Early Detection of Cancer
  • False Positive Reactions*
  • Genetic Testing
  • Humans
  • Microarray Analysis
  • Models, Genetic
  • Phenotype