Improving false discovery rate estimation

Bioinformatics. 2004 Jul 22;20(11):1737-45. doi: 10.1093/bioinformatics/bth160. Epub 2004 Feb 26.

Abstract

Motivation: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR). However, rigorous control of the FDR at a preselected level is often impractical. Consequently, it has been suggested that the q-value be used as an estimate of the proportion of false discoveries among a set of significant findings. Such an interpretation may be unwarranted, because the q-value is based on an unstable estimator of the positive FDR (pFDR). Another method estimates the FDR by modeling p-values as arising from a beta-uniform mixture (BUM) distribution. Unfortunately, the BUM approach is reliable only in settings where the assumed model accurately represents the actual distribution of p-values.

Methods: A method called the spacings LOESS histogram (SPLOSH) is proposed for estimating the conditional FDR (cFDR), the expected proportion of false positives conditioned on having k 'significant' findings. SPLOSH is designed to be more stable than the q-value and applicable in a wider variety of settings than BUM.
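To make the estimated quantity concrete, the following is a minimal sketch of a generic plug-in estimate of the proportion of false positives among the k smallest p-values. It is not SPLOSH: the abstract does not describe the SPLOSH algorithm in detail, so no attempt is made to reproduce it here. The sketch assumes that null p-values are uniform on [0, 1] and that p-values above a tuning threshold are mostly null. The function name `plugin_fdr` and the parameter `lam` are hypothetical and do not come from the paper.

```python
def plugin_fdr(pvalues, lam=0.5):
    """Plug-in estimate of the false-positive proportion among the top k findings.

    Illustrative only, NOT the SPLOSH estimator. Assumes null p-values are
    uniform on [0, 1] and that p-values above `lam` come mostly from nulls.
    Returns a list whose entry k-1 is the estimate when the k smallest
    p-values are declared 'significant'.
    """
    m = len(pvalues)
    if m == 0:
        return []
    sorted_p = sorted(pvalues)
    # Estimated null proportion: share of p-values above lam, rescaled by the
    # width of the interval (lam, 1], and capped at 1.
    pi0 = min(1.0, sum(p > lam for p in sorted_p) / ((1.0 - lam) * m))
    # With k findings the threshold is the k-th smallest p-value p_(k), so the
    # expected number of false positives is about pi0 * m * p_(k).
    return [min(1.0, pi0 * m * p / k) for k, p in enumerate(sorted_p, start=1)]


# Example: two very small p-values among four tests.
estimates = plugin_fdr([0.8, 0.001, 0.6, 0.002])
```

An estimate of this kind depends on a single tuning threshold and on the raw ordered p-values, which is one reason such estimators can be unstable. The abstract presents SPLOSH as designed to be more stable than the q-value.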

Results: In a simulation study and data analysis example, SPLOSH exhibits the desired characteristics relative to the q-value and BUM.

Availability: Freely available S-PLUS code implementing the proposed procedure is linked from www.stjuderesearch.org/statistics/splosh.html.

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Benchmarking / methods
  • Computer Simulation
  • False Positive Reactions*
  • Gene Expression Profiling / methods*
  • Gene Expression Profiling / standards
  • Models, Genetic
  • Models, Statistical*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Oligonucleotide Array Sequence Analysis / standards
  • Quality Control
  • Reproducibility of Results
  • Sensitivity and Specificity