How large should the next study be? Predictive power and sample size requirements for replication studies

Stat Med. 2022 Jul 20;41(16):3090-3101. doi: 10.1002/sim.9406. Epub 2022 Apr 8.

Abstract

We use information derived from over 40K trials in the Cochrane Collaboration database of systematic reviews (CDSR) to compute the replication probability, or predictive power, of an experiment given its observed (two-sided) P-value. We find that an exact replication of a marginally significant result with P = .05 has less than a 30% chance of again reaching significance. Moreover, the replication of a result with P = .005 still has only a 50% chance of significance. We also compute the probability that the direction (sign) of the estimated effect is correct, which is closely related to the type S error of Gelman and Tuerlinckx. We find that if an estimated effect has P = .05, there is a 93% probability that its sign is correct. If P = .005, then that probability is 99%. Finally, we compute the required sample size for a replication study to achieve some specified power conditional on the P-value of the original study. We find that the replication of a result with P = .05 requires a sample size more than 16 times larger than the original study to achieve 80% power, while P = .005 requires a sample size at least 3.5 times larger. These findings confirm that failure to replicate the statistical significance of a trial does not necessarily indicate that the original result was a fluke.
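
To make the quantities in the abstract concrete, the following is a minimal sketch of how predictive power and the probability of a correct sign can be computed from a P-value. It is not the authors' method: the abstract describes a prior estimated from the CDSR, whereas this sketch assumes a single zero-mean normal prior on the signal-to-noise ratio (SNR) with a hypothetical standard deviation `tau`, a normally distributed z-statistic, and a two-sided significance test at the 5% level. The function names and the `size_ratio` parameter (replication sample size divided by original sample size) are likewise illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()           # standard normal distribution
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # two-sided 5% significance threshold


def z_from_p(p_two_sided):
    """Absolute z-statistic corresponding to a two-sided P-value."""
    return STD_NORMAL.inv_cdf(1 - p_two_sided / 2)


def shrinkage(tau):
    """Shrinkage factor under the ASSUMED prior SNR ~ N(0, tau^2)."""
    return tau**2 / (1 + tau**2)


def replication_power(p, tau, size_ratio=1.0):
    """Probability that a replication is significant (either direction).

    Assumed model: the original z ~ N(SNR, 1), so SNR | z ~ N(s*z, s) with
    s = shrinkage(tau). A replication with size_ratio times the sample size
    has z_rep | SNR ~ N(sqrt(size_ratio) * SNR, 1), which gives the
    predictive distribution z_rep | z ~ N(mean, sd^2) used below.
    """
    z = z_from_p(p)
    s = shrinkage(tau)
    mean = sqrt(size_ratio) * s * z
    sd = sqrt(size_ratio * s + 1)
    upper = 1 - STD_NORMAL.cdf((Z_CRIT - mean) / sd)
    lower = STD_NORMAL.cdf((-Z_CRIT - mean) / sd)
    return upper + lower


def prob_sign_correct(p, tau):
    """Posterior probability that the true effect has the estimated sign."""
    return STD_NORMAL.cdf(sqrt(shrinkage(tau)) * z_from_p(p))
```

With a very large `tau` (almost no shrinkage), `replication_power(0.05, tau)` is close to 50%. Smaller values of `tau` shrink the estimate toward zero and lower the predictive power, which is the direction of the effect reported in the abstract, although the specific figures there come from the empirical CDSR-based prior and not from this simplified model.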

Keywords: Cochrane Review; actual power; clinical trial; predictive power; type S error.

MeSH terms

  • Humans
  • Probability
  • Research Design*
  • Sample Size*
  • Statistics as Topic
  • Systematic Reviews as Topic