We use information derived from over 40K trials in the Cochrane Collaboration database of systematic reviews (CDSR) to compute the replication probability, or predictive power, of an experiment given its observed (two-sided) p-value. We find that an exact replication of a marginally significant result with p = 0.05 has less than a 30% chance of again reaching significance. Moreover, the replication of a result with p = 0.005 still has only a 50% chance of significance. We also compute the probability that the direction (sign) of the estimated effect is correct, which is closely related to the type S error of Gelman and Tuerlinckx. We find that if an estimated effect has p = 0.05, there is a 93% probability that its sign is correct. If p = 0.005, then that probability is 99%. Finally, we compute the required sample size for a replication study to achieve some specified power conditional on the p-value of the original study. We find that the replication of a result with p = 0.05 requires a sample size more than 16 times larger than the original study to achieve 80% power, while p = 0.005 requires a sample size at least 3.5 times larger. These findings confirm that failure to replicate the statistical significance of a trial does not necessarily indicate that the original result was a fluke.
Keywords: Cochrane Review; actual power; clinical trial; predictive power; type S error.
© 2022 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.