The reporting of p values, confidence intervals and statistical significance in Preventive Veterinary Medicine (1997-2017)

PeerJ. 2021 Nov 24;9:e12453. doi: 10.7717/peerj.12453. eCollection 2021.

Abstract

Background: Despite much discussion in the epidemiologic literature surrounding the use of null hypothesis significance testing (NHST) for inferences, the reporting practices of veterinary researchers have not been examined. We conducted a survey of articles published in Preventive Veterinary Medicine, a leading veterinary epidemiology journal, aimed at (a) estimating the frequency of reporting p values, confidence intervals and statistical significance between 1997 and 2017, (b) determining whether this varies by article section and (c) determining whether this varies over time.

Methods: We used systematic cluster sampling to select 985 original research articles from issues published in March, June, September and December of each year of the study period. Using the survey data analysis menu in Stata, we estimated overall and yearly proportions of article sections (abstracts, results-texts, results-tables and discussions) reporting p values, confidence intervals and statistical significance. Additionally, we estimated the proportion of p values less than 0.05 reported in each section, the proportion of article sections in which p values were reported as inequalities, and the proportion of article sections in which confidence intervals were interpreted as if they were significance tests. Finally, we used Generalised Estimating Equations to estimate prevalence odds ratios and 95% confidence intervals, comparing the occurrence of each of the above-mentioned reporting elements in one article section relative to another.
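
As a rough illustration of the quantities named above, the following Python sketch computes a proportion with a Wald 95% confidence interval and a prevalence odds ratio from a 2x2 table. The counts and function names are hypothetical. The sketch also ignores the cluster sampling design and the within-article correlation that the survey estimation in Stata and the Generalised Estimating Equations account for, so it is not a reproduction of the authors' analysis.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def proportion_ci(events, total):
    """Proportion with a simple Wald 95% CI (no survey design adjustment)."""
    p = events / total
    se = math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - Z95 * se), min(1.0, p + Z95 * se)


def prevalence_odds_ratio(a, b, c, d):
    """Odds ratio and Wald 95% CI from a 2x2 table.

    a, b: sections of type 1 with / without the reporting element
    c, d: sections of type 2 with / without the reporting element
    """
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - Z95 * se_log)
    hi = math.exp(math.log(or_) + Z95 * se_log)
    return or_, lo, hi


# Hypothetical counts, for illustration only (not data from the study).
p, p_lo, p_hi = proportion_ci(events=30, total=100)
por, por_lo, por_hi = prevalence_odds_ratio(a=40, b=60, c=20, d=80)
print(f"proportion = {p:.2f} (95% CI {p_lo:.2f}-{p_hi:.2f})")
print(f"POR = {por:.2f} (95% CI {por_lo:.2f}-{por_hi:.2f})")
```

The interval for the odds ratio is built on the log scale and then exponentiated, which is why such intervals are asymmetric around the point estimate.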

Results: Over the 20-year period, for every 100 published manuscripts, 31 abstracts (95% CI [28-35]), 65 results-texts (95% CI [61-68]), 23 sets of results-tables (95% CI [20-27]) and 59 discussion sections (95% CI [56-63]) reported statistical significance at least once. Only in the case of results-tables were the numbers reporting p values (48; 95% CI [44-51]) and confidence intervals (44; 95% CI [41-48]) higher than those reporting statistical significance. We also found that a substantial proportion of p values were reported as inequalities and most were less than 0.05. The odds of a p value being less than 0.05 (OR = 4.5; 95% CI [2.3-9.0]) or being reported as an inequality (OR = 3.2; 95% CI [1.3-7.6]) were higher in the abstracts than in the results-texts. Additionally, when confidence intervals were interpreted, on most occasions they were used as surrogates for significance tests. Overall, no time trends in reporting were observed for any of the three reporting elements over the study period.
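
Because odds ratio intervals of this kind are usually symmetric on the log scale, the reported limits can be checked against the point estimates. The sketch below uses only the odds ratios reported in the Results and assumes Wald-type intervals, which the abstract does not state; the helper name is hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def log_scale_summary(lower, upper):
    """Return the geometric midpoint of a CI and the implied SE of log(OR)."""
    midpoint = math.sqrt(lower * upper)
    se_log = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return midpoint, se_log


# Odds ratios comparing abstracts with results-texts, as reported in the Results.
reported = {
    "p value < 0.05": (4.5, 2.3, 9.0),
    "p value as inequality": (3.2, 1.3, 7.6),
}
for label, (or_, lo, hi) in reported.items():
    mid, se = log_scale_summary(lo, hi)
    print(f"{label}: reported OR = {or_}, "
          f"geometric midpoint = {mid:.2f}, SE(log OR) = {se:.2f}")
```

Both geometric midpoints fall close to the reported estimates, which is consistent with rounding of the published limits.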

Conclusions: Despite the availability of superior approaches to statistical inference and abundant criticism of NHST in the epidemiologic literature, NHST is by far the most common means of inference in articles published in Preventive Veterinary Medicine. This pattern did not change substantially between 1997 and 2017.

Keywords: Confidence intervals; NHST; Null hypothesis significance testing; Statistical significance; Veterinary epidemiology; Veterinary medicine; p Values.

Grants and funding

The authors received no funding for this work.