Diagnostic accuracy and inter-observer reliability of the O-RADS scoring system among staff radiologists in a North American academic clinical setting

Yeli Pi; Mitchell P Wilson; Prayash Katlariwala; Medica Sam; Thomas Ackerman; Lee Paskar; Vimal Patel; Gavin Low

doi:10.1007/s00261-021-03193-7

Abdom Radiol (NY). 2021 Oct;46(10):4967-4973. doi: 10.1007/s00261-021-03193-7. Epub 2021 Jun 29.

Authors

Yeli Pi¹, Mitchell P Wilson², Prayash Katlariwala², Medica Sam², Thomas Ackerman², Lee Paskar², Vimal Patel², Gavin Low²

Affiliations

¹ Department of Radiology and Diagnostic Imaging, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada. pi@ualberta.ca.
² Department of Radiology and Diagnostic Imaging, Faculty of Medicine and Dentistry, University of Alberta, Edmonton, AB, Canada.

PMID: 34185128
DOI: 10.1007/s00261-021-03193-7

Abstract

Purpose: The objective of this study is to evaluate the diagnostic accuracy, interobserver variability, and common lexicon pitfalls of the ACR O-RADS scoring system among staff radiologists without prior experience with O-RADS.

Materials and methods: After independent review of the ACR O-RADS publications and 30 training cases, three fellowship-trained, board-certified staff radiologists scored 50 pelvic ultrasound exams using the O-RADS system. The diagnostic accuracy and area under the receiver operating characteristic curve (AUC) were analyzed for each reader. Overall agreement and pair-wise agreement between readers were also analyzed.
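As an illustration of the pair-wise agreement analysis, the following is a minimal sketch of an unweighted Cohen's kappa for two readers. The abstract does not state which kappa variant the authors used, so the choice of unweighted Cohen's kappa is an assumption; the function name `cohen_kappa` and the scores are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two readers scoring the same cases.

    Assumption: the kappa variant used in the study is not stated in the
    abstract; this is the plain two-rater, unweighted form.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both readers must score the same non-empty set of cases")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the identical category.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of each reader's marginal category frequencies.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both readers used a single identical category for every case.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical O-RADS scores from two readers (illustrative only).
reader_1 = [2, 2, 3, 4, 5, 2, 3, 4, 5, 2]
reader_2 = [2, 2, 3, 4, 5, 2, 4, 4, 5, 3]
kappa = cohen_kappa(reader_1, reader_2)
print(f"kappa = {kappa:.2f}")
```

Because O-RADS categories are ordinal, a weighted kappa may be preferable in practice, since it penalizes a 2-versus-5 disagreement more than a 3-versus-4 disagreement; agreement across all three readers at once would need a multi-rater statistic instead.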

Results: Specificities (92 to 100%) and NPVs (92 to 100%) were excellent, whereas sensitivities (72 to 100%) and PPVs (66 to 100%) were more variable. Considering O-RADS 4 and O-RADS 5 as predictors of malignancy, individual reader AUC values ranged from 0.94 to 0.98 (p < 0.001). Overall inter-reader agreement for all 3 readers was "very good," k = 0.82 (95% CI 0.73 to 0.90, p < 0.001). Pair-wise agreement between readers was also "very good," k = 0.86-0.92. Fourteen of 150 lesions were misclassified, with the most common error being down-scoring of a solid lesion with irregular outer contours.
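The per-reader accuracy figures above follow from dichotomizing the O-RADS score. A minimal sketch, assuming a score of 4 or 5 counts as a positive test and that the reference standard is a binary malignant/benign label; the function name `diagnostic_metrics` and the data are hypothetical and are not study data.

```python
def diagnostic_metrics(orads_scores, is_malignant, threshold=4):
    """Sensitivity, specificity, PPV and NPV for one reader.

    A score >= threshold (O-RADS 4 or 5 by default) is treated as a
    positive call for malignancy.
    """
    if len(orads_scores) != len(is_malignant):
        raise ValueError("scores and reference labels must have the same length")
    tp = fp = tn = fn = 0
    for score, malignant in zip(orads_scores, is_malignant):
        positive = score >= threshold
        if positive and malignant:
            tp += 1
        elif positive and not malignant:
            fp += 1
        elif not positive and malignant:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Undefined when the denominator is zero (e.g. no positive calls).
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical scores and reference labels (illustrative only).
scores = [2, 3, 4, 5, 5, 2, 3, 4, 2, 5]
truth = [False, False, True, True, True, False, True, True, False, True]
metrics = diagnostic_metrics(scores, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In this toy example the single malignant lesion scored O-RADS 3 lowers sensitivity and NPV while leaving specificity and PPV at 1.0, the same direction of effect as the down-scoring error described above.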

Conclusion: Even without specific training, experienced ultrasound readers can achieve excellent diagnostic performance and high inter-reader reliability with self-directed review of guidelines and cases. The study highlights the effectiveness of ACR O-RADS as a stratification tool for radiologists and supports its continued use in practice.

Keywords: Accuracy; Inter-observer variability; O-RADS; Ovarian cysts; Reliability; Ultrasound.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Humans
North America
Observer Variation
Radiologists*
Reproducibility of Results
Retrospective Studies
Ultrasonography