Objective: To validate computer-aided narrative content analysis for extracting standard diagnostic categories, using an archived cytology database in which each report had been individually overread and assigned a reference classification.
Design: A retrospective analysis of narrative anal cytology results collected from HIV-infected patients at the University of California, San Diego between January and December 2001.
Methods: We performed computer-assisted content analysis with WordStat 8.0 (Provalis Research), using a classification dictionary that we developed for the following diagnostic categories: NAMC, ASCUS, LSIL, and HSIL. We compared its accuracy against manual extraction by a physician overreader, who classified each report into the most severe diagnostic category referenced in the narrative report. Agreement between the diagnostic categories mapped by content analysis and the reference categories was evaluated using the kappa statistic.
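The dictionary-based mapping and the agreement statistic described above can be illustrated with a minimal Python sketch. It is not the authors' WordStat dictionary or workflow: the phrase patterns, the function names (`extract_categories`, `most_severe`, `cohen_kappa`), and the four sample reports are all hypothetical and chosen only for illustration. The sketch assumes simple pattern matching with no handling of negation (for example, "no evidence of HSIL" would still match HSIL).

```python
import re
from collections import Counter

# Diagnostic categories from the abstract, ordered least to most severe.
SEVERITY = ["NAMC", "ASCUS", "LSIL", "HSIL"]

# Hypothetical classification dictionary (an assumption for illustration,
# not the dictionary the authors developed in WordStat).
DICTIONARY = {
    "NAMC": [r"\bNAMC\b"],
    "ASCUS": [r"\bASC-?US\b",
              r"atypical squamous cells of undetermined significance"],
    "LSIL": [r"\bLSIL\b",
             r"low[- ]grade squamous intraepithelial lesion"],
    "HSIL": [r"\bHSIL\b",
             r"high[- ]grade squamous intraepithelial lesion"],
}


def extract_categories(report):
    """Return every category matched in the report, in severity order."""
    return [
        cat for cat in SEVERITY
        if any(re.search(p, report, re.IGNORECASE) for p in DICTIONARY[cat])
    ]


def most_severe(report):
    """Return the most severe matched category, or None if nothing matches."""
    found = extract_categories(report)
    return found[-1] if found else None


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical narrative reports and reference labels (not study data).
reports = [
    "NAMC.",
    "Atypical squamous cells of undetermined significance; "
    "focal low-grade squamous intraepithelial lesion.",
    "HSIL",
    "ASCUS present",
]
reference = ["NAMC", "ASCUS", "HSIL", "ASCUS"]

predicted = [most_severe(r) for r in reports]
kappa = cohen_kappa(predicted, reference)
```

In this toy example the second report mentions two categories, so `extract_categories` returns both and `most_severe` keeps LSIL, mirroring the rule of assigning the most severe category referenced in the narrative.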
Results: During 2001, 901 patients underwent 997 anal cytological examinations as routine screening. By reference diagnostic category, 54 (5.4%) were unsatisfactory, 460 (46.1%) were NAMC, 291 (29.2%) were ASCUS, 131 (13.1%) were LSIL, and 61 (6.1%) were HSIL. Computer-aided content analysis extracted a single diagnosis from each report in 963 (96.2%) cases and two diagnoses in 38 (3.8%) cases. The kappa agreement was 0.96 (standard error 0.019). There were 29 cases classified as ASCUS by reference category but as LSIL by adjudicated content analysis. A focused review indicated that the overreader-assigned reference category was in error.
Conclusion: Computer-aided narrative content analysis of anal cytology results yielded accurate and time-efficient classification into meaningful diagnostic categories that can be used to evaluate screening programs and to model natural history.
Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.