A deep learning pipeline for automated classification of vocal fold polyps in flexible laryngoscopy

Peter Yao; Dan Witte; Alexander German; Preethi Periyakoil; Yeo Eun Kim; Hortense Gimonet; Lucian Sulica; Hayley Born; Olivier Elemento; Josue Barnes; Anaïs Rameau

doi:10.1007/s00405-023-08190-8

Eur Arch Otorhinolaryngol. 2024 Apr;281(4):2055-2062. doi: 10.1007/s00405-023-08190-8. Epub 2023 Sep 11.

Affiliations

¹ Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA.
² Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA.
³ Department of Otolaryngology-Head and Neck Surgery, Sean Parker Institute for the Voice, Weill Cornell Medicine, 240 East 59th St, New York, NY, 10022, USA. anr2783@med.cornell.edu.

# Contributed equally.

PMID: 37695363
DOI: 10.1007/s00405-023-08190-8

Abstract

Purpose: To develop and validate a deep learning model for distinguishing healthy vocal folds (HVF) from vocal fold polyps (VFP) in laryngoscopy videos, and to demonstrate that a previously developed informative frame classifier can facilitate deep learning development.

Methods: Image frames were retrospectively extracted from 52 HVF and 77 unilateral VFP videos, and two researchers manually labeled each frame as informative or uninformative. A previously developed informative frame classifier was used to extract informative frames from the same video set. Both sets of videos were independently divided by patient into training (60%), validation (20%), and test (20%) sets. Machine-labeled frames were independently verified by two researchers to assess the precision of the informative frame classifier. Two models, each based on a pre-trained ResNet18, were trained to classify frames as containing HVF or VFP. The polyp classifier trained on machine-labeled frames was compared with the classifier trained on human-labeled frames. Performance was measured by accuracy and area under the receiver operating characteristic curve (AUROC).
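
The by-patient 60/20/20 division described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the `(patient_id, frame_path, label)` tuple layout, the function name `split_by_patient`, the fixed seed, and the toy data are all assumptions made for illustration. The point it shows is that every frame from a given patient lands in exactly one partition, so no patient appears in both training and test data.

```python
import random
from collections import defaultdict


def split_by_patient(frames, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split frames into train/val/test so that no patient spans two partitions.

    `frames` is a list of (patient_id, frame_path, label) tuples; this layout
    is a hypothetical one chosen for the sketch, not taken from the study.
    """
    by_patient = defaultdict(list)
    for patient_id, frame_path, label in frames:
        by_patient[patient_id].append((frame_path, label))

    # Shuffle patients (not frames) so the split is made at the patient level.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(fractions[0] * len(patients))
    n_val = round(fractions[1] * len(patients))
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],
    }
    return {
        name: [(p, path, label) for p in ids for path, label in by_patient[p]]
        for name, ids in groups.items()
    }


# Toy data: 10 hypothetical patients with 3 frames each.
frames = [
    (f"patient_{i:02d}", f"frame_{i:02d}_{j}.png", "VFP" if i % 2 else "HVF")
    for i in range(10)
    for j in range(3)
]
splits = split_by_patient(frames)
for name, rows in splits.items():
    print(name, len({p for p, _, _ in rows}), "patients,", len(rows), "frames")
```

This sketch does not stratify by class; whether the study balanced HVF and VFP patients across partitions is not stated in the abstract.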

Results: When evaluated on a hold-out test set, the polyp classifier trained on machine-labeled frames achieved an accuracy of 85% and AUROC of 0.84, whereas the classifier trained on human-labeled frames achieved an accuracy of 69% and AUROC of 0.66.
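
For readers unfamiliar with the two reported metrics, the following minimal sketch shows how accuracy and AUROC can be computed from per-frame scores. The labels and scores are toy values, not data from the study, and the 0.5 decision threshold is an assumption; the abstract does not state which threshold was used. AUROC is computed here through its pairwise-ranking interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label (1 = VFP, 0 = HVF)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy example only; these values are not from the paper.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(accuracy(labels, scores))  # 0.75
print(auroc(labels, scores))     # 0.75
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; library implementations typically use a sorting-based method instead.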

Conclusion: An accurate deep learning classifier for vocal fold polyp identification was developed and validated with the assistance of a peer-reviewed informative frame classifier for dataset assembly. The classifier trained on machine-labeled frames demonstrates improved performance compared to the classifier trained on human-labeled frames.

Keywords: Artificial intelligence; Computer vision; Convolutional neural network; Deep learning; Informative frames; Polyp classification.

MeSH terms

Deep Learning*
Humans
Laryngoscopy / methods
Machine Learning
Neural Networks, Computer
Polyps* / diagnostic imaging
Retrospective Studies
Vocal Cords / diagnostic imaging
