Background and objective: In contrast to respiratory sound classification, respiratory phase and adventitious sound event detection provides more detailed and accurate respiratory information, which is clinically important for the assessment of respiratory disorders. However, current respiratory sound event detection models mainly use convolutional neural networks to generate frame-level predictions.
A significant drawback of frame-based models is that they optimize frame-level predictions rather than event-level ones. Moreover, they require post-processing and cannot be trained in an entirely end-to-end fashion. To address these limitations, this paper proposes an event-based Transformer method, the Respiratory Events Detection Transformer (REDT), for the multi-class respiratory sound event detection task, achieving efficient recognition and localization of respiratory phases and adventitious sound events.
Approach: First, REDT employs a Transformer to perform time-frequency analysis of respiratory sound signals and extract essential features. Second, REDT converts these features into timestamp representations and achieves sound event detection by predicting the location and category of each timestamp.
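To make this timestamp-prediction formulation concrete, the sketch below shows a DETR-style event detection head in PyTorch. It is a minimal illustration assumed from the abstract's description only, not the authors' implementation: the module name EventDetectionTransformer, all dimensions, and the query-based decoding scheme are hypothetical; only the four event classes (inspiration, expiration, CAS, DAS) come from the paper.

```python
# Minimal sketch of an event-based (DETR-style) detection head, assuming a
# Transformer over time-frequency features and learned queries that predict
# each event's timestamps (onset/offset) and class. All names and sizes are
# illustrative assumptions, not the REDT release.
import torch
import torch.nn as nn

class EventDetectionTransformer(nn.Module):
    def __init__(self, n_mels=64, d_model=256, n_queries=20, n_classes=4):
        super().__init__()
        self.input_proj = nn.Linear(n_mels, d_model)  # per-frame features -> model dim
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True,
        )
        self.queries = nn.Embedding(n_queries, d_model)  # learned event queries
        # +1 class for "no event", as in DETR-style set prediction
        self.class_head = nn.Linear(d_model, n_classes + 1)
        # predict normalized (onset, offset) timestamps in [0, 1]
        self.time_head = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, 2), nn.Sigmoid(),
        )

    def forward(self, spec):  # spec: (batch, frames, n_mels)
        memory_in = self.input_proj(spec)  # a real model would also add positional encodings
        tgt = self.queries.weight.unsqueeze(0).expand(spec.size(0), -1, -1)
        decoded = self.transformer(memory_in, tgt)  # (batch, n_queries, d_model)
        return self.class_head(decoded), self.time_head(decoded)

# Example: one recording as a 64-bin log-mel spectrogram (frame count is illustrative)
model = EventDetectionTransformer()
logits, timestamps = model(torch.randn(1, 938, 64))
print(logits.shape, timestamps.shape)  # (1, 20, 5) class logits, (1, 20, 2) onset/offset
```

In such a set-prediction formulation, each query is matched to at most one ground-truth event during training (typically via Hungarian matching) or assigned the "no event" class, which is what removes the need for frame-level post-processing and allows fully end-to-end training.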
Main results: Our method is validated on the public HF_Lung_V1 dataset. The experimental results show that our F1 scores for inspiration, expiration, Continuous Adventitious Sound (CAS), and Discontinuous Adventitious Sound (DAS) are 90.5%, 77.3%, 78.9%, and 59.4%, respectively.
Significance: These results demonstrate the strong performance of the proposed method in respiratory sound event detection.
Keywords: Event-based detection; Fine-tuning pretrained model; Hierarchical Transformer; Respiratory sound event detection.