Vocal tract representation in the recognition of cerebral palsied speech

J Speech Lang Hear Res. 2012 Aug;55(4):1190-207. doi: 10.1044/1092-4388(2011/11-0223). Epub 2012 Jan 23.

Abstract

Purpose: In this study, the authors explored articulatory information as a means of improving the recognition of dysarthric speech by machine.

Method: Data were derived chiefly from the TORGO database of dysarthric articulation (Rudzicz, Namasivayam, & Wolff, 2011), in which motions of various points in the vocal tract are measured during speech. In the 1st experiment, the authors established a baseline model showing that traditional automatic speech recognition (ASR) performs relatively poorly when it uses only acoustic data from dysarthric individuals. In the 2nd experiment, the authors used various measures of entropy (statistical disorder) to determine whether characteristics of dysarthric articulation can reduce uncertainty in features of dysarthric acoustics. These findings led to the 3rd experiment, in which recorded dysarthric articulation was directly encoded into the speech recognition process.
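The idea behind the 2nd experiment, measuring how much uncertainty about acoustics is removed once articulation is known, can be illustrated with a minimal sketch. The abstract does not specify which entropy measures were used, so the example below assumes plain Shannon entropy over discrete symbols; the function names and the toy data are hypothetical and are not drawn from TORGO.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete symbols."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(acoustic, articulatory):
    """H(acoustic | articulatory) for paired discrete observations."""
    total = len(acoustic)
    groups = {}
    for a, x in zip(acoustic, articulatory):
        groups.setdefault(x, []).append(a)
    # Weight the entropy within each articulatory class by its frequency.
    return sum(len(g) / total * entropy(g) for g in groups.values())


def uncertainty_reduction(acoustic, articulatory):
    """Fraction of acoustic entropy removed by knowing articulation."""
    h = entropy(acoustic)
    if h == 0:
        return 0.0
    return (h - conditional_entropy(acoustic, articulatory)) / h


# Hypothetical, hand-made data: a quantized acoustic feature paired with
# a quantized articulatory feature for eight frames.
acoustic = ["lo", "lo", "hi", "hi", "lo", "hi", "hi", "lo"]
articulatory = ["closed", "closed", "open", "open",
                "closed", "open", "closed", "open"]

reduction = uncertainty_reduction(acoustic, articulatory)
print(f"Fraction of acoustic uncertainty removed: {reduction:.3f}")
```

Real acoustic and articulatory features are continuous and multidimensional, so an actual analysis would need density estimation or quantization choices that this sketch leaves out.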

Results: The authors found that 18.3% of the statistical disorder in the acoustics of speakers with dysarthria can be removed if articulatory parameters are known. Using articulatory models yields a relative reduction in phoneme recognition errors of up to 6% for speakers with dysarthria in speaker-dependent systems.
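Because the error reduction is reported in relative terms, it helps to see how a relative reduction differs from an absolute one. The short sketch below uses hypothetical error rates chosen only for illustration; the abstract does not report the underlying baseline or improved error rates.

```python
def relative_error_reduction(baseline_error, new_error):
    """Relative reduction: the share of the baseline error that was removed."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical phoneme error rates, not taken from the study.
baseline = 0.50  # acoustic-only system
improved = 0.47  # system with articulatory knowledge

relative = relative_error_reduction(baseline, improved)
absolute = baseline - improved

print(f"Relative reduction: {relative:.1%}")
print(f"Absolute reduction: {absolute:.1%} (percentage points)")
```

With these assumed numbers, a drop of 3 percentage points corresponds to a 6% relative reduction, which shows why the two figures should not be read interchangeably.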

Conclusions: Articulatory knowledge is useful in reducing rates of error in ASR for speakers with dysarthria and in reducing statistical uncertainty of their acoustic signals. These findings may help to guide clinical decisions related to the use of ASR in the future.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Cerebral Palsy / complications
  • Cerebral Palsy / physiopathology*
  • Databases, Factual
  • Dysarthria / diagnosis
  • Dysarthria / etiology
  • Dysarthria / physiopathology*
  • Entropy
  • Female
  • Humans
  • Linear Models
  • Male
  • Markov Chains
  • Middle Aged
  • Models, Biological*
  • Phonetics
  • Speech Acoustics
  • Speech Intelligibility / physiology*
  • Speech Production Measurement
  • Speech Recognition Software
  • Vocal Cords / physiopathology*
  • Young Adult