Towards direct speech synthesis from ECoG: A pilot study

Christian Herff; Garett Johnson; Lorenz Diener; Jerry Shih; Dean Krusienski; Tanja Schultz

doi:10.1109/EMBC.2016.7591004

Annu Int Conf IEEE Eng Med Biol Soc. 2016 Aug:2016:1540-1543. doi: 10.1109/EMBC.2016.7591004.

PMID: 28268620

Abstract

Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real time. In this pilot study with one participant, we demonstrate that electrocorticography (ECoG) intracranial activity from temporal areas can be used to resynthesize speech in real time. This is accomplished by reconstructing the audio magnitude spectrogram from neural activity and subsequently creating the audio waveform from these reconstructed spectrograms. We show that significant correlations between the original and reconstructed spectrograms and temporal waveforms can be achieved. While this pilot study uses audibly spoken speech for the models, it represents a first step towards speech synthesis from speech imagery.
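
The abstract does not specify the reconstruction model, so the following is only an illustrative sketch of the general idea: map per-frame neural features to spectrogram bins, then score the reconstruction by correlating it with the original. It assumes a closed-form ridge regression as the mapping and uses synthetic random data in place of ECoG features and audio spectrograms. The function names, array shapes, and the regularization strength `alpha` are all hypothetical and are not taken from the paper. The step that turns the reconstructed spectrogram into an audio waveform is omitted.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_frames, n_features) neural features (hypothetical layout).
    Y: (n_frames, n_bins) magnitude spectrogram (hypothetical layout).
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def per_bin_correlation(Y_true, Y_pred):
    """Pearson correlation over time, computed separately for each spectral bin."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom

# Synthetic stand-in data (assumption): a noisy linear relation between
# "neural features" and "spectrogram bins". Real recordings would replace this.
rng = np.random.default_rng(0)
n_frames, n_channels, n_bins = 2000, 32, 20
X = rng.standard_normal((n_frames, n_channels))
W_true = rng.standard_normal((n_channels, n_bins))
Y = X @ W_true + 0.5 * rng.standard_normal((n_frames, n_bins))

# Fit on the first part of the data, evaluate on the held-out remainder.
split = 1500
W = fit_ridge(X[:split], Y[:split], alpha=1.0)
Y_hat = X[split:] @ W
r = per_bin_correlation(Y[split:], Y_hat)
print("mean correlation across bins:", r.mean())
```

Because the synthetic data are generated from a linear model, the held-out correlations here are high by construction and say nothing about the correlations reported in the study.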

MeSH terms

Brain-Computer Interfaces
Electroencephalography
Evoked Potentials
Humans
Pilot Projects
Speech*