Synchrony capture filterbank: auditory-inspired signal processing for tracking individual frequency components in speech

J Acoust Soc Am. 2013 Jun;133(6):4290-310. doi: 10.1121/1.4802653.

Abstract

A processing scheme for speech signals is proposed that emulates synchrony capture in the auditory nerve. Stimulus-locked spike timing is important for representing stimulus periodicity, low-frequency spectrum, and spatial location. In synchrony capture, dominant single-frequency components in each frequency region impress their time structures on the temporal firing patterns of auditory nerve fibers with nearby characteristic frequencies (CFs). At low frequencies, for voiced sounds, synchrony capture divides the nerve into discrete CF territories associated with individual harmonics. An adaptive synchrony capture filterbank (SCFB) is proposed, consisting of a fixed array of traditional, passive, linear (gammatone) filters cascaded with a bank of adaptively tunable bandpass filter triplets. Differences in triplet output envelopes steer triplet center frequencies via voltage-controlled oscillators (VCOs). The SCFB exhibits some cochlea-like responses, such as two-tone suppression and distortion products, and possesses many desirable properties for processing speech, music, and natural sounds. Strong signal components dominate relatively greater numbers of filter channels, thereby yielding robust encodings of relative component intensities. The VCOs precisely lock onto the harmonics most important for formant tracking, pitch perception, and sound separation.
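
The envelope-difference steering described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single triplet of complex one-pole resonators in place of the gammatone front end and the triplet filters, and it retunes the resonators directly rather than through a VCO. The function name `track_component` and the parameters `spread`, `bandwidth`, and `gain` are hypothetical, chosen only for illustration.

```python
import cmath
import math


def track_component(signal, fs, f_init, spread=0.1, bandwidth=50.0, gain=0.002):
    """Steer a three-filter bank toward the dominant nearby frequency component.

    Assumptions (not from the paper): each filter is a complex one-pole
    resonator, the side filters sit at fc * (1 -/+ spread), and the center
    frequency is updated once per sample from the normalized difference of
    the two side-filter envelopes.
    """
    pole = math.exp(-2.0 * math.pi * bandwidth / fs)  # pole radius sets bandwidth
    fc = f_init
    y_lo = y_mid = y_hi = 0j
    track = []
    for x in signal:
        df = spread * fc
        # Update the three resonators at their current center frequencies.
        y_lo = pole * cmath.exp(2j * math.pi * (fc - df) / fs) * y_lo + (1.0 - pole) * x
        y_mid = pole * cmath.exp(2j * math.pi * fc / fs) * y_mid + (1.0 - pole) * x
        y_hi = pole * cmath.exp(2j * math.pi * (fc + df) / fs) * y_hi + (1.0 - pole) * x
        # The magnitude of each complex output serves as its envelope.
        e_lo, e_hi = abs(y_lo), abs(y_hi)
        # A larger upper envelope means the component lies above fc, and vice versa.
        err = (e_hi - e_lo) / (e_hi + e_lo + 1e-12)
        fc += gain * err * fc
        track.append(fc)
    return track


if __name__ == "__main__":
    fs = 8000.0
    tone = [math.cos(2.0 * math.pi * 500.0 * n / fs) for n in range(4000)]
    track = track_component(tone, fs, f_init=440.0)
    print("final center frequency estimate (Hz):", round(track[-1], 1))
```

When the two side envelopes are equal, the error term vanishes and the center frequency stops moving, which is the sense in which the triplet "locks" onto a component in this sketch. The loop gain here is deliberately small so that the frequency update is slower than the filters' own response time; a real design would need to tune this trade-off.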

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Cochlea / physiology
  • Cochlear Nerve / physiology*
  • Computer Simulation
  • Female
  • Humans
  • Male
  • Phonetics*
  • Pitch Perception / physiology*
  • Signal Processing, Computer-Assisted*
  • Sound Localization / physiology*
  • Sound Spectrography*
  • Speech Acoustics*
  • Speech Perception / physiology*