The goal of cross-language voice conversion is to preserve a speaker's voice characteristics when that speaker's speech is translated and used to synthesize speech in another language. This paper reports two preliminary studies: a statistical analysis of spectrum differences between languages, and a first attempt at cross-language voice conversion. Speech uttered by a bilingual speaker is analyzed to examine spectrum differences between English and Japanese. The experimental results show that (1) the codebook size needed for mixed English and Japanese speech is almost twice that needed for either language alone; (2) although many code vectors occur in both English and Japanese, some tend to predominate in one language or the other; (3) code vectors that occur predominantly in English appear in the phonemes /r/, /ae/, /f/, and /s/, while those that occur predominantly in Japanese appear in /i/, /u/, and /N/; and (4) listening tests show that listeners cannot reliably distinguish English speech decoded with a Japanese codebook from English speech decoded with an English codebook. A voice conversion algorithm based on codebook mapping was applied to cross-language voice conversion; its performance was somewhat lower than for voice conversion within the same language.
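The codebook-mapping approach mentioned above can be illustrated with a minimal sketch: each spectral frame of the source speaker is quantized to its nearest code vector in the source codebook, then replaced by the corresponding mapped code vector of the target speaker. This is a toy illustration under assumed data, not the paper's implementation: real systems operate on trained vector-quantization codebooks of spectral parameters, and the 2-D vectors and mapping below are hypothetical.

```python
# Sketch of voice conversion by codebook mapping (toy 2-D data; real
# systems use spectral-parameter frames and trained VQ codebooks).

def nearest(codebook, frame):
    """Index of the code vector closest to `frame` (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], frame)))

def convert(frames, src_codebook, mapping):
    """Quantize each frame with the source codebook, then substitute the
    mapped target-speaker code vector."""
    return [mapping[nearest(src_codebook, f)] for f in frames]

# Hypothetical source-speaker codebook and its mapping to target code vectors.
src_codebook = [(0.0, 0.0), (1.0, 1.0)]
mapping = {0: (0.1, 0.2), 1: (0.9, 1.1)}

converted = convert([(0.1, -0.1), (0.9, 1.2)], src_codebook, mapping)
print(converted)  # -> [(0.1, 0.2), (0.9, 1.1)]
```

In the cross-language setting, the same substitution is applied even though the codebooks were trained on different languages, which is one plausible source of the reduced performance reported above.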