Diphone
Also known as: Diphone Synthesis
A unit of speech used in text-to-speech synthesis, spanning the transition from the middle of one phoneme to the middle of the next. Diphone-based synthesis records every phoneme-to-phoneme transition that occurs in a language and concatenates the appropriate diphones to produce complete utterances. Because the cuts fall in the relatively stable middle of each phoneme, the coarticulation between adjacent sounds is preserved inside each unit. The approach produces intelligible, reasonably natural speech from a relatively small database: the inventory is bounded by roughly the square of the number of phonemes, typically a few thousand recorded segments per language. This makes it practical in resource-constrained settings, such as assistive technology for less widely supported languages. Diphone synthesis was an important step between early formant synthesis and modern unit selection or neural speech synthesis.
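As a rough illustration of the concatenation step, the sketch below maps a phoneme sequence to the diphone labels a synthesizer would need to look up. The function names, the `_` silence marker, and the ARPAbet-style phoneme labels are assumptions made for this example, not part of any particular synthesis system.

```python
def to_diphones(phonemes, silence="_"):
    """Return the diphone labels needed to cover a phoneme sequence.

    The sequence is padded with a silence unit (hypothetical label "_")
    on both sides, so the utterance starts and ends at silence.
    """
    padded = [silence, *phonemes, silence]
    # Each diphone spans one phoneme-to-phoneme transition.
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def max_inventory_size(num_units):
    """Upper bound on inventory size: every ordered pair of units.

    Real inventories are usually smaller, since not every pair
    occurs in a given language.
    """
    return num_units * num_units


# "cat" as three phonemes (ARPAbet-style labels, assumed for illustration)
print(to_diphones(["k", "ae", "t"]))  # ['_-k', 'k-ae', 'ae-t', 't-_']
print(max_inventory_size(40))         # 1600
```

A sequence of n phonemes needs n + 1 diphones once silence is added at both ends, which is why a quadratic-size inventory is enough to cover arbitrary utterances.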
Category: Speech Technology · Natural Language Processing
Related: Text-to-Speech · Concatenative Synthesis · Phoneme · Speech Synthesis