Concatenated Speech Synthesis

Also known as: Concatenative Synthesis, Unit Selection Synthesis

A method of producing synthetic speech by joining pre-recorded segments of human speech, typically diphones (units that run from the middle of one phone to the middle of the next, capturing the transition between them) or demi-syllables, to form complete words and sentences. Because it uses actual recordings of human voices, concatenated speech synthesis generally sounds more natural than older formant synthesis methods, but it may exhibit audible discontinuities at segment boundaries where pitch, timing, or spectral characteristics do not match. In accessibility contexts, the quality of concatenated speech affects user acceptance of screen readers, smart home systems, and other assistive technologies, particularly among older adults, who may have stronger preferences for natural-sounding voices.
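The joining step and the boundary discontinuities described above can be illustrated with a minimal Python sketch. It assumes each recorded unit is already available as a list of float samples, and it uses a simple linear crossfade, which is one common smoothing approach and not necessarily what any particular synthesizer does. The names `crossfade_concat` and `max_jump` are hypothetical and chosen only for this illustration.

```python
def crossfade_concat(units, overlap):
    """Join sample lists end to end, blending `overlap` samples at each boundary.

    Illustrative only: real systems also match pitch and duration, not just
    amplitude, when joining units.
    """
    if not units:
        return []
    out = list(units[0])
    for unit in units[1:]:
        # Never overlap more samples than either side actually has.
        k = min(overlap, len(out), len(unit))
        start = len(out) - k
        for i in range(k):
            w = (i + 1) / (k + 1)  # weight of the incoming unit rises linearly
            out[start + i] = (1 - w) * out[start + i] + w * unit[i]
        out.extend(unit[k:])
    return out


def max_jump(samples):
    """Largest sample-to-sample difference, a crude proxy for an audible click."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    # Two toy "units" whose levels do not match at the boundary.
    a = [1.0] * 4
    b = [0.0] * 4
    hard = crossfade_concat([a, b], overlap=0)    # plain concatenation
    smooth = crossfade_concat([a, b], overlap=2)  # blended boundary
    print(len(hard), max_jump(hard))
    print(len(smooth), max_jump(smooth))
```

In this toy case the plain join has a sample-to-sample jump of 1.0 at the boundary, while the two-sample crossfade reduces the largest jump to about one third, at the cost of shortening the output by the overlap length.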

Category: Speech Technology · Assistive Technology

Related: Text-to-Speech · Speech Synthesis · Formant Synthesis · Screen Reader

Sources

https://doi.org/10.1145/638249.638280