Older Adults' Evaluations of Speech Output
Lorna Lines, Kate S. Hone · 2002 · Proceedings of the Fifth International ACM Conference on Assistive Technologies (Assets '02) · doi:10.1145/638249.638280
Summary
This paper investigates older adults' subjective evaluations of different speech output voices in the context of an Intelligent Home System (IHS) designed to help older people live independently. Because 66% of people with visual impairments in the UK are over 75, and smart home systems increasingly rely on speech output to deliver information (alarm states, temperature, door status), the choice of voice characteristics is critical for user acceptance and comprehension. The study used a within-subjects 2x2 factorial design (voice type × voice gender) with 16 participants aged 65+ (9 women, 7 men) attending a local day center. Participants evaluated four voice samples: natural male, natural female, synthetic male, and synthetic female. The natural voices were recorded from speakers with a standard UK English accent, selected to convey "higher" social status, while the synthetic voices were generated with the Laureate concatenative speech synthesizer. Each voice read one of four scripts (~180 words, ~60 seconds) giving directions around a home environment, reflecting realistic IHS use. Participants rated each voice on 8 bipolar semantic differential scales: pleasant/unpleasant, intelligent/stupid, boring/interesting, fast/slow, irritating/soothing, young/old, natural/unnatural, and clear/muffled. A final question asked which voice they would choose for speech output in their own home.
Key findings
A between-subjects two-factor ANOVA (voice type: natural vs. synthetic; voice gender: male vs. female) revealed significant main effects of both factors on every attribute (all p<0.01), with a significant interaction only for the fast/slow attribute (F(1,60)=21.000, p<0.01). The natural male voice received the most positive evaluations overall: it was rated the most pleasant, intelligent, natural, and clear, and the least boring, fast, and irritating. The synthetic female voice was evaluated most negatively, being rated the most unpleasant, stupid, irritating, boring, and unnatural. When asked which voice they would choose for their home, 15 of the 16 participants selected the natural male voice; the remaining participant chose none. Natural voices were consistently preferred over synthetic voices, and male voices were preferred over female voices within both the natural and synthetic types. Interestingly, the natural female voice was perceived as the fastest and youngest of the four voices and was rated slightly more irritating than the synthetic male voice. The synthetic male voice received only marginally more positive evaluations than the synthetic female voice. The authors suggest the male voice preference may relate to presbycusis, the age-related hearing loss that makes higher-frequency sounds harder to perceive and therefore makes lower-pitched male voices easier to hear. Additionally, older adults' greater familiarity with natural speech from telephone and radio may explain the strong preference for natural over synthetic voices.
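The reported statistic F(1,60) can be checked against the design with simple degrees-of-freedom arithmetic. The sketch below assumes equal cell sizes (each of the 16 participants contributes one rating per voice, so 16 ratings per cell and 64 in total); the helper names are hypothetical and not taken from the paper. A between-subjects analysis of the 64 ratings gives an error df of 64 minus 4 cells, or 60, which matches the reported value. A repeated-measures analysis of the same 16 listeners would instead test each 1-df effect against 15 error df.

```python
def between_subjects_df(n_per_cell: int, a: int = 2, b: int = 2) -> dict:
    """Degrees of freedom for a two-way between-subjects ANOVA with equal cells.

    Hypothetical helper: every observation is treated as independent.
    """
    n_total = n_per_cell * a * b
    return {
        "A": a - 1,                  # main effect of factor A (e.g. voice type)
        "B": b - 1,                  # main effect of factor B (e.g. voice gender)
        "AxB": (a - 1) * (b - 1),    # interaction
        "error": n_total - a * b,    # pooled within-cell error term
    }


def repeated_measures_df(n_subjects: int, a: int = 2, b: int = 2) -> dict:
    """(effect df, error df) pairs for a fully within-subjects a x b ANOVA.

    Hypothetical helper: each effect is tested against its own
    effect-by-subject interaction, so error df scale with n_subjects - 1.
    """
    s = n_subjects - 1
    return {
        "A": (a - 1, (a - 1) * s),
        "B": (b - 1, (b - 1) * s),
        "AxB": ((a - 1) * (b - 1), (a - 1) * (b - 1) * s),
    }


if __name__ == "__main__":
    bs = between_subjects_df(n_per_cell=16)
    rm = repeated_measures_df(n_subjects=16)
    print(f"between-subjects interaction: F({bs['AxB']},{bs['error']})")
    print(f"repeated-measures interaction: F({rm['AxB'][0]},{rm['AxB'][1]})")
```

With the study's numbers the first line prints F(1,60) and the second F(1,15), which is why the reported F(1,60) corresponds to the between-subjects treatment of the ratings.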
Relevance
This paper addresses a design decision that strongly affects the usability of speech-based assistive technologies for older adults: the choice of voice characteristics. Although speech synthesis quality has improved dramatically since 2002, with modern neural TTS voices often close to indistinguishable from natural speech, the core findings remain relevant. The strong preference for natural-sounding voices supports the trend toward high-fidelity TTS in assistive devices. The gender preference finding is more complex: the study found a male voice preference among older UK adults, but this may be culturally and generationally specific. The presbycusis explanation, by contrast, does not depend on era. Older users may genuinely perceive lower-pitched voices more easily, which has implications for choosing voice characteristics in screen readers, smart home assistants, and navigation aids designed for older populations. For practitioners designing speech interfaces for older adults, the key takeaway is that voice quality and characteristics directly affect user acceptance and willingness to adopt assistive technology; a system rejected because of an unpleasant voice may never get the chance to demonstrate its functional benefits. The study is limited by its small sample size (n=16), the dated quality of 2002-era synthetic speech, and its UK-specific cultural context.
Tags: speech output · older adults · aging · smart home · visual impairment · text-to-speech · voice preference · assistive technology