Speech-modulated Typography
Also known as: Speech-driven Typography, Prosody-driven Typography
A design technique in which the visual properties of text (typically weight or width on a variable-font axis, or font size) are modulated in real time by features extracted from the corresponding speech signal, such as pitch, loudness, rhythm, or an inferred emotional-arousal score. It is one of the primary mechanisms for producing expressive captions, which convey prosodic or affective information that plain text lacks. Research has explored both continuous mappings (for example, per-word font weight tied to arousal or intensity) and categorical mappings (a discrete type style for each emotion). Challenges include maintaining readability, demographic bias in speech-emotion-recognition models, and cross-modal consistency, meaning that the typography should agree with any accompanying visual or haptic cues.
Category: typography · captioning
Related: Variable Font · Expressive Captions · Prosody · Speech Emotion Recognition
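Example: a minimal sketch of a continuous mapping, assuming each word already carries an arousal score in [0, 1] from some upstream speech model. The function names, the 300 to 800 weight range, and the HTML output format are illustrative assumptions, not part of any particular captioning system.

```python
from html import escape

# Assumed weight range for illustration; a real font may expose a different
# span on its weight axis.
MIN_WGHT = 300
MAX_WGHT = 800


def arousal_to_weight(arousal: float) -> int:
    """Linearly map an arousal score in [0, 1] to a font-weight value."""
    # Clamp out-of-range scores so a noisy model cannot push the weight
    # outside the chosen range.
    clamped = max(0.0, min(1.0, arousal))
    return round(MIN_WGHT + clamped * (MAX_WGHT - MIN_WGHT))


def render_caption(words):
    """Wrap each (word, arousal) pair in a span with its own weight.

    CSS font-weight drives the weight axis when the font is a variable font.
    """
    spans = []
    for word, arousal in words:
        weight = arousal_to_weight(arousal)
        spans.append(
            f'<span style="font-weight: {weight}">{escape(word)}</span>'
        )
    return " ".join(spans)


if __name__ == "__main__":
    # Hypothetical per-word scores, hand-written for the demo.
    print(render_caption([("I", 0.2), ("said", 0.4), ("stop", 0.95)]))
```

A linear, clamped mapping is the simplest choice. A categorical mapping would instead look up a fixed style per emotion label, and limiting the weight range is one way to address the readability concern noted above.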