Multimodal Natural Language Generation
Also known as: Multimodal NLG
Natural language generation systems that produce output coordinated across more than one modality, typically text or speech combined with graphics, maps, animation, gesture, or tactile output. A multimodal NLG system decomposes its output into several time-coordinated "channels", each governed by a different processing component. Classic examples include driving-directions systems that combine spoken instructions with maps, and embodied conversational agents that coordinate speech with gesture and facial expression. Sign-language generation is an extreme case of multimodal NLG because it has no single dominant linguistic channel: meaning is carried simultaneously by the hands, eye gaze, facial expression, head tilt, and body posture. In accessibility work, multimodal NLG underlies signing-avatar systems, personalised captioning, and alternative-format generation.
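As an illustration of the time-coordinated channels described above, the following is a minimal sketch in Python. It is not taken from any particular system: the `Event` type, the `active_at` helper, the channel names, and the timings are all hypothetical, chosen only to show how several channels can share one timeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One piece of output on a single channel (hypothetical structure)."""
    channel: str   # e.g. "speech" or "map"; labels are illustrative only
    start: float   # seconds from the start of the presentation
    end: float     # exclusive end time, in seconds
    content: str


def active_at(timeline, t):
    """Return the content active on each channel at time t, keyed by channel."""
    return {e.channel: e.content for e in timeline if e.start <= t < e.end}


# Hypothetical driving-directions output: speech and map share one clock.
timeline = [
    Event("speech", 0.0, 2.5, "Turn left onto Main Street."),
    Event("map", 0.5, 3.0, "highlight: left turn at Main Street"),
    Event("speech", 3.0, 5.0, "Continue for two miles."),
]

print(active_at(timeline, 1.0))
```

In this sketch each channel could be filled by a separate component, and coordination reduces to agreeing on start and end times against a shared clock. Real systems typically need richer synchronisation than fixed intervals, for example aligning a gesture with a specific word.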
Category: Natural Language Processing · Human-Computer Interaction · Multimodal · Accessibility Research
Related: Natural Language Generation · Embodied Conversational Agent · Signing Avatar · Sign Language Machine Translation