Designing a Generative AI-Assisted Music Psychotherapy Tool for Deaf and Hard-of-Hearing Individuals

Youjin Choi, JaeYoung Moon, JinYoung Yoo, Jennifer G. Kim, Jin-Hyuk Hong · 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems (CHI '26) · doi:10.1145/3772318.3791385

Summary

Choi and colleagues built and evaluated a generative-AI music psychotherapy tool for deaf and hard-of-hearing (DHH) users, co-designed with licensed Korean music therapists. The work is motivated by the observation that music psychotherapy, which uses songwriting, improvisation, and reflective listening to address emotional rather than auditory needs, has largely excluded DHH individuals. Therapy models are auditory-centric, and DHH-focused music interventions typically target cochlear-implant perception training rather than emotional well-being. The research proceeds in three stages: semi-structured interviews with six licensed Korean music therapists to identify current practices and needs (RQ1); a four-hour online co-design workshop with four of those therapists, using persona design, GenAI solution exploration, and therapy process design, to derive design rationales (RQ2); and a 100-minute usage study with 23 Korean cochlear-implant users who identify as Deaf and primarily communicate through Korean Sign Language and text (RQ3). The tool combines three parts. An LLM-based conversational agent (CA; OpenAI GPT-4o) guides users through a four-state therapy flow (therapeutic connection, making lyrics, making music, song discussion). SUNO v3.5 generates the music. A custom music visualization interface maps pitch, loudness, rhythm, instrument, and tempo to animated typography and color. Three conversational strategies for the CA (supportive empathy, example response options, and visual-based metaphor and analogy) operationalize the design rationales.
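The summary names only which musical features the visualization encodes, not how. A minimal sketch of one possible mapping follows, assuming per-segment features have already been extracted by some audio-analysis step. The feature ranges, the instrument-to-hue palette, and the names `SegmentFeatures`, `TextStyle`, and `style_for` are all hypothetical illustrations, not the authors' design.

```python
from dataclasses import dataclass


# Hypothetical per-segment audio features; the ranges are assumptions, not from the paper.
@dataclass
class SegmentFeatures:
    pitch_midi: float   # median pitch as a MIDI note number (60 = middle C)
    loudness: float     # normalized loudness in [0, 1]
    onset_rate: float   # rhythmic density, note onsets per second
    instrument: str     # dominant instrument label
    tempo_bpm: float    # estimated tempo in beats per minute


# Hypothetical visual parameters for animated lyric typography.
@dataclass
class TextStyle:
    hue: float          # HSL hue in degrees [0, 360)
    font_size_px: int
    y_offset: float     # -1 (bottom of screen) .. 1 (top)
    pulse_hz: float     # how often the text pulses
    jitter: float       # 0 (steady) .. 1 (busy)


# Hypothetical instrument -> base hue palette.
INSTRUMENT_HUES = {"piano": 210.0, "guitar": 30.0, "strings": 280.0, "drums": 0.0}
DEFAULT_HUE = 200.0


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def style_for(seg: SegmentFeatures) -> TextStyle:
    """Map one segment's musical features to visual text properties (illustrative only)."""
    # Higher pitch -> text drawn higher on screen (MIDI 36..84 mapped to -1..1).
    y = clamp((seg.pitch_midi - 60.0) / 24.0, -1.0, 1.0)
    # Louder -> larger type (16 px .. 64 px).
    size = round(16 + 48 * clamp(seg.loudness, 0.0, 1.0))
    # Tempo sets the pulse rate: one pulse per beat.
    pulse = seg.tempo_bpm / 60.0
    # Denser rhythm -> more jitter, saturating at 8 onsets per second.
    jitter = clamp(seg.onset_rate / 8.0, 0.0, 1.0)
    # Instrument picks the base hue; unknown instruments fall back to a default hue.
    hue = INSTRUMENT_HUES.get(seg.instrument, DEFAULT_HUE)
    return TextStyle(hue=hue, font_size_px=size, y_offset=y, pulse_hz=pulse, jitter=jitter)
```

Keeping the mapping a pure function of per-segment features means each visual channel can be tuned or tested independently of the audio pipeline.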

Key findings

All 23 participants completed songwriting sessions (mean 40 minutes) and reported high self-disclosure (M=6.02/7) and high music satisfaction (M=5.86/7). In interviews, therapists framed hearing loss as a 'hidden disability' with emotional sequelae (depression, low self-esteem, social isolation). Conventional perception training fails to address these sequelae, and time and cost pressures push them aside in clinical practice. In the usage study, participants described three types of empathy from the CA (mirroring, interpretive, and proactive). They particularly valued proactive empathy ('When you decided to pause dancing, you might have felt a mix of things') as more human than mirroring. Participants preferred text-only interaction to spoken dialogue: the absence of voice and facial cues created psychological safety for users who feel self-conscious about how they sound. Visual-based metaphor prompts ('What kind of scene comes to mind?', 'What color best represents your current mood?') helped 20 participants translate emotions into concrete imagery, which then anchored their lyrics. Seven participants reported mismatches between the generated music and their intent, and these were sometimes reinterpreted productively. For example, when a request for 'calm' returned a sadder track, one participant recognized underlying anxiety. Reported shortcomings included direct emotional questions that felt pressuring (five participants), the CA's lack of emotional restructuring beyond mirroring, and the inability to control music output parameters.

Relevance

This paper concretely demonstrates LLM-driven conversational design for mental-health accessibility with a population typically excluded from music-based interventions. For accessibility and HCI practitioners, the three CA strategies (supportive empathy, example response options, visual-based metaphor) are transferable to other expressive-writing and journaling tools, particularly for users who prefer text over voice. The state-step prompting framework with required-variable extraction is a useful pattern for building any multi-stage therapeutic or assessment CA. The finding that text-based, judgment-free interaction promotes openness runs counter to the common assumption that more human-like voices always help, and it aligns with broader evidence on self-disclosure to chatbots. Limitations: all participants are Korean cochlear-implant users with mid-to-high interest in music and prior GenAI exposure. Findings may therefore not generalize to DHH users without cochlear implants, to non-Korean sociolinguistic contexts, or to users with lower digital literacy. Therapeutic efficacy over time was not assessed. DHH users were also not involved in the initial co-design phase, a gap the authors flag for future work.
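This summary names the state-step prompting framework with required-variable extraction but does not reproduce its details. The sketch below shows the general pattern only, assuming the LLM is separately asked to return the variables it extracted from each user turn as a dictionary. The four state names follow the therapy flow in the summary. The per-state variable names, `Session`, `update`, `system_prompt`, and the prompt wording are hypothetical. No model API is called, and the extraction output is passed in as a plain dict.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# State order follows the paper's four-state flow; the required variables are hypothetical.
STATES: List[str] = ["therapeutic_connection", "making_lyrics", "making_music", "song_discussion"]
REQUIRED: Dict[str, List[str]] = {
    "therapeutic_connection": ["current_emotion", "recent_event"],
    "making_lyrics": ["lyric_theme", "lyrics"],
    "making_music": ["genre", "mood"],
    "song_discussion": ["reflection"],
}


@dataclass
class Session:
    state: str = STATES[0]
    slots: Dict[str, str] = field(default_factory=dict)
    done: bool = False


def missing(session: Session) -> List[str]:
    """Required variables for the current state that have not been extracted yet."""
    return [v for v in REQUIRED[session.state] if v not in session.slots]


def update(session: Session, extracted: Dict[str, str]) -> Session:
    """Merge the current state's variables extracted from the user's last turn,
    then advance to the next state once every required variable is filled."""
    for key, value in extracted.items():
        if key in REQUIRED[session.state] and value:
            session.slots[key] = value
    if not missing(session):
        idx = STATES.index(session.state)
        if idx + 1 < len(STATES):
            session.state = STATES[idx + 1]
        else:
            session.done = True
    return session


def system_prompt(session: Session) -> str:
    """Build the per-turn instruction: current state plus the next variable to elicit."""
    todo = missing(session)
    target = todo[0] if todo else "closing remarks"
    return (
        f"You are a music-therapy assistant in the '{session.state}' stage. "
        f"Known so far: {session.slots}. Gently elicit: {target}. "
        "If the user seems stuck, offer example response options or a visual metaphor."
    )
```

Gating transitions on filled variables keeps the model from skipping stages and makes the flow testable without a model. This sketch discards variables that belong to later stages. A real system might retain them.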

Tags: music therapy · deaf accessibility · DHH · generative AI · conversational agent · songwriting · mental health · cochlear implant · LLM · co-design · multimodal · HCI