Mel Spectrogram

Also known as: Mel-frequency Spectrogram, Log Mel Spectrogram

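A time-frequency representation of sound whose frequency axis is mapped onto the mel scale, which approximates how humans perceive pitch: the scale is roughly linear at low frequencies and logarithmic at high ones, so low frequencies get finer resolution and high frequencies are compressed, matching the non-linear sensitivity of human hearing. A mel spectrogram turns an audio signal into a two-dimensional image in which the x-axis represents time, the y-axis represents frequency (in mel bands), and color intensity represents the energy in each band, usually on a logarithmic (decibel) scale; that log-scaled variant is what "log mel spectrogram" refers to. In accessibility applications, mel spectrograms serve as input features for deep learning models that classify environmental sounds for deaf and hard of hearing users, enabling systems like smartwatch-based sound awareness tools. A frequently cited privacy property is that the transform is lossy: phase is discarded and neighboring frequencies are pooled into mel bands, so the original audio cannot be recovered exactly from the spectrogram features, which helps protect conversational privacy in always-listening assistive devices. Approximate reconstruction is still possible (for example with Griffin-Lim phase estimation or a neural vocoder), so the strength of this protection depends on how coarse the features are.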
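The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the pipeline of any particular sound awareness system: the function names, the HTK-style mel formula, and the parameter values (`n_fft=512`, `hop=256`, `n_mels=40`) are all assumptions chosen for the example, and production code would more likely rely on an audio library.

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel formula (assumed here; other conventions exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters whose edges are equally spaced on the mel scale,
    # so they are narrow at low frequencies and wide at high frequencies.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    top_mel = float(hz_to_mel(sr / 2.0))
    edges_hz = mel_to_hz(np.linspace(0.0, top_mel, n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=512, hop=256, n_mels=40):
    # Slice the signal into overlapping, Hann-windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # Power spectrum per frame; taking the magnitude discards the phase.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Pool FFT bins into mel bands, then convert to decibels.
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel_power + 1e-10).T  # shape: (n_mels, n_frames)

# Usage: one second of a 1 kHz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 1000 * t), sr)
print(spec.shape)  # rows are mel bands (y-axis), columns are time frames (x-axis)
```

The two lossy steps are visible in `log_mel_spectrogram`: taking the magnitude drops the phase, and the matrix product with the filterbank merges many FFT bins into each mel band. A classifier would typically receive the resulting array, or fixed-length slices of it, as its input image.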

Category: audio processing · machine learning · deaf accessibility

Related: Sound Classification · Sound Awareness · Sonification

Sources