Interspeech 2019: A Simple Technique for Fairer Speech Emotion Recognition

Dr. John Kane

Next week marks the beginning of the 2019 Interspeech Conference in Graz, Austria, where Cogito researchers will present a paper examining gender bias in speech emotion recognition. We propose a simple model training procedure that is both effective at mitigating bias and more stable during training than a highly cited baseline method.

We use a standard neural network model, based on 2D convolutional layers applied to Mel frequency coefficients, trained to recognize emotional activation on a dataset of 33,000+ naturally occurring utterances from radio shows. With this model, we demonstrate that performance is more favorable for male speakers than for female speakers. A popular de-biasing approach, previously proposed by researchers at Google and Stanford (see paper), is found to be effective at improving fairness across gender, but at the cost of highly unstable model training and reduced accuracy.
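To make the idea of a performance gap between genders concrete, here is a minimal sketch of one way such a gap could be measured. It is not the evaluation used in the paper: the choice of recall as the metric, the binary high/low activation labels, the function name `recall_by_group`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups, positive=1):
    """Recall of the positive class, computed separately for each group.

    Hypothetical helper for illustration; not taken from the paper.
    """
    hits = defaultdict(int)    # correctly detected positives per group
    totals = defaultdict(int)  # all true positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            totals[group] += 1
            if pred == positive:
                hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Toy labels (1 = high activation, 0 = low activation); invented data.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]
groups = ["male"] * 6 + ["female"] * 6

recalls = recall_by_group(y_true, y_pred, groups)
gap = recalls["male"] - recalls["female"]
print(recalls, gap)
```

A gap near zero would indicate that the model detects high activation about equally well for both groups. A de-biasing method aims to shrink this gap without giving up too much overall accuracy.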

