Towards Environmental Preference Based Speech Enhancement For Individualised Multi-Modal Hearing Aids
CoRR (2024)
Abstract
Since the advent of Deep Learning (DL), Speech Enhancement (SE) models have
performed well under a variety of noise conditions. However, such systems may
still introduce sonic artefacts, sound unnatural, and restrict a user's
ability to hear important ambient sounds. Hearing Aid (HA) users
may wish to customise their SE systems to suit their personal preferences and
day-to-day lifestyle. In this paper, we introduce a preference learning based
SE (PLSE) model for future multi-modal HAs that can contextually exploit audio
information to improve listening comfort, based upon the preferences of the
user. The proposed system estimates the Signal-to-Noise Ratio (SNR) as a basic
objective speech quality measure that quantifies the relative amount of
background noise present in speech and correlates directly with the
intelligibility of the signal. Additionally, to provide contextual
information, we predict the acoustic scene in which the user is situated.
Both tasks are performed by a single multi-task DL model which, by jointly
leveraging a shared encoded feature space, outperforms models that infer the
acoustic scene or SNR separately (a sketch of such an architecture follows
the abstract). These environmental inferences are exploited in a
preference elicitation framework that learns a set of linear predictive
functions to determine the target SNR of an Audio-Visual (AV) SE system (see
the second sketch below). By greatly reducing noise in challenging listening
conditions, and through a novel scaling of the SE model's output, we are able
to provide HA users with contextually individualised SE. Preliminary results
suggest an improvement over the non-individualised baseline model for some
participants.
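The paper itself includes no code, but the multi-task setup described above can be illustrated with a short sketch. The following PyTorch model is a minimal, hypothetical rendering of the idea: a shared encoder over a log-mel spectrogram feeds both an SNR regression head and an acoustic scene classification head. The layer sizes, input shape, and head designs are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal sketch of a multi-task model with a shared encoder, an SNR
# regression head, and an acoustic scene classification head. All layer
# sizes, the log-mel input shape, and the head designs are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class SharedEncoderMultiTask(nn.Module):
    def __init__(self, n_scenes: int = 10):
        super().__init__()
        # Shared encoder: both tasks draw on the same feature space.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> (B, 64, 1, 1)
            nn.Flatten(),             # -> (B, 64)
        )
        # Task-specific heads: scalar SNR estimate and scene class logits.
        self.snr_head = nn.Linear(64, 1)
        self.scene_head = nn.Linear(64, n_scenes)

    def forward(self, log_mel: torch.Tensor):
        # log_mel: (batch, 1, n_mels, time_frames)
        z = self.encoder(log_mel)
        return self.snr_head(z).squeeze(-1), self.scene_head(z)

# Joint training combines a regression loss (SNR) with a classification
# loss (acoustic scene) over the shared features; dummy batch for demo.
model = SharedEncoderMultiTask()
snr_pred, scene_logits = model(torch.randn(8, 1, 64, 100))
loss = nn.functional.mse_loss(snr_pred, torch.randn(8)) \
     + nn.functional.cross_entropy(scene_logits, torch.randint(0, 10, (8,)))
loss.backward()
```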
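Similarly, the preference stage can be sketched as a per-user linear function that maps the environmental inferences (estimated input SNR and acoustic scene) to a preferred target SNR, with the SE output then scaled against the residual noise to approximate that target. The feature encoding, the weight vector `w`, and the mixing scheme below are assumptions for illustration; the paper's exact formulation may differ.

```python
# Sketch of the preference stage: a per-user linear function maps the
# environmental inferences (estimated SNR, one-hot acoustic scene) to a
# target output SNR, and the SE output is mixed with the residual noise
# to approximate that target. Feature encoding, weights, and the mixing
# scheme are illustrative assumptions.
import numpy as np

def target_snr(est_snr_db: float, scene_id: int, w: np.ndarray,
               n_scenes: int = 10) -> float:
    """Linear predictive function: features -> preferred target SNR (dB)."""
    scene_onehot = np.eye(n_scenes)[scene_id]
    features = np.concatenate(([1.0, est_snr_db], scene_onehot))  # bias, SNR, scene
    return float(w @ features)

def scale_output(enhanced: np.ndarray, noisy: np.ndarray,
                 target_db: float) -> np.ndarray:
    """Re-add a scaled residual (noisy - enhanced) so the output retains
    some ambient sound at roughly the target SNR."""
    residual = noisy - enhanced
    p_speech = np.mean(enhanced ** 2) + 1e-12
    p_resid = np.mean(residual ** 2) + 1e-12
    # Choose gain g so that 10*log10(p_speech / (g^2 * p_resid)) == target_db.
    g = np.sqrt(p_speech / (p_resid * 10 ** (target_db / 10)))
    return enhanced + g * residual

# Example: hypothetical weights for a user who tolerates more ambient
# sound; in the paper these would come from preference elicitation.
w = np.zeros(12)
w[0], w[1] = 8.0, 0.5  # bias and weight on the estimated input SNR
out = scale_output(np.random.randn(16000), np.random.randn(16000),
                   target_snr(0.0, 3, w))
```

Mixing the enhanced signal with a scaled residual, rather than suppressing noise outright, is one plausible reading of "scaling the output of the SE model": it preserves the ambient sound the abstract notes users may still wish to hear.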