ASR N-BEST FUSION NETS

2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

Cited 49 | Views 31
Abstract
Current spoken language understanding (SLU) systems rely heavily on the best hypothesis (ASR 1-best) generated by automatic speech recognition, which serves as the input to downstream models such as natural language understanding (NLU) modules. However, errors and misrecognitions in the ASR 1-best pose challenges to NLU: without additional signals, NLU models usually cannot recover from ASR errors, which leads to suboptimal SLU performance. This paper proposes a fusion network that jointly considers the ASR n-best hypotheses for enhanced robustness to ASR errors. Our experiments on Alexa data show that our model achieves a 21.71% error reduction on domain classification compared to a baseline trained on transcriptions.
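The core idea of the abstract — pooling information from all n-best hypotheses rather than trusting the 1-best alone — can be illustrated with a minimal sketch. The bag-of-words encoder, softmax weighting, and toy vocabulary below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

def embed(hypothesis, vocab):
    """Bag-of-words embedding of one ASR hypothesis (toy encoder)."""
    vec = np.zeros(len(vocab))
    for word in hypothesis.split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec

def fuse_nbest(hypotheses, scores, vocab):
    """Fuse n-best hypotheses by score-weighted average pooling.

    `scores` stand in for ASR confidences; weighting lets lower-ranked
    hypotheses compensate for misrecognitions in the 1-best.
    """
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over scores
    embs = np.stack([embed(h, vocab) for h in hypotheses])
    return weights @ embs  # pooled representation fed to the NLU classifier

# Toy example: the 1-best misrecognizes "weather" as "whether".
vocab = {w: i for i, w in enumerate(
    ["play", "music", "whats", "the", "weather", "whether"])}
nbest = ["whats the whether", "whats the weather", "play the weather"]
scores = np.array([0.5, 0.4, 0.1])
fused = fuse_nbest(nbest, scores, vocab)
# The fused vector keeps mass on "weather", which the 1-best alone lacks.
```

The fused representation would then be passed to a downstream classifier (e.g., for domain classification); the 1-best baseline corresponds to using only the top hypothesis's embedding.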
Keywords
spoken language understanding system, automatic speech recognition, downstream models, natural language understanding, NLU models, suboptimal SLU performance, ASR n-best fusion nets, ASR 1-best