Towards ASR Robust Spoken Language Understanding Through In-Context Learning With Word Confusion Networks
CoRR(2024)
Abstract
In the realm of spoken language understanding (SLU), numerous natural
language understanding (NLU) methodologies have been adapted by supplying large
language models (LLMs) with transcribed speech instead of conventional written
text. In real-world scenarios, prior to input into an LLM, an automated speech
recognition (ASR) system generates an output transcript hypothesis, where
inherent errors can degrade subsequent SLU tasks. Here we introduce a method
that utilizes the ASR system's lattice output instead of relying solely on the
top hypothesis, aiming to encapsulate speech ambiguities and enhance SLU
outcomes. Our in-context learning experiments, covering spoken question
answering and intent classification, underline the LLM's resilience to noisy
speech transcripts with the help of word confusion networks from lattices,
bridging the SLU performance gap between using the top ASR hypothesis and an
oracle upper bound. Additionally, we delve into the LLM's robustness to varying
ASR performance conditions and scrutinize the aspects of in-context learning
which prove the most influential.
MoreTranslated text
Key words
spoken language understanding,in-context learning,zero-shot learning,large language models. ASR confusion networks
AI Read Science
Must-Reading Tree
Example
Generate MRT to find the research sequence of this paper
Chat Paper
Summary is being generated by the instructions you defined