AG-LSEC: Audio Grounded Lexical Speaker Error Correction
arXiv (2024)
Abstract
In traditional speech transcription pipelines, Speaker Diarization (SD) systems
are typically audio-based and operate independently of the ASR system. Speaker
errors can arise from SD errors and/or from the reconciliation of the SD and
ASR outputs, especially around speaker turns and regions of speech overlap. To
reduce these errors, Lexical Speaker Error Correction (LSEC), in which an
external language model provides lexical information to correct the speaker
errors, was recently proposed. Though the approach achieves good Word
Diarization Error Rate (WDER) improvements, it does not use any additional
acoustic information and is prone to miscorrections. In this paper, we propose
to enhance and acoustically ground
the LSEC system with speaker scores directly derived from the existing SD
pipeline. This approach achieves significant relative WDER reductions in the
range of 25-40%
by 15-25%
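
The abstract reports its results as relative WDER reductions without defining the metric. As a rough illustration, the sketch below computes WDER as it is commonly defined in the literature: among hypothesis words that align to a reference word (correct or substituted), the fraction carrying the wrong speaker label. ASR insertions and deletions have no aligned counterpart and are left out. The `AlignedWord` type, the function names, and the toy transcript are hypothetical and are not taken from the paper, whose exact evaluation setup may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlignedWord:
    """One reference/hypothesis word pair from an ASR alignment (hypothetical type)."""
    ref_word: str
    hyp_word: str
    ref_speaker: str
    hyp_speaker: str


def wder(aligned: Sequence[AlignedWord]) -> float:
    """Fraction of aligned words (correct or substituted) with a wrong speaker label."""
    if not aligned:
        raise ValueError("WDER is undefined for an empty alignment")
    wrong = sum(1 for w in aligned if w.ref_speaker != w.hyp_speaker)
    return wrong / len(aligned)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction of an error rate with respect to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - improved) / baseline


# Toy example: speaker B's turn starts at "fine", but the baseline system
# still attributes "fine" and "thanks" to speaker A.
baseline_words = [
    AlignedWord("how", "how", "A", "A"),
    AlignedWord("are", "are", "A", "A"),
    AlignedWord("you", "you", "A", "A"),
    AlignedWord("doing", "doing", "A", "A"),
    AlignedWord("fine", "fine", "B", "A"),      # wrong speaker
    AlignedWord("thanks", "thanks", "B", "A"),  # wrong speaker
    AlignedWord("and", "an", "B", "B"),         # substituted word, right speaker
    AlignedWord("you", "you", "B", "B"),
]

# A hypothetical corrected output that fixes one of the two speaker errors.
corrected_words = list(baseline_words)
corrected_words[4] = AlignedWord("fine", "fine", "B", "B")

baseline_wder = wder(baseline_words)    # 2 wrong labels out of 8 words
corrected_wder = wder(corrected_words)  # 1 wrong label out of 8 words
reduction = relative_reduction(baseline_wder, corrected_wder)
```

In this toy case WDER drops from 0.25 to 0.125, a 50% relative reduction. The numbers are purely illustrative and say nothing about the paper's datasets or results.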