Automatic Speech Recognition System-Independent Word Error Rate Estimation
International Conference on Language Resources and Evaluation(2024)
摘要
Word error rate (WER) is a metric used to evaluate the quality of
transcriptions produced by Automatic Speech Recognition (ASR) systems. In many
applications, it is of interest to estimate WER given a pair of a speech
utterance and a transcript. Previous work on WER estimation focused on building
models that are trained with a specific ASR system in mind (referred to as ASR
system-dependent). These are also domain-dependent and inflexible in real-world
applications. In this paper, a hypothesis generation method for ASR
System-Independent WER estimation (SIWE) is proposed. In contrast to prior
work, the WER estimators are trained using data that simulates ASR system
output. Hypotheses are generated using phonetically similar or linguistically
more likely alternative words. In WER estimation experiments, the proposed
method reaches a similar performance to ASR system-dependent WER estimators on
in-domain data and achieves state-of-the-art performance on out-of-domain data.
On the out-of-domain data, the SIWE model outperformed the baseline estimators
in root mean square error and Pearson correlation coefficient by relative
17.58
was further improved when the WER of the training set was close to the WER of
the evaluation dataset.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要