An End-To-End Model from Speech to Clean Transcript for Parliamentary Meetings

2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021

Abstract
This paper presents an end-to-end approach for generating readable, clean text directly from the speech signal. Whereas conventional automatic speech recognition (ASR) systems are designed to reproduce utterances faithfully, word by word, we propose a model that emulates how a human transcriber or editor creates a clean transcript from speech: skipping fillers, replacing colloquial expressions with more formal ones, inserting punctuation, and making other types of corrections. An evaluation on 700 hours of Japanese parliamentary speech demonstrates the effectiveness of the proposed approach in generating clean text suitable for human consumption. We also show that forward-backward decoding and multitask learning that leverages approximate faithful transcripts significantly improve the performance of the direct mapping.
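The abstract does not state the training objective used for multitask learning. As a minimal sketch, assume it is a weighted sum of two token-level cross-entropy losses: one for the clean-transcript target and one for an auxiliary approximate faithful-transcript target. The names `token_cross_entropy`, `multitask_loss`, and `faithful_weight`, as well as the default weight of 0.3, are hypothetical illustrations, not values or code reported by the authors.

```python
import math
from typing import Sequence


def token_cross_entropy(probs_of_reference: Sequence[float]) -> float:
    """Average negative log-likelihood of the reference tokens.

    probs_of_reference[i] is the probability the model assigned to the
    i-th reference token (hypothetical inputs for illustration).
    """
    if not probs_of_reference:
        raise ValueError("need at least one token")
    return -sum(math.log(p) for p in probs_of_reference) / len(probs_of_reference)


def multitask_loss(
    clean_probs: Sequence[float],
    faithful_probs: Sequence[float],
    faithful_weight: float = 0.3,  # assumed hyperparameter, not from the paper
) -> float:
    """Interpolate the main (clean transcript) loss with an auxiliary
    (approximate faithful transcript) loss."""
    if not 0.0 <= faithful_weight <= 1.0:
        raise ValueError("faithful_weight must be in [0, 1]")
    clean = token_cross_entropy(clean_probs)
    faithful = token_cross_entropy(faithful_probs)
    return (1.0 - faithful_weight) * clean + faithful_weight * faithful


if __name__ == "__main__":
    # Toy per-token probabilities for a clean and a faithful reference.
    print(round(multitask_loss([0.9, 0.8], [0.5, 0.5], 0.3), 4))
```

Setting `faithful_weight` to 0 reduces the objective to training on clean transcripts alone, so the auxiliary faithful-transcript task acts only as an additional training signal.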