Multi Transcription-Style Speech Transcription Using Attention-Based Encoder-Decoder Model

2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023

Abstract
Human professional transcription services offer a variety of transcription styles to meet different needs. To accommodate different users and facilitate seamless integration with downstream applications, we propose a framework that generates multi-style transcriptions in an attention-based encoder-decoder (AED) model using three different architectures: (A) style-dependent layers; (B) mixed-style output; (C) style-dependent prompt. In this framework, both the verbatim lexical transcription and readable transcriptions of various styles can be generated simultaneously or separately, through a single decoding pass or multiple on-demand decoding passes. We conduct experiments on a large-scale AED-based speech transcription system trained on 50k hours of speech. The proposed framework achieves nearly on-par performance with single-style AED models while delivering significant savings in model footprint and decoding cost. Moreover, it provides an efficient data-sharing mechanism across different styles through knowledge transfer.
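The style-dependent prompt idea (architecture C) can be illustrated with a minimal sketch: a special style token is prepended to the decoder's input so that a single AED model can be steered toward either the verbatim or a readable output style, and rerunning decoding with a different prompt yields another style on demand. All names and token strings below are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch of a style-dependent prompt for an AED decoder.
# The style token IDs and function names are made up for illustration.

STYLE_TOKENS = {
    "verbatim": "<style:verbatim>",   # exact lexical transcription
    "readable": "<style:readable>",   # formatted, readable transcription
}

def build_decoder_input(style, previous_tokens):
    """Prepend the chosen style's prompt token to the decoder token history,
    conditioning all subsequent decoding steps on that style."""
    if style not in STYLE_TOKENS:
        raise ValueError(f"unknown style: {style}")
    return [STYLE_TOKENS[style]] + list(previous_tokens)

# One decoding pass produces one chosen style:
print(build_decoder_input("readable", ["<sos>", "hello"]))
# A second on-demand pass simply reruns decoding with a different prompt:
print(build_decoder_input("verbatim", ["<sos>", "hello"]))
```

Because the style is injected only through the prompt token, the encoder and decoder weights are shared across styles, which is consistent with the footprint savings and cross-style knowledge transfer the abstract describes.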
Keywords
attention-based encoder-decoder model (AED), verbatim lexical transcription, readable transcription