The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Abstract
In this paper, we discuss the rationale of the Multimodal Information Based Speech Processing (MISP) Challenge and provide a detailed description of the recorded data, the two evaluation tasks, and the corresponding baselines, followed by a summary of the submitted systems and evaluation results. The MISP Challenge aims to tackle speech processing tasks in different scenarios by introducing information from an additional modality (e.g., video or text), which will hopefully lead to better environmental and speaker robustness in realistic applications. In the first MISP Challenge, two benchmark datasets recorded in a real home TV room, together with two reproducible open-source baseline systems, have been released to promote research in audio-visual wake word spotting (AVWWS) and audio-visual speech recognition (AVSR). To our knowledge, MISP is the first open evaluation challenge to tackle real-world issues of AVWWS and AVSR in the home TV scenario.
Keywords
MISP challenge, microphone array, audiovisual, automatic speech recognition, wake word spotting