Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization

2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) (2023)

Abstract
We propose a modular pipeline for the single-channel separation, recognition, and diarization of meeting-style recordings and evaluate it on the Libri-CSS dataset. Using a Continuous Speech Separation (CSS) system with a TF-GridNet separation architecture, followed by a speaker-agnostic speech recognizer, we achieve state-of-the-art recognition performance in terms of Optimal Reference Combination Word Error Rate (ORC WER). Then, a d-vector-based diarization module is employed to extract speaker embeddings from the enhanced signals and to assign the CSS outputs to the correct speaker. Here, we propose a syntactically informed diarization using sentence- and word-level boundaries of the ASR module to support speaker turn detection. This results in a state-of-the-art Concatenated minimum-Permutation Word Error Rate (cpWER) for the full meeting recognition pipeline.
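
The cpWER metric named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes the commonly used definition: all utterances of each speaker are concatenated for both reference and hypothesis, the word-level edit distance is summed over speaker pairs, and the speaker permutation with the fewest errors is chosen. The function names `edit_distance` and `cp_wer`, and the dictionary input format, are hypothetical choices made for illustration.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cp_wer(reference, hypothesis):
    """Concatenated minimum-permutation WER (illustrative sketch).

    Both arguments map a speaker label to that speaker's utterances
    (strings) in temporal order. Labels need not match across the two.
    """
    # Concatenate each speaker's utterances into one token stream.
    refs = [" ".join(utts).split() for utts in reference.values()]
    hyps = [" ".join(utts).split() for utts in hypothesis.values()]

    # Pad with empty streams so both sides have the same speaker count.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]

    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")

    # Brute force over speaker assignments; fine for a handful of speakers.
    best_errors = min(
        sum(edit_distance(r, hyps[k]) for r, k in zip(refs, perm))
        for perm in permutations(range(n))
    )
    return best_errors / total_ref_words


reference = {"A": ["hello world", "how are you"], "B": ["fine thanks"]}
hypothesis = {"spk1": ["fine thanks"], "spk2": ["hello word", "how are you"]}
print(cp_wer(reference, hypothesis))
```

Because the error count includes words attributed to the wrong speaker, cpWER penalizes diarization mistakes as well as recognition mistakes, which is why it suits the full pipeline. The brute-force search grows factorially with the number of speakers; an assignment solver would be the usual replacement for larger meetings.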
Key words
Speech Separation, Speech Recognition, Diarization, Libri-CSS, Meeting Separation