Efficient Stuttering Event Detection Using Siamese Networks

ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023

Abstract
Speech disfluency research is pivotal to accommodating atypical speakers in mainstream conversational technology. However, the lack of publicly available labeled and unlabeled datasets is a significant bottleneck to such research. While many works use pseudo dysfluency data with proxy labels and formulate a self-supervised task, we see merit in using real-world data. In this work, we consolidate the corpora of publicly available speech disfluency datasets, with and without labels, and propose DisfluentSiam, an efficient siamese network-based small-scale pretraining pipeline that uses task-specific data from multiple domains and has only 10M trainable parameters. We show that DisfluentSiam achieves an average 15% boost in performance across five types of dysfluency event detection compared to direct wav2vec 2.0 embeddings. In particular, with only 4-5 minutes of labeled data for fine-tuning, DisfluentSiam demonstrates the advantage of task-specific pretraining with up to 25% higher accuracy.
Keywords
Dysfluency,Self-supervised Learning
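
The abstract describes a siamese pretraining pipeline but does not spell out its objective. As a rough illustration of the general siamese idea only, the sketch below passes two views of the same input through one shared encoder and scores their agreement with a negative cosine similarity. The encoder, the loss, the toy weights, and all names (`encode`, `siamese_loss`) are assumptions made for this example and are not taken from the paper.

```python
import math

def encode(frame, weights):
    """Hypothetical shared encoder: one linear layer followed by tanh."""
    return [math.tanh(sum(w * x for w, x in zip(row, frame))) for row in weights]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def siamese_loss(view_a, view_b, weights):
    """Negative cosine similarity between the two branch outputs.

    Both branches use the same weights, which is what makes the
    setup siamese. The choice of loss is an assumption.
    """
    return -cosine(encode(view_a, weights), encode(view_b, weights))

# Toy stand-ins for an input feature vector and a perturbed view of it.
weights = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
clean = [1.0, 2.0, 3.0]
perturbed = [1.1, 1.9, 3.2]

loss_same = siamese_loss(clean, clean, weights)
loss_perturbed = siamese_loss(clean, perturbed, weights)
print(loss_same, loss_perturbed)
```

Identical views reach the minimum of this loss, and a perturbed view scores slightly worse; training would adjust the shared weights to pull matching views together.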