Using iterative adaptation and dynamic mask for child speech extraction under real-world multilingual conditions

Speech Communication(2023)

Abstract
We develop two improvements over our previously proposed joint enhancement and separation (JES) framework for child speech extraction in real-world multilingual scenarios. First, we introduce an iterative adaptation based separation (IAS) technique, which iteratively fine-tunes the pre-trained separation model in JES on data from real scenes so that the model adapts to those conditions. Second, to purify the training data, we propose a dynamic mask separation (DMS) technique that uses variable-length movable windows to locate meaningful speech segments under a scale-invariant signal-to-noise ratio (SI-SNR) objective. Applying DMS on top of IAS, which we call DMS+IAS, removes much of the background noise and correctly locates speech regions in utterances recorded under real-world conditions. Evaluated on the BabyTrain corpus, the proposed IAS system achieves consistent improvements in extraction performance over the JES framework. Experimental results also show that DMS+IAS further improves the quality of separated child speech in real-world scenarios and achieves relatively good extraction performance in difficult cases where adult speech is mixed with child speech.
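For readers unfamiliar with the SI-SNR objective mentioned in the abstract, the following is a minimal sketch of the commonly used definition, written with NumPy. The function name `si_snr`, the `eps` stabilizer, and the mean removal step are assumptions made for illustration; the abstract does not specify how the authors compute the objective, nor how it is combined with the movable windows in DMS.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB between two equal-length 1-D signals.

    Illustrative helper only (hypothetical name and defaults), following
    the standard textbook definition rather than the paper's code.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Whatever is left after the projection is treated as noise.
    noise = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)
    return 10.0 * np.log10(ratio)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property that makes the measure scale-invariant.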
Keywords
Speech extraction, Child speech process, Dynamic mask, Iterative adaptation based separation, Real-world multilingual condition