MAuD: a multivariate audio database of samples collected from benchmark conferencing platforms

Multimedia Tools and Applications (2023)

Abstract
This paper presents a unique audio database, the Multivariate Audio Database (MAuD), in which audio data were collected in real-life scenarios. MAuD contains 229 audio files, each approximately 5 minutes long, collected across different conferencing applications, spoken languages, background noises, and discussion topics. Several audio conferencing applications were used to collect these data, e.g., mobile conference calls, Zoom, Google Meet, Skype, and Hangout. Speakers of different ages and sexes spoke in several languages and on various topics, and the audio was recorded on the device of one of the speakers. Background noises were then introduced synthetically. Researchers may find this database useful for several signal processing experiments, e.g., conferencing app identification, background noise identification, speaker identification, and identification of who speaks when. We have explored classification for some of the above-mentioned mismatch cases (conferencing app and background noise), using pre-trained deep learning models (ResNet18 and DenseNet201). We achieved more than 98% accuracy in both experiments, which confirms that MAuD preserves high-quality, audio-specific properties.
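The abstract describes classifying MAuD recordings with pre-trained CNNs such as ResNet18 and DenseNet201; image-style CNNs are typically applied to audio by first converting each clip into a time-frequency representation. As an illustrative sketch only (the paper does not specify its exact feature pipeline), the snippet below computes a magnitude spectrogram of a synthetic tone using a naive DFT, the kind of 2-D representation that could then be fed to such a model:

```python
import math
import cmath

def spectrogram(signal, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive DFT.

    Illustrative only: a real pipeline would use an FFT and
    typically a mel filterbank before feeding a CNN.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Hann window to reduce spectral leakage at frame edges
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        # Keep only the non-negative frequency bins (signal is real)
        bins = []
        for k in range(frame_len // 2 + 1):
            acc = sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            bins.append(abs(acc))
        frames.append(bins)
    return frames  # shape: (num_frames, frame_len // 2 + 1)

# Synthetic 440 Hz tone sampled at 8 kHz, standing in for a MAuD clip
sr = 8000
signal = [math.sin(2 * math.pi * 440 * t / sr) for t in range(512)]
spec = spectrogram(signal)
```

With a 64-sample frame at 8 kHz, each bin spans 125 Hz, so the 440 Hz tone produces a peak around bins 3-4 of every frame; stacking frames yields the image-like array a pre-trained CNN would consume after resizing and channel replication.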
Keywords
Voice calling platforms, Audio conferencing, Deep learning, CNN, ResNet, DenseNet, Signal processing