Investigating Waveform and Spectrogram Feature Fusion for Acoustic Scene Classification (Technical Report)

semanticscholar(2021)

Abstract
This technical report presents our submitted system for the DCASE 2021 Challenge Task 1B: Audio-Visual Scene Classification. Focusing on the audio modality only, we investigate two feature representations common in the audio understanding domain, the raw waveform and the Mel-spectrogram, and measure how complementary they are when fused. We introduce a new model paradigm for acoustic scene classification that fuses features learned from Mel-spectrograms and from the raw waveform in separate feature extraction branches. Our experimental results show that the proposed fusion model achieves a 4.5% increase in validation accuracy and a reduction of 0.14 in validation loss relative to the Task 1B baseline audio-only sub-network. We further show that the learned features of raw waveforms and Mel-spectrograms are indeed complementary, and that the fusion model consistently improves classification performance over models trained on Mel-spectrograms alone.
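To make the two-branch idea concrete, the following is a minimal sketch of feature fusion, not the authors' implementation. The abstract does not state how the branch outputs are combined, so concatenation followed by a linear classifier is an assumption here. The branch embeddings are random stand-ins for learned features, and the names `fuse_by_concatenation`, `WAVE_DIM`, `MEL_DIM`, and `NUM_CLASSES` are hypothetical.

```python
import math
import random


def fuse_by_concatenation(wave_features, mel_features):
    """Join the two branch embeddings into one fused feature vector.

    Concatenation is an assumed fusion rule; the abstract does not specify one.
    """
    return list(wave_features) + list(mel_features)


def linear(x, weights, bias):
    """Apply a dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Convert logits to class probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)

# Hypothetical sizes chosen only for illustration.
WAVE_DIM = 4      # embedding size of the raw-waveform branch
MEL_DIM = 6       # embedding size of the Mel-spectrogram branch
NUM_CLASSES = 10  # assumed number of scene classes

# Random stand-ins for the features each branch would learn.
wave_features = [rng.gauss(0.0, 1.0) for _ in range(WAVE_DIM)]
mel_features = [rng.gauss(0.0, 1.0) for _ in range(MEL_DIM)]

# Fuse the two representations, then classify the fused vector.
fused = fuse_by_concatenation(wave_features, mel_features)
weights = [[rng.gauss(0.0, 0.1) for _ in range(len(fused))]
           for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES
probs = softmax(linear(fused, weights, bias))
```

In a trained system the two embeddings would come from separate feature extraction branches, and the classifier would see both at once, which is what lets complementary information from the two representations contribute to a single prediction.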