CNN Architectures for Large-Scale Audio Classification

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract
Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task.
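To give a feel for the scale quoted above, the following minimal sketch works out the average soundtrack length implied by the abstract's own figures. It assumes "70M" means exactly 70,000,000 videos and that the 5.24 million hours cover the training set only; the function and constant names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (70M is assumed to mean exactly 70,000,000).
NUM_TRAINING_VIDEOS = 70_000_000
TOTAL_TRAINING_HOURS = 5_240_000


def average_minutes_per_video(total_hours: float, num_videos: int) -> float:
    """Return the mean video length in minutes implied by the totals."""
    return total_hours * 60.0 / num_videos


avg_minutes = average_minutes_per_video(TOTAL_TRAINING_HOURS, NUM_TRAINING_VIDEOS)
print(f"Average soundtrack length: {avg_minutes:.2f} minutes")
```

Under these assumptions the training videos average roughly four and a half minutes each, so every video-level label is spread over several minutes of audio.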
Keywords
Acoustic Event Detection, Acoustic Scene Classification, Convolutional Neural Networks, Deep Neural Networks, Video Classification