Acoustic Scene Classification Using Aggregation of Two-Scale Deep Embeddings.

Ho Ka Chon, Yanxiong Li, Wenchang Cao, Qisheng Huang, Wei Xie, Wen-Feng Pang, Jiyue Wang

ICCT (2021)


Abstract

Acoustic scene classification (ASC) is a machine listening task that aims to recognize an audio recording and assign it a predefined label describing the scene location. In most state-of-the-art ASC works, hand-crafted features and single-scale deep embeddings are adopted as the input of back-end classifiers. Inspired by the success of multi-scale deep embeddings in computer vision, we propose an ASC method that aggregates two-scale deep embeddings learned independently by two convolutional neural networks (CNNs). We perform ASC experiments on two official datasets of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE), namely DCASE-2019 and DCASE-2017. Experimental results show that aggregating two-scale deep embeddings improves the performance of the ASC system. Compared to the baseline system, the proposed method improves classification accuracy by 0.11 on DCASE-2019 and by 0.09 on DCASE-2017. Code is available at https://github.com/hokachon/Two-scale-Agg.
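
As a rough illustration of what aggregating two embeddings of the same clip can look like, the sketch below normalizes each embedding and concatenates them before they would be passed to a back-end classifier. The abstract does not state the aggregation operator, so concatenation, the L2 normalization step, the function names, and the toy vectors are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def aggregate_two_scale(emb_a, emb_b):
    """Aggregate two embeddings by normalizing each and concatenating them.

    Assumption: concatenation is used here only as one plausible operator;
    the abstract does not specify how the embeddings are aggregated.
    """
    return l2_normalize(emb_a) + l2_normalize(emb_b)


# Hypothetical embeddings of one audio clip, standing in for the outputs of
# two independently trained CNNs (toy values, not real data).
emb_scale_1 = [3.0, 4.0]
emb_scale_2 = [0.0, 2.0, 0.0]

aggregated = aggregate_two_scale(emb_scale_1, emb_scale_2)
print(len(aggregated))  # 5
```

Normalizing before concatenation keeps either embedding from dominating the aggregated vector purely because of its magnitude, which is one common reason to do it; whether it helps here would need to be checked against the released code.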


Key words

acoustic scene classification, two-scale deep embedding, convolutional neural network
