Adaptive Hierarchical Pooling for Weakly-supervised Sound Event Detection
International Multimedia Conference (2022)
Abstract
In weakly-supervised sound event detection (WSED), the ground truth for the training data indicates only whether each sound event is present or absent at the clip level (i.e., there are no frame-level annotations). Recently, WSED has been formulated within the multi-instance learning framework, and a critical component of this formulation is the design of the temporal pooling function. In this paper, we propose adaptive hierarchical pooling (HiPool) for WSED, which combines the advantages of max pooling in audio tagging and weighted average pooling in audio localization through a novel hierarchical structure, and learns event-wise optimal pooling functions through joint optimization based on continuous relaxation. Extensive experiments on benchmark datasets show that HiPool outperforms current pooling methods and greatly improves WSED performance. HiPool is also general: it can be plugged into any WSED model.
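The idea of combining the two pooling functions can be illustrated with a minimal sketch. This is not the authors' implementation; the abstract does not specify the hierarchy, so the code assumes one plausible reading: a weighted average (here, linear softmax) inside contiguous segments, a max across segments, and a softmax-weighted mixture over candidate segment counts as the continuous relaxation. All function names, the segment scheme, and the `logits` parameter are hypothetical.

```python
import math

def max_pool(probs):
    """Clip-level probability is the largest frame-level probability."""
    return max(probs)

def linear_softmax_pool(probs):
    """Weighted average where each frame is weighted by its own probability."""
    total = sum(probs)
    return sum(p * p for p in probs) / total if total > 0 else 0.0

def hierarchical_pool(probs, num_segments):
    """ASSUMED hierarchy: weighted average within segments, max across them.

    num_segments == 1 reduces to linear softmax pooling;
    num_segments == len(probs) reduces to max pooling.
    """
    n = len(probs)
    # Split the frames into contiguous, near-equal segments.
    bounds = [round(i * n / num_segments) for i in range(num_segments + 1)]
    segments = [probs[bounds[i]:bounds[i + 1]] for i in range(num_segments)]
    return max(linear_softmax_pool(s) for s in segments if s)

def relaxed_hierarchical_pool(probs, candidates, logits):
    """ASSUMED relaxation: softmax mixture over candidate segment counts.

    In training, `logits` would be learnable per event class, so each
    event could settle on its own pooling behaviour.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return sum(e / z * hierarchical_pool(probs, k)
               for e, k in zip(exps, candidates))

# Frame-level probabilities for one event class in one clip.
frame_probs = [0.1, 0.9, 0.8, 0.1]
clip_prob = relaxed_hierarchical_pool(frame_probs, [1, 2, 4], [0.0, 0.0, 0.0])
```

With equal logits the mixture is a plain average of the three candidate poolings; pushing one logit much higher than the others recovers the corresponding single pooling function, which is how a discrete choice could be read off after training under these assumptions.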
Key words
sound event detection, adaptive hierarchical pooling, weakly-supervised