Temporal Contrastive-Loss for Audio Event Detection

ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2022)

Abstract

Temporal coherence is a feature-binding mechanism whereby features that evolve together in time are grouped as belonging to the same object or event. Coherence has been studied extensively in biological systems, showing how the brain leverages this mechanism to perform complex tasks in real environments and to segregate complex sensory signals (wholes) into individual objects (parts), following Gestalt principles. Although these concepts are intuitive and computationally tractable, they have rarely been leveraged in audio technologies. Audio event detection, which specifically deals with identifying sound events in an audio recording, is therefore a natural avenue for exploring principles of temporal coherence. In this study, we propose coherence-based learning, formulated as a contrastive loss, to train event detection models: embeddings driven by acoustic events are constrained to be temporally coherent so as to maximize discriminability across events. This approach improves detection performance with no additional computational cost at inference and only a very small overhead during training.
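To make the idea of a coherence-based contrastive loss more concrete, here is a minimal sketch, not the authors' formulation. It assumes frame-level embeddings of shape (T, D) and one integer event label per frame. Frames sharing an event label are treated as positives, to be pulled together, and all other frames as negatives, to be pushed apart. This is a supervised-contrastive (SupCon-style) loss swapped in for illustration. The function name `coherence_contrastive_loss` and the `temperature` parameter are hypothetical.

```python
import numpy as np


def coherence_contrastive_loss(emb, labels, temperature=0.1):
    """Hypothetical SupCon-style loss over frame embeddings.

    Assumption (not from the paper): frames with the same event label should
    cohere in embedding space; frames of different events should separate.
    emb: (T, D) frame embeddings; labels: (T,) integer event label per frame.
    """
    z = np.asarray(emb, dtype=float)
    labels = np.asarray(labels)
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)
    # Positives: other frames carrying the same event label.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        return 0.0  # no anchor has a positive partner
    # L2-normalise so the dot product is cosine similarity.
    z = z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), 1e-12)
    sim = (z @ z.T) / temperature
    # Exclude each frame's similarity to itself from the softmax denominator.
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable row-wise log-softmax.
    shifted = sim - sim.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Mean log-probability assigned to positives, per anchor with positives.
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    loss = -pos_log_prob[valid] / pos_counts[valid]
    return float(loss.mean())


if __name__ == "__main__":
    frames = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
    # Labels that agree with the embedding clusters give a low loss;
    # labels that cut across the clusters give a high loss.
    print(coherence_contrastive_loss(frames, [0, 0, 1, 1]))
    print(coherence_contrastive_loss(frames, [0, 1, 0, 1]))
```

In training, such a term would typically be added to the detection objective with a weighting factor, which is also an assumption here. Because it only shapes the embeddings, it adds nothing at inference, which is consistent with the abstract's claim of no extra inference cost.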

Keywords

Audio event detection, temporal coherence, contrastive learning, DCASE challenge