Temporal coding with magnitude-phase regularization for sound event detection

Conference of the International Speech Communication Association (INTERSPEECH)(2022)

引用 0|浏览0
暂无评分
摘要
Sound Event Detection (SED) is the challenge of identifying sound events into their temporal boundaries as well as sound category. With recent advances in deep learning, more effective SED techniques are investigated through the annual challenge of Detection and Classification of Acoustic Scenes and Events (DCASE). Most SED systems rely on data-driven learning where a deep neural network is trained to minimize the error between model prediction and the truth. While this framework is generally effective at identifying sound classes present in an audio recording, it results in unreliable estimates of temporal information for identifying sound boundaries. In order to heighten the temporal precision, this paper proposes a novel temporal coding of magnitude and phase for embedding vectors in an intermediate layer. This coding is reflected as a regularization term in the objective function for training the model. The regularization allows magnitude of embedding vectors to increase near event boundaries, which represent the onset and offset points. Simultaneously, each of the boundaries are distinguishable from others using phase difference between two neighboring vectors. This approach results in notable improvement in timing sensitivity compared to a baseline system tested on SED task in the context of DCASE2021 challenge.
更多
查看译文
关键词
sound,regularization,event,detection,magnitude-phase
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要