Representation Learning For Background Music Identification In Television Shows

2019 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY CONVERGENCE (ICTC): ICT CONVERGENCE LEADING THE AUTONOMOUS FUTURE (2019)

Abstract
Although audio fingerprinting is widely used in various applications, the performance of audio fingerprinting methods degrades sharply when identifying background music mixed with speech in TV shows. To address this, we present an approach that learns embeddings for background music identification using deep convolutional networks. We construct a triplet dataset consisting of original songs, the same songs mixed with voices, and different songs. We then train the network with a triplet loss function with an adaptive margin. A nearest neighbor classifier finds the closest embedding among those of the original songs. A comparison of top-1 music identification accuracy shows that the embedding learned from each music segment mixed with speech carries meaningful information for music identification.
Keywords
background music identification, Siamese Network, triplet loss
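
The training objective and the lookup step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the adaptive margin rule, the distance metric, or the network, so the sketch assumes squared Euclidean distance, takes the margin as a plain input (scalar or one value per triplet), and uses hand-written 2-D vectors in place of network embeddings. The names `triplet_loss` and `nearest_reference` are hypothetical, as is the choice of the speech-mixed segment as the anchor.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin):
    """Hinge triplet loss on squared Euclidean distances.

    anchor, positive, negative: arrays of shape (n_triplets, dim).
    margin: scalar or array of shape (n_triplets,). How the paper adapts
    the margin is not stated in the abstract, so it is simply an input here.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=1)
    d_neg = np.sum((anchor - negative) ** 2, axis=1)
    return np.maximum(d_pos - d_neg + margin, 0.0)


def nearest_reference(query, references):
    """Return the index of the reference embedding closest to the query."""
    dists = np.linalg.norm(references - query, axis=1)
    return int(np.argmin(dists))


# Toy stand-ins for embeddings of three original songs (assumed values).
references = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])

# Toy embedding of a segment of song 0 mixed with speech (assumed value).
query = np.array([0.9, 0.2])
best = nearest_reference(query, references)

# One triplet: mixed segment (anchor), its original song (positive),
# and a different song (negative).
anchor = query[np.newaxis, :]
positive = references[[0]]
negative = references[[1]]
loss_small_margin = triplet_loss(anchor, positive, negative, margin=0.5)
loss_large_margin = triplet_loss(anchor, positive, negative, margin=2.0)
```

In this toy case the triplet already satisfies the smaller margin, so its loss is zero, while the larger margin still penalizes it. That is the kind of behavior a per-triplet margin would control.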