Consumer-Level Multimedia Event Detection Through Unsupervised Audio Signal Modeling

13th Annual Conference of the International Speech Communication Association 2012 (INTERSPEECH 2012), Vols 1-3 (2012)

Abstract
This work proposes a novel acoustic characterization approach to the multimedia event detection (MED) task for unconstrained, unstructured consumer-level videos, based on audio signal modeling. The key idea is to characterize the acoustic space of interest with a set of fundamental acoustic units, around which a set of acoustic segment models (ASMs) is built. MED is then addressed with a vector space modeling technique: an incoming audio signal is first decoded into a sequence of acoustic segments, a feature vector is generated from the co-occurrence statistics of the acoustic units, and the final MED decision is made by a vector space language classifier. Experiments on the TRECVID 2011 MED task demonstrate the viability of the proposed approach. Furthermore, the approach accounts for temporal dependencies better than previously proposed MFCC bag-of-words approaches.
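The pipeline described above (decode into acoustic units, build co-occurrence features, score with a vector space classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The ASM decoding step is not shown, so each clip is assumed to arrive already decoded into a list of unit labels (the labels `u1`, `u2`, ... and the event names are hypothetical). Co-occurrence statistics are taken to be unigram plus adjacent-pair (bigram) counts, and the vector space classifier is replaced by a simple nearest-centroid cosine scorer with a hypothetical per-event threshold of 0.5, which may differ from the classifier used in the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, ...]  # an n-gram of acoustic-unit labels
Vector = Dict[Feature, float]


def cooccurrence_vector(units: Sequence[str], max_order: int = 2) -> Vector:
    """Turn a decoded acoustic-unit sequence into an L2-normalized vector of
    n-gram counts (order 1 = unit frequencies, order 2 = adjacent co-occurrences)."""
    counts: Counter = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {k: c / norm for k, c in counts.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    # Both inputs are L2-normalized, so the sparse dot product is the cosine.
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def train_centroids(data: Dict[str, List[Sequence[str]]]) -> Dict[str, Vector]:
    """Sum the training vectors of each event and re-normalize
    (assumed nearest-centroid stand-in for the paper's vector space classifier)."""
    centroids: Dict[str, Vector] = {}
    for event, seqs in data.items():
        total: Counter = Counter()
        for seq in seqs:
            total.update(cooccurrence_vector(seq))  # adds float weights per feature
        norm = math.sqrt(sum(v * v for v in total.values()))
        centroids[event] = {k: v / norm for k, v in total.items()} if norm else {}
    return centroids


def detect(units: Sequence[str], centroids: Dict[str, Vector],
           threshold: float = 0.5) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """Score a clip against every event and make an independent yes/no
    decision per event (the threshold is a hypothetical value)."""
    vec = cooccurrence_vector(units)
    scores = {event: cosine(vec, c) for event, c in centroids.items()}
    decisions = {event: s >= threshold for event, s in scores.items()}
    return scores, decisions


# Toy usage: hypothetical unit labels and event names.
train = {
    "birthday_party": [["u1", "u2", "u3", "u1", "u2"], ["u1", "u2", "u1", "u2", "u3"]],
    "vehicle_repair": [["u7", "u8", "u7", "u9"], ["u8", "u7", "u8", "u9"]],
}
centroids = train_centroids(train)
scores, decisions = detect(["u1", "u2", "u3"], centroids)
print(decisions)  # {'birthday_party': True, 'vehicle_repair': False}
```

The adjacent-pair features are what let this representation retain some temporal order, which a plain bag of frame-level MFCC words discards; raising `max_order` would add longer-range co-occurrences at the cost of sparser vectors.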
Keywords
multimedia event detection, unsupervised audio modeling, acoustic segment models