Effective representations for leveraging language content in multimedia event detection

ICASSP (2014)

Abstract
Language content in videos, from speech and from overlaid or in-scene video text, can provide high-precision signals for video event detection and retrieval. However, sporadic occurrence, content that is unrelated to the events of interest, and high error rates of current speech and text recognition systems on consumer-domain video make it difficult to exploit these channels. In this paper, we study different representations of language content to address these challenges. First, we utilize likelihood-weighted word lattices obtained from a Hidden Markov Model (HMM) based decoding engine to encode many alternate hypotheses, rather than relying on noisy single-best hypotheses. Second, we utilize an event-independent modified term frequency-inverse document frequency (TF-IDF) weighting scheme to obtain the final feature vector. We present detailed experimental results on the TRECVID MED 2013 dataset containing ~150,000 videos, and show that our representation significantly outperforms alternate representations for both speech and video text.
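The two ideas in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the modified TF-IDF scheme, so a standard smoothed IDF is swapped in as a stand-in, and the lattice is simplified to a flat list of (word, posterior) arcs. The function names `expected_counts`, `idf_table`, and `tfidf_vector`, and all the sample data, are hypothetical.

```python
import math
from collections import defaultdict


def expected_counts(lattice_arcs):
    """Sum arc posteriors per word, giving an expected count for each word
    across all hypotheses instead of a count from one best hypothesis."""
    counts = defaultdict(float)
    for word, posterior in lattice_arcs:
        counts[word] += posterior
    return dict(counts)


def idf_table(docs):
    """Smoothed IDF over the whole collection. It uses no event labels,
    so it is event-independent. (Assumed stand-in for the paper's scheme.)"""
    n = len(docs)
    df = defaultdict(int)
    for counts in docs:
        for word in counts:
            df[word] += 1
    return {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}


def tfidf_vector(counts, idf):
    """Weight expected counts by IDF, then L2-normalize the vector."""
    vec = {w: c * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else vec


# Hypothetical lattice arcs for two videos: (word, posterior probability).
video_a = expected_counts(
    [("birthday", 0.6), ("birthday", 0.3), ("party", 0.8), ("the", 0.9)]
)
video_b = expected_counts([("tire", 0.7), ("the", 0.95)])

idf = idf_table([video_a, video_b])
vec_a = tfidf_vector(video_a, idf)
```

In this toy data, "the" appears in both videos and so receives a lower IDF than "birthday", which is the behavior one would want from any collection-level weighting of noisy recognizer output.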
Keywords
video signal processing,multimedia computing,text recognition systems,video event detection,speech recognition,term frequency inverse document frequency,language content representation,video event retrieval,multimedia event detection,hmm,tf-idf,overlaid video text,lattices,hidden markov model,video text ocr,inscene video text,natural language processing,leveraging language content,text analysis,sporadic occurrence,hidden markov models,video retrieval,speech,tf idf