Diverse Search Methods and Multi-Modal Fusion for High-Performance Video Retrieval.
Symposium on Information and Communication Technology (2023)
Abstract
Querying events in large video datasets is a prominent research focus in multimedia information retrieval. High-performance retrieval in this setting requires extracting information from videos efficiently and storing it in a way that speeds up search. These challenges become more pronounced as the dataset grows. In this paper, we introduce a system for event querying in video data. Our system is designed to optimize retrieval speed and to organize storage efficiently, using FAISS and ElasticSearch. It can process diverse forms of input, including textual video descriptions, Optical Character Recognition (OCR) results, Automatic Speech Recognition (ASR) transcriptions, visually similar images, and details about objects in videos such as their color and quantity. Our system can also extract information about the temporal sequence of events in videos, a particularly challenging task when working from individual video frames. By combining these input types, our system delivers its best results.
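The abstract does not say how the results from the different input types are combined. As an illustration only, the sketch below uses reciprocal rank fusion, a common generic technique that is an assumption here and not necessarily the method used in the paper. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the frame identifiers are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of frame IDs into a single ranking.

    Each list is assumed to come from one modality (text, OCR, ASR, ...),
    ordered from most to least relevant. This is an illustrative fusion
    rule, not the one described in the paper.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, frame_id in enumerate(ranking, start=1):
            # A frame earns more score the higher it ranks in each list.
            scores[frame_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by frame ID for determinism.
    return sorted(scores, key=lambda f: (-scores[f], f))


# Hypothetical per-modality results (frame IDs are made up for the example).
text_hits = ["v1_f10", "v2_f03", "v3_f07"]
ocr_hits = ["v2_f03", "v1_f10"]
asr_hits = ["v2_f03", "v4_f01"]

fused = reciprocal_rank_fusion([text_hits, ocr_hits, asr_hits])
print(fused)
```

In this toy input, a frame that appears near the top of several modality lists ends up ahead of a frame that ranks first in only one, which is the behavior one would typically want from a multi-modal fusion step.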