Indexed Dataset from YouTube for a Content-Based Video Search Engine

International Journal of Intelligent Computing and Information Sciences(2021)

引用 2|浏览4
暂无评分
摘要
Numerous researches on content-based video indexing and retrieval besides video search engines are tied to a large-scaled video dataset. Unfortunately, reduction in open-sourced datasets resulted in complications for novel approaches exploration. Although, video datasets that index video files located on public video streaming services have other purposes, such as annotation, learning, classification, and other computer vision areas, with little interest in indexing public video links for purpose of searching and retrieval. This paper introduces a novel large-scaled dataset based on YouTube video links to evaluate the proposed content-based video search engine, gathered 1088 videos, that represent more than 65 hours of video, 11,000 video shots, and 66,000 unmarked and marked keyframes, 80 different object names used for marking. Moreover, a state-of-the-art features vector, and combinational-based matching, beneficial to the accuracy, speed, and precision of the video retrieval process. Any video record in the dataset is represented by three features: temporal combination vector, object combination vector with shot annotations, and 6 keyframes, sideways with other metadata. Video classification for the dataset was also imposed to expand the efficiency of retrieval of video-based queries. A two-phased approach has been used based on object and event classification, storing video records in aggregations related to feature vectors extracted. While object aggregation stores video records with the maximal occurrence of extracted object/concept from all shots, event aggregation classify based on groups according to the number of shots per video. This study indexed 58 out of 80 different object/concept categories, each has 9 shot number groups.
更多
查看译文
关键词
Video Summarization,Content-Based Retrieval,Multimodal Indexing
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要