Video captioning

Computer Vision and Image Understanding(2023)

Abstract
Video captioning is the process of describing the content of a sequence of images while capturing its semantic relationships and meanings. This task is arduous for a single image, and considerably harder for a video (or image sequence). Video captioning has many relevant applications, such as handling the large volume of recordings produced in video surveillance or assisting visually impaired people, to mention a few. To analyze where our community's efforts to solve the video captioning task are concentrated, and which route could be better to follow, this manuscript presents an extensive review of more than 142 papers from the period 2016 to 2022. As a result, the most-used datasets and metrics are identified and described. The main approaches and the best-performing ones are also analyzed and discussed. Furthermore, we compute a set of rankings based on several performance metrics to identify, according to reported performance, the method with the best results on the video captioning task across several datasets and metrics. Finally, we draw some insights about possible next steps and opportunity areas for improving on this complex task.

• We analyze papers related to the video captioning task through a literature review.
• We present an overview of 142 papers published between 2016 and 2022.
• We rank the top five solutions per dataset found in this review.
• We compute a single best paper from the best solutions.
• This literature analysis yields insights and opportunity areas for improvement.
Keywords
68T50,68T45,68U10,Natural language processing,Video captioning,Image understanding