
Text-Video Completion Networks with Motion Compensation and Attention Aggregation

Jianan Wang, Zhiliang Wu, Hanyu Xuan, Yan

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

Abstract
The purpose of video inpainting is to fill a specified area with reasonable content. However, in the case of multiple targets and complex textures, current methods struggle to distinguish the feature information of different targets, leading to confused or blurry inpainting results. In this paper, we design a new text-video completion network based on motion compensation and temporal attention feature aggregation. Our network uses information from reference frames and the target frame to complete the damaged region of the target frame. We first employ motion compensation to align the features of the reference frames, and then use a temporal attention module to aggregate these features, producing accurate and reasonable content. To evaluate the effectiveness of our method, we introduce a new text-video dataset with multiple text objects and complex textures, presenting a novel and challenging task for inpainting research. Through quantitative and qualitative comparison experiments, we demonstrate that our model outperforms existing baseline models in scenarios with multiple objects and complex textures.
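To make the two-stage pipeline described in the abstract concrete, the sketch below illustrates the general idea, not the authors' implementation: reference-frame features are first aligned to the target frame (here via assumed flow-based warping), then fused by a per-pixel temporal attention module. The flow source, layer sizes, and module names are illustrative assumptions only.

```python
# Minimal sketch (assumptions, not the paper's code): align reference features
# with motion compensation, then aggregate them with temporal attention.
import torch
import torch.nn as nn
import torch.nn.functional as F


def warp_features(feat, flow):
    """Warp reference features toward the target frame using a given optical flow.

    feat: (B, C, H, W) reference-frame features
    flow: (B, 2, H, W) displacement from target to reference, in pixels
    """
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device),
        torch.arange(w, device=feat.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow  # (B, 2, H, W)
    # Normalize sampling coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(feat, torch.stack((gx, gy), dim=-1), align_corners=True)


class TemporalAttentionAggregation(nn.Module):
    """Aggregate aligned reference features with per-pixel temporal attention."""

    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)
        self.key = nn.Conv2d(channels, channels, 1)

    def forward(self, target_feat, aligned_ref_feats):
        # target_feat: (B, C, H, W); aligned_ref_feats: (B, T, C, H, W)
        q = self.query(target_feat)
        b, t, c, h, w = aligned_ref_feats.shape
        k = self.key(aligned_ref_feats.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        # Similarity between the target query and each reference key, per pixel.
        scores = (q.unsqueeze(1) * k).sum(dim=2, keepdim=True) / c ** 0.5  # (B, T, 1, H, W)
        weights = scores.softmax(dim=1)
        # Weighted sum over the temporal dimension gives the aggregated feature.
        return (weights * aligned_ref_feats).sum(dim=1)  # (B, C, H, W)


if __name__ == "__main__":
    b, t, c, h, w = 2, 4, 64, 32, 32
    target = torch.randn(b, c, h, w)
    refs = torch.randn(b, t, c, h, w)
    flows = torch.randn(b, t, 2, h, w)  # assumed output of some flow estimator
    aligned = torch.stack(
        [warp_features(refs[:, i], flows[:, i]) for i in range(t)], dim=1
    )
    out = TemporalAttentionAggregation(c)(target, aligned)
    print(out.shape)  # torch.Size([2, 64, 32, 32])
```

The aggregated feature would then feed a decoder that fills the damaged region of the target frame; that part is omitted here.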
Key words
Text-Video Completion Network, Motion Compensation, Temporal Attention Feature Aggregation