
Feature pre-inpainting enhanced transformer for video inpainting

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE (2023)

Abstract
Transformer-based video inpainting methods aggregate coherent content into missing regions by learning spatial-temporal dependencies. However, existing methods suffer from inaccurate self-attention calculation and excessive quadratic computational complexity, due to uninformative representations of missing regions and inefficient global self-attention mechanisms, respectively. To mitigate these problems, we propose a Feature pre-Inpainting enhanced Transformer (FITer) video inpainting method, in which a feature pre-inpainting network (FPNet) and a local-global interleaving Transformer are designed. The FPNet pre-inpaints missing features before the Transformer by exploiting spatial context, so the representations of missing regions are enhanced with more informative content. The interleaving Transformer can therefore calculate more accurate self-attention weights and learn more effective dependencies between missing and valid regions. Since the interleaving Transformer involves both global and window-based local self-attention mechanisms, the proposed FITer method can effectively aggregate spatial-temporal features into missing regions while improving efficiency. Experiments on the YouTube-VOS and DAVIS datasets demonstrate that the FITer method outperforms previous methods qualitatively and quantitatively.
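The local-global interleaving idea described above lends itself to a compact sketch. Below is a minimal PyTorch illustration of alternating window-based local self-attention (cheap, restricted to small token groups) with global self-attention (quadratic, over all spatio-temporal tokens). All module names (GlobalSelfAttention, WindowSelfAttention, InterleavingBlock) and hyperparameters are hypothetical placeholders for exposition, not the authors' FITer implementation.

```python
import torch
import torch.nn as nn

class GlobalSelfAttention(nn.Module):
    """Full self-attention over all spatio-temporal tokens (O(N^2) cost)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, N, C), N = T*H*W tokens
        out, _ = self.attn(x, x, x)
        return out

class WindowSelfAttention(nn.Module):
    """Self-attention restricted to non-overlapping token windows,
    reducing the cost from O(N^2) to O(N * window)."""
    def __init__(self, dim, heads=4, window=16):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):  # x: (B, N, C); assumes N % window == 0
        B, N, C = x.shape
        w = self.window
        xw = x.reshape(B * N // w, w, C)  # group tokens into windows
        out, _ = self.attn(xw, xw, xw)    # attend within each window only
        return out.reshape(B, N, C)

class InterleavingBlock(nn.Module):
    """Alternates a local (window) attention pass with a global pass,
    mirroring the local-global interleaving described in the abstract."""
    def __init__(self, dim, heads=4, window=16):
        super().__init__()
        self.local_attn = WindowSelfAttention(dim, heads, window)
        self.global_attn = GlobalSelfAttention(dim, heads)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x):
        x = x + self.local_attn(self.norm1(x))   # local pass
        x = x + self.global_attn(self.norm2(x))  # global pass
        return x

# Usage: 2 frames of 8x8 feature maps with 64 channels -> 128 tokens.
tokens = torch.randn(1, 128, 64)
print(InterleavingBlock(dim=64)(tokens).shape)  # torch.Size([1, 128, 64])
```

In this sketch the window pass keeps most of the computation linear in the number of tokens, while the periodic global pass lets distant frames exchange information, which is the efficiency/aggregation trade-off the abstract attributes to the interleaving design.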
Key words
transformer, pre-inpainting