VFIMamba: Video Frame Interpolation with State Space Models
arXiv (2024)
Abstract
Inter-frame modeling is pivotal for generating intermediate frames in video
frame interpolation (VFI). Current approaches rely predominantly on
convolution- or attention-based models, which often lack a sufficient
receptive field or incur significant computational overhead. Recently,
Selective State Space Models (S6), designed specifically for long-sequence
modeling, have emerged, offering both linear complexity and data-dependent
modeling. In this paper, we propose VFIMamba, a novel frame interpolation
method that harnesses the S6 model for efficient and dynamic inter-frame
modeling. Our approach introduces the Mixed-SSM Block (MSB), which first
rearranges tokens from adjacent frames in an interleaved fashion and then
applies multi-directional S6 modeling. This design efficiently propagates
information across frames while preserving linear complexity.
Furthermore, we introduce a novel curriculum learning strategy that
progressively builds proficiency in modeling inter-frame dynamics across
varying motion magnitudes, allowing the S6 model to reach its full potential.
Experiments show that our method achieves state-of-the-art performance across
diverse benchmarks, excelling particularly in high-resolution scenarios. On the
X-TEST dataset, for instance, VFIMamba achieves a notable improvement of
0.80 dB on 4K frames and 0.96 dB on 2K frames.
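The Mixed-SSM Block described in the abstract interleaves the tokens of two adjacent frames and then scans them with S6 in several directions. The toy, single-channel, pure-Python sketch below illustrates that idea under several assumptions and is not the authors' implementation. "Interleaved" is read as placing each frame-B token right after the frame-A token at the same position. The multi-directional scan is assumed to use four orders: row-major, column-major, and their reversals. The directional outputs are assumed to be merged by averaging. The scalar selective scan uses fixed, hypothetical weights `w_delta`, `w_b`, `w_c` as stand-ins for S6's learned input-dependent projections. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # H x W grid of scalar token features (toy: one channel)


def interleave(frame_a: Grid, frame_b: Grid) -> Grid:
    # Assumption: "interleaved" = each frame-A token is followed by the frame-B
    # token at the same (row, col), so every row doubles in width.
    return [
        [tok for pair in zip(row_a, row_b) for tok in pair]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def softplus(z: float) -> float:
    return math.log1p(math.exp(z))


def selective_scan(xs: List[float], a: float = -1.0, w_delta: float = 0.5,
                   w_b: float = 0.8, w_c: float = 1.2) -> List[float]:
    # Toy scalar S6-style recurrence. The fixed weights are hypothetical stand-ins
    # for learned projections; it is "selective" because delta, B and C are
    # recomputed from each input x.
    h, ys = 0.0, []
    for x in xs:
        delta = softplus(w_delta * x)                # input-dependent step size (> 0)
        b, c = w_b * x, w_c * x                      # input-dependent B and C
        h = math.exp(delta * a) * h + delta * b * x  # decay old state, add new input
        ys.append(c * h)                             # O(1) per token -> O(L) overall
    return ys


def scan_orders(h: int, w: int) -> List[List[Tuple[int, int]]]:
    # Assumed four directions: row-major, column-major, and both reversed.
    row_major = [(i, j) for i in range(h) for j in range(w)]
    col_major = [(i, j) for j in range(w) for i in range(h)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


def mixed_ssm(frame_a: Grid, frame_b: Grid) -> Grid:
    mixed = interleave(frame_a, frame_b)
    h, w = len(mixed), len(mixed[0])
    orders = scan_orders(h, w)
    out = [[0.0] * w for _ in range(h)]
    for order in orders:
        ys = selective_scan([mixed[i][j] for i, j in order])
        for (i, j), y in zip(order, ys):
            out[i][j] += y / len(orders)  # assumed merge: average of the directions
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(interleave(a, b))         # [[1.0, 5.0, 2.0, 6.0], [3.0, 7.0, 4.0, 8.0]]
print(len(mixed_ssm(a, b)[0]))  # 4
```

In the row-major order, each frame-A token sits directly next to its frame-B counterpart. As a result, even a single left-to-right scan carries information between the two frames. Each direction costs one linear pass over the tokens, rather than the quadratic pairwise comparisons of standard attention.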
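The abstract says the curriculum progressively trains the model on inter-frame dynamics across varying motion magnitudes, but it does not give the schedule. The sketch below is therefore a generic, hypothetical motion-magnitude curriculum, not the VFIMamba training recipe. Each clip is assumed to carry a precomputed motion estimate, for example a mean optical-flow magnitude in pixels. The training pool starts with the smallest-motion 30% of clips and grows linearly until it covers the full set in the last epoch. The names `curriculum_pool`, `epoch_batches`, and `start_frac`, and the linear schedule itself, are illustrative assumptions.

```python
import random
from typing import List, Tuple

# (hypothetical clip id, assumed per-clip motion magnitude in pixels)
Sample = Tuple[str, float]


def curriculum_pool(samples: List[Sample], epoch: int, num_epochs: int,
                    start_frac: float = 0.3) -> List[Sample]:
    # Rank clips from small to large motion; easy (small-motion) clips come first.
    ranked = sorted(samples, key=lambda s: s[1])
    progress = epoch / max(num_epochs - 1, 1)  # 0.0 at the first epoch, 1.0 at the last
    # Assumed linear schedule: pool grows from start_frac of the data to all of it.
    frac = start_frac + (1.0 - start_frac) * progress
    k = max(1, round(frac * len(ranked)))
    return ranked[:k]


def epoch_batches(samples: List[Sample], epoch: int, num_epochs: int,
                  batch_size: int, seed: int = 0) -> List[List[Sample]]:
    # Shuffle only within the current curriculum pool, reproducibly per epoch.
    pool = curriculum_pool(samples, epoch, num_epochs)
    rng = random.Random(seed + epoch)
    rng.shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]


clips = [(f"clip{i}", float(i)) for i in range(10)]
for e in range(5):
    print(e, len(curriculum_pool(clips, e, num_epochs=5)))
```

Growing a ranked pool is only one way to order difficulty. Another common variant keeps the full dataset and gradually raises the sampling probability of large-motion clips. Either schedule matches the abstract's description equally well.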