Last updated on: 10:00 May 13, 2024
😅 I’m still working … I can still learn …. Zzz … 😴
Note: # = Equal Contribution, * = Corresponding Author.
2024
CVPR24 ExtDM: Distribution extrapolation diffusion model for video prediction
Zhicheng Zhang#, Junyao Hu#, Wentao Cheng*, Danda Paudel, Jufeng Yang
TL;DR: We present ExtDM, a new diffusion model that extrapolates video content from current frames by accurately modeling distribution shifts towards future frames.
📃 Paper 📃 Chinese Translation 📦 Code ⚒️ Project 📊 Poster 📅 Slides 🎞️ Video (Bilibili)
Details
Abstract: Video prediction is a challenging task due to its inherent uncertainty, especially when forecasting over a long horizon. To model the temporal dynamics, advanced methods benefit from the recent success of diffusion models and repeatedly refine the predicted future frames with a 3D spatiotemporal U-Net. However, there is a gap between the present and the future, and the repeated use of the U-Net brings a heavy computational burden. To address this, we propose a diffusion-based video prediction method, namely ExtDM, that predicts future frames by extrapolating the present distribution of features. Specifically, our method consists of three components: (i) a motion autoencoder conducts a bijective transformation between video frames and motion cues; (ii) a layered distribution adaptor module extrapolates the present features under the guidance of a Gaussian distribution; (iii) a 3D U-Net architecture specialized for jointly fusing guidance and features along the temporal dimension via spatiotemporal-window attention. Extensive experiments on five popular benchmarks covering short- and long-term video prediction verify the effectiveness of ExtDM.
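For readers who want a concrete picture of how the three components could slot together, below is a minimal PyTorch sketch of the pipeline described in the abstract. Every module name, layer choice, and shape here is an illustrative assumption, not the released ExtDM implementation; please refer to the linked code repository for the actual model.

```python
# Illustrative sketch only: hypothetical modules standing in for the three
# components described in the abstract, not the authors' released code.
import torch
import torch.nn as nn

class MotionAutoencoder(nn.Module):
    """(i) Maps video frames to compact motion cues and back (assumed conv design)."""
    def __init__(self, channels=3, motion_dim=64):
        super().__init__()
        self.encode = nn.Conv3d(channels, motion_dim, kernel_size=3, padding=1)
        self.decode = nn.Conv3d(motion_dim, channels, kernel_size=3, padding=1)

    def forward(self, frames):            # frames: (B, C, T, H, W)
        motion = self.encode(frames)      # motion cues for the observed clip
        recon = self.decode(motion)       # reconstruction back to pixel space
        return motion, recon

class DistributionAdaptor(nn.Module):
    """(ii) Extrapolates present motion features toward future steps under Gaussian guidance."""
    def __init__(self, motion_dim=64):
        super().__init__()
        self.proj = nn.Conv3d(motion_dim, motion_dim, kernel_size=1)

    def forward(self, motion, future_len):
        # Repeat the last observed feature map as a coarse future estimate,
        # then perturb it with Gaussian noise as the guidance signal.
        last = motion[:, :, -1:].repeat(1, 1, future_len, 1, 1)
        noise = torch.randn_like(last)
        return self.proj(last + noise)

class Denoiser3D(nn.Module):
    """(iii) Stand-in for the 3D U-Net that fuses guidance and features over time."""
    def __init__(self, motion_dim=64):
        super().__init__()
        self.net = nn.Conv3d(2 * motion_dim, motion_dim, kernel_size=3, padding=1)

    def forward(self, noisy_future, guidance):
        return self.net(torch.cat([noisy_future, guidance], dim=1))

# Toy forward pass: predict motion features for 4 future frames from 8 observed ones.
ae, adaptor, denoiser = MotionAutoencoder(), DistributionAdaptor(), Denoiser3D()
frames = torch.randn(2, 3, 8, 32, 32)
motion, _ = ae(frames)
guidance = adaptor(motion, future_len=4)
pred_motion = denoiser(torch.randn_like(guidance), guidance)
print(pred_motion.shape)  # torch.Size([2, 64, 4, 32, 32])
```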