R.I.P.
👻
Ghosted
Controllable Video Generation: A Survey
July 22, 2025 · Declared Dead · arXiv.org
Repo contents: README.md
Authors: Yue Ma, Kunyu Feng, Zhongyuan Hu, Xinyu Wang, Yucheng Wang, Mingzhe Zheng, Bingyuan Wang, Qinghe Wang, Xuanhua He, Hongfa Wang, Chenyang Zhu, Hongyu Liu, Yingqing He, Zeyu Wang, Zhifeng Li, Xiu Li, Sirui Han, Yike Guo, Wei Liu, Dan Xu, Linfeng Zhang, Qifeng Chen
arXiv ID: 2507.16869
Category: cs.GR (Graphics)
Cross-listed: cs.CV
Citations: 30
Venue: arXiv.org
Repository: https://github.com/mayuelala/Awesome-Controllable-Video-Generation (676 GitHub stars)
Last Checked: 1 month ago
Abstract
With the rapid development of AI-generated content (AIGC), video generation has emerged as one of its most dynamic and impactful subfields. In particular, the advancement of video generation foundation models has led to growing demand for controllable video generation methods that can more accurately reflect user intent. Most existing foundation models are designed for text-to-video generation, where text prompts alone are often insufficient to express complex, multi-modal, and fine-grained user requirements. This limitation makes it challenging for users to generate videos with precise control using current models. To address this issue, recent research has explored the integration of additional non-textual conditions, such as camera motion, depth maps, and human pose, to extend pretrained video generation models and enable more controllable video synthesis. These approaches aim to enhance the flexibility and practical applicability of AIGC-driven video generation systems. In this survey, we provide a systematic review of controllable video generation, covering both theoretical foundations and recent advances in the field. We begin by introducing the key concepts and commonly used open-source video generation models. We then focus on control mechanisms in video diffusion models, analyzing how different types of conditions can be incorporated into the denoising process to guide generation. Finally, we categorize existing methods based on the types of control signals they leverage, including single-condition generation, multi-condition generation, and universal controllable generation. For a complete list of the literature on controllable video generation reviewed, please visit our curated repository at https://github.com/mayuelala/Awesome-Controllable-Video-Generation.
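The abstract describes incorporating non-textual conditions (camera motion, depth maps, human pose) into the denoising process of a pretrained video diffusion model. As a rough illustration only, the toy sketch below shows one widely used pattern: a control residual added to a frozen noise predictor behind a zero-initialised gate (loosely following ControlNet's zero-initialised layers), combined with classifier-free guidance. It is not taken from the survey. The list-of-frames "video" format, `frozen_backbone`, `condition_encoder`, and the parameters `gate` and `guidance_scale` are all hypothetical stand-ins.

```python
"""Toy sketch: conditioning one denoising prediction on a non-textual signal.

All names are hypothetical stand-ins, not code from the survey:
- a "video" is a list of frames, each frame a list of floats;
- frozen_backbone plays the role of a pretrained noise predictor;
- condition_encoder plays the role of an encoder for depth, pose, or camera input;
- gate is a zero-initialised scale on the control residual;
- guidance_scale is the classifier-free guidance weight.
"""
from typing import List, Optional

Video = List[List[float]]


def frozen_backbone(x_t: Video, t: int) -> Video:
    # Hypothetical pretrained noise predictor: a fixed linear map keeps the toy simple.
    # t is accepted to mirror a real interface but is unused here.
    return [[0.5 * v for v in frame] for frame in x_t]


def condition_encoder(cond: Video) -> Video:
    # Hypothetical encoder for a per-frame control signal.
    return [[v - 0.1 for v in frame] for frame in cond]


def predict_noise(x_t: Video, t: int, cond: Optional[Video], gate: float) -> Video:
    """Noise prediction with an optional control residual added to the frozen output."""
    base = frozen_backbone(x_t, t)
    if cond is None:
        return base
    if len(cond) != len(x_t) or any(len(c) != len(x) for c, x in zip(cond, x_t)):
        raise ValueError("condition must have the same frames/features as x_t")
    feats = condition_encoder(cond)
    # With gate == 0 the controlled model reproduces the pretrained model exactly,
    # so training the control branch starts from the original behaviour.
    return [[b + gate * f for b, f in zip(bf, ff)] for bf, ff in zip(base, feats)]


def guided_noise(x_t: Video, t: int, cond: Video, gate: float,
                 guidance_scale: float) -> Video:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    eps_u = predict_noise(x_t, t, None, gate)
    eps_c = predict_noise(x_t, t, cond, gate)
    return [[u + guidance_scale * (c - u) for u, c in zip(uf, cf)]
            for uf, cf in zip(eps_u, eps_c)]


if __name__ == "__main__":
    x_t = [[1.0, 2.0], [3.0, 4.0]]     # two frames, two features each
    depth = [[0.1, 0.3], [0.5, 0.7]]   # hypothetical per-frame control signal
    print(guided_noise(x_t, t=10, cond=depth, gate=0.0, guidance_scale=3.0))
    # prints [[0.5, 1.0], [1.5, 2.0]]  (gate 0: identical to the frozen model)
    print(guided_noise(x_t, t=10, cond=depth, gate=1.0, guidance_scale=3.0))
```

The zero-initialised gate is the design choice worth noting: it lets a control branch be attached to a pretrained model without changing that model's output until training moves the gate away from zero. Real systems replace these scalar toys with learned networks over latent video tensors.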