Advanced Next-Generation Video and 3D Generation Technology Enhancing Immersion Through Precise Spatiotemporal Control and Narrative Consistency
Architectural Innovations Overcoming Performance Limits of Large Language Models Through Parallel Inference and Computational Efficiency Optimization

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

https://arxiv.org/abs/2512.08765

Wan-Move is a simple, scalable framework that adds precise motion control to video generation models, addressing the coarse control granularity and limited scalability of existing methods. Its core idea is to represent object motion as dense point trajectories, project them into latent space, and propagate first-frame features along each trajectory. The resulting motion-aware feature map can be fed into an existing image-to-video model (e.g., Wan-I2V-14B) without architectural changes, so the model can be fine-tuned without a separate auxiliary motion encoder. Wan-Move generates 5-second 480p videos, and user studies indicate that its motion controllability is comparable to the motion brush function of Kling 1.5 Pro. Its performance is further validated on MoveBench, a benchmark with large-scale data and precise annotations.
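
The trajectory-guided feature propagation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `propagate_first_frame_features`, the tensor layouts, the spatial `stride` of 8, the zero fill for untracked positions, and the omission of temporal compression are all assumptions made for illustration.

```python
import numpy as np

def propagate_first_frame_features(first_feat, tracks, visible, stride=8):
    """Hypothetical sketch: copy first-frame latent features along point tracks.

    first_feat: (C, h, w) latent features of the first frame.
    tracks:     (T, N, 2) pixel-space (x, y) positions of N points over T frames.
    visible:    (T, N) boolean visibility mask for each point and frame.
    stride:     assumed pixel-to-latent downsampling factor.
    Returns a (T, C, h, w) motion-aware feature map; untracked cells stay zero.
    """
    C, h, w = first_feat.shape
    T = tracks.shape[0]
    out = np.zeros((T, C, h, w), dtype=first_feat.dtype)
    out[0] = first_feat  # the first frame keeps its own features

    # Project pixel coordinates onto the latent grid.
    lat = np.floor(tracks / stride).astype(int)
    x0, y0 = lat[0, :, 0], lat[0, :, 1]
    start_ok = visible[0] & (x0 >= 0) & (x0 < w) & (y0 >= 0) & (y0 < h)

    for t in range(1, T):
        xt, yt = lat[t, :, 0], lat[t, :, 1]
        ok = start_ok & visible[t] & (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
        # Place each point's first-frame feature at its position in frame t.
        # If several points land on the same cell, the last one written wins.
        out[t][:, yt[ok], xt[ok]] = first_feat[:, y0[ok], x0[ok]]
    return out
```

In this sketch the output has the same spatial shape as the latent input, which is what would let such a map be combined with an image-to-video model's existing conditioning without adding a separate motion encoder. How the map is actually combined, and how overlapping or occluded tracks are handled, is defined by the paper, not by this example.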