Maximization of Scientific Reasoning and Computational Efficiency Through Reinforcement Learning, Model Souping, and Interactive Scaling
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation
https://arxiv.org/abs/2511.14993
Kandinsky 5.0 is a family of state-of-the-art foundation models for high-resolution image synthesis and 10-second video synthesis. It comprises three core models: Image Lite (6B parameters), Video Lite (a fast, lightweight model with 2B parameters), and Video Pro (19B parameters, with excellent video generation quality). The paper reviews the full data curation process, from collection through filtering and clustering, and introduces a multi-stage training pipeline that applies quality-enhancement techniques such as supervised fine-tuning (SFT) and reinforcement learning (RL)-based post-training. It reports high generation speed and strong performance, achieved through architectural and inference optimizations. The authors open-source the code and training checkpoints to support the research community's work across a wide range of generative applications.
![[2025 Week 47] MetaX Weekly AI Paper Review](https://metax-images-bucket.s3.ap-southeast-2.amazonaws.com/defaults/aitech3.webp)

