Evolution of Foundation Models: Multimodal Reasoning and Expansion to Interactive Physical AI
High-Sparsity MoE and Hardware Quantization Innovations for AI Efficiency and Safety

MetaX Weekly AI Paper Review, Week 45 of 2025.

Key papers reviewed this week include "Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm." The paper proposes using video generation models such as Sora-2 to overcome the static limitations of text- and image-based reasoning. Its approach integrates dynamic processes and sequential changes within a unified temporal framework. The authors also developed the VideoThinkBench benchmark, on which Sora-2 demonstrates strong reasoning on both vision-centric and text-centric tasks. These results suggest that video generation models have the potential to serve as unified multimodal reasoners spanning text and vision.

Other papers covered this week address high-sparsity Mixture of Experts (MoE) architectures, hardware-aware quantization techniques for inference efficiency, and AI safety mechanisms for large-scale model deployment.