AI Reasoning and Self-Learning Without External Data via CoT and RL
Multimodal Integration and Human Interaction: A Leap Toward Artificial General Intelligence (AGI)

This week's META-X AI paper review covers autonomous AI learning, multimodal advances, and reasoning optimization.

AI Autonomous Learning and Reasoning: Absolute Zero proposes a paradigm in which a model generates its own reasoning tasks and uses a code executor to validate them, training entirely without external data. It reports state-of-the-art (SOTA) results on coding and math reasoning, surpassing models trained on tens of thousands of human-labeled examples. "Grokking in the Wild" uses synthetic data augmentation so that transformers can learn multi-step factual reasoning patterns (grokking) even when real-world data is sparse. ZeroSearch turns an LLM into its own search module and strengthens its search capability through reinforcement learning (RL), addressing the cost and instability of training against a live search engine.
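To make the "code executor for validation" idea concrete, here is a minimal sketch of how a self-proposed task might be checked and scored. It is an illustration under stated assumptions, not the Absolute Zero implementation: the helper names (run_program, validate_task, reward), the convention that every proposed program defines a function f, and the integer-only answers are all hypothetical, and a real system would run untrusted code in a sandbox.

```python
from typing import Optional, Tuple

Task = Tuple[str, int, int]  # (program source, input, executor-verified output)


def run_program(source: str, arg: int) -> Optional[int]:
    """Execute a proposed program and return f(arg), or None if anything fails.

    Assumption: proposed programs define a function named `f`.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)  # NOTE: unsandboxed; for illustration only
        result = namespace["f"](arg)
    except Exception:
        return None
    return result if isinstance(result, int) else None


def validate_task(source: str, arg: int) -> Optional[Task]:
    """Keep a self-proposed task only if the executor gives a stable answer."""
    first = run_program(source, arg)
    second = run_program(source, arg)
    if first is None or first != second:
        return None  # invalid or non-deterministic: discard the task
    return (source, arg, first)


def reward(task: Task, predicted: int) -> float:
    """Binary reward: the solver's answer is checked against the executor."""
    _, _, expected = task
    return 1.0 if predicted == expected else 0.0


# Hypothetical model outputs standing in for a proposer and a solver.
proposed = "def f(x):\n    return x * x + 1\n"
task = validate_task(proposed, 4)                       # valid task
broken = validate_task("def f(x):\n    return x +\n", 4)  # syntax error
```

The point of the sketch is that the executor, not a human label, supplies the ground truth: invalid proposals are filtered out, and the solver's reward comes from comparing its answer with the executed result.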

Multimodal and Human-AI Interaction: Voila is an open-source speech-language foundation model that enables real-time autonomous interaction and voice role-playing with low latency and rich emotional expression. The survey "Unified Multimodal Understanding and Generation Models" analyzes attempts to bridge the architectural gap between image understanding models (autoregressive) and image generation models (diffusion), along with the challenges that remain. The "On Path to Multimodal Generalist" project proposes the "General-Level" framework and the "General-Bench" benchmark for measuring progress toward true multimodal AGI.

CoT and RL Optimization: UnifiedReward-Think integrates explicit chain-of-thought (CoT) reasoning into a multimodal reward model and fine-tunes it with RL. RM-R1 frames reward modeling as a reasoning task. Flow-GRPO is presented as the first method to integrate online RL into flow-matching image generation models; it significantly improves quality on complex image generation while minimizing reward hacking.
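The "GRPO" in Flow-GRPO refers to Group Relative Policy Optimization, whose core idea is to score each sampled output relative to the other samples drawn for the same prompt. The sketch below shows only that group-relative advantage step, as it is commonly described for GRPO-style methods. The function name, the epsilon value, and the example rewards are assumptions for illustration, and nothing here reflects how Flow-GRPO adapts the idea to flow matching.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so samples
    are ranked against their siblings rather than by absolute reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four images generated from one prompt.
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
```

Above-average samples receive positive advantages and below-average ones negative, so no separately learned value function is needed to form a baseline.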