Implementation of Next-Generation Multimodal and Cross-Disciplinary Foundation Models Covering Vision, 3D, and Code Generation
This week's META-X AI paper review covers advances in code generation, omnimodal models, reinforcement learning, and cross-disciplinary AI.
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation addresses the difficulty LLMs have in generating entire software repositories. It proposes an explicit Repository Planning Graph (RPG) that replaces ambiguous natural-language plans with a structured representation. The ZeroRepo framework then builds code systematically through proposal, concretization, and generation stages, producing codebases 3.9x larger than those of the best-performing existing models, with significantly higher test pass rates.
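To make the idea of an explicit planning graph concrete, here is a minimal sketch. It is not the ZeroRepo implementation: the file names, the `plan` dictionary, and the `generation_order` helper are all hypothetical. It only illustrates how a dependency graph, unlike a free-form text plan, yields an unambiguous order in which planned units can be generated.

```python
from graphlib import TopologicalSorter

# Hypothetical planning graph (illustrative names, not from the paper):
# each key is a planned unit, each value is the set of units it depends on.
plan = {
    "data/loader.py": set(),
    "models/linear.py": set(),
    "train/trainer.py": {"data/loader.py", "models/linear.py"},
    "cli/main.py": {"train/trainer.py"},
}


def generation_order(graph):
    """Return planned units in an order that respects their dependencies."""
    # TopologicalSorter expects a mapping of node -> predecessors and
    # raises graphlib.CycleError if the plan contains a dependency cycle.
    return list(TopologicalSorter(graph).static_order())


order = generation_order(plan)
for unit in order:
    print(unit)
```

Because every dependency is an explicit edge, a generator walking this order never has to produce a unit before the units it relies on exist.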
Qwen3-Omni Technical Report presents a unified multimodal model that handles text, image, audio, and video without performance degradation in any domain, using a Thinker-Talker MoE architecture. The model features near-real-time speech synthesis and achieves SOTA performance on multiple audio and audiovisual benchmarks, showing that a single model can match specialized models across modalities.
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models addresses RL training instability and performance stagnation through variance-based curriculum design. The method automatically adjusts task difficulty based on the variance of the model's performance, so that training stays on problems that provide a useful learning signal. This improves both convergence speed and final performance on mathematical reasoning benchmarks.
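The intuition behind a variance-based curriculum can be sketched as follows. This is an illustrative simplification, not the paper's algorithm: the function name, the binary 0/1 rewards, and the `threshold` value are assumptions. With binary rewards, a prompt the model always solves or always fails has zero reward variance, while a prompt it solves about half the time has the maximum variance of 0.25, so variance serves as a rough proxy for "appropriately difficult".

```python
from statistics import pvariance


def select_prompts(rollout_rewards, threshold=0.2):
    """Keep prompts whose rollout rewards vary enough to be informative.

    rollout_rewards maps a prompt id to the rewards of several sampled
    responses. The threshold is a hypothetical tuning knob.
    """
    return [
        prompt
        for prompt, rewards in rollout_rewards.items()
        if pvariance(rewards) >= threshold
    ]


# Hypothetical rollout results: 1 = correct answer, 0 = incorrect.
rollouts = {
    "always_solved": [1, 1, 1, 1],   # variance 0: too easy
    "sometimes_solved": [1, 0, 1, 0],  # variance 0.25: informative
    "never_solved": [0, 0, 0, 0],    # variance 0: too hard
    "mostly_solved": [1, 1, 1, 0],   # variance 0.1875: below threshold
}

selected = select_prompts(rollouts)
print(selected)
```

Lowering `threshold` widens the curriculum to include prompts that are nearly mastered or nearly out of reach; raising it concentrates training on the prompts where outcomes are most uncertain.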
The additional papers reviewed cover four topics: tree-of-thought structured reasoning, which enables more systematic problem decomposition; intelligent data sampling methods, which improve training efficiency by focusing compute on the most informative examples; 3D generation models, which achieve high-fidelity object synthesis from single-view images; and cross-disciplinary foundation models, which demonstrate that architectures trained on diverse scientific data can reach expert-level performance on domain-specific tasks that previously required specialized models. Together, these results suggest that the "one model for all domains" paradigm is becoming increasingly viable.
![[2025 Week 39] MetaX Weekly AI Paper Review](https://metax-images-bucket.s3.ap-southeast-2.amazonaws.com/articles/2025-39-metax-ai-1065592462813989/img-1.webp)