Revolutionary Advances in LLM Architecture and Training Efficiency, Multimodal and Reasoning Optimization
Web Agent, Low-bit Attention, 3D Scene Representation and Generation Technology Advances for Next-Generation AI

This week's META-X AI paper review covers LLM architecture innovation, multimodal advances, reasoning optimization, and 3D generation.

LLM Architecture and Training Efficiency: The Qwen3 Technical Report presents LLMs that integrate a "thinking mode" and a "non-thinking mode" in a single model, with a "thinking budget" mechanism for switching between modes dynamically and allocating compute flexibly. Language support expands from 29 to 119 languages and dialects, the report cites top results on coding and math reasoning benchmarks, and the models are released as open source under Apache 2.0. Chain-of-Model (CoM/CoLM) introduces chain-form causal relationships into LLM training to improve efficiency and flexibility. An analysis of quantization-aware training (QAT) provides a foundation for efficient 4-bit deployment.
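To make the "thinking budget" idea concrete, here is a minimal sketch of how a cap on reasoning tokens could be enforced during decoding. It is an illustration under stated assumptions, not Qwen3's implementation: the `</think>` delimiter, the `apply_thinking_budget` helper, and the toy token stream are all hypothetical.

```python
END_THINK = "</think>"  # assumed delimiter that closes the reasoning segment


def apply_thinking_budget(token_stream, budget):
    """Collect reasoning tokens until the model closes its thinking segment
    or `budget` tokens have been kept, whichever comes first.

    Returns (thinking_tokens, truncated); `truncated` is True when the
    budget, not the model, ended the reasoning.
    """
    thinking = []
    for token in token_stream:
        if token == END_THINK:
            return thinking, False   # the model finished reasoning on its own
        if len(thinking) >= budget:
            return thinking, True    # budget exhausted: stop reasoning here
        thinking.append(token)
    return thinking, False           # stream ended without a delimiter


# Toy stream standing in for a model's decoded output (hypothetical).
stream = ["step1", "step2", "step3", END_THINK, "answer"]

short, was_cut = apply_thinking_budget(stream, budget=2)
full, was_cut_full = apply_thinking_budget(stream, budget=10)
```

In this toy, a budget of 0 keeps no reasoning tokens, which approximates a non-thinking mode, while a large budget lets the model reason until it emits the delimiter itself.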

Multimodal and Reasoning: BAGEL supports integrated learning over text, image, and video for complex multimodal reasoning and generation. MMaDA reports state-of-the-art performance across diverse multimodal tasks. MMLongBench provides an evaluation benchmark for long-context vision-language models. GuardReasoner-VL improves VLM safety through RL-enhanced detection of harmful content. AdaptThink uses RL to select the optimal "thinking mode" based on problem difficulty. AdaCoT reduces cost by activating Chain-of-Thought only when complex reasoning is needed.
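The adaptive-reasoning idea behind AdaptThink and AdaCoT, spending reasoning tokens only on hard inputs, can be sketched as a simple router. Both papers learn this decision with RL; the sketch below swaps in a hand-written heuristic purely for illustration, and `estimate_difficulty`, `choose_mode`, and the 0.3 threshold are hypothetical.

```python
def estimate_difficulty(question):
    """Hypothetical heuristic score in [0, 1]: longer questions that contain
    more numeric tokens are treated as harder. A stand-in for a learned policy."""
    tokens = question.split()
    if not tokens:
        return 0.0
    numeric = sum(any(ch.isdigit() for ch in tok) for tok in tokens)
    length_score = min(len(tokens) / 40.0, 1.0)
    numeric_score = min(numeric / 4.0, 1.0)
    return 0.5 * length_score + 0.5 * numeric_score


def choose_mode(question, threshold=0.3):
    """Route to step-by-step reasoning only when the score clears the threshold."""
    return "thinking" if estimate_difficulty(question) >= threshold else "direct"


easy = "What is the capital of France?"
hard = ("If 3 trains leave at 9:15 traveling 60 km/h and 2 more leave "
        "45 minutes later at 80 km/h, when do they meet?")

easy_mode = choose_mode(easy)
hard_mode = choose_mode(hard)
```

The point of the sketch is the control flow: one cheap decision up front determines whether the expensive reasoning path runs at all.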

Web Agents and 3D Generation: Web-Shepherd introduces what its authors describe as the first reward model specialized for web navigation. NovelSeek presents a multi-agent framework for autonomous scientific research, from hypothesis to verification. SageAttention3 speeds up large-model inference with FP4 attention on FP4 Tensor Cores and also explores 8-bit attention for training. 3D-4DGS uses hybrid 3D/4D Gaussian separation for efficient dynamic 3D scene representation. 3DTown generates realistic, consistent 3D city scenes from a single top-down image without any training.
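As a rough illustration of the low-bit attention idea mentioned for SageAttention3, the sketch below quantizes a query and a key vector to 8-bit integer codes, takes the dot product in integers, and rescales the result. This is a simplification under stated assumptions: it uses plain per-vector symmetric INT8 scaling, not the paper's FP4 format or its GPU kernels, and `quantize_int8` and `quantized_score` are hypothetical helper names.

```python
def quantize_int8(vector):
    """Symmetric per-vector quantization to integer codes in [-127, 127].

    Returns (codes, scale) such that code * scale approximates each value.
    """
    max_abs = max(abs(x) for x in vector)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(x / scale))) for x in vector]
    return codes, scale


def quantized_score(query, key):
    """Approximate the attention logit q.k using an integer dot product."""
    q_codes, q_scale = quantize_int8(query)
    k_codes, k_scale = quantize_int8(key)
    int_dot = sum(a * b for a, b in zip(q_codes, k_codes))  # integer-only math
    return int_dot * q_scale * k_scale                      # rescale to float


query = [0.5, -1.0, 0.25, 0.75]
key = [1.0, 0.5, -0.5, 0.25]

exact = sum(q * k for q, k in zip(query, key))
approx = quantized_score(query, key)
```

The integer dot product is the part that low-precision hardware accelerates; the two scale factors are applied once per score, so the rescaling cost is small relative to the matrix multiply.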