LLM Training and Inference Optimization Through Long-Context Attention Separation, Lightweight Memory, and RL Stabilization
Autonomous Data Science, Omnimodal Models, Training-Free 3D Editing, and New Evaluation Benchmarks

MetaX Weekly AI Paper Review, Week 43 of 2025.

Key paper: "A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning". The paper is presented as the first theoretical foundation for sampling-based test-time scaling methods, which are used to improve LLM reasoning performance. It analyzes the limitations of two existing approaches: Self-Consistency suffers from high estimation error, and Perplexity suffers from large modeling error. It then proposes RPC (Reasoning Pruning and Perplexity Consistency), a hybrid method that combines the advantages of both. RPC eliminates low-probability reasoning paths and accelerates convergence, achieving performance similar to Self-Consistency while reducing sampling costs by 50% and improving reliability.

Additional papers covered: long-context attention decomposition for efficient processing of very long sequences, lightweight memory architectures for on-device inference, RL training stabilization techniques, autonomous data science agents, omnimodal models that process text, image, audio, and video simultaneously, training-free 3D scene editing, and new evaluation benchmarks for multi-step reasoning.
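
The contrast between plain Self-Consistency and a probability-aware vote with pruning, as described for the key paper, can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the (path, answer, log-probability) sample format, and the `keep_fraction` pruning rule are all assumptions made for illustration, since the summary does not specify how RPC selects which paths to prune.

```python
import math
from collections import Counter, defaultdict

# Each sample is (reasoning_path, final_answer, log_probability).
# This tuple layout is a hypothetical format chosen for the sketch.
samples = [
    ("path a", "42", -1.0),
    ("path b", "42", -1.2),
    ("path c", "41", -6.0),
    ("path d", "41", -6.5),
    ("path e", "41", -7.0),
]

def self_consistency(samples):
    """Majority vote over final answers; model probabilities are ignored."""
    counts = Counter(answer for _, answer, _ in samples)
    return counts.most_common(1)[0][0]

def prune_low_probability(samples, keep_fraction=0.5):
    """Keep only the most probable paths.

    keep_fraction is an assumed stand-in for the paper's pruning
    criterion, which the summary does not describe.
    """
    ranked = sorted(samples, key=lambda s: s[2], reverse=True)
    keep = max(1, math.ceil(len(ranked) * keep_fraction))
    return ranked[:keep]

def probability_weighted_vote(samples):
    """Sum path probabilities per answer, counting each distinct path once."""
    scores = defaultdict(float)
    seen_paths = set()
    for path, answer, logprob in samples:
        if path in seen_paths:
            continue
        seen_paths.add(path)
        scores[answer] += math.exp(logprob)
    return max(scores, key=scores.get)

def rpc_like_vote(samples, keep_fraction=0.5):
    """Prune low-probability paths, then take the probability-weighted vote."""
    return probability_weighted_vote(prune_low_probability(samples, keep_fraction))
```

On this toy data, `self_consistency(samples)` picks "41" because three low-probability paths outvote two high-probability ones, while `probability_weighted_vote(samples)` and `rpc_like_vote(samples)` both pick "42". The example only shows why using the model's internal probabilities can change the outcome of a vote; it says nothing about the error bounds or the 50% sampling reduction reported for the paper.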