MetaX Weekly AI Paper Review, Week 25 2025

This week's themes: active research on innovative architectures that extend context length and improve the efficiency of large language models, and on improving AI performance through multilingual multimodal benchmark development, feedback integration, and test-time computation optimization.

MiniMax-M1: The world's first open-weight, large-scale reasoning model supporting a 1-million-token context, combining a hybrid MoE architecture with lightning attention.

MultiFinBen: The first multilingual, multimodal benchmark specialized in the financial domain, evaluating the real-world financial communication ability of LLMs.

Scientists First Exam: A scientific MLLM benchmark that evaluates scientific cognitive ability in three stages: signal recognition, attribute understanding, and comparative reasoning.

DeepResearch Bench: A benchmark for deep research agents consisting of 100 PhD-level research tasks, evaluating web navigation, information retrieval, and synthesis capabilities.

Scaling Test-time Compute for LLM Agents: A study of various test-time scaling strategies applied to LLM agents.

Additional papers covered: context window extension techniques that let models process much longer inputs; efficiency improvements that reduce the memory and compute requirements of inference; multilingual capability evaluation to ensure AI systems work across diverse languages; and feedback integration methods that let AI systems improve based on human evaluation signals.