From General Agents Conquering 3D Open Worlds to Multi-Agent Drug Discovery
Small Model Reasoning Revolution, Memory Hallucination Evaluation, Creative Limits of Safety Alignment

Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds

https://arxiv.org/abs/2511.08892

Lumine presents the first open recipe for developing generalist agents that can complete hours-long, complex missions in real time within 3D open-world environments. Built on a vision-language model (VLM), the agent integrates perception, reasoning, and action end to end: it takes raw pixel input at 5 Hz, outputs precise keyboard and mouse actions at 30 Hz, and invokes reasoning only when necessary. Trained in "Genshin Impact," Lumine completes 5 hours of main story content at human-level efficiency and follows natural language instructions across diverse tasks. Notably, it also shows strong zero-shot generalization to other games such as "Wuthering Waves" and "Honkai: Star Rail," with no additional training.
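
The gap between the two rates implies that each 5 Hz observation has to yield 30 / 5 = 6 action ticks. The Python sketch below illustrates only that timing relationship, plus an optional reasoning step. It is a minimal illustration under stated assumptions, not the paper's implementation: `Action`, `schedule_actions`, `demo_policy`, and the rule for when to reason are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

PERCEPTION_HZ = 5   # frames observed per second (from the summary)
ACTION_HZ = 30      # keyboard/mouse ticks per second (from the summary)
ACTIONS_PER_FRAME = ACTION_HZ // PERCEPTION_HZ  # 6 action ticks per frame


@dataclass(frozen=True)
class Action:
    """Hypothetical action record: held keys plus relative mouse motion."""
    keys: Tuple[str, ...]
    mouse_dx: int = 0
    mouse_dy: int = 0


# Hypothetical policy interface: frame -> (optional reasoning text, action chunk).
Policy = Callable[[object], Tuple[Optional[str], Sequence[Action]]]


def schedule_actions(frames: Sequence[object], policy: Policy):
    """Expand one action chunk per observed frame into a 30 Hz timeline.

    Returns (timeline, thoughts): timeline is a list of (seconds, Action),
    thoughts is a list of (frame_index, reasoning_text) for the frames
    where the policy chose to reason.
    """
    timeline: List[Tuple[float, Action]] = []
    thoughts: List[Tuple[int, str]] = []
    for step, frame in enumerate(frames):
        thought, chunk = policy(frame)
        if len(chunk) != ACTIONS_PER_FRAME:
            raise ValueError(f"expected {ACTIONS_PER_FRAME} actions, got {len(chunk)}")
        if thought is not None:  # reasoning is optional, not produced every frame
            thoughts.append((step, thought))
        for i, action in enumerate(chunk):
            tick = step * ACTIONS_PER_FRAME + i
            timeline.append((tick / ACTION_HZ, action))
    return timeline, thoughts


def demo_policy(frame: object) -> Tuple[Optional[str], Sequence[Action]]:
    """Stand-in for the VLM: walks forward, and 'reasons' only on flagged frames."""
    thought = "new objective, replan route" if frame == "cutscene_end" else None
    return thought, [Action(keys=("w",))] * ACTIONS_PER_FRAME


timeline, thoughts = schedule_actions(["f0", "cutscene_end", "f2"], demo_policy)
```

With three frames as input, the timeline holds 18 action ticks spaced 1/30 s apart, and only the flagged frame produces a reasoning entry.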