'Make AI Inference Serverless' Declaration
Core founding and maintenance members of vLLM have officially launched Inferact, a startup specializing in AI inference. Inferact's thesis is that the center of gravity in large language model competition is shifting from training to inference. The founders are Simon Mo, Woosuk Kwon, Kaichao You, Roger Wang, Joseph Gonzalez, and Ion Stoica, key figures who built the vLLM ecosystem.

The structural problem they identify has three parts. MoE, multimodal, and agentic architectures are proliferating, which makes inference more complex. Hardware is fragmenting across GPUs, NPUs, and custom accelerators. And test-time compute, RL loops, and synthetic data generation are expanding inference's share of total computation. The result is a gap between theoretical model capability and actual service performance that is holding back the entire industry.

Inferact's solution is not to eliminate this complexity but to hide it within infrastructure. Hardware combinations, scheduling, memory management, and parallelization strategies are absorbed into the infrastructure layer, so developers can focus on model and service logic without directly managing inference infrastructure, much as serverless databases abstract away storage management.

The foundation is vLLM itself. It supports 500+ model architectures and 200+ accelerator types, has been validated in large-scale inference environments at frontier labs, hyperscalers, and large AI startups, and has functioned as the de facto standard integration point as new model architectures and new silicon emerge. Inferact confirmed that "vLLM will continue as open source": performance optimizations developed by Inferact will flow back to the community, and support for new model architectures and hardware will expand within the open-source ecosystem.

The broader significance is that Inferact's founding team believes inference will become the dominant computational challenge as AI capabilities scale, and that the infrastructure layer managing inference will be as strategically important as the model layer itself.


