Google Unveils "Ironwood TPU": Industry Confirms the AI Infrastructure Transition

AI technology is moving from the "Age of Training" into the "Age of Inference." In November 2025, Google began rolling out its 7th-generation "Ironwood TPU," which it first previewed in April 2025. It is the first TPU that Google has explicitly positioned for the "Age of Inference," although it targets both large language model training and inference for real-world services.

Performance: up to 10x the peak performance of TPU v5p and more than 4x that of Trillium (v6e) per chip. Up to 9,216 chips can be connected in a single pod, for a theoretical compute capacity of 42.5 ExaFLOPS. The chip pairs high-bandwidth memory (HBM) with an expanded inter-chip network so that thousands of chips operate as a single supercomputer.

Google plans to provide Ironwood to Anthropic and other external partners in addition to running its own Gemini models on it. This signals that the industry's central organizing principle is shifting from competition over individual models to efficient, sustainable inference infrastructure.

Ironwood TPU evolution: TPU v1 (2016, matrix multiplication) → v2 (2017, cloud-available, training support) → v3 (2018, HBM, 100+ PFLOPS per pod) → v4 (2021, 2x v3 speed, data center optimization) → v5e/v5p (2023, inference+training, 2x v4) → Trillium/v6e (2024, 4.7x v5e) → Ironwood/v7 (2025, 10x v5p, 42.5 ExaFLOPS per pod).

The transition from training-centered to inference-centered infrastructure reflects a maturing AI industry: as frontier model development consolidates, the competitive advantage shifts to whoever can serve those models most efficiently and cost-effectively at scale.
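
The pod-level figure quoted above can be sanity-checked with simple arithmetic. The sketch below is illustrative only: it assumes the 42.5 ExaFLOPS pod peak is the plain sum of per-chip peaks across 9,216 chips (no interconnect or utilization losses), and the function and constant names are hypothetical. The article does not state the numeric precision at which the figure is measured, so the result is only an implied theoretical peak.

```python
# Figures quoted in the article.
POD_CHIPS = 9_216          # maximum chips in a single Ironwood pod
POD_PEAK_EXAFLOPS = 42.5   # theoretical pod-level peak

PETA_PER_EXA = 1_000       # 1 ExaFLOPS = 1,000 PetaFLOPS


def implied_per_chip_petaflops(pod_exaflops: float, chips: int) -> float:
    """Per-chip peak implied by a pod peak, assuming perfect linear scaling."""
    if chips <= 0:
        raise ValueError("chips must be positive")
    return pod_exaflops * PETA_PER_EXA / chips


def pod_peak_exaflops(per_chip_petaflops: float, chips: int) -> float:
    """Inverse calculation: pod peak from a per-chip peak, same assumption."""
    return per_chip_petaflops * chips / PETA_PER_EXA


if __name__ == "__main__":
    per_chip = implied_per_chip_petaflops(POD_PEAK_EXAFLOPS, POD_CHIPS)
    print(f"Implied per-chip peak: {per_chip:.2f} PFLOPS")
```

Under that assumption, each chip would contribute roughly 4.6 PFLOPS of theoretical peak. Sustained throughput on real inference workloads depends on memory bandwidth, interconnect traffic, and utilization, none of which this calculation models.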