IQ Scores Converted from Norway Mensa Test Results for 4 AI Models
Comparison of Results Obtained by Inputting Standardized Prompts to Each AI Model

Visual Capitalist and TrackingAI.org published a comparison of AI "intelligence" based on Norway Mensa's IQ test, a set of 35 non-verbal reasoning puzzles that evaluate pattern recognition, logical reasoning, and spatial perception. The test was administered in two ways. For text-only models, the visual elements of each puzzle were converted into text descriptions in the prompt. For multimodal models, the puzzle images were provided directly as visual input.

Results: OpenAI o3 ranked first with an IQ of 135, a score labeled "Genius" by human standards. Anthropic Claude 4 Sonnet scored 127 and Google Gemini 2.0 Flash scored 126. All three fall in the "highly gifted" range. Multimodal models scored far lower: GPT-4o Vision and Grok 3 Think Vision landed in the 60-70 range, below the human average.

The gap between text-only and multimodal performance on the same puzzles points to a fundamental difference in how the two kinds of model handled the task. Text-only models received linguistically translated problem descriptions, which let them reason logically about relationships that had already been put into words. Multimodal models had to process the raw images themselves, which exposed the limits of current vision-language integration for abstract pattern recognition.

What IQ scores do and don't capture for AI: the test measures specific reasoning patterns in human-designed abstract visual puzzles. AI models may excel at these while failing at tasks that are trivial for humans, such as understanding humor, navigating social contexts, or reasoning about the physical world. At the same time, models that score at genius level on a standardized IQ test while struggling with visual processing show that AI "intelligence" is domain-specific rather than general. It is a structurally different kind of cognition from human intelligence, which evolved for embodied, social, and physical-world reasoning.