The Benchmark Treadmill
GPT-5 launches with the now-familiar ceremony: a set of benchmark scores that top every previous model, a list of capability demos that feel genuinely impressive in isolation, and a pricing structure that rewards high-volume enterprise users.
The benchmarks matter less than the commentary suggests. Every lab has learned to optimise for benchmark performance in ways that do not always translate to real-world usefulness. What matters is whether the model is actually better at the tasks you use it for every day.
What Is Actually New
The reasoning improvements are real. GPT-5 handles multi-step problems with significantly less hallucination than its predecessor. The context window is longer and the model degrades more gracefully at long contexts than GPT-4 did.
The multimodal capabilities are also meaningfully upgraded. Image understanding in particular feels qualitatively different: the model can reason about visual content rather than simply describe it.
The Competitive Landscape
OpenAI is releasing GPT-5 into a market that looks nothing like the one GPT-4 arrived in. Anthropic's Claude, Google's Gemini Ultra, and Meta's Llama series have all closed the capability gap substantially.
The frontier is now crowded. OpenAI's moat, if it still exists, is distribution: ChatGPT has over 180 million weekly active users. Model capability alone no longer differentiates.