A stronger model with an old problem still attached
OpenAI’s GPT-5.5 has arrived with the kind of headline that usually defines a major model release: according to The Decoder’s report, it now sits at the top of the Artificial Analysis Intelligence Index, ahead of leading competitors from Anthropic and Google. On the performance side, that makes the launch easy to summarize. The harder part is that the same report describes a persistent and serious weakness: hallucination.
The Decoder’s account presents GPT-5.5 as a model that improves the frontier price-performance picture without solving one of large language models’ most stubborn behavioral flaws. That combination is increasingly central to how advanced AI systems should be evaluated. Better scores and better efficiency matter. So does whether a model knows when it does not know.
What improved
According to the report, GPT-5.5 reaches 60 points on the Artificial Analysis Intelligence Index, putting it three points ahead of Claude Opus 4.7 and Gemini 3.1 Pro Preview, which were tied at 57. The model also uses about 40 percent fewer tokens than GPT-5.4. That token reduction matters because it changes the economics of the release.
Nominally, GPT-5.5’s API price doubled to $5 per million input tokens and $30 per million output tokens, compared with GPT-5.4. But lower token consumption softens that increase in practice. The source estimates the effective cost rise at about 20 percent once efficiency gains are accounted for. In benchmark terms, it also argues that GPT-5.5 can reach Claude Opus 4.7-level scores at medium compute for much less cost than Anthropic’s model at maximum settings.
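The arithmetic behind that roughly 20 percent figure is simple to verify: a 2x per-token price combined with 40 percent fewer tokens nets out to 1.2x the spend. A minimal sketch of that calculation (the function and variable names are illustrative, not any vendor API):

```python
# Net spend change when per-token price and token usage shift together.
# The 2x price multiplier and ~40% token reduction are the figures
# reported for GPT-5.5 vs. GPT-5.4; nothing here is an official API.

def effective_cost_multiplier(price_multiplier: float, token_reduction: float) -> float:
    """Multiply the price change by the fraction of tokens still consumed."""
    return price_multiplier * (1.0 - token_reduction)

# Price doubled (2.0x), token usage down about 40 percent (0.40):
net = effective_cost_multiplier(2.0, 0.40)
print(f"Effective cost change: {net:.2f}x (~{(net - 1) * 100:.0f}% increase)")
```

The same logic explains why a nominal price doubling can still read as a modest increase in practice: what a developer pays scales with tokens actually consumed, not with the list price alone.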
That is the kind of tradeoff developers actually notice. The frontier model race is no longer just about who can top a leaderboard. It is about whether performance gains arrive with reasonable token usage, manageable latency, and enough reliability to justify production deployment. On those terms, GPT-5.5 appears to strengthen OpenAI’s position.