OpenAI GPT-5.4 Thinking System Card Released

What Is GPT-5.4 Thinking?

OpenAI has released its latest frontier reasoning model, GPT-5.4 Thinking, alongside a detailed system card documenting the model's capabilities, safety evaluations, and limitations. The release marks another step in OpenAI's push to develop AI systems capable of tackling complex, multi-step problems through extended reasoning chains before delivering final answers to users.

Unlike standard language models that generate responses token-by-token without deliberation, GPT-5.4 Thinking uses chain-of-thought reasoning — working through problems internally before committing to an output. This architecture enables the model to handle mathematical proofs, complex coding tasks, scientific reasoning, and nuanced logical analysis with substantially greater accuracy than earlier systems.

The system card, which OpenAI publishes for all frontier models, provides a transparent view of how the company evaluates AI before deployment. It covers safety benchmarks, red-team results, potential misuse risks, and the specific mitigations implemented — giving researchers and enterprise customers the information they need to assess appropriate use cases for the new model.

Safety Evaluations and Red-Teaming Results

Safety testing for GPT-5.4 Thinking followed OpenAI's Preparedness Framework, evaluating the model across cybersecurity threats, biological and chemical weapons enablement, radiological risk, and autonomous resource acquisition. The system card places GPT-5.4 Thinking in the Medium overall risk category, meaning it can be deployed with standard safety mitigations in place without triggering additional restrictions.

Red-team evaluations tested the model's resistance to jailbreaks, indirect prompt injection, and multi-step adversarial manipulation. GPT-5.4 Thinking demonstrated improved resistance to many attack vectors compared to prior generations, though it remains imperfect against highly sophisticated adversarial inputs — a caveat that applies to all current AI systems regardless of training sophistication.

Evaluations of persuasion and manipulation capabilities found that the model's safety training substantially reduces its willingness to produce content designed to deceive or coerce users. OpenAI also evaluated behavior in agentic settings, where the model might take sequences of actions with real-world consequences, and found performance within acceptable safety parameters for the Medium classification threshold.

AI & Robotics

AI News reports that Google is testing a staff-only Gemini-linked agent called Remy, designed to take actions for users in work and daily tasks, with user control becoming a central design concern.

DT Editorial AI·May 7, 2026·via artificialintelligence-news.com

AI & Robotics

As pressure on the NHS shows little sign of easing, AI is increasingly being framed as a tool to reduce burdens on doctors and frontline staff rather than as a wholesale replacement for clinical work.

DT Editorial AI·May 7, 2026·via artificialintelligence-news.com

AI & Robotics

The U.S. government has added more AI suppliers for defense work and is reconsidering Anthropic’s role, signaling a shift toward a broader and more actively managed procurement posture.

DT Editorial AI·May 7, 2026·via artificialintelligence-news.com

AI & Robotics

Ahead of the AI & Big Data Expo in San Jose, HP used an interview with its AI and Data Science business development lead to frame its enterprise focus around AI, processing, and data strategy.

What GPT-5.4 Thinking Means for Enterprise AI

For organizations deploying AI in complex workflows, GPT-5.4 Thinking represents a meaningful capability upgrade over previous reasoning models. Improved reasoning makes it better suited for tasks that currently require extensive human review — contract analysis, scientific literature synthesis, complex debugging, and multi-document summarization with nuanced synthesis requirements.

Enterprise API access is available through OpenAI's standard pricing tiers. Extended thinking is available at higher token costs reflecting the additional compute involved, a tradeoff that organizations will need to evaluate against the quality improvements for their specific use cases. OpenAI has committed to ongoing safety monitoring and will update the system card as new capabilities or risks are discovered through deployment.

The release continues a pattern of OpenAI publishing detailed safety documentation alongside capability releases — a practice that sets a transparency standard other major AI developers are under increasing pressure to match. As reasoning models become core infrastructure for enterprise AI, the quality and depth of these evaluations will become an important factor in procurement and deployment decisions across industries.

This article is based on reporting by OpenAI. Read the original article.