OpenAI launches real-time voice, translation and transcription API features

OpenAI pushes further into real-time voice interfaces

OpenAI has added a set of new voice intelligence features to its API, expanding what developers can do with live audio in software products. The company says the new tools are designed to help applications talk with users, transcribe speech and translate conversations as they happen.

The release includes three main capabilities: GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. Together, they amount to a broader effort to move beyond simple voice input and output toward systems that can listen, reason, translate and respond in the flow of a live conversation.

What is new

The first model, GPT-Realtime-2, is presented as an upgraded voice model for realistic vocal interaction. OpenAI says it differs from the earlier GPT-Realtime-1.5 because it is built with GPT-5-class reasoning intended to handle more complicated user requests. That signals a push to make voice systems more capable in situations where a conversation is not just a sequence of short prompts, but an exchange requiring more context and decision-making.

The second launch, GPT-Realtime-Translate, is aimed at live translation. OpenAI says it can provide real-time translation that keeps pace with the speaker in a conversational setting. The model reportedly supports more than 70 input languages and 13 output languages.

The third tool, GPT-Realtime-Whisper, focuses on live speech-to-text transcription. OpenAI says it captures spoken interactions as they occur, giving developers a way to build immediate transcription into their applications.

Why this matters for developers

Real-time audio has been a major technical and product challenge for AI developers because useful voice systems need to do more than recognize words. They have to manage latency, maintain conversational coherence and respond in ways that feel natural enough for users to keep talking. By bundling reasoning, translation and transcription into API products, OpenAI is trying to make that stack easier to access.

The company’s own description of the release is revealing. OpenAI said the models move real-time audio from simple call-and-response toward voice interfaces that can do work while a conversation unfolds. That is an important distinction. A voice bot that merely replies is one thing. A system that can listen, interpret, translate, transcribe and potentially act within the same interaction is a more ambitious platform component.

Customer service is the most obvious near-term use case, and OpenAI explicitly points to that category. But the company also says the tools could be useful in education, media, events and creator platforms. Those examples suggest a market not only for voice assistants but for multilingual live workflows and conversational applications that need a running transcript or translation layer.