OpenAI pushes further into real-time voice interfaces
OpenAI has added a set of new voice intelligence features to its API, expanding what developers can do with live audio in software products. The company says the new tools are designed to help applications talk with users, transcribe speech and translate conversations as they happen.
The release includes three main capabilities: GPT-Realtime-2, GPT-Realtime-Translate and GPT-Realtime-Whisper. Together, they amount to a broader effort to move beyond simple voice input and output toward systems that can listen, reason, translate and respond in the flow of a live conversation.
What is new
The first model, GPT-Realtime-2, is presented as an upgraded voice model for realistic vocal interaction. OpenAI says it differs from the earlier GPT-Realtime-1.5 because it is built with GPT-5-class reasoning intended to handle more complicated user requests. That signals a push to make voice systems more capable in situations where a conversation is not just a sequence of short prompts, but an exchange requiring more context and decision-making.
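OpenAI has not published wire-level details for GPT-Realtime-2 alongside this announcement. As a rough sketch, assuming the model is reached through the same WebSocket-based Realtime interface OpenAI already exposes, a developer might open and configure a session like this (the endpoint shape and the `session.update` event follow the existing Realtime API; the model name is taken from the announcement, and the session fields are illustrative):

```python
import json

# Assumption: same WebSocket endpoint pattern as OpenAI's existing Realtime API.
# The model name below comes from the announcement, not from published API docs.
REALTIME_URL = "wss://api.openai.com/v1/realtime"
MODEL = "gpt-realtime-2"

def build_connect_url(base: str, model: str) -> str:
    """Compose the WebSocket URL for a Realtime session."""
    return f"{base}?model={model}"

def build_session_update(voice: str, instructions: str) -> str:
    """Serialize a session.update event configuring the voice session.

    The field names mirror OpenAI's current Realtime API; whether
    GPT-Realtime-2 keeps the same schema is an assumption.
    """
    event = {
        "type": "session.update",
        "session": {
            "voice": voice,
            "instructions": instructions,
            "modalities": ["audio", "text"],
        },
    }
    return json.dumps(event)

url = build_connect_url(REALTIME_URL, MODEL)
# A real client would now open a WebSocket to `url` with an Authorization
# header and send the serialized session.update event as its first message.
```

The point of the sketch is the shape of the interaction: one long-lived socket per conversation, configured up front, with audio and text flowing over it as JSON events rather than as discrete request/response calls.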
The second launch, GPT-Realtime-Translate, is aimed at live translation. OpenAI says it can provide real-time translation that keeps pace with the speaker in a conversational setting, and that it supports more than 70 input languages and 13 output languages.
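The announcement gives language counts but no configuration schema for GPT-Realtime-Translate. A minimal sketch of how a client might request a target language, assuming a `session.update`-style event and an invented `target_language` field (both hypothetical), could look like this; the 13-language set below is a placeholder, since OpenAI has not listed which output languages are supported:

```python
import json

# Placeholder set sized to the "13 output languages" figure from the
# announcement; the actual supported languages have not been published.
SUPPORTED_OUTPUTS = {"en", "es", "fr", "de", "it", "pt", "ja",
                     "ko", "zh", "ar", "hi", "nl", "ru"}

def build_translate_session(target_language: str) -> str:
    """Serialize a hypothetical session config requesting live translation.

    Both the model identifier and the `target_language` field name are
    illustrative assumptions, not documented API surface.
    """
    if target_language not in SUPPORTED_OUTPUTS:
        raise ValueError(f"unsupported output language: {target_language}")
    event = {
        "type": "session.update",
        "session": {
            "model": "gpt-realtime-translate",  # name from the announcement
            "target_language": target_language,  # assumed field name
        },
    }
    return json.dumps(event)
```

Validating the output language client-side, as above, is a sensible pattern regardless of the final schema: input languages are detected from the audio, but the output language is a choice the application has to make explicit.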
The third tool, GPT-Realtime-Whisper, focuses on live speech-to-text transcription. OpenAI says it captures spoken interactions as they occur, giving developers a way to build immediate transcription into their applications.
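For live transcription, the core developer task is streaming audio chunks over the socket as they are captured. The sketch below follows the event shape of OpenAI's existing Realtime transcription flow, where raw PCM audio is base64-encoded into `input_audio_buffer.append` events; whether GPT-Realtime-Whisper uses the same events is an assumption:

```python
import base64
import json

def build_audio_append(pcm_chunk: bytes) -> str:
    """Wrap a raw PCM16 audio chunk in an input_audio_buffer.append event.

    Event name follows OpenAI's current Realtime API; GPT-Realtime-Whisper
    is assumed, not confirmed, to accept the same framing.
    """
    event = {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    }
    return json.dumps(event)

def decode_audio_append(message: str) -> bytes:
    """Recover the raw audio bytes from a serialized append event."""
    event = json.loads(message)
    return base64.b64decode(event["audio"])
```

In practice a client would call `build_audio_append` on each microphone buffer as it fills, send the result over the WebSocket, and receive transcription text back as server events on the same connection.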