
Speeding up agentic workflows with WebSockets in the Responses API

OpenAI Says Persistent WebSocket Sessions Cut Agent Loop Latency by Roughly 40%

OpenAI says a redesign of its Responses API agent loop, centered on persistent WebSocket connections and connection-scoped caching, reduced end-to-end latency by about 40% as model inference speeds climbed sharply.
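The article does not describe the wire protocol, so the following is only a toy model of the general idea behind connection-scoped caching. State kept on the server for the life of one connection lets each turn carry just its new items instead of re-sending the whole history. `ConnectionSession`, `handle_turn`, and `stateless_chars` are hypothetical names, and nothing here reflects the actual Responses API message format.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionSession:
    """Hypothetical per-connection state, kept for as long as one socket stays open."""
    history: list[str] = field(default_factory=list)
    chars_received: int = 0

    def handle_turn(self, new_items: list[str]) -> int:
        """Receive only this turn's new items; return the size of the cached context."""
        self.chars_received += sum(len(item) for item in new_items)
        self.history.extend(new_items)  # the server-side cache, not the client, holds earlier turns
        return len(self.history)


def stateless_chars(turns: list[list[str]]) -> int:
    """Characters sent if every request re-sends the full conversation so far."""
    total = 0
    history: list[str] = []
    for new_items in turns:
        history.extend(new_items)
        total += sum(len(item) for item in history)
    return total


# Hypothetical three-turn agent loop (illustrative strings only).
turns = [["user: fix the bug"], ["tool: ran tests"], ["tool: 3 failures"]]
session = ConnectionSession()
for new_items in turns:
    session.handle_turn(new_items)

print("persistent connection:", session.chars_received)  # 48
print("stateless requests:", stateless_chars(turns))      # 97
```

In this toy model the gap widens with every turn: the stateless total grows quadratically with conversation length, while the per-connection total grows linearly.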

Key Takeaways

OpenAI says end-to-end latency for agent loops using the Responses API fell by roughly 40%.
The company says inference speed gains made API overhead a much larger bottleneck.
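A quick back-of-the-envelope calculation shows why faster inference makes overhead matter more. If per-turn API overhead stays fixed while inference time shrinks, overhead becomes a larger fraction of each turn. The 0.3-second overhead and the inference times below are made-up illustrative values, not figures from OpenAI.

```python
def overhead_share(inference_s: float, overhead_s: float) -> float:
    """Fraction of one agent-loop turn spent on non-inference overhead."""
    return overhead_s / (inference_s + overhead_s)


# Hypothetical numbers for illustration only: a fixed 0.3 s of per-turn overhead.
OVERHEAD_S = 0.3
for inference_s in (3.0, 1.0, 0.3):
    share = overhead_share(inference_s, OVERHEAD_S)
    print(f"{inference_s:.1f} s inference -> {share:.0%} overhead")
```

With these hypothetical figures, the overhead share rises from about 9% to 50% as inference time drops from 3 s to 0.3 s, which is the dynamic the takeaway describes.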

DT Editorial Team · Apr 26, 2026 · via openai.com
