
OpenAI Says Persistent WebSocket Sessions Cut Agent Loop Latency by Roughly 40%
OpenAI says a redesign of its Responses API agent loop, centered on persistent WebSocket connections and connection-scoped caching, reduced end-to-end latency by about 40% as model inference speeds climbed sharply.
Key Takeaways
- OpenAI says agent loops using the Responses API became roughly 40% faster end to end.
- The company says inference speed gains made API overhead a much larger bottleneck.
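The arithmetic behind the second takeaway can be sketched with a toy model (the figures below are illustrative assumptions, not OpenAI's numbers): if every turn of an agent loop pays a fixed connection-setup cost, that cost dominates once inference itself gets fast, and a persistent WebSocket connection pays it only once.

```python
# Toy latency model: per-request connections vs. one persistent connection.
# All numbers are hypothetical, chosen only to illustrate the effect.

def loop_latency_ms(turns: int, overhead_ms: int, inference_ms: int,
                    persistent: bool) -> int:
    """Total latency for an agent loop of `turns` model calls.

    persistent=False: pay connection overhead (TLS/HTTP setup) every turn.
    persistent=True:  pay it once up front, then reuse the connection.
    """
    if persistent:
        return overhead_ms + turns * inference_ms
    return turns * (overhead_ms + inference_ms)

# Hypothetical 10-turn loop: 100 ms setup per request, 150 ms inference.
per_request = loop_latency_ms(10, 100, 150, persistent=False)  # 2500 ms
reused      = loop_latency_ms(10, 100, 150, persistent=True)   # 1600 ms
savings = 1 - reused / per_request
print(f"per-request: {per_request} ms, persistent: {reused} ms, "
      f"saved {savings:.0%}")  # saved 36%
```

Note how the savings grow as inference time shrinks: with the same 100 ms overhead but 50 ms inference, the persistent loop would save 60% under this model.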
DT Editorial AI · via openai.com