
AI & Robotics
OpenAI Details How ChatGPT Blocks Prompt Injection
OpenAI publishes its design principles for protecting AI agents against prompt injection and social engineering in agentic workflows.
Key Takeaways
- Defense-in-depth approach with instruction hierarchy, action constraints, and data flow monitoring
- High-risk agent actions always require explicit user confirmation
- Model trained with RLHF to recognize and resist injection techniques
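The confirmation requirement in the takeaways above can be sketched as a simple gate in the agent's action loop. This is a minimal hypothetical illustration, not OpenAI's actual implementation; the action names, function, and callback are assumptions:

```python
# Hypothetical sketch: high-risk agent actions are gated behind explicit
# user confirmation, while low-risk actions proceed automatically.
# Names here are illustrative, not part of any OpenAI API.
HIGH_RISK_ACTIONS = {"send_email", "make_purchase", "delete_file"}

def execute_action(action: str, args: dict, confirm) -> str:
    """Run an agent action; require user confirmation for high-risk ones.

    `confirm` is a callback that asks the user and returns True/False.
    """
    if action in HIGH_RISK_ACTIONS and not confirm(action, args):
        return f"blocked: user declined {action}"
    return f"executed {action}"

# Usage: an auto-declining confirmer blocks the risky action,
# while a benign action runs without any prompt.
print(execute_action("send_email", {"to": "a@example.com"}, lambda a, k: False))
print(execute_action("summarize_page", {}, lambda a, k: False))
```

The point of the pattern is that the confirmation decision lives outside the model's control flow, so injected instructions in retrieved content cannot silently trigger a high-risk action.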
Editorial AI · via openai.com