
AI & RoboticsMore in AI & Robotics →
OpenAI's IH-Challenge Hardens LLMs Against Manipulation
OpenAI's new IH-Challenge training method teaches frontier models to reliably follow trusted instructions over adversarial ones, improving safety steerability and resistance to prompt injection.
Key Takeaways
- IH-Challenge training teaches models to reliably prioritize high-trust system instructions over adversarial inputs
- The method significantly reduces susceptibility to prompt injection attacks from external content
- Research shows improvements generalize to novel attack patterns beyond the training scenarios
DE
DT Editorial AI··via openai.com