
AI agent “skills” show limited gains once testing mirrors real-world use
A large study of more than 34,000 real-world agent skills suggests the modular instructions praised in benchmark settings deliver far smaller gains when models must find and apply them on their own.
- Researchers tested 34,198 real-world skills from open-source repositories.
- The study argues existing benchmarks overstate gains because they hand agents the exact task-specific instructions up front, rather than requiring the agent to locate a relevant skill itself (a contrast sketched in the example below).
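
To make that contrast concrete, here is a minimal Python sketch of the two evaluation setups the study compares. Everything here is illustrative: the `Skill` dataclass, the `oracle_prompt` and `retrieval_prompt` helpers, and the naive keyword-overlap retriever are assumptions for exposition, not the paper's actual code or method.

```python
# A minimal sketch of the evaluation gap described above (illustrative only).
# "Oracle" benchmarks inject the one skill written for the task directly;
# realistic use forces the agent to retrieve a skill from a large library
# first, so retrieval errors eat into the measured gains.

from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    instructions: str  # modular, task-specific guidance for the agent

def oracle_prompt(task: str, matched_skill: Skill) -> str:
    """Benchmark-style setup: the matching skill is handed to the agent."""
    return f"{matched_skill.instructions}\n\nTask: {task}"

def retrieval_prompt(task: str, library: list[Skill], top_k: int = 1) -> str:
    """Realistic setup: the agent must first find a skill on its own.
    A naive keyword-overlap score stands in for whatever retriever a real
    agent system would use (an assumption for illustration)."""
    def overlap(skill: Skill) -> int:
        task_words = set(task.lower().split())
        return len(task_words & set(skill.instructions.lower().split()))

    best = sorted(library, key=overlap, reverse=True)[:top_k]
    chosen = "\n\n".join(s.instructions for s in best)
    return f"{chosen}\n\nTask: {task}"
```

Any real retriever would be stronger than this keyword stand-in, but the structural point the study makes holds either way: with tens of thousands of candidate skills, measured gains now include the cost of retrieval mistakes, which an oracle-injection benchmark never pays.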
