
Benchmark Finds AI Systems Often Answer Correctly but Cite the Wrong Evidence
Key Takeaways
- CiteVQA measures both answer correctness and citation correctness in long documents.
- A correct answer with a wrong citation receives no credit under the benchmark’s strict metric.
- Gemini-3.1-Pro-Preview led the test with a score of 76, while GPT-5.4's score dropped sharply once citation accuracy was required.
- Researchers say weak attribution makes many systems risky for regulated domains.
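
The strict scoring rule described in the takeaways can be illustrated with a minimal sketch. It is not the benchmark's actual implementation: the `Prediction` class, its field names, and both scoring functions are hypothetical, and it assumes each item has already been judged for answer and citation correctness as simple true/false values.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Hypothetical per-item judgments; the real benchmark's format may differ.
    answer_correct: bool    # the answer matches the reference answer
    citation_correct: bool  # the cited evidence matches the reference evidence


def answer_only_score(preds):
    """Fraction of items with a correct answer, ignoring citations."""
    if not preds:
        return 0.0
    return sum(p.answer_correct for p in preds) / len(preds)


def strict_score(preds):
    """Fraction of items where both the answer and the citation are correct.

    A correct answer paired with a wrong citation earns no credit.
    """
    if not preds:
        return 0.0
    return sum(p.answer_correct and p.citation_correct for p in preds) / len(preds)


# Illustrative data only, not results from the benchmark.
examples = [
    Prediction(answer_correct=True, citation_correct=True),
    Prediction(answer_correct=True, citation_correct=False),
    Prediction(answer_correct=False, citation_correct=False),
    Prediction(answer_correct=True, citation_correct=False),
]

print(answer_only_score(examples))  # 0.75
print(strict_score(examples))       # 0.25
```

In this made-up sample, three of four answers are right but only one is backed by the right evidence, so the strict score is a third of the answer-only score. That gap is the kind of drop the takeaways describe when citation accuracy is required.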
DT Editorial Team · via the-decoder.com



