
Benchmark Finds AI Systems Often Answer Correctly but Cite the Wrong Evidence
A new benchmark called CiteVQA shows that leading AI models frequently answer questions about documents correctly while failing to identify the passage that actually supports the answer, a gap the researchers call attribution hallucination.
- CiteVQA measures both answer correctness and citation correctness in long documents.
- A correct answer with a wrong citation receives no credit under the benchmark’s strict metric.
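
The strict scoring rule in the points above can be illustrated with a minimal sketch. It is not CiteVQA's actual implementation: the `Prediction` class, the function names, and the `attribution_gap` figure are hypothetical, and the sketch assumes answer correctness and citation correctness have already been judged as simple true/false values, which the benchmark may determine differently.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    # Both flags are assumed to be judged upstream; how the benchmark
    # decides them is not described here.
    answer_correct: bool
    citation_correct: bool


def strict_score(pred: Prediction) -> int:
    """Give credit only when the answer AND the citation are both correct."""
    return 1 if pred.answer_correct and pred.citation_correct else 0


def summarize(preds: list[Prediction]) -> dict[str, float]:
    """Compare answer-only accuracy with the strict, citation-aware score."""
    n = len(preds)
    if n == 0:
        return {"answer_accuracy": 0.0, "strict_accuracy": 0.0, "attribution_gap": 0.0}
    answer_acc = sum(p.answer_correct for p in preds) / n
    strict_acc = sum(strict_score(p) for p in preds) / n
    # Hypothetical name: the share of items answered correctly but
    # backed by the wrong evidence.
    return {
        "answer_accuracy": answer_acc,
        "strict_accuracy": strict_acc,
        "attribution_gap": answer_acc - strict_acc,
    }


if __name__ == "__main__":
    sample = [
        Prediction(answer_correct=True, citation_correct=True),
        Prediction(answer_correct=True, citation_correct=False),
        Prediction(answer_correct=False, citation_correct=False),
        Prediction(answer_correct=True, citation_correct=False),
    ]
    print(summarize(sample))
```

In this made-up sample, three of four answers are right but only one is backed by the right passage, so the strict score sits well below plain answer accuracy. That difference is the kind of gap the benchmark is designed to expose.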




