A publishing integrity problem is becoming measurable at scale
A large review of biomedical papers has found a steep increase in fabricated references since 2023, raising concern that AI-generated citation errors are slipping into literature that shapes clinical understanding and, in some cases, guidelines. According to the report, researchers from Columbia University and other institutions examined 2.47 million papers in the open PubMed Central archive, published between January 2023 and February 2026. Of the 97.1 million references checked, 4,046 were flagged as fabricated, spread across 2,810 papers.
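To put those totals in proportion, the short sketch below derives a few ratios from the reported figures. The variable names are illustrative, the inputs are the rounded numbers quoted above, and the outputs are plain divisions rather than statistics taken from the study.

```python
# Figures as quoted in the article (rounded); derived ratios are simple divisions.
papers_examined = 2_470_000
references_checked = 97_100_000
fabricated_references = 4_046
affected_papers = 2_810

# Fabricated references per 10,000 references checked.
refs_per_10k_references = fabricated_references / references_checked * 10_000
# Papers with at least one flagged reference per 10,000 papers examined.
affected_per_10k_papers = affected_papers / papers_examined * 10_000
# Average number of flagged references in a paper that has any.
fabricated_per_affected_paper = fabricated_references / affected_papers

print(f"{refs_per_10k_references:.2f} fabricated per 10,000 references checked")
print(f"{affected_per_10k_papers:.1f} affected papers per 10,000 examined")
print(f"{fabricated_per_affected_paper:.2f} fabricated references per affected paper")
```

These are whole-period averages, so they smooth over the change in rate between 2023 and early 2026.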
The scale of the dataset matters. Concerns about made-up citations from language models have circulated for years, but the report frames this as the largest review yet of biomedical references. That turns an anecdotal problem into something closer to a systemic warning. If fabricated references are appearing across thousands of papers, the issue is no longer confined to isolated mistakes or amateur misuse. It becomes a challenge for scientific publishing workflows themselves.
The most striking finding is the trend line. Throughout 2023, the rate reportedly stayed around four fabricated references per 10,000 papers. Beginning in mid-2024, it climbed sharply, reaching 51.3 per 10,000 papers by the end of 2025 and 56.9 per 10,000 in the first seven weeks of 2026. Measured against the 2023 baseline, both figures amount to more than a twelvefold increase.
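The "more than twelvefold" figure can be checked with a quick calculation. This is a minimal sketch that treats the 2023 baseline as exactly 4.0 per 10,000 papers, which is an assumption, since the article only says "around four".

```python
# Rates per 10,000 papers as quoted in the article.
baseline_rate = 4.0      # 2023; reported only as "around four", so approximate
late_2025_rate = 51.3    # end of 2025
early_2026_rate = 56.9   # first seven weeks of 2026


def fold_increase(current: float, baseline: float) -> float:
    """Return how many times larger `current` is than `baseline`."""
    return current / baseline


fold_2025 = fold_increase(late_2025_rate, baseline_rate)
fold_2026 = fold_increase(early_2026_rate, baseline_rate)

print(f"End of 2025: {fold_2025:.1f}x the 2023 baseline")   # 12.8x
print(f"Early 2026:  {fold_2026:.1f}x the 2023 baseline")   # 14.2x
```

Because the baseline is approximate, the multiples should be read as rough magnitudes rather than precise values.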
The timing strengthens the AI hypothesis, but does not prove exclusivity
The study's authors see a likely connection to the widespread use of language models such as ChatGPT. Their reasoning is chronological as well as technical. Because general-purpose text generators came into wide use after late 2022, and scholarly publication often takes 100 to 200 days from submission to appearance, the effect of AI-assisted drafting would be expected to become visible in archives like PubMed Central around mid-2024. That is where the reported spike begins.
At the same time, the report notes that the researchers do not rule out other causes. Paper-mill activity and changes in indexing practices are both mentioned as possible contributing factors. That caution is important. The data appear consistent with AI-driven citation fabrication becoming more common, but the researchers do not claim that language models alone explain every case.
Still, the logic is persuasive. Large language models are known to produce references that look plausible, follow the correct format, and even attach real researchers to nonexistent papers. In a high-throughput academic environment, those errors can survive if neither authors nor editors validate them carefully.
The problem is not just fake references, but credible-looking fake references
One of the most alarming details is how difficult these fabricated citations can be to detect by inspection. According to the report, the false references often match the paper’s topic, use proper formatting, credit real researchers, and include believable publication years. In one cited example, a urology paper contained 18 fabricated references out of 30 checked.
That is what makes the issue especially dangerous in biomedical publishing. A visibly broken citation can be caught quickly. A polished but nonexistent one can move through peer review and into the published record if no one verifies it against trusted databases. The study’s definition of “fabricated” reflects that concern: a cited title was flagged if it could not be found in PubMed, Crossref, OpenAlex, or Google Scholar.
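The flagging rule described above, where a title counts as fabricated only if none of the databases returns it, can be sketched as follows. This is an illustration, not the study's code: the normalization rule, the function names, and the in-memory stand-ins for the databases are all assumptions, and the article does not say how the researchers matched titles. The titles are placeholders, not real papers.

```python
import re
from typing import Callable, Dict, Iterable


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed rule)."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def make_index_lookup(known_titles: Iterable[str]) -> Callable[[str], bool]:
    """Build a lookup over an in-memory title set (stand-in for a real database query)."""
    index = {normalize_title(t) for t in known_titles}
    return lambda title: normalize_title(title) in index


def is_flagged(title: str, lookups: Dict[str, Callable[[str], bool]]) -> bool:
    """Flag a title only when no source reports a match."""
    return not any(lookup(title) for lookup in lookups.values())


# Hypothetical stand-ins; real lookups would query the services named in the article.
lookups = {
    "pubmed": make_index_lookup(["Placeholder Study: A Trial"]),
    "crossref": make_index_lookup(["Placeholder Cohort Analysis"]),
}

print(is_flagged("placeholder study a trial", lookups))       # False: found in one source
print(is_flagged("Placeholder Paper Nobody Wrote", lookups))  # True: found in none
```

Requiring a miss in every source keeps false positives down, at the cost of depending on how forgiving the title matching is.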
The report also stresses where the risk becomes more consequential. Fabricated references are especially troubling when they appear in review articles, because those papers often synthesize evidence for broader audiences and can influence clinical guidelines. If the scaffolding of a review contains made-up literature, the downstream effects can extend beyond a single publication.
The proposed response is more automation, not less scrutiny
According to the report, the researchers call for automated reference checks before publication and retroactive screening of already published papers. That recommendation is practical because the problem itself is partly one of scale. Human reviewers cannot realistically verify every citation manually across millions of papers, especially when fake references look legitimate.
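One way such an automated check might sit in an editorial pipeline is sketched below. It is a hypothetical design, not a tool described in the article: `ScreeningReport`, `screen_references`, and the review threshold are invented names and parameters, and `is_verified` stands for whatever database lookup a publisher chooses. The demo reuses the counts from the urology example (18 unverifiable references out of 30) with placeholder identifiers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScreeningReport:
    total: int
    unverified: List[str]

    @property
    def unverified_share(self) -> float:
        """Fraction of references that could not be verified."""
        return len(self.unverified) / self.total if self.total else 0.0

    def needs_human_review(self, max_share: float = 0.0) -> bool:
        """Route to an editor when the unverified share exceeds the threshold."""
        return self.unverified_share > max_share


def screen_references(titles: List[str],
                      is_verified: Callable[[str], bool]) -> ScreeningReport:
    """Check every reference and collect the ones that fail verification."""
    unverified = [t for t in titles if not is_verified(t)]
    return ScreeningReport(total=len(titles), unverified=unverified)


# Demo: 30 placeholder references, of which only the first 12 verify.
titles = [f"ref-{i}" for i in range(30)]
verified = set(titles[:12])
report = screen_references(titles, verified.__contains__)

print(report.total, len(report.unverified), report.unverified_share)
print(report.needs_human_review())
```

The default threshold of zero sends any paper with an unverifiable reference to a human. A publisher could loosen it, but a failed lookup is only a signal, since legitimate references to books, preprints, or older literature may also be missing from the databases.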
The report notes that platforms such as arXiv have already introduced initial sanctions for AI-related errors. That signals movement toward stricter norms, but biomedical publishing likely needs more than warnings. Reference validation may have to become a routine technical step in editorial pipelines, much like plagiarism checks or image screening.
There is a broader lesson here as well. AI tools can lower the cost of drafting text, but they can also lower the cost of producing authoritative-looking falsehoods. In scientific communication, that tradeoff is especially dangerous because readers often assume that the citation apparatus has already been vetted. Once that assumption weakens, trust in the literature erodes.
The integrity challenge is now part of the AI adoption story
The new audit suggests fabricated citations are no longer a fringe issue in biomedical publishing. They are appearing often enough, and rising fast enough, to demand process changes. Whether the main driver is language-model misuse, paper mills, or a combination of causes, the practical implication is the same: references can no longer be treated as reliable simply because they look professional.
That is a serious problem for any field, but especially for one whose reviews and syntheses can help shape clinical guidance. The lesson is not that AI must be excluded from research workflows. It is that AI-assisted writing without rigorous verification can contaminate the evidentiary chain. Once that happens at scale, the credibility costs spread well beyond a single paper.
- An audit of 2.47 million biomedical papers found 4,046 fabricated references across 2,810 papers.
- The rate reportedly rose more than twelvefold from 2023 to early 2026.
- The researchers see language models as a likely driver, while not excluding other causes.
- Fake citations are especially risky in review articles that influence clinical understanding and guidelines.
- The study’s authors call for automated reference checks and retroactive screening.
This article is based on reporting by The Decoder.
Originally published on the-decoder.com





