The Reference Institutions Strike Back
Encyclopedia Britannica and Merriam-Webster, two of the most venerable reference publishers in the English-speaking world, have filed a lawsuit against OpenAI, alleging that the company trained its ChatGPT models on their copyrighted content without authorization. The suit argues that OpenAI has effectively turned the intellectual work of generations of editors, lexicographers, and subject-matter experts into training data for a commercial AI system, and is now using the resulting capabilities to compete directly with the original publishers for the web traffic and advertising revenue that sustain their operations.
The core claim is familiar from a growing body of AI copyright litigation: that training a large language model on copyrighted text constitutes copyright infringement, regardless of whether the model memorizes specific passages or merely incorporates patterns and knowledge from the training corpus. What distinguishes this suit is the directness of the competitive harm argument — these are organizations whose business model depends on users coming to their websites to look up information, users who are now getting their questions answered by ChatGPT instead.
The Traffic Cannibalization Problem
The plaintiffs allege that ChatGPT is cannibalizing their traffic, a blunt term for a phenomenon reshaping information economics across many sectors. When a user asks ChatGPT to explain a historical event, define a word, or summarize a topic, and receives a fluent, comprehensive answer, that user has little reason to visit Britannica or Merriam-Webster. The reference lookup that might have generated a page view and advertising revenue now happens entirely within the ChatGPT interface.
This dynamic is existential for reference publishers in a way it might not be for news organizations or creative content creators. Britannica's business model, which pivoted from print encyclopedia sales to digital subscriptions after the internet emerged, depends on users having a reason to come to Britannica specifically. If AI assistants can answer encyclopedia-level questions reliably, the rationale for a Britannica subscription may erode entirely.
Merriam-Webster faces a similar problem. Dictionary lookups have been a staple of web traffic since the early internet era, sustaining advertising-supported dictionary sites. AI models that can define words, explain etymology, provide usage examples, and clarify nuances of meaning — drawing on training data that almost certainly included Merriam-Webster's dictionary content — are a direct substitute for the product Merriam-Webster sells.
The Legal Theory and Its Precedents
The copyright infringement theory in AI training cases has been contested across multiple fronts since the New York Times filed its landmark suit against OpenAI and Microsoft in late 2023. OpenAI's primary defense — that training on publicly available content constitutes fair use — has not yet been fully adjudicated, and courts have issued mixed signals about the strength of the argument.
The fair use analysis involves four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the market for the original work. For reference publishers specifically, the fourth factor, market effect, may be the most compelling element of their case. If they can demonstrate measurable declines in traffic and revenue causally linked to OpenAI's training on their content, they have evidence that goes beyond speculation about hypothetical harm.
At the same time, OpenAI's fair use argument is stronger for factual reference content than it might be for creative works. Copyright protects expression, not facts — encyclopedias cannot claim copyright in historical events or scientific findings themselves, only in the specific language used to describe them. This may limit the scope of relief Britannica and Merriam-Webster can ultimately obtain even if their infringement claim succeeds.
A Broader Pattern of Publisher Resistance
The suit joins a substantial body of AI copyright litigation. The Authors Guild, various news organizations, record labels, visual artists, and software developers have all filed or threatened suits. OpenAI has also reached licensing agreements with a number of publishers, including the Associated Press, News Corp, and The Atlantic.
The pattern suggests that OpenAI is selectively striking deals with content creators whose ongoing cooperation has strategic value, such as news organizations whose content can keep models current, while contesting claims from parties whose content served as historical training data rather than an ongoing feed. Whether Britannica and Merriam-Webster fall into a category where settlement is more valuable than litigation will depend on negotiating leverage, litigation costs, and OpenAI's assessment of the legal risk the case poses to its broader fair use arguments.
This article is based on reporting by Gizmodo.

