Campbell Brown wants AI’s information layer judged by experts, not engagement metrics

Campbell Brown has spent years in the business of deciding how information is surfaced, checked, and trusted online. Now she is making the case that the next information bottleneck is not social media feeds but generative AI systems, and that the industry is still not treating the problem with enough seriousness. Her new company, Forum AI, is built around a simple premise: if large models are becoming a primary channel through which people understand the world, then their answers on sensitive subjects need to be tested against standards designed by domain experts.

Brown’s concern is not abstract. In remarks discussed by TechCrunch, she described AI as an increasingly central funnel for information and argued that performance on “high-stakes topics” remains weak. Those topics include geopolitics, mental health, finance, and hiring, areas where incomplete or distorted responses can have real-world consequences and where the correct answer is often not binary. That ambiguity is exactly why Brown believes the industry needs better evaluation tools rather than more confidence in model intuition.

Forum AI’s model is expert consensus translated into scalable testing

Forum AI’s approach starts with recruiting recognized specialists to design the benchmarks. Brown said the company identifies leading experts in a field, asks them to architect the evaluation framework, and then trains AI judges to score model outputs at scale. In its geopolitics work, Forum AI has assembled a strikingly high-profile roster that includes Niall Ferguson, Fareed Zakaria, former Secretary of State Tony Blinken, former House Speaker Kevin McCarthy, and Anne Neuberger, a former Obama administration cybersecurity official.

The operational goal is not to eliminate disagreement altogether. Brown said Forum AI aims to get its AI judges to about 90% consensus with human experts. By her account, the company has been able to reach that threshold. The implication is that Forum AI sees evaluation itself as a technical product: a system that can turn expert judgment, normally expensive and slow, into repeatable testing across many model outputs.
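To make the 90% figure concrete, here is a minimal sketch of how agreement between an AI judge and human experts could be measured. The article does not say how Forum AI computes consensus, so the per-item exact-match metric, the function names, and the sample labels below are illustrative assumptions, not the company's method.

```python
from typing import Sequence


def agreement_rate(judge_labels: Sequence[str], expert_labels: Sequence[str]) -> float:
    """Fraction of items on which the AI judge's label matches the expert's.

    Assumption: consensus is simple per-item exact agreement. A real
    evaluation might use a chance-corrected statistic instead.
    """
    if len(judge_labels) != len(expert_labels):
        raise ValueError("label lists must be the same length")
    if not judge_labels:
        raise ValueError("at least one labeled item is required")
    matches = sum(j == e for j, e in zip(judge_labels, expert_labels))
    return matches / len(judge_labels)


def meets_threshold(judge_labels: Sequence[str],
                    expert_labels: Sequence[str],
                    threshold: float = 0.90) -> bool:
    """True if judge/expert agreement reaches the (hypothetical) target."""
    return agreement_rate(judge_labels, expert_labels) >= threshold


# Hypothetical grades for ten model answers on a high-stakes topic.
expert = ["pass", "fail", "pass", "pass", "fail", "pass", "pass", "fail", "pass", "pass"]
judge = ["pass", "fail", "pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]

rate = agreement_rate(judge, expert)  # judge disagrees on one of ten items
print(f"agreement: {rate:.0%}, meets 90% target: {meets_threshold(judge, expert)}")
```

Raw agreement is the simplest possible measure. Where experts themselves disagree, a chance-corrected or multi-rater statistic would likely be a better fit, but the article gives no detail on that point.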

That matters because the most influential model companies are measured mainly on coding and math, where automated benchmarking is easier. Brown’s critique is that the problems users encounter in daily life often sit elsewhere. Questions about politics, health, money, or employment are loaded with context, perspective, and value conflicts. They are harder to grade, but that does not make them peripheral.

The warning comes from someone who watched social platforms optimize for the wrong outcome

Brown’s argument carries extra weight because it is shaped by her experience at Facebook, where she served as the company’s first and only dedicated news chief. She told TechCrunch that she recognized the stakes soon after ChatGPT’s public release while she was still at Meta. The shift, in her view, was immediate: AI tools were poised to become a dominant path through which people seek and receive information.

That perspective also explains why she is focused on incentives. Brown said what frustrated her most was that accuracy did not appear to be a leading priority for foundation model companies. In her telling, the major labs are highly focused on coding and math performance, while informational accuracy is more difficult to standardize and therefore easier to defer. Her response is that difficulty does not make the problem optional.

The comparison to social media is direct. Brown said she saw firsthand what happens when a platform optimizes for the wrong goal, and she described Meta’s earlier efforts in news and fact-checking as having failed in important ways. The lesson she draws is not simply that moderation is hard. It is that systems built around engagement can drift away from social value, even when the damage becomes obvious in hindsight.

What Forum AI says current models are getting wrong

Brown’s criticism of current model behavior is specific enough to suggest the company sees consistent patterns rather than isolated hallucinations. She cited Gemini pulling from Chinese Communist Party websites for stories unrelated to China and said nearly all major models display a left-leaning political bias. She also pointed to subtler failures: missing context, missing perspectives, and arguments that straw-man opposing views without clearly signaling the weakness of the representation.

Those complaints speak to a broader problem in AI evaluation. A model can appear fluent, fast, and useful while still presenting information through a narrow or unstable lens. If the output omits relevant framing, fails to reflect the range of serious viewpoints, or leans on weak sourcing, users may receive something that sounds authoritative but is structurally misleading. Brown’s claim is that these are not cosmetic flaws. On high-stakes topics, they are product failures.

She also argued that many of the fixes are relatively straightforward. While she did not lay out a full technical blueprint in the cited discussion, the comment implies that some of the quality gap comes from priorities, testing design, and feedback loops rather than from unsolved frontier research alone.

A new front in the AI competition

Forum AI was founded 17 months ago in New York, which puts it in the middle of a fast-forming market for AI governance infrastructure. Companies building foundation models are under pressure from regulators, enterprise customers, and the public to demonstrate that their systems behave responsibly in areas that affect livelihoods, politics, health, and security. Brown is positioning Forum AI as a company that can quantify whether they do.

That is a notable shift in where value may accrue in the AI stack. The biggest labs still dominate model training and distribution, but a parallel layer is emerging around auditing, benchmarking, and independent evaluation. If Brown is right that AI systems are becoming the default route through which many users consume information, then tools that assess quality on contested topics could become as strategically important as the models themselves.

There is also a cultural split embedded in her comments. Brown said one conversation is happening in Silicon Valley while a very different one is taking place among consumers. The suggestion is that builders may still be preoccupied with performance metrics that do not map neatly to the anxieties of ordinary users, especially parents, voters, patients, and workers. Forum AI’s pitch is that those anxieties can be converted into a measurable standard.

The bigger question is who gets to define “good” AI information

Brown’s company does not resolve the philosophical problem at the heart of AI information systems: who should decide what counts as balanced, accurate, or sufficiently contextualized on topics where experts disagree. What Forum AI offers instead is a procedural answer. Pick recognized experts, build explicit benchmarks, train scoring systems against their judgment, and make the tradeoffs visible.

Whether that model becomes widely accepted is still an open question. But Brown has identified a weakness that is increasingly hard for the industry to avoid. Generative AI is no longer judged only by how well it writes code or solves equations. It is being judged by how it mediates understanding in messy, consequential domains. If that layer becomes the new gateway to public knowledge, then the struggle over benchmark design may turn out to be one of the most important fights in AI.

This article is based on reporting by TechCrunch.

Originally published on techcrunch.com