General Intuition makes a large bet on action-labeled video

General Intuition has raised $320 million in Series A funding, a round the company says will help it build AI systems that can perceive, predict, and act in both virtual and physical environments. The financing values the New York-based company at $2.3 billion and lifts its total funding to $454 million, following the $134 million it raised in October.

That headline number is notable on its own, but the more interesting part of the company’s pitch is the data strategy behind it. General Intuition says it is training its models not primarily on written text, conventional robotics datasets, or synthetic simulation output, but on billions of gameplay clips uploaded to Medal, the gaming platform co-founded by chief executive Pim de Witte.

Those clips do more than show what happened on screen. According to the company, they include embedded action labels that record which button a player pressed and when. That means the dataset links visual context to specific human actions over time. For a company trying to train systems that must interpret environments and choose what to do next, that pairing is central.
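To make that pairing concrete, here is a minimal sketch of how timestamped button presses could be aligned with video frames. It is an illustration only: the `ButtonEvent` type, the `pair_frames_with_actions` helper, the button names, and the millisecond timestamps are all assumptions made for this example, not a description of Medal's or General Intuition's actual data format.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class ButtonEvent:
    """One logged input (hypothetical schema)."""
    t_ms: int    # time of the press, in milliseconds from clip start
    button: str  # illustrative button name, e.g. "jump"


def pair_frames_with_actions(frame_times_ms, events):
    """For each frame, collect the buttons pressed since the previous frame."""
    events = sorted(events, key=lambda e: e.t_ms)
    event_times = [e.t_ms for e in events]
    pairs = []
    start = 0
    for t in frame_times_ms:
        # Index just past the last event whose timestamp is <= this frame's time.
        end = bisect_right(event_times, t)
        pairs.append((t, [e.button for e in events[start:end]]))
        start = end
    return pairs


# Four frames at roughly 30 fps and three made-up button presses.
frame_times = [0, 33, 66, 100]
events = [
    ButtonEvent(10, "forward"),
    ButtonEvent(40, "jump"),
    ButtonEvent(95, "fire"),
]
pairs = pair_frames_with_actions(frame_times, events)
for frame_time, buttons in pairs:
    print(frame_time, buttons)
```

Each resulting pair ties a moment of visual context to the inputs that preceded it, which is the kind of perception-action link the company describes.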

Why the dataset stands out

Much of the current AI industry is still organized around language. Large foundation models have been built on vast corpora of written words, and many systems extend that approach into images, audio, or code. General Intuition is arguing that this paradigm is not enough for what it calls physical AI.

The company’s stated view is that text descriptions alone cannot provide the kind of grounded, action-oriented learning needed for machines that interact with the world. In its framing, intelligence is not just about describing reality, but about perceiving a situation, deciding on an action, and experiencing the consequences. Gameplay footage, especially when paired with action metadata, offers repeated examples of that cycle across many settings.

This argument is important because it identifies a persistent gap in robotics and embodied AI. Real-world robot training data is expensive and slow to collect. High-quality simulation can help, but building synthetic environments with useful diversity is itself a major undertaking. General Intuition is trying to bypass that bottleneck by tapping a dataset that already captures humans navigating complex environments under changing objectives.

The reporting does not claim that game footage is a direct substitute for real-world robotics data, and that distinction matters. Virtual action traces do not automatically solve contact dynamics, sensor noise, or deployment reliability in physical systems. But the company’s thesis is that they can provide large-scale priors for perception, prediction, and decision-making, especially during pretraining.

From words to worlds

General Intuition’s language around its technology is unusually explicit. The company says that truly intelligent machines must move “from words to worlds,” acquiring what it calls a general intuition of reality. In practice, that means developing models that do not merely label scenes or answer prompts, but anticipate how environments change when actions are taken.

To support that ambition, the company says it has been developing two major model classes since its founding in 2025. The first is action models, which decide what action to take. The second is world models, which predict the outcome of those actions. That distinction mirrors a growing split in advanced AI research between systems that choose and systems that simulate consequences.

The company also says it is testing world models as training environments for agentic models. If that approach works, it could create a feedback loop in which learned environment models help generate training opportunities for decision-making systems, reducing dependence on costly real-world data collection. The reporting includes no benchmarks or external validation, but the concept aligns with broader industry efforts to make embodied AI more sample-efficient.
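The division of labor between the two model classes, and the idea of running an agent inside a learned model rather than a real environment, can be shown with a deliberately tiny toy. Everything here is an assumption made for illustration: the one-dimensional state, the three action names, the logged transitions, and the greedy action chooser are hypothetical and say nothing about how General Intuition's models actually work.

```python
from collections import defaultdict


def fit_world_model(transitions):
    """Toy world model: average state change observed for each action."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, next_state in transitions:
        totals[action] += next_state - state
        counts[action] += 1
    return {action: totals[action] / counts[action] for action in totals}


def predict(world_model, state, action):
    """Predict the next state if `action` is taken in `state`."""
    return state + world_model[action]


def choose_action(world_model, state, goal):
    """Toy action model: pick the action whose predicted outcome is nearest the goal."""
    # Sorting the action names keeps tie-breaking deterministic.
    return min(
        sorted(world_model),
        key=lambda action: abs(predict(world_model, state, action) - goal),
    )


def rollout(world_model, start, goal, max_steps=20):
    """Run the agent entirely inside the learned model, with no real environment."""
    state, trace = start, []
    for _ in range(max_steps):
        if state == goal:
            break
        action = choose_action(world_model, state, goal)
        state = predict(world_model, state, action)
        trace.append(action)
    return state, trace


# Made-up logged transitions: (state, action, next_state).
logged = [
    (0, "right", 1),
    (1, "right", 2),
    (2, "left", 1),
    (5, "left", 4),
    (3, "stay", 3),
]
world_model = fit_world_model(logged)
final_state, trace = rollout(world_model, start=0, goal=3)
print(final_state, trace)
```

In this sketch the world model is fitted from logged experience, and the agent then acts only against its predictions. A real system would replace both pieces with learned neural networks and would have to cope with prediction error, which this toy ignores.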

Investors are backing the approach aggressively

The financing itself suggests that investors see the company’s premise as more than a niche experiment. General Catalyst led the round, with participation from Jeff Bezos and former Google chief executive Eric Schmidt. The size of the raise indicates that capital markets remain willing to fund ambitious embodied-AI bets, particularly when those bets combine a differentiated data source with a broad platform story.

General Intuition says it will use the new funding to expand compute capacity and pretrain the next version of its model. Those are expensive steps, but they fit the current economics of frontier AI development. Unique data may create the initial edge, yet turning that edge into useful models still requires substantial infrastructure, engineering, and iteration.

The company also plans to make its API more broadly available this summer, according to The Robot Report. That detail matters because it suggests General Intuition is not limiting itself to a research narrative. It is attempting to become an infrastructure layer others can build on, whether for robotics, agents in simulated environments, or systems that bridge the two.

What this means for robotics and embodied AI

The larger significance of the announcement is strategic. Robotics developers have long struggled with a mismatch between the complexity of real-world behavior and the scarcity of scalable training data. General Intuition’s answer is to use human gameplay as a bridge: a vast archive of perception-action examples collected outside the robotics industry but potentially useful to it.

If that works, it could expand the range of data pipelines available to embodied-AI companies. Instead of choosing mainly between expensive real-world collection and fully synthetic environments, developers might increasingly rely on hybrid approaches that exploit naturally occurring human interaction data in virtual settings.

The reporting leaves several questions open, including how well gameplay-derived models transfer to physical robots, which domains benefit most, and how performance is evaluated against more conventional approaches. But the company does not need to settle all of those questions immediately to influence the market. A $320 million Series A is itself a signal that investors believe the next phase of AI competition may be defined less by who has the most text and more by who has the richest action-grounded data.

For now, General Intuition has established three clear facts. It has raised a substantial new round, it is training on billions of gameplay clips with embedded action labels, and it is using that data to pursue models meant to perceive, predict, and act across virtual and physical environments. In a sector searching for scalable ways to train more capable machines, that is enough to make the company one of the more closely watched embodied-AI players of the moment.

This article is based on reporting by The Robot Report, originally published on therobotreport.com.