Odyssey, a startup founded by self-driving pioneers Oliver Cameron and Jeff Hawke, has developed an AI model that lets users “interact” with streaming video.
Available on the web in an “early demo,” the model generates and streams video frames every 40 milliseconds. Via basic controls, viewers can explore areas within a video, similar to a 3D-rendered video game.
“Given the current state of the world, an incoming action, and a history of states and actions, the model attempts to predict the next state of the world,” explains Odyssey in a blog post. “Powering this is a new world model, demonstrating capabilities like generating pixels that feel realistic, maintaining spatial consistency, learning actions from video, and outputting coherent video streams for 5 minutes or more.”
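That description maps onto a standard autoregressive world-model loop: predict the next state from the current state, the incoming action, and the history, then decode pixels and repeat. As a rough, purely illustrative sketch (the names and interfaces below are hypothetical, not Odyssey's actual API), the interaction could look something like this:

```python
# Hypothetical sketch of the loop Odyssey describes: predict the next
# world state from (current state, incoming action, history), decode it
# to pixels, and stream a frame roughly every 40 milliseconds.
# All names and interfaces here are illustrative assumptions.
import time
from collections import deque

FRAME_INTERVAL_S = 0.040  # one generated frame every 40 ms (~25 fps pacing)

def run_interactive_stream(world_model, renderer, controller, max_seconds=300):
    history = deque(maxlen=256)          # rolling window of (state, action) pairs
    state = world_model.initial_state()  # latent "state of the world"
    start = time.monotonic()

    while time.monotonic() - start < max_seconds:   # ~5 minutes of coherent video
        action = controller.poll()                  # e.g. move forward, turn
        state = world_model.predict_next(state, action, list(history))
        history.append((state, action))
        renderer.stream_frame(world_model.decode_pixels(state))
        time.sleep(FRAME_INTERVAL_S)
```

The five-minute cap in the sketch mirrors the "coherent video streams for 5 minutes or more" that Odyssey claims for the model.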
A number of startups and big tech companies are chasing world models, including DeepMind, influential AI researcher Fei-Fei Li’s World Labs, Microsoft, and Decart. They believe that world models could one day be used to create interactive media, such as games and movies, and to run realistic simulations like training environments for robots.
But creatives have mixed feelings about the tech. A recent Wired investigation found that game studios like Activision Blizzard, which has laid off scores of workers, are using AI to cut corners and combat attrition. And a 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, estimated that more than 100,000 U.S.-based film, television, and animation jobs will be disrupted by AI by 2026.
For its part, Odyssey is pledging to collaborate with creative professionals — not replace them.
“Interactive video […] opens the door to entirely new forms of entertainment, where stories can be generated and explored on demand, free from the constraints and costs of traditional production,” writes the company in its blog post. “Over time, we believe everything that is video today — entertainment, ads, education, training, travel, and more — will evolve into interactive video, all powered by Odyssey.”
Odyssey’s demo is a bit rough around the edges, which the company acknowledges in its post. The environments the model generates are blurry, distorted, and unstable, in the sense that their layouts don’t always stay the same. Walk forward in one direction for a while or turn around, and the surroundings might suddenly look different.
But the company is promising to rapidly improve the model, which can currently stream video at up to 30 frames per second from clusters of Nvidia H100 GPUs at a cost of $1 to $2 per “user-hour.”
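Taken at face value, those figures work out to a tiny cost per generated frame. A back-of-envelope calculation using only the numbers Odyssey cites (30 frames per second, $1 to $2 per user-hour):

```python
# Back-of-envelope cost per generated frame, using the figures above.
fps = 30                               # up to 30 frames per second
frames_per_user_hour = fps * 60 * 60   # 108,000 frames per user-hour
for hourly_cost in (1.0, 2.0):         # $1-$2 per "user-hour"
    per_frame = hourly_cost / frames_per_user_hour
    print(f"${hourly_cost:.2f}/hr -> ${per_frame:.6f} per frame")
# Roughly $0.000009 to $0.000019 per frame at full frame rate.
```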
“Looking ahead, we’re researching richer world representations that capture dynamics far more faithfully, while increasing temporal stability and persistent state,” writes Odyssey in its post. “In parallel, we’re expanding the action space from motion to world interaction, learning open actions from large-scale video.”
Odyssey is taking a different approach from many AI labs in the world modeling space. It designed a 360-degree, backpack-mounted camera system to capture real-world landscapes, which Odyssey believes can serve as the basis for higher-quality models than those trained solely on publicly available data.
To date, Odyssey has raised $27 million from investors including EQT Ventures, GV, and Air Street Capital. Ed Catmull, one of the co-founders of Pixar and former president of Walt Disney Animation Studios, is on the startup’s board of directors.
Last December, Odyssey said it was working on software that allows creators to load scenes generated by its models into tools such as Unreal Engine, Blender, and Adobe After Effects so that they can be hand-edited.