Odyssey, a startup founded by self-driving pioneers Oliver Cameron and Jeff Hawke, has developed an AI model that lets users "interact" with streaming video.
Available on the web as an "early demo," the model generates and streams video frames every 40 milliseconds. Using basic controls, viewers can explore areas within a video, much as they would in a 3D-rendered video game.
"Given the current state of the world, an incoming action, and a history of states and actions, the model attempts to predict the next state of the world," Odyssey explained in a blog post. "Powering this is a new world model that demonstrates capabilities such as generating realistic pixels, maintaining spatial consistency, learning actions from video, and outputting coherent video streams for 5 minutes or more."
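The prediction loop Odyssey describes can be illustrated with a toy sketch. This is not Odyssey's code: the `Rollout` class, the `toy_predict` function, and the 2D position standing in for a video frame are all hypothetical, chosen only to show how each new state is predicted from the current state, an incoming action, and the history of earlier states and actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumption: a 2D position stands in for a full video frame / world state.
State = Tuple[float, float]
Action = str
History = List[Tuple[State, Action]]


@dataclass
class Rollout:
    """Hypothetical autoregressive loop: predict, record, repeat."""
    predict: Callable[[State, Action, History], State]
    state: State
    history: History = field(default_factory=list)

    def step(self, action: Action) -> State:
        # Predict the next state from the current state, the action, and the history.
        next_state = self.predict(self.state, action, self.history)
        # Record what happened so later predictions can condition on it.
        self.history.append((self.state, action))
        self.state = next_state
        return next_state


def toy_predict(state: State, action: Action, history: History) -> State:
    """Stand-in for a learned model: move one unit per action (ignores history)."""
    moves = {"forward": (0.0, 1.0), "back": (0.0, -1.0),
             "left": (-1.0, 0.0), "right": (1.0, 0.0)}
    dx, dy = moves.get(action, (0.0, 0.0))
    return (state[0] + dx, state[1] + dy)


def frames_per_second(interval_ms: float) -> float:
    """Convert a per-frame interval in milliseconds to a frame rate."""
    return 1000 / interval_ms


rollout = Rollout(predict=toy_predict, state=(0.0, 0.0))
for action in ["forward", "forward", "right"]:
    rollout.step(action)
```

In a real world model, the predictor would be a learned network emitting pixels rather than coordinates. The `frames_per_second` helper simply shows that one frame every 40 milliseconds works out to 25 frames per second.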
A number of startups and large tech companies are chasing world models, including DeepMind, influential AI researcher Fei-Fei Li's World Labs, Microsoft, and Decart. They believe that world models could one day be used to create interactive media such as games and movies, and to run realistic simulations such as training environments for robots.
But creatives have mixed feelings about the technology. A recent Wired investigation found that game studios such as Activision Blizzard, which has laid off dozens of workers, are using AI to cut corners and combat losses. A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, estimated that AI will disrupt more than 100,000 film, television, and animation jobs in the United States in the coming months.
For its part, Odyssey is pledging to collaborate with creative professionals, not replace them.
“Interactive video (…) opens the door to entirely new forms of entertainment, where stories can be generated and explored on demand, free from the constraints and costs of traditional production,” the company wrote in a blog post. “Over time, we believe that everything that is video today (entertainment, advertising, education, training, travel, etc.) will evolve into interactive video, all powered by Odyssey.”
Odyssey's demo is a bit rough around the edges, which the company acknowledges in its post. The environments the model generates are blurry and distorted, and unstable in the sense that their layouts don't always stay the same. Walk forward for a while or turn around, and the surroundings may suddenly look different.
But the company is promising rapid improvements to the model, which can currently stream video at up to 30 frames per second from clusters of Nvidia H100 GPUs, at a cost of $1 to $2 per "user-hour."
“Looking ahead, we are researching richer world representations that are more faithful, while improving temporal stability and persistence,” Odyssey wrote in its post. “At the same time, we are expanding the action space from motion to world interaction, learning open actions from large-scale video.”
Odyssey is taking a different approach from many AI labs in the world-modeling space. It designed a 360-degree, backpack-mounted camera system to capture real-world landscapes, which Odyssey believes can serve as the basis for higher-quality models than those trained solely on publicly available data.
To date, Odyssey has raised $27 million from investors including EQT Ventures, GV, and Air Street Capital. Ed Catmull, a co-founder of Pixar and former president of Walt Disney Animation Studios, sits on the startup's board of directors.
Last December, Odyssey said it was working on software that would let creators load scenes generated by its models into tools such as Unreal Engine, Blender, and Adobe After Effects so they can be edited by hand.