Meta’s AI chief says world models are key to ‘human-level AI’ — but it might be 10 years out



Are today’s AI models truly remembering, thinking, planning, and reasoning, just like a human brain would? Some AI labs would have you believe they are, but according to Meta’s chief AI scientist Yann LeCun, the answer is no. He thinks we could get there in a decade or so, however, by pursuing a new method called a “world model.”

Earlier this year, OpenAI released a new feature it calls “memory” that allows ChatGPT to “remember” your conversations. The startup’s latest generation of models, o1, displays the word “thinking” while generating an output, and OpenAI says the same models are capable of “complex reasoning.”

That all sounds like we’re pretty close to AGI. However, during a recent talk at the Hudson Forum, LeCun undercut AI optimists, such as xAI founder Elon Musk and Google DeepMind co-founder Shane Legg, who suggest human-level AI is just around the corner.

“We need machines that understand the world; [machines] that can remember things, that have intuition, have common sense, things that can reason and plan to the same level as humans,” said LeCun during the talk. “Despite what you might have heard from some of the most enthusiastic people, current AI systems are not capable of any of this.”

LeCun says today’s large language models, like those that power ChatGPT and Meta AI, are far from “human-level AI.” Humanity could be “years to decades” away from achieving such a thing, he later said. (That doesn’t stop his boss, Mark Zuckerberg, from asking him when AGI will happen, though.)

The reason is straightforward: those LLMs work by predicting the next token (usually a few letters or a short word), and today’s image/video models predict the next pixel. In other words, language models are one-dimensional predictors, and AI image/video models are two-dimensional predictors. These models have become quite good at predicting in their respective dimensions, but they don’t really understand the three-dimensional world.
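To make “predicting the next token” concrete, here is a deliberately tiny sketch. It is an illustration, not how ChatGPT or Meta AI are built: it assumes whitespace-separated words as tokens and uses simple bigram counts in place of a neural network, and the function names (`train_bigram`, `predict_next`, `generate`) are hypothetical. What it shares with real LLMs is the one-dimensional, left-to-right loop: look at what came before, guess the next token, append it, repeat.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """Count, for each token, which tokens follow it in the training text."""
    tokens = text.split()  # toy tokenizer (assumption): whitespace-separated words
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(model: dict[str, Counter], token: str) -> str | None:
    """Return the most frequent successor of `token`, or None if never seen."""
    counts = model.get(token)
    if not counts:
        return None
    # Ties are broken by first occurrence, per Counter.most_common.
    return counts.most_common(1)[0][0]


def generate(model: dict[str, Counter], start: str, max_tokens: int = 5) -> list[str]:
    """Greedily extend a sequence one token at a time, strictly left to right."""
    out = [start]
    for _ in range(max_tokens):
        nxt = predict_next(model, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ate the fish"
    model = train_bigram(corpus)
    print(generate(model, "the", max_tokens=4))
```

Nothing in this loop represents tables, rooms or physics; it only learns which symbols tend to follow which, which is the limitation LeCun is pointing at.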

Because of this, modern AI systems cannot do simple tasks that most humans can. LeCun notes that humans learn to clear a dinner table by age 10 and to drive a car by 17, and learn both in a matter of hours. But even the world’s most advanced AI systems today, built on thousands or millions of hours of data, can’t reliably operate in the physical world.

To achieve more complex tasks, LeCun suggests we need to build models that can perceive the three-dimensional world around us, centered on a new type of AI architecture: world models.

“A world model is your mental model of how the world behaves,” he explained. “You can imagine a sequence of actions you might take, and your world model will allow you to predict what the effect of the sequence of action will be on the world.”

Consider the “world model” in your own head. For example, imagine looking at a messy bedroom and wanting to make it clean. You can imagine how picking up all the clothes and putting them away would do the trick. You don’t need to try multiple methods, or learn how to clean a room first. Your brain observes the three-dimensional space, and creates an action plan to achieve your goal on the first try. That action plan is the secret sauce that AI world models promise.

Part of the benefit here is that world models can take in significantly more data than LLMs. That also makes them computationally intensive, which helps explain why cloud providers are racing to partner with AI companies.

World models are the big idea that several AI labs are now chasing, and the term is quickly becoming the next buzzword to attract venture funding. A group of highly regarded AI researchers, including Fei-Fei Li and Justin Johnson, just raised $230 million for their startup, World Labs. The “godmother of AI” and her team are also convinced world models will unlock significantly smarter AI systems. OpenAI also describes its unreleased Sora video generator as a world model, but hasn’t gotten into specifics.

LeCun outlined an idea for using world models to create human-level AI in a 2022 paper on “objective-driven AI,” though he notes the concept is over 60 years old. In short, a base representation of the world (such as video of a dirty room) and memory are fed into a world model. The world model then predicts what the world will look like based on that information. Next, you give the world model objectives, including an altered state of the world you’d like to achieve (such as a clean room) as well as guardrails to ensure the model doesn’t harm humans to achieve an objective (don’t kill me in the process of cleaning my room, please). Finally, the world model finds an action sequence to achieve these objectives.
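The loop LeCun describes (predict outcomes, score them against an objective and guardrails, pick an action sequence) can be sketched as a toy planner. This is a minimal illustration under heavy assumptions, not LeCun’s or FAIR’s implementation: the “world model” here is a hand-written function rather than a learned network, the room is just a set of items on the floor, memory is left out, the `bulldoze` action is a hypothetical shortcut that cleans the room but violates the guardrail, and the planner brute-forces every short action sequence instead of using the optimization a real system would need.

```python
from __future__ import annotations

import itertools
import math
from typing import NamedTuple


class RoomState(NamedTuple):
    floor: frozenset[str]  # items still lying on the floor
    person_harmed: bool    # has anyone been hurt so far?


def world_model(state: RoomState, action: str) -> RoomState:
    """Hypothetical stand-in for a learned world model: predict the next state."""
    if action.startswith("pick_up:"):
        item = action.split(":", 1)[1]
        return RoomState(state.floor - {item}, state.person_harmed)
    if action == "bulldoze":
        # Clears everything in one step, but violates the safety guardrail.
        return RoomState(frozenset(), True)
    return state  # unknown actions leave the world unchanged


def task_objective(state: RoomState) -> float:
    """Distance from the goal (a clean room): items still on the floor."""
    return float(len(state.floor))


def guardrail(state: RoomState) -> float:
    """Infinite cost for any state in which a person has been harmed."""
    return math.inf if state.person_harmed else 0.0


def plan(start: RoomState, actions: list[str], horizon: int) -> list[str]:
    """Imagine every action sequence up to `horizon` with the world model
    and return the one whose predicted outcome has the lowest cost."""
    best_seq: list[str] = []
    best_cost = task_objective(start) + guardrail(start)
    for length in range(1, horizon + 1):
        for seq in itertools.product(actions, repeat=length):
            state, cost = start, 0.0
            for action in seq:
                state = world_model(state, action)
                cost += guardrail(state)  # a violation anywhere along the way counts
            cost += task_objective(state)
            if cost < best_cost:  # strict comparison: ties keep the shorter plan
                best_seq, best_cost = list(seq), cost
    return best_seq


if __name__ == "__main__":
    messy = RoomState(frozenset({"shirt", "socks", "towel"}), person_harmed=False)
    actions = ["pick_up:shirt", "pick_up:socks", "pick_up:towel", "bulldoze"]
    print(plan(messy, actions, horizon=3))
```

In this toy, the planner never tries anything in the real room: every candidate sequence is evaluated only inside the world model, and `bulldoze`, although it reaches a clean floor in one step, is ruled out because the guardrail makes its cost infinite. Brute-force search is used purely for readability; it grows exponentially with the horizon and would not scale to a real robot.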

Meta’s long-term AI research lab, FAIR (Fundamental AI Research), is actively working toward building objective-driven AI and world models, according to LeCun. FAIR used to work on AI for Meta’s upcoming products, but LeCun says the lab has shifted in recent years to focus purely on long-term AI research. LeCun says FAIR doesn’t even use LLMs these days.

World models are an intriguing idea, but LeCun says we haven’t made much progress toward bringing these systems to reality. There are a lot of very hard problems to solve to get from where we are today, and he says it’s certainly more complicated than we think.

“It’s going to take years before we can get everything here to work, if not a decade,” said LeCun. “Mark Zuckerberg keeps asking me how long it’s going to take.”