An AI Learned to Play Atari 6,000 Times Faster by Reading the Instructions

Despite impressive progress, today’s AI models are very inefficient learners, taking huge amounts of time and data to solve problems humans pick up almost instantaneously. A new approach could drastically speed things up by getting AI to read instruction manuals before attempting a challenge.

One of the most promising approaches to creating AI that can solve a diverse range of problems is reinforcement learning, which involves setting a goal and rewarding the AI for taking actions that work towards that goal. This is the approach behind most of the major breakthroughs in game-playing AI, such as DeepMind’s AlphaGo.

As powerful as the technique is, it essentially relies on trial and error to find an effective strategy. This means these algorithms can spend the equivalent of several years blundering through video and board games until they hit on a winning formula.

Thanks to the power of modern computers, this can be done in a fraction of the time it would take a human. But this poor “sample-efficiency” means researchers need access to large numbers of expensive specialized AI chips, which restricts who can work on these problems. It also seriously limits the application of reinforcement learning to real-world situations where doing millions of run-throughs simply isn’t feasible.

Now a team from Carnegie Mellon University has found a way to help reinforcement learning algorithms learn much faster by combining them with a language model that can read instruction manuals. Their approach, outlined in a pre-print published on arXiv, taught an AI to play a challenging Atari video game thousands of times faster than a state-of-the-art model developed by DeepMind.

“Our work is the first to demonstrate the possibility of a fully-automated reinforcement learning framework to benefit from an instruction manual for a widely studied game,” said Yue Wu, who led the research. “We have been conducting experiments on other more complicated games like Minecraft, and have seen promising results. We believe our approach should apply to more complex problems.”

Atari video games have been a popular benchmark for studying reinforcement learning thanks to the controlled environment and the fact that the games have a scoring system, which can act as a reward for the algorithms. To give their AI a head start, though, the researchers wanted to give it some extra pointers.

First, they trained a language model to extract and summarize key information from the game’s official instruction manual. This information was then used to pose questions about the game to a pre-trained language model similar in size and capability to GPT-3. For instance, in Pac-Man one such question might be “Should you hit a ghost if you want to win the game?”, to which the answer is no.

These answers were then used to create additional rewards for the reinforcement learning algorithm, beyond the game’s built-in scoring system. In the Pac-Man example, hitting a ghost would now attract a penalty of -5 points. The extra rewards were then fed into a well-established reinforcement learning algorithm to help it learn the game faster.
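To make the idea concrete, here is a minimal sketch of how yes/no answers could be turned into extra rewards. It is an illustration, not the researchers’ code. The function names, the event labels, and the +5 bonus for encouraged interactions are all assumptions; only the -5 penalty comes from the example above.

```python
from typing import Dict, Iterable

# Size of the extra reward. The -5 penalty is the figure from the Pac-Man
# example; mirroring it with a +5 bonus is an assumption made for symmetry.
AUX_REWARD = 5.0


def auxiliary_reward(events: Iterable[str], advice: Dict[str, bool]) -> float:
    """Sum the extra reward for the interactions detected in one frame.

    `advice` maps an interaction (a hypothetical label such as "hit ghost")
    to the language model's yes/no answer on whether it helps win the game.
    """
    total = 0.0
    for event in events:
        if event not in advice:
            continue  # the manual gave no guidance on this interaction
        total += AUX_REWARD if advice[event] else -AUX_REWARD
    return total


def shaped_reward(game_reward: float, events: Iterable[str],
                  advice: Dict[str, bool]) -> float:
    """Add the manual-derived reward to the game's own score change."""
    return game_reward + auxiliary_reward(events, advice)


# Hypothetical answers extracted from a manual.
advice = {"hit ghost": False, "eat pellet": True}

# The game awards 10 points this frame, but the agent also hits a ghost.
print(shaped_reward(10.0, ["hit ghost"], advice))  # 5.0
```

The learning algorithm itself is unchanged in this sketch. It simply receives the shaped value in place of the raw score, which is why the extra signal can be bolted onto an existing method.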

The researchers tested their approach on Skiing 6000, which is one of the hardest Atari games for AI to master. The 2D game requires players to slalom down a hill, navigating in between poles and avoiding obstacles. That might sound easy enough, but the leading AI had to run through 80 billion frames of the game to achieve comparable performance to a human.

In contrast, the new approach required just 13 million frames to get the hang of the game, although it was only able to achieve a score about half as good as the leading technique. That means it’s not as good as even the average human, but it did considerably better than several other leading reinforcement learning approaches that couldn’t get the hang of the game at all. That includes the well-established algorithm the new AI relies on.
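The headline figure follows directly from the two frame counts quoted above. A quick check, with variable names chosen purely for illustration:

```python
# Frame counts quoted in the article.
baseline_frames = 80_000_000_000  # leading AI: 80 billion frames
new_frames = 13_000_000           # new approach: 13 million frames

speedup = baseline_frames / new_frames
print(f"{speedup:,.0f}x fewer frames")  # 6,154x fewer frames
```

The ratio counts frames only. It does not account for the gap in final score between the two systems.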

The researchers say they have already begun testing their approach on more complex 3D games like Minecraft, with promising early results. But reinforcement learning has long struggled to make the leap from video games, where the computer has access to a complete model of the world, to the messy uncertainty of physical reality.

Wu says he is hopeful that rapidly improving capabilities in object detection and localization could soon put applications like autonomous driving or household automation within reach. Either way, the results suggest that rapid improvements in AI language models could act as a catalyst for progress elsewhere in the field.

Image Credit: StockSnap from Pixabay

* This article was originally published at Singularity Hub
