OpenAI’s New AI Learned to Play Minecraft by Watching 70,000 Hours of YouTube


In 2020, OpenAI’s machine learning algorithm GPT-3 blew people away when, after ingesting billions of words scraped from the internet, it began spitting out well-crafted sentences. This year, DALL-E 2, a cousin of GPT-3 trained on text and images, caused a similar stir online when it began whipping up surreal images of astronauts riding horses and, more recently, crafting weird, photorealistic faces of people who don’t exist.

Now, the company says its latest AI has learned to play Minecraft after watching some 70,000 hours of video showing people playing the game on YouTube.

School of Mines 

Unlike many prior Minecraft algorithms, which operate in much simpler “sandbox” versions of the game, the new AI plays in the same environment as humans, using standard keyboard-and-mouse commands.

In a blog post and preprint detailing the work, the OpenAI team says that the pre-trained algorithm, with no further training, learned basic skills like chopping down trees, making planks, and building crafting tables. They also observed it swimming, hunting, cooking, and “pillar jumping.”

“To the best of our knowledge, there is no published work that operates in the full, unmodified human action space, which includes drag-and-drop inventory management and item crafting,” the authors wrote in their paper.

With fine-tuning (that is, training the model on a more focused data set), they found the algorithm performed all of these tasks more reliably. It also began to advance its technological prowess: fabricating wooden and stone tools, building basic shelters, exploring villages, and raiding chests.

After further fine-tuning with reinforcement learning, it learned to build a diamond pickaxe—a skill that takes human players some 20 minutes and 24,000 actions to accomplish.

This is a notable result. AI has long struggled with Minecraft’s wide-open gameplay. Games like chess and Go, which AI has already mastered, have clear objectives, and progress toward those objectives can be measured. To conquer Go, researchers used reinforcement learning, where an algorithm is given a goal and rewarded for progress toward that goal. Minecraft, on the other hand, has any number of possible objectives, progress is less linear, and deep reinforcement learning algorithms are usually left spinning their wheels.
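The difference between a clear goal and a wide-open one shows up in the reward signal. A Go-style reward arrives only at the end; in Minecraft, an agent pressing keys more or less at random almost never stumbles onto a diamond pickaxe, so it gets essentially no feedback to learn from. A common workaround is to also reward intermediate milestones. The Python sketch below contrasts the two. The milestone list and the reward of 1.0 per item are illustrative assumptions, not OpenAI’s actual reward function.

```python
# Illustrative milestone chain loosely following Minecraft's tool progression.
# The item names, their order, and the 1.0 reward per item are assumptions.
MILESTONES = [
    "log", "planks", "crafting_table", "wooden_pickaxe", "cobblestone",
    "stone_pickaxe", "iron_ore", "iron_pickaxe", "diamond", "diamond_pickaxe",
]


def sparse_reward(inventory: set[str]) -> float:
    """Go-style signal: nothing until the final objective is reached."""
    return 1.0 if "diamond_pickaxe" in inventory else 0.0


def milestone_reward(inventory: set[str], already_rewarded: set[str]) -> float:
    """Shaped signal: pay 1.0 the first time each milestone item is obtained.

    `already_rewarded` is updated in place so no milestone pays out twice.
    """
    reward = 0.0
    for item in MILESTONES:
        if item in inventory and item not in already_rewarded:
            already_rewarded.add(item)
            reward += 1.0
    return reward


seen: set[str] = set()
early_game = {"log", "planks"}
print(sparse_reward(early_game))           # → 0.0: no feedback yet
print(milestone_reward(early_game, seen))  # → 2.0: two milestones reached
print(milestone_reward(early_game, seen))  # → 0.0: already paid out
```

With only the sparse reward, every early-game step looks equally worthless to the learner; the shaped version tells it that gathering logs and crafting planks are steps in the right direction.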

In the 2019 MineRL Minecraft competition for AI developers, for example, none of the 660 submissions achieved the competition’s relatively simple goal of mining diamonds.

It’s worth noting that to reward creativity and show that throwing computing power at a problem isn’t always the answer, the MineRL organizers placed strict limits on participants: they were allowed one NVIDIA GPU and 1,000 hours of recorded gameplay. Though the contestants performed admirably, the OpenAI result, achieved with more data and 720 NVIDIA GPUs, seems to show computing power still has its benefits.

AI Gets Crafty

With its video pre-training (VPT) algorithm for Minecraft, OpenAI returned to the approach it used with GPT-3 and DALL-E: pre-training an algorithm on a towering data set of human-created content. But the algorithm’s success wasn’t enabled by computing power or data alone. Training a Minecraft AI on that much video wasn’t practical before.

Raw video footage isn’t as useful for behavioral AIs as it is for content generators like GPT-3 and DALL-E. It shows what people are doing, but it doesn’t explain how they’re doing it. For the algorithm to link video to actions, it needs labels. A video frame showing a player’s collection of objects, for example, would need to be labeled “inventory” alongside the command key “E,” which is used to open the inventory.
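When the player’s inputs are recorded alongside the video, labeling mostly comes down to matching each logged key press to the frame that was on screen at that moment. The Python sketch below assumes a fixed 20 frames-per-second recording and a simple (timestamp, key) log. Both the frame rate and the log format are hypothetical, chosen only to show the idea.

```python
import math

FPS = 20  # assumed recording frame rate for this sketch


def label_frames(events: list[tuple[float, str]], num_frames: int,
                 fps: int = FPS) -> list[list[str]]:
    """Attach each logged key press to the video frame on screen when it happened.

    `events` holds (timestamp_in_seconds, key) pairs from a hypothetical input
    log. Returns one list of keys per frame; an empty list means no key press.
    """
    labels: list[list[str]] = [[] for _ in range(num_frames)]
    for timestamp, key in events:
        frame = math.floor(timestamp * fps)  # index of the frame covering this time
        if 0 <= frame < num_frames:          # ignore events outside the clip
            labels[frame].append(key)
    return labels


# The player presses "E" to open the inventory, then walks forward with "W".
log = [(0.06, "E"), (0.31, "W"), (0.33, "W")]
labels = label_frames(log, num_frames=10)
print(labels[1])  # → ['E']
print(labels[6])  # → ['W', 'W']
```

This only works for footage whose inputs were captured at recording time. YouTube videos carry no such log, which is the problem the next step addresses.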

Labeling every frame in 70,000 hours of video would be…insane. So, the team paid Upwork contractors to record and label basic Minecraft skills. They used 2,000 hours of this video to teach a second algorithm, an inverse dynamics model (IDM), how to label Minecraft videos, and the IDM then annotated all 70,000 hours of YouTube footage. (The team says the IDM was over 90 percent accurate when labeling keyboard and mouse commands.)
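The two-stage trick (fit a labeler on a small labeled set, use it to annotate a large unlabeled set, then train the actual player on the result) is a form of pseudo-labeling. The Python sketch below shows that pipeline on a deliberately tiny toy world. A “frame” is just the player’s x position and there are three actions. All names and data are hypothetical stand-ins, and simple vote counting replaces the neural networks OpenAI actually trained. One point the sketch tries to capture is that the labeler sees the frame *after* each action, which makes its job much easier than that of a policy that must choose an action before seeing the outcome.

```python
from collections import Counter, defaultdict

# Toy stand-in for gameplay: a "frame" is the player's x position, and each
# action moves the player by a fixed amount. Everything here is hypothetical.
MOVES = {"left": -1, "noop": 0, "right": 1}


def train_idm(frames: list[int], actions: list[str]) -> dict[int, str]:
    """Fit a tiny inverse dynamics model on a small labeled set.

    It looks at frame t AND frame t+1, and learns which action most often
    explains the observed change between them.
    """
    votes: dict[int, Counter] = defaultdict(Counter)
    for t, action in enumerate(actions):
        votes[frames[t + 1] - frames[t]][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in votes.items()}


def pseudo_label(idm: dict[int, str], frames: list[int]) -> list[str]:
    """Annotate an unlabeled video with the actions the IDM infers."""
    return [idm.get(frames[t + 1] - frames[t], "noop")
            for t in range(len(frames) - 1)]


def behavioral_cloning(frames: list[int], actions: list[str]) -> dict[int, str]:
    """Fit a policy that sees only the current frame (no future frames) and
    imitates the most common labeled action at each position."""
    votes: dict[int, Counter] = defaultdict(Counter)
    for t, action in enumerate(actions):
        votes[frames[t]][action] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in votes.items()}


# 1) Small labeled set (stands in for the contractor recordings).
labeled_frames = [0, 1, 2, 2, 1]
labeled_actions = ["right", "right", "noop", "left"]
idm = train_idm(labeled_frames, labeled_actions)

# 2) Larger unlabeled set (stands in for the YouTube footage).
unlabeled_frames = [0, 1, 2, 3, 3, 2]
inferred = pseudo_label(idm, unlabeled_frames)
print(inferred)  # → ['right', 'right', 'right', 'noop', 'left']

# 3) Train the policy on the pseudo-labeled data.
policy = behavioral_cloning(unlabeled_frames, inferred)
print(policy)
```

In the real system, the labeled set is the 2,000 hours of contractor video and the unlabeled set is the 70,000 hours from YouTube. The payoff is the same as in the toy: the expensive human-labeled data only has to be large enough to train the labeler, not the player.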

This approach of humans training a data-labeling algorithm to unlock behavioral data sets online may help AI learn other skills too. “VPT paves the path toward allowing agents to learn to act by watching the vast numbers of videos on the internet,” the researchers wrote. Beyond Minecraft, OpenAI thinks VPT can bring new real-world applications, like algorithms that operate computers on command (imagine, for instance, asking your laptop to find a document and email it to your boss).

Diamonds Aren’t Forever

Much to the chagrin of the MineRL competition organizers perhaps, the results do seem to show that computing power and resources still move the needle on the most advanced AI.

Never mind the cost of computing: OpenAI said the Upwork contractors alone cost $160,000. To be fair, though, manually labeling the whole data set would’ve run into the millions and taken considerable time to complete. And while the computing power wasn’t negligible, the model was actually quite small. VPT’s hundreds of millions of parameters are orders of magnitude fewer than GPT-3’s 175 billion.
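The size comparison is easy to check. The snippet below assumes about 0.5 billion parameters for VPT (a round figure within “hundreds of millions,” used here as an assumption) and GPT-3’s published 175 billion. A factor of a few hundred works out to roughly two and a half orders of magnitude.

```python
import math

# Assumed parameter counts: ~0.5 billion for VPT (illustrative round figure)
# and 175 billion for GPT-3.
VPT_PARAMS = 0.5e9
GPT3_PARAMS = 175e9

ratio = GPT3_PARAMS / VPT_PARAMS  # how many times larger GPT-3 is
orders = math.log10(ratio)        # the same gap expressed in powers of ten
print(f"GPT-3 is about {ratio:.0f}x larger, or {orders:.1f} orders of magnitude")
# → GPT-3 is about 350x larger, or 2.5 orders of magnitude
```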

Still, the drive to find clever new approaches that use less data and computing is valid. A kid can learn Minecraft basics by watching one or two videos. Today’s AI requires far more to learn even simple skills. Making AI more efficient is a big, worthy challenge.

In any case, OpenAI is in a sharing mood this time. The researchers say VPT isn’t without risk (they’ve strictly controlled access to algorithms like GPT-3 and DALL-E partly to limit misuse), but they consider the risk minimal for now. They’ve open-sourced the data, environment, and algorithm and are partnering with MineRL, so this year’s contestants are free to use, modify, and fine-tune the latest in Minecraft AI.

Chances are good they’ll make it well past mining diamonds this time around.

Image Credit: SIMON LEE / Unsplash 

* This article was originally published at Singularity Hub
