Study: Google's artificial intelligence successfully plays "Minecraft"
According to a study, a DeepMind AI mines diamonds in “Minecraft”, a task considered difficult. The learning method could reportedly also be applied to robots.
Unlike chess, poker or “StarCraft”, “Minecraft” has long been considered a challenge for artificial intelligence (AI). The open-ended simulation game generates its world randomly, so it looks different every time, and an AI algorithm cannot simply memorize a few fixed sequences of actions on the way to its goal. A team led by Google DeepMind has now presented DreamerV3, a program that mined diamonds in a “Minecraft” research world designed for AI tests. It managed this without any special training for the game and without the use of human data.
According to experts, even experienced human players need over 20 minutes and around 24,000 inputs just to craft a diamond pickaxe. For the experiment, now described in the scientific journal Nature, the authors used the “Minecraft” research platform Malmo and environments from the MineRL AI competition. An early version of the study, not yet peer-reviewed by independent researchers, appeared on the preprint server arXiv in 2023. The open-source DreamerV3 is based on reinforcement learning (RL), a method that mimics how humans learn to achieve goals through trial and error.
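The trial-and-error principle behind RL can be illustrated in a few lines of code. The following toy example, a tabular Q-learning agent in a made-up one-dimensional world, is not DreamerV3's algorithm; every name and number in it is a hypothetical placeholder chosen only to show the reward-driven learning loop.

```python
import random

# Toy illustration of trial-and-error reinforcement learning (not DreamerV3):
# an agent in a tiny "corridor" of six states must walk right to reach a
# reward at the last state. All states, actions and constants are made up.

N_STATES = 6          # states 0..5, the reward sits at state 5
ACTIONS = [-1, +1]    # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Explore randomly sometimes, otherwise exploit the best known action.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state = min(max(state + action, 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        # Q-learning update: nudge the value estimate toward the reward plus
        # the discounted value of the best follow-up action.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
        state = next_state

# After training, the greedy policy walks right in every state.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)})
```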
“Dreamer learns a model of the environment and improves its behavior by imagining future scenarios,” the team explains. “Robustness techniques based on normalization, balancing, and transformations enable stable learning across domains.” Applied directly, the third version of the algorithm is the first to “collect diamonds in 'Minecraft' from scratch without human data or curricula”. As part of the learning process, programmers define in advance, via mathematical functions, what the AI registers as a reward. DreamerV3 was given a little help: to mine resources, the character has to hit a block repeatedly, and the authors specified a minimum number of hits for this action.
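How such a predefined reward can look is easiest to see in code. The following is a minimal, hypothetical sketch of a milestone-style reward function of the kind used in MineRL-like diamond tasks; the item names and reward values are illustrative assumptions, not the study's actual definition.

```python
# Hedged sketch of a milestone reward for a Minecraft-like diamond task:
# the agent earns a one-time reward the first time it obtains each item on
# the path to a diamond. Items and values are illustrative assumptions.

MILESTONES = ["log", "planks", "crafting_table", "wooden_pickaxe",
              "cobblestone", "stone_pickaxe", "iron_ore", "iron_ingot",
              "iron_pickaxe", "diamond"]

class MilestoneReward:
    def __init__(self):
        self.reached = set()

    def __call__(self, inventory: dict[str, int]) -> float:
        """Reward newly reached milestones once; repeat visits pay nothing."""
        r = 0.0
        for i, item in enumerate(MILESTONES):
            if item not in self.reached and inventory.get(item, 0) > 0:
                self.reached.add(item)
                r += 2.0 ** i  # later milestones pay more (an assumption)
        return r

reward_fn = MilestoneReward()
print(reward_fn({"log": 1}))      # first log: reward granted
print(reward_fn({"log": 3}))      # more logs: no further reward
```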
Mixed response from independent researchers
Many RL-based AIs excel only in the specific domain their reward function is tailored to. According to the study, however, DreamerV3 performs convincingly across varied environments: the algorithm beat several domain-specific models in multiple game and task types. That also holds against the Proximal Policy Optimization (PPO) algorithm known from OpenAI, which is likewise designed for different domains. The ChatGPT maker had also tested its Video PreTraining (VPT) model in 2022 as part of the MineRL competition; VPT was supposed to be able to craft a diamond pickaxe in Minecraft. According to the analysis, DreamerV3 uses its world model to simulate several successive actions in advance and thus develops a tailored strategy for solving the tasks it is set.
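The idea of simulating successive actions in a learned world model can be sketched as a simple search: imagine candidate action sequences in the model, score them by predicted reward, and execute the best first action. DreamerV3 itself instead trains an actor-critic inside its imagined rollouts, so the following snippet, with its dummy model and invented interface, is only a generic illustration of planning by imagination.

```python
import random

# Hedged sketch of "planning by imagination": roll candidate action
# sequences through a learned world model and act on the best first action.
# DreamerV3 works differently in detail; the dummy model below is a
# placeholder so the sketch runs at all.

class DummyWorldModel:
    """Stand-in for a learned model: state is a number, reward grows with it."""
    def step(self, state, action):
        next_state = state + action
        return next_state, float(next_state)  # predicted next state and reward

def plan(model, state, actions, horizon=15, candidates=64):
    best_return, best_first = float("-inf"), None
    for _ in range(candidates):
        seq = [random.choice(actions) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:
            s, predicted_reward = model.step(s, a)  # imagined steps, not real ones
            total += predicted_reward
        if total > best_return:
            best_return, best_first = total, seq[0]
    return best_first  # execute only the first action, then replan

print(plan(DummyWorldModel(), state=0, actions=[-1, 0, 1]))  # likely prints 1
```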
“The study is first-class and groundbreaking,” Georg Martius, an expert in autonomous learning at the Max Planck Institute for Intelligent Systems in Tübingen, told the Science Media Center (SMC) in praise of his colleagues' work. Model-based RL has long been considered a promising method, but only this paper shows “that it can be used very broadly and efficiently”. The scenarios ranged from various video games to AI agents and simplified robot control. What makes DreamerV3 special, Martius says, is that it solves all problems with the same settings (“hyperparameters”), which he sees as an indication that the algorithm works out of the box on new problems and does not need extensive adaptation.

Jan Peters, Professor of Intelligent Systems at TU Darmstadt, is less convinced: although the heuristic rules of thumb used achieve impressive empirical results, they are “intellectually unsatisfactory”. They are “probably of little use in the real world” and only helpful in simulations.
(mack)