AlphaTensor: AI system speeds up matrix multiplication with new algorithm

With Deep Reinforcement Learning, DeepMind has discovered an algorithm that no human had thought of before. It is expected to significantly speed up matrix multiplication.


(Image: DeepMind)

By Silke Hahn

(This article is also available in German.)

With AlphaTensor, DeepMind Technologies has presented an AI system that is designed to independently find novel, efficient and provably correct algorithms for complex mathematical tasks. AlphaTensor has already identified a new algorithm that carries out matrix multiplications faster than before, as the research team explains in a paper published in the journal Nature. The team reports an acceleration of ten to twenty percent compared to previous standard methods.

AlphaTensor builds on AlphaZero, an AI agent that proved superior to human players in board games such as chess, Go and shogi. Co-founded by the British AI researcher, neuroscientist and computer game developer Demis Hassabis, the company had already set milestones with AlphaGo and AlphaFold. Hassabis states that his company's goal is to develop AI that solves fundamental problems in science. With AlphaTensor, DeepMind appears to have achieved another breakthrough: making progress on open questions in mathematics tangible through artificial intelligence (AI).

The search for faster algorithms to multiply two matrices is not trivial and has occupied experts for about 50 years: in 1969, the German mathematician Volker Strassen found a way to multiply 2×2 matrices with seven instead of eight multiplications, improving on the standard method known until then. The Strassen algorithm named after him is considered groundbreaking and is still often used in practice today.
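To illustrate Strassen's trick, here is a minimal Python sketch (our own illustration, not code from the paper or from DeepMind): it multiplies two 2×2 matrices with seven scalar multiplications instead of the eight the schoolbook method uses, at the price of a few extra additions.

```python
import numpy as np

def strassen_2x2(A, B):
    """Multiply two 2x2 matrices with 7 scalar multiplications (Strassen, 1969)."""
    a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
    e, f, g, h = B[0, 0], B[0, 1], B[1, 0], B[1, 1]

    # The seven products
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)

    # Recombining the result needs only additions and subtractions
    return np.array([[m1 + m4 - m5 + m7, m3 + m5],
                     [m2 + m4,           m1 - m2 + m3 + m6]])

# Quick check against NumPy's built-in matrix product
A = np.random.randint(-5, 5, (2, 2))
B = np.random.randint(-5, 5, (2, 2))
assert np.array_equal(strassen_2x2(A, B), A @ B)
```

Applied recursively to the blocks of larger matrices, saving one of eight multiplications at every level is what turns the cubic schoolbook method into an asymptotically faster algorithm with roughly n^2.81 operations.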

Tensor for matrix multiplication and algorithms, here for the multiplication of 2×2 matrices. Entries equal to 1 are purple, 0 entries are semi-transparent. The tensor indicates which entries are to be read from the input matrices and where the result is to be written. The stacked factors U, V and W (green, purple, yellow) yield a rank-7 decomposition of the tensor. The correspondence between the arithmetic operations (b) and the factors (c) is shown in colour.

(Image: DeepMind)

The team translated the problem of finding efficient algorithms for matrix multiplication into a single-player game. In this game, the "board" is a three-dimensional tensor (an array of numbers) that records how far the current algorithm still is from the correct result. The player tries to modify the tensor through a fixed set of permitted moves, each corresponding to an algorithmic instruction, until all of its entries are "zeroed out". If the player succeeds, the result is a provably correct algorithm for matrix multiplication (for any pair of matrices of the given size), and its efficiency is measured by the number of moves that were needed to bring the tensor to zero.
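A minimal sketch of these game mechanics for the 2×2 case (our own illustration, not DeepMind's code): the matrix multiplication tensor is built from the definition of the product, and every move subtracts a rank-1 tensor formed from three vectors u, v and w. The schoolbook algorithm zeroes the tensor in eight moves; Strassen's factors manage it in seven.

```python
import numpy as np

n = 2  # multiply two n x n matrices

# Build the matrix multiplication tensor T of shape (n*n, n*n, n*n):
# T[(i,k), (k,j), (i,j)] = 1 encodes the term C[i,j] += A[i,k] * B[k,j].
T = np.zeros((n * n, n * n, n * n), dtype=int)
for i in range(n):
    for j in range(n):
        for k in range(n):
            T[i * n + k, k * n + j, i * n + j] = 1

# A "move" is a triple (u, v, w); playing it subtracts the rank-1 tensor
# u x v x w from the current state. Here: the eight moves that correspond
# to the schoolbook algorithm, each picking one product A[i,k] * B[k,j].
state = T.copy()
moves = 0
for i in range(n):
    for j in range(n):
        for k in range(n):
            u = np.zeros(n * n, dtype=int); u[i * n + k] = 1
            v = np.zeros(n * n, dtype=int); v[k * n + j] = 1
            w = np.zeros(n * n, dtype=int); w[i * n + j] = 1
            state -= np.einsum('a,b,c->abc', u, v, w)
            moves += 1

print(moves, "moves, tensor zeroed:", not state.any())  # -> 8 moves, True
```

The number of rank-1 moves needed to reach the all-zero state equals the number of scalar multiplications in the corresponding algorithm, which is why fewer moves mean a faster algorithm.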

Schematic representation of the single-player game that AlphaTensor played with the aim of finding a correct algorithm for matrix multiplication. The cube-shaped array contains the entries 0 (grey), 1 (blue) and -1 (green).

(Image: DeepMind)

As Hassabis and his team describe in the article, the game is extremely demanding. The number of possible algorithms to consider is "larger than the number of atoms in the universe", even for smaller cases of matrix multiplication [editor's note: the number of atoms in the universe is estimated at about 10^80 or more, so Hassabis' figures seem a little high – although there are ongoing discussions about whether the number of digital bits could one day exceed the number of atoms on Earth]. Compared to Go, on which AI researchers cut their teeth for a long time before AlphaGo, the number of possible moves at each step is thirty orders of magnitude larger (more than 10^33). The team used Deep Reinforcement Learning (DRL) to train an AI agent that initially knew nothing about matrix multiplication algorithms and improved its skills over time: it rediscovered historically known algorithms for faster matrix multiplication (such as Strassen's) and increasingly delivered results that go beyond known human insight, and it did so at pace.

The results presented hold out the prospect of significantly accelerating numerous computing operations in computer science and AI, at least according to the authors of the blog post and the technical article. Above all, however, they are initially intended as a contribution to complexity research. It should be noted that the paper deals specifically with matrix multiplication, an elementary algebraic operation that underlies many processes in IT and the digital world.

AlphaTensor at a glance: the neural network (lower box) takes a tensor as input and returns samples (u, v, w) from a distribution over potential next game actions, together with an estimate of future returns. The network is trained on two data sources: previously played games and synthetic demonstrations. The updated network is sent to the players (top box), where the planner uses it to generate new games.

(Image: DeepMind)

Recognising voice commands, generating computer game graphics, running weather simulations, compressing data and numerous other processes rely on matrix multiplication. Even small improvements to this basic operation could therefore have far-reaching effects: more efficient computing processes take less time and thus require less energy and fewer resources, among other things.

On Twitter and in other channels, initial reactions from the community have been partly euphoric; opinions are not entirely unanimous, but they are consistently appreciative. Some already see the study as a major breakthrough for the AI-supported advancement of technology, mathematics, AI and machine learning. Two German researchers have also commented on the scope and potential of the model presented; their detailed statements are available in German at the Science Media Center.

Markus Bläser is Professor of Computational Complexity at Saarland University in Saarbrücken, Germany, and is considered an expert in algebraic algorithms and complexity. According to him, both the theoretical understanding of the complexity of matrix multiplication and the development of fast, practical algorithms are of great interest, since matrix multiplication is the basis of numerous operations and is widely used in applied mathematics and computer science, but also in engineering. The DeepMind team's work, Bläser says, finds some new upper bounds for the multiplication of small matrices using machine learning methods and thereby contributes to the theoretical understanding of matrix multiplication.

An actual improvement over Strassen's algorithm, however, is so far only the new upper bound for multiplying matrices of size four. Bläser sees a limitation in the fact that this algorithm only works when certain arithmetic rules hold, rules that do not apply to the real numbers, for example (the record for 4×4 matrices holds in arithmetic modulo 2). The complexity researcher finds the method described in the second part of the paper more interesting, namely to "automatically adapt multiplication algorithms directly to certain hardware structures and thus exploit optimisations within the system, which would be very difficult for a human being". However, according to Bläser, the sub-problems that researchers are currently investigating are so large that he does not see possible applications for an automatic search here, at least at the present time.
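To put the size-four bound into numbers: the schoolbook method needs 4^3 = 64 scalar multiplications for a 4×4 product, applying Strassen's 2×2 scheme recursively needs 7^2 = 49, and the algorithm reported in the Nature paper gets by with 47, albeit only in arithmetic modulo 2. A short sketch of how these counts arise (our own illustration, not from the paper):

```python
def schoolbook_mults(n):
    # The naive method needs n^3 scalar multiplications.
    return n ** 3

def strassen_mults(n):
    # Split the n x n matrices into 2 x 2 blocks and recurse:
    # each level needs 7 block products instead of 8 (n a power of two).
    return 1 if n == 1 else 7 * strassen_mults(n // 2)

for n in (2, 4, 8):
    print(n, schoolbook_mults(n), strassen_mults(n))
# n=2:   8 vs   7
# n=4:  64 vs  49   (AlphaTensor, modulo 2: 47)
# n=8: 512 vs 343
```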

Holger Hoos, Professor of Machine Learning at Leiden University and Chair of AI at RWTH Aachen University, also describes matrix multiplication as a fundamental operation for computer science applications. A practically relevant acceleration of matrix multiplication would therefore be "of considerable importance". In this context, he considers the Deep Reinforcement Learning method used by DeepMind to be new and interesting. Overall, however, he dampens the euphoria about the results of the study, which in his eyes are "easy to overestimate".

The automatic design of algorithms based on machine learning is not an entirely new development. The approach has been researched for more than ten years and has already led to new algorithms, among others for the "notoriously difficult Boolean satisfiability problem" in propositional logic, which is studied intensively in computer science and has important applications in the verification and testing of hardware and software. Hoos knows further examples from the design of algorithms for solving mathematical optimisation problems in industry and science; mixed integer programming comes to his mind as one such minefield. The application of automatic algorithm design to matrix multiplication is new, and the methodological approach seems promising. However, Hoos sees "no signs yet of a breakthrough in the field of automatic algorithm design". A practical distinction matters here: research into improving matrix multiplication algorithms has so far concentrated primarily on methods that yield mathematically provable accelerations for arbitrarily large matrices. The algorithms found in this way are almost irrelevant in practice, since their advantages "only become apparent for astronomically large matrices", as Hoos objects.

The approach of the authors of the AlphaTensor study, by contrast, is aimed at algorithms for practically relevant matrix sizes. This distinguishes it from previous research, although numerical stability is "only marginally considered". The expert therefore does not have overly high expectations and assumes that the automatically discovered algorithms will not be used directly in practical applications. Hoos is not particularly impressed by the acceleration reported in the paper either, which is relatively small compared to known methods: "For matrices of size 8192, it is just over four per cent, and for size 20,480 on a commercially available GPU, it is already just over two per cent," he explains in his statement. In previously known examples of automatic algorithm construction, by contrast, accelerations by a factor of two to fifty have often been measured, i.e. by hundreds to thousands of per cent.

Speed increase of the algorithm discovered by AlphaTensor: tailored to a GPU (a) and a TPU (b), optimised for a matrix multiplication of size 8192 × 8192. The speed increases are measured in comparison to standard matrix multiplication on the same hardware.

(Image: DeepMind)

Hoos sees a shortcoming of the published article in the fact that the authors "do not give any results regarding the effect of using the automatically constructed algorithms for actual applications". For the time being, the question therefore remains open as to what the new findings actually mean for computer graphics, scientific simulation or machine learning. In the eyes of the experts consulted, the work is "interesting but not groundbreaking", as the methodological approach is designed specifically for matrix multiplication. "The work is undoubtedly of technical interest to experts in automatic algorithm design like myself and my colleagues," Hoos concludes his statement. Researchers in the field will be able to build on it; however, he does not expect real breakthroughs in research and practice to follow from it.

The search term AlphaTensor brings up a kaleidoscope of short comments from researchers and the wider community on Twitter: anyone who feels inclined can join the discussion. It would be interesting to know what Strassen himself thinks about the new discovery; since his retirement, however, the researcher, who was born in 1936, has mainly been occupied with the theory of quantum physics.

The DeepMind team's results are a solid contribution on the way to automated algorithm design, but according to experts like cognitive scientist and AI researcher Joscha Bach, they are only the beginning. Optimising entire algorithms holds potential, especially alongside the development of new hardware, Bach said. In any case, AlphaZero is a strong algorithm whose reach, through extensions such as this, goes well beyond playing games.

If you want to get a more detailed picture, you can read the research paper on AlphaTensor published in Nature for yourself – it is freely accessible and can be read in the browser or downloaded as a PDF. A more compact version can be found on the DeepMind blog. Holger Hoos has written an additional assessment on LinkedIn. We also recommend an article in the MIT Technology Review that offers an overview: "DeepMind's game-playing AI has beaten a 50-year-old record in computer science". As we learned after publishing the German version of this article, a research team at the California Institute of Technology (Caltech) led by Prof. Anima Anandkumar developed StrassenNet four years ago; their article is worth reading.

(sih)