xLSTM: Extended Long Short-Term Memory – better AI models from Europe

The start-up NXAI, founded by Sepp Hochreiter, has presented its new architecture for language models. xLSTMs are supposed to be better than transformers.

This article was originally published in German and has been automatically translated.

The team at the Linz-based start-up NXAI, led by AI pioneer Sepp Hochreiter, has published a scientific paper presenting a new architecture for language models that is said to be superior to the widely used Transformer architecture. According to the paper, the so-called extended LSTM models (xLSTM) outperform pure Transformer models in numerous benchmarks and are significantly more efficient.

Long Short-Term Memory (LSTM) is a special neural-network architecture that AI researchers Sepp Hochreiter and Jürgen Schmidhuber have been developing since the 1990s to process sequential data such as text. Unlike deep convolutional neural networks, which specialize in images, LSTMs have a kind of built-in short-term memory: they can take past context into account when generating or completing sentences. LSTMs were the basis for the success of voice assistants such as Siri and Alexa and also significantly improved machine translation; the results, however, were still a long way from the level of human language and formulation.
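
To make the idea of a built-in short-term memory concrete, here is a minimal sketch using PyTorch's standard LSTM module; it is purely illustrative, and the vocabulary size, dimensions and toy input are invented rather than taken from NXAI's work.

```python
# Minimal, illustrative sketch (not NXAI code): an off-the-shelf PyTorch LSTM
# carrying context across a token sequence. All sizes and the toy input are invented.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 32, 64
embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

tokens = torch.randint(0, vocab_size, (1, 5))   # a toy "sentence" of 5 token ids

# The hidden state h and cell state c act as the built-in short-term memory:
# they are updated token by token and carry earlier context forward.
output, (h, c) = lstm(embedding(tokens))

print(output.shape)  # (1, 5, 64): one hidden state per processed token
print(h.shape)       # (1, 1, 64): final state summarizing the whole sequence
```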

Only the Transformer architecture, whose best-known exponent is ChatGPT, advanced into those spheres. Transformers have an attention mechanism that encodes words and parts of words in such a way that terms frequently used together in context end up close to each other. A text processed in this way can then be imagined as a huge, sorted word cloud. Transformers can therefore process much larger amounts of text and take context into account that lies further apart. According to NXAI, which runs a research collaboration with Johannes Kepler University Linz, the xLSTM model is now set to become the most powerful Large Language Model (LLM) in the world.
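
For illustration, the following sketch shows the scaled dot-product attention at the heart of Transformer models; it is not code from the xLSTM paper, and all dimensions and matrices are made up.

```python
# Rough, self-contained sketch of scaled dot-product attention (not code from the paper).
# Every token is compared with every other token, so distant context can be used directly.
import torch
import torch.nn.functional as F

seq_len, d_model = 6, 16
x = torch.randn(1, seq_len, d_model)      # toy token representations

W_q = torch.randn(d_model, d_model)       # illustrative projection matrices
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

q, k, v = x @ W_q, x @ W_k, x @ W_v
scores = q @ k.transpose(-2, -1) / d_model ** 0.5  # pairwise token similarities
weights = F.softmax(scores, dim=-1)                # how strongly each token attends to the others
context = weights @ v                              # context-enriched token representations

print(weights.shape)  # (1, 6, 6): attention of every token to every other token
```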

The structure of an xLSTM model.

(Image: Screenshot from the paper)

xLSTM is essentially a combination of Transformer technology and Long Short-Term Memory. The research question of the accompanying paper is, accordingly: "How far can we get in language modeling if we scale LSTMs to billions of parameters and use the latest techniques of modern LLMs, but mitigate the known limitations of LSTMs?" The result, the researchers write, is an architecture that compares favorably with the Transformers currently in use in terms of both performance and scalability.

xLSTM introduces exponential gating, and several gates form a short-term memory that nevertheless persists over long stretches – hence the model's name. In addition, the memory structure has been changed compared to the classic LSTM. The corresponding paper has been published and concludes: "xLSTM has the potential to significantly influence other areas of deep learning – such as reinforcement learning, time series prediction or the modeling of physical systems."
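
To give a sense of what exponential gating with a normalizer and stabilizer state can look like, here is a heavily simplified, scalar-memory sketch loosely based on the sLSTM equations described in the paper; the weights, sizes and inputs are invented, and this is not NXAI's implementation.

```python
# Simplified, scalar-memory sketch loosely based on the sLSTM equations in the xLSTM paper.
# Weights, dimensions and the input sequence are invented; this is not NXAI's implementation.
import numpy as np

rng = np.random.default_rng(0)

def slstm_step(x, h_prev, c_prev, n_prev, m_prev, W):
    v = np.concatenate([x, h_prev])                 # input and recurrent signal
    z = np.tanh(W["z"] @ v)                         # cell input
    i_pre = W["i"] @ v                              # input gate pre-activation (exponential gate)
    f_pre = W["f"] @ v                              # forget gate pre-activation (exponential gate)
    o = 1.0 / (1.0 + np.exp(-(W["o"] @ v)))         # output gate (sigmoid)

    # Stabilizer state m keeps the exponentials from overflowing.
    m = np.maximum(f_pre + m_prev, i_pre)
    i = np.exp(i_pre - m)
    f = np.exp(f_pre + m_prev - m)

    # Cell state c plus a normalizer state n; the hidden state is the normalized memory.
    c = f * c_prev + i * z
    n = f * n_prev + i
    h = o * (c / n)
    return h, c, n, m

d_in, d_hidden = 4, 3
W = {k: rng.normal(size=(d_hidden, d_in + d_hidden)) * 0.1 for k in ("z", "i", "f", "o")}

h = c = n = m = np.zeros(d_hidden)
for _ in range(5):                                  # process a toy sequence of 5 inputs
    x = rng.normal(size=d_in)
    h, c, n, m = slstm_step(x, h, c, n, m, W)
print(h)
```

According to the paper, this kind of exponential input gate is what allows the model to revise earlier storage decisions more strongly than the classic sigmoid gates.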

However, xLSTM still has to prove its potential in further and more detailed benchmarks. For the initial tests, it was trained on 15 billion and on 300 billion tokens of the SlimPajama dataset and compared with several Transformer models, including Llama and GPT-3. How xLSTM would fare against high-end LLMs such as GPT-4, GPT-4V or Google Gemini remains to be seen. In the paper, the authors themselves concede that an extensive optimization process is still needed before the xLSTM architecture reaches its full potential.

Hochreiter, a German AI pioneer who conducts research in Austria, writes on X: "With NXAI, we have started to build our own European LLM. I am very proud of my team."

Hochreiter developed the LSTM architecture as a student together with his former lecturer Jürgen Schmidhuber. Schmidhuber appeared at this year's OMR Festival, where he spoke with Jonas Andrulis from Aleph Alpha about his assessment of the AI hype.

(emw)