KI Navigator #8: GenAI cheat package Reflection Llama

The hype surrounding a supposed sensation shows how difficult it is to keep up with the rapid development of GenAI, says Christian Winkler.

listen Print view
Drawing of a llama in a rear-view mirror

(Image: erstellt mit KI (Dall-E) von iX-Redaktion)

5 min. read
By
  • Dr. Christian Winkler
Contents

Generative AI is the hype topic par excellence, with new models, technologies and breakthroughs constantly emerging. With such a high number of developments, it is difficult to keep up to date at all times.

The rapid development of GenAI began with GPT, followed by open models such as Llama and Mistral. Finally, the models came in different sizes, with different capabilities such as multimodal or multilingual and with different architectures: dense models, mixture of experts and other approaches. In addition, there are new models such as Phi-3.5 from Microsoft, Qwen from Alibaba and the Llama descendant Nemotron trained by Nivida. It is almost impossible to maintain an overview in this jungle.

Videos by heise

To make matters worse, almost all models exist in different quantization levels, meaning that there are variants with a lower number of bits that are optimized for smaller graphics cards or even CPUs. And there are also several variants of these quantization methods: GPTQ, AWQ, HQQ, 1.58 bit. Meta has now entered this race and offers the small Llama models in its own quantization, which are probably even more optimized – However, the good results shown by Meta have yet to be verified.

Even if most companies work seriously, marketing plays a role that should not be underestimated. The story of Reflection Llama shows that not everything that is published is always true.

On September 6, 2024, a message on X (formerly Twitter) announced a sensation. Matt Shumer, who until then had not made much of an appearance, announced that he had developed a new technique for fine-tuning language models, which he had named Reflection.

He proudly announced that the model created with this method from the open Llama model (70b) is better than any other open source model. He wants to apply the same technology to the Llama model with 405 billion parameters and expects it to be the best model ever –, i.e. also better than the GPT models from Open AI. As proof, he provided a service and uploaded the weights of the Llama model to Hugging Face.

The enthusiasm was great, and the tweet has over three million views. The service worked very well and gave excellent responses. The fact that these could not be directly reproduced by the model on Hugging Face was a little strange, but LLMs do not always give the same answers due to the hyperparameter temperature, which controls the randomness. After the community looked at various prompts and tasks on Reddit and elsewhere, skepticism grew.

Some experts came up with the idea of asking the service which model was behind it. The result was clear: it was Claude Sonnet 3.5, meaning that Shumer had not fine-tuned its own model with the revolutionary reflection method, but had merely built a façade in front of an existing (good) service.

The discussion then continued because it was not yet clear what the Llama reflection model loaded on Hugging Face could actually do. There is only conjecture here, but based on the answers to certain questions, one can conclude that the model is based on the (outdated) Llama 3.0, which Shumer has optimized using conventional fine-tuning methods.

In addition, he has introduced new tags such as <thinking>, <reflection> and <output>, which makes perfect sense and is also used in other models. This is exactly what made it difficult to recognize the scam at first.

A detailed analysis of the processes can be found on the DataCamp website.

How could this happen? The main culprit is probably the extremely rapid developments in the field of language models. Even experts find it difficult to keep up and check the models. The community can provide help, because many experts work together here and gain insights through discussions.

In addition to online discussions, such conversations can also be held particularly well offline. The KI Navigator conference organized by DOAG, heise medien and de'ge'pol in Nuremberg on 20 and 21 November 2024 will provide an opportunity for this.

(dmk)

Don't miss any news – follow us on Facebook, LinkedIn or Mastodon.

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.