Meta's language model Llama 2 is more flexible than its predecessor

Page 2: Comparing the answers

The model can now be run. It is interesting to compare the results with those of the original Llama model:

./main -m \
  ${MODEL_PATH}/llama-2-7b-chat/ggml-model-q4_0.bin \
  -t 16 -n 256 --repeat_penalty 1.0 \
  -f prompts/chat-with-bob.txt --color -i -r "User:"

User: Who was Alan Turing?

Bob: Alan Turing was a British mathematician, computer scientist, and logician who lived from 1912 to 1954. He is widely considered one of the most important figures in the history of computer science and artificial intelligence.

Turing was born in England and studied mathematics at Cambridge University. During World War II, he worked at the British codebreaking center Bletchley Park, where he helped crack German naval codes. This work was instrumental in the Allied victory in the war.

After the war, Turing became interested in the study of artificial intelligence and the concept of the universal Turing machine, which is a theoretical model for a computer. He proposed the idea of a Turing machine, which is a simple, abstract model of a computer that can simulate any other computer. This work laid the foundation for modern computer science and the field of artificial intelligence.

Turing was also a convicted felon. In 1952, he was arrested and convicted of gross indecency for his homosexual relationship with a man. He was given a choice between imprisonment or chemical castration, and he chose the latter. Turing died two years later, in 1954, at the age of 41.

Despite his tragic personal life and early death, Turing's contributions to computer science and artificial intelligence have had a profound impact on modern technology. His work has inspired countless scientists and engineers, and his legacy continues to be celebrated and studied today.

That is markedly better than the result from the first Llama. What does Llama 2 know about Heise?

User: Explain Heise Zeitschriftenverlag

Bob: Heise Zeitschriftenverlag is a German publishing company that specializes in science and technology publications. The company was founded in 1948 and has since become one of the leading publishers of science and technology magazines in Europe.

Heise publishes several magazines that cover a wide range of scientific and technological topics, including computing, mathematics, physics, and engineering. Some of its most popular magazines include:

* Heise Online: A news website that provides daily coverage of scientific and technological developments.

* c't: A magazine that focuses on computer technology and programming.

* CHIP: A magazine that covers consumer electronics and home computers.

* MANS: A magazine that focuses on computer science and programming.

* Focus: A magazine that covers a wide range of scientific and technological topics, including space exploration, environmental issues, and health.

Heise also publishes several annuals and yearbooks that provide in-depth coverage of specific topics, such as computer graphics, artificial intelligence, and biotechnology.

Overall, Heise Zeitschriftenverlag is a leading publisher of science and technology magazines in Europe, providing readers with up-to-date information on the latest developments in these fields.

That is not quite correct: CHIP, MANS, and Focus are not Heise magazines. The same test can be repeated with a larger model:

./main -m \
  ${MODEL_PATH}/llama-2-13b-chat/ggml-model-q4_0.bin \
  -t 16 -n 256 --repeat_penalty 1.0 \
  -f prompts/chat-with-bob.txt --color -i -r "User:"

User: Explain Heise Zeitschriftenverlag!

Bob: Heise Zeitschriftenverlag is a German publisher of technical and specialist literature, established in 1969. Their publications cover a wide range of topics such as computer science, information technology, and engineering.

User: Which magazines does it publish?

Bob: Heise Zeitschriftenverlag publishes several magazines, including c't - Das Magazin für Computertechnik, Python - Das Magazin für Anwender, and Java - Das Magazin für Entwickler.

User: Who is the audience?

Bob: The audience for Heise Zeitschriftenverlag's magazines are primarily computer professionals, developers, and technology enthusiasts.

User: Erkläre Heise Zeitschriftenverlag!

Bob: Heise Zeitschriftenverlag ist ein deutscher Verleger von technischen und spezialisierten Publikationen, gegründet im Jahr 1969. Ihre Publikationen deckeln ein weites Spektrum von Themen ab, wie Computertechnik, Informationstechnologie, Telekommunikation und Internet. Heise Zeitschriftenverlag ist bekannt für seine führenden Fachzeitschriften wie c't, Computerwoche und IT-Business.

User: Erkläre c't Magazin!

Bob: Das c't Magazin ist eine monatlich erschienene Fachzeitschrift für Computer- und Technologie-Themen, verlegt von Heise Zeitschriftenverlag. Das Magazin richtet sich an Entwickler, Administratoren, IT-Spezialisten und andere Interessenten und bietet in-depth-Artikel, Tests und Analysen zu verschiedenen Themen wie Linux, Windows, Security, Datenschutz und mehr. c't Magazin ist ein führendes Fachmedium in Deutschland und hat eine lange Tradition in der Informationstechnik-Branche.

This works even better. It shows particularly clearly that the system remembers the context and keeps referring back to Heise Zeitschriftenverlag. Also very interesting is the almost word-for-word translation of the English answer when the prompt is written in German. In this version, Llama 2 does not need all that much memory:

$ top
top - 10:46:32 up 98 days, 21:19, 10 users,  load average: 8.32, 3.72, 1.64
Tasks: 562 total,   1 running, 559 sleeping,   0 stopped,   2 zombie
%Cpu(s):  1.5 us,  0.6 sy,  0.0 ni, 97.9 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem : 128739.8 total,    960.7 free,  44584.3 used,  83194.8 buff/cache
MiB Swap:   4091.0 total,      0.0 free,   4091.0 used.  82767.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3514330 cwinkler  20   0 9535672   7.2g   6.9g S   0.0   5.8  21:23.59 main

7.2 GB is a considerable amount, but for a model with 13 billion parameters it is exactly what one would expect.
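The memory figure can be checked with a rough back-of-the-envelope calculation. The sketch below assumes that q4_0 quantization costs about 4.5 bits per weight (blocks of 32 four-bit weights plus one 16-bit scale) and uses the nominal parameter counts of 13 and 70 billion. Both are simplifications, and the function name estimate_gib is made up for illustration. The KV cache and other runtime buffers are ignored, so the real resident size will be somewhat higher.

```shell
# Rough size of the quantized weights in GiB.
# $1 = nominal parameter count
# $2 = bits per weight (assumption: about 4.5 for q4_0)
estimate_gib() {
  awk -v params="$1" -v bits="$2" \
    'BEGIN { printf "%.1f\n", params * bits / 8 / (1024 * 1024 * 1024) }'
}

estimate_gib 13e9 4.5   # 13B model
estimate_gib 70e9 4.5   # 70B model
```

Under these assumptions the weights alone come to roughly 6.8 GiB for the 13B model and 36.7 GiB for the 70B model, which is in the same range as the values observed in the article.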

Big versus bigger

Unfortunately, the 70B model cannot be used quite as easily, because it relies on grouped-query attention, which llama.cpp does not yet support natively. Starting it with the experimental -gqa parameter does work, but processing then takes more than a second per token.

./main -m \
  ${MODEL_PATH}/llama-2-70b-chat/ggml-model-q4_0.bin \
  -t 16 -n 256 --repeat_penalty 1.0 \
  -f prompts/chat-with-bob.txt -gqa 8 --color -i -r "User:"

User: Explain Heise Zeitschriftenverlag!

Bob: Heise Zeitschriftenverlag is a German publisher of specialist magazines. It was founded in 1898 and is based in Hanover, Germany. The company publishes magazines on various topics such as computer science, electronics, automotive engineering, and telecommunications. Heise Zeitschriftenverlag is well-known for its high-quality publications and expert authors.

User: Which magazines does it publish?

Bob: Heise Zeitschriftenverlag publishes several magazines, including c't, iX, Makro, and Auto motor und sport.

User: Who was the founder?

Bob: The founder of Heise Zeitschriftenverlag was Eduard Heise.

User: Erkläre Heise Zeitschriftenverlag!

Bob: Heise Zeitschriftenverlag ist ein deutscher Zeitschriftenverlag, der 1898 von Eduard Heise gegründet wurde und seinen Sitz in Hannover hat. Das Unternehmen publiziert eine Vielzahl von Fachzeitschriften zu Themen wie Informatik, Elektronik, Automobiltechnik und Telekommunikation. Heise Zeitschriftenverlag ist bekannt für seine hochwertigen Veröffentlichungen und Experten-Autoren.

The result is no better than that of the 13B model. Possibly the GQA mechanism does not yet work correctly in the quantized version, or Meta still has to optimize the training.

Compared with the earlier version, which was optimized for the first Llama model, llama.cpp now works more efficiently. It generates tokens faster, and the system needs somewhat less memory: 36.5 GB in the case of the 70B model.

llama.cpp now also includes a server that offers a web-based chat:

./server -m \
  ${MODEL_PATH}/llama-2-13b-chat/ggml-model-q4_0.bin \
  -t 16 --port 8001

The server is then reachable at http://localhost:8001/.

The web application is built into llama.cpp and supports simple dialogues (Fig. 1).
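Besides the built-in web page, the server can also be queried directly over HTTP. The following is a minimal sketch, assuming the server started above is listening on port 8001 and exposes llama.cpp's /completion endpoint with the JSON fields prompt and n_predict. The helper names build_payload and ask are invented here, and the payload builder does no JSON escaping, so it only suits prompts without quotes or backslashes.

```shell
# Build the JSON request body for the /completion endpoint.
# $1 = prompt text (assumed to contain no quotes or backslashes)
# $2 = maximum number of tokens to generate
build_payload() {
  printf '{"prompt": "%s", "n_predict": %d}' "$1" "$2"
}

# Send a prompt to the local server (hypothetical helper).
# $1 = prompt text, $2 = optional token limit (defaults to 128)
ask() {
  curl -s http://localhost:8001/completion \
    -H 'Content-Type: application/json' \
    -d "$(build_payload "$1" "${2:-128}")"
}

# Usage, with the server running:
#   ask "Who was Alan Turing?"
```

The server should answer with a JSON document whose content field holds the generated text. For anything beyond a quick test, a proper JSON tool is a safer way to assemble the request body.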

The future

Countless projects already build on Llama 2. Because the differences from its predecessor are small, existing software can be adapted relatively easily and the model can be fine-tuned. Since fine-tuning is best done in 16-bit precision, however, powerful graphics cards are still required.

Like the original Llama, its successor has spawned an entire ecosystem of tools in a very short time, serving among other things for training and adaptation. The German Hugging Face developer Philipp Schmid provides a good overview. A veritable competition has broken out over the leanest program capable of running Llama 2 models. At the moment, a short C program does the job.