GraphRAG: Combining graphs and text for LLMs

Retrieval Augmented Generation helps to optimize the output of LLMs. GraphRAG also brings a graph component into play.

a graph with a question mark

(Image: Created with AI (Midjourney) by the iX editorial team)

14 min. read
By Dr. Christian Winkler

Generative language models such as ChatGPT can answer almost any question immediately and are easy to use. However, a closer look reveals a few problems.

Prof. Christian Winkler

is a data scientist and machine learning architect. He holds a doctorate in theoretical physics and has worked in the field of big data and artificial intelligence for 20 years, with a particular focus on scalable systems and intelligent algorithms for mass text processing. He has been a professor at TH Nürnberg since 2022, where his research concentrates on optimizing user experience using modern methods. He is the founder of datanizing GmbH, a speaker at conferences, and an author of articles on machine learning and text analytics.

First of all, there are the hallucinations: not everything language models say is true. If a model does not know something, it simply makes it up, and the hallucinations are phrased so convincingly that they sound plausible. An early version of Llama, for example, promptly turned Heise-Verlag into the organizer of CeBIT. Presumably, the combination of a publishing house, an IT focus, and the Hanover location was simply too suggestive of what was then the world's largest computer trade fair. Because of the polished wording, one is inclined to believe such misinformation.


Training large language models is also extremely time-consuming and can take several thousand GPU-years of computing time. Providers therefore rarely retrain their models, which means the models do not know the latest information. Even for relatively new models such as Llama 3.1, the knowledge cutoff lies in the previous year.

Public language models also fail on internal information, such as a company's own documents, because this content is not part of their training data. Generative models can be retrained (fine-tuned), but this involves considerable effort and would have to be repeated for each new document.

A combination of generative models with modern information retrieval methods provides a remedy. Documents can be indexed using embedding models (these also belong to the class of large language models). Similarity metrics can then be used to find documents (or passages) that answer a question as well as possible. This "context" is then passed to a generative model, which summarizes the results and matches them to the exact question.
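The basic mechanics of this retrieval step can be sketched in a few lines of Python. The following is a minimal illustration, not the implementation of any particular framework; the use of sentence-transformers is an assumption, and the embedding model is the one that appears later in the article:

# Minimal retrieval sketch: index a few documents with an embedding
# model, find the passage closest to the question, and hand it to a
# generative model as context.
import numpy as np
from sentence_transformers import SentenceTransformer

docs = [
    "GraphRAG extracts a knowledge graph from raw text.",
    "Embedding models map text to vectors for similarity search.",
]

model = SentenceTransformer("intfloat/e5-base")
# e5 models expect "passage: "/"query: " prefixes
doc_vecs = model.encode([f"passage: {d}" for d in docs], normalize_embeddings=True)

query = "How does similarity search work?"
q_vec = model.encode(f"query: {query}", normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors
scores = doc_vecs @ q_vec
context = docs[int(np.argmax(scores))]

# The retrieved context then goes into the prompt of a generative model
prompt = f"Context: {context}\n\nQuestion: {query}\n\nAnswer:"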

Such processes, called Retrieval Augmented Generation (RAG), are extremely popular. Last year, they triggered a small revolution in information retrieval because they can achieve much better results. Not least because of this, there are many frameworks that implement RAG.

Using RAG correctly is not trivial, as there are several dimensions to optimize: you can work with different embedding models, different rerankers, and different generative models. Choosing the right combination requires experience.
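Reranking, for example, is typically done with a cross-encoder that scores query-document pairs jointly. A small sketch, assuming sentence-transformers and a publicly available MS MARCO cross-encoder:

from sentence_transformers import CrossEncoder

# Cross-encoders read query and passage together and therefore rank
# more precisely than pure embedding similarity, at higher cost.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is GraphRAG?"
candidates = [
    "GraphRAG combines knowledge graphs with retrieval augmented generation.",
    "CeBIT was a computer trade fair held in Hanover.",
]

# Higher score = more relevant; reorder the retrieved candidates
scores = reranker.predict([(query, c) for c in candidates])
reranked = [c for _, c in sorted(zip(scores, candidates), reverse=True)]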

In addition, pure RAG unfortunately cannot yet extract (formalized) knowledge from the documents, although this would be useful because it would allow the models to provide much better answers. Knowledge graphs have been a research topic for a long time, so it would be attractive to combine the two ideas.

The term GraphRAG comes from Microsoft, and the introductory article describes the process as a hierarchical approach to RAG as opposed to a purely semantic search for text fragments. The individual steps consist of extracting the knowledge graph from the raw text and building a community hierarchy with content summaries. These structures can then be used for retrieval, enabling better answers to be formulated.
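The two stages can be made concrete with a toy example. The following sketch is emphatically not Microsoft's implementation: hand-written triples stand in for LLM-extracted entities, NetworkX community detection stands in for the community hierarchy, and joined node names stand in for LLM-written summaries.

import networkx as nx

# Stage 1: the knowledge graph. In GraphRAG, an LLM extracts such
# triples from the raw text; here they are written by hand.
triples = [
    ("machine learning", "subfield of", "artificial intelligence"),
    ("supervised learning", "paradigm of", "machine learning"),
    ("perceptron", "algorithm for", "supervised learning"),
    ("autoencoder", "type of", "neural network"),
    ("neural network", "used in", "machine learning"),
]
g = nx.Graph()
for subj, rel, obj in triples:
    g.add_edge(subj, obj, relation=rel)

# Stage 2: community detection stands in for GraphRAG's community
# hierarchy; an LLM would then summarize each community's content.
communities = nx.community.louvain_communities(g, seed=42)
summaries = [", ".join(sorted(c)) for c in communities]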

In contrast to many other Microsoft projects, however, the implementation remains hidden. Although there are Jupyter notebooks, they make intensive use of Azure and OpenAI and transfer all information to the cloud. Since much is hidden in the classes, it is difficult to understand what is happening behind the scenes.

Fortunately, there are alternative sources and implementations. The introductory article by neuml is recommended, as it shows in much more detail what is happening. An embedding model (intfloat/e5-base) enables similarity queries. Existing Wikipedia embeddings, which neuml makes available via Hugging Face, serve as the database. The implementation indexes a subset (the top 100,000 articles) and returns a graph as the result of a query (in this case "machine learning"):

neuml's tool creates a knowledge graph for "machine learning" from the Wikipedia summaries (Fig. 1).

(Image: Christian Winkler)

The fact that the nodes in the upper part of Figure 1 are much more densely connected than in the rest of the graph indicates that the information density is probably highest there.
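In code, the neuml approach looks roughly like the following sketch. It assumes txtai 6 or newer; note that the original notebook re-indexes the 100,000-article subset with a graph component enabled, which is omitted here for brevity.

from txtai import Embeddings

# Load neuml's prebuilt Wikipedia embeddings index from Hugging Face
embeddings = Embeddings()
embeddings.load(provider="huggingface-hub", container="neuml/txtai-wikipedia")

# Request the results as a semantic graph instead of a flat hit list;
# this requires an index that was built with a graph section.
graph = embeddings.search("machine learning", limit=100, graph=True)

# The underlying structure is a regular NetworkX graph
g = graph.backend
print(g.number_of_nodes(), g.number_of_edges())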

As the graph is modeled as a NetworkX graph in Python, its methods can be called directly, for example to find the path between two nodes or to identify the nodes with the highest centrality (a sketch follows the result listing below). A glance at the result reveals that these are particularly relevant Wikipedia pages:

{'id': 'Machine learning',
 'text': 'Machine learning (ML) is a field of study in artificial intelligence concerned with the development ...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 2,
 'score': 0.9113607406616211}
{'id': 'Supervised learning',
 'text': 'Supervised learning (SL) is a paradigm in machine learning where input objects (for example, a vecto...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 68,
 'score': 0.8619827032089233}
{'id': 'Perceptron',
 'text': 'In machine learning, the perceptron (or McCulloch–Pitts neuron) is an algorithm for supervised learn...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 70,
 'score': 0.8862747550010681}
{'id': 'Autoencoder',
 'text': 'An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled d...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 46,
 'score': 0.8562962412834167}
{'id': 'Multilayer perceptron',
 'text': 'A multilayer perceptron (MLP) is a misnomer for a modern feedforward artificial neural network, cons...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 89,
 'score': 0.8532359004020691}
{'id': 'Unsupervised learning',
 'text': 'Unsupervised learning is a paradigm in machine learning where, in contrast to supervised learning an...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 53,
 'score': 0.8743622303009033}
{'id': 'Generative pre-trained transformer',
 'text': 'Generative pre-trained transformers (GPT) are a type of large language model (LLM) and a prominent f...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 5,
 'score': 0.8358747363090515}
{'id': 'Convolutional neural network',
 'text': 'Convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns ...',
 'score': 0.8500866889953613}
{'id': 'Deep learning',
 'text': 'Deep learning is the subset of machine learning methods based on artificial neural networks with rep...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 27,
 'score': 0.8632184267044067}
{'id': 'Kernel method',
 'text': 'In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best know...',
 'topic': 'artificial_learning_intelligence_language',
 'topicrank': 99,
 'score': 0.8539000749588013}

The score comes from the similarity analysis, and the resulting order clearly differs from that of the graph analysis. Combining both methods finds the documents that are most relevant both to the query and within their context.
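Such a centrality ranking can be reproduced with plain NetworkX calls on the graph from the sketch above (node identifiers depend on how the index was built):

import networkx as nx

# g: the NetworkX graph behind the semantic graph (see above).
# Nodes with the highest centrality tend to be the most relevant pages.
centrality = nx.degree_centrality(g)
top_nodes = sorted(centrality, key=centrality.get, reverse=True)[:10]

# Paths between nodes answer "how are these two topics related?"
path = nx.shortest_path(g, source=top_nodes[0], target=top_nodes[-1])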

The topics that appear in the results are also interesting. These were calculated during indexing. The process is quite complex and, like the calculation of the embeddings, runs several orders of magnitude faster on a GPU than on a CPU.
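Conceptually, the topics arise from community detection on the semantic graph: every community becomes a topic, labeled by its dominant members. A rough illustration of the idea (not txtai's actual code):

import networkx as nx

# Each graph community becomes one topic; its most central nodes
# serve as a crude label, analogous to names such as
# "artificial_learning_intelligence_language" in the output above.
communities = nx.community.louvain_communities(g, seed=0)
topics = []
for community in communities:
    sub = g.subgraph(community)
    centrality = nx.degree_centrality(sub)
    label = "_".join(sorted(sub.nodes, key=centrality.get, reverse=True)[:4])
    topics.append(label)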

Powerful hardware is also required for the final phase of RAG, text generation. The neuml example uses a Mistral 7B OpenOrca model for this. To ensure that it also runs on less powerful GPUs, neuml opted for a model quantized with Activation-aware Weight Quantization (AWQ). Generation is fast and extracts the facts from the graph. The tool answers the query about "machine learning" with Wikipedia as follows:

  1. Machine learning is a field of study in artificial intelligence that focuses on developing statistical algorithms that can learn from data and generalize to unseen data.
  2. Generative artificial neural networks have surpassed many previous approaches in performance.
  3. Machine learning algorithms are susceptible to inheriting and amplifying biases present in their training data, which can manifest in skewed representations or unfair treatment of different demographics.
  4. Supervised learning is a paradigm in machine learning where input objects and a desired output value train a model, allowing the algorithm to correctly determine output values for unseen instances.
  5. An autoencoder is a type of artificial neural network used for unsupervised learning, learning efficient codings of unlabeled data, and reducing dimensionality.

It is striking that the answer focuses on generative AI. The model reflects the information found on Wikipedia, which obviously contains much more about generative (language) models than about other ML topics. It is also surprising that the model explains autoencoders in addition to supervised learning, but not unsupervised learning.
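The generation step can be reproduced roughly as follows. The exact checkpoint name is an assumption, AWQ models additionally require the autoawq package, and results is assumed to hold the node dictionaries from the retrieval step above:

from transformers import AutoModelForCausalLM, AutoTokenizer

# AWQ-quantized Mistral 7B OpenOrca; checkpoint name is an assumption
model_id = "TheBloke/Mistral-7B-OpenOrca-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Context: concatenated texts of the most relevant graph nodes
context = "\n".join(result["text"] for result in results)
prompt = f"List the most important facts about machine learning.\n{context}"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(output[0], skip_special_tokens=True))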

In addition to the fragments from the Jupyter notebook, neuml also offers a Streamlit application on GitHub for experimentation. The tool is powerful and also allows you to index your own documents.


This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.