Vector database provider Qdrant presents new algorithm for hybrid search

Qdrant introduces BM42, a new search algorithm tailored to Retrieval Augmented Generation that takes advantage of Transformers.


(Image: iX / created with ChatGPT)

This article was originally published in German and has been automatically translated.

The vector database startup Qdrant wants to tailor its open-source database and vector search engine more specifically to modern use cases in the field of AI and search, in particular Retrieval Augmented Generation (RAG). To this end, the company is now presenting a new search algorithm under the name BM42, which is being positioned as an alternative to established variants such as BM25 or SPLADE.

According to the announcement by Qdrant CTO and co-founder Andrey Vasnetsov, BM42 takes an innovative approach that combines the strengths of the classic BM25 algorithm with the advantages of Transformer-based AI models: On the one hand, the simplicity and interpretability of BM25 and, on the other, the semantic intelligence of Transformer models.

In contrast to classic search applications, the documents in RAG systems are generally much shorter. The BM42 algorithm addresses this problem by replacing the term weighting within a document with semantic information from Transformer models. While it retains the Inverse Document Frequency (IDF) known from BM25, which measures the importance of a term in relation to the entire document collection, BM42 uses the attention values from Transformer models instead of the statistical term frequency within a document to determine the importance of a word for the entire document.
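In simplified terms, the resulting score can be sketched as follows: for each query term found in a document, the corpus-level IDF is multiplied by the attention-derived weight of that term in the document. The following minimal Python sketch is illustrative only; the function names and the representation of documents as token sets are assumptions, not Qdrant's implementation.

```python
import math

def idf(term, corpus):
    # BM25-style IDF: terms that are rare across the collection score higher
    n = sum(1 for doc in corpus if term in doc)
    return math.log((len(corpus) - n + 0.5) / (n + 0.5) + 1)

def bm42_score(query_terms, doc_weights, corpus):
    # doc_weights: attention-derived importance per token in the document,
    # replacing BM25's within-document term frequency
    return sum(idf(t, corpus) * doc_weights.get(t, 0.0) for t in query_terms)
```

Here `corpus` is a list of token sets and `doc_weights` a mapping of token to attention weight for one document, as produced in the listing below.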

According to Vasnetsov, using the attention values allows BM42 to take the semantic meaning of words into account without additional training. A special tokenization method better suited to search tasks is used. In classification tasks, the [CLS] token represents the entire sequence; as the listing below shows, the attention row for this token reveals how important each token in the document is for the document as a whole.

import torch

# "transformer" is assumed to be a sentence-transformers Transformer module, e.g.:
# from sentence_transformers import models
# transformer = models.Transformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = "Hello, World - is the starting point in most programming languages"

features = transformer.tokenize(sentences)
tokens = transformer.tokenizer.convert_ids_to_tokens(features["input_ids"][0])

# ...

attentions = transformer.auto_model(**features, output_attentions=True).attentions

weights = torch.mean(attentions[-1][0,:,0], axis=0)
#                ▲               ▲  ▲   ▲
#                │               │  │   └─── [CLS] token is the first one
#                │               │  └─────── First item of the batch
#                │               └────────── Last transformer layer
#                └────────────────────────── Average all attention heads

for weight, token in zip(weights, tokens):
    print(f"{token}: {weight}")

# [CLS]       : 0.434 // Filter out the [CLS] token
# hello       : 0.039
# ,           : 0.039
# world       : 0.107 // <-- The most important token
# -           : 0.033
# is          : 0.024
# the         : 0.031
# starting    : 0.054
# point       : 0.028
# in          : 0.018
# most        : 0.016
# programming : 0.060 // <-- The third most important token
# languages   : 0.062 // <-- The second most important token
# [SEP]       : 0.047 // Filter out the [SEP] token
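To turn such attention weights into a sparse document vector, the special tokens are filtered out and each remaining token's weight can be combined with its IDF. The helper below is a hedged sketch; the function name and the precomputed idf_lookup mapping are assumptions for illustration, not Qdrant's actual code.

```python
def to_sparse_vector(tokens, weights, idf_lookup,
                     special=("[CLS]", "[SEP]", "[PAD]")):
    # Combine each token's attention weight with its corpus IDF,
    # dropping special tokens; repeated tokens accumulate their weights.
    sparse = {}
    for token, weight in zip(tokens, weights):
        if token in special:
            continue
        sparse[token] = sparse.get(token, 0.0) + weight * idf_lookup.get(token, 1.0)
    return sparse
```

The resulting token-to-weight dictionary corresponds to the sparse representation that a sparse vector index can store alongside dense embeddings.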

According to the announcement, besides improved search results for short documents and their traceability and explainability, BM42 offers a number of advantages for developers: it integrates easily into existing systems and is highly efficient with low memory requirements, and the ability to use different Transformer models gives users more flexibility. According to Vasnetsov, combining BM42 with dense embeddings in a hybrid search approach currently yields the best results: the sparse model handles exact token matching, while the dense model covers semantic similarity.
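One common way to merge the sparse and dense result lists in such a hybrid setup is Reciprocal Rank Fusion (RRF). The sketch below illustrates that generic technique, not Qdrant's implementation; the function name and the constant k are assumptions (k=60 is a widely used default).

```python
def rrf_fuse(rankings, k=60):
    # rankings: ranked lists of document ids, e.g. one from the sparse
    # (BM42) search and one from the dense embedding search.
    # Each document collects 1 / (k + rank) from every list it appears in.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked highly by both retrievers rise to the top of the fused list, even if neither retriever alone placed them first.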

More info

(Image: DOAG)

On November 20 and 21, 2024, KI Navigator enters its second round. The event, organized by DOAG, Heise Medien and de'ge'pol, will again take place at the Nuremberg Convention Center East. KI Navigator is the conference on the practice of AI across the three areas of IT, business and society, dedicated to the concrete application of artificial intelligence.

The program includes presentations on vector databases and retrieval augmented generation. Tickets are available at the early bird price until September 30.

More details on BM42 as well as technical background information on search algorithms such as BM25 or SPLADE can be found in Andrey Vasnetsov's announcement. If you would like to delve deeper into the discussion and get to know Qdrant projects, the Discord channel is open to you.
