Typesense 30.0: Open-source search engine with global curation rules

Version 30.0 of the open-source search engine Typesense has been released. It brings global curation rules and synonyms, as well as IPv6 support.

(Image: Karramba Production/Shutterstock.com)

The developers of the open-source search engine Typesense have released version 30.0. The update brings fundamental API changes for synonyms and curation rules, as well as new features such as Maximum Marginal Relevance (MMR) for diversifying search results. Synonyms and curation rules, previously tied to individual collections, are now global resources that can be shared across collections.

Administrators must create a snapshot before upgrading to version 30.0, because the new version performs an automatic migration: existing collection-specific synonyms and overrides are transferred to global synonym sets and curation sets. The API endpoints change from /collections/{collection}/synonyms/* to /synonym_sets/* and from /collections/{collection}/overrides/* to /curation_sets/*. Existing search queries will continue to work after the migration, but applications that read or write synonyms and overrides must be adapted to the new endpoints.

A central new feature is the diversification of search results via MMR. The algorithm reorders the top 250 hits based on a chosen similarity metric. The MMR formula weighs a document's relevance to the search query against its similarity to the results already selected. The lambda parameter controls the balance between relevance and diversity; its default value is 0.5. Administrators can configure MMR via curation sets, with different similarity metrics such as Jaccard for arrays or vector distance for embeddings.
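To make the trade-off concrete, here is a minimal Python sketch of greedy MMR re-ranking in its textbook form: each step picks the candidate maximizing λ·relevance − (1 − λ)·(highest similarity to any already selected hit). This is not Typesense's implementation; the function names, relevance scores and tag sets are hypothetical, and Jaccard similarity over tag sets stands in for the array metric mentioned above. The candidate list is assumed to already be the pool being diversified (in Typesense's case, the top 250 hits).

```python
from typing import Callable, List, Sequence


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets: |intersection| / |union|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mmr_rerank(
    relevance: Sequence[float],
    similarity: Callable[[int, int], float],
    k: int,
    lam: float = 0.5,
) -> List[int]:
    """Greedily select k candidate indices using Maximal Marginal Relevance.

    lam = 1.0 ranks by relevance only; smaller values favour diversity.
    """
    candidates = list(range(len(relevance)))
    selected: List[int] = []
    while candidates and len(selected) < k:
        def score(i: int) -> float:
            # Penalty: similarity to the closest hit already selected (0 if none yet).
            penalty = max((similarity(i, j) for j in selected), default=0.0)
            return lam * relevance[i] - (1 - lam) * penalty

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical hits: the first two are near-duplicates by tags.
    tags = [{"red", "shoe"}, {"red", "shoe"}, {"blue", "boot"}]
    rel = [0.9, 0.85, 0.6]
    sim = lambda i, j: jaccard(tags[i], tags[j])
    print(mmr_rerank(rel, sim, k=3))  # [0, 2, 1]
```

With the default lambda of 0.5, the near-duplicate second hit drops behind the less relevant but different third hit; with lam=1.0 the order falls back to pure relevance.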

The global structure of synonyms and curation rules reduces redundancy, as these resources no longer need to be created separately for each collection. This lowers memory requirements and, through reuse, can improve cache hit rates. Synonym sets support both one-way and multi-way synonyms and can be configured per language. Curation rules can now also use synonyms and stemming, and they support MMR diversification as well as dynamic filtering and sorting.
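The difference between the two synonym types can be sketched in a few lines of Python. This models only the matching semantics, not Typesense's data format or API: in a multi-way group every term matches every other, while a one-way entry lets a query for the root term also match its synonyms but not the reverse. The class, the set name "catalog" and the example terms are hypothetical; the module-level dict merely illustrates one named set that several collections could share.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SynonymSet:
    # Groups of fully interchangeable terms (multi-way).
    multi_way: List[Set[str]] = field(default_factory=list)
    # Root term -> terms it should also match (one-way only).
    one_way: Dict[str, Set[str]] = field(default_factory=dict)

    def expand(self, term: str) -> Set[str]:
        """Return every term that a query for `term` should also match."""
        result = {term}
        for group in self.multi_way:
            if term in group:
                result |= group
        result |= self.one_way.get(term, set())
        return result


# One globally defined set that several collections could reference by name.
SYNONYM_SETS = {
    "catalog": SynonymSet(
        multi_way=[{"blazer", "coat", "jacket"}],
        one_way={"smart phone": {"iphone", "android"}},
    )
}

if __name__ == "__main__":
    s = SYNONYM_SETS["catalog"]
    print(sorted(s.expand("coat")))    # ['blazer', 'coat', 'jacket']
    print(sorted(s.expand("iphone")))  # ['iphone'] (one-way does not apply in reverse)
```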

Version 30.0 significantly expands the JOIN features. The facet_by parameter now supports referenced fields from linked collections, such as facet_by=$Customers(product_price). Developers can use include_fields to retrieve the number of linked documents and apply sorting and limits to linked fields. Also new is the cascade_delete: false option, which prevents referenced documents from being automatically deleted when all references have been removed. This option requires async_reference: true in the schema.
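The following Python fragment illustrates how the options named above might fit together, written as the dicts a client would serialize to JSON. The collection and field names (orders, customer_id, Customers.id) are hypothetical, and placing cascade_delete next to async_reference on the reference field is an assumption based on the article's wording, so the release notes should be checked before relying on it. The helper check_reference_fields is likewise hypothetical; it only encodes the stated rule that cascade_delete: false requires async_reference: true.

```python
from typing import List

# Hypothetical schema; "reference" points at the linked collection's field.
orders_schema = {
    "name": "orders",
    "fields": [
        {"name": "order_id", "type": "string"},
        {
            "name": "customer_id",
            "type": "string",
            "reference": "Customers.id",
            "async_reference": True,   # required for cascade_delete: false
            "cascade_delete": False,   # assumed field-level placement
        },
    ],
}

# Search parameters faceting on a field of the joined collection (from the article).
search_params = {"q": "*", "facet_by": "$Customers(product_price)"}


def check_reference_fields(schema: dict) -> List[str]:
    """Return a description of each field that violates the rule (empty if none)."""
    problems = []
    for f in schema.get("fields", []):
        if f.get("cascade_delete") is False and not f.get("async_reference", False):
            problems.append(
                f"{f['name']}: cascade_delete: false requires async_reference: true"
            )
    return problems


if __name__ == "__main__":
    print(check_reference_fields(orders_schema))  # []
```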

For Natural Language Search and Auto-Embedding, Typesense 30.0 now supports OpenAI models hosted on Azure as well as GCP Service Account Authentication, which simplifies integration into Azure and Google Cloud environments. For vector-based image search, new CLIP Multilingual Models are available, enabling multilingual similarity search over images. With the new IPv6 support, Typesense can bind to and serve on IPv6 addresses, which eases deployment in modern dual-stack and IPv6-only networks.
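One practical detail when pointing a client at an IPv6 address: an IPv6 literal must be wrapped in square brackets inside a URL (RFC 3986), otherwise its colons clash with the port separator. This Python sketch builds a base URL accordingly. The helper name is hypothetical and not part of any Typesense client; 8108 is used only because it is Typesense's default API port.

```python
import ipaddress


def base_url(host: str, port: int = 8108, scheme: str = "http") -> str:
    """Build a base URL, bracketing IPv6 literals as RFC 3986 requires."""
    try:
        is_v6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        is_v6 = False  # not an IP literal, e.g. a hostname like "search.example.com"
    host_part = f"[{host}]" if is_v6 else host
    return f"{scheme}://{host_part}:{port}"


if __name__ == "__main__":
    print(base_url("::1"))        # http://[::1]:8108
    print(base_url("127.0.0.1"))  # http://127.0.0.1:8108
```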

Further improvements include a truncate parameter for string fields, which improves exact-match searches on long strings, and a group_max_candidates parameter that yields exact found counts in group_by operations. The synonym matching logic has been improved and now sorts results by match quality. A transliterator pool speeds up tokenization for Cyrillic and Chinese characters. Union Search now supports group_by, pinned_hits, and a remove_duplicates flag.
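What a remove_duplicates flag does conceptually in a union of several searches can be sketched in Python: merge all hits by score and keep only the best-scoring copy of each document. The hit shape (an "id" and a "score" key) and the function name are simplifying assumptions; Typesense's actual union search response format and tie-breaking rules may differ.

```python
from typing import Iterable, List, Set


def union_hits(result_lists: Iterable[List[dict]], remove_duplicates: bool = True) -> List[dict]:
    """Merge hits from several searches, highest score first."""
    merged = sorted(
        (hit for hits in result_lists for hit in hits),
        key=lambda h: h["score"],
        reverse=True,  # sorted() is stable, so ties keep their input order
    )
    if not remove_duplicates:
        return merged
    seen: Set[str] = set()
    unique = []
    for hit in merged:
        if hit["id"] not in seen:  # first occurrence is the best-scoring copy
            seen.add(hit["id"])
            unique.append(hit)
    return unique


if __name__ == "__main__":
    a = [{"id": "1", "score": 0.9}, {"id": "2", "score": 0.5}]
    b = [{"id": "2", "score": 0.7}, {"id": "3", "score": 0.6}]
    print([h["id"] for h in union_hits([a, b])])  # ['1', '2', '3']
```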

Version 30.0 fixes numerous bugs, including issues with analytics IDs for different filter_by and analytics_tag parameters, as well as field-specific token separators in highlighting. Pagination parameters are now correctly passed to union search, and deadlocks with asynchronous references have been resolved. Highlighting has been adjusted so that only exact matches are marked for phrase queries, and the actual query is highlighted in natural language search.

Details about the new version can be found in the Release Notes on GitHub.

(fo)

This article was originally published in German. It was translated with technical assistance and editorially reviewed before publication.