Hybrid Search with Semantic Ranker - Taking Document Retrieval in RAG to the Next Level

Introduction

Retrieval-Augmented Generation (RAG) is a powerful Generative AI architecture that allows digital assistants to answer business-related questions based on all types of documents from your company, for example PDF files or website content. This approach found its way into your Teneo® projects with the release of our Generative QnA Template Solution. You can find more details about the solution itself in the recording of our Office Hours session, the release notes and our teneo.ai tutorial. Its implementation in OpenQuestion has also shown the benefits this approach can bring to the contact center, providing immediate answers for both the end user and the contact center agent.

The RAG approach consists of two steps – Document Retrieval and Answer Generation. For the first step, Teneo’s solution enables Vector Search by default – a performant search algorithm based on embeddings, with several pricing options on Azure for setting up the connected vectorized database. Let’s look at the following images to get a better idea of Vector Search.

(Source: Intro to Dense Vectors for NLP and Vision by James Briggs)

In the image above we look at a three-dimensional vector space and see that the days of the week are located quite close together, since they are used in similar ways within our language. There are obviously still differences between the individual entries: if I say ‘Monday’, it is not the same as if I said ‘Thursday’. The semantic meaning of a word (or utterance) therefore needs a very detailed representation, and we use Azure’s Ada-002 embedding model for this, which turns a user input – as well as the content of the vectorized database – into a vector space of 1,536 dimensions. As we can see in the images below, embeddings also capture implied relations between concepts in our language very nicely. In the example from OpenAI, the vector representation of ‘canine companions say’ leads us (at least very close) to the vector representation of the word ‘woof’. As you can imagine, Vector Search can be powerful in a search for the right document, since it encodes information about the semantic meaning of words and utterances.

(Source: Introducing text and code embeddings (openai.com))
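To make the idea of ‘closeness’ in such a vector space concrete: retrieval typically compares embeddings with cosine similarity, so texts whose vectors point in a similar direction are considered related. The following Groovy sketch is purely illustrative – the vectors are made up, and in the template solution the 1,536-dimensional Ada-002 embeddings are compared inside Azure AI Search, not inside Teneo:

// Illustrative only: cosine similarity between two tiny example vectors.
// Real Ada-002 embeddings have 1,536 dimensions and the comparison is
// performed by Azure AI Search, not by your Teneo solution.
def cosineSimilarity(List a, List b) {
    def dot = 0.0d, normA = 0.0d, normB = 0.0d
    a.eachWithIndex { x, i ->
        dot   += x * b[i]
        normA += x * x
        normB += b[i] * b[i]
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

def monday   = [0.8, 0.1, 0.3]   // made-up example vectors
def thursday = [0.7, 0.2, 0.3]
def woof     = [0.1, 0.9, 0.2]

println cosineSimilarity(monday, thursday)  // high score: related concepts
println cosineSimilarity(monday, woof)      // lower score: less related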

Sometimes, though, it is necessary to find an exact match on something, for example when we look for a certain product name. Let’s say we want to find the documentation for Teneo 7.3 – this means that we know the exact version and we only want answers for Teneo 7.3, neither Teneo 7.1 nor Teneo 7.2.

And that is the situation in which keyword-based searches find their value. Combining the power of both approaches (vector and keyword search) is what we want to do in today’s article.

In order to add further options for the document retrieval step to your Teneo project, we have prepared an overview and a step-by-step tutorial on how to update your Generative QnA Template Solution to the state-of-the-art approach: Hybrid Search with Semantic Ranker.

We then combine the power of both vector and keyword search as hybrid search, and add to the mix Microsoft’s Semantic Ranker, which utilizes sophisticated language models to enhance the relevance and quality of search outcomes. This feature is driven by technology developed in collaboration with Bing, employing vast data resources and extensive machine learning expertise to prioritize document rankings. Capable of comprehending a user’s query intent and meaning, the Semantic Ranker surfaces the most pertinent matches.

Why Is the Document Retrieval Step so Important?

Let’s take a look at the following image which visualizes a RAG architecture.

Below you can see an example in which the user asks about OpenQuestion and gets a generative answer grounded by the knowledge of teneo.ai.

As you see, the two involved tasks are interconnected, and the outcome of the retrieval task directly influences the result of the generative task. Garbage in, garbage out, as so often with data-based tasks.

You can easily imagine the situation as follows: if somebody hands you a selection of documents and asks you to answer a question based on them, it will be difficult to answer correctly if those documents do not provide the required information. So, let’s make sure that our Generative Model receives what it needs to answer correctly.

Performance

Microsoft recently published a nice comparison on their tech community blog that shows the differences between the search algorithms available on Azure. As you can see in the image below, Vector Search clearly outperforms a traditional keyword-based search, but getting the best of both worlds in a hybrid approach (often) leads to even better results.

Source: Azure Cognitive Search: Outperforming vector search with hybrid retrieval and ranking capabilities - Microsoft Community Hub

Let’s take a look at a few examples together to see which use cases the algorithms handle nicely.

  • Keyword Search: “Can you show me your discounts?” → efficiently brings up all results that contain the word discount
  • Vector Search: “Can you show me what you have on sale this week?” → embeddings are used to represent the meaning of the user input in a multi-dimensional vector and in this way they can also easily handle synonyms and different phrasing of the same meaning. For example, when using text-embedding-ada-002, we are talking about 1,536 dimensions.
  • Hybrid Search: Uses both seen approaches (keyword and vector search) and then performs a fusion step to select the best results from each technique. Azure AI Search (previously named Azure Cognitive Search) currently makes use of Reciprocal Rank Fusion (RRF) in order to produce a single result set.
  • Hybrid Search + Semantic Ranker: The Semantic Ranker computes higher quality relevance scores to reorder the result set, using similar technology to what is being used in Microsoft’s Bing Search Engine. You can find a detailed explanation of the approach here.

Please note that the first three search techniques return a single @search.score that determines the order of the results, while the last option adds a further @search.rerankerScore from the Semantic Ranker. The latter ranges between 1.00 and 4.00 and can be compared against a project-specific threshold in order to decide when documents are considered relevant enough to answer the user’s question and when not.

Tutorial

In the following, you find all details on how to add both Hybrid Search as standalone and Hybrid Search together with Semantic Ranker to your Generative QnA solution. Feel free to contact your Customer Success Manager for direct access to the updated GenerativeQnA and AzureCognitiveSearchClient Groovy files.

Hybrid Search (only)

Azure AI Search uses the BM25 algorithm for text search and the KNN or HNSW algorithm for vector search, then the RRF algorithm is used to combine the search results from multiple methods.
Reciprocal Rank Fusion (RRF) is an algorithm used to consolidate search scores from multiple queries executed simultaneously. It merges ranked result sets into a unified response. Based on reciprocal rank, RRF emphasizes the position of items in original rankings. This prioritizes items ranked higher in multiple lists, enhancing the overall quality and reliability of the final ranking.
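To give a feeling for how RRF works, here is a minimal, purely illustrative Groovy sketch of the fusion idea (the constant k = 60 is a commonly used default for RRF; the real fusion happens inside Azure AI Search and is nothing you implement yourself in Teneo):

// Illustrative sketch of Reciprocal Rank Fusion (RRF).
// Each input is an ordered list of document ids, best result first.
// The actual fusion is performed by Azure AI Search; this only shows the idea.
def rrfFuse(List<List<String>> rankings, int k = 60) {
    def scores = [:].withDefault { 0.0d }
    rankings.each { ranking ->
        ranking.eachWithIndex { docId, index ->
            scores[docId] += 1.0d / (k + index + 1)   // rank is 1-based
        }
    }
    return scores.sort { -it.value }.keySet() as List
}

def keywordResults = ['docA', 'docB', 'docC']
def vectorResults  = ['docC', 'docA', 'docD']
println rrfFuse([keywordResults, vectorResults])   // documents ranked high in both lists come out on top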

Implement Hybrid Search in Teneo

Add the following method in the groovy file AzureCognitiveSearchClient.groovy:

def hybridSearch(def text, def embedding, int numResults = 4, String filter) {
    return doPost("/docs/search?api-version=${API_VERSION}", [
            filter: filter,
            search: text,                 // keyword part of the hybrid query
            vector: [
                    value : embedding,    // embedding of the user input for the vector part
                    k     : numResults,
                    fields: 'content_vector'
            ],
            select: 'title,content,metadata',
            top: numResults
    ])
}

Add the following method in the groovy file GenerativeQnA.groovy:

static hybridSearch(def query, def embedding, int numResults = 4, String filter = null) {
        return searchClient.hybridSearch(query, embedding, numResults, filter)
    }

Use the following method:

GenerativeQnA.hybridSearch(query, embedding, NUMBER_OF_SEARCH_RESULTS) 

instead of

GenerativeQnA.embeddingSearch(embedding, NUMBER_OF_SEARCH_RESULTS)

in the Integration Retrieval-augmented Generative QnA in your Teneo solution. Please replace the number of search results if needed.
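For orientation, the response of the hybrid search follows the usual Azure AI Search format: the matched documents arrive in a value array, each carrying its @search.score. A minimal sketch of how the results could be inspected, assuming doPost returns the parsed JSON response body:

// Hypothetical sketch: inspect the hybrid search results.
// Assumes doPost returns the parsed JSON body of the Azure AI Search response.
def results = GenerativeQnA.hybridSearch(query, embedding, NUMBER_OF_SEARCH_RESULTS)
results.value.each { doc ->
    println "${doc.'@search.score'} - ${doc.title}"
    // doc.content and doc.metadata are available for building the generative prompt
}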

Hybrid Search with Semantic Ranker

Semantic Ranker is a newer functionality of Azure AI Search. As introduced above, we leverage the capabilities of both vector and keyword search in a hybrid approach and additionally incorporate Microsoft’s Semantic Ranker, which employs advanced language models, developed in partnership with Bing, to reorder the result set according to the user’s query intent and meaning and thereby surface the most relevant matches.

For more details about Semantic Ranking: Semantic ranking - Azure AI Search | Microsoft Learn

Enable Semantic Ranker in Azure

To enable Semantic Ranker in Azure, you need to open your Azure AI Search resource, and choose Semantic Ranker from the sidebar on the left. Then choose a pricing plan:

Then open the indexes and click on embeddings:

Select Semantic configurations and click on Add semantic configuration:

Finally, configure semantic ranking following this guide: Configure semantic ranking - Azure AI Search | Microsoft Learn

Implement Semantic Search in Teneo

Add the following method in the groovy file AzureCognitiveSearchClient.groovy:

def semanticSearch(def text, def embedding, def configuration, def language = "en-us", int numResults = 4, String filter) {
    return doPost("/docs/search?api-version=${API_VERSION}", [
            filter: filter,
            search: text,                              // keyword part of the hybrid query
            vector: [
                    value : embedding,                 // embedding of the user input for the vector part
                    k     : numResults,
                    fields: 'content_vector'
            ],
            select: 'title,content,metadata',
            queryType: "semantic",                     // activates the Semantic Ranker
            semanticConfiguration: configuration,
            queryLanguage: language,
            captions: "extractive",
            answers: "extractive",
            top: numResults
    ])
}

Add the following method in the groovy file GenerativeQnA.groovy:

static semanticSearch(def query, def embedding, def configuration, def language = "en-us", int numResults = 4, String filter = null) {
        return searchClient.semanticSearch(query, embedding, configuration, language, numResults, filter)
    }

Use the following method:

GenerativeQnA.semanticSearch(query, embedding, 'semantic_config', 'en-us', NUMBER_OF_SEARCH_RESULTS) 

instead of

GenerativeQnA.embeddingSearch(embedding, NUMBER_OF_SEARCH_RESULTS)

in the Integration Retrieval-augmented Generative QnA in your Teneo solution. Please replace the semantic configuration name, the language code and the number of search results if needed.
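As a final, hypothetical sketch of how the additional semantic fields could be used in the flow (the threshold value and the response handling below are assumptions, adapt them to your project):

// Hypothetical sketch: use the semantic reranker score as a relevance filter.
// Assumes the parsed Azure AI Search response; RERANKER_THRESHOLD is project-specific.
def RERANKER_THRESHOLD = 2.0d

def response = GenerativeQnA.semanticSearch(query, embedding, 'semantic_config', 'en-us', NUMBER_OF_SEARCH_RESULTS)
def relevantDocs = response.value.findAll { doc ->
    (doc.'@search.rerankerScore' ?: 0.0d) >= RERANKER_THRESHOLD
}
relevantDocs.each { doc ->
    // Extractive captions provide short, highlighted snippets per document
    println "${doc.'@search.rerankerScore'} - ${doc.'@search.captions'?.getAt(0)?.text}"
}
// If relevantDocs is empty, the flow could fall back to an
// "I could not find relevant documentation" answer instead of generating one.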

Conclusion

The Generative AI world moves fast – having a flexible architecture makes it possible to upgrade to the latest technology at any time (if desired) and to keep many options at hand. Every project has its specific requirements, and we want to make sure you have every choice at hand to achieve the best performance. Hybrid Search (with Semantic Ranker) adds a lot of power to the document retrieval step of our RAG approach. Make sure to check which algorithm makes the most sense for your project, considering both performance and costs. We hope you enjoyed this article; feel free to drop any questions into the comments.
