An Overview of Azure Cognitive Search Service
By: Rajendra Gupta | Updated: 2021-12-28
Search engines make our lives easier. Google has proven to be a trustworthy option for people searching online because it takes them directly to the results they're looking for and typically returns them quickly. We can search for anything we want, whenever we want, and rarely have to worry about whether the best results will show up on the screen.
Imagine that you need to implement a "mini search engine" similar to Google or Bing for your own data. You would want an interface that lets users get the essential information they need without fussing with settings and complicated input fields. Implementing such a search function from scratch, however, would be a complex and time-consuming task.
This tutorial will explore how Azure Cognitive Search can help you implement a search service for heterogeneous data.
Microsoft Azure Cognitive Search (formerly Azure Search) is a cloud search service offered in a search-as-a-service model. It provides a rich search experience over your content, whether it is stored on-premises or in the cloud, for your applications. Developers access the search functionality through REST APIs or the .NET SDK, which hide the internal implementation details.
The following diagram shows that the Azure Cognitive Search sits between your content (un-indexed data) and the client application. The client application sends the search request to the search service and handles the response.
The Azure Cognitive Search Service has the following parts.
- Data Source: The data source provider can be Azure SQL Database, Managed Instance, SQL Server on Azure VM, Cosmos DB, Azure blob storage container or Azure Table Storage, SharePoint Online (preview), Azure Data Lake Storage Gen2, or any dataset composed of JSON documents.
- Index: Azure Cognitive Search creates the index on the specified data. The index is a persistent store of documents used for filtered and full-text search. Internally, the service processes the data into tokens and stores them in inverted indexes for fast lookup. An automated crawler process (the indexer) runs at predefined intervals and detects changes using SQL integrated change tracking or high water mark change detection for Azure SQL Database.
- Querying: Once the index is populated, client applications can send search requests to the search service and receive responses. Queries can use autocomplete, synonym matching, fuzzy matching, filters, sorting, spelling correction, and pattern matching. With AI enrichment during indexing, the index can also hold content produced by Optical Character Recognition (OCR) and by image analysis, such as facial detection, image interpretation, and image recognition.
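To make the inverted index idea concrete, here is a toy Python sketch, not the service's actual implementation: it tokenizes a few hypothetical documents and maps each token to the keys of the documents that contain it, so a lookup touches only the matching entries instead of scanning every document. For simplicity it uses AND semantics for multi-term queries; the service's default `searchMode` is `any`, which matches documents containing any of the terms.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Simplified analyzer: lowercase the text and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Map each token to the set of document keys whose text contains it.
    index: dict[str, set[str]] = defaultdict(set)
    for key, text in docs.items():
        for token in tokenize(text):
            index[token].add(key)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    # AND semantics: return keys of documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical toy documents keyed by document key.
docs = {
    "1": "Free wifi and coffee in the lobby",
    "2": "Rooftop pool with city views",
    "3": "Coffee shop next to the pool",
}
idx = build_inverted_index(docs)
print(sorted(search(idx, "coffee")))       # ['1', '3']
print(sorted(search(idx, "coffee pool")))  # ['3']
```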
Let us go ahead and implement Azure Cognitive Search for the data stored in Azure SQL Database.
Create an Azure Cognitive Search Service in the Portal
To create the Azure Cognitive Search service, navigate to the Azure portal and search for the keyword "Cognitive Search".
The Create Search Service requires the following inputs.
- Subscription and resource group
- Service Name: You need to provide a service name in the instance details section. The service name is used for all API calls in the following format – https://<ServiceName>.search.windows.net. The service name should be unique in the search.windows.net namespace.
- Location: Azure Cognitive Search is available in most Azure regions. Refer to Products available by region to choose a region that meets your AI enrichment, business continuity, and disaster recovery requirements.
- Pricing tier: Azure Cognitive Search offers Free, Basic, Standard, and Storage Optimized pricing tiers, each with its own capabilities and limits. By default, it uses the Standard service tier. You can click Change service tier and choose the required pricing tier by considering the number of indexes, indexers, storage, search units, partitions, and the estimated search unit cost per month.
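Because the service name becomes part of the endpoint, every client call is built from it. The following Python sketch only constructs a request with the standard library and does not send it. The service name `my-search-service`, the placeholder key, and the API version `2020-06-30` are assumptions to replace with your own values.

```python
import urllib.request

# Assumed GA REST API version; check the documentation for newer versions.
API_VERSION = "2020-06-30"


def search_endpoint(service_name: str) -> str:
    # All REST calls target https://<ServiceName>.search.windows.net
    return f"https://{service_name}.search.windows.net"


def list_indexes_request(service_name: str, admin_key: str) -> urllib.request.Request:
    # Build (but do not send) a GET request that lists the service's indexes.
    url = f"{search_endpoint(service_name)}/indexes?api-version={API_VERSION}"
    req = urllib.request.Request(url, method="GET")
    # Keys are passed in the api-key header; listing indexes needs an admin key.
    req.add_header("api-key", admin_key)
    return req


# "my-search-service" and the key are placeholders, not real values.
req = list_indexes_request("my-search-service", "<admin-key>")
print(req.full_url)
# https://my-search-service.search.windows.net/indexes?api-version=2020-06-30
```

Once a real service name and admin key are filled in, passing `req` to `urllib.request.urlopen` would send the request.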
For this tip, we use the free pricing tier as shown below.
Click Review + create to validate the configuration and deploy the Azure Cognitive Search service.
The following page shows the Azure Cognitive Search dashboard.
Use Azure Portal for Creating an Azure Cognitive Search Index
To create an index, open the Azure portal and navigate to the Overview (dashboard) page of your search service.
Click Import data in the Connect your data section to create and populate a search index. The Import data wizard requires a connection to an existing data source such as Azure SQL Database, Azure Cosmos DB, Azure Storage, or SharePoint.
To help you learn indexing concepts, Azure Cognitive Search also provides a few sample data sets. Click Samples and choose the required data set. The data set type shows that samples are available for Azure SQL Database and Azure Cosmos DB.
For this tip, select the data source hotels-sample and continue to the next page.
For the built-in sample, a default index schema is already defined, and you can run queries against the target hotels-sample index to return search data.
The Import data wizard simplifies the process by condensing the steps into a basic import configuration. At a minimum, an index needs a name and a fields collection, and one field must be marked as the document key to uniquely identify each document. You can also specify additional details, such as language analyzers or suggesters, if you want autocomplete or suggested queries.
As shown below, the index uses the HotelID column as the document key.
Each column has the following attributes, shown as checkboxes.
- Retrievable: Determines whether the column appears in the search results. For example, to limit the columns returned in search results, clear this checkbox for the columns you want to hide.
- Key: The unique document identifier column. It is mandatory and must be a string.
- Filterable, Sortable, and Facetable: These attributes determine whether the column can be used for filtering, sorting, or a faceted navigation structure.
- Searchable: Determines whether the column is included in full-text search. Typically, string columns are searchable, while numeric and Boolean fields are not.
By default, the Azure Cognitive Search Service sets the attributes as below.
- String columns: Retrievable, Searchable
- Integers: Retrievable, Filterable, Sortable, and Facetable.
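The checkbox attributes map to boolean properties on each field in the index definition JSON, the same JSON you can view later with the Index Definition (JSON) option. The following Python sketch builds a small definition, assuming string fields get the Retrievable and Searchable defaults listed above and numeric fields are Retrievable, Filterable, Sortable, and Facetable. The index name, field names, and Edm types are illustrative assumptions, not the exact hotels-sample schema.

```python
import json

NUMERIC_TYPES = {"Edm.Int32", "Edm.Int64", "Edm.Double"}


def make_field(name: str, edm_type: str, key: bool = False) -> dict:
    # Assumed wizard-style defaults: strings are retrievable and searchable;
    # numeric fields are retrievable, filterable, sortable, and facetable.
    numeric = edm_type in NUMERIC_TYPES
    return {
        "name": name,
        "type": edm_type,
        "key": key,
        "retrievable": True,
        "searchable": edm_type == "Edm.String",
        "filterable": numeric,
        "sortable": numeric,
        "facetable": numeric,
    }


# Illustrative subset of fields; names and types are assumptions.
index_definition = {
    "name": "hotels-sample-index",
    "fields": [
        make_field("HotelId", "Edm.String", key=True),  # the key must be Edm.String
        make_field("Description", "Edm.String"),
        make_field("Rating", "Edm.Double"),
    ],
}
print(json.dumps(index_definition, indent=2))
```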
You can change the column attributes as required. Let's go with the default attributes in the sample data set and move to the next page: Create an indexer.
Enter a suitable name for the indexer and define the schedule: once, hourly, daily, or a custom schedule. However, you cannot set a schedule for the sample data sets or for existing data sources that do not have change tracking enabled. The description field is optional.
Click Submit to configure and simultaneously run the indexer.
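Scheduling matters because each scheduled run only needs to process what changed. The sketch below illustrates the idea behind the high water mark change detection mentioned earlier: the indexer remembers the highest value it has seen in a column that increases on every change, and the next run picks up only rows above that mark. The `Row` type, the `RowVersion` column name, and `incremental_run` are hypothetical; the real indexer does this bookkeeping itself, and only the policy dictionary mirrors the shape used in a data source definition.

```python
from __future__ import annotations

from dataclasses import dataclass

# Shape of the policy an Azure SQL data source can carry; the column name is hypothetical.
change_detection_policy = {
    "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
    "highWaterMarkColumnName": "RowVersion",
}


@dataclass
class Row:
    key: str
    row_version: int  # value that increases every time the row changes


def incremental_run(rows: list[Row], high_water_mark: int) -> tuple[list[Row], int]:
    # Conceptual simulation: pick up rows changed since the last run,
    # then advance the stored mark to the highest version seen.
    changed = sorted(
        (r for r in rows if r.row_version > high_water_mark),
        key=lambda r: r.row_version,
    )
    new_mark = changed[-1].row_version if changed else high_water_mark
    return changed, new_mark


table = [Row("1", 10), Row("2", 11), Row("3", 12)]
batch, mark = incremental_run(table, high_water_mark=0)
print([r.key for r in batch], mark)  # ['1', '2', '3'] 12
table.append(Row("4", 13))
batch, mark = incremental_run(table, mark)
print([r.key for r in batch], mark)  # ['4'] 13
```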
The wizard takes you to the indexer list, where you can review the indexers, the number of documents indexed, and their status. You can also open the Indexers tab on the Overview page.
It might take a few minutes for the portal to update. Keep refreshing until the page shows the newly created indexer in the list with a status of "in progress" or "success", along with the number of documents indexed.
The service Overview page also provides a list of links. Click Indexes to see the index you created; the index list shows document counts and storage size.
Click the index name to verify the fields and their attributes. Some fields are greyed out, which means they cannot be modified or deleted.
You can also retrieve the index definition in JSON format with the Index Definition (JSON) option.
Query Using Search Explorer
Search explorer sends requests through the REST API and supports both the simple query syntax and the full Lucene query parser. You can launch Search explorer in the following ways.
- From the Azure Cognitive Search service Overview page.
- From the Index menu.
Enter a query string and click Search; the results are returned as verbose JSON documents. You can specify search keywords as you would in Google or Bing, or a fully specified query expression. Let's explore a few sample queries in Search explorer.
- String query
The search parameter supplies the keywords for a full-text search. The following query returns documents from the sample data set that contain "coffee" in any of their searchable fields.
Query: search=coffee
The query returns the matching documents (records), including every field marked as "retrievable" in the index.
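The same coffee query can be sent outside the portal as a REST GET request against the index's docs collection. This Python sketch only builds the URL; the service name, the index name `hotels-sample-index`, and the API version `2020-06-30` are assumptions, and a real call would also need a query key in the `api-key` header.

```python
from urllib.parse import quote, urlencode

API_VERSION = "2020-06-30"  # assumed REST API version


def search_url(service: str, index: str, **params: str) -> str:
    # GET https://<service>.search.windows.net/indexes/<index>/docs?api-version=...&search=...
    query = urlencode({"api-version": API_VERSION, **params}, quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


# Service and index names are placeholders for your own values.
print(search_url("my-search-service", "hotels-sample-index", search="coffee"))
# https://my-search-service.search.windows.net/indexes/hotels-sample-index/docs?api-version=2020-06-30&search=coffee
```

Using `quote` rather than the default `quote_plus` encodes spaces as `%20`, so a multi-word search such as `free wifi` becomes `search=free%20wifi`.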
A parameterized query returns search results based on the conditions you specify. To add parameters, use the following.
- Use the & symbol to append search parameters. You can specify the parameters in any order.
- The $count=true parameter returns the total number of matching documents. The count (@odata.count) appears at the top of the search results.
- The $top parameter limits the number of returned documents, in rank order. For example, $top=5 returns the five highest-ranked documents, starting with the document that has the highest search score.
As shown below, the query returned 19 documents, and the highest-ranked document score is 4.961525.
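The same parameters can also be sent in a JSON request body. The sketch below assumes the POST form of the Search Documents REST API (`POST .../indexes/<index>/docs/search`), where `count` and `top` stand in for `$count` and `$top`; `summarize` reads the total count and the best `@search.score` from a parsed response. The helper names are hypothetical.

```python
from __future__ import annotations

import json


def search_body(text: str, top: int, count: bool = True) -> str:
    # Body for POST .../indexes/<index>/docs/search?api-version=...
    # "count" and "top" correspond to $count and $top in the query string.
    return json.dumps({"search": text, "count": count, "top": top})


def summarize(response: dict) -> tuple[int | None, float | None]:
    # "@odata.count" appears only when count is true; each hit has "@search.score".
    total = response.get("@odata.count")
    hits = response.get("value", [])
    best = max((h["@search.score"] for h in hits), default=None)
    return total, best


print(search_body("coffee", top=5))
# {"search": "coffee", "count": true, "top": 5}
```

Applied to the parsed JSON of a response, `summarize` returns the same two numbers Search explorer displays: the total count and the highest score.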
The $filter parameter specifies the criteria that returned documents must meet. For example, suppose we want to retrieve hotels with a rating of less than 4.
Query: search=wifi&$count=true&$filter=Rating lt 4
In the result set, we can verify that the search results include only documents that satisfy the filter condition.
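Filter expressions use OData syntax, so string values must be quoted and escaped correctly. The helper below is a hypothetical sketch, not part of any Azure SDK: it builds simple comparison clauses with the OData operators `eq`, `ne`, `gt`, `lt`, `ge`, and `le`, quotes strings in single quotes with embedded quotes doubled, and joins clauses with `and`. The `Category` field in the second example is an assumed field name.

```python
from __future__ import annotations

ALLOWED_OPS = {"eq", "ne", "gt", "lt", "ge", "le"}


def odata_literal(value: object) -> str:
    # OData literals: strings in single quotes (embedded quotes doubled),
    # booleans in lowercase, numbers as-is.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def odata_filter(*clauses: tuple[str, str, object]) -> str:
    # Each clause is (field, operator, value); clauses are combined with "and".
    parts = []
    for field, op, value in clauses:
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported operator: {op}")
        parts.append(f"{field} {op} {odata_literal(value)}")
    return " and ".join(parts)


print(odata_filter(("Rating", "lt", 4)))
# Rating lt 4
print(odata_filter(("Rating", "lt", 4), ("Category", "eq", "Boutique")))
# Rating lt 4 and Category eq 'Boutique'
```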
Facet the Query
The facet parameter returns the aggregated count of documents for each facet value, providing a navigation structure of categories and counts.
The query returns facets for the Rating field based on the text search for wifi. To use a field as a facet, it must be marked as facetable; to include it in the results, it must also be retrievable.
As shown below, the results are grouped by the Rating field, and the count for each rating value appears at the top of the search result.
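Outside the portal, the same facet request can be sent as a JSON body, and the buckets can be read from the response. This sketch assumes the POST form of the Search Documents REST API, where `facets` is a list of field names (the query-string form is `facet=Rating`), and that the response carries an `@search.facets` object mapping each field to a list of `value`/`count` buckets. The helper names are hypothetical.

```python
from __future__ import annotations

import json


def facet_body(text: str, facet_fields: list[str]) -> str:
    # POST body equivalent of search=<text>&facet=<field> in the query string.
    return json.dumps({"search": text, "facets": facet_fields})


def facet_counts(response: dict, field: str) -> dict:
    # "@search.facets" maps each faceted field to buckets of {"value", "count"}
    # (value-based buckets are the default for a plain facet).
    buckets = response.get("@search.facets", {}).get(field, [])
    return {b["value"]: b["count"] for b in buckets}


print(facet_body("wifi", ["Rating"]))
# {"search": "wifi", "facets": ["Rating"]}
```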
Highlight Search Results
The search query returns all columns marked as retrievable in the index configuration. When the results contain many columns, it can be hard to see where the search term matched. Use the highlight parameter to format the matching terms: the results then include highlighted fragments for the specified fields, with matches wrapped in <em> tags by default, which makes them easier to spot.
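The highlight options can also be set in a GET query string. The sketch below assumes the REST parameters `highlight` (a comma-separated list of searchable fields), `highlightPreTag`, and `highlightPostTag`, and that each result exposes its fragments under `@search.highlights`. The `Description` field name and the helper names are assumptions for illustration.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode


def highlight_params(text: str, fields: list[str],
                     pre: str = "<em>", post: str = "</em>") -> str:
    # "highlight" takes a comma-separated list of searchable fields; the tags
    # wrap each matching term (<em> and </em> are the service defaults).
    return urlencode(
        {"search": text, "highlight": ",".join(fields),
         "highlightPreTag": pre, "highlightPostTag": post},
        quote_via=quote,
    )


def highlights(hit: dict, field: str) -> list[str]:
    # Each hit lists highlighted fragments per field under "@search.highlights".
    return hit.get("@search.highlights", {}).get(field, [])


print(highlight_params("wifi", ["Description"]))
# search=wifi&highlight=Description&highlightPreTag=%3Cem%3E&highlightPostTag=%3C%2Fem%3E
```

Custom tags, such as a `<mark>` element for a web page, can be passed through `pre` and `post` instead of the defaults.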
In this first tip on the Azure Cognitive Search service, I created a search service and an index using a sample data set, and we explored the service with a few search use cases. In the next tutorial, we will cover more details on getting results from Azure Cognitive Search. Stay tuned.
- Refer to these tips related to Azure.
- Read Microsoft documentation on the Azure Cognitive Search service.