Sometimes it is necessary to search for specific content inside documents stored in a SQL Server database. Is it possible to do this in SQL Server? Can I run T-SQL queries and find content inside Microsoft Word files? Yes, now with SQL Server 2012 you can do a semantic search. You can create a T-SQL query and look for information inside a Word document or other documents. In this tip, we will look at how this works.
SQL Server 2012 introduces a new search feature beyond the relational platform called semantic search. This feature extends the semantic full-text search capabilities to external files.
The Semantic Search engine can be installed when you do the initial installation of SQL Server or it can be added later.
Here is a screenshot of the option when you do a normal installation:
If you did not install this when you did the initial installation you can run the SQL Server installer for the SemanticLanguageDatabase as shown below. This can be found in the SQL Server setup folder.
Verify Features Are Installed
To verify if you have Full Text and Semantic Search installed you can run this query:
The following query s used to verify that the semantic database is installed and registered:
SELECT * FROM sys.fulltext_semantic_language_statistics_database
Create Sample Database
To use this new feature we are going to use the FileTable feature in SQL Server 2012. First I am going to create a folder where we will store our database files. In this example I have created a folder called "C:\FileTable".
After this folder is created, open SQL Server Management Studio and create a new database called "FileSearchTest". This is created just like any other database except we are going to also enable FILESTREAM for this database as well.
The script below creates the database:
CREATE DATABASE FileSearchTest
NAME = N'FileSearchTest',
FILENAME = N'C:\FileTable\FileSearchTest.mdf'
FILEGROUP FilestreamFG CONTAINS FILESTREAM
NAME = MyFileStreamData,
NAME = N'FileSearchTest_Log',
FILENAME = N'C:\FileTable\FileSearchTest_log.ldf'
NON_TRANSACTED_ACCESS = FULL,
DIRECTORY_NAME = N'FileTable'
Create Sample Table
Create a new table as a FileTable inside the database.
CREATE TABLE DocumentSemantics AS FileTable
FileTable_Directory = 'DocumentSemantics',
FileTable_Collate_Filename = database_default
We just created a FileTable! This is a special table that is used to store non-relational data like Word files, text files, ppt files, etc... The below image shows this new feature in SSMS.
There is also a feature in SQL Server 2012 to explore the FileTable directory from within SSMS. In SSMS right click on the new FileTable you can see the "Explore FileTable Directory" option. This will show you the files that exist in this folder.
You can now copy files into the folder "C:\FileTable\Data", using Windows Explorer and then if you run a SELECT against the DocumentSemantics FileTable you can see a list of these files.
You can just run this command to see all of the details:
SELECT * FROM dbo.DocumentSemantics
You can also delete files from the folder using T-SQL as shown below:
USE FileSearchTest DELETE FROM dbo.DocumentSemantics WHERE dbo.DocumentSemantics.file_type <> 'docx'
Searching Content Within Files
In this example we are going to search for text and find the files with that content:
USE FileSearchTest GO SELECT name FROM dbo.DocumentSemantics WHERE FREETEXT (file_stream, 'After climbing a very high mountain, we discovered that there are many other mountains to climb')
The query will show the name of the documents that contain the following text: "After climbing a very high mountain, we discovered that there are many other mountains to climb".
The results displayed are the following:
If we open these documents in WordPad we can see this text as outlined below.
Finding Documents By Filename
Another option we have is to find filenames that include some text. In this example we are going to search for the word "system".
These are the files in folder "C:\FileTable\Data" that include the word "system":
We can run this T-SQL script to find the first 5 files that contain the word "system" in the filename:
SELECT TOP (5)
FROM dbo.DocumentSemantics D
INNER JOIN SEMANTICKEYPHRASETABLE
) AS KEYP_TBL
ON D.path_locator = KEYP_TBL.document_key
WHERE KEYP_TBL.keyphrase ='system'
ORDER BY KEYP_TBL.score DESC;
The results displayed by the query are as follows:
Semantic Search combined with FileTables is a powerful feature that will let us handle external files in SQL Server easily and we can query the content easily using T-SQL.
For more information, you can read these artilces:
I want to do "searching content within files" for PDF documents. I searched a lot.According to that I found ifilter can do this.I tried that. ifilter installed path is also visible in sql server query.But when use query to search content, doc types are visible.But not pdf documents.