Learn more about SQL Server tools



solving sql server problems for millions of dbas and developers since 2006 attend our next webcast













































   Got a SQL tip?
            We want to know!

SQL Server Semantic Search to Find Text in External Files

MSSQLTips author Daniel Calbimonte By:   |   Read Comments (3)   |   Related Tips: More > Full Text Search
Problem

Sometimes it is necessary to search for specific content inside documents stored in a SQL Server database. Is it possible to do this in SQL Server? Can I run T-SQL queries and find content inside Microsoft Word files?  Yes, now with SQL Server 2012 you can do a semantic search. You can create a T-SQL query and look for information inside a Word document or other documents.  In this tip, we will look at how this works.

Solution

SQL Server 2012 introduces a new search feature beyond the relational platform called semantic search. This feature extends the semantic full-text search capabilities to external files.

Installation

The Semantic Search engine can be installed when you do the initial installation of SQL Server or it can be added later.

Here is a screenshot of the option when you do a normal installation:

install full text and semantic search

If you did not install this when you did the initial installation you can run the SQL Server installer for the SemanticLanguageDatabase as shown below. This can be found in the SQL Server setup folder.

Open the SQL Server Management Studio

Verify Features Are Installed

To verify if you have Full Text and Semantic Search installed you can run this query:

SELECT SERVERPROPERTY('IsFullTextInstalled')

The following query s used to verify that the semantic database is installed and registered:

SELECT * FROM sys.fulltext_semantic_language_statistics_database

Create Sample Database

To use this new feature we are going to use the FileTable feature in SQL Server 2012.  First I am going to create a folder where we will store our database files.  In this example I have created a folder called "C:\FileTable".

After this folder is created, open SQL Server Management Studio and create a new database called "FileSearchTest".  This is created just like any other database except we are going to also enable FILESTREAM for this database as well. 

The script below creates the database:

CREATE DATABASE FileSearchTest
ON PRIMARY
(
    NAME = N'FileSearchTest',
    FILENAME = N'C:\FileTable\FileSearchTest.mdf'
),
FILEGROUP FilestreamFG CONTAINS FILESTREAM
(
    NAME = MyFileStreamData,
    FILENAME= 'C:\FileTable\Data'
)
LOG ON
(
    NAME = N'FileSearchTest_Log',
    FILENAME = N'C:\FileTable\FileSearchTest_log.ldf'
)
WITH FILESTREAM
(
    NON_TRANSACTED_ACCESS = FULL,
    DIRECTORY_NAME = N'FileTable'
)

Create Sample Table

Create a new table as a FileTable inside the database.

USE FileSearchTest
go
CREATE TABLE DocumentSemantics AS FileTable
WITH
(
    FileTable_Directory = 'DocumentSemantics',
    FileTable_Collate_Filename = database_default
);
GO

We just created a FileTable! This is a special table that is used to store non-relational data like Word files, text files, ppt files, etc... The below image shows this new feature in SSMS.

Open the SQL Server Management Studio

There is also a feature in SQL Server 2012 to explore the FileTable directory from within SSMS.  In SSMS right click on the new FileTable you can see the "Explore FileTable Directory" option.  This will show you the files that exist in this folder.

Open the SQL Server Management Studio

You can now copy files into the folder "C:\FileTable\Data", using Windows Explorer and then if you run a SELECT against the DocumentSemantics FileTable you can see a list of these files.

You can just run this command to see all of the details:

SELECT * FROM dbo.DocumentSemantics

Open the SQL Server Management Studio

Deleting Files

You can also delete files from the folder using T-SQL as shown below:

USE FileSearchTest
DELETE FROM dbo.DocumentSemantics
WHERE dbo.DocumentSemantics.file_type <> 'docx'

Searching Content Within Files

In this example we are going to search for text and find the files with that content:

USE FileSearchTest
GO
SELECT name
FROM dbo.DocumentSemantics
WHERE FREETEXT (file_stream, 'After climbing a very high mountain, we discovered that there are many other mountains to climb')

The query will show the name of the documents that contain the following text: "After climbing a very high mountain, we discovered that there are many other mountains to climb". 

The results displayed are the following:

Open the SQL Server Management Studio

If we open these documents in WordPad we can see this text as outlined below.

Open the SQL Server Management Studio

Finding Documents By Filename

Another option we have is to find filenames that include some text.  In this example we are going to search for the word "system".

These are the files in folder "C:\FileTable\Data" that include the word "system":

Open the SQL Server Management Studio

We can run this T-SQL script to find the first 5 files that contain the word "system" in the filename:

 USE FileSearchTest
SELECT TOP (5)
 D.name,
 D.cached_file_size,
 KEYP_TBL.score
FROM dbo.DocumentSemantics D
INNER JOIN SEMANTICKEYPHRASETABLE
 (
 dbo.DocumentSemantics, 
 (name, file_stream)
 ) AS KEYP_TBL
 ON D.path_locator = KEYP_TBL.document_key
WHERE KEYP_TBL.keyphrase ='system'
ORDER BY KEYP_TBL.score DESC;

The results displayed by the query are as follows:

Open the SQL Server Management Studio
Next Steps

Semantic Search combined with FileTables is a powerful feature that will let us handle external files in SQL Server easily and we can query the content easily using T-SQL.

For more information, you can read these artilces:



Last Update: 10/4/2012


About the author
MSSQLTips author Daniel Calbimonte
Daniel Calbimonte is a Microsoft SQL Server MVP, Microsoft Certified Trainer and Microsoft Certified IT Professional.

View all my tips


print tip Print  
Become a paid author





join MSSQLTips for free SQL Server tips     



Learn more about SQL Server tools
Post a comment or let the author know this tip helped you.

       All comments are reviewed, so stay on subject or we may delete your comment.

*Name   *Email Notify for updates



       Note: your email address is not published. Required fields are marked with an asterisk (*)


Get free SQL tips:

*Enter Code refresh code     



Friday, August 01, 2014 - 9:58:37 AM - Anonymous Read The Tip

I want to do "searching content within files" for PDF documents. I searched a lot.According to that I found ifilter can do this.I tried that. ifilter installed path is also visible in sql server query.But when use query to search content, doc types are visible.But not pdf documents.

Can anybody help me.plzz  

 


Thursday, July 31, 2014 - 6:05:20 AM - Anonymous Read The Tip

Excellent post !! It works nicely.Thanks a lot  :)

 


Wednesday, October 24, 2012 - 4:57:13 AM - Yuval Klein Read The Tip

excelent post! 

the filetable feature with the FTS is really an excelent feature.

in the previous versions I needed to translate the word documents to text , load them into table and search inside. it was really a lot of job. now it looks so easy!




 
Sponsor Information







Copyright (c) 2006-2014 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | free t-shirt | user groups | community | events | first timer?
Some names and products listed are the registered trademarks of their respective owners.