Learn more about SQL Server tools

   
   




































Latest from MSSQLTips














SQL Server Full Text Search Noise Words and Thesaurus Configurations

MSSQLTips author Jeremy Kadlec By:   |   Read Comments   |   Related Tips: More > Full Text Search

Problem
I have heard that Full Text Search uses a noise words to eliminate meaningless words in searches.  I have also heard that a thesaurus is used, but I am not exactly sure how.  Can you provide some details related to how both of these technologies are used in Full Text Search?  I am interested in more of a background on these specific Full Text Search technologies as well as where the files are located and how I can update them.

Solution
Although the noise words and thesaurus have many similarities in terms of the implementation with Full Text Search, let's break apart each technology in order to explain them with more clarity.  Keep in mind that the information in this tip relates to SQL Server 2005, although SQL Server 2000 and 7.0 have some of the same properties.  Stay tuned for information on Full Text Search in SQL Server 2008 because some of the rules have changed.

Noise Words

  • Purpose: The noise words file has been established by Microsoft on a per language basis to determine the words which do not add value to the search.  For example, in the English language (United  States) noise words would include: because, been, before, being, between, both, but, etc.  In the Noiseenu.txt file (English US) over 100 noise words are setup by default and this file is used by default when querying with full text search.
  • Language Support: 20 different languages are supported from Chinese (Noisechs.txt) to Turkish (Noisetrk.txt).
  • Windows Directory: $SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\
  • Modifications: New noise words can be added to the file.  As we have had to make changes, we add them after the last entry which is 'your' by default.
  • Additional Information
    • If you make a change to a noise word file, it is necessary to repopulate all of the full text catalogs before any of the new noise words will be used.

Thesaurus

  • Purpose: Ability to define synonyms and use the synonyms in full text searches.  Specifically with the FORMSOF THESAURUS parameter with the CONTAINS or CONTAINSTABLE commands the thesaurus is used to determine synonyms of the search terms.  The same is true with the FREETEXT and FREETEXTTABLE commands.  With these commands the thesaurus is used to identify expressions or replacements for the searched terms.
  • Language Support: The thesaurus files are in XML format with a global file (tsGLOBAL.xml) and then 18 language specific files.  By default all of the files have sample XML that is commented out.  So by default no synonyms are setup when a full text search is issued.
  • Windows Directory: $SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATA\
  • Modifications: After reviewing the XML format, two types of thesaurus entries exist..  First are the expansion set and second are the replacement set.  The expansion set which can be considered synonyms or substitute terms.  The replacement set indicates that one string is replaced with another.
  • Additional Information:
    • All thesaurus files should be saved as Unicode.
    • The Full Text Search Engine Windows Service must be restarted to begin to use the new thesaurus files.

Next Steps



Last Update: 5/5/2008





About the author
MSSQLTips author Jeremy Kadlec
Since 2002, Jeremy Kadlec has delivered value to the global SQL Server community as an Edgewood Solutions SQL Server Consultant, MSSQLTips.com co-founder and Baltimore SSUG co-leader.

View all my tips


print tip Print  
Become a paid author


Learn more about SQL Server tools








Post a comment or let the author know this tip helped you.

       All comments are reviewed, so stay on subject or we may delete your comment.

*Name   *Email Notify for updates



       Note: your email address is not published. Required fields are marked with an asterisk (*)


Get free SQL tips:

*Enter Code refresh code     





 
More SQL Server Solutions



 

Follow

Get Free SQL Tips

Twitter

LinkedIn

Google+

Facebook

Pinterest

RSS

Learning

DBAs

Developers

BI Professionals

Careers

Q and A

Today's Tip

Resources

Tutorials

Webcasts

Whitepapers

Tools

Search

Tip Categories

Search By TipID

Authors

Community

First Timer?

Pictures

Free T-shirt

Contribute

Events

User Groups

Author of the Year

More Info

Join

About

Copyright

Privacy

Disclaimer

Feedback

Advertise

Copyright (c) 2006-2015 Edgewood Solutions, LLC All rights reserved
Some names and products listed are the registered trademarks of their respective owners.