![]() |
|
|
|
By: Seth Delconte | Read Comments (9) | Related Tips: More > XML |
We have XML data in a SQL Server database that we want to search efficiently, but are not sure of the best way to get the results we want. We need to find specific values in certain XML nodes- this will involve finding node values that match given substrings. Check out this tip to learn more.
We will be working with the AdventureWorks2012 sample database – querying the Demographics XML column in the Person.Person table. To see what the Demographics XML data looks like, let’s pull a sample Demographics field by selecting a random record:
--ordering by NEWID() function randomizes the sort order SELECT TOP 1 Demographics FROM AdventureWorks2012.Person.Person ORDER BY NEWID() GO

If we expand the XML contents of this field, we see:

It’s evident that we are dealing with typed XML, since we can see a namespace declaration (xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey") inside the root node. This means that the XML data is associated with an XML schema that enforces data type and validation rules. We want to get a count of the surveys where the individual’s education is listed as ‘Bachelors’. How should we do this? We can try a rather primitive method that finds the existence of substrings anywhere in the XML column by using the CHARINDEX function:
--filtering XML data using CHARINDEX function
SELECT COUNT(1)
FROM Person.Person
WHERE CHARINDEX('Bachelors',CONVERT(VARCHAR(MAX),Demographics),1) > 0

This does return a count of the number of times the word ‘Bachelors’ appears. However, it’s quite clumsy - and worse yet, doesn’t guarantee that the text ‘Bachelors’ is a real XML value in the table – it could also be part of a node name. A much cleaner solution is to use XQuery.
XQuery is a language designed for querying XML data, and is not proprietary to SQL Server – it is used by many relational database management systems (RDBMS). A simple implementation of an XQuery solution is as follows:
--filtering XML data using XQuery
;WITH XMLNAMESPACES
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[.="Bachelors"])')=1
GO
This may look a little complicated, but really isn’t too hard to decipher. The WITH XMLNAMESPACES statement is needed to explicitly declare the namespace for in order to reference any XML node element or attribute, which we did in the WHERE clause (the Education node element). Another way to declare the namespace is to put the declaration in-line with the actual XQuery method that is being used:
--filtering XML data using XQuery with in-line namespace declaration
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('declare namespace ns=
"http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey";
(/ns:IndividualSurvey/ns:Education[.="Bachelors"])')=1
GO
In both examples, we declared a namespace and aliased it as ‘ns’, which is used as a prefix for each path step or node reference. We listed the node path steps from root node (IndividualSurvey) to destination node (Education).
The exist() method is used in our query as the WHERE criterion to filter the XML data, finding XML records where the Education node’s value is equal to the text ‘Bachelors’. The exist() method returns a bit value of 1 if the method finds a non-empty node whose element or attribute value matches the given criteria. Upon running the query, however, we get a record count of zero:

What happened? We’ve assumed too much – that the value we are looking for in the Education node will exactly match the text ‘Bachelors’. If we look at the sample Demographics XML fragment closely, we see that there is a space after the word. So, if we modify our query and run it again, we get the following:
--filtering XML data using XQuery - adding a space to the string
;WITH XMLNAMESPACES
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[.="Bachelors "])')=1
GO

How can we prevent this oversight from happening again? XQuery in SQL Server makes use of a contains() function that finds a string as a whole substring in a node value. The contains() function requires two arguments:
Using the contains() function, we don’t have to be concerned with matching the string exactly:
--filtering XML data using the XQuery contains() function
;WITH XMLNAMESPACES
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[contains(.,"Bachelors")])')=1
GO

We can easily expand our filter by additionally specifying that only records where the Gender node contains ‘F’ are returned:
--filtering XML data using the XQuery contains() function in multiple WHERE criteria
;WITH XMLNAMESPACES
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT COUNT(1)
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[contains(.,"Bachelors")])')=1
AND Demographics.exist('(/ns:IndividualSurvey/ns:Gender[contains(.,"F")])')=1
GO

Our contains() function instances use what is called a self node axis abbreviation (‘.’) as their first argument. This specifies that we want to search in the current node context (Gender or Education). The second argument is for the actual string we are searching for (‘Bachelors’ or ‘F’).
We assume that our query is returning the records that meet our criteria, but what if we want to see the actual values to verify this? Another XQuery method we can use for this is value(). The syntax for the method is as follows:
We’ll employ this method to return node values for the Education and Gender nodes (only the first 3 records), along with the non-XML LastName field:
;WITH XMLNAMESPACES
('http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey' AS ns)
SELECT TOP 3
LastName,
Demographics.value('(/ns:IndividualSurvey/ns:Education)[1]', 'varchar(50)') AS Education,
Demographics.value('(/ns:IndividualSurvey/ns:Gender)[1]', 'char(1)') AS Gender
FROM Person.Person
WHERE Demographics.exist('(/ns:IndividualSurvey/ns:Education[contains(.,"Bachelors")])')=1
AND Demographics.exist('(/ns:IndividualSurvey/ns:Gender[contains(.,"F")])')=1
GO

The value() method requires an XQuery path argument like the exist() method, but also needs a SQL Server data type argument. We are using ‘varchar(50)’ and ‘char(1)’ as the SQL Server data types for the Education and Gender node values.
The value() method also enforces a singleton rule – requiring that the node element or attribute instance in a sequence be explicitly specified, even if there is only one. For example, there is only one Education node in our XML fragment, but we still have to designate a singleton by placing the ‘[1]’ at the end of the value() method, thus indicating that we want the first Education node.
We’ve looked briefly at some basic ways to query XML data using XQuery. We introduced the exist() method as an effective XQuery method to use in the WHERE clause. We also experimented with the XQuery contains()function, using it to count XML records having a precise node string value match. Finally, we used a very common XQuery method - the value() method - to return XML values.
| Wednesday, February 20, 2013 - 9:13:00 AM - Armando Prato | Read The Tip |
|
I enjoyed your previous tip on XML and I enjoyed this one just as much.... thanks! |
|
| Wednesday, February 20, 2013 - 9:26:26 AM - Seth Delconte | Read The Tip |
|
Thanks very much Armando! |
|
| Wednesday, February 20, 2013 - 10:19:32 AM - Cathy | Read The Tip |
|
This is an excellent and very understandable write-up. Thank you, Seth! |
|
| Thursday, February 21, 2013 - 11:39:19 AM - Bob | Read The Tip |
|
These kinds of posts are great for those of us who want to break into SQL Server DBA world. I am a DB2 DBA but I don'w work with XML much. Thank you, Seth. |
|
| Monday, February 25, 2013 - 11:46:37 PM - Muhammad Ikram | Read The Tip |
|
Nice informative article. God bless you, thanks for sharing knowledge. |
|
| Friday, March 22, 2013 - 9:36:02 AM - Khurram Cheema | Read The Tip |
|
Excellent, Thanks Seth |
|
| Tuesday, April 23, 2013 - 1:15:20 PM - Calvin Brown | Read The Tip |
|
Thank you Seth. This has been very helpful to me. I'm still searching for the other 2 methods ,nodes() and query() to gain a firm understanding in order to pass 70-433. |
|
| Wednesday, April 24, 2013 - 2:24:33 PM - Jeff Moden | Read The Tip |
|
Nice article Seth. Very well done. Not directed at Seth's article, I'm totally amazed that people would store such grossly denormalized data as XML in a table especially data like that found in AdventureWorks. It's not hierarchical and XML provides absolutely no advantage in this particular case. The data should be in separate columns in the same table. I'd have to shoot a designer if anyone ever proposed using XML for such a thing. I'd even settle for an EAV table instead of using XML in this fashion. |
|
| Tuesday, May 07, 2013 - 10:49:29 AM - Seth Delconte | Read The Tip |
|
@Calvin - thanks! This article shows some examples using both the nodes() and query() methods. Hope it helps! https://www.simple-talk.com/sql/database-administration/ad-hoc-xml-file-querying/ |
|
|
privacy | disclaimer | copyright | advertise | about authors | contribute | feedback | giveaways | user groups Some names and products listed are the registered trademarks of their respective owners. Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com |