solving sql server problems for millions of dbas and developers since 2006



SQL Server DBA Tips SQL Server Developer Tips SQL Server Business Intelligence Tips SQL Server Career Tips SQL Server Tip Categories SQL Server Tutorials SQL Server Webcasts SQL Server Whitepapers SQL Server Tools SQL Server Questions and Answers MSSQLTips Authors About MSSQLTips SQL Server User Groups MSSLQTips Giveaways MSSQLTips Advertising Options

MSSQLTips Facebook Page MSSQLTips LinkedIn Page MSSQLTips RSS Feed MSSQLTips Twitter Page MSSQLTips Google+ Page








Replacing OPENXML with the XML nodes() Function in SQL Server 2005

By: | Read Comments (1) | Print

Ray is a Principal Architect at RDA Corporation and a MSSQLTips.com BI Expert.

Related Tips: More

Problem
We have a number of instances where we coded stored procedures to handle transforming an input parameter that is an XML document into a rowset.  Prior to SQL Server 2005 we have always used the OPENXML rowset provider to parse the XML and produce a rowset.  What new functionality is there in SQL Server 2005 to accomplish this?

Solution
SQL Server 2005 includes native support for XML; there is now an XML column type with a nodes() function that can be used in lieu of OPENXML to transform an XML document into a rowset.  Let's take a simple example that uses OPENXML and show how to convert it to use the XML column type and the nodes() function in SQL Server 2005.

To keep our example simple, let's assume we need to perform some sort of processing based on a list of Purchase Orders that is specified in an XML document.  The XML document looks like this:

<polist>
  <po><ponumber>100</ponumber><podate>2008-09-10</podate></po>
  <po><ponumber>101</ponumber><podate>2008-09-11</podate></po>
</polist>

We can transform this XML into a rowset using the following OPENXML:

DECLARE @DocHandle int
DECLARE @XmlDocument nvarchar(1000)

SET @XmlDocument = N'<polist>
<po><ponumber>100</ponumber><podate>2008-09-10</podate></po>
<po><ponumber>101</ponumber><podate>2008-09-11</podate></po>
</polist>'

EXEC sp_xml_preparedocument @DocHandle OUTPUT, @XmlDocument

SELECT * FROM OPENXML (@DocHandle, '/polist/po',2)
WITH (ponumber nvarchar(10),
podate datetime)

EXEC sp_xml_removedocument @DocHandle

The following is the rowset from executing the above code snippet:

Here are the main points from the above code snippet:

  • The first step in using OPENXML is to "prepare" the XML document by calling the stored procedure sp_xml_preparedocument which returns an integer that identifies the prepared XML.
  • OPENXML is a rowset provider which means you can use it the same as if it were a table.  It is essentially a function that takes parameters that specify what to extract from the XML document.
  • The final step in using OPENXML is to "free" the prepared XML document by calling the stored procedure sp_xml_removedocument.
  • The above xml stored procedures utilize the common MSXML parser component.

The sp_xml_preparedocument stored procedure stores the prepared XML document in SQL Server's internal cache.  Calling the sp_xml_removedocument stored procedure is necessary in order to remove the prepared XML document from the cache.  According to SQL Server 2005 Books on Line, one-eighth of the total memory available to SQL Server may be used by the MSXML parser.  To avoid using up all of this memory, you should call the sp_xml_removedocument stored procedure as soon as you're done with the XML document.

The SQL Server 2005 version of the above code snippet is as follows:

DECLARE @xml xml
SET @xml = N'<polist>
               <po><ponumber>100</ponumber><podate>2008-09-10</podate></po>
               <po><ponumber>101</ponumber><podate>2008-09-11</podate></po>
             </polist>'
SELECT
 doc.col.value('ponumber[1]', 'nvarchar(10)') ponumber
,doc.col.value('podate[1]', 'datetime') podate 
FROM @xml.nodes('/polist/po') doc(col)

Here are the main points from the above code snippet:

  • We use the XML column type's nodes() function to transform the XML into a rowset.
  • The doc(col) in the nodes() function works like a table and column alias; there is nothing special about the choice of doc and col, i.e. they are not reserved words.
  • The rowset is exactly the same as the one produced above using OPENXML.

A minor variation on the SQL Server 2005 example is shown below; in this case the XML document has attributes instead of elements:

DECLARE @xml xml
SET @xml = N'<polist>
               <po ponumber="100" podate="2008-09-10" />
               <po ponumber="101" podate="2008-09-11" />
             </polist>'
SELECT
 doc.col.value('@ponumber', 'nvarchar(10)') ponumber
,doc.col.value('@podate', 'datetime') podate 
FROM @xml.nodes('/polist/po') doc(col)

You reference XML elements using the subscript notation (e.g. ponumber[1]); you reference XML attributes using the '@ponumber' notation.

As far as performance differences between OPENXML and the XML column type with the nodes() function, there doesn't seem to be a crystal clear answer.  If you scan through the SQL Server XML Forum you will find some threads where OPENXML is faster than the XML column's nodes() function and vice-versa.  As always the best approach may vary depending on your individual circumstances so don't blindly choose the XML column type and the nodes() function over OPENXML.

Next Steps

  • Take a look at the OPENXML topic in Books on Line to dig in to the details of this capability.
  • We have just scratched the surface on the capabilities of the XML column type in SQL Server 2005; take a look at the XML Data Type Methods topic in Books on Line to familiarize yourself with the other functions such as query(), value(), exist() and modify().


Related Tips: More | Become a paid author


Last Update: 10/21/2008

Share: Share 






Comments and Feedback:

Monday, September 20, 2010 - 6:01:06 PM - Steve Read The Tip
We've been using XML nodes() since 2006 (post SQL 2005 deployment), however as our transaction volumes have increased, we've begun to notice exponential SQL slowdowns as the size of the XML increases!  Our environment is rather complex, we operate a  financial services platform over a rather large/distributed environment, relying heavily on XML and Service Broker (which has performed flawlessly over the years.)

After diagnosing significant SQL performance drags, we've begun to revist the OPENXML debate and in our labs have found it to be significantly more efficient.  Keep in mind, hardware/horsepower is plentiful in our environments - large SQL clusters, > 32 procs w/128gb of RAM.  Since I can't post the full XML (it's > 50mb) I'll summarize as much as possible, the before & after.

In our case, we process very large transactional volumes in one data-center, via our GRID Computing architecture, then ship the results (via XML/Service Broker) to another environment for workflow & reporting purposes.  The XML can get very large (1-1000mb) in size, with several thousand nodes.  Initially, we began to break up the jobs into smaller batches, and that helped a bit, but due to the sheer volume, it was still too inefficient with large CPU spikes for a sustained period of time.

The XML structure is provided below, whereas we pass the XML (data) and update statement for dynamic execution for down-stream environments. [condensed XML provided]

 

declare @message_body XML

SET @message_body = '

 

 

 

    XXX

    XXX

    XXX

 

 

    XXX

    Publish Data from XXX to XXX

 

 

 

 

        2010

        2010-09-03T08:32:43.380

        1

        123

        456

         16208

        0

        11.00

        93112533

        2

        5

        1

        910016218

        3

        12

        789

        5

        11534576

        2BEAA4CC-FBBE-4BE8-9D1E-C53949D04E7B

 

 

        2010

        2010-09-03T08:32:43.380

        1

        123

        456

         2002

        0

        11.00

        93112533

        2

        5

        9

        910016218

        3

        12

        789

        5

        11534576

        0B428CD7-4854-4AE0-B502-5F0190130152

 

 

    INSERT INTO [XXX].[XXX].XXX ([fiscalYear], [glDate], [ledgerTypeID], [nationalID], [bu], [object], [subSid], [amount], [batchTypeID], [modID], [subModId], [docTypeID], [docNo], [periodID], [peCodeID], [createdBy], [subLedgerTypeId], [subLedgerValue], [glGUID], [notes]) SELECT T.c.value('fiscalYear[1]','INT'),T.c.value('glDate[1]','DATETIME'),T.c.value('ledgerTypeID[1]','INT'),T.c.value('nationalID[1]','INT'),T.c.value('bu[1]','INT'),T.c.value('object[1]','INT'),T.c.value('subSid[1]','VARCHAR(10)'),T.c.value('amount[1]','DECIMAL(18,2)'),T.c.value('batchTypeID[1]','INT'),T.c.value('modID[1]','INT'),T.c.value('subModId[1]','INT'),T.c.value('docTypeID[1]','INT'),T.c.value('docNo[1]','INT'),T.c.value('periodID[1]','INT'),T.c.value('peCodeID[1]','INT'),T.c.value('createdBy[1]','INT'),T.c.value('subLedgerTypeId[1]','INT'),T.c.value('subLedgerValue[1]','INT'),T.c.value('glGUID[1]','uniqueidentifier'),T.c.value('notes[1]','nvarchar(150)') FROM @xmlSourceData.nodes('/sqlData') T(c); '

 

'

 

 

You can infer the task is to do a bulk insert statement using XML nodes() for parsing the XML data.  Using a controlled dataset of 3000 nodes it would utilize ~20% CPU & take 46minutes to crunch.  Once we moved to an insert statement using OPENXML - < 10 seconds!  Needless to say, our team spent countless hours researching this issue, packing/unpacking and experimenting in our labs.  It goes to QA & UAC next week and we anti***te a significant uptick in performance.

- Hope this might help someone else, should they also need to p*** large/bulk XML.  We felt we were taking a step back in technology, but each circumstance demands unique solutions, and in this case it seemed like the right tool for the job.

 



Post a Comment or Question

Keep it clean and stay on the subject or we may delete your comment.
Your email address is not published. Required fields are marked with an asterisk (*)

*Name   *Email   Notify for updates
Comments
*Enter Code refresh code


 
Sponsor Information
Find and fix SQL Server problems before they happen - SQL diagnostic manager now with predictive analysis!

Time-strapped database professionals use SQL Monitor to look after their servers. Try it online.

What grade do you think your SQL Servers get? Find out with Edgewood's SQL Server Health Check starting at $995.

Free Trial: Get Proactive Insight with Spotlight® for SQL Server Enterprise.

Solving SQL Server problems for millions of DBAs and Devs since 2006. Join now.

The SQL Server Security THREAT - It’s Closer Than You Think


Copyright (c) 2006-2012 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | user groups
Some names and products listed are the registered trademarks of their respective owners.


Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com