Problem
As far as data management and RDBMS rules are concerned, relational databases are the most widely adopted way to maintain data with a reasonable level of concurrency while supporting data management activities such as backups, mass purges, changing data structures, etc.
One problem is the mismatch between the programming languages involved. SQL (Structured Query Language) is a declarative language that became the "data language" used for describing "what I need" and "where to fetch it from" in most organizations. OOP (Object Oriented Programming) languages became the most common practice among developers and are widely adopted by R&D (Research and Development) organizations around the world. So how do we bridge the gap?
Solution
These two trends created a need for a "bridge" to cover the gap by translating requests from object oriented programming languages into SQL. In most cases the term DAL (Data Access Layer) is used to describe the mechanism (mainly proprietary) built to manage this entire "data bridging" task in a centralized way.
Since database vendors (Microsoft, Oracle, IBM, etc.) offer a very large set of proprietary commands in their specific flavor of SQL, the translation in the DAL has to support many options, and the implementation sometimes loses many of the performance optimizations built into the engine. This causes many of these DALs to be implemented in a very straightforward way: break the request down into smaller pieces, translate each to its equivalent SQL, and build the "SELECT... FROM... WHERE..." clause that will, figuratively, do the job.
Over the years I have observed the design and implementation of several DALs, and in my experience a "machine written SQL statement" tends to be a very long text statement. Just the other day the question popped up: "What is the maximum text length of queries we should aim for?", so I checked the Maximum Capacity Specifications for SQL Server 2005 and found that the length of a string containing SQL statements (i.e. batch size), for both 32 and 64 bit systems, is defined as 65,536 * network packet size. The default network packet size is 4,096 bytes, so the SQL statement text is limited to 65,536 * 4,096 bytes = 256 MB.
I suspected that long text queries (even ones well under 256 MB) would create a challenge for the server's CPU, so I decided to test it and publish the results in this tip. Within the scope of this tip we will try to:
Prove that long text queries will consume your CPU
Give some sense of the actual penalty expected on a medium sized server (dual core CPU with 2 GB of RAM and 4 x 10,000 RPM disks)
Test Table Characteristics
For my test I am going to create a table (called t1000) with 200,000 records. This table has many different data types because I think it is a reasonable representation of the many production tables we work with on a regular basis. The characteristics of this table include:
A single integer column as a primary key (clustered by default).
A varchar column.
A char column simulating additional 1 KB of data.
Five integer columns that will be used for creating the long text queries' WHERE clauses.
Script: Create Test Table
create table t1000
(
    c1 int not null constraint test_pk primary key,
    c2 varchar(10) not null,
    c3 char(1000),
    c4 int not null,
    c5 int not null,
    c6 int not null,
    c7 int not null,
    c8 int not null
)
go
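The table then needs its 200,000 records. The population script is not reproduced here, so the following is only a minimal sketch; the generated values are arbitrary, since any distribution will do for this test:

```sql
set nocount on
declare @i int
set @i = 1
while @i <= 200000
begin
    insert into t1000 (c1, c2, c3, c4, c5, c6, c7, c8)
    values (@i,                          -- unique primary key
            'row' + cast(@i as varchar(7)),
            replicate('x', 1000),        -- the 1 KB padding column
            @i % 100, @i % 100, @i % 100, @i % 100, @i % 100)
    set @i = @i + 1
end
go
```

Wrapping the loop in a transaction (commit every few thousand rows) speeds the load up considerably.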
Since I am planning to test some very long queries, I will produce them in an automatic way. My first thought was to just print some long text to the screen and then paste it to a new SQL Server Management Studio's query window. What I found out was that long queries (hundreds of KB) are a bit heavy for the Management Studio (especially when word wrap is on) so I turned to the next best thing - files.
Writing text files can be done in various programming languages but since we deal with SQL Server I'll demonstrate a T-SQL way I learned from Reading and Writing Files in SQL Server using T-SQL - a great article by Phil Factor. I will use the following stored procedure, originally published by Phil here with a few tweaks for my needs.
create PROCEDURE spWriteStringToFile
(
    @String   VARCHAR(max), -- 8000 in SQL Server 2000
    @Path     VARCHAR(255),
    @Filename VARCHAR(100)
)
AS
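The body of the procedure (shown here only as a sketch; see Phil Factor's article for the complete, error-checked version) uses the OLE Automation stored procedures to create a Scripting.FileSystemObject and stream the text out to the file:

```sql
declare @objFileSystem int,
        @objTextStream int,
        @hr int,
        @FileAndPath varchar(360)

set @FileAndPath = @Path + '\' + @Filename

-- create the FileSystemObject and the target file (1 = overwrite if it exists)
exec @hr = sp_OACreate 'Scripting.FileSystemObject', @objFileSystem out
exec @hr = sp_OAMethod @objFileSystem, 'CreateTextFile', @objTextStream out,
                       @FileAndPath, 1

-- write the string and close the stream
exec @hr = sp_OAMethod @objTextStream, 'Write', null, @String
exec @hr = sp_OAMethod @objTextStream, 'Close'

-- release the COM objects
exec sp_OADestroy @objTextStream
exec sp_OADestroy @objFileSystem
```

Checking @hr after each sp_OA* call is omitted here for brevity; the full article shows how to do it properly.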
Since the stored procedure above uses OLE Automation, you will need to enable it on your SQL Server, because this option is turned off by default due to security concerns. It is still fun playing with it :). Turning on Ole Automation is done with the following commands:
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'Ole Automation Procedures', 1
RECONFIGURE WITH OVERRIDE
In case your server does not have Ole Automation enabled, trying to run the above stored procedure will produce the following error:
Msg 15281, Level 16, State 1, Procedure sp_OACreate, Line 1
SQL Server blocked access to procedure 'sys.sp_OACreate' of component 'Ole Automation Procedures' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'Ole Automation Procedures' by using sp_configure. For more information about enabling 'Ole Automation Procedures', see "Surface Area Configuration" in SQL Server Books Online.
What SQL statement should I use for my test?
I want to write a simple query that will return only one row but will have a very long WHERE clause. The following is an example query:
select top 1 c1
from t1000
where c1 > 0
   or (c5 = 1) or (c6 = 2) or (c7 = 3) or (c8 = 4) or (c4 = 5)
   or (c5 = 6) or (c6 = 7) or (c7 = 8) or (c8 = 9) or (c4 = 10)
GO
All that is left is to create a T-SQL block that will create a query similar to the one above:
set nocount on
declare @i as int
declare @sql_stmt as varchar(max)
declare @num_of_ors as int
set @num_of_ors = 5000
set @i = 0
set @sql_stmt = 'select top 1 c1 from t1000 where c1 > 0'
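The block above only seeds the statement. A loop along the following lines completes it, appending OR conditions in the same column pattern (c5, c6, c7, c8, c4) as the example query and handing the result to spWriteStringToFile; this is a sketch, and the C:\temp path and file name are illustrative:

```sql
while @i < @num_of_ors
begin
    -- ((@i + 1) % 5) + 4 cycles through columns c5, c6, c7, c8, c4
    set @sql_stmt = @sql_stmt + ' or (c'
                  + cast(((@i + 1) % 5) + 4 as varchar(1))
                  + ' = ' + cast(@i + 1 as varchar(10)) + ')'
    set @i = @i + 1
end

set @sql_stmt = @sql_stmt + char(13) + char(10) + 'GO'
exec spWriteStringToFile @sql_stmt, 'C:\temp', 'long_query.sql'
```

Varying @num_of_ors produces input files of different sizes for the tests below.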
As I mentioned earlier, loading such long queries can be a bit heavy for Management Studio, so during my testing I ran the queries with the sqlcmd utility (a new and much enhanced version of osql), which is perfect for running queries entered at the command line or read from a file.
Running the test queries with the sqlcmd utility is very easy. For testing purposes I used a command of the form below (server and file names are placeholders) to connect to the SQL Server instance, read from an input file and write the output to a file:

sqlcmd -S <server\instance> -i long_query.sql -o results.txt
SQL Server parse and compile time: CPU time = 83829 ms, elapsed time = 83893 ms
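Timing output like the line above comes from SQL Server's session statistics, which can be switched on at the top of the input file:

```sql
set statistics time on  -- reports parse/compile and execution CPU and elapsed time
set statistics io on    -- reports logical and physical reads per table
go
```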
These results show the following trends:
Running the query required almost no I/O resources.
Most of the time was "parse and compile time".
"parse and compile time" is mostly composed of CPU time.
The CPU time consumed rises as the query's text gets longer.
I have summarized the results in this Excel file; the bottom line is included in the following table and graph:
Size of Query (KB) | CPU Time (ms) | Elapsed Time (ms)
The graphic below plots the size of the query against the CPU time in milliseconds for the tests conducted. Based on this information, I believe we have shown that lengthy SQL Server queries will consume your CPU.
When designing a DAL (Data Access Layer) or using one in your applications, you should take the following items into consideration:
Long SQL statements require a large amount of CPU to process, regardless of the actual number of records being fetched.
DAL architecture should address performance challenges in addition to functionality challenges, including the length of the SQL statements it creates. A good rule of thumb is to make sure that all queries larger than 50 KB are thoroughly tested to verify they do not create a performance issue.
DAL implementers might want to consider including a "fuse" for limiting the size of generated queries.
You might want to get more familiar with the sqlcmd utility, as it can be a key tool for testing, maintenance and general usage.
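As an illustration of the "fuse" idea above, a DAL could route its generated statements through a wrapper along these lines (spExecuteWithFuse and the 50 KB threshold are illustrative names and values, not part of any standard API):

```sql
create procedure spExecuteWithFuse
(
    @sql       nvarchar(max),
    @max_chars int = 51200  -- ~50 KB threshold; tune based on your own testing
)
as
begin
    declare @len int
    set @len = len(@sql)

    -- the fuse: refuse to run statements that exceed the threshold
    if @len > @max_chars
    begin
        raiserror('Statement length %d exceeds the fuse limit of %d characters.',
                  16, 1, @len, @max_chars)
        return
    end

    exec sp_executesql @sql
end
```

Blowing the fuse loudly (rather than silently truncating) makes oversized machine-written queries visible during testing instead of in production.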
Last Update: 5/16/2008
About the author
Tal is a database expert currently working for HP, having held various positions in IT and R&D departments.
Interesting experiment, but I think the effect you're seeing is more a function of the number of connectives you have in the predicate than the actual length of the query string in bytes.
Try rerunning your tests, but instead of increasing the query size by adding "OR"s, make your predicate like this:
SELECT c1 FROM t1000 WHERE c1 = 5555 OR '<ABCD Short String>' = '<WXYZ Short String>'
Then, vary the length of your query by increasing the length of the two strings after the OR, keeping them distinct so that they don't inform the final result.
First, yes, you are correct.
Second, as I wrote in my article, this is a true story. I found myself in a design review meeting when the question popped up: "What is the maximum text length of queries we should aim for?". You can assume that in the practical world, where an automatic DAL (Data Access Layer) writes your queries for you, the lengthy queries are the result of a lot of conditions (poor design) in the criteria clause, and not of one very long literal.
Anyway, just to prove your point, I wrote the following script that creates the test you asked for:
insert into t1001 (c1, c2) values (1, @txt)

set @sql_stmt = 'select c1 from t1001 where c2 = ''' + @txt + ''''
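Pieced together, that test might look like the sketch below; the t1001 table definition, the 100 KB literal size, and the C:\temp path are assumptions, since only fragments of the original script appear above:

```sql
create table t1001 (c1 int not null, c2 varchar(max))
go

declare @txt      varchar(max)
declare @sql_stmt varchar(max)

-- build a long literal (~100 KB): the query grows in bytes,
-- not in the number of predicates
set @txt = replicate(cast('X' as varchar(max)), 100000)

insert into t1001 (c1, c2) values (1, @txt)

set @sql_stmt = 'select c1 from t1001 where c2 = ''' + @txt + ''''
exec spWriteStringToFile @sql_stmt, 'C:\temp', 'long_literal_query.sql'
```

The cast to varchar(max) inside replicate matters; without it the literal would be truncated at 8,000 characters.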