![]() |
|
|
|
By: Matteo Lorini | Read Comments (2) | Related Tips: More > Performance Tuning |
Is there a way to reproduce and diagnose query compilation and plan issues without actually having the data or the entire database?
A while ago we were experiencing query performance issues and upper management decided to contact Microsoft Customer Service and Support (CSS). At the begging I was quite hesitant to contact them, because I knew they would ask for a copy of our database to help troubleshoot the issue and I did not want to do that due to the size of the database and for privacy reasons.
To my surprise, CSS did not ask us to provide a copy of our database, but instead asked us to provide a “Clone Database” or better known as a “Statistics-Only” copy of the database.
What is it Clone Database and how is it used? A clone database is a new database you create that contains the schema and statistics of the original database; it is useful because it provides a way to reproduce and diagnose query compilations and plan issues without actually having the data or the entire database, which makes it much smaller and more portable.
For example, if we have a query that does not perform well and we want to allow Microsoft Support or a third party consultant investigate and troubleshot the query, we can provide a copy of the query plus our “Statistics-Only” copy of the database. The query, when run against the cloned database will still return the same query execution plans as it would if executed against the original database. The advantage is the size of the clone database is smaller the original database, plus the clone-database does not contain any data therefore privacy is maintained.
In the example below, I have created a clone of my test database WF. Please note the presence of the CREATE STATISTICS statements.
Now, if we check the execution plan of the following query against the clone database:
select a.* from dbo.tab7 a join dbo.tab7 b on a.c2=b.c2
We will notice the plan is exactly the same as the database that has the data. Table tab7 is empty on the clone database, however the optimizer estimated 54263.1 rows
Once the clone database has been created it can be used to reproduce and diagnose, in a separate environment, query compilations and performance issues because the optimizer will have the following information available to select a query plan.
The following table of values has been copied from http://support.microsoft.com/kb/914288 for your convenience.
|
Constraints |
The query optimizer frequently uses constraints to detect contradictions between the query and the underlying schema. For example, if the query has a "WHERE col = 5" clause and a "CHECK (col < 5)" check constraint exists, the query optimizer knows that no rows will match. |
|
Statistics |
The statistics information contains density and a histogram that shows the distribution of the leading column of the index and statistics key. Depending on the nature of the predicate, the query optimizer may use density, the histogram, or both to estimate the cardinality of a predicate. Up-to-date statistics are required for accurate cardinality estimates. The cardinality estimates are used as an input in estimating the cost of an operator. Therefore, you must have good cardinality estimates to obtain optimal query plans. |
|
Table size (number of rows and pages) |
The query optimizer uses the histograms and density to calculate the probability that a given predicate is true or false. The final cardinality estimate is calculated by multiplying the probability by the number of rows that are returned by the child operator. The number of pages in the table or the index is a factor in estimating the IO cost. The table size is used to calculate the cost of a scan, and it is useful when you estimate the number of pages that will be accessed during an index seek. |
|
Database options |
Several database options can affect optimization. The AUTO_CREATE_STATISTICS and AUTO_UPDATE_STATISTICS options affect whether the query optimizer will create new statistics or update statistics that are out of date. The PARAMETERIZATION level affects how the input query is parameterized before the input query is handed to the query optimizer. Parameterization can affect cardinality estimation and can also prevent matching against indexed views and other types of optimizations. The DATE_CORRELATION_OPTIMIZATION setting causes the optimizer to search for correlations between columns. This setting affects cardinality and cost estimation. |
|
Session SET options |
The ANSI_NULLS setting affects whether the "NULL = NULL" expression evaluates as true. Cardinality estimation for outer joins may change depending on the current setting. Additionally, ambiguous expressions may also change. For example, the "col = NULL" expression evaluates differently based on the setting. However, the "col IS NULL" expression always evaluates the same way. |
|
Hardware resources |
The cost for sort and hash operators depends on the relative amount of memory that is available to SQL Server. For example, if the size of the data is larger than the cache, the query optimizer knows that the data must always be spooled to disk. However, if the size of the data is much smaller than the cache, the operation is likely to be done in memory. SQL Server also considers different optimizations if the server has more than one processor and if parallelism has not been disabled by using a "MAXDOP" hint or the max degree of parallelism configuration option. |
|
SQL Server Stock Keeping Unit (SKU) |
Certain features are only enabled on specific editions of SQL Server 2005 or of SQL Server 2008. For example, matching a query against indexed views only occurs in SQL Server 2005 Enterprise Edition. Similarly, matching against plan guides is limited to SQL Server 2000 Standard Edition and SQL Server 2005 Enterprise Edition. |
| Wednesday, May 05, 2010 - 7:27:51 PM - bitbucket | Read The Tip |
|
This is quite suprising since a blog posting on this very same subject / method / instructions was published by Karen Delaney at: http://sqlblog.com/blogs/kalen_delaney/archive/2007/11/21/cloning-in-sql-server-2005.aspx And was presented as a Question Of the Day on SQLServerCentral.com on June 25, 2008 at: http://www.sqlservercentral.com/questions/Administration/62936/
|
|
| Tuesday, March 13, 2012 - 1:08:44 PM - Mike Casey | Read The Tip |
|
"the clone-database does not contain any data therefore privacy is maintained" Actually, as histograms for text columns contain sampled text values, privacy is not necessarily maintained, depending upon what is in the sampled text, eg a telephone number or name. A DBCC SHOW_STATISTICS command would reveal whatever was sampled. |
|
|
privacy | disclaimer | copyright | advertise | about authors | contribute | feedback | giveaways | user groups Some names and products listed are the registered trademarks of their respective owners. Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com |