By: Tibor Nagy | Last Updated: 2012-08-09 | Comments (13) | Indexing
I was checking the statistics objects in my Microsoft SQL Server database and found some statistics with strange names. Could you please explain what the _WA_Sys... statistics are? Why are they created? Can I safely delete them? Do they differ from the other statistics objects? Check out this tip to learn more.
In the SQL Server universe it vital to keep your statistics up-to-date otherwise you can face performance issues. When you set the AUTO_CREATE_STATISTICS option on at a database level, the Query Optimizer creates statistics on individual columns used in a predicate. These statistics are necessary to generate the query plan. They are created on columns that do not have a histogram in an existing statistics object. The name of the auto-created statistics includes the column name and the object ID in hexadecimal format: _WA_Sys_<column_name>_<XXXX>. These statistics are used by the Query Optimizer to determine the optimal Query Execution Plan.
Please note that the AUTO_CREATE_STATISTICS database option generates only single-column statistics for the full table and it does not generate filtered statistics.
How can you check if the Query Optimizer created such single-column statistics for a table? Just run the following query from SQL Server Management Studio:
SELECT stat.name AS 'Statistics',
OBJECT_NAME(stat.object_id) AS 'Object',
COL_NAME(scol.object_id, scol.column_id) AS 'Column'
FROM sys.stats AS stat (NOLOCK) Join sys.stats_columns AS scol (NOLOCK)
ON stat.stats_id = scol.stats_id AND stat.object_id = scol.object_id
INNER JOIN sys.tables AS tab (NOLOCK) on tab.object_id = stat.object_id
WHERE stat.name like '_WA%'
ORDER BY stat.name
The query will return the statistics name, the table name and the column name for all the statistics that have been added automatically by the Query Optimizer. This information can be useful to decide if you should add indexes to these columns or not.
Removing _WA_Sys... statistics in SQL Server
The _WA_Sys stats are created when the statistics are missing. You can drop these statistics, but it is worth having this statistical information in your database. If you drop them then they will be automatically created during the next query execution involving that column. This additional step negatively impacts the query performance.
I recommend converting the _WA_Sys stats to a statistics object with a more reasonable name. This can be done by dropping the auto-created statistics and running sp_createstats to create the statistics with a more user-friendly name:
EXEC sp_createstats @indexonly = 'NO', @fullscan = 'FULLSCAN', @norecompute ='NO'
After executing this command, the SQL Server will create all the missing single-column statistics for your database.
Comparison of _WA_Sys... statistics with the regular statistics objects
The Query Optimizer computes a histogram which is not 100% accurate for the _WA_Sys statistics. With the statistics, the histogram is used to describe the distribution of the values between certain ranges. It is usually represented by rectangles on a graph where the x-axis shows the ranges and the y-axis (the height of the rectangles) shows the number of data elements within the range. In SQL Server it measures the frequency of the occurrence for the distinct values in the given column. The statistical information used for the histogram is the following:
- RANGE_HI_KEY: the upper boundary for a histogram step
- RANGE_ROWS: number of rows falling within the histogram step (smaller then the actual upper boundary but greater than the previous step boundary)
- EQ_ROWS: number of rows which equal to the upper boundary
- DISTINCT_RANGE_ROWS: number of distinct values in the histogram step
- AVG_RANGE_ROWS: average number of rows with duplicate values in the histogram step
The histogram for a statistics object can be retrieved by the following query:
DBCC SHOW_STATISTICS (table_name, statistics_name) WITH HISTOGRAM
If you explicitly create statistics for an index with the FULLSCAN option then the Query Optimizer scans all the rows and returns exact statistical data. You can see this on the histogram of the statistics:
Let's check what happens if we drop the statistics and run a query to automatically create the statistics as shown above. In my example, for the _WA_Sys... statistics the histogram includes estimated values because the Query Optimizer does not sample all the rows. As such, you can see that not all the values are whole numbers:
- Check out these related tips to learn more about Statistics and Indexing:
- Read more tips by the author here.
Last Updated: 2012-08-09
About the author
View all my tips