Importance of Update Statistics in SQL Server
SQL Server query performance is something all database professionals are looking to improve. Many SQL Server DBAs and developers focus on ensuring that tables are correctly indexed to make sure that queries perform as fast as possible and while that is very important many often overlook the importance of statistics and how they can affect query optimization, cardinality and performance issues. In this tutorial we will try to answer the what, why, where, when of SQL Server Statistics.
All examples below are based on the AdventureWorks2017 database using Microsoft SQL Server 2019 (15.0.2070.41).
UPDATE STATISTICS in SQL Server
Statistics store information about the data distribution of the column value(s) in your tables as well as the total number of rows. For indexes it stores the distribution of key values. A histogram is created for the first table column in each statistics object which stores this data distribution or cardinality. For multi-column statistics objects information is also stored on the correlation of the values within the object. A density vector is also created for each column and this stores information about the number of duplicate values for each column.
How are SQL UPDATE STATISTICS used
Existing statistics are used by the query optimizer in order to generate the most efficient query plan for execution. The query optimizer uses the statistics to determine when an index should be used, how to access those indexes, how best to join tables, etc...
When are SQL Server Statistics Created
Statistics are created in a couple of different ways. The first way is when an index is created on a tables column(s). When this happens the Microsoft SQL Server database automatically creates a statistics object for the column(s) that make up the index. If we look at the HumanResources.Department table we can see there is one statistics object for each index on the table.
The next way is also done automatically and this happens if you have the "AUTO CREATE STATISTICS" database option set. This can be checked by running the following TSQL.
SELECT name,is_auto_create_stats_on FROM sys.databases WHERE name = 'AdventureWorks2017';
If it is not set it can be enabled by running the following TSQL but it should be enabled by default.
ALTER DATABASE AdventureWorks2017 SET AUTO_CREATE_STATISTICS ON;
When this database option is enabled, anytime a column is used as a query predicate the Microsoft SQL Server engine will automatically generate statistics for the referenced column. You can see from the screenshot below that there are no statistics other than the ones for each index that exists on the HumanResources.Department table. If we run the following query and then check again, we can see that SQL Server has created column statistics for the ModifiedDate column.
SELECT * FROM HumanResources.Department WHERE ModifiedDate > getdate()-3;
It is important to note that SQL Server will only ever create single column statistics and if you require anything other than single column statistics these have to be created manually using the CREATE STATISTICS command. This is the last way that statistics are created. Here is a simple example of how you can manually create statistics for the JobTitle column in the HumanResources.Department table.
CREATE STATISTICS STATS_Employee_JobTitle on HumanResources.Employee (JobTitle);
The same syntax is used when creating multi-column statistics and you simply have to list all the correlated columns you want in the column list as shown below.
CREATE STATISTICS STATS_SalesTerritory_Group_CountryRegionCode on Sales.SalesTerritory ([Group],CountryRegionCode);
You can also create filtered statistics if your data set has a subset of rows with a unique data distribution. This can be done by specifying a WHERE clause in your command as shown below.
CREATE STATISTICS STATS_Address_StateProvinceIDFiltered_City ON Person.Address ( City ) WHERE StateProvinceID = 79;
Options for manually created SQL Server Statistics
When you manually create statistics you also have the option of how much data should be sampled. Sampling can have a big impact on the amount of time it takes to generate statistics for very large tables. The obvious drawback of this though is that the less data you sample the less accurate your statistics become. For large tables though, it is sometimes the best option as a full scan would take too much time. Examples of the TSQL for both of these options is below as well as the two options available when sampling data. These options include specifying the number of rows or a percentage of the current table size to sample.
CREATE STATISTICS STATS_Employee_JobTitle_fullscan on HumanResources.Employee (JobTitle) WITH FULLSCAN; CREATE STATISTICS STATS_Employee_JobTitle_sample10rows on HumanResources.Employee (JobTitle) WITH SAMPLE 10 ROWS; CREATE STATISTICS STATS_Employee_JobTitle_sample50pct on HumanResources.Employee (JobTitle) WITH SAMPLE 50 PERCENT;
Where are SQL Server Statistics stored
Statistics in SQL Server are stored in binary large objects (BLOBs) and can be accessed using the following system catalog views and DMVs.
A much easier method for view object statistics, however, is to use the DBCC SHOW_STATISTICS command. Below are two examples for looking at statistics created first for an index and second for a column on the Sales.SalesOrderDetail table.
DBCC SHOW_STATISTICS ("Sales.SalesOrderHeader",IX_SalesOrderHeader_CustomerID) DBCC SHOW_STATISTICS ("Sales.SalesOrderHeader",OrderDate)
Explaining DBCC SHOW_STATISTICS output
Using the output from one of the examples from above we can see that the output is divided into 3 sections. The Statistics Header, Density Vector, and Histogram. Note: For this histogram I have only shown the first 5 buckets to save space. I won’t go into any more detail on each individual section as there is a very good tip which provides this information. How to Interpret DBCC SHOW_STATISTICS in SQL Server Management Studio.
When are SQL Server Statistics Updated
Just as there are two ways that statistics can be created, there are also a couple ways they can be updated, manually and automatically. To update statistics manually we can use the UPDATE STATISTICS command or the built-in stored procedure sp_updatestats. sp_updatestats will update the statistics for every user defined table in the database it is run against. The UPDATE STATISTICS command gives you the ability to use the same sampling options that were outlined above with the CREATE STATISTICS command. Here are a few examples of these commands that can be run in SQL Server Management Studio.
sp_updatestats UPDATE STATISTICS HumanResources.Employee STATS_Employee_JobTitle_fullscan; UPDATE STATISTICS HumanResources.Employee STATS_Employee_JobTitle_sample10rows WITH FULLSCAN; UPDATE STATISTICS Person.Address;
Note: When statistics are updated, queries in the plan cache referencing these objects will recompile the next time they are called so you do have to be careful not to update statistics too often as the performance impact of frequently recompiling queries can hurt your overall system performance.
The second way statistics can be updated is automatically when the "AUTO UPDATE STATISTICS" database option is set. If this option is set SQL Server will check how many rows have been modified in the table and if a certain threshold is exceeded it will update the statistics and create a new execution plan for the plan cache. The rules for this threshold can be found in the SQL Server documentation and are as follows:
By default, with this option on, the statistics are updated synchronously. That means that when SQL Server detects out of date statistics it will first update the statistics, generate a new execution plan, and then execute the query. In some cases, you don’t really want your query waiting for the new statistics to be generated as the time it would take for the statistics to be generated and the query to execute with the new plan would take longer than just running it with the old query plan. To avoid this behavior, you can set the "AUTO UPDATE STATISTICS ASYNCHRONOUSLY" database option by running the following command in your database.
ALTER DATABASE AdventureWorks SET AUTO_UPDATE_STATISTICS_ASYNC ON
With this option set your query will run with the currently cached query plan and the statistics will be updated in the background. Once the statistics update is complete the next query executed will generate and use a new query plan using the updated statistics.
It is also possible to update statistics with SQL Server Maintenance Plans.
Are SQL Server Statistics out of date
Determining whether statistics are out of date is a really difficult question to answer. There are many different factors that can affect table, index or column statistics and cause them to generate poor performing query plans. You could have a table with many inserts, updates or deletes but depending on which values were affected it might not have much if any impact on the statistics.
For example, let’s take a table with an index on an identity column with say 10,000 rows that has statistics generated using a full scan. Now we could delete 1000 rows from the table (every 5th value, 5,10,15,…) You can see how with this example even though we updated 20% of the table based on the values we have removed; the table statistics really wouldn’t be affected. The opposite could also be true. If we have a table with an identity column that has 1,000,000 rows and has had a statistics update done and then we insert 10,000 new rows we have only changed the table by 1%. If queries are only looking for new values (e.g. values not in the histogram) it could have a big impact on query plans and performance.
I guess to answer the question, how to tell if statistics are out of date, the best answer one can give is you need to understand how DML queries access your tables and base your decision on that information. With that said we can use the sys.dm_db_stats_properties DMV view to see when our statistics were last updated and how many modifications have been made since that last update. Below is the TSQL for this query as well as sample output.
SELECT name AS StatsName, last_updated, rows, modification_counter FROM sys.stats AS stat CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp OORDER BY modification_counter desc
SQL Server Update Statistics Script
As mentioned earlier in this tip you can use the sp_updatestats stored procedure to update the statistics for all the tables and indexes in your database but this brute force method really isn’t a good use of the resources on your system as there are many tables/indexes that do not have any DML executed against them and do not need their statistics updated. A better option is to use a query similar to the one we identified above when searching for outdated statistics and add some logic to only update the statistics when you deem it’s required. In the example below we add a cursor to find statistics that have either not been updated in a week (either automatically or by this procedure) or have had more than 10% of the table modified.
DECLARE @cmd NVARCHAR(2000) DECLARE stats_cursor CURSOR FOR SELECT 'UPDATE STATISTICS ' + OBJECT_SCHEMA_NAME(stat.object_id) + '.' + OBJECT_NAME(stat.object_id) + ' ' + name + ';' AS command FROM sys.stats AS stat CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp WHERE last_updated < GetDate()-7 OR modification_counter > rows*.10 OPEN stats_cursor FETCH NEXT FROM stats_cursor INTO @cmd WHILE @@FETCH_STATUS = 0 BEGIN EXEC sp_executesql @cmd FETCH NEXT FROM stats_cursor INTO @cmd END CLOSE stats_cursor DEALLOCATE stats_cursor
Using either of the above methods, combined with sp_msforeachdb to run these commands on all the databases in your instance,, you can create a SQL Server job that can be used to build yourself a scheduled statistics maintenance task to ensure statistics are always up to date.
- Read more on SQL Server Query Plan Tips
- Read more tips on statistics
About the author
View all my tips
Article Last Updated: 2021-02-03