Importance of Update Statistics in SQL Server


By:   |   Updated: 2021-02-03   |   Comments   |   Related: More > Indexing


Problem

SQL Server query performance is something all database professionals are looking to improve. Many SQL Server DBAs and developers focus on ensuring that tables are correctly indexed to make sure that queries perform as fast as possible and while that is very important many often overlook the importance of statistics and how they can affect query optimization, cardinality and performance issues. In this tutorial we will try to answer the what, why, where, when of SQL Server Statistics.

Solution

All examples below are based on the AdventureWorks2017 database using Microsoft SQL Server 2019 (15.0.2070.41).

UPDATE STATISTICS in SQL Server

Statistics store information about the data distribution of the column value(s) in your tables as well as the total number of rows. For indexes it stores the distribution of key values. A histogram is created for the first table column in each statistics object which stores this data distribution or cardinality. For multi-column statistics objects information is also stored on the correlation of the values within the object. A density vector is also created for each column and this stores information about the number of duplicate values for each column.

How are SQL UPDATE STATISTICS used

Existing statistics are used by the query optimizer in order to generate the most efficient query plan for execution. The query optimizer uses the statistics to determine when an index should be used, how to access those indexes, how best to join tables, etc...

When are SQL Server Statistics Created

Statistics are created in a couple of different ways. The first way is when an index is created on a tables column(s). When this happens the Microsoft SQL Server database automatically creates a statistics object for the column(s) that make up the index. If we look at the HumanResources.Department table we can see there is one statistics object for each index on the table.

database objects

The next way is also done automatically and this happens if you have the "AUTO CREATE STATISTICS" database option set. This can be checked by running the following TSQL.

SELECT name,is_auto_create_stats_on FROM sys.databases WHERE name = 'AdventureWorks2017';  
query results

If it is not set it can be enabled by running the following TSQL but it should be enabled by default.

ALTER DATABASE AdventureWorks2017 SET AUTO_CREATE_STATISTICS ON;

When this database option is enabled, anytime a column is used as a query predicate the Microsoft SQL Server engine will automatically generate statistics for the referenced column. You can see from the screenshot below that there are no statistics other than the ones for each index that exists on the HumanResources.Department table. If we run the following query and then check again, we can see that SQL Server has created column statistics for the ModifiedDate column.

SELECT * FROM HumanResources.Department WHERE ModifiedDate > getdate()-3;
sql statistics

It is important to note that SQL Server will only ever create single column statistics and if you require anything other than single column statistics these have to be created manually using the CREATE STATISTICS command. This is the last way that statistics are created. Here is a simple example of how you can manually create statistics for the JobTitle column in the HumanResources.Department table.

CREATE STATISTICS STATS_Employee_JobTitle on HumanResources.Employee (JobTitle);
table statistics

The same syntax is used when creating multi-column statistics and you simply have to list all the correlated columns you want in the column list as shown below.

CREATE STATISTICS STATS_SalesTerritory_Group_CountryRegionCode on 
     Sales.SalesTerritory ([Group],CountryRegionCode);
table statistics

You can also create filtered statistics if your data set has a subset of rows with a unique data distribution. This can be done by specifying a WHERE clause in your command as shown below.

CREATE STATISTICS STATS_Address_StateProvinceIDFiltered_City  
   ON Person.Address ( City ) WHERE StateProvinceID = 79;	
table statistics

Options for manually created SQL Server Statistics

When you manually create statistics you also have the option of how much data should be sampled. Sampling can have a big impact on the amount of time it takes to generate statistics for very large tables. The obvious drawback of this though is that the less data you sample the less accurate your statistics become. For large tables though, it is sometimes the best option as a full scan would take too much time. Examples of the TSQL for both of these options is below as well as the two options available when sampling data. These options include specifying the number of rows or a percentage of the current table size to sample.

CREATE STATISTICS STATS_Employee_JobTitle_fullscan 
  on HumanResources.Employee (JobTitle) WITH FULLSCAN; 
CREATE STATISTICS STATS_Employee_JobTitle_sample10rows 
  on HumanResources.Employee (JobTitle) WITH SAMPLE 10 ROWS; 
CREATE STATISTICS STATS_Employee_JobTitle_sample50pct 
  on HumanResources.Employee (JobTitle) WITH SAMPLE 50 PERCENT;

Where are SQL Server Statistics stored

Statistics in SQL Server are stored in binary large objects (BLOBs) and can be accessed using the following system catalog views and DMVs.

  • sys.stats
  • sys.stats_columns
  • sys.dm_db_stats_properties
  • sys.dm_db_stats_histogram

A much easier method for view object statistics, however, is to use the DBCC SHOW_STATISTICS command. Below are two examples for looking at statistics created first for an index and second for a column on the Sales.SalesOrderDetail table.

DBCC SHOW_STATISTICS ("Sales.SalesOrderHeader",IX_SalesOrderHeader_CustomerID) 
DBCC SHOW_STATISTICS ("Sales.SalesOrderHeader",OrderDate)

Explaining DBCC SHOW_STATISTICS output

Using the output from one of the examples from above we can see that the output is divided into 3 sections. The Statistics Header, Density Vector, and Histogram. Note: For this histogram I have only shown the first 5 buckets to save space. I won’t go into any more detail on each individual section as there is a very good tip which provides this information. How to Interpret DBCC SHOW_STATISTICS in SQL Server Management Studio.

dbcc show statistics

When are SQL Server Statistics Updated

Just as there are two ways that statistics can be created, there are also a couple ways they can be updated, manually and automatically. To update statistics manually we can use the UPDATE STATISTICS command or the built-in stored procedure sp_updatestats. sp_updatestats will update the statistics for every user defined table in the database it is run against. The UPDATE STATISTICS command gives you the ability to use the same sampling options that were outlined above with the CREATE STATISTICS command. Here are a few examples of these commands that can be run in SQL Server Management Studio.

sp_updatestats
UPDATE STATISTICS HumanResources.Employee STATS_Employee_JobTitle_fullscan;  
UPDATE STATISTICS HumanResources.Employee STATS_Employee_JobTitle_sample10rows WITH FULLSCAN;  
UPDATE STATISTICS Person.Address;

Note: When statistics are updated, queries in the plan cache referencing these objects will recompile the next time they are called so you do have to be careful not to update statistics too often as the performance impact of frequently recompiling queries can hurt your overall system performance.

The second way statistics can be updated is automatically when the "AUTO UPDATE STATISTICS" database option is set. If this option is set SQL Server will check how many rows have been modified in the table and if a certain threshold is exceeded it will update the statistics and create a new execution plan for the plan cache. The rules for this threshold can be found in the SQL Server documentation and are as follows:

sql server statistics infomation

By default, with this option on, the statistics are updated synchronously. That means that when SQL Server detects out of date statistics it will first update the statistics, generate a new execution plan, and then execute the query. In some cases, you don’t really want your query waiting for the new statistics to be generated as the time it would take for the statistics to be generated and the query to execute with the new plan would take longer than just running it with the old query plan. To avoid this behavior, you can set the "AUTO UPDATE STATISTICS ASYNCHRONOUSLY" database option by running the following command in your database.

ALTER DATABASE AdventureWorks SET AUTO_UPDATE_STATISTICS_ASYNC ON

With this option set your query will run with the currently cached query plan and the statistics will be updated in the background. Once the statistics update is complete the next query executed will generate and use a new query plan using the updated statistics.

It is also possible to update statistics with SQL Server Maintenance Plans.

Are SQL Server Statistics out of date

Determining whether statistics are out of date is a really difficult question to answer. There are many different factors that can affect table, index or column statistics and cause them to generate poor performing query plans. You could have a table with many inserts, updates or deletes but depending on which values were affected it might not have much if any impact on the statistics.

For example, let’s take a table with an index on an identity column with say 10,000 rows that has statistics generated using a full scan. Now we could delete 1000 rows from the table (every 5th value, 5,10,15,…) You can see how with this example even though we updated 20% of the table based on the values we have removed; the table statistics really wouldn’t be affected. The opposite could also be true. If we have a table with an identity column that has 1,000,000 rows and has had a statistics update done and then we insert 10,000 new rows we have only changed the table by 1%. If queries are only looking for new values (e.g. values not in the histogram) it could have a big impact on query plans and performance.

I guess to answer the question, how to tell if statistics are out of date, the best answer one can give is you need to understand how DML queries access your tables and base your decision on that information. With that said we can use the sys.dm_db_stats_properties DMV view to see when our statistics were last updated and how many modifications have been made since that last update. Below is the TSQL for this query as well as sample output.

SELECT name AS StatsName, last_updated, rows, modification_counter   
FROM sys.stats AS stat   
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp  
OORDER BY modification_counter desc 
query results

SQL Server Update Statistics Script

As mentioned earlier in this tip you can use the sp_updatestats stored procedure to update the statistics for all the tables and indexes in your database but this brute force method really isn’t a good use of the resources on your system as there are many tables/indexes that do not have any DML executed against them and do not need their statistics updated. A better option is to use a query similar to the one we identified above when searching for outdated statistics and add some logic to only update the statistics when you deem it’s required. In the example below we add a cursor to find statistics that have either not been updated in a week (either automatically or by this procedure) or have had more than 10% of the table modified.

DECLARE @cmd NVARCHAR(2000)
DECLARE stats_cursor CURSOR FOR 
SELECT 'UPDATE STATISTICS ' + OBJECT_SCHEMA_NAME(stat.object_id) + '.' + OBJECT_NAME(stat.object_id) + ' ' + name + ';' AS command
FROM sys.stats AS stat    
CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp   
WHERE last_updated < GetDate()-7 OR modification_counter > rows*.10
 
OPEN stats_cursor  
FETCH NEXT FROM stats_cursor INTO @cmd
 
WHILE @@FETCH_STATUS = 0  
BEGIN  
  EXEC sp_executesql @cmd
  FETCH NEXT FROM stats_cursor INTO @cmd 
END  
CLOSE stats_cursor  
DEALLOCATE stats_cursor

Using either of the above methods, combined with sp_msforeachdb to run these commands on all the databases in your instance,, you can create a SQL Server job that can be used to build yourself a scheduled statistics maintenance task to ensure statistics are always up to date.

Next Steps


Last Updated: 2021-02-03




About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

View all my tips
Related Resources



Comments For This Article





download





Recommended Reading

How to get index usage information in SQL Server

Building SQL Server Indexes in Ascending vs Descending Order

Script out all SQL Server Indexes in a Database using T-SQL

Difference between SQL Server Unique Indexes and Unique Constraints

Creating Indexes with SQL Server Management Studio














get free sql tips
agree to terms