Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Performance Advantages of SQL Server Filtered Statistics


By:   |   Read Comments (3)   |   Related Tips: More > Performance Tuning

New Whitepaper > 10 Ways We Can Steal Your Data >> download now


Problem

When running queries SQL Server may not always use the correct statistics to determine the best optimization for a query plan. This could be a performance hit for complex queries and have correct statistics is key to the optimizer. In this tip we look at how filtered statistics can improve query optimization.

Solution

One of the new features that was introduced with SQL 2008 is called Filtered Statistics. This is the definition from BOL "Filtered statistics can improve query performance for queries that select from well-defined subsets of data. Filtered statistics use a filter predicate in the WHERE clause to select the subset of data that is included in the statistics."

Let us show how we can use filtered statistics with an example. The below code creates a table and inserts some data into the table. The code for this tip was adopted with permission from an article by Jack Li.

CREATE TABLE MyRegionTable(id INT, Location NVARCHAR(100), USState CHAR(2))
GO
CREATE TABLE MySalesTable(id INT, detail INT, quantity INT)
GO
CREATE CLUSTERED INDEX IDX_d1 ON MyRegionTable(id)
GO
CREATE INDEX IDX_MyRegionTable_name ON MyRegionTable(Location)
GO
CREATE STATISTICS IDX_MyRegionTable_id_name ON MyRegionTable(id, Location)
GO
CREATE CLUSTERED INDEX IDX_MySalesTable_id_detail ON MySalesTable(id, detail)
GO
INSERT MyRegionTable VALUES(0, 'Atlanta', 'GA')
INSERT MyRegionTable VALUES(1, 'San Francisco', 'CA')
GO
SET NOCOUNT ON
-- MySalesTable will contain 1 row for Atlanta and 1000 rows for San Francisco
INSERT MySalesTable VALUES(0, 0, 50)
DECLARE @i INT
SET
@i = 1
WHILE @i <= 1000 BEGIN
INSERT
MySalesTable VALUES (1, @i, @i*3)
SET @i = @i + 1
END
GO
UPDATE STATISTICS MyRegionTable WITH fullscan
UPDATE STATISTICS MySalesTable WITH fullscan
GO

Before we run the following queries turn on "Include Actual Execution Plan" and run the following query for Atlanta.

-- this query will over estimate
-- its estimate will be 500.5 rows, but should be 1 row
SELECT detail FROM MyRegionTable JOIN MySalesTable ON MyRegionTable.id = MySalesTable.id
WHERE
Location='Atlanta' OPTION (recompile)

Even if we have only 1 row containing the Atlanta location, the optimizer over estimates and uses a value of 500.5 as shown below.

using the filtered statistics feature in sql server 2008

Now, let's run the following query for San Francisco.

-- this query will under estimate
-- its estimate is also 500.5 rows, but should be 1000 rows
SELECT detail FROM MyRegionTable JOIN MySalesTable ON MyRegionTable.id = MySalesTable.id
WHERE Location='San Francisco' OPTION (recompile)

In this case, the optimizer, under estimates the number of rows. It is interesting to note that we only have two values for the locations and 1001 rows in total and in both queries the optimizer estimate is 500.5 rows.

turn on include actual execution plan

Let's create our filtered statistics using the below code.

CREATE STATISTICS MyRegionTable_stats_id ON MyRegionTable (id)
WHERE Location = 'Atlanta'
GO
CREATE STATISTICS MyRegionTable_stats_id2 ON MyRegionTable (id)
WHERE Location = 'San Francisco'
GO

The image below shows that the filtered statistics have been created.


filtered statistics have been created

Let's run the same queries one more time.

SELECT detail FROM MyRegionTable JOIN MySalesTable ON MyRegionTable.id = MySalesTable.id
WHERE Location='Atlanta' OPTION (recompile)

We can now see that the estimated number of rows returned is now correct.

now the estimated number of rows is correct

Let's run the same query again for San Franciso.

SELECT detail FROM MyRegionTable JOIN MySalesTable ON MyRegionTable.id = MySalesTable.id
WHERE Location='San Francisco' OPTION (recompile)

And it is also correct for this query as well.

is also correct in this query for sql server 2008

This example shows how filtered statistics can help when the query plan is generated. By having the correct values SQL Server can make better optimization choices and therefore improve query performance.

Next Steps


Last Update:


signup button

next tip button



About the author
MSSQLTips author Matteo Lorini Matteo Lorini is a DBA and has been working in IT since 1993. He specializes in SQL Server and also has knowledge of MySQL.

View all my tips
Related Resources





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Tuesday, August 28, 2012 - 2:59:15 AM - Uday Gupta Back To Top

i have a table with 3.5 CR records with one auto/identity field. when I run

select max(fofeedid),count(*) from fofeed

Then it take around 5-10 mints in between.

please suggest me how to improve its performance. and also suggest buld insert of daily 30K records.

Thanks in advances.


Friday, June 24, 2011 - 1:45:19 PM - Bob Evans Back To Top

Are you able to elaborate on the performance impact on an OLTP based site?Reading the information on: http://www.sqlserver-dba.com/2011/05/auto-update-statistics-and-update-statistics-in-sql-server.html  , I can see the downside of relying on AutoUpdateStatistics , but as Filteres statistics tries to minimise this problem .


Saturday, April 30, 2011 - 3:10:50 PM - Kevin Back To Top

Very Good. Thanks


Learn more about SQL Server tools