Minimize SQL Server plan cache bloat

By:   |   Comments (7)   |   Related: > Query Plans


Problem

In some shops developers are quite inconsistent in how they write their T-SQL. Formatting styles differ, from whitespace and indenting, to upper- or lower-case keywords, to the use of schema and properly cased object and column names. When dealing with ad hoc SQL, since entries in the plan cache are case- and whitespace-sensitive, these inconsistencies can be quite problematic, leading to inefficient use of the plan cache (sometimes referred to as "plan cache bloat").  Check out this tip to minimize the SQL Server plan cache bloat.

Solution

There are a few things that you can do to minimize plan cache bloat. But first let's show a few examples where slight differences in queries can lead to multiple versions of the plan for what is essentially the exact same query. First let's set up a database with a couple of user objects, and a login with a default schema other than dbo:

USE [master];
GO
CREATE LOGIN login_a WITH password = 'a', CHECK_POLICY = OFF;
GO
CREATE DATABASE CacheTest;
GO
USE CacheTest;
GO
CREATE SCHEMA schema_a AUTHORIZATION dbo;
GO
CREATE USER user_a FROM LOGIN login_a WITH DEFAULT_SCHEMA = schema_a;
GO
GRANT SELECT ON SCHEMA::dbo TO user_a;
GO
SELECT * INTO dbo.columns FROM sys.all_columns;
ALTER TABLE dbo.columns ADD CONSTRAINT PK_c PRIMARY KEY([object_id], [column_id]);
SELECT * INTO dbo.objects FROM sys.all_objects;
ALTER TABLE dbo.objects ADD CONSTRAINT PK_o PRIMARY KEY([object_id]);
GO

Now, let's run a few queries against our user objects. We'll change up the spacing and case, and make sure to leave the schema prefix off for one query. Note that I am using comments and batch separators to clarify how these queries are different and to ensure that they are considered separately by the engine:

-- standard join query with 2-space indents:
GO
SELECT o.name
  FROM dbo.columnlist AS c
  INNER JOIN dbo.objectlist AS o
  ON c.[object_id] = o.[object_id];
GO
-- this one is identical but leaves out schema prefix:
GO
SELECT o.name
  FROM columnlist AS c
  INNER JOIN objectlist AS o
  ON c.[object_id] = o.[object_id];
GO
-- this one uses tabs instead of two-space indents:
GO
SELECT o.name
 FROM dbo.columnlist AS c
 INNER JOIN dbo.objectlist AS o
 ON c.[object_id] = o.[object_id];
GO
-- this one uses lower-case for all keywords:
GO
select o.name
  from dbo.columnlist as c
  inner join dbo.objectlist as o
  on c.[object_id] = o.[object_id];
GO
-- and this one uses different case for the object names:
GO
SELECT o.name
  FROM dbo.ColumnList AS c
  INNER JOIN dbo.ObjectList AS o
  ON c.[object_id] = o.[object_id];
GO

Now, if we check the plans that have been collected using sys.dm_exec_cached_plans:

SELECT 
    p.plan_handle,
    p.cacheobjtype,
    bucket = p.bucketid,
    used = p.usecounts,
    b = size_in_bytes,
    t.[text] 
  FROM sys.dm_exec_cached_plans AS p 
  CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
  WHERE t.[dbid] = DB_ID('CacheTest')
  AND t.[text] NOT LIKE '%cached_plans%' -- to prevent this query from showing up
  AND t.[text] LIKE '%list%';

We can see that there are 5 different entries, each stored in their own bucket, and each taking up just under 50kb in the plan cache:

SQL Server sys.dm_exec_cached_plans results

Further to this, if we connect to the database using the login login_a, and run the query without the explicit schema references:

-- run this as login_a, whose user_a has a different default schema:
GO
SELECT o.name
  FROM columnlist AS c
  INNER JOIN objectlist AS o
  ON c.[object_id] = o.[object_id];
GO

Then run our query against dm_exec_cached_plans again, we have another plan that has been cached:

SQL 2008 sys.dm_exec_cached_plans results

On a busy system, this can mean that some of these plans might get pushed out of the cache, and have to be recompiled - even though a perfectly valid plan already exists. For plans that end up only being used once, you can offset this significantly by using the Optimize for Ad Hoc Workloads setting. In this case the plan won't be fully cached until the exact same batch is executed a second time. So testing that option and clearing the buffers/cache:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE WITH OVERRIDE;
EXEC sp_configure 'optimize for ad hoc workloads', 1;
RECONFIGURE WITH OVERRIDE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE WITH OVERRIDE;
DBCC FREEPROCCACHE;
DBCC DROPCLEANBUFFERS;

If you run the first set of five queries again, the output of the query against sys.dm_exec_cached_plans will be slightly different:

SQL Server 2008 sys.dm_exec_cached_plans results

Notice that we no longer have a Compiled Plan but rather a Compiled Plan Stub, and that the size it takes to represent this stub in the plan cache is much smaller. If you run the queries again, the results will be similar to above. But you will still see these different queries yielding different plans.

On the plus side, SQL Server 2008 added the use of a query plan hash. This makes it possible to identify when you have the same query that is being represented by different plans, and should show an opportunity to consolidate those plans by re-writing them at their source (or eliminating almost duplicate queries by creating a view or stored procedure). You can identify different plans with the same query plan hash using the following query:

;WITH h AS
(
  SELECT query_plan_hash 
    FROM sys.dm_exec_query_stats
    GROUP BY query_plan_hash
    HAVING COUNT(*) > 1
)
SELECT t.[text],p.plan_handle
  FROM h INNER JOIN sys.dm_exec_query_stats AS s
  ON s.query_plan_hash = h.query_plan_hash
  INNER JOIN sys.dm_exec_cached_plans AS p
  ON s.plan_handle = p.plan_handle
  CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
  WHERE t.[dbid] = DB_ID('your_db_name');
  

Conclusion

If you have different people contributing T-SQL queries to your codebase, encourage coding practices that will minimize plan cache usage, such as:

  • use consistent indenting, carriage returns and spacing
  • use consistent case for keywords
  • use the correct case for object and column names (let IntelliSense do this work for you)
  • always include the schema when referencing objects
  • centralize code in stored procedures or views to avoid having different versions of the same query
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Thursday, November 1, 2012 - 9:53:31 AM - Emily Back To Top (20172)

Whoops- I skipped over the "Problem" and just read the "Solution"! My bad.. nevermind!


Thursday, November 1, 2012 - 9:50:39 AM - Emily Back To Top (20171)

How does consistent case/indenting/spacing etc help minimize plan cache usage? Are these kinds of differences cached separately even  if the query text is the same? Never thought about that.. good to know!


Tuesday, April 24, 2012 - 6:01:48 PM - Ray Back To Top (17085)

Well, I guess you'd have to ask the developers to know for sure.  My last gig was in support of a development team of about 20 C++ programmers.  In three years I made very little progress trying to get them to avovid the default C++ wizard for constructing their queries.  Obviously it must be good because it came from MS.

I have seen similar issues with applications built using VB.Net/C#.Net and the various data adapter options.

Don't you think it is interesting that MS recommends stored procedures as a Best Practice and then

1. Provides development tools that make using stored procedures at least more difficult

2. Provides almost no on-line tutorial, documantation, user examples of data adapters using stored procedures.


Tuesday, April 24, 2012 - 1:29:57 PM - Aaron Bertrand Back To Top (17080)

Sorry, my comment above seems to be wrapped in a <pre>tag</pre>. Repeating here:

 

You'll still see all the individual plans here but for each row you'll see how many different versions of that plan are creating the same query_plan_hash. This can help identify the *types* of queries that are leading to your plan cache bloat, but I'm afraid with the information I have, I can only repeat my earlier suggestion to try out the 'optimize for ad hoc workloads' setting.


Tuesday, April 24, 2012 - 1:28:52 PM - Aaron Bertrand Back To Top (17079)

@Ray, I would say 'optimize for ad hoc workloads' is about all you can do here, if you find that you have many single-use plans in the cache (and that sounds likely). I have heard of folks trying to implement some kind of interpreter that would intercept vendor queries and "improve" them, but I don't think they got very far. It doesn't sound like you'd be able to control compilation characteristics through the use of stored procedures, which would be nice of course... rhetorical question I know, but why aren't your vendors using stored procedures?

@eric81 it will be difficult to group by plan, since they are likely going to be different, but if there are many queries that are similar, the last query I wrote (where it is grouped by query_plan_hash) could be used to identify the most frequently used plans:

;WITH h AS
(
  SELECT query_plan_hash, c = COUNT(*)
    FROM sys.dm_exec_query_stats
    GROUP BY query_plan_hash
    HAVING COUNT(*) > 1
)
SELECT t.[text],p.plan_handle,h.c
  FROM h INNER JOIN sys.dm_exec_query_stats AS s
  ON s.query_plan_hash = h.query_plan_hash
  INNER JOIN sys.dm_exec_cached_plans AS p
  ON s.plan_handle = p.plan_handle
  CROSS APPLY sys.dm_exec_sql_text(p.plan_handle) AS t
  WHERE t.[dbid] = DB_ID('your_db_name')
ORDER BY h.c DESC;

You'll still see all the individual plans here but for each row you'll see how many different versions of that plan are creating the same query_plan_hash. This can help identify the *types* of queries that are leading to your plan cache bloat, but I'm afraid with the information I have, I can only repeat my earlier suggestion to try out the 'optimize for ad hoc workloads' setting.

Tuesday, April 24, 2012 - 12:04:22 PM - eric81 Back To Top (17076)

I have a developer server with over 30 applications so the plan cache is huge.   I guess is there a way I can group queries by plan ?


Tuesday, April 24, 2012 - 11:46:52 AM - Ray Back To Top (17075)

This is great information with very clear examples and offers a nice approach to finding and correcting the issue for your own code.  My problem is that I have 3 separate vendor applications and for all practical purposes the applications submit lots of almost the same queries.  The column list might be in a different order or have a slightly different set of columns.  Most of the queries look like they were generated by some programming tool data access wizard.

Some of the applications are no longer supported.  For others the vendors are more interested in adding new features than they are in making the code "correct".

So how do I control plan cache bloat in that environment?

 















get free sql tips
agree to terms