Identify SQL Server Database Growth Rates



Problem

Unchecked SQL Server database growth can hurt an organization's bottom line in the form of poor application performance and increased infrastructure costs. Yet data-growth monitoring is often an afterthought in the day-to-day operations of SQL Server DBAs. To make matters worse, data-archiving/pruning mechanisms are frequently not a requirement in the design and implementation phases of new application development. Consequently, live application databases often grow unnoticed to the point where unplanned measures have to be taken to prevent disruptions of service. Is there an easy way to proactively obtain a recent list of the fastest-growing databases in a SQL Server instance, using information captured by default in the database engine?

Solution

T-SQL window functions were first introduced in SQL Server 2005 and have been enhanced in newer versions of the product. As their name suggests, these functions allow analysis over "windows", or ranges, of data, something that is otherwise possible only through complex, and often poorly performing, subqueries or common table expressions (CTEs). In addition to improved query performance and simplified code, window functions make it easy to extract data aggregations alongside individual data points in the same result set, a point Fabiano Amorim has discussed in detail.
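As a quick illustration of that last point, here is a minimal sketch of my own (not one of the numbered scripts below) against the same msdb..backupset table used throughout this tip. A window aggregate returns each backup row together with its per-database average backup size, detail rows that a plain GROUP BY would collapse away:

SELECT
	s.[database_name]
,	s.[backup_start_date]
,	s.[backup_size]
--	per-database aggregate returned alongside each individual row
,	AVG( s.[backup_size] ) OVER ( PARTITION BY s.[database_name] ) AS [Avg Backup Size]
FROM
	[msdb]..[backupset] s;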

In this tip I show how to harness the power of T-SQL window functions to calculate the average growth rates of full database-backup sizes from system table msdb..backupset. Erin Stellato has previously demonstrated how to use backupset to trend the growth of individual databases over time. I am building on this approach by calculating differences between adjacent time points (backup size difference between today and yesterday, for example) and then averaging these differences for each database to come up with a list of the fastest-growing databases in a SQL Server instance. For this I am using the LAG window function, added in SQL Server 2012, so the solution presented here applies to SQL Server 2012 and higher.

Implementation

First I need to use the LAG function to calculate differences in backup sizes between adjacent time points for each database in the SQL Server instance. LAG provides access to a previous row in the same result set, namely the row preceding the current row by a specified offset. The code snippet looks like this:

-----------
--Script 1:
-----------
...
,	LAG( s.[backup_size] ) 
		OVER ( PARTITION BY s.[database_name] ORDER BY s.[backup_start_date] ) AS [Previous Backup Size]
FROM
	[msdb]..[backupset] s
...

In the above script I apply the LAG function to the [backup_size] column of the [backupset] table with the default offset of 1 to get the immediately preceding row in the result set by [backup_start_date]. The "OVER ( PARTITION BY ... ORDER BY ... )" construct ensures meaningful comparisons by restricting each partition to a single database and ordering its records by backup start date into chronological sequence. I can similarly partition by database name to obtain the COUNT of records (sample size) per database in the data set. Here is the full query returning current and previous backup size (in megabytes), as well as sample size, for all databases:

-----------
--Script 2:
-----------
SELECT
	s.[database_name]
,	s.[backup_start_date]
,	COUNT(*) OVER ( PARTITION BY s.[database_name] ) AS [Sample Size]
,	CAST( ( s.[backup_size] / 1024 / 1024 ) AS INT ) AS [Backup Size (MB)]
,	CAST( ( LAG( s.[backup_size] ) 
		OVER ( PARTITION BY s.[database_name] ORDER BY s.[backup_start_date] ) / 1024 / 1024 ) AS INT ) AS [Previous Backup Size (MB)]
FROM
	[msdb]..[backupset] s
WHERE
	s.[type] = 'D' --full backup
ORDER BY
	s.[database_name]
, 	s.[backup_start_date];
GO

In my SQL Server instance a subset of the results looks as follows:

database_name   backup_start_date         Sample Size   Backup Size (MB)   Previous Backup Size (MB)
...             ...                       ...           ...                ...
Analytics       2015-06-05 22:39:21.000   30            639002             NULL
Analytics       2015-06-06 22:43:16.000   30            639505             639002
Analytics       2015-06-07 22:45:48.000   30            637662             639505
...             ...                       ...           ...                ...
Operations      2015-06-06 02:08:09.000   30            9892               NULL
Operations      2015-06-07 04:36:35.000   30            9895               9892
Operations      2015-06-08 01:45:40.000   30            9916               9895
...             ...                       ...           ...                ...
Sales           2015-06-05 22:00:01.000   30            102695             NULL
Sales           2015-06-06 22:00:00.000   30            102813             102695
Sales           2015-06-07 22:00:02.000   30            102532             102813
...             ...                       ...           ...                ...

NULL values in the [Previous Backup Size (MB)] column reflect the fact that those are the very first records in sequence for that database, so there are no "previous" values. The Sample Size value of 30 means that there are 30 records of successful full backups for that database in backupset.
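As an aside, if those leading NULLs are unwanted, LAG also accepts optional second and third arguments: the offset and a default value to return when the offset runs past the start of the partition. A minimal variation on Script 2's LAG call (my tweak, not part of the scripts that follow) substitutes the current row's own size, so the first backup of each database reports a size difference of zero instead of NULL:

,	CAST( ( LAG( s.[backup_size], 1, s.[backup_size] ) --explicit offset and default value
		OVER ( PARTITION BY s.[database_name] ORDER BY s.[backup_start_date] ) / 1024 / 1024 ) AS INT ) AS [Previous Backup Size (MB)]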

Now that I have this intermediate set of data, I am ready to return the average, min and max differences between "current" and "previous" backup size. To restrict each result to data from a single database, I again use the OVER ( PARTITION BY ... ) construct, partitioning by database name.

Here is the final script:

-----------
--Script 3:
-----------
SELECT DISTINCT
	A.[database_name]
,	AVG( A.[Backup Size (MB)] - A.[Previous Backup Size (MB)] ) OVER ( PARTITION BY A.[database_name] ) AS [Avg Size Diff From Previous (MB)]
,	MAX( A.[Backup Size (MB)] - A.[Previous Backup Size (MB)] ) OVER ( PARTITION BY A.[database_name] ) AS [Max Size Diff From Previous (MB)]
,	MIN( A.[Backup Size (MB)] - A.[Previous Backup Size (MB)] ) OVER ( PARTITION BY A.[database_name] ) AS [Min Size Diff From Previous (MB)]
,	A.[Sample Size]
FROM 
(
	SELECT
		s.[database_name]
	--,	s.[backup_start_date]
	,	COUNT(*) OVER ( PARTITION BY s.[database_name] ) AS [Sample Size]
	,	CAST ( ( s.[backup_size] / 1024 / 1024 ) AS INT ) AS [Backup Size (MB)]
	,	CAST ( ( LAG(s.[backup_size] ) 
			OVER ( PARTITION BY s.[database_name] ORDER BY s.[backup_start_date] ) / 1024 / 1024 ) AS INT ) AS [Previous Backup Size (MB)]
	FROM 
		[msdb]..[backupset] s
	WHERE
		s.[type] = 'D' --full backup
	--ORDER BY
	--	s.[database_name]
	--,	s.[backup_start_date]
) AS A
ORDER BY
	[Avg Size Diff From Previous (MB)] DESC;
GO

Results:

database_name   Avg Size Diff From Previous (MB)   Max Size Diff From Previous (MB)   Min Size Diff From Previous (MB)   Sample Size
Analytics       561                                6016                               -5050                              30
Sales           49                                 955                                -371                               30
Operations      2                                  13                                 -2                                 30
...             ...                                ...                                ...                                ...

Over the retention time span of data in the backupset table (30 records of daily backups in this example), the Analytics database exhibits the highest average daily increase (growth rate) in backup size at 561 MB. Analytics is therefore the fastest-growing database in the SQL Server instance over the past 30 days. The Sales and Operations databases come second and third, respectively. If performance issues with the Analytics database had started creeping up recently, its large growth rate would be one of the clues to pursue further. I could now begin to dig deeper into specific tables within Analytics driving this growth, and would work with application developers to come up with archiving/pruning strategies to mitigate it in the long term.
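To start that digging, a minimal sketch along the following lines can rank tables by reserved space. This is my own illustrative query against the sys.dm_db_partition_stats catalog view, not part of the tip's scripts, and it should be run inside the database being investigated:

-- Largest user tables by reserved space in the current database.
SELECT TOP (20)
	OBJECT_SCHEMA_NAME( ps.[object_id] ) AS [Schema Name]
,	OBJECT_NAME( ps.[object_id] ) AS [Table Name]
,	SUM( ps.[reserved_page_count] ) * 8 / 1024 AS [Reserved (MB)] --8-KB pages converted to MB
,	SUM( CASE WHEN ps.[index_id] IN ( 0, 1 ) THEN ps.[row_count] ELSE 0 END ) AS [Row Count] --count rows from the heap/clustered index only
FROM
	sys.dm_db_partition_stats ps
WHERE
	OBJECTPROPERTY( ps.[object_id], 'IsUserTable' ) = 1
GROUP BY
	ps.[object_id]
ORDER BY
	[Reserved (MB)] DESC;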

Conclusion

Proactively addressing issues with database growth not only helps rein in infrastructure costs, but can be crucial for optimal application performance. In this tip I have shown how to easily monitor growth by combining information from the msdb..backupset table with the LAG window function to come up with a snapshot of the fastest-growing databases in a SQL Server (2012 and higher) instance. Depending on the retention interval of backupset, one may wish to poll these results at scheduled intervals and persist them in a permanent table for analysis over longer periods of time.
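As a sketch of that last idea, the output of Script 3 could be captured into a permanent table by a scheduled SQL Server Agent job. The table and column names here are hypothetical, purely for illustration:

-- Hypothetical snapshot table for persisting Script 3's output over time.
CREATE TABLE [dbo].[DatabaseGrowthSnapshot]
(
	[capture_date]     DATETIME      NOT NULL DEFAULT ( GETDATE() )
,	[database_name]    NVARCHAR(128) NOT NULL
,	[avg_size_diff_mb] INT           NULL
,	[max_size_diff_mb] INT           NULL
,	[min_size_diff_mb] INT           NULL
,	[sample_size]      INT           NULL
);
GO

--A job step would then wrap Script 3 in an INSERT ... SELECT:
--INSERT INTO [dbo].[DatabaseGrowthSnapshot]
--	( [database_name], [avg_size_diff_mb], [max_size_diff_mb], [min_size_diff_mb], [sample_size] )
--SELECT DISTINCT ... --body of Script 3, minus its ORDER BY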


About the author
Marios Philippopoulos has been a SQL Server DBA for over 10 years. He is based in the Toronto area, Canada.

This author pledges the content of this article is based on professional experience and not AI generated.




Comments For This Article




Tuesday, September 1, 2015 - 6:04:52 PM - Marios Philippopoulos

Hi Jeff,

Thanks for the comments. I totally agree that it is much preferable if one can collect data on individual file and/or table sizes over time.

This will give much more accurate predictions, especially in situations, as you mention, where the database data files span multiple volumes.

The approach described in the tip is intended for scenarios where such monitoring has not been set up - it is a quick and dirty way of getting information on database growth for systems where a drive has suddenly filled up, and one needs to find out which database(s) are the most likely culprits.

Again, not the ideal way of monitoring growth, but a good way of getting some info "after the fact", when more detailed monitoring has not already been set up.

Marios


Wednesday, August 26, 2015 - 6:37:06 PM - jeff_yao

Nice article, but to me, capacity planning based on the database as a whole is usually not sufficient in most complex, big database environments. Two reasons here:

1. Big databases usually have multiple data files (esp. when you have partitions and filegroups for blob data) spread across different drives.

2. Many data files may serve as archives, i.e. read-only, thus no growth.

One scenario I experienced before:

One database of 1+ TB in size (log file not counted) with 5 data files, of which 2 are active (one for data and another for indexes, each on a different drive). The index file (let's say it is 20GB initially) almost tripled to 60GB in 6 months, while the active data file (initially at 100GB) doubled in the same period. If capacity planning uses the whole db size as a starting point, the prediction will be skewed toward insufficient capacity in another 6 months. But if we use individual file sizes as a starting point, we will get more accurate predictions.

In the real world, I even go further to table-level size collection (esp. when some tables have blob columns and are the major contributors to data growth), and thus I can get even better capacity planning in the business context of these tables.














