SQL Server Monitoring Checklist
Every good SQL Server DBA goes through some list of checks when they get to the office in the morning to make sure all of their systems are running smoothly. This tip will highlight some of the more important items that every DBA should be monitoring either manually or by using some form of scheduled scripting.
Note: All of the following queries have been tested on SQL Server 2008 and may not work correctly with previous versions of SQL Server
Item 1 - Are all of your SQL Server services running?
Obviously once you connect to your instance you know that the database service is up and running but you can use the extended stored procedure xp_servicecontrol to check if any service is up and running. Below are a few examples of what you can check.
exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'MSSQLServer' exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'SQLServerAgent' exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'SQLBrowser'
Item 2 - Did all of your SQL Agent Jobs run successfully?
This item can be checked with a fairly straightforward query of the msdb database. The first part of the query checks for any failed job steps and the second part is only concerned with the overall job status. This is also checked because a step could be set to continue even on failure, but should probably still be looked at in the morning. Also, if you are using the SQL Server Agent to backup your databases then this is also a good way to check if any backup jobs failed.
use msdb go select 'FAILED' as Status, cast(sj.name as varchar(100)) as "Job Name", cast(sjs.step_id as varchar(5)) as "Step ID", cast(sjs.step_name as varchar(30)) as "Step Name", cast(REPLACE(CONVERT(varchar,convert(datetime,convert(varchar,sjh.run_date)),102),'.','-')+' '+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),1,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),3,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),5,2) as varchar(30)) 'Start Date Time', sjh.message as "Message" from sysjobs sj join sysjobsteps sjs on sj.job_id = sjs.job_id join sysjobhistory sjh on sj.job_id = sjh.job_id and sjs.step_id = sjh.step_id where sjh.run_status <> 1 and cast(sjh.run_date as float)*1000000+sjh.run_time > cast(convert(varchar(8), getdate()-1, 112) as float)*1000000+70000 --yesterday at 7am union select 'FAILED',cast(sj.name as varchar(100)) as "Job Name", 'MAIN' as "Step ID", 'MAIN' as "Step Name", cast(REPLACE(CONVERT(varchar,convert(datetime,convert(varchar,sjh.run_date)),102),'.','-')+' '+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),1,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),3,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),5,2) as varchar(30)) 'Start Date Time', sjh.message as "Message" from sysjobs sj join sysjobhistory sjh on sj.job_id = sjh.job_id where sjh.run_status <> 1 and sjh.step_id=0 and cast(sjh.run_date as float)*1000000+sjh.run_time > cast(convert(varchar(8), getdate()-1, 112) as float)*1000000+70000 --yesterday at 7am
- SQL Server System Databases
- Failed SQL Server Agent Jobs
- Do you know if your SQL Server database backups are successful
- SQL Server Agent Tips
Item 3 - Do you have a recent backup of all your SQL Server databases?
The two queries below will list any database that either does not have any backup or has not been backed up in the last 24 hours. The first query checks your full backups and the second query checks your transaction log backups (only for those databases in full recovery mode).
SELECT d.name AS "Database", ISNULL(CONVERT(VARCHAR,b.backupdate,120),'NEVER') AS "Last Full Backup" FROM sys.databases d LEFT JOIN (SELECT database_name,type,MAX(backup_finish_date) backupdate FROM backupset WHERE type LIKE 'D' GROUP BY database_name,type) b on d.name=b.database_name WHERE (backupdate IS NULL OR backupdate < getdate()-1) SELECT d.name AS "Database", ISNULL(CONVERT(VARCHAR,b.backupdate,120),'NEVER') AS "Last Log Backup" FROM sys.databases d LEFT JOIN (SELECT database_name,type,MAX(backup_finish_date) backupdate FROM backupset WHERE type LIKE 'L' GROUP BY database_name,type) b on d.name=b.database_name WHERE recovery_model = 1 AND (backupdate IS NULL OR backupdate < getdate()-1)
Item 4 - Are there any errors in your SQL Server Error Log?
In order to check the SQL Server Error Log we are going to use the undocumented extended stored procedure, xp_readerrorlog. This query will look at the current log and go back a maximum of 2 days looking for any errors during that time frame.
declare @Time_Start datetime; declare @Time_End datetime; set @Time_Start=getdate()-2; set @Time_End=getdate(); -- Create the temporary table CREATE TABLE #ErrorLog (logdate datetime , processinfo varchar(255) , Message varchar(500)) -- Populate the temporary table INSERT #ErrorLog (logdate, processinfo, Message) EXEC master.dbo.xp_readerrorlog 0, 1, null, null , @Time_Start, @Time_End, N'desc'; -- Filter the temporary table SELECT LogDate, Message FROM #ErrorLog WHERE (Message LIKE '%error%' OR Message LIKE '%failed%') AND processinfo NOT LIKE 'logon' ORDER BY logdate DESC -- Drop the temporary table DROP TABLE #ErrorLog
Item 5 - Are you running out of space on any of your disks on your SQL Server?
You can used the extended stored procedure xp_fixeddrives to get a quick look at the space left on your drives.
- SQLCLR function to return free space for all drives on a server
- Determine Free Disk Space in SQL Server with TSQL Code
- Script to Get Available and Free Disk Space for SQL Server
Item 6 - Are you running low on server memory for SQL Server?
To check the memory on your server we can use the dynamic management view dm_os_sys_memory.
SELECT available_physical_memory_kb/1024 as "Total Memory MB", available_physical_memory_kb/(total_physical_memory_kb*1.0)*100 AS "% Memory Free" FROM sys.dm_os_sys_memory
Item 7 - Are there any SQL Server statements in the cache that could use tuning?
The following query will identify any poor performing SQL statements. You can alter the "order by" clause depending on what you are most concerned with (IO vs. CPU vs. Elapsed Time).
SELECT top 10 text as "SQL Statement", last_execution_time as "Last Execution Time", (total_logical_reads+total_physical_reads+total_logical_writes)/execution_count as [Average IO], (total_worker_time/execution_count)/1000000.0 as [Average CPU Time (sec)], (total_elapsed_time/execution_count)/1000000.0 as [Average Elapsed Time (sec)], execution_count as "Execution Count", qp.query_plan as "Query Plan" FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp order by total_elapsed_time/execution_count desc
Item 8 - How many connections do you have to your SQL Server instance?
This query on its own does not provide too much information other than show you if there is some blocking in the system. However, once you get a baseline for your applications through running this query, you'll be able to see if you have a higher than normal number of connections. This can be an early sign that there may be a problem.
SELECT spid, kpid, blocked, d.name, open_tran, status, hostname, cmd, login_time, loginame, net_library FROM sys.sysprocesses p INNER JOIN sys.databases d on p.dbid=d.database_id
Item 9 - How many requests is your SQL Server processing?
As with the previous query, checking the number of requests coming into your SQL Server does not tell you too much. If you capture this number during normal operation you can use it as a baseline for comparison later on. Generally speaking around 1000/sec is a busy SQL Server but this number depends a lot on the hardware you are running on. For others 100/sec may be too much for their instance to handle. Also, using the below query as a template, you can query other O/S performance counters related to SQL Server.
DECLARE @BRPS BIGINT SELECT @BRPS=cntr_value FROM sys.dm_os_performance_counters WHERE counter_name LIKE 'Batch Requests/sec%' WAITFOR DELAY '000:00:10' SELECT (cntr_value-@BRPS)/10.0 AS "Batch Requests/sec" FROM sys.dm_os_performance_counters WHERE counter_name LIKE 'Batch Requests/sec%'
- SQL Server Performance Counter DMV sys.dm_os_performance_counters
- Trending Buffer Pool Performance Using DMV sys.dm_os_performance_counters
- Collecting SQL Server performance counter data for trending
- Collect and store historical SQL Server performance counter data with DMVs
- Run the above queries on a group of servers concurrently using SSMS multi-server query functionality
- Investigate third party software to perform monitoring more efficiently
- Use Microsoft SQL Server Best practice Analyzer - 2005 - 2008
- Collect and store performance data for trend analysis - Part 1 - Part 2
About the author
View all my tips