SQL Server Monitoring Checklist

By:   |   Comments (21)   |   Related: > Monitoring


Problem

Every good SQL Server DBA goes through some list of checks when they get to the office in the morning to make sure all of their systems are running smoothly. This tip will highlight some of the more important items that every DBA should be monitoring either manually or by using some form of scheduled scripting.

Note: All of the following queries have been tested on SQL Server 2008 and may not work correctly with previous versions of SQL Server

Solution

Item 1 - Are all of your SQL Server services running?

Obviously once you connect to your instance you know that the database service is up and running but you can use the extended stored procedure xp_servicecontrol to check if any service is up and running. Below are a few examples of what you can check.

exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'MSSQLServer'
exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'SQLServerAgent'
exec master.dbo.xp_servicecontrol 'QUERYSTATE', 'SQLBrowser'

SQL Server services are running

Related tips:


Item 2 - Did all of your SQL Agent Jobs run successfully?

This item can be checked with a fairly straightforward query of the msdb database. The first part of the query checks for any failed job steps and the second part is only concerned with the overall job status. This is also checked because a step could be set to continue even on failure, but should probably still be looked at in the morning. Also, if you are using the SQL Server Agent to backup your databases then this is also a good way to check if any backup jobs failed.

use msdb
go
select 'FAILED' as Status, cast(sj.name as varchar(100)) as "Job Name",
       cast(sjs.step_id as varchar(5)) as "Step ID",
       cast(sjs.step_name as varchar(30)) as "Step Name",
       cast(REPLACE(CONVERT(varchar,convert(datetime,convert(varchar,sjh.run_date)),102),'.','-')+' '+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),1,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),3,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),5,2) as varchar(30)) 'Start Date Time',
       sjh.message as "Message"
from sysjobs sj
join sysjobsteps sjs 
 on sj.job_id = sjs.job_id
join sysjobhistory sjh 
 on sj.job_id = sjh.job_id and sjs.step_id = sjh.step_id
where sjh.run_status <> 1
  and cast(sjh.run_date as float)*1000000+sjh.run_time > 
      cast(convert(varchar(8), getdate()-1, 112) as float)*1000000+70000 --yesterday at 7am
union
select 'FAILED',cast(sj.name as varchar(100)) as "Job Name",
       'MAIN' as "Step ID",
       'MAIN' as "Step Name",
       cast(REPLACE(CONVERT(varchar,convert(datetime,convert(varchar,sjh.run_date)),102),'.','-')+' '+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),1,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),3,2)+':'+SUBSTRING(RIGHT('000000'+CONVERT(varchar,sjh.run_time),6),5,2) as varchar(30)) 'Start Date Time',
       sjh.message as "Message"
from sysjobs sj
join sysjobhistory sjh 
 on sj.job_id = sjh.job_id
where sjh.run_status <> 1 and sjh.step_id=0
  and cast(sjh.run_date as float)*1000000+sjh.run_time >
      cast(convert(varchar(8), getdate()-1, 112) as float)*1000000+70000 --yesterday at 7am

Verify all of your SQL Server Agent Jobs have run successfully

Related tips:


Item 3 - Do you have a recent backup of all your SQL Server databases?

The two queries below will list any database that either does not have any backup or has not been backed up in the last 24 hours. The first query checks your full backups and the second query checks your transaction log backups (only for those databases in full recovery mode).

SELECT d.name AS "Database",
       ISNULL(CONVERT(VARCHAR,b.backupdate,120),'NEVER') AS "Last Full Backup"
FROM sys.databases d
LEFT JOIN (SELECT database_name,type,MAX(backup_finish_date) backupdate FROM backupset
           WHERE type LIKE 'D'
           GROUP BY database_name,type) b on d.name=b.database_name
WHERE (backupdate IS NULL OR backupdate < getdate()-1)
SELECT d.name AS "Database",
       ISNULL(CONVERT(VARCHAR,b.backupdate,120),'NEVER') AS "Last Log Backup"
FROM sys.databases d
LEFT JOIN (SELECT database_name,type,MAX(backup_finish_date) backupdate FROM backupset
           WHERE type LIKE 'L'
           GROUP BY database_name,type) b on d.name=b.database_name
WHERE recovery_model = 1
  AND (backupdate IS NULL OR backupdate < getdate()-1)

Verify your full SQL Server database backups

Verify your transaction log SQL Server database backups

Related tips:


Item 4 - Are there any errors in your SQL Server Error Log?

In order to check the SQL Server Error Log we are going to use the undocumented extended stored procedure, xp_readerrorlog. This query will look at the current log and go back a maximum of 2 days looking for any errors during that time frame.

declare @Time_Start datetime;
declare @Time_End datetime;
set @Time_Start=getdate()-2;
set @Time_End=getdate();
-- Create the temporary table
CREATE TABLE #ErrorLog (logdate datetime
                      , processinfo varchar(255)
                      , Message varchar(500))
-- Populate the temporary table
INSERT #ErrorLog (logdate, processinfo, Message)
   EXEC master.dbo.xp_readerrorlog 0, 1, null, null , @Time_Start, @Time_End, N'desc';
-- Filter the temporary table
SELECT LogDate, Message FROM #ErrorLog
WHERE (Message LIKE '%error%' OR Message LIKE '%failed%') AND processinfo NOT LIKE 'logon'
ORDER BY logdate DESC
-- Drop the temporary table 
DROP TABLE #ErrorLog

SQL Server Error Log Messages

Related tips:



Item 5 - Are you running out of space on any of your disks on your SQL Server?

You can used the extended stored procedure xp_fixeddrives to get a quick look at the space left on your drives.

exec master.dbo.xp_fixeddrives

SQL Server Free Disk Sapce

Related tips:


Item 6 - Are you running low on server memory for SQL Server?

To check the memory on your server we can use the dynamic management view dm_os_sys_memory.

SELECT available_physical_memory_kb/1024 as "Total Memory MB",
       available_physical_memory_kb/(total_physical_memory_kb*1.0)*100 AS "% Memory Free"
FROM sys.dm_os_sys_memory

SQL Server Memory Usage

Related tips:


Item 7 - Are there any SQL Server statements in the cache that could use tuning?

The following query will identify any poor performing SQL statements. You can alter the "order by" clause depending on what you are most concerned with (IO vs. CPU vs. Elapsed Time).

SELECT top 10 text as "SQL Statement",
   last_execution_time as "Last Execution Time",
   (total_logical_reads+total_physical_reads+total_logical_writes)/execution_count as [Average IO],
   (total_worker_time/execution_count)/1000000.0 as [Average CPU Time (sec)],
   (total_elapsed_time/execution_count)/1000000.0 as [Average Elapsed Time (sec)],
   execution_count as "Execution Count",
   qp.query_plan as "Query Plan"
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.plan_handle) st
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
order by total_elapsed_time/execution_count desc

Tune SQL Server Queries

Related tips:


Item 8 - How many connections do you have to your SQL Server instance?

This query on its own does not provide too much information other than show you if there is some blocking in the system. However, once you get a baseline for your applications through running this query, you'll be able to see if you have a higher than normal number of connections. This can be an early sign that there may be a problem.

SELECT spid, kpid, blocked, d.name, open_tran, status, hostname,
cmd, login_time, loginame, net_library
FROM sys.sysprocesses p
INNER JOIN sys.databases d 
 on p.dbid=d.database_id

SQL Server Connections

Item 9 - How many requests is your SQL Server processing?

As with the previous query, checking the number of requests coming into your SQL Server does not tell you too much. If you capture this number during normal operation you can use it as a baseline for comparison later on. Generally speaking around 1000/sec is a busy SQL Server but this number depends a lot on the hardware you are running on. For others 100/sec may be too much for their instance to handle. Also, using the below query as a template, you can query other O/S performance counters related to SQL Server.

DECLARE @BRPS BIGINT
SELECT @BRPS=cntr_value 
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Batch Requests/sec%'
WAITFOR DELAY '000:00:10'
SELECT (cntr_value-@BRPS)/10.0 AS "Batch Requests/sec"
FROM sys.dm_os_performance_counters
WHERE counter_name LIKE 'Batch Requests/sec%'

SQL Server Batch Requests Per Second

Related tips:

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Ben Snaidero Ben Snaidero has been a SQL Server and Oracle DBA for over 10 years and focuses on performance tuning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Thursday, February 20, 2020 - 3:54:42 AM - ankur Back To Top (84663)

Hi Team,

Can someone please share me script which will consolidate all the result instead of 1 email for each step i need 1 email for all steps.


Friday, November 20, 2015 - 8:50:05 AM - Vivekananda Back To Top (39112)

sqlservr.exe -> cpu usage taking high and some times CPU usage Performance also going 100 percent. so, please sugget me. how to resolve this type issues. am always strugling this type of performance issues.


Wednesday, August 5, 2015 - 9:51:19 PM - Sovana Shrestha Back To Top (38390)

Thanks Ben. Great article.

I am thinking to schedule those to run weekly basis and configure to send an email like a weekly report.

Sovana

 


Thursday, March 26, 2015 - 11:13:55 PM - Ben Snaidero Back To Top (36726)

Hi Omraj,

Just curious what version of SQL Server you are using?  As I mention at the start of the tip these were written for 2008 so all the queries may not work for earlier versions.

Thanks for reading.

Ben


Thursday, March 26, 2015 - 9:43:59 AM - Omraj Kadam Back To Top (36712)

Hi Ben,

Thanks for your SQL Server tips..!

But actually I'm facing an issue,When i execute the query below

SELECT spid, kpid, blocked, d.name, open_tran, status, hostname,
cmd, login_time, loginame, net_library
FROM sys.sysprocesses p
INNER JOIN sys.databases d
 on p.dbid=d.database_id

 

i get an error...! as

Msg 208, Level 16, State 1, Line 1
Invalid object name 'sys.sysprocesses'.
Msg 208, Level 16, State 1, Line 1
Invalid object name 'sys.databases'.

 

thank you for help in advance..!

Regards,

Omraj


Wednesday, April 30, 2014 - 10:24:56 AM - Chad Collier Back To Top (30564)

Excellent Tips! Also well put together, very concise and easy to understand. 


Tuesday, November 6, 2012 - 6:23:48 AM - dinesh Back To Top (20223)

DECLARE

 

@BRPS

BIGINT

SELECT

 

@BRPS=

cntr_value

FROM

 

sys.

dm_os_performance_counters

WHERE

 

counter_name LIKE

'Batch Requests/sec%'

WAITFOR

 

DELAY

'000:00:10'

SELECT

 

(cntr_value-@BRPS)/10.0 AS

"Batch Requests/sec"

FROM

 

sys.

dm_os_performance_counters

WHERE

 

counter_name LIKE'Batch Requests/sec%'

 

 

 

 

 

 

 

 

=> is showing 182.700000.

Please guide me...


Thursday, June 14, 2012 - 6:21:05 AM - Sai Back To Top (17991)

Thanks Ben,

Nice Article for all Jr DBA's .

Thanks & Regards,

Sai


Sunday, March 25, 2012 - 11:13:44 PM - Ben Snaidero Back To Top (16605)

Hi adveet,

I actually ended up writing a seperate tip on automating these checks.  You can find that tip here.

http://www.mssqltips.com/sqlservertip/2551/automate-sql-server-monitoring-with-email-alerts/

Please feel free to post a comment on that tip if you have any questions.

Thanks for reading

Ben.


Saturday, March 24, 2012 - 3:15:58 PM - adveet Back To Top (16603)

HI,

I am doing relgular checks manully i want to autamate the above mentioned steps in artical with job,

could you please tell me, hows it will be done and can we get output in on report format in excel or txt


Thursday, October 20, 2011 - 6:18:10 PM - Jeremy Kadlec Back To Top (14885)

John and Azim,

Thank you for the posts. Hopefully we will have some upcoming tips on monitoring and alerting.

I think Ben did a great job outlining the basic SQL Server items to monitor when you start your day.

Hopefully this will be the first of many tips from Ben to benefit the community.

Thank you,
Jeremy Kadlec

 


Thursday, October 20, 2011 - 11:32:48 AM - Ben Snaidero Back To Top (14882)

Checks like this, whether run manually or through some automated scripting, are usually run under an account with sysadmin privileges.  If you would like to run these under some other account that does not have this privilege you'd have to grant privileges to the account on the individual objects.


Thursday, October 20, 2011 - 9:54:42 AM - cecil small Back To Top (14881)

I think it is a great article however i am having permission issue on item1 , Item2 and Item3 when i tried to run the script.

Any ideas?


Wednesday, October 19, 2011 - 1:22:23 PM - Azim Back To Top (14877)

Good write up. I strongly agree with John for automating database monitoring in way that it not only updates you regarding the DB statuses but also strikes you, the moment things go wrong  


Wednesday, October 19, 2011 - 1:01:58 PM - --cranfield Back To Top (14876)

Nice article.  We also check replication and mirroring latency...


Wednesday, October 19, 2011 - 12:58:54 PM - John Fox Back To Top (14875)
DBAs checking things like this on a regular basis is a waste of time. (pause for effect...) However, having the COMPUTER check these things on a regular basis and send emails or text pages when things are not as they should be is an EFFICIENT use of time. For example, I have an automated task that checks every 5 minutes if there are any sessions waiting for memory, or if the amount of available memory exceeds certain limits. Or if there are any queries of a certain type that are running for more than 5 minutes. I have a tasks that checks for databases that are missing backups. Another checks the SQL Server logs for errors every 5 minutes, it's very handy to know if someone can't login because they are using the wrong password or username and very efficient to have the answer to their question BEFORE they call you. All this is done through pages, emails, or text messaging, depending on the severity of the problem. Some go just to me, some go to ops also. I also get automated emails about system statistics that I file away and look at from time to time. But I do it when I have the time, not every morning when I come in. The scripts in the article are very useful, but if a DBA is running a set of scripts every day and checking them manually, then they should think about how to get the computer to do the work for them so they can spend time building that test system the developers needed yesterday. Exception emails are more likely to be noticed than 'daily' emails. Daily statistical emails are great things to file away and use when needed. Filters in mail packages provide effective ways of separating them, or bringing the more important ones to your attention. And yes .. you also need a way to make sure the monitoring system is working properly. That's why I don't use SQL Jobs to do any of this. We have multiple monitoring systems that not only monitor the databases, but also monitor each other, and run external to the database.

Wednesday, October 19, 2011 - 12:26:03 PM - Greg Larsen Back To Top (14874)

Great article, but your script to identify the databases without backups in last 24 hours didn't consider differential backups.  I changed the code as below, so databases that had a differential backup in the last 24 didn't show up on the report.

 

SELECT d.name AS "Database",
       ISNULL(CONVERT(VARCHAR,b.backupdate,120),'NEVER') AS "Last Full Backup"
FROM sys.databases d
LEFT JOIN (SELECT database_name,MAX(backup_finish_date) backupdate FROM backupset
           WHERE type LIKE 'D' or type like 'I'
           GROUP BY database_name) b on d.name=b.database_name
WHERE (backupdate IS NULL OR backupdate < getdate()-1)


Wednesday, October 19, 2011 - 11:05:26 AM - Tim G Back To Top (14873)

 

Joe Sack had a similar list posted at http://www.sqlservercentral.com/articles/Administration/sevenmonitoringscripts/1024/

The ones he checked were:

  1. Failed jobs report
  2. Free space by drive.
  3. Disabled jobs.
  4. Running jobs
  5. Server role members.
  6. Last backup date.
  7. SQL Log for errors

Wednesday, October 19, 2011 - 9:20:21 AM - lit Back To Top (14872)

thanks a lot!


Wednesday, October 19, 2011 - 8:50:49 AM - madmucho Back To Top (14871)

Realy good article thank you very much.


Wednesday, October 19, 2011 - 8:03:33 AM - Lamar Marsengill Back To Top (14870)

Thanks for the great article.















get free sql tips
agree to terms