By: Ben Snaidero | Updated: 2016-01-25 | Comments (1) | Monitoring
We suspect there are some issues with our disk subsystem and would like to be alerted by email if SQL Server encounters any corruption in any of the data files that reside on the disk in question. How can we add this automated monitoring?
Luckily for us, SQL Server 2005 introduced a new table in the msdb database called suspect_pages which logs any pages which it suspects might be bad. Any time the database engine comes across a page id that it thinks might be bad, whether through a query reading a page, a DBCC operation or a backup/restore, the page id is added/updated in this table along with some details about the event. The details can be found in the event_type column in this table and a description of these types is below. Notice that event_types 4, 5 and 7 actually show that the issue has been resolved.
It's important to note that suspect_pages has to be maintained by a DBA or someone with permissions to make updates in the msdb database. This table is limited to 1000 rows, so if there are records in the table that are no longer needed they should be deleted so new rows can be added if required. More on managing this table can be found here.
The first thing we'll need is a query to pull information from the suspect_pages table. Since this table mainly contains non-descriptive ids, let's join it with sys.databases and sys.master_files so our alert will contain everything we need to track down and resolve the issue. Below is the TSQL for this query.
select sp.database_id "Database ID", d.name "Database", sp.file_id "File ID", mf.physical_name "File", sp.page_id "Page ID", case when sp.event_type = 1 then '823 or 824 error other than a bad checksum or a torn page' when sp.event_type = 2 then 'Bad checksum' when sp.event_type = 3 then 'Torn Page' when sp.event_type = 4 then 'Restored (The page was restored after it was marked bad)' when sp.event_type = 5 then 'Repaired (DBCC repaired the page)' when sp.event_type = 7 then 'Deallocated by DBCC' end as "Event Desc", sp.error_count "Error Count", sp.last_update_date "Last Updated" from msdb.dbo.suspect_pages sp inner join sys.databases d on d.database_id=sp.database_id inner join sys.master_files mf on mf.database_id=sp.database_id and mf.file_id=sp.file_id
Now that we have a query with all the information we need we just have to wrap this query with some logic to send out an email only when it actually finds data. Using database mail and some html formatting, samples of each can be found here, we can produce a nicely formatted email alert. The TSQL code for this is below and can be added to a new or existing SQL job to monitor your instance.
declare @count integer declare @tableHTML nvarchar(MAX); declare @subj nvarchar(100) select @count=count(1) from msdb.dbo.suspect_pages; set @subj = 'Suspect Pages Found in ' + @@SERVERNAME; set @tableHTML = N'<H1>Suspect Pages Found in ' + @@SERVERNAME + ', details are below.</H1>' + N'<table border="1">' + N'<tr><th>Database ID</th><th>Database</th>' + N'<th>File ID</th><th>File</th><th>Page ID</th>' + N'<th>Event Desc</th><th>Error Count</th><th>Last Updated</th></tr>' + cast ( ( select td = sp.database_id, '', td = d.name, '', td = sp.file_id, '', td = mf.physical_name, '', td = sp.page_id, '', td = case when sp.event_type = 1 then '823 or 824 error other than a bad checksum or a torn page' when sp.event_type = 2 then 'Bad checksum' when sp.event_type = 3 then 'Torn Page' when sp.event_type = 4 then 'Restored (The page was restored after it was marked bad)' when sp.event_type = 5 then 'Repaired (DBCC repaired the page)' when sp.event_type = 7 then 'Deallocated by DBCC' end, '', td = sp.error_count, '', td = sp.last_update_date from msdb.dbo.suspect_pages sp inner join sys.databases d on d.database_id=sp.database_id inner join sys.master_files mf on mf.database_id=sp.database_id and mf.file_id=sp.file_id for xml path('tr'), TYPE ) as nvarchar(max) ) + N'</table>' ; IF @count > 0 exec msdb.dbo.sp_send_dbmail @recipients=N'[email protected]', @body= @tableHTML, @subject = @subj, @body_format = 'HTML', @profile_name ='testprofile'
As an added benefit, assuming you would be running this at some interval whether it be daily/weekly/monthly whatever you decide is best for your environment, we could also piggy back on this job and do the cleanup of this table at the same time. Adding another step to this job that contains the following TSQL we can purge anything in this table that is older than 90 days. Note that you can change this purge logic however you see fit.
delete from msdb.dbo.suspect_pages where last_update_date < getdate()-90
Hopefully you check the suspect_pages table in all of your instances and you don't find anything. Rather than have you go through the process of corrupting a page in one of your databases to see what the report looks like I'll include a sample of one below for reference.
- Read more on ways to fix corrupted pages:
- Read more tips on monitoring SQL Server
- Read more tips on database mail
Last Updated: 2016-01-25
About the author
View all my tips