SQL Server Database Corruption and Impact of running CHECKDB repair with allow data loss
By: Simon Liew | Updated: 2018-08-24 | Comments (1) | Related: More > Database Console Commands DBCCs
I am learning about SQL Server database corruption and had a question. When a single row in a SQL Server data page is corrupted, for example due to page CHECKSUM error, what will happen if I run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS? Would I lose more data or is only the corrupted row deallocated?
Depending on the type of corruption, data loss using CHECKDB with REPAIR_ALLOW_DATA_LOSS can vary to different degrees. Specifically, to the problem description, even when we know for sure only a single row is corrupted in a data page which causes a CHECKSUM error, the whole data page will be deallocated when SQL Server repairs with allow data loss.
CHECKSUM is calculated and stored at the page level (not at each row level) hence it provides a high level of data-file integrity. Here is an over-simplification of how page CHECKSUM works; assume a data page contain 100 rows with a single value of 2 in each row. The CHECKSUM algorithm sums these values and stores the sum (which is 200) to the page header as a checksum. An external factor outside of SQL Server then somehow changes one of the rows into a value of 0 in the data page. When SQL Server reads this data page into the buffer pool and validates the page checksum, the calculated checksum value would be 198, but the checksum value stored in the page header would still be 200. Hence, SQL Server would raise this as a CHECKSUM corruption error, because the actual and expected checksum values are different.
The demonstration below will step you through a contrived example of a page CHECKSUM error and you will see that SQL Server deallocates the whole data page even when only a single row is corrupted.
SQL Server Corruption Demonstration
The code demonstration here is written on SQL Server 2016 Developer Edition on Service Pack 2.
This demonstration will focus on what data are discarded when REPAIR_ALLOW_DATA_LOSS option is specified with DBCC CHECKDB.
Create a Corrupt Test SQL Server Database
The script below will create a database [CorruptionTest] and populate table dbo.mssqltips with 250,000 rows. Table dbo.mssqltips has a column BigCol with data type CHAR(2000). With this, we are expecting each data page to fit 3 rows per page.
USE master GO DROP DATABASE IF EXISTS [CorruptionTest] GO CREATE DATABASE [CorruptionTest] GO ALTER DATABASE [CorruptionTest] MODIFY FILE ( NAME = N'CorruptionTest', SIZE = 2GB ) GO ALTER DATABASE [CorruptionTest] MODIFY FILE ( NAME = N'CorruptionTest_log', SIZE = 2GB ) GO ALTER DATABASE [CorruptionTest] SET RECOVERY FULL; GO ALTER DATABASE [CorruptionTest] SET PAGE_VERIFY CHECKSUM GO CREATE TABLE CorruptionTest.dbo.mssqltips (increment INT, randomGUID uniqueidentifier, randomValue INT, BigCol CHAR(2000) DEFAULT 'a', INDEX CIX_SQLShack_increment1 UNIQUE CLUSTERED (increment)) GO SET NOCOUNT ON; DECLARE @counter INT = 1; BEGIN TRAN WHILE @counter <= 250000 BEGIN INSERT INTO CorruptionTest.dbo.mssqltips (increment, randomGUID, randomValue) VALUES (@counter, NEWID(), ABS(CHECKSUM(NewId())) % 140000000) SET @counter += 1 END; COMMIT TRAN; GO
The query below outputs the table dbo.mssqltips top 10 rows across 3 data pages - Page Id 224, 226 and 227. As expected, we can see each data page contains 3 rows. For this corruption demonstration, we randomly choose row with increment = 5 in blue to work with. We will corrupt the value in column BigCol in Page Id 226 Slot 1 with 0x0.
In this tip, we will not describe the steps to corrupt a value in a data page. You can use the T-SQL command in my previous tip Undetected SQL Server Data Corruption with Database Page Verify option set to NONE to create the corruption.
SELECT TOP 10 sys.fn_PhysLocFormatter(%%physloc%%) PageId, * FROM [CorruptionTest].[dbo].[mssqltips]
Analyze the Corrupt Data Page Content in SQL Server
So, we have corrupted the value in BigCol with 0x0 on Page Id 226 Slot 1. Using DBCC PAGE, we can look at the content of Page Id 226.
From the output, Slot 0 and Slot 2 in Page Id 226 is still carrying all the value that we have inserted. The value of BigCol in Slot 1 is carrying “blank”, because we have overwritten it with value 0x0.
DBCC TRACEON (3604); GO DBCC PAGE ('CorruptionTest', 1, 226, 3); GO
Accessing Corrupted Data Page in SQL Server
Corruption can remain hidden in a database unless the corrupted page is read by storage engine into memory and buffer pool validates the page checksum, or the corruption is detected when performing a database consistency check.
To prove this statement above, we will execute 3 queries. The first two queries will not access the corrupt page. However, the third query will hit the corrupt data page with SELECT TOP 24995.
SELECT TOP 3 * FROM [CorruptionTest].[dbo].mssqltips GO SELECT TOP 249994 * FROM [CorruptionTest].[dbo].mssqltips ORDER BY increment DESC GO SELECT TOP 249995 * FROM [CorruptionTest].[dbo].mssqltips ORDER BY increment DESC GO
The first 2 queries execute successfully and return rows because it does not need to touch the corrupted data page. The third query fails because it requires accessing the corrupted page. As soon as the storage engine tries to retrieve the corrupted data page, the query was terminated and returns an error.
DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS
Let’s repair the database with allow data loss. We will need to set the database into single user mode, run the repair and then set the database back into multi user mode.
USE master; ALTER DATABASE [CorruptionTest] SET SINGLE_USER WITH ROLLBACK IMMEDIATE; GO DBCC CHECKDB ('CorruptionTest', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS; GO ALTER DATABASE [CorruptionTest] SET MULTI_USER; GO
We run the initial query to retrieve the top 10 rows again and it looks like increment 4, 5 and 6 are now gone. This is because the whole Page Id 226 was deallocated. All rows in table dbo.mssqltips get a new Page Id because the table has been rebuilt and allocated new pages. If you read the DBCC CHECKDB output messages closely, all these details were described in the messages pane above.
In this tip, we saw that SQL Server removes the whole data page that contains a single corrupted row.
Similarly, if corruption occurs on system tables, it would mean user table entries are deallocated from system tables even though the user table itself is not corrupt. Hence, after the repair with allow data loss the user tables affected would just disappear.
Resolving corruption is a delicate process and your chances to avoid corruption are better with a comprehensive backup regime and periodic restore testing of backup files, as well as running integrity checks.
- DBCC CHECKDB (Transact-SQL)
- Pages and Extents Architecture Guide
- Undetected SQL Server Data Corruption with Database Page Verify option set to NONE
- Set the PAGE_VERIFY Database Option to CHECKSUM
About the author
View all my tips
Article Last Updated: 2018-08-24