SQL Server Database Corruption and Impact of running CHECKDB repair with allow data loss


By:   |   Updated: 2018-08-24   |   Comments   |   Related: More > Database Console Commands DBCCs


Problem

I am learning about SQL Server database corruption and had a question. When a single row in a SQL Server data page is corrupted, for example due to page CHECKSUM error, what will happen if I run DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS? Would I lose more data or is only the corrupted row deallocated?

Solution

Depending on the type of corruption, data loss using CHECKDB with REPAIR_ALLOW_DATA_LOSS can vary to different degrees. Specifically, to the problem description, even when we know for sure only a single row is corrupted in a data page which causes a CHECKSUM error, the whole data page will be deallocated when SQL Server repairs with allow data loss.

CHECKSUM is calculated and stored at the page level (not at each row level) hence it provides a high level of data-file integrity. Here is an over-simplification of how page CHECKSUM works; assume a data page contain 100 rows with a single value of 2 in each row. The CHECKSUM algorithm sums these values and stores the sum (which is 200) to the page header as a checksum. An external factor outside of SQL Server then somehow changes one of the rows into a value of 0 in the data page. When SQL Server reads this data page into the buffer pool and validates the page checksum, the calculated checksum value would be 198, but the checksum value stored in the page header would still be 200. Hence, SQL Server would raise this as a CHECKSUM corruption error, because the actual and expected checksum values are different.

The demonstration below will step you through a contrived example of a page CHECKSUM error and you will see that SQL Server deallocates the whole data page even when only a single row is corrupted.

SQL Server Corruption Demonstration

The code demonstration here is written on SQL Server 2016 Developer Edition on Service Pack 2.

This demonstration will focus on what data are discarded when REPAIR_ALLOW_DATA_LOSS option is specified with DBCC CHECKDB.

Create a Corrupt Test SQL Server Database

The script below will create a database [CorruptionTest] and populate table dbo.mssqltips with 250,000 rows. Table dbo.mssqltips has a column BigCol with data type CHAR(2000). With this, we are expecting each data page to fit 3 rows per page.

USE master
GO
DROP DATABASE IF EXISTS [CorruptionTest]
GO
CREATE DATABASE [CorruptionTest]
GO
ALTER DATABASE [CorruptionTest] MODIFY FILE ( NAME = N'CorruptionTest', SIZE = 2GB )
GO
ALTER DATABASE [CorruptionTest] MODIFY FILE ( NAME = N'CorruptionTest_log', SIZE = 2GB )
GO
ALTER DATABASE [CorruptionTest] SET RECOVERY FULL;
GO
ALTER DATABASE [CorruptionTest] SET PAGE_VERIFY CHECKSUM  
GO
CREATE TABLE CorruptionTest.dbo.mssqltips 
(increment INT, randomGUID uniqueidentifier, randomValue INT, BigCol CHAR(2000) DEFAULT 'a',
INDEX CIX_SQLShack_increment1 UNIQUE CLUSTERED (increment))
GO

SET NOCOUNT ON;
DECLARE @counter INT = 1;
BEGIN TRAN
   WHILE @counter <= 250000
   BEGIN
      INSERT INTO CorruptionTest.dbo.mssqltips (increment, randomGUID, randomValue) 
      VALUES (@counter, NEWID(), ABS(CHECKSUM(NewId())) % 140000000)

      SET @counter += 1
   END;
COMMIT TRAN;
GO

The query below outputs the table dbo.mssqltips top 10 rows across 3 data pages - Page Id 224, 226 and 227. As expected, we can see each data page contains 3 rows. For this corruption demonstration, we randomly choose row with increment = 5 in blue to work with. We will corrupt the value in column BigCol in Page Id 226 Slot 1 with 0x0.

In this tip, we will not describe the steps to corrupt a value in a data page. You can use the T-SQL command in my previous tip Undetected SQL Server Data Corruption with Database Page Verify option set to NONE to create the corruption.

SELECT TOP 10
   sys.fn_PhysLocFormatter(%%physloc%%) PageId,
   *
FROM [CorruptionTest].[dbo].[mssqltips]
Query to retrieve top 10 rows from table

Analyze the Corrupt Data Page Content in SQL Server

So, we have corrupted the value in BigCol with 0x0 on Page Id 226 Slot 1. Using DBCC PAGE, we can look at the content of Page Id 226.

From the output, Slot 0 and Slot 2 in Page Id 226 is still carrying all the value that we have inserted. The value of BigCol in Slot 1 is carrying “blank”, because we have overwritten it with value 0x0.

DBCC TRACEON (3604);
GO
DBCC PAGE ('CorruptionTest', 1, 226, 3);
GO
DBCC PAGE to display content of corrupted page id 226

Accessing Corrupted Data Page in SQL Server

Corruption can remain hidden in a database unless the corrupted page is read by storage engine into memory and buffer pool validates the page checksum, or the corruption is detected when performing a database consistency check.

To prove this statement above, we will execute 3 queries. The first two queries will not access the corrupt page. However, the third query will hit the corrupt data page with SELECT TOP 24995.

SELECT TOP 3 *
FROM [CorruptionTest].[dbo].mssqltips
GO
SELECT TOP 249994 *
FROM [CorruptionTest].[dbo].mssqltips
ORDER BY increment DESC
GO
SELECT TOP 249995 *
FROM [CorruptionTest].[dbo].mssqltips
ORDER BY increment DESC
GO

The first 2 queries execute successfully and return rows because it does not need to touch the corrupted data page. The third query fails because it requires accessing the corrupted page. As soon as the storage engine tries to retrieve the corrupted data page, the query was terminated and returns an error.

3 queries showing error only when storage engine needs to access the corrupted data page.&#xA;The error message indicate that the expected and actual checksum value is different.

DBCC CHECKDB with REPAIR_ALLOW_DATA_LOSS

Let’s repair the database with allow data loss. We will need to set the database into single user mode, run the repair and then set the database back into multi user mode.

USE master;
ALTER DATABASE [CorruptionTest] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
GO
DBCC CHECKDB ('CorruptionTest', REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;
GO
ALTER DATABASE [CorruptionTest] SET MULTI_USER;
GO
Repair database with allow data loss

We run the initial query to retrieve the top 10 rows again and it looks like increment 4, 5 and 6 are now gone. This is because the whole Page Id 226 was deallocated. All rows in table dbo.mssqltips get a new Page Id because the table has been rebuilt and allocated new pages. If you read the DBCC CHECKDB output messages closely, all these details were described in the messages pane above.

Query to retrieve top 10 rows from table

Conclusion

In this tip, we saw that SQL Server removes the whole data page that contains a single corrupted row.

Similarly, if corruption occurs on system tables, it would mean user table entries are deallocated from system tables even though the user table itself is not corrupt. Hence, after the repair with allow data loss the user tables affected would just disappear.

Resolving corruption is a delicate process and your chances to avoid corruption are better with a comprehensive backup regime and periodic restore testing of backup files, as well as running integrity checks.

Next Steps


Last Updated: 2018-08-24


get scripts

next tip button



About the author
MSSQLTips author Simon Liew Simon Liew is an independent SQL Server Consultant in Sydney, Australia. He is a Microsoft Certified Master for SQL Server 2008 and holds a Masterís Degree in Distributed Computing.

View all my tips
Related Resources





Comments For This Article





download





Recommended Reading

Using DBCC PAGE to Examine SQL Server Table and Index Data

Minimize performance impact of SQL Server DBCC CHECKDB

Error 601: Could not continue scan with NOLOCK due to SQL Server data movement

Execute SQL Server DBCC SHRINKFILE Without Causing Index Fragmentation

SQL Server DBCC CHECKDB Overview














get free sql tips
agree to terms