Understanding how SQL Server stores data in data files

By:   |   Comments (14)   |   Related: 1 | 2 | More > Database Administration


Problem

Have you ever thought about how SQL Server stores data in its data files? As you know, data in tables is stored in row and column format at the logical level, but physically it stores data in data pages which are allocated from the data files of the database. In this tip I will show how pages are allocated to data files and what happens when there are multiple data files for a SQL Server database.

Solution

Every SQL Server database has at least two operating system files: a data file and a log file. Data files can be of two types: Primary or Secondary.  The Primary data file contains startup information for the database and points to other files in the database. User data and objects can be stored in this file and every database has one primary data file. Secondary data files are optional and can be used to spread data across multiple files/disks by putting each file on a different disk drive. SQL Server databases can have multiple data and log files, but only one primary data file. Above these operating system files, there are Filegroups. Filegroups work as a logical container for the data files and a filegroup can have multiple data files.

The disk space allocated to a data file is logically divided into pages which is the fundamental unit of data storage in SQL Server. A database page is an 8 KB chunk of data. When you insert any data into a SQL Server database, it saves the data to a series of 8 KB pages inside the data file. If multiple data files exist within a filegroup, SQL Server allocates pages to all data files based on a round-robin mechanism. So if we insert data into a table, SQL Server allocates pages first to data file 1, then allocates to data file 2, and so on, then back to data file 1 again. SQL Server achieves this by an algorithm known as Proportional Fill.

The proportional fill algorithm is used when allocating pages, so all data files allocate space around the same time. This algorithm determines the amount of information that should be written to each of the data files in a multi-file filegroup based on the proportion of free space within each file, which allows the files to become full at approximately the same time. Proportional fill works based on the free space within a file.

Analyzing How SQL Server Data is Stored

Step 1: First we will create a database named "Manvendra" with three data files (1 primary and 2 secondary data files) and one log file by running the below T-SQL code. You can change the name of the database, file path, file names, size and file growth according to your needs.

CREATE DATABASE [Manvendra]
 CONTAINMENT = NONE
 ON  PRIMARY
( NAME = N'Manvendra', FILENAME = N'C:\MSSQL\DATA\Manvendra.mdf',SIZE = 5MB , MAXSIZE = UNLIMITED, FILEGROWTH = 10MB ),
( NAME = N'Manvendra_1', FILENAME = N'C:\MSSQL\DATA\Manvendra_1.ndf',SIZE = 5MB , MAXSIZE = UNLIMITED, FILEGROWTH = 10MB ),
( NAME = N'Manvendra_2', FILENAME = N'C:\MSSQL\DATA\Manvendra_2.ndf' ,SIZE = 5MB , MAXSIZE = UNLIMITED, FILEGROWTH = 10MB )
 LOG ON
( NAME = N'Manvendra_log', FILENAME = N'C:\MSSQL\DATA\Manvendra_log.ldf',SIZE = 10MB , MAXSIZE = 1GB , FILEGROWTH = 10%)
GO

Step 2: Now we can check the available free space in each data file of this database to track the sequence of page allocations to the data files. There are multiple ways to check such information and below is one option. Run the below command to check free space in each data file.

USE Manvendra
GO
Select DB_NAME() AS [DatabaseName], Name, file_id, physical_name,
    (size * 8.0/1024) as Size,
    ((size * 8.0/1024) - (FILEPROPERTY(name, 'SpaceUsed') * 8.0/1024)) As FreeSpace
    From sys.database_files

You can see the data file names, file IDs, physical name, total size and available free space in each of the database files.

data file free spaces post SQL Server database creation

We can also check how many Extents are allocated for this database. We will run the below DBCC command to get this information. Although this is undocumented DBCC command this can be very useful information.

USE Manvendra
GO
DBCC showfilestats

With this command we can see the number of Extents for each data file. As you may know, the size of each data page is 8KB and eight continuous pages equals one extent, so the size of an extent would be approximately 64KB. We created each data file with a size of 5 MB, so the total number of available extents would be 80 which is shown in column TotalExtents, we can get this by (5*1024)/64.

UsedExtents is the number of extents allocated with data. As I mentioned above, the primary data file includes system information about the database, so this is why this file has a higher number of UsedExtents.

used extents post SQL Server database creation

Step 3: The next step is to create a table in which we will insert data. Run the below command to create a table. Once the table is created we will run both commands again which we ran in step 2 to get the details of free space and used/allocated extents.

USE Manvendra;
GO
CREATE TABLE [Test_Data] (
    [Sr.No] INT IDENTITY,
    [Date] DATETIME DEFAULT GETDATE (),
    [City] CHAR (25) DEFAULT 'Bangalore',
 [Name] CHAR (25) DEFAULT 'Manvendra Deo Singh');

Step 4: Check the allocated pages and free space available in each data file by running same commands from step 2.

USE Manvendra
Go
Select DB_NAME() AS [DatabaseName], Name, file_id, physical_name,
    (size * 8.0/1024) as Size,
    ((size * 8.0/1024) - (FILEPROPERTY(name, 'SpaceUsed') * 8.0/1024)) As FreeSpace
    From sys.database_files

You can see that there is no difference between this screenshot and the above screenshot except for a little difference in the FreeSpace for the transaction log file.

SQL Server databas file space post table creation

Now run the below DBCC command to check the allocated pages for each data file.

DBCC showfilestats

You can see the allocated pages of each data files has not changed.

used extents post SQL Server table creation

Step 5: Now we will insert some data into this table to fill each of the data files. Run the below command to insert 10,000 rows to table Test_Data.

USE Manvendra
go
INSERT INTO Test_DATA DEFAULT VALUES;
GO 10000

Step 6: Once data is inserted we will check the available free space in each data file and the total allocated pages of each data file.

USE Manvendra
Go
Select DB_NAME() AS [DatabaseName], Name, file_id, physical_name,
    (size * 8.0/1024) as Size,
    ((size * 8.0/1024) - (FILEPROPERTY(name, 'SpaceUsed') * 8.0/1024)) As FreeSpace
    From sys.database_files

You can see the difference between the screenshot below and the above screenshot. Free space in each data file has been reduced and the same amount of space has been allocated from both of the secondary data files, because both files have the same amount of free space and proportional fill works based on the free space within a file.

SQL Server Database File space post data insert

Now run below DBCC command to check the allocated pages for each data files.

DBCC showfilestats

You can see a few more pages have been allocated for each data file. Now the primary data file has 41 extents and the secondary data files have a total of 10 extents, so total data saved so far is 51 extents. Both secondary data files have the same number of extents allocated which proves the proportional fill algorithm.

used extents post SQL Server data insert

Step 7: We can also see where data is stored for table "Test_Data" for each data file by running the below DBCC command. This will let us know that data is stored on all data files.

DBCC IND ('Manvendra', 'Test_data', -1);

I attached two screenshots because the number of rows was very large to show all data file IDs where data has been stored. File IDs are shown in each screenshot, so we can see each data page and their respective file ID. From this we can say that table Test_data is saved on all three data files as shown in the following screenshots.

SQL Server data table saved on which data files


Data table saved on particular SQL Server data files

Step 8: We will repeat the same exercise again to check space allocation for each data file. Insert an additional 10,000 rows to the same table Test_Data to check and validate the page allocation for each data file. Run the same command which we ran in step 5 to insert 10,000 more rows to the table test_data. Once the rows have been inserted, check the free space and allocated extents for each data file.

USE Manvendra
GO
INSERT INTO Test_DATA DEFAULT VALUES;
GO 10000
Select DB_NAME() AS [DatabaseName], Name, file_id, physical_name,
    (size * 8.0/1024) as Size,
    ((size * 8.0/1024) - (FILEPROPERTY(name, 'SpaceUsed') * 8.0/1024)) As FreeSpace
    From sys.database_files

We can see again both secondary data files have the same amount of free space and similar amount of space has been allocated from the primary data file as well. This means SQL Server uses a proportional fill algorithm to fill data in to the data files.

SQL Server database file space post data insert

We can get the extent information again for the data files.

DBCC showfilestats

Again we can see in increase in the UsedExtents for all three of the data files.

used extents post SQL Server data insert
Next Steps
  • Create a test database and follow these steps, so you can better understand how SQL Server stores data at a physical and logical level.
  • Explore more knowledge with SQL Server Database Administration Tips


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Manvendra Singh Manvendra Singh has over 5 years of experience with SQL Server and has focused on Database Mirroring, Replication, Log Shipping, etc.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Thursday, April 14, 2022 - 7:40:31 AM - Jakub Back To Top (90002)
Awesome explanation!

Friday, May 28, 2021 - 10:10:41 AM - Dal Bharath Back To Top (88758)
Excellent summary.

Thank you

Wednesday, July 1, 2020 - 10:32:15 AM - vkp Back To Top (86075)

This article is informative and well explained. Thank you!


Monday, May 20, 2019 - 3:19:26 AM - Shreyas Moolya Back To Top (80129)

Thanks Manvendra for this post. I was new to MS SQL server 2017 and this helped me visualize how data gets stored inside the database.

Also, it would be nice if you could understand and post an article on how python as a service interacts with the MySQL server as there are very less good posts to find good knowledge like the above.


Friday, March 8, 2019 - 6:57:29 PM - Kavish Back To Top (79224)

I have a VLDB (2TB) with 3 primary files. There is a one time activity that will update about 80 percent records in all of the tables.

My question is?

If I add 2 more files to the database before the update activity, will the update write the data to the new files i.e. Will data be taken out of the 3 files and use the proportional fill algorithm to spread the data across all five files

OR

The update only update the exisiting page and the new files will only be used if the updated data is more than the existing data and it needs new pages to write to?


Sunday, March 4, 2018 - 4:55:28 AM - Trung Back To Top (75345)

 Thanks 

 


Friday, January 26, 2018 - 9:10:46 PM - Melc Back To Top (75049)

 

 Hi,

Just want to ask if you have 2tb database, then you add secondary file, will the size of the mdf will not grow anymore hence the ndf? I have a dillema right now because of our large db size.


Thursday, October 12, 2017 - 8:18:00 AM - Supercharged Back To Top (67236)

Next to database performance benefits another important reason may be valid for your environment.

From recoverability point of view you may not prefer one large MDF-file. Let's say you have a 2TB database all fysically located in one MDF-fie. Recovering this file (from your backup) may slow down your whole recovery process for following 2 reasons:

1. When file restore fails, you have to start all over again with this large file.

2. No parallellism is used by SQL Server during restore operation having just one file. Having multiple files within your filegroup will enables parallell file restore by defaut enhancing you whole restore job.

Just my 2cents, hope it helps. 


Thursday, May 25, 2017 - 4:27:09 PM - Eric in Sacramento Back To Top (56062)

Very simple, helpful, and informative, thank you!


Thursday, September 1, 2016 - 12:26:25 PM - Sepideh Back To Top (43249)

 

 Just to let you know this top helped, Thank you!

 


Monday, July 11, 2016 - 9:14:19 PM - Tom Back To Top (41864)

Thanks Greg and Manvendra.   That makes sense.


Monday, July 11, 2016 - 8:46:59 AM - Greg Robidoux Back To Top (41860)

Hi Tom,

to add to Manvendra's response, here are some reasons why you would create additional data files:

  • the current disk is out of space, so you create a new data file on a different disk drive
  • performance reasons, if you put the files on different drives you might be able to get an IO performance improvement

-Greg


Sunday, July 10, 2016 - 10:44:56 AM - Manvendra Back To Top (41856)

 

 Hi Tom,

 

As I mentioned in "Solution Section" about secondary files in above tip. One of the main reason to have multiple data files are as given "Secondary data files are optional and can be used to spread data across multiple files/disks by putting each file on a different disk drive"


Friday, July 8, 2016 - 8:52:15 AM - Tom Back To Top (41836)

I am not a DBA and I am relatively new to the concepts of how data is stored.  Can you give a few real-life examples of why one might want to have more than one file (.NDF)? 

 















get free sql tips
agree to terms