Creating a table with horizontal partitioning in SQL Server

By:   |   Comments (9)   |   Related: 1 | 2 | 3 | 4 | > Partitioning


Problem

Sales data in my production database has increased significantly. I am planning to archive this data apart from the production database, so this can still be used for reporting purposes. I am conscious about performance issues while inserting new data to these archive tables and also for report response time. During the planning phase I found out about the new partitioning feature in SQL Server 2005 and later editions of SQL Server. It is a vast topic with several associated operations and configurations involved. In this tip I will share what I learned about partitioning, show an example to implement horizontal table partitioning along with a comprehensive list of related benefits and considerations.

Solution

In SQL Server 2000 and prior there was option for partitioned views, but this had several limitations and there were several things to consider before implementing. With SQL Server 2005 and onwards we now have an option to horizontally partition a table with up to 1000 partitions and the data placement is handled automatically by SQL Server. Horizontal partitioning is the process of dividing the rows of a table in a given number of partitions. The number of columns is the same in each partition.

Although you can have multiple partitions of a horizontally partitioned table on just one filegroup it makes more sense to break these out into separate filegroups, so you can also get some performance benefits. Some of the benefits include an I/O performance boost, because all partitions can reside on different disks. Another reason is the advantage of being able to separately backup a partition through filegroup backups. Also, the SQL Server database engine intelligently determines the partition to be accessed for certain data. And if more than one partition is to be accessed the database engine may use multiple processors in parallel for data retrieval. Such design aspects are very important to get the full advantages of table partitioning.

Before running this script, create the following folders for this example. You can use different folders, but you will need to adjust the scripts below accordingly.

  • D:\PartitionDB\FG1
  • D:\PartitionDB\FG2
  • D:\PartitionDB\FG3

Now run following script to create a new database with three data files on three filegroups.

--Script # 1: Create a table with two data files

USE Master
GO
 
CREATE DATABASE DBForPartitioning
ON PRIMARY (NAME='DBForPartitioning_1',
   FILENAME= 'D:\PartitionDB\FG1\DBForPartitioning_1.mdf',
   SIZE=2,
   MAXSIZE=100,
   FILEGROWTH=1 ),
FILEGROUP FG2 (NAME = 'DBForPartitioning_2',
   FILENAME = 'D:\PartitionDB\FG2\DBForPartitioning_2.ndf',
   SIZE = 2,
   MAXSIZE=100,
   FILEGROWTH=1 ),
FILEGROUP FG3 (NAME = 'DBForPartitioning_3',
   FILENAME = 'D:\PartitionDB\FG3\DBForPartitioning_3.ndf',
   SIZE = 2,
   MAXSIZE=100,
   FILEGROWTH=1 )
GO

Now we have a database DBForPartitioning with three data files created in three file groups. This may be confirmed using the following script.

--Script # 2: Confirm number of file groups and files in DBForPartitioning

Use DBFOrPartitioning
GO 
-- Confirm Filegroups
SELECT name as [File Group Name]
FROM sys.filegroups
WHERE type = 'FG' 
GO 

-- Confirm Datafiles
SELECT name as [DB File Name],physical_name as [DB File Path] 
FROM sys.database_files
WHERE type_desc = 'ROWS'
GO
FilegroupsTable

Broad plan

In a broad plan there are three major steps for implementing horizontal partitioning in SQL Server

  1. Create a partitioning function. It will have criteria to partition the data in the partitions.
  2. Create partition schemes to map the created partition function to file groups. It is related to the physical storage of data on disk.
  3. Create the table by linking it to the partition scheme and also to the partition function. A partition column will be used for this purpose.

At this point before implementation of these three steps we should have a clear idea about the structure of the table that we are going to create with horizontal partitioning. As mentioned earlier we have to implement horizontal partitioning on a table where sales data will be archived. For simplicity we will assume that the structure of our partitioned archival table will be SalesArchival (saleTime dateTime, item varchar(50)). The column on which data will be referred to partitions is called the partition column and it will be used in the partition function as the partition key. The partition column is important and should have the following conditions fulfilled:

  • Partition column is always a single column or computed column or persisted computed column by combining more than one columns.
  • Any data type that may be used for index key is eligible for partition column except TIMESTAMP data type.

Create partition function

In our example we have sales data for 2007, 2008 and 2009. So here we will create three partitions by providing two partition ranges in our partitioning function. The function will create data boundaries. In our case we are required to place all the data before 2009 in the first partition, data for 2009 will be placed in the second partition and data after 2009 will be placed in the third partition.

--Script # 3: Create partition function

Use DBForPartitioning
GO 

CREATE PARTITION FUNCTION salesYearPartitions (datetime)
AS RANGE RIGHT FOR VALUES ( '2009-01-01', '2010-01-01')
GO

The partition key provided in the function is of type that will be the primary key in our partitioned table and partition ranges are based on this column. In our table this column is saleTime with a data type of DATETIME. The range defined may be RIGHT or LEFT. Here we have used a range RIGHT. As a rule of thumb ranges can be translated as

  • RIGHT means < or >=
  • LEFT can be translated as <= and >.

In our case we have used RIGHT with following values.

Range RIGHT translation
Record with saleTime Destination partition
< 2009-01-01 DBForPartitioning_1
>=2009-01-01 and < 2010-01-01 DBForPartitioning_2
>=2010-01-01 DBForPartitioning_3

If we had used range LEFT then partitioning criteria would be as follows:

Range LEFT translation
Record with saleTime Destination partition
<= 2009-01-01 DBForPartitioning_1
>2009-01-01 and <= 2010-01-01 DBForPartitioning_2
> 2010-01-01 DBForPartitioning_3

As you can see from these two examples there is not a big difference on where the data would reside when using date, but it could be significant based on some other data that may be used.

Create partition scheme

To get optimized file structure, we have already created three file groups for this database and a partition function is created with three partitions defined with certain criteria. Now it is time to link file groups and partition functions. We have to define the physical storage of records partitioned on the basis of ranges defined in the partition function. In our design each partition will go to separate file group. This approach will also allow us achieve benefits of performance for data load or delete operations along with utilization of filegroup backups.

--Script # 4: Create partitioning scheme

Use DBForPartitioning
GO 

CREATE PARTITION SCHEME Test_PartitionScheme
AS PARTITION salesYearPartitions
TO ([PRIMARY], FG2, FG3 )
GO

Mapping partitions to filegroups is flexible. Multiple partitions may exist on a single filegroup and single partition may be allotted to a single filegroup.

Create partitioned table

Now we may create a horizontal partitioned table by using the partitioned scheme and partition column. And for performance optimization we will make the SaleTime column the primary key hence a cluster index for this table.

--Script # 5: Create table with horizontal partitioning

Use DBFOrPartitioning
GO 

CREATE TABLE SalesArchival
  (SaleTime datetime PRIMARY KEY,
   ItemName varchar(50))
ON Test_PartitionScheme (SaleTime);
GO

Insert data to check the partitioned table functionality

Now it is time to insert data to verify our required functionality for partitioned table.

--Script # 6: Insert sample in horizontally partitioned table

Use DBFOrPartitioning
GO

INSERT INTO SalesArchival (SaleTime, ItemName)
SELECT '2007-03-25','Item1' UNION ALL
SELECT '2008-10-01','Item2' UNION ALL
SELECT '2009-01-01','Item1' UNION ALL
SELECT '2009-08-09','Item3' UNION ALL
SELECT '2009-12-30','Item2' UNION ALL
SELECT '2010-01-01','Item1' UNION ALL
SELECT '2010-05-24','Item3'
GO

Lastly , we can verify the rows in the different partitions

--Script # 7: Verify data distribution in horizontally partitioned table

Use DBFOrPartitioning
GO 

SELECT partition_id, index_id, partition_number, Rows 
FROM sys.partitions
WHERE OBJECT_NAME(OBJECT_ID)='SalesArchival'
GO
RowsTable

We have created and verified a horizontal partitioned table for archival. Now data may be loaded in this table by any means.

There are some further considerations that are necessary to keep in mind while implementing horizontal partitioning:.

Considerations while planning

  • In a clustered table, partition column should be part of primary key or clustered key.
  • By default, indexes created on a partitioned table will also use the same partitioning scheme and partitioning column that is being used by the table
  • If data in a partition is not required to be modified that partition may be marked READ ONLY
  • Entire table will be locked during an index rebuild operation so you can not rebuild indexes on a single partition with the ONLINE option.
  • If you ever require to change partition key then you will be required to recreate the table, reload the data and rebuild the indexes.
  • Partition column and partition key both should match in terms of data type, length and precision.
  • Only available in Enterprise and Developer editions
  • All partitions must reside in the same database
  • You can rebuild indexes based on a partition instead of rebuilding the entire index.

Some enhancements for SQL Server 2008

  • Data compression can be implemented on specified or all partitions.
  • When appropriate use the date data type for a partition column and partition key which can cut down the storage needs and improve performance..
  • Lock settings may be implemented at partition level rather than at table level.
Next Steps
  • CREATE PARTITION FUNCTION to read further about creating PARTITION FUNCTION
  • CREATE PARTITION SCHEME to read further about defining PARTITION SCHEMES
  • Instead of backing up whole large table, you may backup only the partition that gets updated after a data load session
  • Try to span the partitions on multiple physical disks through multiple filegroups.
  • Refer to these other tips about partitioning


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Atif Shehzad Atif Shehzad is a passionate SQL Server DBA, technical reviewer and article author.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, January 9, 2018 - 12:28:35 PM - Kannadhasan G Back To Top (74927)

How to reduce table partition timing for existing table in SQL Server.

I have a problem with table partition, I am working in a very big database, It is around 2 terabyte. So we planned to split the large tables into different partitions at last year. On that time we were divided a table into two partitons. One partition was located in primary file group and another one was located in secondary file group named as PartitionFG1. So we got lots of benefits from this activity like index maintenance and performance.

And then we have planned to create another one partition for the same table for this upcoming year records. So we have created a new secondary file group and we have altered the partition scheme to use new secondary file group named as PartitionFG2. So we have alter the partition function to split the range, here is the problem. Here the new range value split and moves the data into new partition but not into the new file group. For example table’s max PK value is 1000, so I used split range value as 1000. So it should not be taken any time to move data to new partition, because new partition will contains 0 records only.

Now I am going describe the scenario by sequence.

1. Table Creation

CREATE TABLE Tbl_3rdParatition(PK NUMERIC(18,0) identity(1,1),Line Varchar(100))

CREATE CLUSTERED INDEX CidxTbl_3rdParatition ON Tbl_3rdParatition (pk ASC)

2. Insert record into table

DECLARE @I INT

DECLARE @CNT INT

SET @I=1

SET @CNT = 1000

WHILE (@I< = @CNT)

BEGIN

INSERT INTO Tbl_3rdParatition (Line) VALUES ('Primary')

SET @I=@I+1

END

Now we have inserted 1000 records into the table.

We can verify the table’s row and partition by below query.

SELECT p.partition_number, fg.name, p.rows

FROM sys.partitions p

INNER JOIN sys.allocation_units au ON au.container_id = p.hobt_id

INNER JOIN sys.filegroups fg ON fg.data_space_id = au.data_space_id

WHERE p.object_id = OBJECT_ID('Tbl_3rdParatition')

WHERE p.object_id = OBJECT_ID('Tbl_3rdParatition')

So it ensures primary file group contains all records of the table.

Now I am going to split the table into 2 partitions.

CREATE PARTITION FUNCTION PF_Tbl_3rdParatition(NUMERIC(18,0)) AS RANGE LEFT

FOR VALUES(500);

CREATE PARTITION SCHEME PS_Tbl_3rdParatition AS PARTITION PF_Tbl_3rdParatition TO ([PRIMARY],[SWPPartitionFG1])

CREATE UNIQUE CLUSTERED INDEX CidxTbl_3rdParatition ON dbo.Tbl_3rdParatition(PK) WITH(DROP_EXISTING = ON)ON PS_Tbl_3rdParatition(PK) ;

Now it is divided into 2 partitions. It took some time to move 500 records to SWPPartitionFG1.

Now I am going to create a new partition on another file group.

ALTER PARTITION SCHEME PS_Tbl_3rdParatition

NEXT USED [SWPPartitionFG2]

ALTER PARTITION FUNCTION PF_Tbl_3rdParatition()

SPLIT RANGE (1000)

ALTER INDEX [CidxTbl_3rdParatition] ON [dbo].[Tbl_3rdParatition ] REBUILD WITH (SORT_IN_TEMPDB = OFF, ONLINE = ON, MAXDOP = 1)

As of above query, 3rd partition should have 0 records. That is correct. But 3rd partition should be stored in SWPPartitionFG2 correct? But second partition’s data fully moved into SWPPartitionFG2. And the 3rd partition is allocated in SWPPartitionFG1 it is wrong!. So it takes too much of time to transfer the data from FG1 to FG2.

I desired to store the 3rd partition data in third file group (SWPPartitionFG2). Then only it will not take large time in partitioning process. For this reason we want lots of time to create a new partition. Our client will not give that much of down time for us.

Our Actual table size is 300 GB. 2nd partitions SWPPartitionFG1 File group contains 200 GB of data. So it requires 2:30 hrs time to move the data From FG1 to FG2. Please help me reduce the time in this activity.


Tuesday, October 28, 2014 - 5:23:25 AM - Atif Shehzad Back To Top (35094)

@Hany. Yes we can implement partitioning for an existing table. You can study following tip about it.

http://www.mssqltips.com/sqlservertip/2888/how-to-partition-an-existing-sql-server-table/


Tuesday, October 28, 2014 - 5:21:41 AM - Atif Shehzad Back To Top (35093)

@Hany. In that case you have to modify the primary key and add the required partitioning column in it.


Monday, October 27, 2014 - 7:17:13 AM - Hany Helmy Back To Top (35073)

And what if the Date column is not part of the primary key, then how could we achieve partitioning?


Monday, October 27, 2014 - 4:26:12 AM - Hany Helmy Back To Top (35069)

Hi Atif,

Thanx 4 the article, very informative, I have a question: can we implement Partitioning on existing tables in the database (based on the Date column) not new table? and how?

Hany


Wednesday, August 27, 2014 - 9:51:40 AM - Kaushik Mitra Back To Top (34282)

Superb knowledge shairing article!!! Thanks for this wonderful knowledge item ....

 

Regards

Kaushik


Tuesday, April 10, 2012 - 1:03:51 AM - Ali Back To Top (16829)

thanks for your good article,

as i see, you create partition files in one partition disk, so is that possible we get more performance or our query get faster because of this?

is that possible sqlserver use multiple prossesor for retrive data?

is we get better benefit if we have one file or multiple file for keeping data in one physical disk?

thancks


Tuesday, November 24, 2009 - 2:21:14 AM - @tif Back To Top (4492)

 Thanks for appreciation. Although it is not necessary to partition on different files/filegroup, but as in meintioned following reasons in article that make it suitable and recommended to use different file groups

"Although you can have multiple partitions of a horizontally partitioned table on just one filegroup it makes more sense to break these out into separate filegroups, so you can also get some performance benefits.  Some of the benefits include an I/O performance boost, because all partitions can reside on different disks. Another reason is the advantage of being able to separately backup a partition through filegroup backups. Also, the SQL Server database engine intelligently determines the partition to be accessed for certain data. And if more than one partition is to be accessed the database engine may use multiple processors in parallel for data retrieval. "

Hope that will serve the purpose

Thanks


Monday, November 23, 2009 - 4:57:51 PM - Repriser2009 Back To Top (4489)

Thanks for the excellent article. I have a question. Do we have to partition on different files (possibly in different file group)? I have seen Chad post a partition article that was using the default (Primary) without additional file?

What is the performance results of with or without?















get free sql tips
agree to terms