Problem Partitioning, introduced with SQL 2005 Enterprise Edition, is a godsend for working with very large tables. It provides the means to effectively manage and scale your data at a time when tables are growing exponentially, but maintenance windows are either shrinking or non-existent as with many 24/7 online environments. In this tip I will cover some data partitioning myths and truths you should know about.
Solution Having worked with large partitioned databases for a few years, I'd like to share some of my lessons learned. Its important to understand what partitioning is and also what it is not.
First off, I must stress that table and index partitioning is an Enterprise feature in SQL 2005 and remains an Enterprise Feature in SQL 2008. There is no getting around this. Having said that I have found that partitioning can be the most compelling motivating factor for shops to go with Enterprise Edition over Standard. The headaches, maintenance outages, weekend work et al associated with maintaining Very Large Databases can often be eliminated or minimized with a well thought out partitioning scheme. If you have VLDBs then Enterprise Edition is a no brainer as it will also give you these additional features to complement your partitioning - Asynchronous Database Mirroring, Online Index Rebuilds, backup compression (SQL 2008).
All the Enterprise features are also in Developer edition, so installing Developer edition on your workstation is a good way to test and play with Enterprise features. Often features that are Enterprise now could be in Standard down the road so you need to keep up to speed as a SQL Server professional.
You can't restore a database with partitioning to a non-Enterprise Edition instance. Important to bear this in mind. Any partition functions and schemes would need to be removed prior to backing up the database and restoring it to a non Enterprise edition instance.
Processed 208 pages for database 'myDB', file 'myDB' on file 1.
Processed 4 pages for database 'myDB', file 'myDB_log' on file 1.
Msg 3167, Level 16, State 1, Line 1
RESTORE could not start database 'myDB'.
Msg 3013, Level 16, State 1, Line 1
RESTORE DATABASE is terminating abnormally.
Msg 905, Level 21, State 1, Line 1
Database 'myDB' cannot be started in this edition of SQL Server because it contains a partition function 'myRangePF1'. Only Enterprise edition of SQL Server supports partitioning.
Partitioned Views offer similar properties to partitioned tables, but do not require Enterprise Edition. See here for an overview on partitioned views. Partitioned views was a surprisingly effective but complicated way of 'partitioning' data in SQL 2000 and still works just as well in SQL 2005 and 2008.
Myth 1: Partitioning is a "Scale-Out" solution
Partitions cannot span servers or instances. Partitions have to be in the same instance and in the same database. Partitioning therefore is a scale-up solution. A scale-out solution for SQL Server can be implemented through distributed partitioned views hosted on Federated Database Servers. This is an advanced solution that I have yet to encounter in my career (I would like to hear from anyone who has actually implemented this in a production environment).
Myth 2: Partitions must be created on different filegroups
The partition scheme definition defines on which filegroup a partition resides. It's a common misconception that you have to spread your partitions out among multiple filegroups. Most code examples seem to use multiple filegroups and I think this is where this myth stems from. The only reason, in my opinion, that you would want multiple filegroups is if those filegroups reside on physically separate drives and you were doing this to improve your I/O for large range queries. I have not seen any performance benefit to having multiple filegroups located on the same drive.
Only split your partitions among multiple filegroups if you have a compelling reason to do so. That reason should be increased performance or manageability.
Myth 3: To partition a non-partitioned table you will need to drop and recreate it
Not true. You can partition an existing table by creating a clustered index (or rebuilding an existing clustered index) on your new Partition Scheme. This will effectively partition your data as the leaf level of a clustered index is essentially the data. The example below creates a partitioned clustered index on a partition scheme PScheme_Day.
CREATE CLUSTERED INDEX idx ON tblPartitioned(SQLCreated)
Myth 4: Partitioning an existing table is a strictly offline operation
It's true that rebuilding or creating a clustered index is indeed an offline operation. Your table will not be available for querying during this operation. However, partitioning is an Enterprise feature, so we have the online index rebuild feature available to use. The ONLINE = ON option allows us to still query the table while under the covers the partitioning operation is going on. SQL Server does this by using an internal snapshot of the data. Obviously, there is a performance hit and I don't recommend you do this during a busy time but if you have a requirement for 24x7 availability then this is a possible solution.
CREATE CLUSTERED INDEX idx ON tblPartitioned(SQLCreated)
WITH (DROP_EXISTING = ON, ONLINE = ON)
Myth 5: SWITCH'ing partitions OUT or IN in only a few seconds
You'll often read that that the reason partitioning operations like SWITCH IN and SWITCH OUT are so fast and efficient is that they are "metadata" only operations meaning that there is no actual data movement but only the pointers or internal references to the data get changed.
SWITCH'ing partitions OUT or IN is a truly "meta data" operation in that although the partitioned data has magically moved from the partitioned table to the SWITCH table or vice versa there hasn't actually been any movement of data on the disk or inside the data file. However, in my experience, I have found that with highly transactional partitioned tables the SWITCH operation can get blocked or cause blocking itself. This is due to the ALTER TABLE...SWITCH operation requiring a schema modify lock on both the source and target tables.
To get around this potential blocking issue I always set a timeout before any switch operation so that if the ALTER TABLE...SWITCH does start to cause blocking it will only be for as long as the timeout. I will then re-issue the ALTER TABLE...SWITCH statement repeatedly with the timeout until the operation goes through successfully. In this way the partition maintenance task will not interfere with the OLTP function of the database:
In the example below I set a 30 second timeout for the SWITCH to complete. If it timeouts I will repeat until it completes successfully.
SET LOCK_TIMEOUT 30000
ALTER TABLE myPartionedTable SWITCH PARTITION 2 TO SwitchTable
Msg 1222, Sev 16, State 56, Line 1 : Lock request time out period exceeded.
Myth 6: Altering a partition function is a metadata only operation
In practice you may find out that a MERGE or SPLIT operation may take much much longer than the few seconds expected. Altering a partition function is an offline operation and can also result in movement of data on disk and so become extremely resource intensive.
As long as the 2 partitions you are MERGEing together are empty then there will be no data movement and it will be a metadata only operation. If you have a sliding window and you are MERGEing the oldest 2 partitions together then a rule to follow is to always make sure both partitions are empty (SWITCHed OUT) before you MERGE.
To avoid or minimize data movement when SPLITing a partition function always ensure that you know beforehand how many rows are in the underlying partition that is being SPLIT and how many rows would fall on each side of the new boundary. Armed with this information you should be able to determine beforehand the amount of data movement to expect depending on whether you have defined your partition function using RANGE RIGHT or LEFT.
Be aware that more than one table or index can reside on the same partition scheme or use the same partition function. When you run ALTER PARTITION FUNCTION it will affect all of these tables and indexes in a single transaction.
If your query is written in such a way that it can read only the partitions it needs the data from then you will get partition elimination and therefore an equivalent performance improvement. If your query does not join or filter on the partition key then there will be no improvement in performance over an unpartitioned table i.e. no partition elimination. In fact, a query that hits a partitioned table has the potential to be even slower than than an unpartitioned table even if both tables have the same index defined. This is due to the fact that each partition in a partitioned table is actually its own b-tree which means that a partitioned index seek will need to do one seek per partition as opposed to one seek per table for an unpartitioned index seek.
Both tables below have a nonclustered index defined on the commodity column. The index on table tblPartitioned has been created on the partition scheme to align it with the clustered index but as we can see from the below this causes each partition to be scanned because the partition key is actually on a different column.
The graphical query plan clearly shows the difference in cost between the two queries. The top partitioned table query
If we also filter on the partition key column we get the desired performance improvement over the equivalent query on the unpartitioned table. If you are partitioning tables in an existing database then its a good idea to test all your queries for performance. Don't assume that once the data is partitioned that your queries will improve. Queries may need to be updated to include the new partition key in the where clause or join to take advantage of the performance benefits of partitioning.
Myth 8: Partitioned tables ease maintenance of VLDBs and Very Large Tables
Partitioning can be effectively used to break up a very large table into multiple partitions based on a column or partitioning key. However, there can be significant management overhead in maintaining partitioned tables. The features of partitioned tables that give you the ease of management can also conspire against you to make your maintenance a nightmare!
Thinking of each partition as its own table will help you understand how best to approach your maintenance. Index rebuilds can be quicker because you can rebuild just those partitions that you identify as being fragmented. Conversely, full partitioned Index rebuilds will take longer as each partition is its own b-tree.
Note that old style DBCC SHOWCONTIG command does not work for partitioned indexes. You need to familiarize yourself with the more powerful DMV sys.dm_db_index_physical_stats. This DMV returns fragmentation information at a partition level and provides you the necessary information to make a decision on which partitions require rebuilding or reorganizing.
-- to return fragmentation information on partitioned indexes
object_name(a.object_id) AS object_name,
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED') a
JOIN sys.indexes b on a.object_id = b.object_id and a.index_id = b.index_id
order by object_name(a.object_id), a.index_id, b.name, b.type_desc, a.partition_number
Once fragmentation has been identified we can rebuild the index statement for just that single partition:
--Rebuild only partition 11. ALTER INDEX IX_alert_events_timestamp ON dbo.alert_events REBUILD Partition = 11; GO
If you're partitioning your data on a date field and you have a sliding window and you are only ever adding data to the right most partitions then you can save unnecessary overhead by not checking for fragmentation of older partitions where you already know they are not fragmented. sys.dm_db_index_physical_stats DMV allows you to specify a partition number. This greatly improves IO over full index scans and has the potential to shorten maintenance job durations.
Well designed database partitioning gives the DBA the opportunity to effectively work with massive amounts of data while at the same time providing the same level of performance as a small database. Having a sound partition management and maintenance plan will free up valuable DBA time to concentrate on more challenging problems and opportunities.
I encourage you to read the chapters in Books On Line on Partitioning
This tip from Atif provides good information on partitioning
Alan, great article. Thanks for posting. There aren't many articles out there from folks who have partitioned existing, large tables, so I appreciate your effort...
At my company, we are prototyping methods for partitioning existing tables in our OLTP database that are clustered on PK, and your recommendation to use CREATE UNIQUE CLUSTERED INDEX idx ... WITH (DROP_EXISTING =ON) confirms the results of our performance testing which suggests this technique is several times faster than using "DROP CONSTRAINT PK WITH MOVE TO..." (as recommended by the MS SQL Server 2008 partitioning white paper) or "DROP / ADD CONSTRAINT", which is the code the SQL Server 2008 "Partition Wizard" generates.
Nevertheless, it takes at least 20 minutes for us to partition a 17GB table, even though we have created a partition function that results in all the data staying on partition 1, and no data in partition 2. So we are puzzled why SQL Server has to touch all (or most of) the leaf data pages of the clustered index. If they are already associated with partition 1, why all the data movement going on? Do you have any insights on this? Is there a "meta-data only" method for telling SQL Server that the existing clustered index should belong to partition 1 of the table?
Essentially, we are looking for ways to reduce/eliminate data movement while partitioning an existing table that is clustered on PK, primarily because this table resides in an OLTP database and we don't have 20 minutes to partition a single table.
Any ideas you or the others on this thread have is greatly appreciated.
Did you perform tests regarding multiple tables using the same PF/PS or having a single PF/PS for each table?
I'm designing a set of 4 tables which are partitioned on a datetime column and are mapped to a filegroup per month. Those 4 tables have the same life cycle, so it seemed logical to me to have them use the same PS.
Every month we alter the PS to accept the next month of data. in a try catch (t-sql) block we first try the alter partition function. If that doesn't succeed the catch block will jump in. In that catch block we change database to master, set the userdatabase to single-user, and the run the alter partition function sql once more. After that, the database is being set multi-user.
Would it be enough to just select those for tables with tablokx ? (to avoid having to put the db in single-user mode and kick out everybody else)
Would there be a performance (or any other) gain by giving each table its own PF/PS?
With regard to a "metadata-only" trick to fool SQL Server into keeping all the data in partition 1, we spoke with a MS SQL Server expert and, just as we suspected, he said there are only 3 ways to partition an existing table that is clustered on PK (as mentioned in my previous post). So, we're sticking with CREATE CLUSTERED INDEX WITH DROP_EXISTING = ON which provides us the best performance.
With regard to your question about rebuilding the clustered index online... I just finished perf testing ONLINE = ON and am very pleased with the results: although the time it takes to rebuild the clustered and non-clustered indexes doubles, there is *no* impact to SELECT queries executed before, during, or after the indexes are rebuilt. After digging into BOL and other articles, I learned this is because SQL Server takes a snapshot of the table prior to the index rebuild, and routes queries to the source and/or target as it bulk-copies data into the new index. There is only a brief lock on the table before and after the rebuild to change meta data, swap the source with the target, etc.
The other lesson I learned during our perf testing is that when you specify ONLINE = ON and SORT_IN_TEMPDB = ON, make sure you have enough space for the entire clustered index in TempDb, and a copy of the clustered and non-clustered indexes on the drive where your data files live. So if you have a lot of tables to partition, then you can only do a few at a time otherwise you'll fill up the log and/or tempdb.
With regard to measuring potential data movement when adding partitions to an existing, partitioned table (via ALTER PARTITION FUNCTION...SPLIT), I learned you can use the $PARTITION function which will tell you the number of rows that will be inserted/deleted. The lesson that I learned here during our perf testing is to always add an empty partition to the end of a partitioned table to ensure a metadata-only operation and no data movement. Then all new rows will be inserted into this new partition.
On the the other hand, when I measured the impact of splitting an existing partition containing 60-days of data into two partitions each containing 30-days of data, SQL Server took a schema lock on the entire table and inserted/deleted 1.6m rows which took 2.5 minutes on a 4-socket, hex-core server with 50GB of memory assigned to our SQL Server 2008 instance. For us, that's too long to lock out our application which is used by hospitals and needs to be up 24/7. And those inserts/deletes are logged as one transaction so be careful of log growth...
Thanks for your help,
Thursday, July 15, 2010 - 10:44:41 PM - --cranfield
In the Enterprise edition you also have a feature called ONLINE piecemeal restore, which allows you to restore a filegroup from backup, while the rest of the database is available to your users.
Here is an example: you have a table with 500 million rows, of which only 10% are used and the rest are never updated, just read. let's say you partition this table and set the partitions in the same filegroup. In case of a failure or data corruption - guess what - your entire filegroup will be unavailable. Your users will still be able to read and write, as long as the data is in cache, but if you would like to restore the filegroup, you wiould have to mark it as OFFLINE and then restore from backup (last full + differential + logs + tail of log + 'with reovery').
Obviously, if you have all eggs in the same basket and your filegroup has all 500 million rows, then it will take you significantly longer time to restore the filegroup as opposed to restoring only 50 million rows of important data and having the rest 450 million rows still available to the users.
Wednesday, May 11, 2011 - 10:29:07 AM - --cranfield
Thanks for the comments on multiple filegroups. Yes, you are correct - there are benefits to multiple filegroups. If correctly designed they can aid quick recovery and can also be used as a tool to improve backups time and keep down your tape backup costs. But its a lot of overhead to manage these filegroups. At the moment I'm looking at a solution to reduce our backup costs by putting some data into read_only filegroups. If consideration has not been given during the design phase then trying to implement this change once your database is in the terrabytes is a real pain.
Wednesday, May 11, 2011 - 12:19:47 PM - Feodor Georgiev
you are correct that good design is essential for avoiding the maintenance overhead of the filegroups. I am far from the thought that someone will have 50 partitioned tables in one database with 10 partitions each on different filegroup each. (This actually sounds funny :) )
Here is a post I wrote a while back on how to treat your data. If you look at 'Steinar's graph' you will notice that the data can be categorized by 4 types: current (used or not used often) and archive (used or not used). This can be used as a general guideline for creating filegroups, do you agree? And also for making decisions about placing the filegroups on proper storage units.
There is a 5th type of data, though, and maybe I should think about integrating it in the 'Steinar's graph'. More specifically the 5th type of data is something you use, but you dont really need: non-clustered indexes.
I hope you get my point: you can save a lot on backup solutions by not backing up the non-clustered indexes; I am also very disapointed that SQL Server still does not engage an adequate solution for this, i.e. you cannot backup your database without the non-clustered indexes. (Well, technically you can, but then you will run in all kinds of problems).
So, out of curiosity - which is your biggest problem with finding a good backup solution:
Sorry for the late reply. Thanks for the info. We've decided not to back up data that we can derive or put back together again from other data sources. For example we no longer back up replicas (replicated databases).
We dont have a problem finding a good backup solution. We have a backup department who do all that stuff for us and merely charge us a fee/mbyte.
What we really need to do more aggresively is take offline the data that is no longer current or needed. In the past I've used copy agents to move data data from expensive OLTP databases to less expensive, slower reporting databases. A second step is to BCP out the old data to flat files and restore only if needed...
Actually, once upon a time I was on the Quest Software SQL Server Customer Advisory Board and I told them that they needed to build into their LiteSpeed backup tool the ability to only backup the data and leave all the non-clustered indexes behind. they thought it was a great idea but nothing came of it....
Tuesday, May 31, 2011 - 12:15:09 PM - Feodor Georgiev
I am sincerely sorry that you have not succeeded in creating a way to avoid the non-clustered index pages during backup.
You are right about not backing up the replicated data and about 'dumping' some of the old data to differernt storage and different formats.
Today I read this amazing story on Paul Randal's blog about a failed backup and successful restore (http://sqlskills.com/BLOGS/PAUL/post/Unbelievable-tale-of-disaster-and-recovery.aspx).
I know we are getting a bit off topic here, but I am curious - is your backup team testing the backups, and are they charging per MB of tested backup? (well, the second question is a joke, of course :) )
Wednesday, June 01, 2011 - 1:44:12 AM - Alan Cranfield
thanks for your interest....We have a monthly automated tape backup restore drill. The backup team restores a selection of database dumps from tape to a test server and we do the restores and verifies. We also 'verify' all our db backups staright after the backups.
We are paranoid so in addition to backups we also use db mirroring onsite and log shipping offsite...
partition column and UNIQUE clustered index key column is different. that time i created partioned for table. but table–>properties–>storage doesn’t show partition. but if i use select * from sys.partitions where object_name(object_id)=’tablename’. it shows partioning except clustered index..what is the difference?..is this correct way to partition?
Another thing to add is that UPDATE STATISTICS operations can be split by partition in partitioned tables. The only workaround is to implement partitioned views. I have encountered this and I wish I knew about this prior to implementing partitioning.
I have appilcation which has on SQL Enterprise database. If I have partitioned few tables in my databse, will there be any effects of these partitioned tables on my running application? What all things I will need to test from the application point of view, if I have partitioned some table underlying it?
Monday, May 21, 2012 - 9:50:09 PM - Alan Cranfield
The underlying partitioning on a table is transparent to your application. You dont need to test the application. What you do need to test maintenanc scripts that are using ALTER the partition schemes and functions, if any..SWITCH IN/OU etc... Also, you may want to test that your SQL queries and SPs are using the partitioninh key in WHERE clauses and joins to take advantage of partition elinination.
Examine your original reason for using partitioning and then see if you have achieved what you set out by implementing this Enterprise feature.
I have this table currently clustered by it's own key. It's growing big and have most of the acceses by another field (a foreign key to another table) We were thinking in to soluctions:
1) Change the current clustered PK for a nonclustered one. Add a new clustered index on the FK field. Assign specific filegroups to the table: one for the data, one for the indexes. Add just one file for each phisical unit and assign them to the filegroups. We hope SQL Server to split the table somehow equaly (with time) in the files and so the load.
2) Change the clustered index to the FK and partition the table by this key. If most of the queries are including the key index, we shoul see and improvement. But: what happens with the PK? (We can´t include the partitioning key in every PK/UQ key? Will the PK nonclustered index be not aligned? Is it so terrible as I've read?
What would you recomend for this case?
Thanks in advance and keep on with the great job!
Monday, May 28, 2012 - 1:26:52 AM - Tharindu Dhaneenja
I have to disagree about multiple filegroups on the same disk, as long as the number is not too absurd, there are still advantages so subdividing your partitions by filegroup, for instance it makes it easier to remove the oldest partition by simply removing the filegroup, also it leaves you the option of moving to different drives in the future if necessary, makes it easier to visually see how your data is divided, easier for backups, restore specific partitions to other servers etc.
I've a sliding window approach implemented in production and this job runs on top of hour and slides partition by an hour. This job acquires a Schema modified lock correctly.
However, there is a .net process using bulk insert on these partitioned tables and gave following error:-Insert bulk failed due to a schema change of the target table
The error happened as the sliding partition job was running arround the same time. However, I assumed that SQL Server will handle this internally and make bulk insert wait untill the partition job is over vs throwing an error.
Any inputs are appriciated.
Thursday, March 28, 2013 - 12:03:03 PM - Alan Cranfield
I would suggest creating a whole bunch of future hourly partitions and not running your window slide jobs so often. Why do you need to move the partition window every hour? would daily or weekly also work? Then you could do the sliding window move during a maintenance window or low traffic time.