SQL Server Fragmentation Hands On Experience (Part 9 of 9)
In our final stop for the fragmentation series we're going to walk through a complete script that will cover almost everything we've discussed in the series, including:
- New object creation and allocation (heaps, clustered, non-clustered indexes)
- Impact of parallel operations vs. serial, sort_in_tempdb, etc.
- Full scan, range scan, and singleton seek performance tests against cold and warm caches with:
  - No fragmentation
  - Logical fragmentation (unordered pages with high page density and low page density)
  - Extent fragmentation (from concurrent allocation operations and multiple files for a single index)
- Comparison of Defrag vs. Rebuild for multiple types of structures with multiple types of fragmentation
- Output of full comparison for performance related measures (physical reads, logical reads, test durations, etc.) of each of the above for multiple tests
This script should let you see and understand first-hand the impact of different types of operations against different types of structures at a variety of fragmentation levels. It validates many of the things we discussed throughout the series: fragmentation's performance impact on operations, what causes fragmentation, how to interpret the output of fragmentation detection, the different types of fragmentation and their impact on operations, and the different ways to address fragmentation once you have it and need to fix it. I'm going to walk through the script and what happens at a very high level in the remainder of this post, but for the details you'll want to run it and view the output for yourself in your own test environment.
First, some logistics:
- The script is built to be executed all at once and spits out a bunch of information and notes for you to consume once complete; alternatively, you can walk through the script manually, step by step
- This script uses both the built-in SQL methods for detecting fragmentation as well as the custom procedure for viewing page linkage and summary information
- The script needs to be run in SQLCMD mode, as there are 5 different variables you need to set to customize it to your environment. The script works with SQL Server 2005 only.
- The script will create 2 databases (named as you specify, with files created in the location you specify) and also makes use of Database Snapshots in the final portion of the script (the defrag/rebuild comparisons), so to successfully run that portion it needs to be tested on either the Enterprise or Developer edition of SQL Server 2005. The databases will not be dropped at the end of the script, so be sure to drop them yourself if you want to remove them when finished testing.
- The output of the script will include detailed notes about preceding/following statistics if you set the "printComments" flag to 1 at the top of the script. Notes about results and operations are delimited by strings of dashes ("-") printed across the screen, then the comments, then an ending delimiter of dashes again.
- Major 'sections' of the output are delimited by large headers: a line of equal signs ("=") printed across the screen, then the comment about the section, then an ending line of equal signs
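As a quick illustration of the two pieces of plumbing mentioned above - SQLCMD-mode variables and the built-in fragmentation detection - here is a hedged sketch; the variable names are placeholders for illustration, not necessarily the ones the attached script actually uses:

```sql
-- Run in SQLCMD mode (Query > SQLCMD Mode in Management Studio).
-- Variable names below are illustrative placeholders only.
:setvar DataPath "C:\SQLData"
:setvar LogPath  "C:\SQLLogs"
:setvar DbName1  "FragTest1"
:setvar DbName2  "FragTest2"

-- The built-in SQL Server 2005 method for detecting fragmentation:
SELECT object_name(ips.object_id)         AS table_name,
       ips.index_id,
       ips.index_type_desc,
       ips.avg_fragmentation_in_percent,   -- logical/extent fragmentation
       ips.avg_page_space_used_in_percent, -- page density
       ips.fragment_count,
       ips.avg_fragment_size_in_pages
FROM sys.dm_db_index_physical_stats(
       db_id(N'$(DbName1)'), NULL, NULL, NULL, 'DETAILED') ips;
```

The 'DETAILED' mode reads every page (including the page-density column used heavily in these tests); 'LIMITED' is cheaper but omits page density.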
Ok, that should cover the logistics, below is an analysis of the results from the test run on my desktop which have been attached to this post in a spreadsheet called "desktopResults.xlsx". Additionally, I've attached to this post the output of the script on my desktop, as well as the script itself.
My desktop machine is a Dell Precision Workstation T5400 64-bit machine, quad-core Xeon E5410 (2.33Ghz, 12Mb L2 cache, 1333 FSB), 16 GB DDR2 SDRAM, a 160GB 10k 16mb cache SATA boot drive, and a 500GB 7.2k 16mb cache SATA data drive. For these tests, the log files for the databases resided on the data drive and the data files resided on the faster boot drive. Multiple test runs were performed for both MAXDOP=1 and MAXDOP=0 configurations and I am reporting the results of one test from each that fell into the 'middle' of the test results for the given configuration.
Some key points to take away from and inspect in the output include:
- As expected, "cold cache" tests are significantly more impacted by fragmentation than "war cache" tests - the slowdown in warm-cache tests is not from improperly ordered page chains or interleaved pages, but instead from the page density of each page in cache being lower and requiring more logical reads (i.e. reading more pages) to return all the data. Notice the warm-cache tests for our logically fragmented structure with a higher page density is significantly less impacted than the same tests with a low page density.
- Also as expected, scan operations are significantly impacted by higher fragmentation levels, whereas seek operations are hardly impacted at all (both warm and cold cache tests).
- Comparing run times for parallel vs. serial plans (i.e. maxdop = 0 vs. maxdop = 1), you should notice that warm-cache tests benefited significantly from a parallel operation (both fragmented and non-fragmented data), cold-cache tests with non-fragmented data were often impacted negatively, and cold-cache tests with fragmented data benefited marginally. This is as expected, since parallelizing a plan is all about helping performance with added CPU power, not about improving IO throughput (the read-ahead manager is responsible for that). Warm-cache operations are all about churning through cached pages as fast as possible, so parallel plans help this significantly. Non-fragmented data in a cold-cache scan operation would be optimized by the read-ahead manager feeding a single sequential operator as much as possible - adding additional threads to read additional data wouldn't help this type of operation much at all. With fragmented data, additional threads requesting different data pages may help marginally since the read-ahead manager can't do its job optimally (non-contiguous pages); again, though, the benefit will typically be marginal at best. Since the disk system and IO operations are where fragmentation introduces the most pain, optimizing IO throughput is where benefits will be seen, not in adding CPU power.
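To reproduce a single cold-cache vs. warm-cache comparison by hand, something along these lines works (the table name is a placeholder, and DBCC DROPCLEANBUFFERS belongs on test servers only, never production):

```sql
SET STATISTICS IO ON;   -- report physical/logical/read-ahead reads
SET STATISTICS TIME ON; -- report CPU and elapsed time

-- Cold cache: write dirty pages, then flush clean buffers (test only!)
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
SELECT COUNT(*) FROM dbo.TestTable OPTION (MAXDOP 1);  -- serial plan

-- Warm cache: run again without flushing; physical reads should drop
-- to near zero, leaving only logical reads.
SELECT COUNT(*) FROM dbo.TestTable OPTION (MAXDOP 1);

-- Same warm-cache scan, but allowing a parallel plan:
SELECT COUNT(*) FROM dbo.TestTable OPTION (MAXDOP 0);
```

Comparing the STATISTICS IO/TIME output across the three runs shows the physical-read and CPU-parallelism effects described above in miniature.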
- A clustered object typically inherently contains a bit more "fragmentation" than a heap, particularly with insert-heavy workloads. A big difference between a heap and a clustered table is that in a cluster, the storage structure is a B-tree (balanced tree) where the leaf pages are the data; above the leaf there are intermediate pages and a single root page. These non-leaf levels of the B-tree are maintained as data is modified within the index. A heap does not have these non-leaf pages, since a heap is not a B-tree; it is simply a bunch-o-pages that are allocated as data is modified in the table. So, as data is inserted into a heap, the engine will simply try to find a free page somewhere; if one does not exist, it will allocate a new extent and then continue using pages. There is no maintenance of non-leaf pages here, since they don't exist in a heap - in this way, no matter how much data you continue to add, as long as the extents next in the file haven't been taken up by something else, you'll get relatively contiguous data pages. With a cluster, as data is inserted, even if it's ever-increasing data like an identity or date/time value, as the non-leaf pages fill up, additional non-leaf pages will have to be allocated to store the data for the upper levels of the index.
- Using a serial build (maxdop = 1) and using the sort_in_tempdb option can help significantly with building indexes that are more contiguous when initially built. When SORT_IN_TEMPDB is set to OFF, the default, the sort runs are stored in the destination filegroup. During the first phase of creating the index, the alternating reads of the base table pages and writes of the sort runs move the disk read/write heads from one area of the disk to another: the heads are in the data page area as the data pages are scanned, they move to an area of free space when the sort buffers fill and the current sort run has to be written to disk, and then move back to the data page area as the table page scan is resumed. The read/write head movement is greater in the second phase, when the sort process is typically alternating reads from each sort run area. Both the sort runs and the new index pages are built in the destination filegroup, which means that at the same time the Database Engine is spreading reads across the sort runs, it has to periodically jump to the index extents to write new index pages as they are filled. If the SORT_IN_TEMPDB option is set to ON and tempdb is on a separate set of disks from the destination filegroup, during the first phase the reads of the data pages occur on a different disk from the writes to the sort work area in tempdb. This means the disk reads of the data keys generally continue more serially across the disk, and the writes to the tempdb disk are also generally serial, as are the writes to build the final index. Even if other users are using the database and accessing separate disk addresses, the overall pattern of reads and writes is more efficient when SORT_IN_TEMPDB is specified than when it is not. The SORT_IN_TEMPDB option may improve the contiguity of index extents, especially if the CREATE INDEX operation is not being processed in parallel.
The sort work area extents are freed on a somewhat random basis with regard to their location in the database. If the sort work areas are contained in the destination filegroup, as the sort work extents are freed, they can be acquired by the requests for extents to hold the index structure as it is built. This can randomize the locations of the index extents to a degree. If the sort extents are held separately in tempdb, the sequence in which they are freed has no effect on the location of the index extents. Also, when the intermediate sort runs are stored in tempdb instead of the destination filegroup, there is more space available in the destination filegroup. This increases the chances that index extents will be contiguous.
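Putting those two recommendations together, a serial build with the sort placed in tempdb looks like this (the index and table names are illustrative):

```sql
-- Serial build with sort runs in tempdb: per the discussion above,
-- this combination tends to produce the most contiguous index extents.
CREATE CLUSTERED INDEX CIX_TestTable
ON dbo.TestTable (id)
WITH (SORT_IN_TEMPDB = ON, MAXDOP = 1);
```

Remember the caveat above: SORT_IN_TEMPDB only helps head movement if tempdb actually lives on separate spindles from the destination filegroup, though it can still improve extent contiguity either way.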
- Note that the performance of a multi-file index on the same spindles is more like the extent-fragmented performance than the contiguous performance. Of course, this is my desktop system and not a high-end many-spindle SAN/DAS system, which could obviously produce different results. The key thing to understand is that multi-file systems introduce some level of extent fragmentation into a system in exchange for the added benefit of IO throughput and space (assuming you have a properly configured IO system with multiple spindles and IO paths and a bus that isn't overloaded, etc.). If you'd like to see what a difference it would make, rerun this same test on the same database after adding the -T1118 and -E startup switches and restarting the server. You would notice that the fragment sizes go back up to the 30-40 range, and the fragment count drops back down to near single-file-database numbers. This shows what type of benefit the -E and -T1118 startup switches can provide when used appropriately.
- While running the tests, consider monitoring the impact on your IO system and cache. If you look at the buffer pool cache size and IO sizes during the above tests, you should notice that when performing contiguous IO operations, the avg. IO sizes are consistently above 64K, perhaps even up to 256K. When performing non-contiguous IO operations, the avg. IO sizes are most likely consistently lower than 64K - the closer those sizes get to 8K, the more IO operations are necessary to transfer the same amount of data, and hence the longer the run times.
- Pay close attention to the impact that a defrag vs. a rebuild has on defragmenting structures, and correlate that back to what you know about performance - it might make you rethink your current defragmentation plan.
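For reference, the two maintenance options being compared in that final section use the SQL Server 2005 ALTER INDEX syntax (object names illustrative):

```sql
-- Defrag (REORGANIZE): always online; compacts and re-orders the leaf
-- pages in place with minimal extra space, but cannot address extent
-- fragmentation as thoroughly as a rebuild.
ALTER INDEX CIX_TestTable ON dbo.TestTable REORGANIZE;

-- Rebuild: recreates the index from scratch; can be combined with the
-- serial/sort-in-tempdb options discussed above for the most contiguous
-- result. (ONLINE = ON would additionally require Enterprise/Developer
-- edition.)
ALTER INDEX CIX_TestTable ON dbo.TestTable
REBUILD WITH (SORT_IN_TEMPDB = ON, MAXDOP = 1);
```

The script's before/after fragmentation output makes the trade-off concrete; the snippet above is just the shape of the commands involved.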
Enjoy, post or email any questions!