SQL Server Fragmentation: What It Impacts and What It Does Not (Part 5 of 9)
In post #4 in the series, we talked about ways to avoid each type of fragmentation. In this 5th post, we'll discuss how each type of fragmentation impacts performance, what to expect in terms of affected operations, and when you might not see any impact from fragmentation at all. As before, we won't re-cover material from the four prior posts, so consider reviewing posts 1, 2, 3, and 4 if anything seems fuzzy.
Perhaps the most common question I get from customers regarding fragmentation, after explaining the internals, the types, what causes it, and how to avoid it, is something like: "That's great to know, but what's the bottom line? What performance impact does it cause, and when should I be concerned?" A good question indeed, and the answer isn't always what many people expect, or what they've been told in the past. As with anything in technology, the real answer depends on the specific combination of scenario, workload, infrastructure, and hardware, but some generalizations can serve as rules of thumb. Rather than working through each type of fragmentation one by one, I'm going to point out generalizations to keep in mind for the different types of operations performed against different structures in typical workloads, and how each is impacted by the different types of fragmentation. I also won't discuss file-level fragmentation much, given that it is (a) hard to reproduce consistently and (b) usually not an issue in the majority of server installations today. Let's get to it:
- Most fragmentation impacts only physical I/O operations, not logical operations against pages already in cache. In other words, if you have a very fragmented system but the entire database fits in the cache/buffer pool, typical operations will not be affected by fragmentation (with the exception of page density, as discussed in prior posts). Assume for a moment that you have a system where a given index can:
  - Fit entirely in cache
  - Remain there for the duration of the service operation
  In this scenario, the performance difference between that index being 99% fragmented and 0% fragmented (assuming the same or very similar page densities) would be minimal. The only times you might see an impact would be during the initial cache warm-up (i.e., the first and only time the index data is pulled from disk into cache) and potentially during some checkpoint operations.
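To reason about whether an index is likely to be served from cache at all, you can check how many of its pages are currently resident in the buffer pool. A minimal sketch, run in the context of the database in question; the index name `IX_MyIndex` is a placeholder:

```sql
-- Count how many of a given index's in-row data pages are currently
-- resident in the buffer pool (placeholder index name: IX_MyIndex).
SELECT COUNT(*) AS cached_pages
FROM sys.dm_os_buffer_descriptors AS bd
JOIN sys.allocation_units AS au
    ON bd.allocation_unit_id = au.allocation_unit_id
JOIN sys.partitions AS p
    ON au.container_id = p.hobt_id
   AND au.type IN (1, 3)            -- IN_ROW_DATA, ROW_OVERFLOW_DATA
JOIN sys.indexes AS i
    ON p.object_id = i.object_id
   AND p.index_id  = i.index_id
WHERE bd.database_id = DB_ID()
  AND i.name = 'IX_MyIndex';
```

If `cached_pages` is close to the index's total page count, reads against it are mostly logical, and physical fragmentation matters far less.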
- Singleton seeks into an index are not impacted by fragmentation, even seeks that require physical I/O. Take the same scenario outlined in the bullet above and you'll get a similar result: singleton seeks against a heavily fragmented index and against a minimally fragmented one will perform very similarly. If you think about how a singleton lookup works, this makes perfect sense: the storage engine must traverse and touch the same number of pages to reach a given record in a leaf page no matter where on disk those pages reside. Scans (full and range), however, are impacted very significantly, more so by logical fragmentation than by extent fragmentation, though noticeably by both.
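To see the logical fragmentation and page density figures these bullets refer to, you can query `sys.dm_db_index_physical_stats`. A sketch, assuming a placeholder table name `dbo.MyTable` (the `SAMPLED` mode is used because the `LIMITED` mode does not populate page density):

```sql
-- Inspect logical fragmentation and page density for a table's indexes.
-- 'dbo.MyTable' is a placeholder name.
SELECT i.name AS index_name,
       ps.avg_fragmentation_in_percent,    -- logical fragmentation
       ps.avg_page_space_used_in_percent,  -- page density
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.MyTable'), NULL, NULL, 'SAMPLED') AS ps
JOIN sys.indexes AS i
    ON ps.object_id = i.object_id
   AND ps.index_id  = i.index_id;
```

Per the discussion above, a high `avg_fragmentation_in_percent` mainly threatens scan-heavy access patterns, while a low `avg_page_space_used_in_percent` wastes space in both disk and cache regardless of access pattern.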
- Allocating a single index or heap across multiple files can produce side effects similar to extent fragmentation (i.e., interleaving allocations of multiple indexes among each other). Plan and manage the use of multiple files accordingly.
- Heaps are a flat structure with no linkage of any sort between pages. This means heaps are never impacted by logical fragmentation (they can't be, since there is no logical ordering of the pages). Additionally, because they are flat structures with no root or intermediate pages (unlike a b-tree), they can't incur fragmentation from non-leaf page organization either. Of course, they also don't support seek operations directly; an associated nonclustered index is needed for that.
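You can confirm this with `sys.dm_db_index_physical_stats` against a heap: for `index_id = 0` the fragmentation column reports extent fragmentation rather than logical fragmentation. A sketch, with `dbo.MyHeap` as a placeholder table name:

```sql
-- For a heap (index_id = 0), avg_fragmentation_in_percent reflects
-- extent fragmentation; there is no logical fragmentation to measure.
-- 'dbo.MyHeap' is a placeholder name.
SELECT ps.index_type_desc,                 -- 'HEAP'
       ps.avg_fragmentation_in_percent,    -- extent fragmentation here
       ps.avg_page_space_used_in_percent,  -- page density still applies
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.MyHeap'), 0, NULL, 'DETAILED') AS ps;
```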
- Advanced storage systems available today (high-end SANs, large-scale DAS systems) can minimize the effects of fragmentation on performance: many have large caches from which many reads are fulfilled, read requests to a single LUN are generally spread across tens to hundreds of spindles, and the fabric/hardware carrying the data is highly performant. In one case study I was involved with a few years back, fragmentation degraded performance on smaller I/O systems by upwards of 250%, whereas the same workload against a large, enterprise-scale SAN was impacted by only about 30%.
- Data warehousing/OLAP/DSS workloads will typically suffer much greater performance impact than traditional OLTP workloads (primarily due to the point discussed in the first bullet). DSS workloads typically perform very large reads against large amounts of data, and when that data is contiguous the read-ahead manager can issue many large reads well ahead of CPU processing, yielding large performance benefits.
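One simple way to observe read-ahead behavior during a large scan is `SET STATISTICS IO`. A sketch, with `dbo.FactSales` as a placeholder table name:

```sql
-- Observe physical vs. read-ahead reads for a large scan.
-- 'dbo.FactSales' is a placeholder table name.
SET STATISTICS IO ON;

SELECT COUNT(*)
FROM dbo.FactSales;

SET STATISTICS IO OFF;
```

The Messages output reports logical reads, physical reads, and read-ahead reads for the scan. Comparing the read-ahead counts (and overall scan duration) for the same query before and after defragmenting the index is a straightforward way to see the effect described above on your own hardware.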
That's a good start for each. In the next post, I'll provide a script you can walk through on your own machines to help validate and explore the points discussed above.
Article Last Updated: 2008-02-17