SQL Server Fragmentation How to avoid it (Part 4 of 9)
In the prior post, we discussed the major causes for each type of fragmentation, which followed posts covering storage basics and access methods and what fragmentation is and the types there are. In this 4th post in in the series, we'll discuss what can be done to avoid each type of fragmentation to the extent possible. Also, as in prior posts, we won't talk about things in this post that have been covered in the prior posts, so take a look back to posts 1, 2, and 3 as needed.
Given that the primary contributor to logical fragmentation is page splitting, the key to avoiding logical fragmentation is to do everything possible to avoid page-splitting. This can be done in a couple of different ways:
- Choose index keys that are either ever-increasing values or ever-decreasing values. In this type of index setup, newly inserted records will always be placed at either the beginning or end of the logical page-chain, and hence lead to very few (if any) page-splitting as a result of insert operations. You would still see page splitting at the root/intermediate level of the index, but rarely (if ever) at the leaf level of the index. If you are using GUID values for index keys, with SQL Server 2005, consider using the new function "newsequentialid()" to generate ever-increasing GUID values instead of random location values.
- Choose static index keys. If an index key value never changes, it will never have to be moved from the position it is placed in originally as a result of an update to the key values. If an index key is non-static, when the value(s) changes, the record is nearly guaranteed to require being moved to the appropriate new logical position within the index, thereby causing 2 potential side-effects: a) the page that the record was originally on will have a lower density (at least for some time) due to the empty space left by the record moving to a new location - and b) the new page that the record is moved into will likely incur a page-split operation (assuming the new page is already at or near full capacity).
- Use an appropriate Fill Factor setting. Notice quickly that I didn't say "use a fillfactor"...I said use an appropriate setting. In the majority of scenarios, the default fillfactor (zero...equivalent to 100) setting is appropriate. Many times, fillfactor settings are used without gathering any statistical information to back-up the reason for choosing the setting, which leads to wasted space on pages, a bloated buffer pool, and low page densities. If you've used/are using fillfactor settings that are not the default, are you really sure you're using the appropriate value? What tests/analysis did you do to determine the value you are using? Discussing all that would go into picking the appropriate fillfactor value would be a post or 2 on its own, maybe we'll get to write up some information on that in a future post. If the page density values for indexes that have a non-default fillfactor value set are consistently close to the fillfactor value, that's a good starting sign that the appropriate fillfactor is probably not in use. On the other hand, if you are using a default fillfactor setting, and your indexes are consistently showing high logical fragmentation values and low page densities after an appropriate amount of activity (appropriate depends on your environment), that's a good starting sign that a non-default fillfactor might help.
Remember that using a non-default fillfactor is in effect introducing a form of logical fragmentation up front (building the index with a lower-than-optimal page density) to help avoid heavy amounts of logical fragmentation in the long run. Remember too that all forms of fragmentation other than low page density only impact performance on transfers of data from disk to memory and vice versa. Therefore, if the benefit your workload gets from having contiguous pages on disk and thereby being able to perform faster sequential reads (traditional OLAP type systems) outweighs the impact of having lower-than-optimal page densities in cache, a lower fillfactor value is going to be more effective than scenarios where sequential reads do not occur too frequently (traditional OLTP type systems).
There are some scenarios where it generally makes absolutely no sense to have a non-default fillfactor setting:
- Read-only workloads. Why would you have a fillfactor here? There is absolutely no good reason for it - if there are no writes occurring, there's no potential for any type of fragmentation to be introduced.
- Indexes with a ever-increasing or ever-decreasing value. Given that inserts will always be positioned at the beginning or end of the index, where would a fillfactor help? The only scenarios this would help would be when this applies but the index keys are not static (rare) and they would be updated a significant amount of the time, or a significant portion would be updated at a given time (rare again), or if variable length columns within the records are updated from small values to large values frequently (thereby causing a growth of size in the record, potentially causing it to need to move to another page). These scenarios are quite rare when using these types of keys, but they do happen.
- Indexes on identity values. This is basically the same as #2, but it is done so often, I want to drive it home with a special bullet.
To avoid/minimize extent fragmentation, you need to try and minimize the mixing of allocation activity of different objects/indexes within a given file (see the prior post for information on how this leads to extent fragmentation). There are a couple of options/tools/methods we can use to try and minimize this:
- Use multiple files/groups appropriately. It is a common misconception that adding multiple files to a given database will somehow improve performance/IO throughput. I plan to write a post on this in the future with much more detail, but there are a few great posts by Paul Randal and the PSS team from Microsoft that discuss it a bit (like many myths with SQL Server, this seems to have become a common misconception by nature of actually being true for a given scenario (tempdb, allocation workloads) and just blindly being applied to all other scenarios...). Using multiple files/groups does have many benefits, but let's discuss the terms of how they can be used to avoid extent fragmentation for now. If you have a few objects/structures that seem to be very heavy with new space allocations, one approach to avoid interleaved extents between each of the structures would be to dedicate a single file in a single filegroup to each of these structures. Assuming the files are sized appropriately, and they are dedicated to supporting only a single object/structure, all allocations within that file would be contiguous. Of course, using multiple files for a single structure (or multiple structures) can actually have the opposite effect, as you'd actually be introducing extent fragmentation across each file for the given object. So, use them carefully.
- Consider using the -E startup option. This option changes the default database engine behavior from allocating a single extent per allocation request to 4 extents per allocation request. Basically, this would ensure that for each allocation you have 4 contiguous extents for the given object instead of just 1. Interestingly, this option is used in the TPC-H SQL Server TPC tests (possibly others as well, I didn't read them all :-)). This can have a drastic positive impact on the amount of extent fragmentation you see on your system. Of course, in a heavy-allocation workload, this could have negative impact as well, particularly in high-concurrency type applications (although, if you have a very high-end SAN, it might help there too). It would all depend on your scenario and workload, but it's worth knowing about and testing in your environment if extent allocations and fragmentation is an issue. You can read more about this switch in this KB article.
- As a final thought, you can also consider disabling mixed-extent allocations all-together. This generally will only help in allocation-intensive type workloads, but in general, I've always figured the benefits of doing so far outweigh the con (64kb of space per object...not too much in today's world really). You do this using the -T1118 trace flag, which is most famous from the tempdb concurrency topic here.
File-level fragmentation can be avoided primarily by ensuring you size your SQL data/log files to the appropriate size when the database is first created. The cost of storage is so cheap today, that it is rare to find systems any longer where storage needs to be added to the system to support unexpected growth before the system gets replaced entirely anyhow. This being the case, it's generally possible to grow your data/log files large enough initially to account for years worth of growth (yes, there are exceptions). If this isn't possible, sizing of the data/log files should be performed manually in the largest growth interval as is possible, since a single grow operation will to the best of the ability of the components provide contiguous space on-disk. Do not rely on the auto-grow feature of SQL Server to manage your file sizes and growth cycles for you - aside from other factors like when the file will grow, locking and concurrency issues, etc., etc., it is best for you to proactively manage the sizes and growth of the files for this reason alone. Note that I'm not saying to disable/not use the auto-grow feature, just don't rely on it for day-to-day file growth management.
File-level fragmentation is also typically the least concern you'll have for your systems - again, there are exceptions, but given other bottlenecks, the penetration of advanced SAN/DAS type IO systems in the market today, and the fact that most systems today are dedicated SQL Servers, those exceptions will be rare. If you do your best to ensure you are managing the file sizes and growth cycles of your data/log files, chances are very good that File-level fragmentation is not what you should be worrying about...
That wraps up the primary ways to avoid each type of fragmentation, next post we'll cover how each type of fragmentation affects/impacts performance (and when it doesn't...).
About the author
View all my tips