Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server Fragmentation Storage basics and Access Methods Part 1 of 9


By:   |   Read Comments   |   Related Tips: 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | More > Fragmentation and Index Maintenance

ALERT: Did you know 66% of DBAs say their workload is increasing! - Click here to learn more.


Lots of times I get customers and non-customers talking about fragmentation - everything from what it is, to how it impacts performance, to what objects can be fragmented, to how to check for fragmentation. Quite often (almost always) the discussion inevitably includes lots of points that are not 100% accurate, or open to interpretation depending on what exactly someone is referring to using a particular terminology. There are also quite a few misnomers, or myths even, that exist regarding fragmentation and related topics.

So, let's try to understand some of these different things. In this and a few related posts, I'm going to try and discuss some of the following topics:

Inevitably (must be my word of the day, 2nd time I've used it in this post), along the course of discussing this topic, we'll have to hit on some other topics such as:

  1. Storage characteristics
  2. Clusters, Heaps, B-Trees, etc.
  3. Access Methods
  4. etc.

So, let's get this party started...in this post, we're going to discuss some basics that need to be at least touched on in terms of storage and access methods. aI'm going to provide minimal detail here, since you can easily get much more detailed information elsewhere.

Pages

A page in Sql Server is the primary storage structure for all data (data, indexes, BLOB/CLOB/LOB, row-overflow/SLOB, etc.). Pages are 8k in size and store records in no particular order (row offset is used to find each logical record on the page) - a very basic structure is as follows:

A page in Sql Server is the primary storage structure for all data

Image originally created by and courtesy of Paul Randal from SQLSkills.com from various presentations.

Extents

An Extent in Sql Server is the primary allocation structure (i.e. what is allocated to objects when they request more space to store stuff). Extents are made up of 8 contiguous pages (64k) and can be either shared or uniform in nature. Yes, this means anytime new storage space is needed for data, or an index, or a heap, it is allocated a single, full, contiguous extent (8 contiguous pages). The only exceptions to this are when space is allocated for a new object and the 1118 trace flag is not enabled, or when you have the -E startup option enabled, or if space is being allocated for a very small object that is less than ~64k in size. We'll discuss the -E startup option in a later blog and how it can be helpful, for information on the 1118 trace flag, see this KB article: http://support.microsoft.com/kb/328551. We aren't going to discuss the differences between shared and uniform extents, but a simple read in Books Online will get you information to cover that.

HoBT

HoBT (pronounced "hobbit") stands for "Heap or B-Tree" and is the logical name for how all user data is stored in Sql Server (either in a heap or a B-tree).

Heap

A heap is basically a bunch of pages with no particular structure (aside from that of a page/extent), no order, no linkage. If you have a table that has no Clustered Index defined on it, then the data for that table is stored in a heap. Heaps are logically a flat structure (i.e. they are made up of only data pages, no root/intermediate pages). Given that there is no order, structure, or page linkage in a heap, there is no direct support in a heap for singleton lookups or range scans.

B-Tree

A B-Tree (balanced-tree) is a balanced, structured object, and it is used for all indexes in Sql Server (Clustered, Nonclustered, XML, etc.). Think of a B-Tree as a triangle-shaped structure of pages - it has a single root page at the top of the triangle with pointers to the next level down the triangle (which is wider than the root level by the number of pages that the root page can contain pointers to), each of those pages contain pointers to the next level, and so on (these are called intermediate pages) down the triangle until you get to the bottom of the triangle, which is the widest and referred to as the 'leaf' level of the structure. In ASCII x marks, a B-tree would look like the following if it contained 2 intermediate levels where each page in the root and intermediate levels could each could point to 3 other pages:

x
xxx
xxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxx

What resides at the 'leaf' level of the structure depends on what type of index the B-tree supports. If it is a clustered index, then the leaf level of the index contains all the data for all the columns for all the rows of the table; if it is a non-clustered index on a clustered table it contains the non-clustered index keys, the included column values, and the keys for the clustered index (uses those as a pointer to the rest of the data); if it is a non-clustered index on a heap table, it contains the non-clustered index keys, the included column values, and the physical row-id value for the record in the heap (uses this to get a pointer to the rest of the data).

Data at the leaf-level of a B-Tree is logically ordered in order of the index key - this applies to both clustered and non-clustered indexes (ever heard that the data is physically sorted this way? not the case...this is exactly what logical fragmentation is, the logical ordering of pages in an index not matching the physical layout within the file). This logical ordering is achieved via the "row offset"/"slot array" for records on a page, and via a doubly-linked list for pages in an index. A doubly-linked list is basically a pointer in the header of each page that points to the prior logical page and the next logical page in the index. We'll discuss this and how it relates to fragmentation more later...

Seek

A seek is an efficient access method that can be fulfilled using an index structure. Seeking touches fewer pages than scanning, and can only occur on an index of some type. Think of a seek as what you would do with a phone book if I told you to find the phone number of someone with last name 'Boyd' and the first name 'Chad'...

Scan

A scan is basically an access method whereby all data, or some range of data, in an index or heap must be touched or retrieved in order to fulfill a request. Think of a scan as what you would do with a phone book if I told you to find the phone numbers for everyone in the book, or the phone number for all people with the first name 'Chad', or for someone with a phone number of (555) 555-5555...

Singleton Lookup

A singleton lookup is a seek operation whereby a single record/page of data is retrieved. This can occur for example when you perform a query that needs to access only a single record, or a few records from a single page. The navigation through a B-tree in a singleton lookup would be similar to the following diagram:

A singleton lookup is a seek operation

Image originally created by and courtesy of Paul Randal from SQLSkills.com from various presentations.

Full Scan - key ordered

A key-ordered full scan is a scan operation whereby all the leaf pages of a given B-tree structure are retrieved by following the page linkage (i.e. starting with the first page in the leaf level and following to the next page via the doubly-linked list, and so on and so on). Note that this type of scan can only be performed on a B-tree structure and NOT against a heap. This occurs for example when you perform a query that must return all the rows and columns in a clustered table, or only a few rows but the filter predicate has no index to seek with, or against a non-clustered index if you want all rows for a table but only values from columns that are covered by the non-clustered index. (navigational diagram follows under Range Scan)...

Range Scan

A range-scan is a scan operation whereby a range of pages are retrieved - think of this as a full-scan with boundaries. The navigation through a B-tree in a range scan would be similar to the following diagram (a full scan operation would be exactly the same, only the entire leaf would be scanned instead of just a portion of it):

A range-scan is a scan operation whereby a range of pages are retrieved

Image originally created by and courtesy of Paul Randal from SQLSkills.com from various presentations.

Full Scan - Allocation Ordered

An allocation ordered full scan is the same as a key ordered full scan, only instead of scanning the data via page linkage, the pages from the leaf of the index (or from a heap) are scanned via page ID values taken from system pages that track which pages are allocated to the given index/heap (pages that track this are called 'IAM' pages). Basically all the physical page ID values are gathered for pages allocated to the given index/heap, and then those pages are retrieved directly - even in a B-tree structure, the root/intermediate pages for the index would not be touched at all (no reason to do so, since the physical page ID values take you directly to the leaf pages needed). This is the ONLY type of operation that is directly supported against a heap (though seeks and range-scans can occur indirectly via non-clustered indexes that are built against the heap table). The navigation through a B-tree in this type of scan would be similar to the following (notice that the root/intermediate levels aren't touched)...navigation through a heap would be similar as well, only a heap wouldn't include root/intermediate pages at all in the structure:

An allocation ordered full scan is the same as a key ordered full scan

Image originally created by and courtesy of Paul Randal from SQLSkills.com from various presentations.

Read-Ahead

Read-ahead is an access method mechanism that attempts to intelligently pre-fetch data that resides on-disk into the data cache prior to it being needed for use by the CPU - this is done to try and optimize the IO throughput of a system in order to keep the CPU as busy as possible without waiting on the slower IO system to get needed data. This type of operation is typically reserved for scans of data that fetch medium/large amounts of pages/data, and can issue 1,8,32, or 64 page IOs in a single IO operation - the size of operation that can be performed is very heavily impacted by the contiguity of the data (the more contiguous the data, the bigger the IOs that can performed, the better performance you see). There are 2 kinds of read-ahead operations, 1 that works with allocation ordered scans, and 1 that works with key-ordered scans.

With allocation ordered scan, the IAM pages in the database list the extents used by a heap or index - the engine will read the IAM and build a sorted list of the physical disk addresses that must be read, allowing the engine to optimize it's IOs as large sequential read performed in sequence based on their location on disk, vs. multiple small, random reads.

With key-ordered scans, the engine uses the information stored in the intermediate index pages 1 level above the leaf level to schedule serial read-aheads for pages that contain the keys found. If a request is made for example for all keys from 1 to 100, the engine will first read the index page above the leaf page for key 1 (on it's way to traversing to the leaf page); however, instead of simply reading each leaf page in sequence from page 1 to page 100, the engine scans the intermediate level page and builds a list of the leaf pages that must be read to get pages 1 thru 100, then schedules all reads in key order - in addition, the engine will recognize if pages are contiguous and perform a single read to retrieve the adjacent pages in a single operation as opposed to multiple, smaller operations. A similar type of operation is used to pre-fetch data from a base cluster or heap when scanning through a non-clustered index - since the leaf rows of a non-clustered index contain pointers to the data rows in the cluster/heap structure, as the storage engine reads through the leaf of the non-clustered index, it also starts scheduling asynchronous reads for the corresponding data rows whose pointers have already been retrieved. This allows the engine to fetch data efficiently from the underlying cluster/heap before it has completed the scan of the non-clustered index.

The navigation for a read-ahead in a key-ordered scan would look something like the following:

The navigation for a read-ahead in a key-ordered scan would look something like the following:

Image originally created by and courtesy of Paul Randal from SQLSkills.com from various presentations.

Ok, that should do it in terms of some of the basics to cover around storage, allocation, and access methods in regards to what we need to understand when talking about Fragmentation. In the next post, we'll discuss what different types of fragmentation there are, what storage structures they apply to, and what access methods are impacted by each. In future posts, we'll delve into much more...



Last Update:


next webcast button


next tip button



About the author





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools