SQL Server 2008 Spatial Index Performance

Problem

The institution I work with has decided to migrate their database system to SQL Server 2008. One of the applications uses geospatial data, which consists of millions of rows. I understand that their are indexes that can be used for geospatial data, but have not worked with them. What’s the scoop on them?

Solution

Geospatial data has certainly become an important component in applications for everything from tracking packages via delivery services to the outdoor activity of geocaching. Before long these tables can grow to millions of records. Although SQL Server has indexes that assist in narrowing a result set, the storage of geospatial data as a binary format represents an interesting challenge in terms of indexing. SQL Server 2008 provides Spatial Indexes for this situation.

Like other indexes in SQL Server, spatial indexes utilize a B-Tree structure. B-tree structures, however, can represent data in only two dimensions, so the index re-factors the space into a four-level grid hierarchy, referred to as Level 1, which is the top level; Level 2; Level 3; and Level 4. Level One is more like a “bird’s eye view” of the covered area; each level below that takes a section of the previous level and further divides it into another grid. The numbering of the cells shows the hierarchy. The first cell in Level One is 1; the first cell in Level Two is 1.1 because it is a subsection of Level One. This hierarchical model is built into the query processor, which allows for minimal storage of the information needed. Each level contains the same number of same-sized cells, the size of which can be specified when creating the index. The more cells in an index, the more dense the grid is. So a grid with 64 cells is more dense than a grid with 16 cells.

When a spatial index is created, it first performs the refactoring of space. It then performs a tessellation process, whereby it fits the space represented by the data into the previously created grid hierarchy without gaps or overlaps. The tessellation process is governed by three rules:

Rule Name	Description
The covering rule	If an object completely covers a cell and overlaps into other cells then the cell is recorded in the index and no other attempt is made to maneuver other objects against that cell
The cells-per-object rule	A limit on the number of objects that tessellation can occur with in a given cell. This limitation is enforced in all levels except for Level One and can be customized when creating the index (the maximum values is 8192)
The deepest-cell rule	This rule takes advantage of the hierarchical model mentioned earlier by recording only the information required in the deepest cell

There are some restrictions on creation of a spatial index. First, spatial indexes can only be created on either a geometry or geography data type. Second, the table must have a primary key constraint with a maximum of 15 columns. Once a spatial index is created on a table the primary key composition cannot be modified. Third, the maximum size of index key records is 895 bytes. Fourth, spatial indexes cannot be created on indexed views.

Creating a Spatial Index Using SQL Server Management Studio

Navigate through the object tree to the Indexes section of the table on which the index will be created

Type in a name for the index and choose SPATIAL in the Index Type

Click Add in the Index Key Columns section and click OK

Choose the index key columns for the index

The choice for tessellation is tied to the data type on which the index will be created. On the left lower side (in the Progress area) you’ll see that an error was previously generated. I attempted to create an index on a geography data column using the GEOMETRY_GRID tessellation scheme.

On the left side of the main Index Creation are additional options for the index. Of most importance is the Spatial section. In this section you can specify the bounding box parameters (for indexes on geometry columns only) and grid density.

Creating a Spatial Index Using Transact-SQL

CREATE SPATIAL INDEX [NDX_Earthquake_EarthquakeGeoCode] <br>ON <br>[dbo].[Earthquake] ([EarthquakeGeocode]) <br>USING GEOGRAPHY_GRID <br>WITH <br>( <br>     GRIDS=(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH) <br>     , CELLS_PER_OBJECT = 64 <br>     , PAD_INDEX = OFF <br>     , SORT_IN_TEMPDB = OFF <br>     , DROP_EXISTING = OFF <br>     , ALLOW_ROW_LOCKS = ON <br>     , ALLOW_PAGE_LOCKS = ON <br>) <br>ON [PRIMARY]

Script Dissection
CREATE SPATIAL INDEX [NDX_Earthquake_EarthquakeGeoCode] ON [dbo].[Earthquake] ([EarthquakeGeocode])	This section specifies the index type, the index name, and the table and column the index will be created on. The syntax is essentially the same as for non-clustered index creation aside from the word SPATIAL
Geography data type USING GEOGRAPHY_GRID Geometry data type USING GEOMETRY_GRID	This section identifies which grid type to use (either GEOMETRY_GRID or GEOGRAPHY_GRID). The default for the grid is the corresponding data type. If you attempt index creation using a grid type different from the data type an error will be returned
Geography data type WITH ( GRIDS=(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH) , CELLS_PER_OBJECT = 64 Geometry data type WITH (BOUNDING_BOX =(0, 0, 1800, 1800)) GRIDS=(LEVEL_1 = HIGH,LEVEL_2 = HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH) , CELLS_PER_OBJECT = 64	This section specifies either the grid density and the number of cells per object (geography data type) or the bounding boxes (geometry data type). Again, when creating a spatial index for a geography data column the maximum number of cells per object is 8192 (an error will be returned if you try to specify a higher value). As for

Script Dissection

CREATE SPATIAL INDEX
[NDX_Earthquake_EarthquakeGeoCode]
ON
[dbo].[Earthquake]
([EarthquakeGeocode])

This section specifies the index type, the index name, and the table and column the index will be created on. The syntax is essentially the same as for non-clustered index creation aside from the word SPATIAL

Geography data type
USING GEOGRAPHY_GRID

Geometry data type
USING GEOMETRY_GRID

This section identifies which grid type to use (either GEOMETRY_GRID or GEOGRAPHY_GRID). The default for the grid is the corresponding data type. If you attempt index creation using a grid type different from the data type an error will be returned

Geography data type
WITH
(
GRIDS=(LEVEL_1 = HIGH,LEVEL_2 =
HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH)
, CELLS_PER_OBJECT = 64

Geometry data type
WITH
(BOUNDING_BOX =(0, 0, 1800, 1800))
GRIDS=(LEVEL_1 = HIGH,LEVEL_2 =
HIGH,LEVEL_3 = HIGH,LEVEL_4 = HIGH)
, CELLS_PER_OBJECT = 64

This section specifies either the grid density and the number of cells per object (geography data type) or the bounding boxes (geometry data type). Again, when creating a spatial index for a geography data column the maximum number of cells per object is 8192 (an error will be returned if you try to specify a higher value). As for

Performance Difference

To show the difference in performance let’s look at the number of earthquakes near my fictional getaway spot near Fiji. The approximate coordinates of my little getaway is 179.41 Longitude by -16.57 Latitude. I would like to see how many earthquakes’ epicenters occurred within 100 miles of my getaway. There are a little over 3100 earthquakes in the dataset (since February, 2009) , with 125 of them whose epicenter was close enough to Fiji so that the description of the location has the word Fiji in it. In the query I declare a geography type variable and set it equal to my getaway coordinates. Then I look for all earthquakes where the distance between it and my getaway are less than or equal to 100 miles (528,000 feet):

DECLARE @Getaway GEOGRAPHY <br>SELECT @Getaway = geography::STGeomFromText('POINT(178.31 -16.8)', 4326) <br>SELECT <br>     EarthquakeID <br>     , CONVERT(CHAR(10),EarthquakeTime,101) + ' ' + <br>           CONVERT(CHAR(8),EarthquakeTime,108) as EarthquakeTime <br>     , EarthquakeDescription <br>     , EarthquakeGeocode.Long as Longitude <br>     , EarthquakeGeocode.Lat as Latitude <br>     , EarthquakeGeocode.M as Magnitude <br>     , EarthquakeGeocode.Z as Depth <br>     , CONVERT(NUMERIC(4,1),@Getaway.STDistance(EarthquakeGeocode)/5280) as MilesAway <br><br>FROM <br>     MSSQLTIPS.dbo.Earthquake (nolock) <br>WHERE <br>     @Getaway.STDistance(EarthquakeGeocode) < 528000 <br>ORDER BY <br>     CONVERT(NUMERIC(4,1),@Getaway.STDistance(EarthquakeGeocode)/5280);

I executed the query three ways: without a spatial index or primary key, with a primary key but without the spatial index, and with both the primary key and spatial index. During the executions I captured the actual execution plan, IO statistics, and time statistics. Executing the query with no indexes at all ran as you would expect, a TABLE SCAN. Once a primary key constraint was created on the table I executed the query. Again, the execution plan showed exactly what was expected, a CLUSTERED INDEX SCAN. When the spatial index was created, a whole different scenario appeared:

Execution Plan after the spatial index was created

Clustered Index Only	After Spatial Index Creation
(51 row(s) affected) Table ‘Earthquake’. Scan count 1, logical reads 88, physical reads 0, read-ahead reads 0, lob logical reads 0 , lob physical reads 0, lob read-ahead reads 0. (1 row(s) affected) SQL Server Execution Times: CPU time = 218 ms, elapsed time = 357 ms. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.	(51 row(s) affected) Table ‘Earthquake’. Scan count 0, logical reads 136, physical reads 0, read-ahead reads 0, lob logical reads 0 , lob physical reads 0, lob read-ahead reads 0. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0 , read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table ‘extended_index_277576027_384000’. Scan count 2054, logical reads 4108, physical reads 0, read-ahead reads 0 , lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.(1 row(s) affected) SQL Server Execution Times: CPU time = 47 ms, elapsed time = 163 ms. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

Clustered Index Only

After Spatial Index Creation

(51 row(s) affected) Table ‘Earthquake’. Scan count 1, logical reads 88, physical reads 0, read-ahead reads 0, lob logical reads 0 , lob physical reads 0, lob read-ahead reads 0. (1 row(s) affected)

SQL Server Execution Times: CPU time = 218 ms, elapsed time = 357 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

(51 row(s) affected) Table ‘Earthquake’. Scan count 0, logical reads 136, physical reads 0, read-ahead reads 0, lob logical reads 0 , lob physical reads 0, lob read-ahead reads 0. Table ‘Worktable’. Scan count 0, logical reads 0, physical reads 0 , read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table ‘extended_index_277576027_384000’. Scan count 2054, logical reads 4108, physical reads 0, read-ahead reads 0 , lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.(1 row(s) affected)

SQL Server Execution Times: CPU time = 47 ms, elapsed time = 163 ms.

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.

Performance is certainly enhanced, even in the relatively small earthquake dataset I was working against. You can experiment with the cell density and other aspects of the spatial index to meet the needs of your organization. Although it doesn’t usually make sense to have multiple indexes covering the same table column, multiple indexes on a spatial data column can be used with different tessellation parameters in certain conditions.

Next Steps

Many thanks to GeoCommunity for sending me the daily newsletter that contains both the geospatial news of the day as well as recent earthquakes around the globe!
If you are interested in learning more about spatial indexes and would like to use the data I collected, click here for a Zip file containing both a flat file and XML file
Read the Spatial Indexing Overview on Microsoft TechNet
Stay tuned for more Spatial Data Storage Tips

Tim Cullen

Tim has been working in the IT industry since 2003. After spending a few years as an “IT Generalist”, his career focus turned to SQL Server and application development. He currently works as an IT Specialist for a Federal agency in Maryland, where he focuses on SQL Server administration and providing guidance to teams on SQL Server performance issues. When not working, he enjoys playing golf, playing bass guitar, and exercising.

MSSQLTips Awards: Trendsetter (25+ tips)

Creating a Spatial Index Using SQL Server Management Studio

Creating a Spatial Index Using Transact-SQL

Performance Difference

Leave a ReplyCancel Reply