Learn more about SQL Server tools

mssqltips logo
giveaway
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Comparing SQL Server Datatypes, Size and Performance for Storing Numbers


By:   |   Read Comments (5)   |   Related Tips: 1 | 2 | 3 | More > Data Types

Attend a SQL Server Conference for FREE >> click to learn more


Problem

When more than one numeric SQL Server data type may be suitable for a field in a project, which data type should I choose? What implications might the choice of a data type have for storage and performance in SQL Server?

Solution

SQL Server has nine different data types for storing numeric information, but they have differences that may make them more or less suitable for different uses. Clearly, if you need to store a decimal value than the INT data type is inappropriate. Similarly, you would use DECIMAL or NUMERIC, instead of REAL, if you wanted to store an exact value since REAL stores an approximate value.

But there are times when there is more than one data type which could hold the values needed. For instance, BIGINT or DECIMAL with its default values could almost always be used in place of INT. However, there are performance and size reasons to generally prefer the smaller data type. To look at this let us create three tables with 150 million rows and give them primary keys, but no other indexes. One will use INT, the other BIGINT, and the final one will use DECIMAL.

use test
GO

/* Purge any tables that already exist to start fresh*/

if OBJECT_ID('IntTbl', 'U') is not NULL
 drop table dbo.IntTbl

if OBJECT_ID('BigIntTbl', 'U') is not NULL
 drop table dbo.BigIntTbl

if OBJECT_ID('DecimalTbl', 'U') is not Null
 drop table dbo.DecimalTbl

/* Create the tables.
Each table will have three columns, mostly to ensure 
the rows are relatively sizable and to magnify the 
difference made by the size of the data types to make it easier to see.
The first column will be a direct count up to their number of rows. 
The second column will start at the maximum number of rows and go to 0.
The third column will be the first column times two.
Each table will have a primary key and its associated clustered index,
but will have no nonclustered indexes.*/

create table IntTbl 
 (col1 Int primary key, --No nonclustered IDX, but we'll have a PK
 col2 Int,
 col3 Int)

create table BigIntTbl 
 (col1 BigInt primary key, 
 col2 BigInt,
 col3 BigInt)

--Default Decimal, which has a default precision of 18 and scale of 0
create table DecimalTbl
 (col1 Decimal primary key, 
 col2 Decimal,
 col3 Decimal)

/* Create a variable to store the maximum number to insert.
Helps ensure all tables have the same number of inserts */
declare @maxValue int 
set @maxValue = 150000000 --150 Million

/*Now populate the tables.
The Tablock table hint is used to avoid potential issues with 
lock resources.
More one the use of TABLOCK at: 
http://timothyawiseman.wordpress.com/2014/05/26/dealing-with-sql-server-lock-resource-errors/
*/


insert into dbo.IntTbl with (TABLOCK) (col1, col2, col3) 
select top (@maxValue)
 row_number() over (order by s1.name) as col1,
 @maxValue - row_number() over (order by s1.name) as col2,
 row_number() over (order by s1.name) * 2 as col3
from master.dbo.syscolumns s1,
 master.dbo.syscolumns  s2


insert into dbo.BigIntTbl with (TABLOCK) (col1, col2, col3) 
select top (@maxValue)
 row_number() over (order by s1.name) as col1,
 @maxValue - row_number() over (order by s1.name) as col2,
 row_number() over (order by s1.name) * 2 as col3
from master.dbo.syscolumns s1,
 master.dbo.syscolumns  s2

insert into dbo.DecimalTbl with (TABLOCK) (col1, col2, col3)  
select top (@maxValue)
 row_number() over (order by s1.name) as col1,
 @maxValue - row_number() over (order by s1.name) as col2,
 row_number() over (order by s1.name) * 2 as col3
from master.dbo.syscolumns s1,
 master.dbo.syscolumns  s2

I set up a Python script to time a number of different possible operations on each of these tables. Each test was run 50 times with the results averaged. For instance, I had the server return a simple row count. In another test, I selected 100 values arbitrarily and then had the server return the rows with those values in the primary key.

Row Count Time Results
Select 100 Values Time Results

I also ran several other tests that aren't displayed to preserve space. In each of these tests, the table using DECIMAL came out substantially worse than the tables using INT and BIGINT. The picture with INT and BIGINT was much less clear and less consistent. The differences were often small between IntTbl and BigIntTbl. BigIntTbl occasionally outperformed IntTbl. But in most situations, INT came out slightly faster than BIGINT.

The results on the space used were perfectly clear. We can see this by running sp_spaceused against each of the tables. Although all three tables hold exactly the same data, the BigIntTbl is more than 60% larger than IntTbl, and DecimalTbl is larger still. To be clear, at this point each table has a clustered index, but no nonclustered indexes.

The data types will also have a major impact on the size of any nonclustered indexes we create. If we put a nonclustered index on the second column for each of them using:

CREATE UNIQUE NONCLUSTERED INDEX IntTblIndex ON dbo.IntTbl 
(col2 ASC)

CREATE UNIQUE NONCLUSTERED INDEX BigIntTblIndex ON dbo.BigIntTbl 
(col2 ASC)

CREATE UNIQUE NONCLUSTERED INDEX DecimalTblIndex ON dbo.DecimalTbl 
(col2 ASC)

We can then use SYS.DM_DB_PARTITION_STATS to get an approximate size for the indexes.

select 
 i.name,
 i.type_desc,
 s.used_page_count,
 s.used_page_count * 8 as ApproxSizeKb
from 
 sys.indexes i
 inner join sys.dm_db_partition_stats s 
 on s.object_id = i.object_id and s.index_id = i.index_id
where 
 i.name in ('IntTblIndex', 'BigIntTblIndex', 'DecimalTblIndex')

Which gives:

Index Space Results.

The bigIntTblIndex is more than a third larger than IntTblIndex and DecimalTblIndex is larger still. Their size also affected how long it took to create the indexes. The BigIntTblIndex took nearly a third longer to create than IntTblIndex.

In conclusion, the data type chosen to hold numeric data has substantial impact on the speed of some operations, the space used by the table, and the space used by the indexes. While the data type always needs to accommodate the range of values expected for a field, including possible future growth, noticeable savings in space and execution time can be gained when the smaller, more efficient data type is chosen.

Next Steps


Last Update:


signup button

next tip button



About the author





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Wednesday, December 21, 2016 - 1:57:40 PM - Leo Back To Top

One thing that is missing in this article is that the decimal type does not specify a precision (so the default 18 applies, which means 9 storage bytes). For short readers this might mislead to a conclusion to avoid decimal (numeric) in general. Which is not the case. At the end, I guess the performance is simply related to storage bytes, so that DECIMAL(9) (5 storage bytes) would have a better performance and size than bigint but still worse than int (i.e. storage bytes 4 < 5 < 8).

Would be nice to redo these tests with a fourth blue column: IntTbl, Decimal9Tbl, BigIntTbl and Decimal18Tbl. Any guess how they would rank?

 


Tuesday, September 16, 2014 - 2:06:48 PM - Jeff Moden Back To Top

Thanks for the nice, short, and sweet article on this subject.  It does, however, make me turn even more of an evil eye towards many table designers and ORMs because what would normally fit an INT datatype just fine is frequently defined as NUMERIC(19,0).


Friday, September 05, 2014 - 1:00:17 PM - Timothy A Wiseman Back To Top

Bill, thank you for the feedback.

 

Renato, I have not tested it with compression, but now you have me curious so I may have to pursue that in the near future.  Thank you.


Thursday, September 04, 2014 - 12:37:02 PM - Renato Back To Top

 

Hi Tim, nice and simple article. Thanks for sharing! One question: Do you have tested the same code  using Row and/or Page Compression?

Just curious to see the gain of storage thanks to vardecimal e/or row compression saving-typing...


Thursday, September 04, 2014 - 9:50:14 AM - Bill Back To Top

An excellent and concise article on the impact of using the proper numeric data type.  This is an fundamental aspect of database design that is too often overlooked theses days, because it is so easy to throw hardware at performance problems.


Learn more about SQL Server tools