SQL Table Join Performance for INT vs BIGINT vs GUID Data Types

Problem

Reading some of the comments from my last tip got me thinking about the performance of table joins on different datatypes.  I had always assumed (I know I should know better) that as long as the columns, regardless of whether or not there is a foreign key relationship defined, are indexed that there should be no performance issues based on datatype of the join column chosen to join two tables.  In this tip we will compare joining tables using the following datatypes to see if there is in fact any performance difference: INT, BIGINT, GUID (one randomly generated and generated using the newsequentialid() function).

Solution

In order to perform this test, let’s setup 4 similar sets of parent/detail tables.  The only difference will be that each parent table primary key and its associated detail table join column will use a different column type: one of INT, BIGINT and GUID (one randomly generated and generated using the newsequentialid() function).  Below is the T-SQL to generate these tables and load some sample data.

 CREATE TABLE parent_int     ([pk_col] [int] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_bigint  ([pk_col] [bigint] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_guid    ([pk_col] [uniqueidentifier] not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE parent_seqguid ([pk_col] [uniqueidentifier] default newsequentialid() not null primary key clustered,
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20)); 
CREATE TABLE detail_int     ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [int],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_bigint  ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [bigint],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_guid    ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [uniqueidentifier],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
CREATE TABLE detail_seqguid ([pk_col] [int] IDENTITY(1,1) not null primary key clustered,
                             [join_col] [uniqueidentifier],
                             [col1] [int],
                             [col2] [datetime],
                             [col3] [varchar](20));
ALTER TABLE dbo.detail_int ADD CONSTRAINT FK_detail_int_parent_int FOREIGN KEY
 (join_col) REFERENCES dbo.parent_int (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;
ALTER TABLE dbo.detail_bigint ADD CONSTRAINT FK_detail_bigint_parent_int FOREIGN KEY
 (join_col) REFERENCES dbo.parent_bigint (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;
ALTER TABLE dbo.detail_guid ADD CONSTRAINT FK_detail_guid_parent_guid FOREIGN KEY
 (join_col) REFERENCES dbo.parent_guid (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;
ALTER TABLE dbo.detail_seqguid ADD CONSTRAINT FK_detail_seqguid_parent_guid FOREIGN KEY
 (join_col) REFERENCES dbo.parent_seqguid (pk_col) 
 ON UPDATE  NO ACTION ON DELETE  NO ACTION;
CREATE INDEX detail_int_join_col ON dbo.detail_int (join_col);
CREATE INDEX detail_bigint_join_col ON dbo.detail_bigint (join_col);
CREATE INDEX detail_guid_join_col ON dbo.detail_guid (join_col);
CREATE INDEX detail_seqguid_join_col ON dbo.detail_seqguid (join_col);
-- data load
DECLARE @returnid table (pk_col uniqueidentifier)
DECLARE @val int
DECLARE @randval int
DECLARE @val2 int
DECLARE @pkint int
DECLARE @pkguid uniqueidentifier
DECLARE @pkseqguid uniqueidentifier
SELECT @val=1
WHILE @val < 200000
BEGIN  
   INSERT INTO parent_int (pk_col, col1, col2,  col3) 
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   INSERT INTO parent_bigint (pk_col, col1, col2,  col3) 
      VALUES (@val,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   SELECT @pkint = @val;
   SELECT @pkguid = newid();
   INSERT INTO parent_guid (pk_col, col1, col2,  col3) 
     VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   INSERT INTO parent_seqguid ( col1, col2,  col3)
     output inserted.pk_col into @returnid 
     VALUES (round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
   SELECT @pkseqguid = r.pk_col from @returnid r
   SELECT @randval=FLOOR(RAND()*(10-1)+1);
   SELECT @val2=1
   WHILE @val2 <= @randval
   BEGIN
      INSERT INTO detail_int (join_col,col1,col2,col3)
         VALUES (@pkint,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_bigint (join_col,col1,col2,col3)
         VALUES (@pkint,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_guid (join_col,col1,col2,col3)
         VALUES (@pkguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      INSERT INTO detail_seqguid (join_col,col1,col2,col3)
         VALUES (@pkseqguid,round(rand()*100000,0),getdate(),'TEST' + CAST(@val AS VARCHAR));
      SELECT @val2=@val2+1
   END
   
   SELECT @val=@val+1
END
GO                        

As with the last test we did let’s run through some SELECT queries both before and after the indexes on the tables have been rebuilt to see if there is any difference in performance with the indexes being more or less fragmented.  Below is the state of the indexes after the initial data load.

TableNameIndexNameIndexTypeFragmentationPctPageCount
parent_seqguidPK__parent_s__CF31D692708D8E83CLUSTERED0.711253
parent_intPK__parent_i__CF31D69240B1A39FCLUSTERED0.42952
parent_guidPK__parent_g__CF31D6926ABDEAE2CLUSTERED99.231821
parent_bigintPK__parent_b__CF31D6927AAC240BCLUSTERED0.471052
detail_seqguiddetail_seqguid_join_colNONCLUSTERED0.833220
detail_intdetail_int_join_colNONCLUSTERED0.571733
detail_guiddetail_guid_join_colNONCLUSTERED99.385970
detail_bigintdetail_bigint_join_colNONCLUSTERED0.672231

To test the performance we will run two different SELECT statements.  One that returns all the records from the parent and the associated records from the detail table.  The second query will return a single record from the parent and the associated record from the detail table.  Note that when querying a single record from the parent table you need to make sure that the detail table has the same number of records to ensure the test results are not skewed based on the size of the dataset returned.  Below is the T-SQL for these statements.

 -- all records in parent
select * from parent_int p inner join detail_int d on d.join_col=p.pk_col;
select * from parent_bigint p inner join detail_bigint d on d.join_col=p.pk_col;
select * from parent_guid p inner join detail_guid d on d.join_col=p.pk_col;
select * from parent_seqguid p inner join detail_seqguid d on d.join_col=p.pk_col;
  
-- single record in parent
select * from parent_int p inner join detail_int d on d.join_col=p.pk_col 
  where p.pk_col=121143;
select * from parent_bigint p inner join detail_bigint d on d.join_col=p.pk_col 
  where p.pk_col=121143;
select * from parent_guid p inner join detail_guid d on d.join_col=p.pk_col 
  where p.pk_col='A10B3C3C-3ABC-47CB-8DDE-22DC1DF89447';
select * from parent_seqguid p inner join detail_seqguid d on d.join_col=p.pk_col 
  where p.pk_col='E89535AA-27A9-E711-985B-3402863D5BD9';

As always we’ll use SQL Profiler to collect the performance statistics.  Below are the SQL Profile results from this first test run.

Table/QueryCPU (ms)ReadsDuration (ms)
parent_int/Full Join1295624210431
parent_bigint/Full Join1397683211523
parent_guid/Full Join1483861614734
parent_seqguid/Full Join2512820814008
parent_int/Single Record0272
parent_bigint/Single Record0271
parent_guid/Single Record0272
parent_seqguid/Single Record0271

Looking at these results we see that in order to join the entire table the seqguid case uses quite a bit more CPU than the other 3 cases.  Also both GUID cases performed ~20% more IO than the integer test cases.  In the test cases where the query was limited to just one record from the parent table the performance was almost identical for all 4 test cases.  Even though we have quite a bit of fragmentation with the index on the random GUID column it still performs just as well as the other datatypes in this case.

Now let’s rebuild our indexes, verify we don’t have any more fragmentation and run through our test cases again.  Below is the T-SQL to rebuild the indexes.

 ALTER INDEX [PK__parent_g__CF31D6926ABDEAE2] ON [dbo].[parent_guid] REBUILD;
ALTER INDEX [PK__parent_s__CF31D692708D8E83] ON [dbo].[parent_seqguid] REBUILD;
ALTER INDEX [PK__parent_i__CF31D69240B1A39F] ON [dbo].[parent_int] REBUILD;
ALTER INDEX [PK__parent_b__CF31D6927AAC240B] ON [dbo].[parent_bigint] REBUILD;
ALTER INDEX [detail_seqguid_join_col] ON [dbo].[detail_seqguid] REBUILD;
ALTER INDEX [detail_int_join_col] ON [dbo].[detail_int] REBUILD;
ALTER INDEX [detail_guid_join_col] ON [dbo].[detail_guid] REBUILD;
ALTER INDEX [detail_bigint_join_col] ON [dbo].[detail_bigint] REBUILD;

As you can see below we no longer have any index fragmentation.

TableNameIndexNameIndexTypeFragmentationPctPageCount
parent_seqguidPK__parent_s__CF31D692708D8E83CLUSTERED0.231254
parent_intPK__parent_i__CF31D69240B1A39FCLUSTERED0.31953
parent_guidPK__parent_g__CF31D6926ABDEAE2CLUSTERED0.231252
parent_bigintPK__parent_b__CF31D6927AAC240BCLUSTERED0.281052
detail_seqguiddetail_seqguid_join_colNONCLUSTERED0.093221
detail_intdetail_int_join_colNONCLUSTERED0.171734
detail_guiddetail_guid_join_colNONCLUSTERED0.093222
detail_bigintdetail_bigint_join_colNONCLUSTERED0.132232

Now let’s rerun our select queries again and see if there is any difference with performance.  Below are the SQL Profiler results from this second test run.

Table/QueryCPU (ms)ReadsDuration (ms)
parent_int/Full Join1102624310231
parent_bigint/Full Join1328683211275
parent_guid/Full Join1295804414101
parent_seqguid/Full Join2532820314206
parent_int/Single Record0272
parent_bigint/Single Record0272
parent_guid/Single Record0272
parent_seqguid/Single Record0271

These results match almost exactly with our first test run.  The only difference being, now that the index has been rebuilt, the random GUID datatype is performing fewer reads in the case where we are querying the entire parent table.  Although it’s not quite as good as either of the integer test cases.

So there we have it.  At least for this simple test case, we have demonstrated that there is very little difference with table join performance using these 4 datatypes.

Next Steps

One comment

  1. You said:
    “At least for this simple test case, we have demonstrated that there is very little difference with table join performance using these 4 datatypes”

    How did you arrive at this conclusion when the Full Join for the GUID is 14,101 ms whereas the Full Join for the INT is only 10,231 ms? That’s a difference of ~28%. I don’t think that’s insignificant (unless I misunderstood your results).

Leave a Reply

Your email address will not be published. Required fields are marked *