Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server Performance Comparison of TOP vs ROW_NUMBER


By:   |   Read Comments (3)   |   Related Tips: More > Performance Tuning

New Whitepaper > 10 Ways We Can Steal Your Data >> download now


Problem

Most of the time when we write a T-SQL query to retrieve data from a specific table, we are interested in retrieving a specific sorted range and number of records rather than retrieving all the records. The range of data can be managed using the WHERE clause, the sort of the data is managed by the ORDER BY clause and the retrieved number of records can be managed by the TOP or ROW_NUMBER statements. Which one is better, using the TOP or ROW_NUMBER statements to limit the number of retrieved records?

Solution

The TOP T-SQL statement is used to limit the number of rows returned in the query result set to a specific number. If the TOP statement is used with the ORDER BY clause, the returned result will be limited to the first ordered N records; otherwise, it will return the first N records with no order.

On the other hand, the ROW_NUMBER statement returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.

It is not a difficult task to return the same query results written with the TOP statement vs. the ROW_NUMBER statement. However, is it better to limit the number of records returned from the query using the TOP or ROW_NUMBER statements?

For each of the runs below, I enabled the TIME and IO statistics and the Actual Execution Plan to compare the queries performance. This is the command to turn on the stattsics for IO and TIME.

SET STATISTICS IO ON
SET STATISTICS TIME ON

Test 1 - Return 10 Rows

Let us run the below two queries that return the first 10 records from the Product table in two ways, the first using the TOP statement and the second using the ROW_NUMBER statement. I enabled the TIME and IO statistics and the Actual Execution Plan to compare the queries performance:

SELECT TOP 10 [ProductID]
      ,[Name]
      ,[ProductNumber]
FROM [MSSQLTipsDemo].[Production].[Product]
ORDER BY ProductID 
OPTION (RECOMPILE)
GO
 
SELECT PP.[ProductRank],PP.ProductID , PP.[Name]
      ,PP.[ProductNumber]
FROM 
(
   SELECT ROW_NUMBER() OVER (ORDER BY [ProductID]) AS [ProductRank] 
      ,ProductID
      ,[Name]
      ,[ProductNumber]
   FROM [MSSQLTipsDemo].[Production].[Product]
   ) AS PP
WHERE PP.[ProductRank] <=10
ORDER BY [ProductRank] 
OPTION (RECOMPILE)
   

You can see the below results are the same, as they are using the ORDER BY clause:

Result1

Comparing the execution plans for both queries, you will find that the query using the TOP statement (37%) has less weight compared to the query using the ROW_NUMBER statement (63%) relative to the batch.

Plan1

The IO statistics show that the query using the TOP statement performs less logical reads than the query using the ROW_NUMBER statement:

IO1

The TIME statistics show the query using the TOP statement executed faster than the query using the ROW_NUMBER statement:

Time1

Comparing the Actual and Estimated number of rows returned from the Clustered Index Scan operator in the two queries, both actual and estimated number of rows for the query using the TOP statement are equal, which is not the case for the querying using the ROW_NUMBER statement where there are different actual and estimated values. Note this difference as we do more tests.

Rows1

Test 2 - Return 100 Rows

Let us increase the number of records returned from the previous queries to be 100 records:

SELECT TOP 100 [ProductID]
      ,[Name]
      ,[ProductNumber]
FROM [MSSQLTipsDemo].[Production].[Product]
ORDER BY ProductID 
OPTION (RECOMPILE)
GO
 
SELECT PP.[ProductRank],PP.ProductID , PP.[Name]
      ,PP.[ProductNumber]
FROM 
(
   SELECT ROW_NUMBER() OVER (ORDER BY [ProductID]) AS [ProductRank] 
      ,ProductID
      ,[Name]
      ,[ProductNumber]
   FROM [MSSQLTipsDemo].[Production].[Product]
   ) AS PP
WHERE PP.[ProductRank] <=100
ORDER BY [ProductRank] 
OPTION (RECOMPILE)
   

Comparing the execution plans for both queries again, you will see that the two queries have the same weight:

Plan2

Moreover, both queries perform the same number of logical reads.

IO2

The query using the TOP statement executed faster than the query using the ROW_NUMBER statement this time too.

Time2

Comparing the Actual and Estimated number of rows returned from the Clustered Index Scan operator of the two queries, both actual and estimated number of rows for the both queries are the same:

Rows2

Test 3 - Return 1000 Rows

Increasing the number of records returned from the previous queries again to be 1000 records this time:

SELECT TOP 1000 [ProductID]
      ,[Name]
      ,[ProductNumber]
FROM [MSSQLTipsDemo].[Production].[Product]
ORDER BY ProductID 
OPTION (RECOMPILE)
GO
 
SELECT PP.[ProductRank],PP.ProductID , PP.[Name]
      ,PP.[ProductNumber]
FROM 
(
   SELECT ROW_NUMBER() OVER (ORDER BY [ProductID]) AS [ProductRank] 
      ,ProductID
      ,[Name]
      ,[ProductNumber]
   FROM [MSSQLTipsDemo].[Production].[Product]
   ) AS PP
WHERE PP.[ProductRank] <=1000
ORDER BY [ProductRank] 
OPTION (RECOMPILE)
   

Comparing the execution plans for both queries, you will find that the query using the ROW_NUMBER statement has less weight than the query using the TOP statement this time:

Plan3

In addition, both queries perform the same number of logical reads:

IO3

The query with the TOP statement again executed faster than the query with the ROW_NUMBER.

Time3

The actual and estimated number of rows for the query using the TOP statement are equal, but the numbers are different for the ROW_NUMBER statement.

Row3

Conclusion

Execution Plan

The execution plan comparison gave us a different indication each time with how long the queries took relative to the batch.  First TOP was faster, than it was slower, but the overall execution plans were the same will all of the tests.  Therefore we cannot build our conclusion using just the execution plan comparison.  It seems we are comparing the estimated costs for these queries, based on the estimated number of rows returned from the queries, not the actual number of rows. This is clear from the difference between the actual number and estimated number of records shown in the Clustered Index Scan operator for the query that used the ROW_NUMBER statement. Therefore, we cannot trust the costs that are calculated based on the estimated values to compare the queries performance.

IO Stats

Moving to the IO statistics gathered from the previous tests, there was a small difference in the number of logical reads between the query using the TOP statement versus the ROW_NUMBER statement for the first test, but as the result sets got bigger the logical reads were equal.

Time Stats

The Time statistics are clear in all tests. The time required to execute the query using the TOP statement was always less than the ROW_NUMBER statement. This shows for our example, that the TOP statement was always faster than the ROW_NUMBER statement.

Next Steps
  • This is just a start to get an idea of how things can act differently.  It is better to use more than one performance metric to get the real performance difference between two queries, without basing your decision on one set of tests.
  •  Take the time to write queries in multiple ways and capture the stats as we did above to see what works the best and implement the fastest solution.


Last Update:


signup button

next tip button



About the author
MSSQLTips author Ahmad Yaseen Ahmad Yaseen is a SQL Server DBA with a bachelor’s degree in computer engineering as well as .NET development experience.

View all my tips
Related Resources





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Friday, June 30, 2017 - 3:34:32 PM - Ahmad Yaseen Back To Top

 Thank you all fot your valuable input here.

Dear Sebastien,

Please note that the  ROW_NUMBER() can't be used in the hAVING clause to replace the nested SELECT getting the below error:

Msg 4108, Level 15, State 1, Line 21

Windowed functions can only appear in the SELECT or ORDER BY clauses.

Best Regards,

Ahmad

 

 


Thursday, June 29, 2017 - 12:22:22 PM - Scott Coleman Back To Top

I can't imagine using ROW_NUMBER() instead of TOP in a simple query like your example.  I think a better comparison would be a lookup where TOP requires a correlated subquery.

We have an Employee table, and an EmployeeContact table with multiple types (email, home phone, business phone, cell phone, fax, etc) and possibly multiple entries for each type.  To get one "best" phone number for each employee using TOP requires a correlated subquery executed once for every Employee row.

SELECT  e.EmployeeID, e.FirstName, e.LastName,
        Phone = ( SELECT TOP 1 ContactValue
                  FROM        dbo.EmployeeContact
                  WHERE        TypeCode IN ( 'BSP', 'MBL', 'CRP', 'HMP' )
                  ORDER BY    TypeCode, Rank )
FROM dbo.Employee e;

Using ROW_NUMBER, the Employee table can be joined to a subquery that sequences EmployeeContact in one pass.  The TOP query uses a Nested Loop join, the ROW_NUMBER query uses a more efficient Merge Join.  The estimated execution time is 82% for TOP and 18% for ROW_NUMBER 

SELECT  e.EmployeeID, e.FirstName, e.LastName,
        Phone = c.ContactValue
FROM dbo.Employee e
LEFT JOIN ( 
    SELECT  EmployeeID, ContactValue,
            Best = ROW_NUMBER() OVER ( PARTITION BY EmployeeID ORDER BY TypeCode, Rank )
    FROM  dbo.EmployeeContact
    WHERE TypeCode IN ( 'BSP', 'MBL', 'CRP', 'HMP' ) 
) c ON c.EmployeeID = e.EmployeeID AND c.Best = 1;

 


Thursday, June 29, 2017 - 6:44:27 AM - Sébastien Back To Top

hello, why you don't use the 'having' clause in place of nested 'select' with the row_number ? Performances are not better ?

 

 

 

 


Learn more about SQL Server tools