SQL Server Ranking Functions Row_Number and Rank

By:   |   Comments (10)   |   Related: > Functions System


Problem

SQL Server introduced four different ranking functions either to rank records in a result-set or to rank records within groups of records of a result-set. With this inclusion we are no longer required to write several lines of code to get ranking. It does not only help in simplifying the query, but also improves the performance of the query. So now the questions are, what are these ranking functions, how do they work and how do they differ from each other?

Solution

In this tip I am going to discuss ROW_NUMBER and RANK ranking functions, how they work and how they differ from each other. In Part 2 of this series, I will talk about DENSE_RANK and NTILE.

SQL Server ROW_NUMBER()

The ROW_NUMBER function simply assigns sequential numbering to the records of a result-set or to the records within groups of a result-set. You can create these groupings (partition the records) using the PARTITION BY clause. The syntax for ROW_NUMBER function is:

ROW_NUMBER() OVER ( [PARTITION BY partition_value_expression , ... [ n ]] ORDER BY order_value_expression , ... [ n ])
  • OVER clause is required in all the ranking functions and with that you specify the partitioning and ordering of records before the ranking functions are evaluated. If you don't specify it, you will get an error similar to "Incorrect syntax near 'ROW_NUMBER', expected 'OVER'."
  • PARTITION BY clause is not mandatory and if you don't specify it all the records of the result-set will be considered as a part of single record group or a single partition and then ranking functions are applied. When you specify a column/set of columns with PARTITION BY clause then it will divide the result-set into record groups/partitions and then finally ranking functions are applied to each record group/partition separately and the rank will restart from 1 for each record group/partition separately.
  • ORDER BY clause is mandatory and if you don't specify it you will get an error similar to "The ranking function "ROW_NUMBER" must have an ORDER BY clause.". With this clause you specify a column or a set of columns which will be used to order the records within a result-set or within record groups/partitions of a result-set. Please note you can specify only those columns which are being made available by the FROM clause of the query. Also, you can not specify an integer to represent the position of a column, if you do so you will get "Windowed functions do not support integer indices as ORDER BY clause expressions." error.

Script #1 has two queries (All the examples in this tip were run against the AdventureWorks database), the first query does not use the PARTITION BY clause and hence all the records are considered to be in a single partition and then records are sorted by BirthDate column and finally the sequential numbering is assigned.

The second query, as you can see, creates a partition (group of rows) on the basis of ManagerID column and then assigns sequential numbering to each row within each partition (group of rows) separately.

Script #1 - ROW_NUMBER Function

--This script assign sequential number to each row 
--of result-set which is ordered on BirthDate column 
SELECT ROW_NUMBER() OVER 
   (ORDER BY BirthDate) AS RowNumber, 
LoginID, ManagerID, Title, BirthDate, MaritalStatus, Gender  
FROM [HumanResources].[Employee]

adventure works
--This script partitions the result-set into multiple  
--partitions (groups) on the basis of ManagerID and  
--then assign sequential number to each row within  
--each partition (group), please note each row within  
--the partition (group) is sorted on BirthDate Column 
SELECT ROW_NUMBER() OVER (
   PARTITION BY ManagerID 
   ORDER BY BirthDate) AS RowNumber,  
LoginID, ManagerID, Title, BirthDate, MaritalStatus, Gender  
FROM [HumanResources].[Employee]
adventure works

Sometimes you might want to have sequential numbering of records without sorting the result-set (might be for performance reasons) in other words you want to have numbering on the records as they are stored in the database. You can use the second script provided in the Script #2 table below. Have a look at the execution plans generated for these two queries from Script #2. Note the first query uses the SORT physical operator (which itself is 65% of the query) which is an expensive operator, because of its nature of being a blocking and memory consuming operator. If you use a column or set of columns in the ORDER BY on which there is a clustered index, the Query Optimizer will not include the SORT operator, but rather it will use the clustered index which has already sorted the data. For example, if you replace "ORDER BY BirthDate" to "ORDER By EmployeeID" (which has a clustered index) the optimizer will not use the SORT operator.

Script #2 - ROW_NUMBER without SORT

--This script assign sequential number to each row 
--of resultset which is ordered on BirthDate column 
SELECT ROW_NUMBER() OVER ( 
   ORDER BY BirthDate) AS RowNumber,  
LoginID, ManagerID, Title, BirthDate, MaritalStatus, Gender  
FROM [HumanResources].[Employee] 

--This script assign sequential number to each row 
--of resultset as it is stored in the database 
SELECT ROW_NUMBER() OVER ( 
   ORDER BY (SELECT 1)) AS RowNumber,  
LoginID, ManagerID, Title, BirthDate, MaritalStatus, Gender  
FROM [HumanResources].[Employee] 
execution style

The inclusion of the ROW_NUMBER function (and other ranking functions) simplifies many query for which we had to write several lines of code. For example you can delete duplicate records from a table with just one query. Refer to "Different strategies for removing duplicate records in SQL Server" for more details.

SQL Server RANK()

The RANK function instead of assigning a sequential number to each row as in the case of the ROW_NUMBER function, it assigns rank to each record starting with 1. If it encounters two or more records to have the same ORDER BY values, it is said to be a tie and all these records get the same rank. For example, in the first image below, you can see the first and second records have the same "Accountant" value in the Title column and hence they both got the same rank.

The second query in Script #3, shows you an example where the result-set has been partitioned on the basis of the Gender column and then ranking is assigned in each group separately. The syntax for the RANK function is very much similar to the syntax of the ROW_NUMBER function as discussed above:

RANK() OVER ( [PARTITION BY partition_value_expression , ... [ n ]] ORDER BY order_value_expression , ... [ n ])

Script #3 - RANK Function

--This script assign rank to each row of result-set which is 
--ordered by Title column. If two or more records happen to have  
--same value for Title Column they will get the same rank 
SELECT RANK()  OVER (  
   ORDER BY Title) AS [RecordRank], 
LoginID, ManagerID, Title, BirthDate, MaritalStatus, Gender  
FROM HumanResources.Employee 
record rank
--This script assign rank to each row of a partition (group of records)  
--which is partitioned by Gender column value and which is ordered by  
--Title column. If two or more records happen to have same value for  
--Title Column in the partition they will get the same rank 
SELECT RANK()  OVER (  
   PARTITION BY Gender 
   ORDER BY Title) AS [RecordRank], 
LoginID, ManagerID, Title, BirthDate, MaritalStatus, Gender  
FROM HumanResources.Employee
login id
adventure works

If you notice the ranking is not consecutive with the use of the RANK function; that is because the RANK function ranks records of the result-set sequentially starting from 1. When it finds a tie it assigns the same rank to the all the records in the tie, but still keeps incrementing the record counter so the next record will get a new rank which would be (previous rank + no of records in the current tie + 1) . For example in the first image of Script #3 you can see that the first and second record has "Accountant" title so they both have the same rank i.e. 1, but the next record has a rank of 3 ( 0 + 2 + 1 ).

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Arshad Ali Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, May 12, 2015 - 3:37:56 AM - SP Back To Top (37154)

Really it is very useful and very explanatory about the Ranking functions.

 

Thanks


Friday, January 9, 2015 - 6:22:08 AM - Kaushik Mitra Back To Top (35879)

Wonderful explanation !!! Keep it up Arshad!!!


Monday, October 27, 2014 - 6:26:08 AM - Rana Back To Top (35071)

thanks, its working nice


Thursday, July 24, 2014 - 12:09:43 AM - Mike Back To Top (32850)

Hi can u help me to group student scores into quintile. I think there is a feature in SQL Server 2012, but still we are have not upgrade to it as we are using 2008R2. I tried Ntile(5) but it not generating the desired result. I need below Quintile Column

Student Score Quintile.

Student1 20 1

Student2 20 1

Student3 30 2

Student4 30 2

Student5 40 2

Student6 40 2

Student7 50 3

Student8 50 3

Student9 60 3

Student10 70 4

Student11 70 4

Student12 80 4

Student13 80 4

Student14 90 5


Sunday, September 22, 2013 - 9:58:16 AM - Oleg Back To Top (26889)

Hi Arshad. Thanks for nice article.

Question: I need to create sequential numbers with standard partition, I mean, I have 36 records and I want to mark first 16 as 1, second 16 as 2 and what left as 3. Is there any way to do this with any RANK functions?

 

Thank you in advance


Monday, August 26, 2013 - 4:09:58 AM - sasanka Back To Top (26478)

nice article....thanks


Tuesday, July 9, 2013 - 4:50:14 PM - Miguel Back To Top (25764)

Thank you for your great job . 


Wednesday, August 24, 2011 - 1:49:45 AM - Arshad Back To Top (14490)

Hi Dharanitharan,

Thanks for your comments.

PARTITION BY clause is used to create multiple groups inside a returned resultset based on the columns specified with it and then ranking functions are applied. All columns are required in cases where you want to identify the duplicate records in the returned resultset.

For example, you can create groups of all the duplicate records by considering all the columns of the table and then applying ranking function on each of this group, keeping only one record and deleting all other records from that particular group.

WITH CTE AS
(
SELECT ROW_NUMBER() OVER
(PARTITION BY [FirstName], [LastName], [Address]
Order BY [FirstName] DESC, [LastName] DESC, [Address] DESC )
AS RowNumber,
[FirstName], [LastName], [Address]
FROM Employee tbl )
DELETE FROM CTE Where RowNumber > 1
GO

For more details, refer to this tip http://www.mssqltips.com/tip.asp?tip=1918

Hope it helps.

 


Tuesday, August 23, 2011 - 11:15:10 PM - Dharanitharan Back To Top (14477)

 

Great article Bro! It helped me to understand the basics of Partition by and row number.

In some coding snippets i have seen that more than one column is used in partition by clause. In that case how the partition by will work??

Pls clarify.

 

Thanks in advance!


Wednesday, February 17, 2010 - 12:11:14 PM - nosnetrom Back To Top (4916)
GREAT tip! I needed this functionality a couple weeks ago--wish I'd known it then! Thanks!














get free sql tips
agree to terms