Learn more about SQL Server tools

   
   















































SQL Server 2005 and 2008 Ranking Functions Row_Number and Rank

MSSQLTips author Arshad Ali By:   |   Read Comments (8)   |   Related Tips: More > Functions - System
Problem
SQL Server introduced four different ranking functions either to rank records in a result-set or to rank records within groups of records of a result-set. With this inclusion we are no longer required to write several lines of code to get ranking. It does not only help in simplifying the query, but also improves the performance of the query. So now the questions are, what are these ranking functions, how do they work and how do they differ from each other?

Solution
In this tip I am going to discuss ROW_NUMBER and RANK ranking functions, how they work and how they differ from each other. In Part 2 of this series, I will talk about DENSE_RANK and NTILE.


ROW_NUMBER()

The ROW_NUMBER function simply assigns sequential numbering to the records of a result-set or to the records within groups of a result-set. You can create these groupings (partition the records) using the PARTITION BY clause. The syntax for ROW_NUMBER function is:

ROW_NUMBER() OVER ( [PARTITION BY partition_value_expression , ... [ n ]] ORDER BY order_value_expression , ... [ n ])

  • OVER clause is required in all the ranking functions and with that you specify the partitioning and ordering of records before the ranking functions are evaluated. If you don't specify it, you will get an error similar to "Incorrect syntax near 'ROW_NUMBER', expected 'OVER'."
  • PARTITION BY clause is not mandatory and if you don't specify it all the records of the result-set will be considered as a part of single record group or a single partition and then ranking functions are applied. When you specify a column/set of columns with PARTITION BY clause then it will divide the result-set into record groups/partitions and then finally ranking functions are applied to each record group/partition separately and the rank will restart from 1 for each record group/partition separately.
  • ORDER BY clause is mandatory and if you don't specify it you will get an error similar to "The ranking function "ROW_NUMBER" must have an ORDER BY clause.". With this clause you specify a column or a set of columns which will be used to order the records within a result-set or within record groups/partitions of a result-set. Please note you can specify only those columns which are being made available by the FROM clause of the query. Also, you can not specify an integer to represent the position of a column, if you do so you will get "Windowed functions do not support integer indices as ORDER BY clause expressions." error.

Script #1 has two queries (All the examples in this tip were run against the AdventureWorks database), the first query does not use the PARTITION BY clause and hence all the records are considered to be in a single partition and then records are sorted by BirthDate column and finally the sequential numbering is assigned.

The second query, as you can see, creates a partition (group of rows) on the basis of ManagerID column and then assigns sequential numbering to each row within each partition (group of rows) separately.

Script #1 - ROW_NUMBER Function

--This script assign sequential number to each row
--of result-set which is ordered on BirthDate column
SELECT ROW_NUMBER() OVER (
   
ORDER BY BirthDateAS RowNumber
LoginIDManagerIDTitleBirthDateMaritalStatusGender 
FROM 
[HumanResources].[Employee]

--This script partitions the result-set into multiple 
--partitions (groups) on the basis of ManagerID and 
--then assign sequential number to each row within 
--each partition (group), please note each row within 
--the partition (group) is sorted on BirthDate Column
SELECT ROW_NUMBER() OVER (
   
PARTITION BY ManagerID 
   
ORDER BY BirthDateAS RowNumber
LoginIDManagerIDTitleBirthDateMaritalStatusGender 
FROM 
[HumanResources].[Employee]

Sometimes you might want to have sequential numbering of records without sorting the result-set (might be for performance reasons) in other words you want to have numbering on the records as they are stored in the database. You can use the second script provided in the Script #2 table below. Have a look at the execution plans generated for these two queries from Script #2. Note the first query uses the SORT physical operator (which itself is 65% of the query) which is an expensive operator, because of its nature of being a blocking and memory consuming operator. If you use a column or set of columns in the ORDER BY on which there is a clustered index, the Query Optimizer will not include the SORT operator, but rather it will use the clustered index which has already sorted the data. For example, if you replace "ORDER BY BirthDate" to "ORDER By EmployeeID" (which has a clustered index) the optimizer will not use the SORT operator.

Script #2 - ROW_NUMBER without SORT

--This script assign sequential number to each row
--of resultset which is ordered on BirthDate column
SELECT ROW_NUMBER() OVER (
   
ORDER BY BirthDateAS RowNumber
LoginIDManagerIDTitleBirthDateMaritalStatusGender 
FROM [HumanResources].[Employee]

--This script assign sequential number to each row
--of resultset as it is stored in the database
SELECT ROW_NUMBER() OVER (
   
ORDER BY (SELECT 1)) AS RowNumber
LoginIDManagerIDTitleBirthDateMaritalStatusGender 
FROM 
[HumanResources].[Employee]

The inclusion of the ROW_NUMBER function (and other ranking functions) simplifies many query for which we had to write several lines of code. For example you can delete duplicate records from a table with just one query. Refer to "Different strategies for removing duplicate records in SQL Server" for more details.


RANK()

The RANK function instead of assigning a sequential number to each row as in the case of the ROW_NUMBER function, it assigns rank to each record starting with 1. If it encounters two or more records to have the same ORDER BY <columns> values, it is said to be a tie and all these records get the same rank. For example, in the first image below, you can see the first and second records have the same "Accountant" value in the Title column and hence they both got the same rank.

The second query in Script #3, shows you an example where the result-set has been partitioned on the basis of the Gender column and then ranking is assigned in each group separately. The syntax for the RANK function is very much similar to the syntax of the ROW_NUMBER function as discussed above:

RANK() OVER ( [PARTITION BY partition_value_expression , ... [ n ]] ORDER BY order_value_expression , ... [ n ])

Script #3 - RANK Function

--This script assign rank to each row of result-set which is
--ordered by Title column. If two or more records happen to have 
--same value for Title Column they will get the same rank
SELECT RANK()  OVER 
   
ORDER BY TitleAS [RecordRank],
LoginIDManagerIDTitleBirthDateMaritalStatusGender 
FROM 
HumanResources.Employee

--This script assign rank to each row of a partition (group of records) 
--which is partitioned by Gender column value and which is ordered by 
--Title column. If two or more records happen to have same value for 
--Title Column in the partition they will get the same rank
SELECT RANK()  OVER 
   
PARTITION BY Gender
   
ORDER BY TitleAS [RecordRank],
LoginIDManagerIDTitleBirthDateMaritalStatusGender 
FROM 
HumanResources.Employee

If you notice the ranking is not consecutive with the use of the RANK function; that is because the RANK function ranks records of the result-set sequentially starting from 1.  When it finds a tie it assigns the same rank to the all the records in the tie, but still keeps incrementing the record counter so the next record will get a new rank which would be (previous rank + no of records in the current tie + 1) . For example in the first image of Script #3 you can see that the first and second record has "Accountant" title so they both have the same rank i.e. 1, but the next record has a rank of 3 ( 0 + 2 + 1 ).

Next Steps



Last Update: 2/17/2010


About the author
MSSQLTips author Arshad Ali
Arshad Ali is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft.

View all my tips


print tip Print  
Become a paid author





join MSSQLTips for free SQL Server tips     



Learn more about SQL Server tools
Post a comment or let the author know this tip helped you.

       All comments are reviewed, so stay on subject or we may delete your comment.

*Name   *Email Notify for updates



       Note: your email address is not published. Required fields are marked with an asterisk (*)


Get free SQL tips:

*Enter Code refresh code     



Monday, October 27, 2014 - 6:26:08 AM - Rana Read The Tip

thanks, its working nice


Thursday, July 24, 2014 - 12:09:43 AM - Mike Read The Tip

Hi can u help me to group student scores into quintile. I think there is a feature in SQL Server 2012, but still we are have not upgrade to it as we are using 2008R2. I tried Ntile(5) but it not generating the desired result. I need below Quintile Column

Student Score Quintile.

Student1 20 1

Student2 20 1

Student3 30 2

Student4 30 2

Student5 40 2

Student6 40 2

Student7 50 3

Student8 50 3

Student9 60 3

Student10 70 4

Student11 70 4

Student12 80 4

Student13 80 4

Student14 90 5


Sunday, September 22, 2013 - 9:58:16 AM - Oleg Read The Tip

Hi Arshad. Thanks for nice article.

Question: I need to create sequential numbers with standard partition, I mean, I have 36 records and I want to mark first 16 as 1, second 16 as 2 and what left as 3. Is there any way to do this with any RANK functions?

 

Thank you in advance


Monday, August 26, 2013 - 4:09:58 AM - sasanka Read The Tip

nice article....thanks


Tuesday, July 09, 2013 - 4:50:14 PM - Miguel Read The Tip

Thank you for your great job . 


Wednesday, August 24, 2011 - 1:49:45 AM - Arshad Read The Tip

Hi Dharanitharan,

Thanks for your comments.

PARTITION BY clause is used to create multiple groups inside a returned resultset based on the columns specified with it and then ranking functions are applied. All columns are required in cases where you want to identify the duplicate records in the returned resultset.

For example, you can create groups of all the duplicate records by considering all the columns of the table and then applying ranking function on each of this group, keeping only one record and deleting all other records from that particular group.

WITH CTE AS
(
SELECT ROW_NUMBER() OVER
(PARTITION BY [FirstName], [LastName], [Address]
Order BY [FirstName] DESC, [LastName] DESC, [Address] DESC )
AS RowNumber,
[FirstName], [LastName], [Address]
FROM Employee tbl )
DELETE FROM CTE Where RowNumber > 1
GO

For more details, refer to this tip http://www.mssqltips.com/tip.asp?tip=1918

Hope it helps.

 


Tuesday, August 23, 2011 - 11:15:10 PM - Dharanitharan Read The Tip

 

Great article Bro! It helped me to understand the basics of Partition by and row number.

In some coding snippets i have seen that more than one column is used in partition by clause. In that case how the partition by will work??

Pls clarify.

 

Thanks in advance!


Wednesday, February 17, 2010 - 12:11:14 PM - nosnetrom Read The Tip
GREAT tip! I needed this functionality a couple weeks ago--wish I'd known it then! Thanks!



 
Sponsor Information