![]() |
|
|
By: Arshad Ali | Read Comments (3) | Print Arshad is a SQL and BI Developer focusing on Data Warehousing projects for Microsoft. Related Tips: More |
|
Solution
In this tip I am going to discuss ROW_NUMBER and RANK ranking functions, how they work and how they differ from each other. In Part 2 of this series, I will talk about DENSE_RANK and NTILE.
ROW_NUMBER()
The ROW_NUMBER function simply assigns sequential numbering to the records of a result-set or to the records within groups of a result-set. You can create these groupings (partition the records) using the PARTITION BY clause. The syntax for ROW_NUMBER function is:
ROW_NUMBER
() OVER ( [PARTITION BY partition_value_expression , ... [ n ]] ORDER BY order_value_expression , ... [ n ])Script #1 has two queries (All the examples in this tip were run against the AdventureWorks database), the first query does not use the PARTITION BY clause and hence all the records are considered to be in a single partition and then records are sorted by BirthDate column and finally the sequential numbering is assigned.
The second query, as you can see, creates a partition (group of rows) on the basis of ManagerID column and then assigns sequential numbering to each row within each partition (group of rows) separately.
|
Script #1 - ROW_NUMBER Function |
--This script assign sequential number to each row
|
Sometimes you might want to have sequential numbering of records without sorting the result-set (might be for performance reasons) in other words you want to have numbering on the records as they are stored in the database. You can use the second script provided in the Script #2 table below. Have a look at the execution plans generated for these two queries from Script #2. Note the first query uses the SORT physical operator (which itself is 65% of the query) which is an expensive operator, because of its nature of being a blocking and memory consuming operator. If you use a column or set of columns in the ORDER BY on which there is a clustered index, the Query Optimizer will not include the SORT operator, but rather it will use the clustered index which has already sorted the data. For example, if you replace "ORDER BY BirthDate" to "ORDER By EmployeeID" (which has a clustered index) the optimizer will not use the SORT operator.
|
Script #2 - ROW_NUMBER without SORT |
|
|
The inclusion of the ROW_NUMBER function (and other ranking functions) simplifies many query for which we had to write several lines of code. For example you can delete duplicate records from a table with just one query. Refer to "Different strategies for removing duplicate records in SQL Server" for more details.
RANK()
The RANK function instead of assigning a sequential number to each row as in the case of the ROW_NUMBER function, it assigns rank to each record starting with 1. If it encounters two or more records to have the same ORDER BY <columns> values, it is said to be a tie and all these records get the same rank. For example, in the first image below, you can see the first and second records have the same "Accountant" value in the Title column and hence they both got the same rank.
The second query in Script #3, shows you an example where the result-set has been partitioned on the basis of the Gender column and then ranking is assigned in each group separately. The syntax for the RANK function is very much similar to the syntax of the ROW_NUMBER function as discussed above:
R
ANK() OVER ( [PARTITION BY partition_value_expression , ... [ n ]] ORDER BY order_value_expression , ... [ n ])|
Script #3 - RANK Function |
--This script assign rank to each row of result-set which is
|
If you notice the ranking is not consecutive with the use of the RANK function; that is because the RANK function ranks records of the result-set sequentially starting from 1. When it finds a tie it assigns the same rank to the all the records in the tie, but still keeps incrementing the record counter so the next record will get a new rank which would be (previous rank + no of records in the current tie + 1) . For example in the first image of Script #3 you can see that the first and second record has "Accountant" title so they both have the same rank i.e. 1, but the next record has a rank of 3 ( 0 + 2 + 1 ).
Next Steps
| Share: | Share | Tweet |
|
![]() |
![]() |
Free SQL Server Learning |
| Wednesday, February 17, 2010 - 12:11:14 PM - nosnetrom | Read The Tip |
| GREAT tip! I needed this functionality a couple weeks ago--wish I'd known it then! Thanks! | |
| Tuesday, August 23, 2011 - 11:15:10 PM - Dharanitharan | Read The Tip |
|
Great article Bro! It helped me to understand the basics of Partition by and row number. In some coding snippets i have seen that more than one column is used in partition by clause. In that case how the partition by will work?? Pls clarify.
Thanks in advance! |
|
| Wednesday, August 24, 2011 - 1:49:45 AM - Arshad | Read The Tip |
|
Hi Dharanitharan, Thanks for your comments. PARTITION BY clause is used to create multiple groups inside a returned resultset based on the columns specified with it and then ranking functions are applied. All columns are required in cases where you want to identify the duplicate records in the returned resultset. For example, you can create groups of all the duplicate records by considering all the columns of the table and then applying ranking function on each of this group, keeping only one record and deleting all other records from that particular group. For more details, refer to this tip http://www.mssqltips.com/tip.asp?tip=1918 Hope it helps.
|
|
|
privacy | disclaimer | copyright | advertise | about authors | contribute | feedback | giveaways | user groups Some names and products listed are the registered trademarks of their respective owners. Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com |