Understanding SQL Ranking Functions ROW_NUMBER(), RANK(), DENSE_RANK() and NTILE()

By:   |   Updated: 2023-05-05   |   Comments (1)   |   Related: More > TSQL


Problem

Do you know what each of the ranking functions in SQL Server does? Maybe you don't know what they all do, but what about at least one? A basic understanding of ranking functions is critical to getting the most out of performing analytical queries and acing that upcoming interview. Most data professionals might be able to provide a use case for ROW_NUMBER(), but what about the differences between RANK() and DENSE_RANK()? Also, are there any situations where you would use NTILE()?

Solution

In this SQL tutorial, I'll explore each of the four ranking functions in detail. We'll start by looking at the fundamental reasons to incorporate ranking into your T-SQL. I'll start with one of my favorites, ROW_NUMBER(), and check out some of its uses. Next, I'll investigate situations where you could use RANK() and DENSE_RANK(). The final one we'll explore is NTILE(). Do you know the clause each of these functions has in common? Do you have to include PARTITION BY? These are just a few questions I'll answer as we go along. By the end, you'll be ready to implement these functions on your next SQL project.

Why SQL Ranking Functions?

Why would you need to use ranking in a T-SQL query? In almost every aspect of life, some form of ranking is involved. Selecting the most crucial task to work on is an ideal way to start your day. Here is another example: Let's imagine you're competing in a martial arts tournament. Usually, organizers rank participants by their skill level. It wouldn't be much of a challenge for a black belt to spar against a white or yellow belt. Most times, there are first-place winners in multiple categories based on years of experience.

Like the examples above, there are situations where ranking by category is a requirement. Let's say you need to rank salespeople by territory. It may be unfair to compare a salesperson from a populist area, for instance, Chicago, to someone in Indianapolis. Here is where the concept of the window comes into play.

Explore Ranking Functions

Ranking functions belong to a class referred to as window functions. Why are they called windows? It has nothing to do with Microsoft Windows. The term represents a window in your house or apartment and the various panes it's divided into. For example, as I'm writing this, a window in my office has four panes of glass. Each window function has something in common, an OVER() clause. Below is the basic syntax you would use for creating a window function:

SELECT MyColum1,
       ROW_NUMBER() OVER (PARTITION BY ImportantCategory ORDER BY MyColumn2 DESC) AS Ranking
FROM MyTable;

The PARTITION BY instructs SQL on how many panes we'll have in our window. We may know this upfront. For example, if we are partitioning by state and know five states are in the dataset, we'll end up with five partitions. The PARTITION BY is optional, but you must include an ORDER BY in the OVER() clause.

Build Our Dataset

Let's create a tiny dataset representing a roster of fighters competing at a martial arts tournament. Our table contains 12 fighters with attributes representing a subjective ranking of their skills. I've also included a belt ranking which you can consider as a category.

DROP TABLE IF EXISTS #KumiteRoster;

CREATE TABLE #KumiteRoster
(
    FighterName VARCHAR(25) NOT NULL,
    SkillLevel INT NOT NULL,
    BeltRank VARCHAR(10) NOT NULL,
    Age INT NOT NULL
);

-- Insert data into the table
INSERT INTO #KumiteRoster
(
    FighterName,
    SkillLevel,
    BeltRank,
    Age
)
VALUES
('Eric Brooks', 7, 'Black',42),
('Ryu', 8, 'Brown',28),
('Daniel LaRusso', 5, 'Green',15),
('Johnny Lawrence', 10, 'Brown',17),
('Kitana', 7, 'Green',32),
('Chun-Li', 9, 'Brown',28),
('Frank Dux', 10, 'Black',27),
('Chong Li', 9, 'Black',35),
('Ray Jackson', 10, 'Brown',35),
('Tong Po', 9, 'Black',31),
('Johnny Cage', 3, 'Green',36),
('Sonya Blade', 7, 'Green',32);
GO

ROW_NUMBER Function

I have a confession: I love using the ROW_NUMBER() function. What does it do? ROW_NUMBER() returns a sequential integer for every row in our partition. ROW_NUMBER() starts over again at one when SQL encounters a new partition. How many partitions will you get if you don't include a PARTITION BY clause? That's easy, and the answer is one.

SELECT FighterName,
       ROW_NUMBER() OVER (ORDER BY FighterName DESC) AS RowNumber
FROM #KumiteRoster;
GO
Using ROW_NUMBER

ROW_NUMBER(), much like duct tape, has multiple uses. In the article "The SQL Server Numbers Table, Explained - Part 1," Aaron Bertrand creates a handy numbers table using the ROW_NUMBER() function. ROW_NUMBER also shines when it comes to finding duplicate rows. In the dataset below, ROW_NUMBER() is used to find duplicates based on the belt ranking of the fighter. We only return the first one from each partition with the highest skill level as shown in the following query.

;WITH cteBaseData
AS (SELECT FighterName,
           BeltRank,
           ROW_NUMBER() OVER (PARTITION BY BeltRank ORDER BY SkillLevel DESC) AS RowNumber
    FROM #KumiteRoster)
SELECT cteBaseData.FighterName,
       cteBaseData.BeltRank,
       cteBaseData.RowNumber
FROM cteBaseData
WHERE cteBaseData.RowNumber = 1;
GO
Removing Duplicates

One downside of ROW_NUMBER is that it doesn't handle ties. What do I mean by ties? Let's check out the example below.

SELECT FighterName,
       ROW_NUMBER() OVER (PARTITION BY BeltRank ORDER BY SkillLevel DESC) AS FighterRank,
       BeltRank,
       SkillLevel
FROM #KumiteRoster;
GO
Ties with Row_Number

Notice that Chong Li and Tong Po are both black belts with a skill level of 9. How can we indicate when SQL encounters a tie in our partition? That's where the next function comes into play.

RANK Function

The RANK() function behaves like ROW_NUMBER() in that it returns a ranking based on the ORDER BY; however, there is one significant difference. RANK() distinguishes ties within our partition, while ROW_NUMBER() does not. To illustrate, consider the syntax below:

SELECT FighterName,
       RANK() OVER (PARTITION BY BeltRank ORDER BY SkillLevel DESC) AS FighterRank,
       BeltRank,
       SkillLevel
FROM #KumiteRoster;
GO
Using the Rank Function

From the screenshot above, you can see that Chong Li and Tong Po both receive the rank of 2 for the black belts since their skills are matched.

Do you notice something? Number 3 is missing from the first partition. SQL assigns number 4 to Eric Brooks. That's because RANK() allows gaps. This behavior may not affect your decision to use it. But what if you don't want holes in your number sequence?

DENSE_RANK Function

The next function is almost identical to RANK(), except for one big difference: DENSE_RANK() doesn't allow gaps. Let's look at an example below.

SELECT FighterName,
       DENSE_RANK() OVER (PARTITION BY BeltRank ORDER BY SkillLevel DESC) AS FighterRank,
       BeltRank,
       SkillLevel
FROM #KumiteRoster;
GO
Using DENSE_RANK()

Notice from the screenshot above DENSE_RANK() doesn't skip 3. What if we wanted to include more data in our decision-making for ranking the fighters? Simple, add the additional columns to the ORDER BY clause.

SELECT FighterName,
       DENSE_RANK() OVER (PARTITION BY BeltRank ORDER BY SkillLevel, Age DESC) AS FighterRank,
       BeltRank,
       SkillLevel
FROM #KumiteRoster;
GO

NTILE Function

The last ranking function we'll look at is NTILE(). I don't think of NTILE() as a ranking function. However, Microsoft groups it in with the others. NTILE() divides the results into equal groups based on your input.

While I find ROW_NUMBER() useful, I am not a fan of NTILE(). However, if you have a result set you want to be divided into multiple groups, it's handy. Below we need to define matches for each of our fighters.

SELECT FighterName,
       NTILE(2) OVER(PARTITION BY BeltRank ORDER BY SkillLevel DESC) Match,
       BeltRank,
       SkillLevel
FROM #KumiteRoster;
GO
Matching Pairs with NTILE

What are some ways you've used NTILE()? Please let me know in the comments section.

Combine Ranking and Aggregate Functions

The last item I wanted to mention is combining ranking with other aggregate functions. An example would be if you wanted to rank the belts by the sum of the fighter's skills since a fighter could have a high skill level but only be a green belt. The code below accomplishes this task.

SELECT DENSE_RANK() OVER (ORDER BY SUM(SkillLevel) DESC) AS BeltRank,
       BeltRank
FROM #KumiteRoster
GROUP BY BeltRank;
GO
Using Aggregate function with DENSE_RANK()

You can see from the screenshot that the accumulation of skills belongs to the brown belts.

Key Takeaways

  • Ranking functions were first introduced in SQL Server 2005 and opened a new world of possibilities for working with analytical queries for calculating and displaying ranks.
  • To use any ranking function, include an OVER clause. However, the PARTITION BY is optional.
  • You can combine ranking functions with other analytical functions such as COUNT() and SUM().
  • Here is a quick summary of each function mentioned in this article.
    • ROW_NUMBER() (1,2,3,4,5..)
    • RANK() (1,2,2,4,5..)
    • DENSE_RANK() (1,2,2,3,4…)
    • NTILE(3) (1,1,1,2,2,2,3,3,3)
Next Steps
  • Would you like to learn more about ranking and analytical functions in T-SQL? Koen Verbeeck created an informative guide covering several. One of my favorites is solving the gaps and island problems with functions.
  • A numbers table can be convenient when solving problems with T-SQL. Please check out Aaron Bertrand's tip for creating one: The SQL Server Numbers Table, Explained - Part 1.
  • Kathi Kellenberger was one of the first people who introduced me to window functions. Kathi created an informative article, Introduction to T-SQL Window Functions, that covers them in detail. You can also find videos by Kathi on Pluralsight and YouTube.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jared Westover Jared Westover is a passionate technology specialist at Crowe, helping to build data solutions. For the past two decades, he has focused on SQL Server, Azure, Power BI, and Oracle. Besides writing for MSSQLTips.com, he has published 12 Pluralsight courses about SQL Server and Reporting Services. Jared enjoys reading and spending time with his wife and three sons when he's not trying to make queries go faster.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-05-05

Comments For This Article




Sunday, May 21, 2023 - 1:03:38 PM - Anh Back To Top (91217)
Thanks for ur sharing. It is useful for me