Return TOP (N) Rows using APPLY or ROW_NUMBER() in SQL Server

By:   |   Updated: 2023-04-11   |   Comments   |   Related: > TSQL


Problem

Have you ever heard the old saying that there's more than one way to peel a banana? There's also a variation about cats, but I love cats. The same thing applies to SQL. Most times, there are multiple ways to do the same thing. For instance, if you need to return a specific number of rows from a table-valued function or expression. A typical method developers use for accomplishing this task is the APPLY operator. However, is using APPLY the best method for getting the results? What about using the ROW_NUMBER() window function to return the results? How do you know which one to choose? Also, what makes one better than the other?

Solution

In this article, we'll compare two methods for achieving the same results. First, I'll explore the APPLY operator in detail, including when you typically see it used. Did you know it comes in two variants? Then we'll compare APPLY to ROW_NUMBER() for returning the TOP (n) rows from a table expression. I'll cover the criteria used for measuring performance. Which one do you think will win in a head-to-head comparison? By the end of this article, you'll know which one to choose for your next SQL project.

Exploring APPLY

Microsoft introduced the APPLY operator in SQL 2005. In an article, Arshad Ali describes APPLY as a join clause: "it allows joining between two table expressions, i.e., joining a left/outer table expression with a right/inner table expression." Since both tables can technically be an expression, for the rest of the article and to reduce confusion, I will refer to the first/left table as a table and the second as a table expression.

Another handy use of APPLY is with the OPENJSON function. You can transform JSON data into a relational format. Andrew Villazon created an extensive blog article covering the topic.

The APPLY operator comes in two variants. The first is the CROSS APPLY, which should not be confused with a join that produces a Cartesian product. The second is called the OUTER APPLY.

CROSS APPLY

It is helpful to think of a CROSS APPLY as an INNER JOIN—it returns only the rows from the first table that exist in the second table expression. You'll sometimes refer to this as the filtering or limiting type since you filter rows from the first table based on what's returned in the second.

SELECT ft.ColumnName,
       st.Amount
FROM dbo.FirstTable ft
    CROSS APPLY
(
    SELECT st.Amount
    FROM dbo.SecondTable st
    WHERE st.FirstTableId = ft.Id
) st;

OUTER APPLY

On the other hand, OUTER APPLY shares similarities with an OUTER JOIN. It returns all the rows from the first table and matches from the second. You hear this one referred to as NULL producing. If a row doesn't exist in the table expression, it's represented with a NULL.

SELECT ft.ColumnName,
       st.Amount
FROM dbo.FirstTable ft
    OUTER APPLY
(
    SELECT st.Amount
    FROM dbo.SecondTable st
    WHERE st.FirstTableId = ft.Id
) st;

Return TOP(n) Rows

A typical request you see APPLY used for is returning the TOP (n) rows from the second result set. Now that could be either CROSS or OUTER. It depends on your needs. The syntax for accomplishing this might look like the code below.

SELECT ft.ColumnName,
       st.Amount
FROM dbo.FirstTable ft
    CROSS APPLY
(
    SELECT TOP(2) st.Amount
    FROM dbo.SecondTable st
    WHERE st.FirstTableId = ft.Id
   ORDER BY st.Amount DESC
) st;

The ORDER BY clause dictates in what order the rows are ranked. In the example above, if you wanted to include the two highest, you would use the keyword DESC/DESCENDING.

The second method to return the TOP(n) rows is with ROW_NUMBER(). If you've read any of my other articles on window functions, you know I love it. The syntax below is an example of how this would work.

;WITH cte_HighestSales
AS (
   SELECT ROW_NUMBER() OVER (PARTITION BY FirstTableId ORDER BY Amount DESC) AS RowNumber,
          Amount,
          FirstTableId
   FROM dbo.SecondTable
   )
SELECT ft.ColumnName,
       st.Amount
FROM dbo.FirstTable ft
    INNER JOIN cte_HighestSales st
        ON st.FirstTableId = ft.Id
           AND st.RowNumber < 3;

Which one should you use in practice? To answer that question, we'll need to build a dataset and compare the performance of the two methods.

Build a Test Dataset

The code below creates two tables. The first table is smaller and acts more as a dimension. The second one has the most rows.

USE [master];
GO

IF DATABASEPROPERTYEX(   'PizzaTracker',
                         'Version'
                     ) IS NOT NULL
BEGIN
    ALTER DATABASE PizzaTracker SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    DROP DATABASE PizzaTracker;
END;
GO

CREATE DATABASE PizzaTracker;
GO

ALTER DATABASE PizzaTracker SET RECOVERY SIMPLE;
GO

USE PizzaTracker;
GO

CREATE TABLE dbo.MutantFighters
(
    Id INT PRIMARY KEY,
    [Name] VARCHAR(50) NOT NULL,
    Species VARCHAR(50) NOT NULL
);

INSERT INTO dbo.MutantFighters
(
    Id,
    Name,
    Species
)
VALUES
(1, 'Leonardo', 'Turtle'),
(2, 'Michelangelo', 'Turtle'),
(3, 'Donatello', 'Turtle'),
(4, 'Raphael', 'Turtle'),
(5, 'Splinter', 'Rat'),
(6, 'Bebop', 'Warthog'),
(7, 'Rocksteady', 'Rhino');

CREATE TABLE dbo.PizzaLog
(
    Id INT IDENTITY(1, 1) NOT NULL,
    MutantId INT NOT NULL,
    Slices INT NOT NULL,
    DateEaten DATE NOT NULL
);

-- Generate dates for February.
DECLARE @StartDate DATE = '2023-02-01';
DECLARE @EndDate DATE = '2023-02-28';

WITH Dates
AS (
   SELECT @StartDate AS Date
   UNION ALL
   SELECT DATEADD(   DAY,
                     1,
                     Date
                 ) AS Date
   FROM Dates
   WHERE Date < @EndDate
   )
INSERT INTO dbo.PizzaLog
(
    MutantId,
    Slices,
    DateEaten
)
SELECT mf.Id,
       CASE
           WHEN mf.Id = 2 THEN
               ABS(CHECKSUM(NEWID()) % 15) + 3 -- Mikey eats the most pizza            
           ELSE
               ABS(CHECKSUM(NEWID()) % 10) + 1
       END AS Slices,
       d.Date
FROM dbo.MutantFighters mf
    CROSS JOIN Dates d
WHERE mf.id <> 5;
GO

Comparing Results

In the query below, I want to retrieve the three days each character consumed the most pizza slices. I'll start with a CROSS APPLY and an INNER JOIN on the ROW_NUMBER(). I will also turn on my actual execution plan and STATISTICS IO. We'll compare the number of logical reads and the plan cost. The plan cost isn't an exact science, but it gives a decent approximation.

SET STATISTICS IO ON;
SELECT mf.[Name],
       pl.DateEaten,
       pl.Slices
FROM dbo.MutantFighters mf
    CROSS APPLY
(
    SELECT TOP (3)
           DateEaten,
           Slices
    FROM dbo.PizzaLog pl
    WHERE pl.MutantId = mf.Id
   ORDER BY Slices DESC, DateEaten ASC
) pl;
SET STATISTICS IO OFF;

SET STATISTICS IO ON;
;WITH cte_pl AS (
SELECT ROW_NUMBER() OVER(PARTITION BY MutantId ORDER BY Slices DESC, DateEaten ASC) AS RowNumber,
MutantId,
Slices,
DateEaten
FROM dbo.PizzaLog
)
SELECT mf.[Name],
       cte_pl.DateEaten,
       cte_pl.Slices
FROM dbo.MutantFighters mf
INNER JOIN cte_pl ON cte_pl.MutantId = mf.Id
WHERE cte_pl.RowNumber < 4
SET STATISTICS IO OFF;
Logical Reads - Statistics IO
SQL Execution Plan

The results above show that CROSS APPLY performed a total of 7 logical reads while ROW_NUMBER() comes in at 3. When executed as a batch, CROSS APPLY accounts for 78% of the cost.

Let's use an OUTER APPLY and a LEFT JOIN for our second test. Here I'm also interested in the character who didn't eat any pizza.

SET STATISTICS IO ON;
SELECT mf.[Name],
       pl.DateEaten,
       pl.Slices
FROM dbo.MutantFighters mf
    OUTER APPLY
(
    SELECT TOP (3)
           DateEaten,
           Slices
    FROM dbo.PizzaLog pl
    WHERE pl.MutantId = mf.Id
   ORDER BY Slices DESC, DateEaten ASC
) pl;
SET STATISTICS IO OFF;


SET STATISTICS IO ON;
;WITH cte_pl AS (
SELECT ROW_NUMBER() OVER(PARTITION BY MutantId ORDER BY Slices DESC, DateEaten ASC) AS RowNumber,
MutantId,
Slices,
DateEaten
FROM dbo.PizzaLog
)
SELECT mf.[Name],
       cte_pl.DateEaten,
       cte_pl.Slices
FROM dbo.MutantFighters mf
LEFT JOIN cte_pl ON cte_pl.MutantId = mf.Id AND cte_pl.RowNumber < 4
SET STATISTICS IO OFF;
Logical Reads - Statistics IO
SQL Execution Plan

As you can see from the screenshots above, the results are nearly identical. Notice that I am pulling in an extra row to account for Splinter. When comparing logical reads and batch cost, the win easily goes to ROW_NUMBER(). With a smaller result set, I wouldn't stress over which method to use. However, if you're using larger tables, it might be worth reconsidering your approach. Please let me know in the comments section if you prefer one over the other.

Key Takeaways

  • You can use the window function ROW_NUMBER() and the APPLY operator to return a specific number of rows from a table expression.
  • APPLY comes in two variants CROSS and OUTER. Think of the CROSS like an INNER JOIN and the OUTER like a LEFT JOIN.
  • It will largely depend on your preference, but I often choose ROW_NUMBER() due to my fondness for it.
  • With a larger dataset, i.e., millions of rows, I'd speculate ROW_NUMBER() might provide a noticeable performance benefit.
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jared Westover Jared Westover is a passionate technology specialist at Crowe, helping to build data solutions. For the past two decades, he has focused on SQL Server, Azure, Power BI, and Oracle. Besides writing for MSSQLTips.com, he has published 12 Pluralsight courses about SQL Server and Reporting Services. Jared enjoys reading and spending time with his wife and three sons when he's not trying to make queries go faster.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-04-11

Comments For This Article

















get free sql tips
agree to terms