How to List First or Last n% Records Using NTILE() in SQL Server

By:   |   Updated: 2023-07-05   |   Comments   |   Related: > Functions System


Problem

Sometimes, we need to find the first or last n% records from a dataset based on a particular column. For instance, a teacher wants to know which students are in the bottom 25% based on marks. Hence, in such situations, the SQL NTILE() function helps. We will leverage this tip to get first or last n% records using NTILE() in SQL Server.

Solution

SQL Server's NTILE function is a powerful tool that allows you to divide records from a dataset into a specified number of groups based on a specified ordering. This further allows you to analyze your data more efficiently and perform various operations and calculations over the split data.

The NTILE function divides a set into equal ranges, with each division consisting of approximately equal records/rows, which is extremely useful when you want to split the data into percentile-based groups such as quartiles, deciles, or any number of ntiles you require.

This allows you to identify trends/patterns within your data and perform operations on each group separately. Hence, the tool is powerful in creating histograms, analyzing rankings, or even dividing the data to allow parallel processing.

NTILE Syntax

This is the basic syntax for the NTILE function:

--MSSQLTips.com

NTILE(Buckets) OVER (PARTITION BY partition_expression ORDER BY order_expression)
  • Buckets specifies the number of groups to divide your data into; therefore, buckets should be an integer greater than zero.
  • PARTITION BY is an optional clause that allows you to partition based on your dataset into subsets before the NTILE function is applied to the dataset. NTILE will then be applied to each separate partition.
  • ORDER BY is a required clause used to specify the column(s) used to rank/order the records within each subset/partition. In essence, it is the criteria upon which the dataset is divided into groups.
  • The partition_expression is the expression upon which the partitions will be made. In this expression, you must provide the column(s) for the partition criteria.
  • The order_expression is the expression upon which the NTILE ordering will be made. In this expression, like a partition expression, you must provide the column(s) for the ordering criteria.

In this tip, our primary focus will be on using the NTILE function to rank records based on specific criteria. Now that we have given a brief overview of the NTILE function and how it is used, it's time to dive into how to use this SQL function.

Create Tables and Data for Testing

Before we can use the NTILE function, we need a dataset to apply it.

Let's create a new database for testing:

--MSSQLTips.com

CREATE DATABASE Information;
GO;
USE Information;

Next, create two tables within this database, one for employee records and the other for student records.

Since we are using the NTILE function, each table must consist of at least one column where we can rank our records. For ranking, our Employees table can have a Salary column, while our Students table will have a Final_Marks column. We also need a column to partition data into multiple subsets to demonstrate how to use the optional PARTITION BY clause. To accommodate this, let's include a Department column in the Employees table and a Course column in the Students table.

Here is the code to create the Employee_Information table:

--MSSQLTips.com

CREATE TABLE Employee_Information
(
   Employee_ID INT PRIMARY KEY IDENTITY(1,1),
   Department VARCHAR(255),
   Salary INT
);

Here is the code to create the Student_Information table:

--MSSQLTips.com

CREATE TABLE Student_Information
(
    Student_ID INT PRIMARY KEY IDENTITY(1,1),
    Course VARCHAR(255),
    Final_Marks INT
);

Now that both tables have been created, let's populate them, starting with the Employee_Information table:

--MSSQLTips.com

INSERT INTO Employee_Information (Department, Salary) VALUES
('IT', 60000),
('IT', 50000),
('IT', 36000),
('IT', 29000),
('IT', 72000),
('IT', 52000),
('Sales', 23000),
('Sales', 53000),
('Sales', 45000),
('Sales', 56000),
('Sales', 54000),
('Sales', 46000),
('Sales', 57000),
('Sales', 55000),
('Sales', 47000),
('HR', 62500),
('HR', 58000),
('HR', 49000),
('HR', 61000),
('HR', 81000),
('HR', 59000),
('HR', 30000),
('HR', 62000),
('HR', 64000),
('HR', 60500),
('HR', 51000),
('HR', 63000);

Populate the Student_Information table now:

--MSSQLTips.com

INSERT INTO Student_Information (Course, Final_Marks) VALUES
('Math', 92),
('Math', 80),
('Math', 84),
('Math', 66),
('Math', 40),
('Math', 51),
('Math', 83),
('Math', 69),
('Math', 76),
('Math', 40),
('Math', 39),
('Math', 30),
('English', 91),
('English', 89),
('English', 76),
('English', 53),
('English', 96),
('English', 65),
('English', 74),
('English', 69),
('English', 48),
('English', 35),
('English', 77),
('English', 54);

Before we move forward, you can display your tables using the following commands.

To display the Employee_Information table:

--MSSQLTips.com

SELECT * FROM Employee_Information
Employees Table

To display the Student_Information table:

--MSSQLTips.com

SELECT * FROM Student_Information
Students Table

Now, let's analyze the data.

Using NTILE Examples on Employee Data

Suppose you are to raise the salaries of the employees whose salaries are in the first quartile and lower the salaries of the employees in the fourth quartile by specific values, irrespective of their departments. To do this, you must fetch the employees in the first and fourth quartiles before you can update the data. And so, first, you have to divide the Employees dataset into four equal groups.

Recall the syntax for NTILE:

--MSSQLTips.com

NTILE(Buckets) OVER (PARTITION BY partition_expression ORDER BY order_expression)

In this case, Buckets will be 4 (for 4 quarters). We will not be using the PARTITION BY clause for this problem. Since we are ranking our employees by Salary, we will use ORDER BY and the order_expression will be Salary.

To apply the function on the Employees table, use the following code:

--MSSQLTips.com

SELECT *, NTILE(4) OVER (ORDER BY Salary) AS Emp_Ranking
FROM Employee_Information;

The code above will display the following result:

Employees Table With NTILE(4)

As you can see, the employees' salaries are divided into four groups ranked in ascending order. The higher the salary, the higher the number. However, that is because the ORDER BY clause orders the column by ascending order by default. You can add a DESC parameter like ORDER BY order_expression DESC to make the order descending.

You can also filter the results to only get the quartiles you want to update, as follows:

--MSSQLTips.com

SELECT *
FROM
  (SELECT *, NTILE(4) OVER (ORDER BY Salary) AS Emp_Ranking
   FROM Employee_Information) AS Temp_Table 
WHERE Emp_Ranking = 4;

The code above will return the employee records whose salary is in the fourth or highest-paid quartile.

Similarly, the code below will return the employee records whose salary is in the first or lowest-paid quartile.

--MSSQLTips.com

SELECT *
FROM 
  (SELECT *, NTILE(4) OVER (ORDER BY Salary) AS Emp_Ranking
   FROM Employee_Information) AS Temp_Table
WHERE Emp_Ranking = 1;

The result of the first snippet:

Employees Table With NTILE(4), Filter NTILE = 4

The result of the second snippet:

Employees Table With NTILE(4), Filter NTILE = 1

Now that you know how to select records using the NTILE function you can, in the same way, update the records as well, to your liking.

In the previous example, we computed the NTILES irrespective of the employee's department. If you, as the manager or the owner of a company, would like to manipulate the salaries in such a way that the salary of the employee in the first quartile of each department should be raised, and the salary of the employee in the fourth quartile of each department should be lowered, you would require the PARTITION BY clause to carry out your selection.

You can do so as follows:

--MSSQLTips.com

SELECT *, NTILE(4) OVER (PARTITION BY Department ORDER BY Salary) AS Emp_Ranking
FROM Employee_Information;

The result obtained from the code snippet above is:

Employees Table With NTILE(4) Partitioned By Department

As you can see, each department has four separate NTILEs now, and you can select the fourth quartile and the first quartile of each department as follows:

--MSSQLTips.com

SELECT *
FROM 
  (SELECT *, NTILE(4) OVER (PARTITION BY Department ORDER BY Salary) AS Emp_Ranking
   FROM Employee_Information) AS Temp_Table
WHERE Emp_Ranking = 4;

The code above will return the employee records whose salary belongs to the fourth quartile, partitioned by their department, or the highest-paid quartile in each department.

Similarly, the code below returns the employee records whose salary belongs to the first quartile partitioned by their departments or the lowest-paid quartile in each department.

--MSSQLTips.com

SELECT *
FROM
  (SELECT *, NTILE(4) OVER (PARTITION BY Department ORDER BY Salary) AS Emp_Ranking
   FROM Employee_Information) AS Temp_Table
WHERE Emp_Ranking = 1;

The result of the first snippet:

Employees Table With NTILE(4), Filter NTILE = 4, Partitioned By Department

The result of the second snippet:

Employees Table With NTILE(4), Filter NTILE = 4, Partitioned By Department

Once this is done, you can update the records of the new values similarly. Now that we have shown you some examples of how NTILE works and how you can use it on the Employee_Information table, we will move on to the Student_Information table.

Using NTILE Examples on Student Table

Suppose you have the final student grades and you want to mark them according to a relative grading scheme:

  • A – Fourth Quartile
  • B – Third Quartile
  • C – Second Quartile
  • F – First Quartile

First, you need to divide your students' grades into four groups using the NTILE function, with the buckets parameter equal to 4 as shown below:

--MSSQLTips.com

SELECT *, NTILE(4) OVER (ORDER BY Final_Marks) AS Student_Grade
FROM Student_Information;

The query above returns the following table:

Students Table With NTILE(4)

Notice how our dataset consists of two courses; however, we did not use the PARTITION BY clause, so the quartiles were made irrespective of which course marks were entered. This was done to emphasize the importance of the PARTITION BY clause. The data may not always make sense without being partitioned concerning other columns, as shown in the image above, and to correct this problem, we will now use the PARTITION BY clause.

Conventionally, a lower number is used to represent a higher rank, so we will use DESC in the ORDER BY clause:

--MSSQLTips.com

SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade
FROM Student_Information;

This query returns the result we desire:

Students Table with NTILE(4), PARTITION BY Course, and ORDER BY Final_Marks DESC

Now, the Student_Grade column makes sense, as each course is now being graded separately. Let's filter our results so that we can assign the grades later on.

The students who will be receiving an 'F' grade are:

--MSSQLTips.com

SELECT *
FROM
  (SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade
   FROM Student_Information) AS Temp_Table
WHERE Student_Grade = 4;
Students Table with NTILE(4), PARTITION BY Course, and ORDER BY Final_Marks DESC, Student_Grade = 4 = F

The students who will be receiving a 'C' grade are:

--MSSQLTips.com

SELECT *
FROM
  (SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade
   FROM Student_Information) AS Temp_Table
WHERE Student_Grade = 3;
Students Table with NTILE(4), PARTITION BY Course, and ORDER BY Final_Marks DESC, Student_Grade = 3 = C

The students who will be receiving a 'B' grade are:

--MSSQLTips.com

SELECT *
FROM
  (SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade
   FROM Student_Information) AS Temp_Table
WHERE Student_Grade = 2;
Students Table with NTILE(4), PARTITION BY Course, and ORDER BY Final_Marks DESC, Student_Grade = 2 = B

The students who will be receiving an 'A' grade are:

--MSSQLTips.com

SELECT *
FROM
  (SELECT *, NTILE(4) OVER (PARTITION BY Course ORDER BY Final_Marks DESC) AS Student_Grade
   FROM Student_Information) AS Temp_Table
WHERE Student_Grade = 1;
Students Table with NTILE(4), PARTITION BY Course, and ORDER BY Final_Marks DESC, Student_Grade = 1 = A

Thus, you can exercise a relative grading system using the NTILE function.

Conclusion

You have learned how the NTILE function groups records and partitions the data. You can now apply this function to your datasets according to your specific issue and achieve your desired results. You can then update and manipulate your data and perform whatever operations you wish on your dataset. You can also analyze the information you retrieve.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Harris Amjad Harris Amjad is a BI Artist, developing complete data-driven operating systems from ETL to Data Visualization.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-07-05

Comments For This Article

















get free sql tips
agree to terms