Learn more about SQL Server tools

mssqltips logo
giveaway
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Advanced Use of the SQL Server ORDER BY Clause


By:   |   Read Comments (4)   |   Related Tips: More > T-SQL


ALERT: Share your SQL Server knowledge and become a MSSQLTips author


Problem

The SQL Sever ORDER BY clause is commonly used in T-SQL scripts, if used properly, it can solve lots of problems. Such as:

  • How can I sample [N] records out of a table randomly?
  • How can I find the median value of a numeric column?
  • How can I order a product table by always putting a specific category at the front of the result?
  • How can I find the biggest value records based on a column in each group?

Each question may be resolved multiple ways, but I will focus on using ORDER BY in a simple SELECT statement.

Solution

ORDER BY has seen its biggest change in SQL Server 2012 when OFFSET and FETCH key words are introduced. But before that, ORDER BY can still do lots of interesting things. We will look at a few examples here.

Example 1

I have a big table, and I just want to sample 10 records randomly for my testing purpose, how can I do it?

We will prepare a sample table and then code against it.

use tempdb

-- prepare test table
if object_id('dbo.product', 'U') is not null
 drop table dbo.product;
go
create table dbo.product (id int identity primary key
, [name] varchar(100)
, [color] varchar(20)
, [price] int
)
go

-- populate the table with some data, 
-- there are three distinct colors for each product
; with L0 as (select 1 as c union all select 1 as c)
, L1 as (select L0.c from L0 cross join L0 as T )
, L2 as (select L1.c from L1 cross join L1 as T )
, L3 as (select L2.c from L2 cross join L2 as T )
, L4 as (select L3.c from L3 cross join L3 as T )
, L5 as (select rn = ROW_NUMBER() over (order by (select null) ) from L4)
insert into dbo.product (name, color, price)
select [name]='Product' + cast(rn as varchar(5))
, color = case ( cast (ceiling(rand(rn*7)*100000) as int) %3) when 0 then 'red' when 1 then 'green' else 'blue' end
, price = cast(ceiling(rand(rn*7)*123456) as int)%56789
from L5;

Now let's list 10 records randomly.

-- to list top 10 records randomly, this query is good for medium to large (5K) data set.
-- for small data set, using the 2nd way 
select top 10 * from dbo.product
order by datepart(ns, getdate())% cast((rand(id)*11111111) as int); 

-- another way commonly known
select top 10 * from dbo.product
order by newid();

Each run will return different results as shown below.

random 10 records with the SQL Server ORDER BY Clause

Actually there is a third way using OFFSET and FETCH as shown below. This approach will always randomly grab 10 records in sequence.

select  * from dbo.product
order by (select null)
offset cast(ceiling(rand()*1234567) as int)% (select count(*) from dbo.product) rows
fetch next 10 rows only

Question 2

Using the same table, we will create a small sample data and then find the median value of [Price].

We will populate the table with 9 records and later with 10 records to see whether our solution works.

truncate table dbo.product
declare @i int = 1;
while @i < 10
begin
 insert into dbo.product (name, color, price)
 select 'product'+cast(@i as varchar(3)), case @i%3 when 0 then 'blue' when 1 then 'green' else 'red' end, @i
 set @i += 1; 
end
select * from dbo.product

The results are shown below:

median_odd.jpg

We can see the median value is 5 for [Price]. So to calculate this in T-SQL, we can use the following code:

-- find the row(s) with median value 
select  *  
from dbo.product 
order by price 
offset (select count(*)-1 from dbo.product)/2  rows
fetch next (case (select count(*) from dbo.product) % 2 when 1 then 1 else 2 end) rows only
go

-- find the real median value for [Price] column
with c as (
select *
from dbo.product 
order by price 
offset (select count(*)-1 from dbo.product)/2  rows
fetch next (case (select count(*) from dbo.product) % 2 when 1 then 1 else 2 end) rows only
)
select median = avg(price*1.) from c

Calculate the median value in SQL Server with Offset and Fetch


Calculate the median value with a SQL Server CTE

Assume we insert another record into the table to make it 10 records:

truncate table dbo.product
declare @i int = 1;
while @i < 11 -- change 10 to 11
begin
 insert into dbo.product (name, color, price)
 select 'product'+cast(@i as varchar(3)), case @i%3 when 0 then 'blue' when 1 then 'green' else 'red' end, @i
 set @i += 1; 
end
select * from dbo.product

The result will be like the following, we will see two median value rows just as expected:

Calculate two median values

The key here is the OFFSET and FETCH setting, which are decided by the row count, if row count is an odd number, we set the OFFSET to (count-1)/2 and then FETCH next 1 row. But if row count is an even number, we set the same OFFSET value while set FETCH for the next 2 rows.

For an example, if row count = 9, OFFSET = (9-1)/2 = 4, FETCH = 1. If row count=10, OFFSET=(10-1)/2 = 4 (4.5 becomes 4 due to integer conversion), and FETCH=2

Question 3

In the test table created above, there are three distinct colors, Red, Green, Blue, now for my report, I want Green color product to be always on the top of my report, how can I do so in one T-SQL statement?

First let's prepare the sample data, we will still use the same table as created in Question 1, but we will insert 10 records for a better visualization.

-- populate the test table with 10 records only

truncate table dbo.product;
; with L0 as (select 1 as c union all select 1 as c)
, L1 as (select L0.c from L0 cross join L0 T)
, L2 as (select L1.c from L1 cross join L1 T)
, L3 as (select rn = ROW_NUMBER() over (order by (select null) ) from L2)
insert into dbo.product (name, color, price)
select [name]='Product' + cast(rn as varchar(5))
, color = case ( cast (ceiling(rand(rn*7)*100000) as int) %3) when 0 then 'red' when 1 then 'green' else 'blue' end
, price = cast(ceiling(rand(rn*7)*123456) as int)%100
from L3
where rn <=10;

--list records with Green product on the top
select * from dbo.product
order by case color when 'green' then 0 else 1 end asc

The results are shown below:

Order the green records first with the ORDER BY clause in SQL Server

Notice color Blue and Red are randomly displayed. So another extension to the question could be how to list records in the sequence of Green, Red and Blue

--list records with Green product on the top
select * from dbo.product
order by case color when 'green' then 0 when 'red' then 1 else 2 end asc

In this ORDER BY clause, we can assign the [color] value to a value, such as [green] to 0, [red] to 1. Since the ORDER BY is defined in ASC order, so when ORDER BY is evaluated, Green will be listed before Red because 0 is less than 1 with ASC order.

The results are shown below:

ORDER BY is defined in ASC order, so when ORDER BY is evaluated, Green will be listed before Red because 0 is less than 1 with ASC order

Question 4

How can I list the most expensive i.e. [price], product in each color group?

We will switch back to the first big sample of data to do the demo, we can just re-run the first table population script to reset up the data.

-- find the most expensive product in each color group.
select top 1 with ties *
from dbo.product
order by row_number() over (partition by color order by price desc);

ROW_NUMBER() will list the records in each color group, so the 1st record of each group will have the same value of 1 and they will be automatically selected via TOP 1 WITH TIES. When I first used ORDER BY this way long ago, I was totally amazed by the powerful beauty of ORDER BY clause.

The results are shown below:

1st most expensive in each group

Sometimes, there can be a few products that are same cost, in this case, instead of using row_number(), we should use dense_rank().

-- if there are product that are tied in price in the same group, 
-- we should use dense_rank() instead of row_number()
select top 1 with ties *
from dbo.product
order by dense_rank() over (partition by color order by price desc)

The results are shown below:

all most expensive in each group

To understand this behavior, we should know the difference of ROW_Number and Dense_Rank, the ROW_NUMBER numbers all rows sequentially (for example 1, 2, 3, 4, 5). While DENSE_RANK provides the same numeric value for ties, the rank of a row is one plus the number of distinct ranks that come before the row in question.

In other words, if two values are the same and are both the biggest value, then both rows will have a rank number = 1, while in row_number scenario, one row has row number =1 while another will be 2, thus only the row with 1 will be selected by TOP 1 WITH TIES.

Summary

In this tip, we have examined some interesting uses of ORDER BY, which are very useful under some niche business requirements. Without ORDER BY, we may write lengthy scripts to achieve the same result.

Currently, OFFSET and FETCH still have some limitations, one of which is that the OVER clause does not support OFFSET and FETCH. If this is supported, we may be able to write more elegant code when doing group processing, such as finding the Nth largest value in each group.

All code is tested with SQL Server 2012 Developer Edition.

Next Steps

One item worth digging deeper is to compare the performance of other solutions with the solutions provided in this tip. We can compare their execution plan and the statistics IO/CPU data.

For question 2, if the requirement is to find the median value of each color group, what is your solution?

Also the following links may help to better understand the ORDER BY clause:



Last Update:


signup button

next tip button



About the author
MSSQLTips author Jeffrey Yao Jeffrey Yao is a senior SQL Server consultant, striving to automate DBA work as much as possible to have more time for family, life and more automation.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Saturday, March 18, 2017 - 1:19:45 PM - jeff_yao Back To Top

 

Thanks @David for your comment.

Actually, not many DBAs need to use "WITH TIES" in daily work, that's why I always believe it is better for a DBA to switch to be a dedicated Database Developer for a few months every few years, that will definitely help to explore/understand more new T-SQL features in the context of real business requirement. :-)


Saturday, March 18, 2017 - 6:14:46 AM - David Back To Top

 Excellent article.  I can't believe I've never see "WITH TIES" before

 


Thursday, March 16, 2017 - 11:58:38 AM - jeff_yao Back To Top

Thanks for your kind words @Ken W. Actually OFFSET/FETCH clause in Order By is new feature of SQL Server 2012, so in real world, indeed no many people are using it. But it can be very handy sometimes.


Thursday, March 16, 2017 - 1:02:39 AM - Ken W Back To Top

Nice tips Jeffrey! Thank you for writing this article. I had never seen fetch or offset before so it was nice to learn something new. 

 


Learn more about SQL Server tools