SQL Server 2012 Analytical Functions - Percentile_Cont and Percentile_Disc

By:   |   Comments (1)   |   Related: > Functions System


Problem

SQL Server 2012 introduces new analytical functions PERCENTILE_DISC and PERCENTILE_CONT. In this tip we will be exploring these functions and how to use them.

Solution

PERCENTILE_DISC() : this computes a specific percentile for sorted values in an entire rowset or within distinct partitions of a rowset. For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest CUME_DIST value (with respect to the same sort specification) that is greater than or equal to P.

PERCENTILE_CONT() : this calculates a percentile based on a continuous distribution of the column value.

Syntax of these functions are :

PERCENTILE_DISC  | PERCENTILE_CONT ( numeric_literal ) WITHIN GROUP 
( ORDER BY order_by_expression [ ASC | DESC ] )   OVER ( [ <PARTITION_BY_CLAUSE> ] )

Let me explain this using example.

Create Test_Project table on the database TestDB and insert some data:

CREATE DATABASE [TestDB]
 
--Create testable to hold some data
CREATE TABLE [dbo].[test](
 [TestA] [int] NOT NULL,
 [TestB] [nchar](10) NOT NULL,
 [TestC] [int] NOT NULL
) ON [PRIMARY]
--Insert some test data
insert into test values( 101,'A',565)
insert into test values( 101,'B',741)
insert into test values( 101,'C',369)
insert into test values( 101,'D',111)
insert into test values( 102,'E',214)
insert into test values( 102,'F',69)
insert into test values( 102,'G',697)
insert into test values( 103,'H',97)
insert into test values( 103,'I',121)
insert into test values( 103,'J',198)
insert into test values( 104,'K',648)
insert into test values( 104,'L',444)
insert into test values( 104,'M',219)

The query would be as follows:

select TestA,TestB,TestC,
CUME_DIST() OVER (PARTITION BY [TestA] ORDER BY [TestC]) as CD,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY [TestC]) OVER (PARTITION BY [TestA]) AS PD,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [TestC]) OVER (PARTITION BY [TestA]) AS PC
FROM test

syntax query for Percentile_DISC & Percentile_CONT will be

The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them:

RN = (1 + (P * ( N - 1 ))
P = Percentile Specified
N = Number of rows
 
CRN = CEILING(RN) 
FRN = FLOOR(RN).
If (CRN = FRN = RN) then the result is [value of expression from row at RN]
Otherwise the result is
    (CRN - RN) * (value of expression for row at FRN) +
    (RN - FRN) * (value of expression for row at CRN)

So as per given formula for group 101 (column TestA):

Calculating PD
   The first value greater or equal to 0.5 (CD column) is row 2, so PD = 369
Calculating PC
   Number of rows N = 4 ( Even )
   P = 0.5
RN = (1 + (0.5 * ( 4 - 1 )) = 2.5
CRN = CEILING(2.5) = 3
FRN = FLOOR(2.5) = 2
CRN is not equal to FRN so PC is: (3 - 2.5) * 369 + (2.5 - 2) * 565 = 467

PERCENTILE_DISC() function takes a percentile as a input parameters. It returns the value as an answer of which value is equal or greater to the percentile value which is passed. For example we are passing 0.5 into the PERCENTILE_DISC() function. It will go through the resultset and identify which rows have values which are equal to or greater than 0.5. So PERCENTILE_DISC(0.5) looks for values in the CD column for group 101 which are 369 and 565, so the PD value will be 369.

Now let us take a look at another example by passing 0.9 as the input parameter.

PERCENTILE_DISC() function takes a percentile as a imput parameters

Calculating PD
   The only value greater than 0.9 (CD column) is row 4, so PD = 741
Calculating PC
   Number of rows N = 4 ( Even )
   P = 0.9
 
RN = (1+ ( 0.9 * ( 4 - 1 )) = 3.7 
CRN = CEILING(3.7) = 4 
FRN = FLOOR(3.7) = 3
CRN is not equal to FRN so the formula is (4 - 3.7) * 565 + (3.7 - 3) * 741 = 688.2 
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Rajendra Gupta Rajendra Gupta is a Consultant DBA with 14+ years of extensive experience in database administration including large critical OLAP, OLTP, Reporting and SharePoint databases.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Wednesday, May 8, 2013 - 10:11:26 AM - MAYANK JAIN Back To Top (23813)

Hi, Really a nice post. I had 2 questions : 

1. What is RN?

2. How did you get to know the formula for calculating PC & PD?















get free sql tips
agree to terms