Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server 2012 Analytical Functions - Percentile_Cont and Percentile_Disc


By:   |   Last Updated: 2012-06-01   |   Comments (1)   |   Related Tips: More > Functions - System

Problem

SQL Server 2012 introduces new analytical functions PERCENTILE_DISC and PERCENTILE_CONT. In this tip we will be exploring these functions and how to use them.

Solution

PERCENTILE_DISC() : this computes a specific percentile for sorted values in an entire rowset or within distinct partitions of a rowset. For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest CUME_DIST value (with respect to the same sort specification) that is greater than or equal to P.

PERCENTILE_CONT() : this calculates a percentile based on a continuous distribution of the column value.

Syntax of these functions are :

PERCENTILE_DISC  | PERCENTILE_CONT ( numeric_literal ) WITHIN GROUP 
( ORDER BY order_by_expression [ ASC | DESC ] )   OVER ( [  ] )

Let me explain this using example.

Create Test_Project table on the database TestDB and insert some data:

CREATE DATABASE [TestDB]
 
--Create testable to hold some data
CREATE TABLE [dbo].[test](
 [TestA] [int] NOT NULL,
 [TestB] [nchar](10) NOT NULL,
 [TestC] [int] NOT NULL
) ON [PRIMARY]
--Insert some test data
insert into test values( 101,'A',565)
insert into test values( 101,'B',741)
insert into test values( 101,'C',369)
insert into test values( 101,'D',111)
insert into test values( 102,'E',214)
insert into test values( 102,'F',69)
insert into test values( 102,'G',697)
insert into test values( 103,'H',97)
insert into test values( 103,'I',121)
insert into test values( 103,'J',198)
insert into test values( 104,'K',648)
insert into test values( 104,'L',444)
insert into test values( 104,'M',219)

The query would be as follows:

select TestA,TestB,TestC,
CUME_DIST() OVER (PARTITION BY [TestA] ORDER BY [TestC]) as CD,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY [TestC]) OVER (PARTITION BY [TestA]) AS PD,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [TestC]) OVER (PARTITION BY [TestA]) AS PC
FROM test

syntax query for Percentile_DISC & Percentile_CONT will be

The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them:

RN = (1 + (P * ( N - 1 ))
P = Percentile Specified
N = Number of rows
 
CRN = CEILING(RN) 
FRN = FLOOR(RN).
If (CRN = FRN = RN) then the result is [value of expression from row at RN]
Otherwise the result is
    (CRN - RN) * (value of expression for row at FRN) +
    (RN - FRN) * (value of expression for row at CRN)

So as per given formula for group 101 (column TestA):

Calculating PD
   The first value greater or equal to 0.5 (CD column) is row 2, so PD = 369
Calculating PC
   Number of rows N = 4 ( Even )
   P = 0.5
RN = (1 + (0.5 * ( 4 - 1 )) = 2.5
CRN = CEILING(2.5) = 3
FRN = FLOOR(2.5) = 2
CRN is not equal to FRN so PC is: (3 - 2.5) * 369 + (2.5 - 2) * 565 = 467

PERCENTILE_DISC() function takes a percentile as a input parameters. It returns the value as an answer of which value is equal or greater to the percentile value which is passed. For example we are passing 0.5 into the PERCENTILE_DISC() function. It will go through the resultset and identify which rows have values which are equal to or greater than 0.5. So PERCENTILE_DISC(0.5) looks for values in the CD column for group 101 which are 369 and 565, so the PD value will be 369.


Now let us take a look at another example by passing 0.9 as the input parameter.

PERCENTILE_DISC() function takes a percentile as a imput parameters

Calculating PD
   The only value greater than 0.9 (CD column) is row 4, so PD = 741
Calculating PC
   Number of rows N = 4 ( Even )
   P = 0.9
 
RN = (1+ ( 0.9 * ( 4 - 1 )) = 3.7 
CRN = CEILING(3.7) = 4 
FRN = FLOOR(3.7) = 3
CRN is not equal to FRN so the formula is (4 - 3.7) * 565 + (3.7 - 3) * 741 = 688.2 
Next Steps


Last Updated: 2012-06-01


get scripts

next tip button



About the author
MSSQLTips author Rajendra Gupta Rajendra is a Consultant DBA with 9+ years of extensive experience in database administration including large critical OLAP, OLTP, Reporting and SharePoint databases.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Wednesday, May 08, 2013 - 10:11:26 AM - MAYANK JAIN Back To Top

Hi, Really a nice post. I had 2 questions : 

1. What is RN?

2. How did you get to know the formula for calculating PC & PD?


Learn more about SQL Server tools