SQL Server 2012 Analytical Functions - Percentile_Cont and Percentile_Disc
SQL Server 2012 introduces new analytical functions PERCENTILE_DISC and PERCENTILE_CONT. In this tip we will be exploring these functions and how to use them.
PERCENTILE_DISC() : this computes a specific percentile for sorted values in an entire rowset or within distinct partitions of a rowset. For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest CUME_DIST value (with respect to the same sort specification) that is greater than or equal to P.
PERCENTILE_CONT() : this calculates a percentile based on a continuous distribution of the column value.
Syntax of these functions are :
PERCENTILE_DISC | PERCENTILE_CONT ( numeric_literal ) WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] ) OVER ( [ <PARTITION_BY_CLAUSE> ] )
Let me explain this using example.
Create Test_Project table on the database TestDB and insert some data:
CREATE DATABASE [TestDB] --Create testable to hold some data CREATE TABLE [dbo].[test]( [TestA] [int] NOT NULL, [TestB] [nchar](10) NOT NULL, [TestC] [int] NOT NULL ) ON [PRIMARY] --Insert some test data insert into test values( 101,'A',565) insert into test values( 101,'B',741) insert into test values( 101,'C',369) insert into test values( 101,'D',111) insert into test values( 102,'E',214) insert into test values( 102,'F',69) insert into test values( 102,'G',697) insert into test values( 103,'H',97) insert into test values( 103,'I',121) insert into test values( 103,'J',198) insert into test values( 104,'K',648) insert into test values( 104,'L',444) insert into test values( 104,'M',219)
The query would be as follows:
select TestA,TestB,TestC, CUME_DIST() OVER (PARTITION BY [TestA] ORDER BY [TestC]) as CD, PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY [TestC]) OVER (PARTITION BY [TestA]) AS PD, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY [TestC]) OVER (PARTITION BY [TestA]) AS PC FROM test
The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them:
RN = (1 + (P * ( N - 1 )) P = Percentile Specified N = Number of rows CRN = CEILING(RN) FRN = FLOOR(RN). If (CRN = FRN = RN) then the result is [value of expression from row at RN] Otherwise the result is (CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN)
So as per given formula for group 101 (column TestA):
Calculating PD The first value greater or equal to 0.5 (CD column) is row 2, so PD = 369 Calculating PC Number of rows N = 4 ( Even ) P = 0.5 RN = (1 + (0.5 * ( 4 - 1 )) = 2.5 CRN = CEILING(2.5) = 3 FRN = FLOOR(2.5) = 2 CRN is not equal to FRN so PC is: (3 - 2.5) * 369 + (2.5 - 2) * 565 = 467
PERCENTILE_DISC() function takes a percentile as a input parameters. It returns the value as an answer of which value is equal or greater to the percentile value which is passed. For example we are passing 0.5 into the PERCENTILE_DISC() function. It will go through the resultset and identify which rows have values which are equal to or greater than 0.5. So PERCENTILE_DISC(0.5) looks for values in the CD column for group 101 which are 369 and 565, so the PD value will be 369.
Now let us take a look at another example by passing 0.9 as the input parameter.
Calculating PD The only value greater than 0.9 (CD column) is row 4, so PD = 741 Calculating PC Number of rows N = 4 ( Even ) P = 0.9 RN = (1+ ( 0.9 * ( 4 - 1 )) = 3.7 CRN = CEILING(3.7) = 4 FRN = FLOOR(3.7) = 3 CRN is not equal to FRN so the formula is (4 - 3.7) * 565 + (3.7 - 3) * 741 = 688.2
- Review these other related new functions
About the author
View all my tips
Article Last Updated: 2012-06-01