Calculating the Pearson Product Moment Correlation Coefficient in T-SQL

By:   |   Comments (1)   |   Related: > TSQL

Problem

I need to calculate the Pearson Product-Moment Correlation Coefficient for two sets of measurements that are stored in my SQL Server database. How can I do this?

Solution

The Pearson Product-Moment Correlation Coefficient, also known more simply as the Pearson coefficient, is a mathematical calculation to determine how well two sets of data linearly correlate.

The Pearson coefficient can have a value from -1 to +1 inclusive. The closer the Pearson coefficient is to +1, the stronger the positive correlation. In other words, as the values of the first measurement increase, so do the values of the second measurement. The closer the Pearson coefficient is to -1, the stronger the negative correlation. In other words, as the values of the first measurement increase, the values of the second measurement decrease. A Pearson coefficient near 0, whether positive or negative, indicates there is little to no correlation between the two sets of data.

Once our data is in a table, writing a T-SQL query to calculate the Pearson product-moment correlation coefficient is fairly straightforward. However, because the calculation involves floating point numbers and division, we must be careful that all of our numbers are floating point so we don't end up with a Pearson coefficient having only the integer values -1, 0 or +1. We will look at three examples in this tip. The first is a positive correlation, the second is a negative correlation and the third has no correlation. Examining each of these cases will also help us to test our code to make sure it is accurate. In all three examples, we will be calculating the Pearson product-moment correlation coefficient between Measurement A and Measurement B.

Pearson Product-Moment Correlation Coefficient

The T-SQL we will use to calculate the Pearson product-moment correlation coefficient is shown below. It is crucial for the parentheses to be placed correctly to ensure the proper order of operations.

```(Avg(measurementA * measurementB) - (Avg(measurementA) * Avg(measurementB)))
/ (StDevP(measurementA) * StDevP(measurementB)) ```

Pearson Positive Correlation Example

Let's begin with our positive correlation example. The T-SQL code below will create a table that will have columns for Measurement A and Measurement B. The insert statements will insert 20 rows into the table.

```create table dbo.tblPositiveCorrelation
(
pkID integer identity(1,1) primary key,
measurementA float,
measurementB float
)
go

insert into dbo.tblPositiveCorrelation values (1.23350258, 1.095695912)
insert into dbo.tblPositiveCorrelation values (2.740809824, 2.515451457)
insert into dbo.tblPositiveCorrelation values (3.921364427, 3.535163576)
insert into dbo.tblPositiveCorrelation values (5.727112392, 5.440233141)
insert into dbo.tblPositiveCorrelation values (7.20400738, 7.034966693)
insert into dbo.tblPositiveCorrelation values (8.631639604, 8.657499396)
insert into dbo.tblPositiveCorrelation values (9.861575884, 10.25266377)
insert into dbo.tblPositiveCorrelation values (11.26773017, 11.26025417)
insert into dbo.tblPositiveCorrelation values (12.5538764, 12.73863763)
insert into dbo.tblPositiveCorrelation values (13.77519226, 14.52156486)
insert into dbo.tblPositiveCorrelation values (15.75008844, 16.27021747)
insert into dbo.tblPositiveCorrelation values (16.8040222, 18.04587966)
insert into dbo.tblPositiveCorrelation values (18.63079433, 19.52058173)
insert into dbo.tblPositiveCorrelation values (20.39947639, 21.38713489)
insert into dbo.tblPositiveCorrelation values (21.97242484, 22.42767485)
insert into dbo.tblPositiveCorrelation values (23.19091439, 24.18502987)
insert into dbo.tblPositiveCorrelation values (24.28721758, 26.09195563)
insert into dbo.tblPositiveCorrelation values (26.24160723, 27.51123988)
insert into dbo.tblPositiveCorrelation values (27.33126123, 28.93676183)
insert into dbo.tblPositiveCorrelation values (28.73483376, 29.97030518)
go
```

Next, we will run the T-SQL below to calculate the Pearson coefficient. We will also output the values for the numerator and denominator to help us verify the formula is correct.

```SELECT
(Avg(measurementA * measurementB) - (Avg(measurementA) * Avg(measurementB))) / (StDevP(measurementA) * StDevP(measurementB))  as PearsonCoefficient,
(Avg(measurementA * measurementB) - (Avg(measurementA) * Avg(measurementB))) as numerator,
(StDevP(measurementA) * StDevP(measurementB))  as denominator
FROM dbo.tblPositiveCorrelation
```

The image below shows our results. Notice the value for the Pearson coefficient is close to +1 which indicates a strong positive correlation.

.

Pearson Negative Correlation Example

Next, we will move to our negative correlation example. We will use the T-SQL below to create and populate a separate table.

```create table dbo.tblNegativeCorrelation
(
pkID integer identity(1,1) primary key,
measurementA float,
measurementB float
)
go

insert into dbo.tblNegativeCorrelation values (1.581662726, 29.65152293)
insert into dbo.tblNegativeCorrelation values (3.4369907, 28.01601842)
insert into dbo.tblNegativeCorrelation values (5.20372199, 26.64762377)
insert into dbo.tblNegativeCorrelation values (6.548564275, 25.52941989)
insert into dbo.tblNegativeCorrelation values (8.236812206, 24.46357622)
insert into dbo.tblNegativeCorrelation values (9.37715358, 22.94984262)
insert into dbo.tblNegativeCorrelation values (10.51055702, 21.25097239)
insert into dbo.tblNegativeCorrelation values (11.82382087, 19.31299184)
insert into dbo.tblNegativeCorrelation values (13.40427251, 17.97538895)
insert into dbo.tblNegativeCorrelation values (15.24348405, 16.23958242)
insert into dbo.tblNegativeCorrelation values (16.64768328, 14.69010224)
insert into dbo.tblNegativeCorrelation values (18.04064143, 12.86494034)
insert into dbo.tblNegativeCorrelation values (19.51671012, 11.4152393)
insert into dbo.tblNegativeCorrelation values (20.58177787, 10.30092096)
insert into dbo.tblNegativeCorrelation values (22.04140726, 8.625347334)
insert into dbo.tblNegativeCorrelation values (23.81958423, 6.721438358)
insert into dbo.tblNegativeCorrelation values (25.02552769, 5.340616899)
insert into dbo.tblNegativeCorrelation values (26.9841361, 3.958114624)
insert into dbo.tblNegativeCorrelation values (28.04718847, 2.624742871)
insert into dbo.tblNegativeCorrelation values (29.6213496, 1.264092442)
go
```

The T-SQL that calculates the Pearson coefficient is the same as above except for the differing table name.

```SELECT
(Avg(measurementA * measurementB) - (Avg(measurementA) * Avg(measurementB))) / --continued
(StDevP(measurementA) * StDevP(measurementB))  as PearsonCoefficient,
(Avg(measurementA * measurementB) - (Avg(measurementA) * Avg(measurementB))) as numerator,
(StDevP(measurementA) * StDevP(measurementB))  as denominator
FROM dbo.tblNegativeCorrelation
go
```

The results of the query are shown below. Notice how the Pearson coefficient is almost -1, which indicates a negative correlation.

.

Pearson No Correlation Example

Finally, we will look at an example where there is little to no correlation. As in the previous examples, we will use the T-SQL below to create and populate a separate table.

```create table dbo.tblNoCorrelation
(
pkID integer identity(1,1) primary key,
measurementA float,
measurementB float
)
go

insert into dbo.tblNoCorrelation values (0.80544264, 0.16208355)
insert into dbo.tblNoCorrelation values (0.895563605, 0.764382306)
insert into dbo.tblNoCorrelation values (0.169711771, 0.958280698)
insert into dbo.tblNoCorrelation values (0.377226503, 0.26927763)
insert into dbo.tblNoCorrelation values (0.695216156, 0.889767386)
insert into dbo.tblNoCorrelation values (0.864166528, 0.93030605)
insert into dbo.tblNoCorrelation values (0.909937206, 0.146715435)
insert into dbo.tblNoCorrelation values (0.126278394, 0.090460955)
insert into dbo.tblNoCorrelation values (0.743036786, 0.284939579)
insert into dbo.tblNoCorrelation values (0.482858255, 0.958005081)
insert into dbo.tblNoCorrelation values (0.452936554, 0.521588833)
insert into dbo.tblNoCorrelation values (0.232092942, 0.891831068)
insert into dbo.tblNoCorrelation values (0.303048204, 0.543747358)
insert into dbo.tblNoCorrelation values (0.562067989, 0.852912301)
insert into dbo.tblNoCorrelation values (0.752175511, 0.721606105)
insert into dbo.tblNoCorrelation values (0.09224805, 0.443346524)
insert into dbo.tblNoCorrelation values (0.390609317, 0.470526115)
insert into dbo.tblNoCorrelation values (0.925485575, 0.815146232)
insert into dbo.tblNoCorrelation values (0.598509836, 0.060795557)
insert into dbo.tblNoCorrelation values (0.87523696, 0.334668821)
go
```

The T-SQL below will calculate the Pearson coefficient for the table that was created and populated above.

```SELECT
(Avg(measurementA * measurementB) - (Avg(measurementA) * Avg(measurementB))) /  --continued
(StDevP(measurementA) * StDevP(measurementB)) as PearsonCoefficient,
(Avg(measurementA * measurementB) - (Avg(measurementA) * Avg(measurementB))) as numerator,
(StDevP(measurementA) * StDevP(measurementB))  as denominator
FROM dbo.tblNoCorrelation
go
```

As we can see in the image below, the value for the Pearson coefficient is close to zero, which indicates little to no correlation.

.

Checking Pearson Calculations

Just to make sure our calculations are correct, I entered the values from all three tables into Excel and performed three separate Pearson coefficient calculations using the Pearson function in Excel. As we see below our T-SQL result is similar to the Excel result. One more thing, if the calculated value is outside the range of -1 to +1 inclusive, then there is an error in the query.

.
Next Steps

Please experiment with how different data values affect the Pearson value. Also, please check out these other tips on mathematical functions in T-SQL on MSSQLTips.com.

About the author
Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Comments For This Article

 Tuesday, January 16, 2018 - 3:47:23 PM - Michael Back To Top (74970) Well done!