Learn more about SQL Server tools

 

Tutorials          DBA          Dev          BI          Career          Categories          Events          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Events

DBA    Dev    BI    Categories

 

Using T-SQL to Perform Z-Score Column Normalization in SQL Server


By:   |   Read Comments   |   Related Tips: More > T-SQL

Problem

In data mining, we sometimes need to perform techniques such as Z-score normalization on numeric data type columns to prevent one column from skewing or dominating the models produced by the machine learning algorithms.  How can we perform this in SQL Server with T-SQL code?

Solution

In this tip, we will demonstrate how to use T-SQL to perform a Z-score normalization. The Z-score normalized value equals (the original column value minus the mean column value) divided by the standard deviation of the column values. We will use T-SQL's AVG() function to calculate the mean and the STDEV() function to calculate the standard deviation.

We will create a table that contains two data columns, AttributeX and AttributeY, and perform the Z-score normalization on each of these columns.

IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblExample')
  DROP TABLE dbo.tblExample
go
create table dbo.tblExample
(
  pKey integer identity(1,1) PRIMARY KEY,
  AttributeX decimal(7,3),
  AttributeY decimal(7,3)
)

Next, we will populate this table with 27 rows of data. AttributeY will be 10 times the value of AttributeX.

insert into dbo.tblExample values (24, 240)
insert into dbo.tblExample values (26, 260)
insert into dbo.tblExample values (27, 270)
insert into dbo.tblExample values (27, 270)
insert into dbo.tblExample values (30, 300)
insert into dbo.tblExample values (31, 310)
insert into dbo.tblExample values (31, 310)
insert into dbo.tblExample values (32, 320)
insert into dbo.tblExample values (33, 330)
insert into dbo.tblExample values (33, 330)
insert into dbo.tblExample values (36, 360)
insert into dbo.tblExample values (36, 360)
insert into dbo.tblExample values (36, 360)
insert into dbo.tblExample values (36, 360)
insert into dbo.tblExample values (41, 410)
insert into dbo.tblExample values (44, 440)
insert into dbo.tblExample values (44, 440)
insert into dbo.tblExample values (46, 460)
insert into dbo.tblExample values (46, 460)
insert into dbo.tblExample values (46, 460)
insert into dbo.tblExample values (46, 460)
insert into dbo.tblExample values (47, 470)
insert into dbo.tblExample values (51, 510)
insert into dbo.tblExample values (56, 560)
insert into dbo.tblExample values (57, 570)
insert into dbo.tblExample values (63, 630)
insert into dbo.tblExample values (81, 810)

After inserting the rows, we will select the row count and the mean and standard deviation values from our data columns.

select 
count(*) as RowQuantity,
avg(AttributeX) as MeanX,
stdev(AttributeX) as StandardDeviationX,
avg(AttributeY) as MeanY,
stdev(AttributeY) as StandardDeviationY
from dbo.tblExample

Data statistics

Now we will use the T-SQL code below to perform Z-score normalizations on AttributeX and AttributeY.

--Declare the variables
declare @meanX as decimal(7,3)
declare @standardDeviationX as decimal(7,3)
declare @meanY as decimal(7,3)
declare @standardDeviationY as decimal(7,3)

--Set the variables
set @meanX=(select avg(AttributeX) from dbo.tblExample)
set @standardDeviationX=(select stdev(AttributeX) from dbo.tblExample)
set @meanY=(select avg(AttributeY) from dbo.tblExample)
set @standardDeviationY=(select stdev(AttributeY) from dbo.tblExample)

--Perform the normalization
select 
AttributeX, 
(AttributeX - @meanX)/(@standardDeviationX) as NormalizedX,
AttributeY, 
(AttributeY - @meanY)/(@standardDeviationY) as NormalizedY 
from dbo.tblExample

The results from the previous query are shown below. Notice how the normalized values for AttributeX and AttributeY are equal on each row, while the original values differ. Also notice how all normalized values less than the mean are negative, while those values greater than the mean are positive.

Original and normalized values
Next Steps

Normalized values can be inserted into a table and then read into data mining algorithms. You can read more about SQL Server data mining in the following tips:



Last Update:






About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

View all my tips


 









Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


Get free SQL tips:

*Enter Code refresh code     



Learn more about SQL Server tools