SQL Server Min Max Column Normalization for Data Mining


By:   |   Updated: 2014-12-23   |   Comments   |   Related: More > T-SQL


Problem

In data mining, we often need to perform min-max normalization on numeric data type columns to prevent one column from skewing or dominating the models produced by the machine learning algorithms.

Solution

In this tip, we will demonstrate how to use T-SQL to perform a min-max normalization from 0.0 to 1.0 inclusive, which can be represented as [0.0, 1.0]. The normalized value equals (the original column value minus the minimum column value) divided by the (maximum column value minus the minimum column value).

We will create a table that contains two data columns, AttributeX and AttributeY.

IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblExample')
  DROP TABLE dbo.tblExample
go
create table dbo.tblExample
(
  pKey integer identity(1,1) PRIMARY KEY,
  AttributeX decimal(7,3),
  AttributeY decimal(7,3)
)

Next, we will populate this table with 27 rows of data. AttributeY will be 10 times the value of AttributeX.

insert into dbo.tblExample values (24, 240)
insert into dbo.tblExample values (26, 260)
insert into dbo.tblExample values (27, 270)
insert into dbo.tblExample values (27, 270)
insert into dbo.tblExample values (30, 300)
insert into dbo.tblExample values (31, 310)
insert into dbo.tblExample values (31, 310)
insert into dbo.tblExample values (32, 320)
insert into dbo.tblExample values (33, 330)
insert into dbo.tblExample values (33, 330)
insert into dbo.tblExample values (36, 360)
insert into dbo.tblExample values (36, 360)
insert into dbo.tblExample values (36, 360)
insert into dbo.tblExample values (36, 360)
insert into dbo.tblExample values (41, 410)
insert into dbo.tblExample values (44, 440)
insert into dbo.tblExample values (44, 440)
insert into dbo.tblExample values (46, 460)
insert into dbo.tblExample values (46, 460)
insert into dbo.tblExample values (46, 460)
insert into dbo.tblExample values (46, 460)
insert into dbo.tblExample values (47, 470)
insert into dbo.tblExample values (51, 510)
insert into dbo.tblExample values (56, 560)
insert into dbo.tblExample values (57, 570)
insert into dbo.tblExample values (63, 630)
insert into dbo.tblExample values (81, 810)

After inserting the rows, we will select the row count and the minimum and maximum values from our data columns.

select 
count(*) as RowQuantity,
min(AttributeX) as MinimumX,
max(AttributeX) as MaximumX,
min(AttributeY) as MinimumY,
max(AttributeY) as MaximumY
from dbo.tblExample


Data statistics

T-SQL Code for Min Max Normalization in SQL Server

Now we will use the T-SQL code below to perform min-max normalizations on AttributeX and AttributeY.

--Declare the variables
declare @minX as decimal(7,3)
declare @maxX as decimal(7,3)
declare @minY as decimal(7,3)
declare @maxY as decimal(7,3)

--Set the variables
set @minX=(select min(AttributeX) from dbo.tblExample)
set @maxX=(select max(AttributeX) from dbo.tblExample)
set @minY=(select min(AttributeY) from dbo.tblExample)
set @maxY=(select max(AttributeY) from dbo.tblExample)

--Perform the normalization
select 
AttributeX, 
(AttributeX - @minX)/(@maxX - @minX) as NormalizedX,
AttributeY, 
(AttributeY - @minY)/(@maxY - @minY) as NormalizedY 
from dbo.tblExample

The results from the previous query are shown below. Notice how the normalized values for AttributeX and AttributeY are equal on each row, while the original values differ.

Top 10 rows from device B
Next Steps

Normalized values can be inserted into a table and then read into data mining algorithms. You can read more about SQL Server data mining in the following tips from MSSQLTips.com.



Last Updated: 2014-12-23


get scripts

next tip button



About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

View all my tips





Comments For This Article





download





Recommended Reading

Cursor in SQL Server

Using MERGE in SQL Server to insert, update and delete at the same time

Rolling up multiple rows into a single row and column for SQL Server data

How to use @@ROWCOUNT in SQL Server

SQL Server Loop through Table Rows without Cursor








get free sql tips
agree to terms


Learn more about SQL Server tools