Learn more about SQL Server tools

mssqltips logo
giveaway
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Using T-SQL to Perform Decimal Scaling Normalization for SQL Server


By:   |   Read Comments   |   Related Tips: More > T-SQL

Problem

In SQL Server data mining, we sometimes need to perform techniques such as decimal scaling normalization on numeric data type columns to prevent one column from skewing or dominating the models produced by the machine learning algorithms.

Solution

In this tip, we will demonstrate how to use T-SQL to perform a decimal-scaling normalization in SQL Server. To perform the decimal scaling, we must first calculate the next largest integer from the base-10 log of the maximum value in the column. Then we divide the original column value by 10 raised to this calculated value to get our normalized value. We will use T-SQL's MAX() function to find the maximum value in the column, the LOG10() function to get the power of 10 to which the column value is raised, the CEILING() function to give us the next largest integer, and the POWER() function to calculate the divisor.

We will create a table that contains two data columns, AttributeX and AttributeY, and perform the decimal scaling normalization on each of these columns.

IF EXISTS (SELECT name FROM sys.tables WHERE name = N'tblExample')
  DROP TABLE dbo.tblExample
go
create table dbo.tblExample
(
  pKey integer identity(1,1) PRIMARY KEY,
  AttributeX decimal(10,3),
  AttributeY decimal(10,3)
)

Next, we will populate this table with 17 rows of data. AttributeY will be 100 times the value of AttributeX.

insert into dbo.tblExample values (2, 200)
insert into dbo.tblExample values (30, 3000)
insert into dbo.tblExample values (270, 27000)
insert into dbo.tblExample values (300, 30000)
insert into dbo.tblExample values (310, 31000)
insert into dbo.tblExample values (320, 32000)
insert into dbo.tblExample values (330, 33000)
insert into dbo.tblExample values (360, 36000)
insert into dbo.tblExample values (410, 41000)
insert into dbo.tblExample values (440, 44000)
insert into dbo.tblExample values (460, 46000)
insert into dbo.tblExample values (470, 47000)
insert into dbo.tblExample values (510, 51000)
insert into dbo.tblExample values (560, 56000)
insert into dbo.tblExample values (570, 57000)
insert into dbo.tblExample values (6300, 630000)
insert into dbo.tblExample values (8100, 810000)

After inserting the rows, we will select the next largest integer of the base-10 logarithm of the maximum value of each column.

select ceiling(log10(max(AttributeX))) as nextLargestPowerOf10ForX 
from dbo.tblExample
select ceiling(log10(max(AttributeY))) as nextLargestPowerOf10ForY 
from dbo.tblExample

We can see the results from the above queries in the image below.

Largest Power Of 10 For Both Columns

Now we will use the T-SQL code below to perform the decimal-scaling normalizations on AttributeX and AttributeY.

--Declare the variables
declare @nextLargestPowerOf10ForX as integer
declare @nextLargestPowerOf10ForY as integer

--Set the variables
set @nextLargestPowerOf10ForX=(select ceiling(log10(max(AttributeX))) from dbo.tblExample)
set @nextLargestPowerOf10ForY=(select ceiling(log10(max(AttributeY))) from dbo.tblExample)

--Perform the normalization
select 
AttributeX, -- the original Attribute X value
(AttributeX/POWER(10, @nextLargestPowerOf10ForX)) as NormalizedX,
AttributeY, -- the original Attribute X value
(AttributeY/Power(10, @nextLargestPowerOf10ForY)) as NormalizedY
from dbo.tblExample

The results from the previous query are shown below. Notice how the normalized values for AttributeX and AttributeY are equal on each row, while the original values differ. Also notice how this process will produce normalized values from zero to one inclusive.

Original and normalized values
Next Steps

Normalized values can be inserted into a table and then read into data mining algorithms. You can read more about SQL Server data mining in the following tips from MSSQLTips.com.



Last Update:






About the author
MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

View all my tips





More SQL Server Solutions











Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Learn more about SQL Server tools