# SQL Server T-SQL Code to Generate A Normal Distribution

By:   |   Comments (2)   |   Related: More > Testing

##### Problem

I need some sample data. How can I generate a normal distribution in T-SQL with a specific mean, standard deviation and with a certain number of values to the right of the decimal point?

##### Solution

The RAND() function alone in T-SQL will not generate a normal distribution (a bell curve), but it should generate values evenly distributed from 0 to 1. In this tip, we will examine T-SQL code that will allow you to specify a mean (average), standard deviation, the precision, and the number of samples to be generated. Our example will use the polar method first described in 1958 by Box and Muller in their article "A Note on the Generation of Random Normal Deviates" which appeared in The Annals of Mathematical Statistics. We will set our mean to 75, standard deviation to 5, precision to 1, and the number of samples to 1000.

Let's start by creating a temporary table using the T-SQL below.

```IF OBJECT_ID('tempdb..#tblNormalDistribution') IS NOT NULL
DROP TABLE #tblNormalDistribution
go
create table #tblNormalDistribution (x float)
go
```

Next, we want to declare and set our variables as shown below. As we will see later, each iteration of our loop will generate two values. So for us to generate 1,000 values, we need 500 iterations.

```declare @pi float, @2pi float, @randNum1 float, @randNum2 float
declare @value1 float, @value2 float
declare @iteration int, @numberOfIterations int
declare @mean float
declare @stdDev float --standard deviation
declare @precision int --number of places to the right of the decimal point

select @iteration = 0
select @pi = pi()
select @2pi = 2.0 * @pi
select @mean =75.0
select @stdDev =5.0
select @precision = 1
select @numberOfIterations = 500 --Two values will be generated each iteration
```

Now it is time to enter into our loop to generate and insert our normally distributed values.

```while (@iteration < @numberOfIterations)
begin
select @randNum1 = rand()
select @randNum2 = rand()
select @value1 = round((sqrt(-2.0*log(@randNum1))*cos(@2pi*@randNum2))*@stdDev, @precision)+@mean
select @value2 = round((sqrt(-2.0*log(@randNum1))*sin(@2pi*@randNum2))*@stdDev, @precision)+@mean

insert into #tblNormalDistribution (x) values (@value1)
insert into #tblNormalDistribution (x) values (@value2)

select @iteration = @iteration + 1
end
```

After loading the data into the temp table, we will execute the following T-SQL code to verify that our data is normally distributed about our specified mean of 75 with our chosen standard deviation of 5.

```select count(*) as [Count],
min(x)   as [Min],
max(x)   as [Max],
avg(x)   as [Average],
stdev(x) as [Standard Deviation] from #tblNormalDistribution
```

The image below shows the results of executing the queries. We can see that the mean is close to 75 and the standard deviation is almost 5. The Min and Max columns demonstrate our precision of 1. We can use the next T-SQL query to generate values for a histogram.

```select round(x,0) as testValue,
count(*) as testValueCount
from #tblNormalDistribution
group by round(x,0)
order by testValue
```

The first 13 rows returned by the previous query are shown here. ##### Next Steps

Adjust the values for mean, standard deviation, precision and number of iterations and watch how the results change. Also, please check out these other tips and tutorials on T-SQL and the RAND() function on MSSQLTips.com. Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

View all my tips

 Monday, April 4, 2016 - 11:41:56 AM - Scott Back To Top (41131) Ray, Converting this to a set-based query is not a bad idea but you didn't exactly follow the original algorithm.  The idea is to generate two random numbers, then generate two data points from those numbers by alternating the SIN and COS functions.  You generated one data point from each pair of random numbers, choosing the SIN or COS formula based on even or odd row numbers.  Not a huge change, but mathemeticians take these things seriously. You need a set-based query that generates two data points for each iteration.  I used (SELECT TOP (n) 1 FROM sys.columns) because the count (50) was so low, for larger numbers another approach would be needed (such as the tally table in your query). DECLARE @2pi FLOAT = 2.0 * PI();DECLARE @mean FLOAT = 75.0;DECLARE @stdDev FLOAT = 5.0;DECLARE @precision INT = 1;DECLARE @iterations INT = 50;DECLARE @two31 FLOAT = 1.0 / 2147483648.0;DECLARE @samples TABLE ( x FLOAT NOT NULL ); INSERT INTO @samples ( x )SELECT  valFROM ( SELECT TOP (@iterations) i = 1 FROM sys.columns ) i CROSS JOIN (    SELECT  Rnd1 = SQRT(-2.0 * LOG(CAST(ABS(CHECKSUM(NEWID())) AS FLOAT) * @two31)) * @stdDev,            Rnd2 = @2pi * CAST(ABS(CHECKSUM(NEWID())) AS FLOAT) * @two31) r CROSS APPLY (    SELECT val = ROUND(r.Rnd1 * COS(Rnd2), @precision) + @mean    UNION ALL    SELECT val = ROUND(r.Rnd1 * SIN(Rnd2), @precision) + @mean) v;SELECT  [Count] = COUNT(*),        [Min] = MIN(x),        [Max] = MAX(x),        Average = AVG(x),        [Standard Deviation] = STDEV(x)FROM @samples;SELECT  testValue = ROUND(x,0),        testValueCount = COUNT(*)FROM @samplesGROUP BY ROUND(x,0)ORDER BY testValue;

 Tuesday, March 29, 2016 - 12:51:08 PM - Ray Back To Top (41083) Hi Doc, I always enjoy reading this type of tip.  I am no math whiz but I do find them interesting. I have been trying to learn more about numbers or tally tables and just as an exercise decided to try a "set-based" approach to your problem.  I certainly am not implying it is better I just wanted to see if I could come close. Based on your tests of Avg, StdDev, etc. I think it works properly. Here is my solution, (I am sure you will recognize most of it as your own:) ).     IF OBJECT_ID('tempdb..#tblNormalDistribution') IS NOT NULL   DROP TABLE #tblNormalDistribution go create table #tblNormalDistribution (Id Integer Not Null, x float) Go Declare @iterationInteger = 100000 , @2PiFloat = 2 * PI() , @meanFloat = 75.0 , @stDevFloat = 5.0 , @precisionInteger = 1 Insert Into #tblNormalDistribution (Id, x ) Select tn.Number , Case When tn.Number % 2 = 0 Then Round(Sqrt(-2.0 * Log(Abs(Cast(Checksum(NewId())As Float) / Power(2.0, 31)))) * cos(@2Pi * (Abs(Cast(Checksum(NewId())As Float) / Power(2.0, 31)))) * @stDev, @precision) + @mean Else Round(Sqrt(-2.0 * Log(Abs(Cast(Checksum(NewId())As Float) / Power(2.0, 31)))) * sin(@2Pi * (Abs(Cast(Checksum(NewId())As Float) / Power(2.0, 31)))) * @stDev, @precision) + @mean End From dbo.theNumbers As tn Where tn.Number < @iteration Select count(*) as Count,        min(x)   as Min,        max(x)   as Max,        avg(x)   as Average,        stdev(x) as [Standard Deviation] from #tblNormalDistribution Select Round(x, 0) As testValue , Count(*) As testValueCount From #tblNormalDistribution Group By Round(x, 0) Order By testValue;