By: Dallas Snider | Last Updated: 2016-03-29 | Comments (2) | Testing
I need some sample data. How can I generate a normal distribution in T-SQL with a specific mean, standard deviation and with a certain number of values to the right of the decimal point?
The RAND() function alone in T-SQL will not generate a normal distribution (a bell curve), but it should generate values evenly distributed from 0 to 1. In this tip, we will examine T-SQL code that will allow you to specify a mean (average), standard deviation, the precision, and the number of samples to be generated. Our example will use the polar method first described in 1958 by Box and Muller in their article "A Note on the Generation of Random Normal Deviates" which appeared in The Annals of Mathematical Statistics. We will set our mean to 75, standard deviation to 5, precision to 1, and the number of samples to 1000.
Let's start by creating a temporary table using the T-SQL below.
IF OBJECT_ID('tempdb..#tblNormalDistribution') IS NOT NULL DROP TABLE #tblNormalDistribution go create table #tblNormalDistribution (x float) go
Next, we want to declare and set our variables as shown below. As we will see later, each iteration of our loop will generate two values. So for us to generate 1,000 values, we need 500 iterations.
declare @pi float, @2pi float, @randNum1 float, @randNum2 float declare @value1 float, @value2 float declare @iteration int, @numberOfIterations int declare @mean float declare @stdDev float --standard deviation declare @precision int --number of places to the right of the decimal point select @iteration = 0 select @pi = pi() select @2pi = 2.0 * @pi select @mean =75.0 select @stdDev =5.0 select @precision = 1 select @numberOfIterations = 500 --Two values will be generated each iteration
Now it is time to enter into our loop to generate and insert our normally distributed values.
while (@iteration < @numberOfIterations) begin select @randNum1 = rand() select @randNum2 = rand() select @value1 = round((sqrt(-2.0*log(@randNum1))*cos(@2pi*@randNum2))*@stdDev, @precision)[email protected] select @value2 = round((sqrt(-2.0*log(@randNum1))*sin(@2pi*@randNum2))*@stdDev, @precision)[email protected] insert into #tblNormalDistribution (x) values (@value1) insert into #tblNormalDistribution (x) values (@value2) select @iteration = @iteration + 1 end
After loading the data into the temp table, we will execute the following T-SQL code to verify that our data is normally distributed about our specified mean of 75 with our chosen standard deviation of 5.
select count(*) as [Count], min(x) as [Min], max(x) as [Max], avg(x) as [Average], stdev(x) as [Standard Deviation] from #tblNormalDistribution
The image below shows the results of executing the queries. We can see that the mean is close to 75 and the standard deviation is almost 5. The Min and Max columns demonstrate our precision of 1.
We can use the next T-SQL query to generate values for a histogram.
select round(x,0) as testValue, count(*) as testValueCount from #tblNormalDistribution group by round(x,0) order by testValue
The first 13 rows returned by the previous query are shown here.
Adjust the values for mean, standard deviation, precision and number of iterations and watch how the results change. Also, please check out these other tips and tutorials on T-SQL and the RAND() function on MSSQLTips.com.
- A More Versatile SQL Server Random Number Function
- Different ways to get random data for SQL Server data sampling
- Create Your Own RANDBETWEEN Function in T-SQL
- Generating Random Numbers in SQL Server Without Collisions
- SQL Server Random Sorted Result Set
- Our complete tutorial list
Last Updated: 2016-03-29
About the author
View all my tips