Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server random numerics data generation using CLR


By:   |   Last Updated: 2011-03-09   |   Comments (53)   |   Related Tips: More > Functions - User Defined UDF

Problem

You need to generate random data directly into SQL Server table columns or close to the database engine as variables or expressions. Looking at the SQL Server available functions, you notice that only RAND function offers support for random data generation. Although RAND([seed]) is a built-in function, it can only return a float value between 0 and 1, and has other limitations in regards to seed values. Because your table columns may be of various data types, and each data type may have a lower value and an upper value, you would prefer to create your custom random data generators. This is when SQL Server CLR functions come into play and provide a viable solution.

Solution

.NET Base Class Library provides the System.Random class which is a pseudo-random number generator, producing a sequence of numbers meeting certain requirements for randomness. The results provided by this class' methods are excellent candidates for our purposes, as they are fast obtained, and the duplicate values produced for a large number of usages, may be also good for representing column data within constraints. One of the constructors of the Random class also takes a seed as a parameter, where the seed generation has lots of flexibility. Listed below are some of the constructors (2 in total) and the methods (12 in total) in the Random class, which will be used in the code samples:

  • Random(Int32): initializes a new instance of the Random class, using the specified seed value
  • Next(Int32, Int32): returns a random number (Int32) within the specified range
  • NextDouble: returns a random number (Double) between 0.0 and 1.0

Describing how these pseudo-random numbers are chosen with equal probability from a finite set of numbers, and the theory behind how the seed value influences the algorithm, is beyond the scope of this article. Here we will use a seed value that is time-dependent, and this is sufficient for all practical purposes.

 

SQL Server data types are organized into 7 categories - like Exact numerics, Date and time - each containing several groups - like float and real groups in the Approximate numerics category. In the tip presented here we will show the code necessary to develop SQL CLR C# functions that can be used to return values in the int group in the Exact numerics category, as well as values in the float group in the Approximate numerics category.

Notice how the seed value is initialized using the current millisecond, and how NULL input parameters are replaced with their minimum - maximum .NET representation for int or double data type.

And now our functions become pretty straightforward, with the code sample below having comments to improve clarity:

using Microsoft.SqlServer.Server;
using System;
using System.Data.SqlTypes;
public class RandomFunctions
{
  [SqlFunction]
  public static SqlInt32 RandomInt(SqlInt32 minValue,
                                  SqlInt32 maxValue)
  {
    // Returns a random int between minValue and maxValue
    try
    {
        
      // If minValue or maxValue is null
      // replace it with an appropriate value
      if (minValue.IsNull)
        // A constant representing the smallest possible value of a SqlInt32
        minValue = SqlInt32.MinValue;
        
      if (maxValue.IsNull)
        // A constant representing the largest possible value of a SqlInt32
        maxValue = SqlInt32.MaxValue;
      // Check if the size of input parameters is properly specified
      if (minValue > maxValue)
        return SqlInt32.Null;
      else if (minValue == maxValue)
        return minValue;
      else
      {
        // Return the random int within the minValue, maxValue range
        return (new Random(DateTime.Now.Millisecond)).Next(minValue.Value, maxValue.Value);
      }
    }
    
    catch (Exception ex)
    {
      
      // On any error, return NULL
      return SqlInt32.Null;
    }
  }  
  
  [SqlFunction]
  public static SqlDouble RandomFloat(SqlDouble minValue,
                                      SqlDouble maxValue)
  {
    try
    {
      // Returns a random float between minValue and maxValue
      if (minValue.IsNull)
        // A constant representing the smallest possible value of a SqlDouble
        minValue = SqlDouble.MinValue;
      if (maxValue.IsNull)
        // A constant representing the largest possible value of a SqlDouble
        maxValue = SqlDouble.MaxValue;
      // Check if the size of input parameters is properly specified
      if (minValue > maxValue)
        return SqlDouble.Null;
      else if (minValue == maxValue)
        return minValue;
      else
      {
        var rnd = new Random(DateTime.Now.Millisecond);
        // Get a random double between 0.0 and 1.0.
        var rand = rnd.NextDouble();
        // Return a random double the minValue, maxValue range
        return (1 - rand) * minValue.Value + rand * maxValue.Value;
      }
    }
    
    catch (Exception ex)
    {
    
      // On any error, return NULL
      return SqlDouble.Null;
    }
  }
    
};

A sample SQL Server run is shown in the script below:

------------------------------------------------------------
-- RandomInt function
------------------------------------------------------------
--
select dbo.RandomInt(null, null)
--possible returns
-- -300396732
-- 726643699
--
select dbo.RandomInt(11, 9)
--returns NULL because minValue (11) is > maxValue (9)
--
select dbo.RandomInt(8, 8)
--returns 8 because minValue (8) is = maxValue (8)
--
select dbo.RandomInt(-22, 4)
--possible returns
-- -13
-- -3
--
select dbo.RandomInt(3667, 80956)
--possible return
-- 63587
--
select dbo.RandomInt(null, 0)
--possible return
-- -1746587208
--
------------------------------------------------------------
-- RandomFloat function
------------------------------------------------------------
--
select dbo.RandomFloat(null, null)
--possible returns
-- 1.59596487705279E+308
-- -4.64457392666594E+307
--
select dbo.RandomFloat(11.22, 9.99)
--returns NULL because minValue (11.22) is > maxValue (9.99)
--
select dbo.RandomFloat(8.88, 8.880)
--returns 8.88 because minValue (8.88) is = maxValue (8.880)
--
select dbo.RandomFloat(-22.333, 4.5)
--possible returns
-- -4.82918367984667
-- 1.67866188385369
--
select dbo.RandomFloat(3666.7, 80955.6785)
--possible return
-- 56206.2634003703
--
select dbo.RandomFloat(0, null)
--possible return
-- 5.68041671739381E+307
--
------------------------------------------------------------

The advantages of implementing the basic numeric random data generation functions are straightforward as they can be used inline of SQL statements, as well as to fill in T-SQL variables or computed table columns within your SQL Server database environment.

Random data generation using .NET methods provide a very powerful functionality that every Developer/DBA should attempt to implement if requirements arise.

Our tip discussed here just gets you started writing your own SQL CLR functions embedding this attractive alternative in your data generation solutions.

Next Steps
  • Compile, deploy, and use the RandomInt and RandomFloat functions; enhance them to suit your needs.
  • Explore uses of other data types as input / output for SQL CLR random functions.
  • Use complex mathematics in your SQL Server CLR development toolkit.
  • Check other SQL CLR tips on this site.


Last Updated: 2011-03-09


get scripts

next tip button



About the author
MSSQLTips author Alex Tocitu Alex Tocitu is an engineer with an MSEE in Automation and MBA in Finance.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Thursday, March 31, 2011 - 8:53:15 AM - Jeff Moden Back To Top

I take that back... I can only show you the T-SQL stuff (after you've showed us the results I asked you for) because I don't allow CLR's on any of my machines for reasons already stated.  The output I'm looking for is simply the 100 random numbers the code I wrote produces with your function.


Thursday, March 31, 2011 - 8:45:27 AM - Jeff Moden Back To Top

And, to be sure, Alex, I wasn't looking for timing on my previous request.  Instead, I want to see the output of the SELECT.


Thursday, March 31, 2011 - 8:44:02 AM - Jeff Moden Back To Top

I'll be happy to do all of that once you post the results from my previous request and you also post such timing results of your own.


Thursday, March 31, 2011 - 8:28:57 AM - Alex Tocitu Back To Top

t0 + 300ms, t0 + 700ms

should read t0 + 3ms, t0 + 7ms


Wednesday, March 30, 2011 - 9:39:44 PM - Alex Tocitu Back To Top

 

Hi Jeff,

Please do the following:

1. Verify the timing part of your script, taking into account that time reading with a datetime data type happens at t0, t0 + 300ms, t0 + 700ms, t0 + 1s, etc. Show us the final script.

2. Develop and deploy two random functions, with two parameters each, with same behavior like the ones in the tip. Create also their supporting views, if needed, and put out the script for that.

3. Compile and deploy the four functions we ended up with here (hint: the two in the tip article, plus two you will get using a tick based seed). Show us compilation and deployment results.

4. Run sample script that will show how long takes to call your functions vs. the ones in the tip when you generate 1,000, 10,000, 100,000 and 1000,000 random numbers. Show us the script(s) and the timing results.

Thanks

 


Wednesday, March 30, 2011 - 8:24:09 PM - Jeff Moden Back To Top

Hi Alex,

A loop formed by "GO n" and a PRINT statement are actually pretty slow in the set-based world of T-SQL.  Would you run the following code on your test box and let us know what the result is?  Thanks. NOCOUNT ON;
WITH

E1
(N) AS (
           SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
          SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
          SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
         ),                         --10
E2(N) AS (SELECT 1 FROM E1 a, E1 b) --100
SELECT SomeRandomInt = dbo.RandomInt(0, 1000)
FROM E2;
SET
NOCOUNT OFF;

SET 


Wednesday, March 30, 2011 - 1:32:20 PM - Alex Tocitu Back To Top

1. Let’s name the function with a tick based seed RandomInt2 or f2,

and leave the function with a millisecond seed RandomInt or f.

Assume the values are read at a slower pace (10 -15 ms) than either of them

generates the seed; then the function with a higher granularity – f2, has a lower rate

 

of duplicates than the function with a lower granularity – f.

2. The results below show a sample run of both functions for getting, for example,

10 random numbers. See how both give good results, with no duplication.

--

set nocount on

print dbo.RandomInt(0, 1000)

go 10

set nocount off

--

--Beginning execution loop

--939

--984

--506

--551

--596

--119

--686

--731

--776

--298

--Batch execution completed 10 times.

--

set nocount on

print dbo.RandomInt2(0, 1000)

go 10

set nocount off

--

--Beginning execution loop

--47

--930

--872

--813

--784

--667

--579

--462

--375

--287

--Batch execution completed 10 times.

--


Wednesday, March 30, 2011 - 1:43:52 AM - Peter Johansson Back To Top

Your latest code assumes that the value of DateTime.Now will update every 100 nanoseconds which is definitely not the case. Rather, the update will happen less frequently than every millisecond on most (all?) systems (see http://msdn.microsoft.com/en-us/library/system.datetime.now.aspx), which would make the problem even worse than the calculations earlier in the comments that assumes a 1 millisecond update rate. The DateTime data structure can represent values with the granularity you explain, but that does not mean that the .Now property gets updated with that frequency...


Monday, March 28, 2011 - 11:32:32 AM - Alex Tocitu Back To Top

Replace the DateTime.Now.Millisecond as a seed with

(int)(DateTime.Now.Ticks % ((long)Int32.MaxValue + 1)) as a seed.

Because:

1 Millisecond = 10,000 Ticks (1 Tick = 100 Nanoseconds)

and

Int32.MaxValue = 2,147,483,647

You get a granularity of a different number every tick

every other 2,147,483,648 ticks.


Sunday, March 27, 2011 - 4:20:12 PM - Alex Tocitu Back To Top

Note for Jeff:

RBAR may be interchanged with

RBAS: row by agonizing script... ;)


Sunday, March 27, 2011 - 4:09:15 PM - Alex Tocitu Back To Top

Jeff has become not only his own RBAR, but mine too, heh ... ;)


Sunday, March 27, 2011 - 2:23:27 PM - Jeff Moden Back To Top

Guess I'll give up with the formatting attempts.  I'm not giving up on you, though, Alex.  You need to "see the light".  SQL Server is a whole lot more flexible than you give it credit for.


Sunday, March 27, 2011 - 2:21:21 PM - Jeff Moden Back To Top

Man, I wish they'd fix the formatting on this forum...  here's all the code again to make it more readable if not nicely formatted...

CREATE

 

 

 

VIEW dbo.MyNewID
AS
SELECT ANewID = NEWID();
GO
CREATE FUNCTION dbo.RandomInt1000()
RETURNS
INT
WITH EXECUTE AS CALLER
AS
BEGIN
RETURN (SELECT ABS(CHECKSUM(ANewID))%1000+1
FROM dbo.MyNewID)
END
;
GO
SELECT dbo.RandomInt1000();
SELECT ABS(CHECKSUM(NEWID()))%1000+1;



Sunday, March 27, 2011 - 2:15:51 PM - Jeff Moden Back To Top

All you're doing is showing exactly how much you don't know about SQL.  I didn't put random number generation in a User Defined Function because 1) The code is simple enough to NOT warrant the use of a function and 2) because I already knew that you can't use NEWID() directly in a function.  CLR's, SQLCLR's, and T-SQL User Defined Functions both have a place in T-SQL and the calculation of random numbers in T-SQL isn't one of them nor, in the face of how simple it is to create random numbers in all of my previous numbers, is there a general need for either.

However, if you know T-SQL, you'll also know that there are ways around a lot of "problems" including how to use NEWID() in a UDF.  I continue to recommend against it because generation of random numbers in normal inline code is so simple, as I've repeatedly demonstrated in the code I've posted.

However, if you want to reduce SQL Server to the quagmire of RBAR instead of how it was meant to be used, here's how to create a random function in T-SQL. 

First, create a view that returns a GUID using NEWID()... like this...

 CREATE VIEW dbo.MyNewID
     
AS
 SELECT ANewID = NEWID();
GO

Next, create the User Defined Function...First, create a view that returns a GUID using NEWID()... like this...

 CREATE

 

 

 FUNCTION dbo.RandomInt1000()
RETURNS
 INT
   WITH EXECUTE AS CALLER
     AS
  BEGIN
 RETURN (SELECT ABS(CHECKSUM(ANewID))%1000+1
          
FROM dbo.MyNewID)
    END        
;
GO

 

Then, use the UDF as with any other scalar function...

 

 SELECT

 

 

dbo.RandomInt1000 ();

But, as I've said, why go through even that tiny bit of code when things can be done so very simply directly in the code without using a UDF...

 SELECT

 

 

 ABS(CHECKSUM(NEWID()))%1000+1;

The real "moral" is that Alex needs to spend more time studying T-SQL instead of making ad hominem attacks against people who do.


Saturday, March 26, 2011 - 7:52:39 PM - Alex Tocitu Back To Top

The moral:

John and Jeff... do something on your own ... like testing before preaching, or ... ;) 


Saturday, March 26, 2011 - 7:38:59 PM - Alex Tocitu Back To Top

...' it is simple to create functions in T-SQL directly to easily convert the value from RAND to the appropriate value, using min/max functionality'...

What  this:

--

CREATE FUNCTION RandomCall()

RETURNS INT

WITH EXECUTE AS CALLER

AS

BEGIN

RETURN Rand()

END

GO

--

Msg 443, Level 16, State 1, Procedure RandomCall, Line 7

Invalid use of a side-effecting operator 'rand' within a function.

--

 

That is pretty ‘simple’, John, heh…;)


Saturday, March 26, 2011 - 1:48:56 PM - Alex Tocitu Back To Top

Your code did not call an user defined function; and here's why:

 

CREATE FUNCTION RandomInt1000()

RETURNS INT

WITH EXECUTE AS CALLER

AS

BEGIN

RETURN ABS(CHECKSUM(NEWID()))%1000+1 --This is all that's need to gen random numbers

END

GO

--Returns

 

Msg 443, Level 16, State 1, Procedure RandomInt1000, Line 7

Invalid use of a side-effecting operator 'newid' within a function.

--

That's why you did not bother coming up with one.

Your script's abstraction level is again low.

You just keep wasting my time, heh...;)

 

 


Saturday, March 26, 2011 - 2:53:22 AM - Jeff Moden Back To Top

I don't need to complain about "most languages"… just the code you've written which returns the same random number for all rows inserted during the same millisecond. 


 To answer your question, the following code generates more than 1100 rows per millisecond on my humble laptop. 


 --========= Suppress the auto-display of rowcounts for performance and appearance
        SET NOCOUNT ON
;
--========= Conditionally drop the target table to make reruns easier
         IF OBJECT_ID('tempdb..#SomeTable','U') IS NOT NULL
            DROP TABLE #SomeTable
;
--========= Create timer variables
    DECLARE @StartTime DATETIME,
            @EndTime   DATETIME
;
--========= Start 2 different "timers"
     SELECT @StartTime = GETDATE()
;
--========= Generate a million random numbers from 1 to 1000 and store them in a table on the fly
 WITH E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
      E2(N) AS (SELECT 1 FROM E1 a, E1 b),
      E4(N) AS (SELECT 1 FROM E2 a, E2 b),
      E8(N) AS (SELECT 1 FROM E4 a, E4 b),
cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8)
     SELECT Random1000 = ABS(CHECKSUM(NEWID()))%1000+1 --This is all that's need to gen random numbers
       INTO #SomeTable
       FROM cteTally
      WHERE N <= 1000000
;
--========= Stop the Timers
     SELECT @EndTime = GETDATE()
;
--========= Display the number of rows process per Millisecond
      PRINT 'Inserted ' + CAST(1000000/DATEDIFF(ms,@StartTime, @EndTime) AS VARCHAR(10)) + ' Rows per Millisecond'
;
      PRINT 'Total duration in Milliseconds = ' + CAST(DATEDIFF(ms,@StartTime, @EndTime) AS VARCHAR(10))
;

Here's the output from my machine: 

Inserted 1103 Rows per Millisecond 
Total duration in Milliseconds = 906 

 


Friday, March 25, 2011 - 8:53:20 PM - Alex Tocitu Back To Top

No, you fix it, because you complain, heh...;

or better yet, go complain to those who designed those 'most languages' you refer to,

or, maybe to those who designed the rest of the languages not included in the 'most languages'.

And now, a question: how many records can you insert in a table in one millisecond?

You claim to know the exactly, so, why don't you tell everyone here?

 


Friday, March 25, 2011 - 8:37:45 PM - Jeff Moden Back To Top

There are no "powerful theorems" or "laws of physics" I'm violating. A random number function is supposed to return a different random number every time it is called no matter how many times it's called in a given timeframe.  Yes, I agree... some of the numbers may be repeated.  Such is the nature of random numbers.  But your code will return the exact same "random" number if it is called within the same millisecond and that is not the correct nature of a random number function.

If you take out the seed from the RandomInt function, and name it RandomIntNoSeed, it will return double values too, like in this run:

So fix it!  Better yet, use the functions in T-SQL which easily do all of this without the same headaches.


Friday, March 25, 2011 - 7:24:47 PM - Alex Tocitu Back To Top

'I'd personally prefer a random function that works the way it's supposed to and that is, no matter how many times you call it in the same millisecond'...

You are defying some very powerful theorems, laws of physics, etc. of which you obviously are unaware of...

but thanks for your engaging comments.

' It would have been so much easier to just not include a seed, though'

 If you take out the seed from the RandomInt function, and name it RandomIntNoSeed, it will return double values too, like in this run:

select dbo.RandomIntNoSeed(-10000, 10000), dbo.RandomIntNoSeed(-10000, 10000)

So, like I wrote, once again, the seed has its role.



Friday, March 25, 2011 - 5:59:28 PM - Jeff Moden Back To Top

Because of the way your wrote your code, the same random number can occur for all the rows generated or inserted into a table in the same milliseond whether it is done by concurrent users or not.  As we both know, you can generate or insert many more than just 1 row in a single millesecond.

So far as the seed goes, if your expected behavior is to allow the same random number to be generated for all rows generated during the same millisecond, then you certainly accomplished what you set out to do.  I'd personally prefer a random function that works the way it's supposed to and that is, no matter how many times you call it in the same millisecond, it won't return the same random number for all those rows unless the constraints on the number of possible random numbers is so small that it becomes a possibility.

As far as the granularity of the seed, millisecond was chosen as an example, to incite research, and because the ultimate function is still not out.

I'm sure that's true especially the last part of that.  It would have been so much easier to just not include a seed, though.


Friday, March 25, 2011 - 4:05:02 PM - Alex Tocitu Back To Top

My question was not like: [have to have ...] but [have ...];

and this was in lieu of the comment 'more than one person hit the code at the same instant'.

Who in the world would want to generate random values concurrently???

Your assumptions go too far, heh... ;)

To make the story short, if you put any of the functions in this tip as column defaults,

or use them as generated values for a table insert nobody will call them 'at the same instant'.

One execution will be at time 't', the other at time 't + n' where n is a number of ticks dependent of the processor.

As far as the granularity of the seed, millisecond was chosen as an example, to incite

research, and because the ultimate function is still not out.

One can choose a different seed like a timespan inspired one, and replace the line containing 'DateTime.Now.Millisecond' seed.

In regards to no need for seed, one needs a seed that has expected behavior.

After the execution of the function one does not have access to a .Next method based on the seed of previous execution.

Therefore the function reseeds every time but may return different values.


Friday, March 25, 2011 - 3:11:35 PM - Jeff Moden Back To Top

If I understand your question correctly, no, you don't have to "have some isolation level in your database" other than what the default is.  If someone has a blocking lock on sys.All_Columns or a Tally Table of whatever your row source is, then you have much bigger problems to worry about.


Friday, March 25, 2011 - 10:42:00 AM - Alex Tocitu Back To Top

Heh... going to what you know best (correct me if I'm wrong), wouldn't

you have some isolation level in your database ? ;)


Friday, March 25, 2011 - 10:27:16 AM - Alex Tocitu Back To Top

Heh... from the way you asked your question, and the whole trail of comments under your name,

you seem to know the answer, to have that 'silver bullet' ... ;)


Friday, March 25, 2011 - 10:20:17 AM - Jeff Moden Back To Top

If you reuse the same Random object, without reseeding, as intended, you will get a series of pseudo-random numbers.

Ah... that's more like it.  Thanks, Peter  So if we took out the seed in the CLR, it would give different random values even if more than one person hit the code at the same instant.


Friday, March 25, 2011 - 10:15:41 AM - Jeff Moden Back To Top

Is a .NET language included in your 'most languages' category?

Heh... if I knew that, I wouldn't have asked the question.  Never mind, I'll find out on my own.

 


Friday, March 25, 2011 - 9:39:54 AM - Alex Tocitu Back To Top

From the article:

...'You need to generate random data directly into SQL Server table columns'...

...'Although RAND([seed]) is a built-in function, it can only return a float value between 0 and 1'...

...'and each data type may have a lower value and an upper value'...

...'enhance them to suit your needs'...

...'Our tip discussed here just gets you started writing your own SQL CLR functions embedding this attractive alternative'...


Anyway, ... I'm not going to rewrite this article as a comment.


Thursday, March 24, 2011 - 11:55:18 PM - Peter Johansson Back To Top

The Random class in .NET (can be used from any language supported by .NET) is first instantiated and after that you call a method (function) on the object (of type Random). If you reuse the same Random object, without reseeding, as intended, you will get a series of pseudo-random numbers.
I don't understand the comment - "Here we will use a seed value that is time-dependent, and this is sufficient for all practical purposes". Is a function that systematically returns the same number for series of calls "good enough" as a random number source?


Thursday, March 24, 2011 - 7:08:22 PM - Alex Tocitu Back To Top

Define most languages; C# implements Random class like any .NET language.

Is a .NET language included in your 'most languages' category?


Thursday, March 24, 2011 - 2:53:55 PM - Jeff Moden Back To Top

Except for SQL Server multi-row returns, most languages have it where the random function will simply return the next pseudo-random number in the sequence when no Seed is given.  Does C# work in the same manner? 


Thursday, March 24, 2011 - 9:28:46 AM - Alex Tocitu Back To Top

This function does not have a flaw; it behaves as expected.

You can, for example, add a timer in the T-SQL code adding data to a table.

From the article:

... 'Describing how these pseudo-random numbers are chosen with equal probability from a finite set of numbers, and the theory behind how the seed value influences the algorithm, is beyond the scope of this article. Here we will use a seed value that is time-dependent, and this is sufficient for all practical purposes.' ...




Thursday, March 24, 2011 - 8:39:42 AM - Jeff Moden Back To Top

Thanks for that, Peter.  That's what I thought but, since I don't write C#, I didn't know for sure.


Thursday, March 24, 2011 - 5:38:36 AM - Peter Johansson Back To Top

I'm not a DBA, but a C# developer. The code in the original tip has a flaw in

var rnd = new Random(DateTime.Now.Millisecond);

in that if you call it many times within a millisecond each call will return the same "random" number...


Sunday, March 13, 2011 - 12:41:45 AM - Jeff Moden Back To Top

You didn't even answer my question about simultaneous calls.  ;-)


Sunday, March 13, 2011 - 12:39:54 AM - Jeff Moden Back To Top

Everyone is entitled to an opinion, even you.  You've just admitted that you were laying for me but I figure that out early.  How very "courteous" of you.  Surprisingly, I agree...  Most of what I said here is not worth your attention.  It wasn't actually for you but for the readers of this post so they'd know there's an alternative.


Saturday, March 12, 2011 - 11:21:48 PM - Alex Tocitu Back To Top

Now it's time for you to introduce yourself, tell a few things about your education,

because in my opinion you lack professional courtesy, Jeff Moden.

Most all what you said here was not worth my attention; I only replied because 

I wanted to see where you may put yourself, but a glimpse of that one could see

from your first comment in this trail...

Take care, and be assured I won't be talking about you at my presentation, or ever, heh... jeff ;)


Saturday, March 12, 2011 - 8:55:58 PM - Jeff Moden Back To Top

My turn for a question:  What result will your code return if two calls to your code occur within the same millisecond?  I believe (because of the code snippet that follows) the answer will be that both calls will return the same "random" number because you're using the same seed.  Try that with NEWID and see if you can make that same thing happen. ;-)

        // Return the random int within the minValue, maxValue range
        return (new Random(DateTime.Now.Millisecond)).Next(minValue.Value, maxValue.Value);


Saturday, March 12, 2011 - 8:51:02 PM - Jeff Moden Back To Top

Low score according to who? You?  BWAA-HAAAA-HAAA!!!!  Good lord, don't make me laugh so hard. 

On 1.  It's plenty adaptable and limited only by the imagination.  I've just proven that I can very quickly create virtually any type of random data you want using only SQL Server functionality.
On 2.  Clarity?  What's so bloody hard about RandomInt % IntegerRange + Offset  or Rand * Range + Offeset?
On 3.  It made for a very compact and simple example. See #6 below.
On 4.  It's the same as yours because they're both based on a pseudo-random generator.
On 5.  Now you're being a knit-picker.  You asked for functions that would return random numbers, not error checking. ;-)  Error checking and input validation can certainly be added and you know it.  Along those same lines, you only return a NULL value as a result of your CATCH.  You could at least return an error message like SQL Server will if something goes haywire. ;-)
On 6.  You can use any table you want.  If a Tally Table is available, you can use that.  If even that's not available, you can use Itzek Ben-Gan's very clever row source generator.


Saturday, March 12, 2011 - 6:57:41 PM - Alex Tocitu Back To Top

So far you've got low score in all of the above  six bullets;

but you may improve your sample and we'll go from there.

Let's see how you do, heh... Jeff ;) 


Saturday, March 12, 2011 - 5:22:40 PM - Alex Tocitu Back To Top

And a few more:

5. How do you deal with error handling in your sample?

6. How do you deal with one's access rights to the system objects you use?


Saturday, March 12, 2011 - 5:09:52 PM - Alex Tocitu Back To Top

Fine, here's a few bullets:

1. What can you tell about the adaptability of your example?

2. What can you tell about the clarity of your example?

3. Why do you need to reference two system tables to insert data into the sample table?

4. What can you tell about the repeatability of your column value generator?

 


Saturday, March 12, 2011 - 4:47:53 PM - Jeff Moden Back To Top

You're welcome but discuss my example HERE, first!


Saturday, March 12, 2011 - 4:13:52 PM - Alex Tocitu Back To Top

No worries about the format, Jeff.

Among other things, I'll be discussing your example at my presentation.

Thanks for providing excellent comments.


Saturday, March 12, 2011 - 4:03:24 PM - Jeff Moden Back To Top

Ah, but I did phat phinger a comment.  The correct comment for SomeMoney should be...

-- "SomeMoney has a range of 1.00 to 10,001.00 non-unique numbers

... because I forget about the rounding that will occur with the conversion of a long FLOAT to DECIMAL(9,2).


Saturday, March 12, 2011 - 3:56:18 PM - Jeff Moden Back To Top

Heh... I guess pre-apologies for possible format errors weren't necessary after all. ;-)


Saturday, March 12, 2011 - 3:51:08 PM - Jeff Moden Back To Top

  

 

How in the world do you think you can stop people using any computer language they want, heh...?  

Heh... Who said anything about me stopping people from doing anything?  I can only try to convince the smart ones that you don't need to use another language to emulate what's already available in SQL Server.  You just need to know SQL Server and a bit of math to pull this programmable random functionality off. ;-)

I will continue this discussion only after you produce two T-SQL functions with the same behaviour like the ones in the article;

BWAA-HAAA!!!!  You'll only continue this discussion only if?  Heh... then, I'll continue the discussion without you. ;-)

For the rest of the folks reading this, here's how easy it is to generate huge amounts of various types of random data in SQL Server without a CLR or even building a UDF in SQL Server using a "pseudo cursor" as a million row row-source and some very, very simple math.  Since it's not likely this forum will preserve my code formatting, I apologize in advance for it all being slammed to the left or possibly it not have CR's at the end of each line when you copy.

 

/****************************************************************************************
Purpose:
Create a voluminous test table with various types of highly randomized data.

--Jeff Moden
****************************************************************************************/
--===== Conditionally drop the test table to make reruns easier
     IF OBJECT_ID('dbo.JBMTest','U') IS NOT NULL
   DROP TABLE dbo.JBMTest
;
--===== Create and populate a 1,000,000 row test table.
     -- "SomeID" has a range of 1 to 1,000,000 unique numbers (pretty standard, no random magic there)
     -- "SomeInt" has a range of 1 to 50,000 non-unique numbers
     -- "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
     -- "SomeMoney has a range of 1.00 to 10,000.99 non-unique numbers
     -- "SomeDate" has a range of >=01/01/2010 and <01/01/2020 non-unique date/times
 SELECT TOP 1000000
        SomeID      
= IDENTITY(INT,1,1),
        SomeInt      = ABS(CHECKSUM(NEWID()))%50000+1,
        SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
                     + CHAR(ABS(CHECKSUM(NEWID()))%26+65),
        SomeMoney    = CAST(RAND(CHECKSUM(NEWID()))*10000+1 AS DECIMAL(9,2)),
        SomeDate     = RAND(CHECKSUM(NEWID()))*DATEDIFF(dd,'2010','2020')+CAST('2010' AS DATETIME)
   INTO dbo.JBMTest
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
;
--===== Add a typical Primary Key.
  ALTER TABLE dbo.JBMTest
    ADD CONSTRAINT PK_JBMTest PRIMARY KEY CLUSTERED (SomeID)
;

 There are two ways to generate random numbers in SQL... one for integers and one for non-integers.  You'll see the common theme for both in the code above.  Those SQL functions are available in every rev of SQL Server I know of.

 


Saturday, March 12, 2011 - 1:22:18 PM - Alex Tocitu Back To Top

You are invited, Jeff, but no preferential treatment ;)

How in the world do you think you can stop people using

any computer language they want, heh...?

Let me tell you this, Jeff... there are thousnads of random

functions around, some generic, some specialized.

I will continue this discussion only after you produce

two T-SQL functions with the same behaviour like the ones in the article;

and... thanks for liking it.

 


Friday, March 11, 2011 - 8:50:30 PM - Jeff Moden Back To Top

That's just great, Alex.  Heh... you paying for the trip and the hotel?  What's wrong with having a little discussion right here? ;-)

Let me ask this, Alex... would you write a CLR to do Modulo (I actually knew a guy that did)?  Probably not because it's a very well known operator of SQL Server.  Why then would you write an SQLCLR to do random numbers when such functionality already exists and is easy to use in SQL Server?

Make no mistake about it.  I'm NOT an anti-CLR zealot and I relished the day when the ability to do such things from SQLServer came.  I'm just not in favor of writing SQLCLR's to replace existing SQL Server functionality.

By the way... I liked your article.  It was well written. 


Wednesday, March 09, 2011 - 4:05:08 PM - Alex Tocitu Back To Top

John and Jeff,

I'll be having a talk at SQL Saturday #70 in Columbia, SC, on 03/19/11 - 'A Best Practice Approach to SQL Server CLR'.

You are invited there for a discussion; feel free to bring samples of code and comparison runs

to support your statements. Please review what the functions in this article achieve and bring equivalents. Thanks for your comments!


Wednesday, March 09, 2011 - 2:36:42 PM - Jeff Moden Back To Top

I not only agree with John but I'll also add that it seems a bit complicated to do something in externally managed code that's so simple to do in SQL Server and which runs at CLR speeds (or maybe even faster because of one less layer to go through).  And RAND ISN'T the only random number function available in SQL Server.  Values of NEWID() are even more random that those of the precision available from RAND and can, in fact be used to very quickly act as a seed for RAND to create different random numbers in a single query.  It can also be used to generate INTEGER random numbers directly using techniques similar to what John posted.


Wednesday, March 09, 2011 - 2:02:24 PM - John Fox Back To Top
While the CLR functions in this discussion do work, I prefer T-SQL functions as they are simpler to implement and maintain (i.e. directly in Server Manager). Since RAND returns a float from 0 to 1, it is simple to create functions in T-SQL directly to easily convert the value from RAND to the appropriate value, using min/max functionality. This method is time tested and has been in use for decades in many programming languages. For instance, the following statement will return a datetime value between now and 5 days in the future: select cast(cast (GETDATE() as float) + (RAND() * 5) as datetime) If you want 5 days before or after, change it to this: select cast(cast (GETDATE() as float) + (RAND() * 10) - 5 as datetime) Strings can be built using the CHAR function, char(49) is a '1', char(65) is an 'A', and char(97) is an 'a'. Creative use of RAND multiplied by an integer to create a range, can be used to create just about any random values needed for testing purposes. I'm not suggesting TSQL is superior to CLR, just providing another possible method. Each installation is different and should determine their own requirements.

Learn more about SQL Server tools