Populating a SQL Server Test Database with Random Data

By: Tibor Nagy   |   Related Tips: More > Testing
Problem

If you develop new functionality for your SQL Server database, you have probably encountered the typical problem of testing large scale databases. I have to run a series of performance and functional tests on a database with a few million rows, but I do not have the necessary test data. I have heard about some excellent commercial tools, but they are expensive and my company cannot afford them. How can I generate the test data on my own and populate the tables in the test database?

Solution

I will show you some tricks to generate and multiply rows for a test database. Each data type needs a different approach, as you can see below.

Numeric data

First of all, you will definitely need to generate some numeric data. The obvious idea would be to use the RAND() function. Unfortunately, RAND() without a seed is evaluated once per query, not once per row, so it returns the same value for every row your query touches. I suggest using other methods unless you are willing to supply a different seed value for each and every call, which can be very painful when generating a mass volume of data.
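To see the per-query behavior, and a common workaround if you do want to stay with RAND(), you can seed it from NEWID() on each row. This is a sketch (it assumes a table like the TESTTABLE used below), not something the rest of this tip relies on:

```sql
-- RAND() alone: every row gets the same value, because the function
-- is evaluated once for the whole query
SELECT RAND() AS SAME_EVERY_ROW FROM TESTTABLE

-- Seeding RAND() from a per-row NEWID() checksum forces a fresh
-- evaluation per row, yielding a float in [0, 1) for each row
SELECT RAND(CHECKSUM(NEWID())) AS DIFFERENT_EVERY_ROW FROM TESTTABLE
```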

If you are looking for a truly random value, you will get much better results by generating a NEWID() and calculating its checksum. The following expression sets the INTVALUE1 column to a random integer value:

UPDATE TESTTABLE SET INTVALUE1 = CHECKSUM(NEWID())

By modifying this expression further you can generate decimal values, and by using the ABS() function you can eliminate negative numbers:

UPDATE TESTTABLE SET DECVALUE1 = ABS(CHECKSUM(NEWID())) / 100.0
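The same expression can also be bounded with the modulo operator when you need values in a specific range. A sketch, using the same illustrative TESTTABLE column names:

```sql
-- Random integer between 1 and 100 (inclusive): the remainder
-- modulo 100 is 0..99, so add 1 to shift the range
UPDATE TESTTABLE SET INTVALUE1 = ABS(CHECKSUM(NEWID())) % 100 + 1

-- Random decimal between 0.00 and 999.99, e.g. for price-like data:
-- take the remainder modulo 100000, then shift the decimal point
UPDATE TESTTABLE SET DECVALUE1 = ABS(CHECKSUM(NEWID())) % 100000 / 100.0
```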

Fixed range data sets

We can use the above method with some changes to generate values within a fixed data range. For example, when you need to generate a Boolean value, you can choose from the data sets (0;1) or (Y;N), etc. You can use the parity of the random integer to convert it into Boolean data. Beware: you cannot sort NEWID() results into categories with a single CASE expression, since NEWID() can be re-evaluated for every branch of the CASE. The following expressions generate random Yes or No values for the column BOOLVALUE1:

--Generate random 0 or 1 value for every row
UPDATE TESTTABLE SET BOOLVALUE1 = ABS(CHECKSUM(NEWID())) % 2
--Translate the values to Yes or No
UPDATE TESTTABLE SET BOOLVALUE1 = 'N' WHERE BOOLVALUE1 = '0'
UPDATE TESTTABLE SET BOOLVALUE1 = 'Y' WHERE BOOLVALUE1 = '1'
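If you prefer to avoid the two-step translation, the parity trick can also index directly into a small character list. This variant is a sketch of the same idea and sidesteps CASE entirely:

```sql
-- Pick the 1st or 2nd character of 'NY' based on the parity of a
-- per-row random integer: remainder 0 selects 'N', remainder 1 selects 'Y'
UPDATE TESTTABLE SET BOOLVALUE1 = SUBSTRING('NY', ABS(CHECKSUM(NEWID())) % 2 + 1, 1)
```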

Text data

Text data requires special attention if you would like a database with reasonable content. You have to build an initial dictionary and then use it to breed more rows. For example, it is common to use names in various database fields. The following example shows how to create a table containing 100 different names in a few seconds.

--Create table for first names
CREATE TABLE [NAMES1] (FIRST_NAME [varchar](20))
--Create table for family names
CREATE TABLE [NAMES2] (FAMILY_NAME [varchar](20))
--Fill first names
INSERT INTO NAMES1 VALUES ('John')
INSERT INTO NAMES1 VALUES ('Jack')
INSERT INTO NAMES1 VALUES ('Jill')
INSERT INTO NAMES1 VALUES ('Bill')
INSERT INTO NAMES1 VALUES ('Mary')
INSERT INTO NAMES1 VALUES ('Kate')
INSERT INTO NAMES1 VALUES ('Kevin')
INSERT INTO NAMES1 VALUES ('Matt')
INSERT INTO NAMES1 VALUES ('Rachel')
INSERT INTO NAMES1 VALUES ('Tom')
--Fill family names
INSERT INTO NAMES2 VALUES ('Smith')
INSERT INTO NAMES2 VALUES ('Morgan')
INSERT INTO NAMES2 VALUES ('Simpson')
INSERT INTO NAMES2 VALUES ('Walker')
INSERT INTO NAMES2 VALUES ('Bauer')
INSERT INTO NAMES2 VALUES ('Taylor')
INSERT INTO NAMES2 VALUES ('Morris')
INSERT INTO NAMES2 VALUES ('Elliott')
INSERT INTO NAMES2 VALUES ('Clark')
INSERT INTO NAMES2 VALUES ('Rock')
--Generate 10x10=100 different names
SELECT * INTO TESTTABLE FROM NAMES1 CROSS JOIN NAMES2

You can build on the above examples to quickly generate more and more distinct database records from a few small sample tables. Cross joins and self joins can easily create large datasets, so be cautious when using very big tables: cross joining two tables of one thousand records each produces a result of one million records.
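To push this toward the row counts mentioned in the problem statement, the 100-name TESTTABLE can itself be cross joined with a generated number sequence. The following is a sketch: TESTTABLE_LARGE is an illustrative name, and it assumes sys.objects contains at least 100 rows (typical even in an empty database) so that the self cross join yields the 10,000 numbers requested:

```sql
-- Multiply the 100-row TESTTABLE by a 10,000-row number sequence
-- to reach 1,000,000 rows
SELECT T.FIRST_NAME, T.FAMILY_NAME, N.NUM
INTO TESTTABLE_LARGE
FROM TESTTABLE T
CROSS JOIN (
    -- sys.objects cross joined with itself is a common source of
    -- an arbitrary-length sequential number list
    SELECT TOP 10000 ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS NUM
    FROM sys.objects A CROSS JOIN sys.objects B
) N
```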

Last Update: 12/9/2010


About the author
Tibor Nagy is a technical and business professional from the financial industry. His experience includes SQL Server 2000-2008, DB2 and MySQL.


Comments and Feedback:
Thursday, December 09, 2010 - 8:29:23 AM - Nas

Good article. I like the suggestion of using joins to create mass data, a great time saver.

Just one point on the first two CREATE TABLE statements: I think a column length is needed, as the subsequent inserts fail otherwise.

--Create table for first names
CREATE TABLE [NAMES1] (FIRST_NAME [varchar](50))

--Create table for family names
CREATE TABLE [NAMES2] (FAMILY_NAME [varchar](50))

I think 50 should do the trick.

Thanks again.

 


Thursday, December 09, 2010 - 9:48:00 AM - Jerry LaGrou

Um, way too hard. Also, making your own test data means it will only have features and behaviors you understand before the testing. Use real source data.

Normally the reason for a smaller test database is so that the big scripts will run fast enough on the test machine to allow quick test cycles. So, first take a full restore of the production data. Find the big tables and delete most of the records. One of my favorite ways of doing this is to delete all the records where the ID (normally an identity int counting up) is not evenly divisible by either of two prime numbers.

DELETE FROM <your big table> WHERE ((ID/1129)*1129 != ID) AND ((ID/1151)*1151 != ID)

This leaves a consistent 0.2% of the data, with variable spacing between the remaining records. (If 0.2% is too much, pick larger primes; if it is not enough, pick smaller ones.)

If your FKs are set up well (cascading deletes), this should complete the task; if not, a few more deletes and hurray.

If obfuscating customer data is also a requirement, replace all the consonants in the sensitive data with an X and all the vowels with an A. Now you are ready for tests with realistic values on all the flags, statuses, and other anomalous artifacts in the original data set.

Happy programming!

 


Thursday, December 09, 2010 - 10:39:26 AM - Tibor Nagy

@Nas:

Thanks for the suggestion. I will correct the typing error.

@Jerry LaGrou:

I provided the tip because there are certain scenarios where SQL developers cannot use production data for testing. I have real life experience on this topic: I was working for an educational software company, but the company did not use its own software, so we did not have any real data. Another example is the financial industry, where strict security rules prohibit us from restoring production data to any test server.


Tuesday, September 13, 2011 - 4:54:08 PM - Ray

I think you could use a CASE expression for the Boolean data element. You only call NEWID() once per row if you code it this way:

case when checksum(newid()) % 2 = 0 Then 'Y' Else 'N' end

I also support your comments about not having/using production data. We provided customers with an application that was full of HIPAA related information. Many of them would not let us have a production database backup, and the rest should not have. Additionally, when you are working on a new application, then by definition you don't have access to production data.

Also, the exercise of generating "pseudo" data often forces developers to seriously analyze aspects of their data and its relationships that they otherwise would never consider.


Friday, September 14, 2012 - 1:56:44 PM - boris tyukin

Interesting post, but check out this cool open source project: http://dataroller.sourceforge.net/

I stumbled upon it a while ago and you can do quite a few things with it.



Copyright (c) 2006-2014 Edgewood Solutions, LLC All rights reserved