Generate Test Data Sets for Testing SQL Server Database Applications

By:   |   Updated: 2023-08-21   |   Comments (1)   |   Related: > Testing


Problem

In my line of work with databases and business intelligence, I have observed a common oversight among developers regarding testing their queries. Failing to perform this crucial step can result in production issues, which they may attribute to real-life data instead of improper Unit Testing. We can easily avoid these significant issues by taking the time to test queries before deploying them.

In this tip, we will discuss the process of generating test data to verify query logic without relying on real-life data, along with some helpful guidelines. We will also cover various scenarios and their corresponding solutions.

Solution

Before diving into testing SQL queries, we must first understand what unit testing is and how it applies to SQL code.

Unit Testing is a type of software testing that tests individual software units or components to ensure that each performs as expected. In SQL, when talking about Testing, it typically means data testing, which is validating data against test criteria. On the other hand, Unit Testing validates query logic using fixed inputs without relying on real-life data.

And if you are new to SQL queries, check out the following tips to get started:

Generating Test Data Guidelines

When we develop queries, we commonly encounter two database permissions: full access to Add, Delete, and Modify data to an existing database or only the ability to read the data. In addition, there may be situations where we come across empty databases and need interface development to add data, which prevents us from properly Unit Testing our queries. So, to handle whatever comes your way, below are a few guidelines to follow.

Guideline 1 - Generate Test Data Using APIs

You can generate test data if you can access the interface applications or APIs of the transactional database. Many companies have test environments that you can use to create your queries. By using these applications or APIs, you can gain practical knowledge about the business aspect of the data and even detect any related bugs or issues. If you are new to this process and require help generating test data, don't hesitate to contact the Operation or QA teams.

If you are new to APIs and don't know how to use them, you can use software like Postman to use and test APIs. For more information, check out these links:

Guideline 2 - Generate Test Data Using T-SQL

If you have to add, delete, and update permissions, you can generate test data by creating it yourself. Still, it's essential to understand the business rules related to the data. For instance, a field value in one table may depend on the value(s) of another/other field(s) in the same or a different table. Understanding these rules will allow you to add data accurately.

For example, you need to generate data for a Person table that includes the following fields: ID, NationalID, FirstName, LastName, Gender, and BirthDate. We can create it by using a few predefined functions and simple query techniques as follow:

-- First, we will create the table.
-- MSSQLTips.com
CREATE TABLE Person
                   (
                     ID         NUMERIC(38,0) PRIMARY KEY IDENTITY (1,1),
                     NationalID NVARCHAR(10),
                     FirstName  NVARCHAR(30),
                     LastName   NVARCHAR(30),
                     Gender     NVARCHAR(10),
                     BirthDate  DATE
                   )

Second, we will create a query that will populate the Person table with 39600 rows.

-- MSSQLTips.com
declare @max int = 200;
 
with FirstName(FirstName,Gender) as
(
    SELECT N'John'       , N'Male'   UNION ALL
    SELECT N'Tim'        , N'Male'   UNION ALL
    SELECT N'Laura'      , N'Female' UNION ALL
    SELECT N'Jeff'       , N'Male'   UNION ALL
    SELECT N'Alexander'  , N'Male'   UNION ALL
    SELECT N'Burt'       , N'Male'   UNION ALL
    SELECT N'Christopher', N'Male'   UNION ALL
    SELECT N'Daniel'     , N'Male'   UNION ALL
    SELECT N'Daniela'    , N'Female' UNION ALL
    SELECT N'Eric'       , N'Male'   UNION ALL
    SELECT N'Alexandra'  , N'Female' UNION ALL
    SELECT N'Bertha'     , N'Female' UNION ALL
    SELECT N'Christine'  , N'Female' UNION ALL
    SELECT N'Noor'       , N'Female' UNION ALL
    SELECT N'Leen'       , N'Female' UNION ALL
    SELECT N'Aseel'      , N'Female' UNION ALL
    SELECT N'Erica'      , N'Female' UNION ALL
    SELECT N'Jackson'    , N'Male'   UNION ALL
    SELECT N'Walid'      , N'Male'   UNION ALL
    SELECT NULL          , NULL      UNION ALL --  This row was added because FirstName,Gender columns in Person table are nullable
    SELECT NULL          , N'Male'   UNION ALL --  This row was added because FirstName column in Person table is nullable
    SELECT NULL          , N'Female'           --  This row was added because FirstName column in Person table is nullable
)
, LastNames(LastName) as
(
    SELECT N'Johnson'   UNION ALL
    SELECT N'Hudson'    UNION ALL
    SELECT N'Jackson'   UNION ALL
    SELECT N'Ranallo'   UNION ALL
    SELECT N'Curry'     UNION ALL
    SELECT N'Allaham'   UNION ALL
    SELECT N'Alnatur'   UNION ALL
    SELECT N'Alkhattab' UNION ALL
    SELECT Null --  This row was added because LastName column in Person table is nullable
 
)
, SortedNames(FirstName, LastName,Gender, RowNum) as  -- This code will generate Sorted Names
(
   /* Number of records from this Select will be: Number of Male records in FirstName WITH clause Multiplied By Number of records IN LastNames WITH clause*/
    select
           FirstName
         , LastName
         , Gender
         , ROW_NUMBER() over (Order by newid())
    from FirstName
    cross join LastNames
    where Gender = 'Male'
   UNION
   /* Number of records from this Select will be: Number of Female records in FirstName WITH clause Multiplied By Number of records IN LastNames WITH clause*/
    select
           FirstName
         , LastName
         , Gender
         , ROW_NUMBER() over (Order by newid())
    from FirstName
    cross join LastNames
    where Gender = 'Female'
   UNION
   /* Number of records from this Select will be: Number of Null Gender records in FirstName WITH clause Multiplied By Number of records IN LastNames WITH clause*/
    select 
           FirstName
         , LastName
         , Gender
         , ROW_NUMBER() over (Order by newid())
    from FirstName
    cross join LastNames
    where Gender is null
),
Names AS
(
    select     
         FirstName
       , LastName
       , Gender
    from SortedNames
)
, Multiplier as
 (
   SELECT 1 N
   UNION ALL
   SELECT N+1
   FROM Multiplier
   WHERE N < @max
 )
INSERT INTO [dbo].[Person]([NationalID],[FirstName],[LastName],[Gender],[BirthDate])
select 
  RIGHT(CAST(CAST(CRYPT_GEN_RANDOM(4) AS bigint) AS varchar(10)), 10) NationalID /* Generate random 10 numbers*/              
, FirstName
, LastName
, Gender
, DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 36500), '1940-1-1') BirthDate /* Generate random dates between 1940-1-1 and 2040-01-01; 36500 is around 100 years*/
from Names
cross join Multiplier /*To cross multiply Names WITH clause that returns 198 random {FirstName LastName with gender} combinations with the number that we specify in @max parameter*/
order by NEWID()
option (maxrecursion 0);

I used a few commands in the above code to help me create my test data. If you are unfamiliar with these commands, please refer to the following links for more information:

In addition to creating a query to populate a table, we can use SQL Server INSERT and GO commands to load it with nth records. For example, the following code will run the same INSERT command 20 times:

-- MSSQLTips.com
INSERT INTO [dbo].[Person]
           (
             [NationalID]
            ,[FirstName]
            ,[LastName]
            ,[Gender]
            ,[BirthDate]
           )
VALUES
           (
            RIGHT(CAST(CAST(CRYPT_GEN_RANDOM(4) AS bigint) AS varchar(10)), 10)
            ,N'John'
            ,N'Tim'
            ,N'Male'
            ,'1991-09-02'
           )
GO 20

For more information, check out this tip: Executing a TSQL batch multiple times using GO.

Guideline 3 - Generate Data via Queries

If you have only read permission for a database that does not have a data pattern you need for testing your query, don't give up. Instead, manipulate your data logically.

Whenever I'm testing queries, I follow a personal rule of thumb: "When in doubt, add your data to the query."

After we finish listing the main guidelines to generate test data, we will review two scenarios and how to add our data to the query to validate our query.

Guideline 4 - Generate Data Using Data Generation Tools

In some situations, utilizing test data generation tools provides significant ease in inserting data into the database, ultimately benefiting developers and testers. This guideline is notably more efficient and cost-effective than the complex and expensive manual insertion or script-writing process, especially if you have multiple large databases.

A few of these tools include:

Data Generation Examples

Below we will cover some examples of generating data based on using queries, which was briefly covered in Guideline 3 above.

Scenario #1 - Generate Rows for Missing Data Scenarios

In a table with few fields, an entire row of data that must meet the query logic is missing; for that, we may do the following steps:

  1. Use the dual table to prepare the required row.
  2. Union the new row with the source table.
  3. Use the modified dataset in the FROM or WITH clauses as a sub-query.

For example, in the previously created Person table, we were requested to count the persons in the Person table by age generation, and gender. Also, when age is null, map it to Unknown, and when Gender is null or an empty string, map it to Unknown. Unfortunately, the Person table doesn't have rows with empty BirthDate or empty string gender; at the same time, we don't have permission to create an Age generation table. Don't panic. We can test our code.

First, we will create our dummy record, as the following image shows.

Person table Dummy Record

Second, we will transform our data to map null values and empty strings to Unknown, extract age from the BirthDate column, and find age accordingly.

Transformations Query
-- MSSQLTips.com
With Person_SQ as
(
 SELECT  ID, NationalID, FirstName ,LastName,Gender,BirthDate
 FROM Person
 UNION ALL
 SELECT (SELECT MAX(ID) FROM Person) + 1  ID,
        N'0000000001' NationalID,
        N'Dummy' FirstName ,
        N'Record' LastName,
        N'' Gender,
        NULL BirthDate
)
, AgeGeneration as
 (
  SELECT 1 GenID , 78 FromAge  , 999 ToAge  , N'The Silent Generation' Generation , N'1900-1945' AgeRange
  UNION ALL
  SELECT 2 GenID , 59 FromAge , 77 ToAge     , N'Baby Boomers' Generation , N'59-77' AgeRange
  UNION ALL
  SELECT 3 GenID , 43 FromAge , 58 ToAge , N'Gen X: Born' Generation , N'43-58' AgeRange
  UNION ALL              
  SELECT 4 GenID , 27 FromAge , 42 ToAge , N'Millennials' Generation , N'27-42' AgeRange
  UNION ALL              
  SELECT 5 GenID , 11 FromAge , 26 ToAge , N'Gen Z' Generation , N'11-26' AgeRange
  UNION ALL              
  SELECT 6 GenID , -999 FromAge , 10 ToAge , N'Gen Alpha' Generation , N'<=10' AgeRange
  UNION ALL              
  SELECT 7 GenID , NULL FromAge , NULL ToAge , N'Unknown' Generation , N'Unknown' AgeRange
 )
 
SELECT *
FROM 
   (
     SELECT Person_SQ.ID, NationalID, FirstName ,LastName,Gender,BirthDate , 
          DateDiff(Year,BirthDate, GETDATE()) Age , 
          AG.GenID, AG.Generation , AG.AgeRange,
          CASE WHEN Gender IS NULL OR Gender = N'' THEN N'Unknown' ELSE Gender END MappedGender
     FROM Person_SQ
     LEFT JOIN AgeGeneration AG ON (DateDiff(Year,BirthDate, GETDATE()) BETWEEN AG.FromAge AND ToAge) OR (BirthDate IS NULL AND AG.GenID = 7)
   ) MainQuery
ORDER BY ID DESC;

Third, we will write an aggregation query for the required logic.

Aggregation query result
-- MSSQLTips.com
With Person_SQ as
(
 SELECT  ID, NationalID, FirstName ,LastName,Gender,BirthDate
 FROM Person 
 UNION ALL
 SELECT (SELECT MAX(ID) FROM Person) + 1  ID,
        N'0000000001' NationalID,
        N'Dummy' FirstName ,
        N'Record' LastName,
        N'' Gender,
        NULL BirthDate
)
, AgeGeneration as
 (
  SELECT 1 GenID , 78 FromAge  , 999 ToAge  , N'The Silent Generation' Generation , N'1900-1945' AgeRange
  UNION ALL
  SELECT 2 GenID , 59 FromAge , 77 ToAge     , N'Baby Boomers' Generation , N'59-77' AgeRange
  UNION ALL
  SELECT 3 GenID , 43 FromAge , 58 ToAge , N'Gen X: Born' Generation , N'43-58' AgeRange
  UNION ALL              
  SELECT 4 GenID , 27 FromAge , 42 ToAge , N'Millennials' Generation , N'27-42' AgeRange
  UNION ALL              
  SELECT 5 GenID , 11 FromAge , 26 ToAge , N'Gen Z' Generation , N'11-26' AgeRange
  UNION ALL              
  SELECT 6 GenID , -999 FromAge , 10 ToAge , N'Gen Alpha' Generation , N'<=10' AgeRange
  UNION ALL              
  SELECT 7 GenID , NULL FromAge , NULL ToAge , N'Unknown' Generation , N'Unknown' AgeRange
 )
 
SELECT Generation [Age Generation] , MappedGender Gender , FORMAT(COUNT(*) , '#,#') [Total Number of Persons]
FROM
   (
     SELECT Person_SQ.ID, NationalID, FirstName ,LastName,Gender,BirthDate , 
          DateDiff(Year,BirthDate, GETDATE()) Age , 
          AG.GenID, AG.Generation , AG.AgeRange,
          CASE WHEN Gender IS NULL OR Gender = N'' THEN N'Unknown' ELSE Gender END MappedGender
     FROM Person_SQ
     LEFT JOIN AgeGeneration AG ON (DateDiff(Year,BirthDate, GETDATE()) BETWEEN AG.FromAge AND ToAge) OR (BirthDate IS NULL AND AG.GenID = 7)
   ) MainQuery
GROUP BY Generation , MappedGender , GenID
ORDER BY GenID , MappedGender;

In addition to adding a dummy record to test the needed transformation in the previous example, we used a WITH clause and UNION ALL command to logically create a lookup table that is not physically created in the database.

Although this is a simple example, the same process can be applied to test a wide range of query complexities.

Scenario #2 - Generate Rows for Missing Data Scenarios

In a table with many fields, a particular value(s) in a column that must meet the query logic is missing. For that, we may do the following steps:

  1. Use the case statement to change the value for a few rows to the required value(s).
  2. Write a WITH clause that contains the modified column data, which will be aliased with the original column name.
  3. To test the query, call the modified dataset with a WITH clause in the FROM clause.

For example, in the previously created Person table, we need to return all information for the records in which the NationalID value has two or more occurrences in the dataset, and it goes the same if NationalID is null. Unfortunately, the Person table doesn't have this pattern of data. Still, we can test our code.

Most of the developers will write the query as follows:

-- MSSQLTips.com
 
WITH Person_SQ AS
(
   SELECT ID , 
         CASE
             WHEN ID = 56    THEN N'1046776265'
             WHEN ID = 58    THEN N'3820053562'
             WHEN ID = 59    THEN N'3820053562'          
             WHEN ID = 101   THEN N'3175732123' 
             WHEN ID = 1102  THEN N'3175732123'
             WHEN ID = 1103  THEN N'3175732123'
             WHEN ID = 39600 THEN N'3175732123'
             WHEN ID = 39592 THEN N'3175732123'
             WHEN ID = 39570 THEN N'3175732123'
             WHEN ID = 1     THEN NULL
             WHEN ID = 12    THEN NULL
             WHEN ID = 33    THEN NULL
             WHEN ID = 46    THEN NULL         
            ELSE NationalID
         END NationalID,
         FirstName,
         LastName,
         Gender,
         BirthDate
   FROM Person
)
, MoreThanOneOccurrence  AS
(
SELECT COUNT(*) CNT , NationalID
FROM Person_SQ AS Person
GROUP BY NationalID
HAVING COUNT(*) > 1
)
SELECT Person.* , MT1O.CNT [Number Of Occurrence]
FROM Person_SQ AS Person
INNER JOIN MoreThanOneOccurrence MT1O ON Person.NationalID  =  MT1O.NationalID
ORDER BY MT1O.CNT DESC , Person.NationalID
NationalID_MoreThanOneOccurrence Wrong Result

If null values are not considered in an INNER JOIN, any occurrence of NationalID being null and, in our scenario, occurring more than once, will not be returned. This was discovered by creating data test cases logically. Therefore, the correct code should be adjusted accordingly.

-- MSSQLTips.com
WITH Person_SQ AS
(
   SELECT ID , 
         CASE
             WHEN ID = 56    THEN N'1046776265'
             WHEN ID = 58    THEN N'3820053562'
             WHEN ID = 59    THEN N'3820053562'          
             WHEN ID = 101   THEN N'3175732123' 
             WHEN ID = 1102  THEN N'3175732123'
             WHEN ID = 1103  THEN N'3175732123'
             WHEN ID = 39600 THEN N'3175732123'
             WHEN ID = 39592 THEN N'3175732123'
             WHEN ID = 39570 THEN N'3175732123'
             WHEN ID = 1     THEN NULL
             WHEN ID = 12    THEN NULL
             WHEN ID = 33    THEN NULL
             WHEN ID = 46    THEN NULL         
            ELSE NationalID
         END NationalID,
         FirstName,
         LastName,
         Gender,
         BirthDate
   FROM Person
)
, MoreThanOneOccurrence  AS
(
SELECT COUNT(*) CNT , NationalID
FROM Person_SQ AS Person
GROUP BY NationalID
HAVING COUNT(*) > 1
)
SELECT Person.* , MT1O.CNT [Number Of Occurrence]
FROM Person_SQ AS Person
INNER JOIN MoreThanOneOccurrence MT1O ON ISNULL(Person.NationalID, '') = ISNULL(MT1O.NationalID, '')
ORDER BY MT1O.CNT DESC , Person.NationalID
result set

Conclusion

Although we covered some simple examples, the same process can be applied to test a wide range of query complexities.

We learned that we can Unit Test queries, which is not a myth. And we can test query logic using different methods as described in the Guidelines sections, which cover all types of database permission that any databases and/or business intelligence developers can have.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aseel Al-Laham Aseel AL-Laham is a Business Intelligence (BI) Specialist with over eight years of experience in the BI field. She started as a QA engineer for BI & Databases, which opened the path to becoming a BI Specialist with a QA soul.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-08-21

Comments For This Article




Friday, September 15, 2023 - 10:43:51 AM - R Glen Coopet Back To Top (91569)
Nice article!














get free sql tips
agree to terms