SQL Server Reference Data Best Practices

Problem

Reference data is added to the SQL Server database by multiple developers and testers, thereby causing inconsistencies in the database due to reference data updated by one developer getting overwritten by another developer/tester and so on. In this tip we will look at how this can be solved.

Solution

If you are not familiar with reference data, please refer to my previous tip to get the basics of SQL Server reference data.

Adopting reference data best practices will not only avoid such conflicts, but also keeps the database in a consistent and reliable state. Before we jump into reference data best practices, it is worth considering replicating the problem scenario.

Replicating Problem (Scenario)

To replicate the problem, let’s setup a sample database with reference data (table). This tip assumes you are working with SQL Server Data Tools (SSDT).

CarsSample Database Setup

Create a CarsSample database and populate it using the following scripts using SQL Server Object Explorer:

Creating CarsSample Database Project from the Database

Create a blank database project and call it CarsSample and then import the previously created database into the project:

There is no reference data involved since we have just one table:

Adding Reference Tables

Add the following reference tables in the database project:

CarMake
CarType

-- (1) Adding CarMake reference table
CREATE TABLE [dbo].[CarMake]
(
   [CarMakeId] INT NOT NULL IDENTITY(1,1),
   [Name] VARCHAR(40),
   [Detail] VARCHAR(200)    
   CONSTRAINT [PK_CarMake] PRIMARY KEY CLUSTERED ([CarMakeId])
)
 
-- (2) Adding CarType reference table
CREATE TABLE [dbo].[CarType]
(
   [CarTypeId] INT NOT NULL IDENTITY(1,1),
   [Name] VARCHAR(40),
   [Detail] VARCHAR(200), 
    CONSTRAINT [PK_CarType] PRIMARY KEY Clustered ([CarTypeId])
 
)

The reference tables once defined requires the main table Car to be updated so reference table columns are passed as foreign keys to the main table as follows:

-- Car table updated to fit reference tables columns passed as foreign keys
CREATE TABLE [dbo].[Car] (
    [CarId]   INT          IDENTITY (1, 1) NOT NULL,
    [CarMakeId]    INT NULL,
    [RegYear] INT          NULL,
    [CarTypeId]    INT NULL,
    CONSTRAINT [PK_Car] PRIMARY KEY CLUSTERED ([CarId] ASC), 
    CONSTRAINT [FK_Car_CarMake] FOREIGN KEY ([CarMakeId]) REFERENCES [CarMake]([CarMakeId]), 
    CONSTRAINT [FK_Car_CarType] FOREIGN KEY ([CarTypeId]) REFERENCES [CarType]([CarTypeId]),
)

First, make sure that the target connection is pointing to CarsSample database, also keep Deploy database properties, Always re-create database and Block incremental deployment if data loss might occur options checked as shown below:

Debug the SQL Database Project to deploy the changes we have just made, provide the target connection is set to CarsSample database:

Simulating Dev1 and Dev2 Tasks to Add Reference Data

At this point, we have both a main table and a reference table ready except the data scripts are not yet written.

Assume, Dev1 has been tasked to add data to CarMake table and Dev2 has been tasked to add data to CarType table.

To demonstrate this, we must create two database projects from CarsSample database and call them CarsSampleDev1 and CarsSampleDev2.

Create CarsSampleDev1 database project from CarsSample database to simulate Dev1 local development environment:

Similarly, create another SQL Database Project from CarsSample database and call it CarsSampleDev2:

Both projects are ready now:

Declarative Database Development Style

SQL Server Data Tools (SSDT) offers declarative database development, which has many benefits including not needing to worry about the state of the database when adding new or modifying existing database objects.

When it comes to adding reference data, it is added in the form of data script(s), that run as a post-deployment script during the publish or debug process.

According to Garry Trinder, a declarative database style makes database deployment repeatable, which means it does not matter how many times you deploy the same changes to your debug database, it remains consistent.

By default, data scripts used to populate reference tables are non-repeatable unlike database objects.

Setting up Dev1 Development Environment

Setup development environment for Dev1 represented by CarsSampleDev1 is as follows:

The use of localdb mimics local development environment for Dev1.

The checked deployment option “Always re-create database” ensures that the reference data scripts are repeatable and with each new deployment the reference tables get re-populated.

Adding Dev1 Reference Data (CarMake)

Create a folder and call it Reference Data then add the following script to it:

Also, add the post deployment script to the run data script (after enabling SQLCMD option):

/* 
Post-Deployment Script Template 
-------------------------------------------------------------------------------------- 
This file contains SQL statements that will be appended to the build script. 
Use SQLCMD syntax to include a file in the post-deployment script. 
Example:      :r .\myfile.sql 
Use SQLCMD syntax to reference a variable in the post-deployment script. 
Example:      :setvar TableName MyTable 
               SELECT * FROM [$(TableName)] 
-------------------------------------------------------------------------------------- 
*/ 
:r ".\Reference Data\CarMake.data.sql"

The project should look as follows:

DeDebug the project to deploy changes to the localdb which is like a local development environment for Dev1.

Debug database has been successfully created for the first and reference table for Dev1 (CarMake) has also been populated by the post deployment script.

Adding Dev1 Publish Profile for Dev Server

Next step is to create a publish profile to deploy the SQL Database Project managed locally by Dev1 to Dev Server which is a shared development environment.

Right click on CarsSampleDev1 and click on “Publish” as shown below:

Create publish profile for dev server by pointing to your local SQL instance as we are simulating dev server:

Click “Advanced…” and check “Always re-create database” option in the publish profile.

Running Dev1 Publish Profile

Run publish profile and see all the Dev1 changes getting deployed to shared dev database:

Viewing Shared Dev Database (CarsSample)

Take a closer look at CarsSample shared dev database published by Dev1 using publish profile.

The CarMake table got populated so the reference table managed by Dev1 has been successfully populated as shown below:

Setting up Dev2 Development Environment

Setup development environment for Dev2 the same way Dev1 environment was setup except the reference table CarType needs to be populated instead of CarMake.

Adding Dev2 Reference Data (CarType)

Create a folder and call it Reference Data then add the following script to it:

Also, add post deployment script to run data script:

/* 
Post-Deployment Script Template 
-------------------------------------------------------------------------------------- 
This file contains SQL statements that will be appended to the build script. 
Use SQLCMD syntax to include a file in the post-deployment script. 
Example:      :r .\myfile.sql 
Use SQLCMD syntax to reference a variable in the post-deployment script. 
Example:      :setvar TableName MyTable 
               SELECT * FROM [$(TableName)] 
-------------------------------------------------------------------------------------- 
*/ 
:r ".\Reference Data\CarType.data.sql"

The project should look like as follows:

Adding Dev2 Publish Profile for Dev Server

Add Dev2 publish profile pointing to the same shared database (CarsSample) same as Dev1 profile settings and create publish profile “CarsSampleDev2.publish.xml”.

Run Profile and View Published Database for Dev2

Run the publish profile and view the published database, particularly view reference tables:

This time CarType reference table is populated but CarMake reference table changes done by Dev1 are gone.

So, we have successfully replicated the problem and it is obvious that “Always re-create database” deployment option is best suited if only one developer is responsible for adding reference tables and the data scripts, which is not always the case.

Changes done by Dev1 get overwritten by the changes done by Dev2 because of the repeatable nature of the deployment scripts.

Potential Solution

One of the possible solutions of the above problem is to use alternative way of creating repeatable data scripts so that the changes done by one developer don’t get overwritten by another developer in the context of reference table data.

One thing is for sure we have to avoid using “Always re-create database” deployment option but somehow, we have make the scripts repeatable to comply with standard practices.

This means, we only need to insert data if it does not already exist.

Let us implement this for Dev2 by first unchecking “Always re-create database” and then replacing CarType.data.sql script with the following code:

-- Add reference data to CarType table if does not already exist
SET IDENTITY_INSERT [dbo].[CarType] ON
 
IF NOT EXISTS (SELECT * FROM CarType
WHERE CarTypeId=1 and Name='Estate') 
INSERT INTO [dbo].[CarType] ([CarTypeId], [Name], [Detail]) VALUES (1, N'Estate', NULL)
 
IF NOT EXISTS (SELECT * FROM CarType
WHERE CarTypeId=2 and Name='SUV') 
INSERT INTO [dbo].[CarType] ([CarTypeId], [Name], [Detail]) VALUES (2, N'SUV', NULL)
 
IF NOT EXISTS (SELECT * FROM CarType
WHERE CarTypeId=3 and Name='Saloon') 
INSERT INTO [dbo].[CarType] ([CarTypeId], [Name], [Detail]) VALUES (3, N'Saloon', NULL)
 
IF NOT EXISTS (SELECT * FROM CarType
WHERE CarTypeId=4 and Name='Hatchback') 
INSERT INTO [dbo].[CarType] ([CarTypeId], [Name], [Detail]) VALUES (4, N'Hatchback', NULL)
 
IF NOT EXISTS (SELECT * FROM CarType
WHERE CarTypeId=5 and Name='MPV') 
INSERT INTO [dbo].[CarType] ([CarTypeId], [Name], [Detail]) VALUES (5, N'MPV', NULL)
 
SET IDENTITY_INSERT [dbo].[CarType] OFF

Publish the project to Dev server using the publish profile and see the results:

Now delete first two rows in the table and run the publish profile again.

The table still contains all the five records even though the first two records were deleted before the publish profile was rerun. The data script is working fine.

A similar sort of script can be written for CarMake table managed by Dev1.

The above approach does the job, but a more comprehensive solution and more about reference data best practices is going to be presented in the next part of this article.

Reviewing Reference Data Best Practices

Let’s extract some best practices that we have learned so far when it comes to reference data:

The database objects in the database project do not care about the state of the database which is going to be updated, they straightaway get deployed and end up either adding new objects or modifying existing objects, but this does not hold true for reference data.
The reference data scripts must be repeatable so that running them again and again do not cause any inconsistency in the database.
Always re-create database deployment option is only suitable when deploying changes to debug database inside local dev environment because in that case each developer has his/her own debug database.
Always re-create database deployment option must be avoided when publishing database project to Dev or Test environment.
Publishing database project to Production directly using a publish script is not recommended as the changes must be reviewed by a DBA or infrastructure team before going on Live.
The recommended practice to add reference data to the database is by using data scripts which are run by post deployment script once the project is debugged or published.
The reference data should explicitly handle the identity columns in the data scripts to avoid potential issues caused by identity inserts.

Next Steps

Please try implementing the potential solution completely such that it also removes any record on the dev database which is not part of the predefined table(s).
Please try adding more environments such as Test or QA and extend the solution to handle reference data deployment to these environments.
Please go through my previous tip and try implementing the potential solution mentioned in that tip.

Haroon Ashraf

Haroon Ashraf’s deep interest in logic and reasoning at an early age of his academic career paved his path to become a data professional.

He holds BSc (Gold Medal) and MSc Degrees in Computer Science and also received the OPF merit award.

His programming career began in 2006 working on his first data venture to migrate and rewrite a public sector database driven examination system from IBM AS400 (DB2) to SQL Server 2000 using VB 6.0 and Classic ASP along with developing reports and archiving many years of data.

His work and interest revolves around Business Intelligence and Database Centric Architectures and his expertise include database and reports design, development, testing, implementation and migration.

He has recently earned “Knowledge Management and Big Data in Business” certificate from The Polytechnic University of Hong Kong.

MSSQLTips Awards: Trendsetter (25+ tips) – 2020 | Author of the Year Contender – 2018-2020

SQL Server Reference Data Best Practices – Part 1