Checking Cross-Database Referential Integrity in SQL Server - Part 1

By:   |   Updated: 2023-05-26   |   Comments   |   Related: More > Referential Integrity


Problem

If your database application(s) uses multiple databases within the same SQL Server instance, there is a high chance that there's data in one database related to data in another. If you've ever been in a scenario like this, you probably know by now that you cannot create a foreign key in one database table referencing a table in a different database. Because of this, there's a potential risk that you'll end up with orphan records. Even if developers assure you that's not the case because the backend ensures that no orphan records will be left in the database, I'd say, "Trust in what you can control." We must develop a solution that helps us in such situations.

Solution

This article will explore a solution to help maintain referential integrity between different databases. This is aimed toward new data coming into the database or updating existing data.

Initial Considerations

  • This article covers databases within a SQL Managed Instance, a SQL Server instance in a VM, or an on-prem deployment. Therefore, Azure SQL databases do not apply to this solution.
  • For demonstration purposes and simplicity, I will use a pair of databases called "parent" and "children," a table on each side, and some sample records.
  • I will try to keep the solution very simple so that you can get the overall idea and expand/scale upon it.

Setting Up the Sample Scenario

First, let's create two databases called "parent" and "children".

USE master
GO

CREATE DATABASE [parent]
CREATE DATABASE [children]
GO
Create databases

Then, let's create a table in both databases and add dummy data.

USE [parent]
GO

CREATE TABLE [dbo].[parent_table](
	[id] [int] NOT NULL,
	[description] [varchar](16) NOT NULL,
	PRIMARY KEY CLUSTERED ([id] ASC)
GO

INSERT INTO [dbo].[parent_table] VALUES
(1,'configuration 1'),(2,'configuration 2'),(3,'configuration 3')
GO

USE [children]
GO

CREATE TABLE [dbo].[child_table](
	[id] [int] NOT NULL,
	[configuration_id] [int] NOT NULL,
	PRIMARY KEY CLUSTERED ([id] ASC)
GO

INSERT INTO [dbo].[child_table] VALUES
(1,1),(2,2),(3,3)
GO
Create a table

For the sake of science, let's attempt to create a Foreign Key from child_table to parent_table.

error message

The image above shows an error message that cross-database foreign key references are not supported, so we are on the right track.

Forcing Referential Integrity for Incoming Data

To achieve this, let's use an INSTEAD OF trigger structure in the child_table. An INSTEAD OF trigger is a trigger that allows us to interrupt/bypass a regular DML operation with some logic defined by us.

Here's the code for the trigger (you can download the code at end of this article):

CREATE OR ALTER TRIGGER trg_fk_children ON child_table
INSTEAD OF INSERT,UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    
   CREATE TABLE #tmp_records(
      [id]               INT NOT NULL,
      [configuration_id] INT NOT NULL,
      [parent_exist]     BIT NOT NULL
   )
   
   INSERT INTO #tmp_records
   SELECT inserted.[id],
         inserted.[configuration_id],
         CASE WHEN EXISTS (SELECT 1 FROM parent.dbo.parent_table WHERE id = inserted.configuration_id) THEN 1 ELSE 0 END
   FROM   INSERTED inserted
   
   --If an UPDATE statement was issued
   IF ((SELECT COUNT(*) FROM DELETED) > 0 )
   BEGIN
      UPDATE child_table
      SET    child_table.[id]               = inserted.[id],
         child_table.[configuration_id] = inserted.[configuration_id]
      FROM   child_table child_table
      JOIN   #tmp_records inserted ON child_table.[id] = inserted.[id]
      WHERE  inserted.[parent_exist] = 1 
   END
   ELSE --This means that the operation is an INSERT
   BEGIN
      INSERT INTO child_table([id],[configuration_id])
      SELECT 
            [id],
            [configuration_id]
      FROM  #tmp_records 
      WHERE [parent_exist] = 1    
   END
   
   --Return the invalid records, if there's at least 1
   IF((SELECT COUNT(*) FROM #tmp_records WHERE [parent_exist] = 0) > 0)
   BEGIN
      DECLARE @errorMessage VARCHAR(512) = (SELECT CONCAT('The following elements have unresolved references to parent.dbo.parent_table:',CHAR(13),CHAR(10)))
      DECLARE @children NVARCHAR(100) = (SELECT STRING_AGG(configuration_id,',') FROM #tmp_records WHERE [parent_exist] = 0)
      
      IF LEN(@children) > 0 SET @errorMessage += CONCAT('Parent ID:',@children,CHAR(13),CHAR(10))                            
      RAISERROR(@errorMessage,16,1)
   END
   RETURN
END

Let me explain why the trigger is crafted like this:

  • I'm creating a temp table (#tmp_records) because our solution must be able to work with batches of DML operations at once. If you code your solution thinking that you will always deal only with single records, you will potentially end up with weird/inconsistent things in your data (precisely what we are trying to address).
  • In the temp table, I'm adding an extra field called "parent_exist," which we will use to validate whether that ID exists in the parent table. If your table has multiple fields that need to be checked, add one "_exist" field per each.
  • Populate the temp table with whatever data the DML operation contains and update the parent_exist field of each record accordingly.
  • The core logic: If the intended operation is an UPDATE, then update all the records with parent IDs in the parent table. Otherwise, insert the new records with parent IDs in the parent table.
    • Quick pause here: I'm using the same trigger for an INSERT or UPDATE operation in this example. Therefore, to distinguish if the intent of the operation is one or the other, we do this:
      • Usually, triggers use the internal INSERTED and DELETED tables to store information related to the table you are interacting with.
      • If the DELETED table is not empty, then SQL Server will store the current data of the targeted record for modification and will store in the INSERTED table the new data.
  • Lastly, if there are records in the temp table with a parent_exist value of 0, then prepare an error message that will be returned to the client. This article written by Jared Westover, SQL RAISERROR to Better Handle Errors, covers the RAISERROR function in more detail.

Let's look at a couple of examples of DML operations against our sample tables to see the trigger in action.

First, here's the data currently in the parent and child tables:

Current data

Now let's try to insert a couple of records into the children.dbo.child_table:

Insert records

The trigger worked as expected because the configuration_ids 4 and 5 do not exist in the parent table. Not only that, but the trigger also gave us the list of parent Ids that have unresolved references to the parent table, separated by a comma.

Now let's try an UPDATE operation to attempt to assign a new invalid configuration_id to an existing record:

UPDATE opertion

The UPDATE record was perfectly intercepted as well.

To wrap up this demonstration, let's insert a couple of valid records to see if the trigger allows them to pass through:

Insert records
Insert records

Success, both DML operations worked as expected!

Insights of this Approach

  • In large batches of DML operations, the performance might suffer a penalty due to the logic implemented in the trigger, so carefully study and benchmark your use cases.
  • You must be very careful of the applied logic in the trigger to check if the parent record exists, or you can end up with inaccuracies in your data.
  • If the structure of your table is constantly changing (adding/removing fields), you might have a hard time maintaining this solution because for each new or removed field, you need to update the trigger accordingly, or it will fail. For situations like this, you probably want to eliminate the burden and instead implement a job or other automation solution to notify you when orphan data is detected within your tables. I will be covering this specific solution in part 2 of this series.
Next Steps
  • In part 2, I will show the trigger for the parent side of the equation because we need to react somehow if a parent record suffers a modification or a deletion.
  • In addition to the trigger at the parent side, I will show an alternate solution in case you still need some cross-database referential integrity but do not want to live with the cons of this technique. This solution involves developing a control point to notify you when orphan records are detected in the specified tables. Stay tuned!
  • Download the code for this article.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Alejandro Cobar Alejandro Cobar is an MCP, SQL Server DBA and Developer with 10+ years of experience working in a variety of environments.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-05-26

Comments For This Article