Make your SQL Server database changes backward compatible when changing a relationship

By: Aaron Bertrand | Comments | Related: 1 | 2 | 3 | 4 | > TSQL

Problem

Let's face it: requirements change. There is usually a lot of churn during the design and initial development stages, but changes can happen to mature applications, too. The key is to introduce those changes with the least amount of effort and risk. One change I had to deal with recently was changing a 1 : many relationship to a many : many relationship. This meant changing procedures to both accept and return multiple values representing a relationship, when previously they only had to deal with one.

Solution

In case you haven't sensed a growing theme here, my approach to changes is to ensure that database changes can be made in advance of the surrounding applications. I do this both as a selfish way to preserve control over the model, and because I know that the applications won't necessarily be updated on the same schedule.

Up front, I'll state my typical disclaimer that I am treating a fairly simple case here... the system does not have any replication, that the tables are not subject to BULK INSERT or other ad hoc techniques, that all data access is enforced through stored procedures, and that a policy to always explicitly name stored procedure parameters is being followed. So in this case, changing the tables to support a different type of relationship can be handled in a fairly controlled manner. Not all scenarios are like this, so I urge you to fully understand all of the ways your applications interface with the database before deciding how to approach such a change.

With that out of the way, let's get more specific. First, what kind of entities are we talking about here? Let's imagine you have a social networking site, and a table that contains information about Members. There is also a table that represents the Networks that a Member could belong to. When the site was first launched, Members could pick a Network (and only one Network). Here is how the core parts of the schema might look (showing only the relevant columns/parameters, and leaving out error handling for brevity):

    
CREATE TABLE dbo.Networks
(
    NetworkID  INT IDENTITY(1,1) PRIMARY KEY,
    Name       NVARCHAR(64) NOT NULL UNIQUE
    -- , ...
);
GO
CREATE TABLE dbo.Members
(
    MemberID   INT IDENTITY(1,1) PRIMARY KEY,
    EMail      VARCHAR(320) NOT NULL UNIQUE,
    NetworkID  INT NOT NULL FOREIGN KEY
               REFERENCES dbo.Networks(NetworkID)
    -- , ...
);
GO
CREATE PROCEDURE dbo.Member_Create
    @EMail     VARCHAR(320),
    @NetworkID INT
    -- , ...
    @MemberID  INT OUTPUT
AS
BEGIN
    INSERT dbo.Members
    (
        EMail,
        NetworkID
        -- , ...
    )
    SELECT
        @EMail,
        @NetworkID
        -- , ...
    ;
        
    SELECT @MemberID = SCOPE_IDENTITY();
END
GO
CREATE PROCEDURE dbo.Member_Update
    @MemberID  INT,
    @EMail     VARCHAR(320),
    @NetworkID INT
    -- , ...
AS
BEGIN
    UPDATE dbo.Members
    SET 
        EMail     = @EMail,
        NetworkID = @NetworkID
    WHERE
        MemberID  = @MemberID;
END
GO
CREATE PROCEDURE dbo.Member_GetDetails
    @MemberID INT
AS
BEGIN
    SELECT
        EMail,
        NetworkID
        -- , ...
    FROM
        dbo.Members
    WHERE
        MemberID = @MemberID;
END
GO

Clearly the current schema enforces that each Member can belong to one and only one Network. If we change the requirement so that a Member can belong to multiple (and again, in the spirit of keeping it simple, unlimited) Networks, we would need to introduce a mapping table that allows a many-to-many relationship to exist. We will also need to make the NetworkID column nullable, since the stored procedures will start using the mapping table for the data instead; once all of the stored procedures have been changed and the existing data has been migrated, we can drop it entirely.

    
CREATE TABLE dbo.NetworkMembers
(
    NetworkID  INT NOT NULL FOREIGN KEY 
               REFERENCES dbo.Networks(NetworkID),
    MemberID   INT NOT NULL FOREIGN KEY
               REFERENCES dbo.Members(MemberID),
    PRIMARY KEY (NetworkID, MemberID)
);
GO
ALTER TABLE dbo.Members
    ALTER COLUMN NetworkID INT NULL
    FOREIGN KEY REFERENCES dbo.Networks(NetworkID);

In order to handle a virtually unlimited number of Networks when creating or updating a Member, you will need to decide how to pass more than one value into the _Create and _Update stored procedures. Personally, I prefer a comma-separated list that is split up using a Split() function (see this blog post for some background on splitting integers, and if you're on SQL Server 2008 or above, don't forget to consider Table-Valued Parameters). How you handle this (and the function itself) is beyond the scope of this tip, but I'm going to assume you have such a function, and that you're not going to add parameters like @Network1, @Network2, @Network3, etc. So given that, we can make the @NetworkID parameter optional, and add another optional parameter called @NetworkList. For the _Create procedure, it is a simple insert into the NetworkMembers table; for the _Update case, since you don't have an easy way to tell which Networks have been added/removed/unchanged, we'll just delete the entire set of Networks for the Member and re-insert the new list. If you are on SQL Server 2008, you can look into using the MERGE statement, which will be more efficient... especially if Members will belong to many Networks.

ALTER PROCEDURE dbo.Member_Create
    @EMail       VARCHAR(320),
    @NetworkID   INT = NULL,
    @NetworkList VARCHAR(MAX) = NULL
    -- , ...
    @MemberID    INT OUTPUT
AS
BEGIN
    INSERT dbo.Members
    (
        EMail
        -- , ...
    )
    SELECT
        @EMail
        -- , ...
    ;
    
    SELECT @MemberID = SCOPE_IDENTITY();
    DECLARE @Networks TABLE (NetworkID INT);
    
    IF @NetworkID IS NOT NULL
    BEGIN
        INSERT @Networks 
            SELECT @NetworkID;
    END
            
    IF @NetworkList IS NOT NULL
    BEGIN
        INSERT @Networks 
            SELECT [Value]
            FROM dbo.SplitIntegers(@NetworkList);
    END
    INSERT dbo.NetworkMembers
    (
        NetworkID,
        MemberID
    )
    SELECT
        [Value],
        @MemberID
    FROM
        @Networks;
END
GO
ALTER PROCEDURE dbo.Member_Update
    @MemberID    INT,
    @EMail       VARCHAR(320),
    @NetworkID   INT = NULL,
    @NetworkList VARCHAR(MAX) = NULL
    -- , ...
AS
BEGIN
    UPDATE dbo.Members
    SET 
        EMail     = @EMail,
        NetworkID = NULL
    WHERE
        MemberID  = @MemberID;
    
    DECLARE @Networks TABLE (NetworkID INT);
    
    IF @NetworkID IS NOT NULL
    BEGIN
        INSERT @Networks 
            SELECT @NetworkID;
    END
            
    IF @NetworkList IS NOT NULL
    BEGIN
        INSERT @Networks 
            SELECT [Value]
            FROM dbo.SplitIntegers(@NetworkList);
    END
    DELETE dbo.NetworkMembers
        WHERE MemberID = @MemberID;
        
    INSERT dbo.NetworkMembers
    (
     NetworkID,
        MemberID
    )
    SELECT
        [Value],
        @MemberID
    FROM 
        @Networks;            
END
GO

In a simplistic case, you will only have one application that calls the stored procedures directly, or all of your applications use a common data layer (and in the latter case, the data layer will need to compensate for the different applications and what parameters to expect). So you won't have to worry about some applications passing / retrieving only one Network, and other applications passing / retrieving multiple Networks.

In a more complex case, you may have multiple applications that call the stored procedures directly, and a situtation that might occur is one application (let's say App A) is updated to understand multiple Networks, and another app (App B) is still using the old one Network mindset. It can get complicated when App A updates a Member to be in multiple Networks:

Which Network will App B see in the _GetDetails procedure?
What will happen when App B updates Member using a single Network (since that's all it knows)?

When App A calls the _GetDetails stored procedure, it should be updated to retrieve a second resultset: the *set* of Networks that a Member belongs to. When App B calls _GetDetails, if the Member has been updated to belong to multiple networks, then we can just pick an arbitrary NetworkID for now (and we'll deal with what happens when the Member then gets updated by App B shortly). With the _Create and _Update procedures, we can tell whether App A or App B is calling the procedure, because either @NetworkID will be populated or @NetworkList will be populated. However for _GetDetails, we don't have a parameter that tells us which App is being called, and we can't assume that we can make a minor change to the old application by adding a parameter. So, in order to make the procedure functionality different depending on whether it is App A or App B making the call, you have two choices: (1) add an optional parameter to signal that the newer functionality should be used, and update App A to pass the parameter, or (2) create separate procedures. I prefer (1) because it is less messy, easier to revert to non-forked behavior once the old functionality has been deprecated, and because I never have a good answer for the question, "What do you call the new copy of Member_GetDetails? Member_GetDetails_2?"

So let's see what Member_GetDetails will look like depending on which application has made the call (we'll use a new, optional parameter called "@MultipleNetworks" to provide the fork, and App A should be updated to pass this parameter in):

ALTER PROCEDURE dbo.Member_GetDetails
    @MemberID         INT,
    @MultipleNetworks BIT = 1
AS
BEGIN
    SELECT
        EMail,
        NetworkID = CASE 
            WHEN @MultipleNetworks = 1 THEN NULL 
            ELSE 
            (
                SELECT TOP 1 ArbitraryNetworkID = NetworkID 
                    FROM dbo.NetworkMembers 
                    WHERE MemberID = @MemberID
            )
        END
        -- , ...
    FROM
        dbo.Members
    WHERE
        MemberID = @MemberID;
        
    IF @MultipleNetworks = 1
    BEGIN
        SELECT 
            NetworkID
        FROM
            dbo.NetworkMembers
        WHERE
            MemberID = @MemberID;
    END
END
GO

Again, this will cause a problem if App B has retrieved an arbitrary NetworkID, and then attempts to save changes (whether or not it has changed the NetworkID). We'll need to add some logic to the _Update procedure so that it knows this is the old app and shouldn't believe everything it says about Networks. You may decide different ways to change the behavior; the simplest way, in my opinion, is to add the Network if it is not already associated, and ignore it if it is. There is the weird possibility that a Member could be in Network Y and Network Z, and the arbitrary Network pulled to App A by _GetDetails is Network Y, and the user "thinks" that by changing the Member to Network Z, that they should no longer be associated with Network Z... but in fact, they will still be associated with both. Keep in mind that no matter what you do, the old apps that still think only one Network is valid will behave strangely depending on what the user has done and what they expect to see, in addition to the behavior that you decide is your preference (do you cater to the new app or the old app?). Here we'll assume we're going to favor the new app and let users discover on their own that the behavior in the old app is no longer exactly as they remembered:

ALTER PROCEDURE dbo.Member_Update
    @MemberID    INT,
    @EMail       VARCHAR(320),
    @NetworkID   INT = NULL,
    @NetworkList VARCHAR(MAX) = NULL
    -- , ...
AS
BEGIN
    UPDATE dbo.Members
    SET 
        EMail     = @EMail
    WHERE
        MemberID  = @MemberID;
        
    IF @NetworkID > 0 AND NOT EXISTS
    (
        SELECT 1
            FROM dbo.NetworkMembers
            WHERE NetworkID = @NetworkID
            AND   MemberID  = @MemberID
    )
    BEGIN
        INSERT dbo.NetworkMembers
        (
            NetworkID,
            MemberID
        )
        SELECT
            @NetworkID,
            @MemberID;
    END
    
    IF @NetworkList IS NOT NULL
    BEGIN
        DELETE dbo.NetworkMembers
            WHERE MemberID = @MemberID;
            
        INSERT dbo.NetworkMembers
        (
            NetworkID,
            MemberID
        )
        SELECT
            [Value],
            @MemberID
        FROM 
            dbo.SplitIntegers(@NetworkList);
    END       
END
GO

At this point, it is as good a time as any to migrate all of the data for existing Members to the new table. At this point, there should be one row for each Member, but soon the applications that you update will be able to associate a Member with multiple Networks.

INSERT dbo.NetworkMembers
(
    NetworkID,
    MemberID
)
SELECT
    NetworkID,
    MemberID
FROM
    dbo.Members;
    
UPDATE dbo.Members
    SET NetworkID = NULL;
    
-- and, when you're sure you are ready:
ALTER TABLE dbo.Members
    DROP COLUMN NetworkID;

That takes care of just about all of the issues involved with this data model change, again, assuming a fairly simple set of applications and business rules around the model. Not all changes can be deployed perfectly at a specific point in time, but with a little forethought, you should be able to introduce changes with a very small amount of interruption. With unlimited time, money and bodies, you can accomplish just about anything... unfortunately, in a lot of cases we are allowed to pick only two of those, and in some cases, just one.

Next Steps

Examine planned database changes and determine if they could be introduced with minimal intrusion.
If possible, split up your planned changes so that database modifications do not depend on application modifications.
Plan to push database changes ahead of application changes, if necessary.
Review the following tips and other resources:

About the author

Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Make your SQL Server database changes backward compatible when changing a relationship

Problem

Solution

Next Steps

About the author

Comments For This Article