Make your SQL Server database changes backward compatible when changing a relationship
By: Aaron Bertrand | Comments | Related: 1 | 2 | 3 | 4 | More > TSQL
Let's face it: requirements change. There is usually a lot of churn during the design and initial development stages, but changes can happen to mature applications, too. The key is to introduce those changes with the least amount of effort and risk. One change I had to deal with recently was changing a 1 : many relationship to a many : many relationship. This meant changing procedures to both accept and return multiple values representing a relationship, when previously they only had to deal with one.
In case you haven't sensed a growing theme here, my approach to changes is to ensure that database changes can be made in advance of the surrounding applications. I do this both as a selfish way to preserve control over the model, and because I know that the applications won't necessarily be updated on the same schedule.
Up front, I'll state my typical disclaimer that I am treating a fairly simple case here... the system does not have any replication, that the tables are not subject to BULK INSERT or other ad hoc techniques, that all data access is enforced through stored procedures, and that a policy to always explicitly name stored procedure parameters is being followed. So in this case, changing the tables to support a different type of relationship can be handled in a fairly controlled manner. Not all scenarios are like this, so I urge you to fully understand all of the ways your applications interface with the database before deciding how to approach such a change.
With that out of the way, let's get more specific. First, what kind of entities are we talking about here? Let's imagine you have a social networking site, and a table that contains information about Members. There is also a table that represents the Networks that a Member could belong to. When the site was first launched, Members could pick a Network (and only one Network). Here is how the core parts of the schema might look (showing only the relevant columns/parameters, and leaving out error handling for brevity):
CREATE TABLE dbo.Networks ( NetworkID INT IDENTITY(1,1) PRIMARY KEY, Name NVARCHAR(64) NOT NULL UNIQUE -- , ... ); GO CREATE TABLE dbo.Members ( MemberID INT IDENTITY(1,1) PRIMARY KEY, EMail VARCHAR(320) NOT NULL UNIQUE, NetworkID INT NOT NULL FOREIGN KEY REFERENCES dbo.Networks(NetworkID) -- , ... ); GO CREATE PROCEDURE dbo.Member_Create @EMail VARCHAR(320), @NetworkID INT -- , ... @MemberID INT OUTPUT AS BEGIN INSERT dbo.Members ( EMail, NetworkID -- , ... ) SELECT @EMail, @NetworkID -- , ... ; SELECT @MemberID = SCOPE_IDENTITY(); END GO CREATE PROCEDURE dbo.Member_Update @MemberID INT, @EMail VARCHAR(320), @NetworkID INT -- , ... AS BEGIN UPDATE dbo.Members SET EMail = @EMail, NetworkID = @NetworkID WHERE MemberID = @MemberID; END GO CREATE PROCEDURE dbo.Member_GetDetails @MemberID INT AS BEGIN SELECT EMail, NetworkID -- , ... FROM dbo.Members WHERE MemberID = @MemberID; END GO
Clearly the current schema enforces that each Member can belong to one and only one Network. If we change the requirement so that a Member can belong to multiple (and again, in the spirit of keeping it simple, unlimited) Networks, we would need to introduce a mapping table that allows a many-to-many relationship to exist. We will also need to make the NetworkID column nullable, since the stored procedures will start using the mapping table for the data instead; once all of the stored procedures have been changed and the existing data has been migrated, we can drop it entirely.
CREATE TABLE dbo.NetworkMembers ( NetworkID INT NOT NULL FOREIGN KEY REFERENCES dbo.Networks(NetworkID), MemberID INT NOT NULL FOREIGN KEY REFERENCES dbo.Members(MemberID), PRIMARY KEY (NetworkID, MemberID) ); GO ALTER TABLE dbo.Members ALTER COLUMN NetworkID INT NULL FOREIGN KEY REFERENCES dbo.Networks(NetworkID);
In order to handle a virtually unlimited number of Networks when creating or updating a Member, you will need to decide how to pass more than one value into the _Create and _Update stored procedures. Personally, I prefer a comma-separated list that is split up using a Split() function (see this blog post for some background on splitting integers, and if you're on SQL Server 2008 or above, don't forget to consider Table-Valued Parameters). How you handle this (and the function itself) is beyond the scope of this tip, but I'm going to assume you have such a function, and that you're not going to add parameters like @Network1, @Network2, @Network3, etc. So given that, we can make the @NetworkID parameter optional, and add another optional parameter called @NetworkList. For the _Create procedure, it is a simple insert into the NetworkMembers table; for the _Update case, since you don't have an easy way to tell which Networks have been added/removed/unchanged, we'll just delete the entire set of Networks for the Member and re-insert the new list. If you are on SQL Server 2008, you can look into using the MERGE statement, which will be more efficient... especially if Members will belong to many Networks.
ALTER PROCEDURE dbo.Member_Create @EMail VARCHAR(320), @NetworkID INT = NULL, @NetworkList VARCHAR(MAX) = NULL -- , ... @MemberID INT OUTPUT AS BEGIN INSERT dbo.Members ( EMail -- , ... ) SELECT @EMail -- , ... ; SELECT @MemberID = SCOPE_IDENTITY(); DECLARE @Networks TABLE (NetworkID INT); IF @NetworkID IS NOT NULL BEGIN INSERT @Networks SELECT @NetworkID; END IF @NetworkList IS NOT NULL BEGIN INSERT @Networks SELECT [Value] FROM dbo.SplitIntegers(@NetworkList); END INSERT dbo.NetworkMembers ( NetworkID, MemberID ) SELECT [Value], @MemberID FROM @Networks; END GO ALTER PROCEDURE dbo.Member_Update @MemberID INT, @EMail VARCHAR(320), @NetworkID INT = NULL, @NetworkList VARCHAR(MAX) = NULL -- , ... AS BEGIN UPDATE dbo.Members SET EMail = @EMail, NetworkID = NULL WHERE MemberID = @MemberID; DECLARE @Networks TABLE (NetworkID INT); IF @NetworkID IS NOT NULL BEGIN INSERT @Networks SELECT @NetworkID; END IF @NetworkList IS NOT NULL BEGIN INSERT @Networks SELECT [Value] FROM dbo.SplitIntegers(@NetworkList); END DELETE dbo.NetworkMembers WHERE MemberID = @MemberID; INSERT dbo.NetworkMembers ( NetworkID, MemberID ) SELECT [Value], @MemberID FROM @Networks; END GO
In a simplistic case, you will only have one application that calls the stored procedures directly, or all of your applications use a common data layer (and in the latter case, the data layer will need to compensate for the different applications and what parameters to expect). So you won't have to worry about some applications passing / retrieving only one Network, and other applications passing / retrieving multiple Networks.
In a more complex case, you may have multiple applications that call the stored procedures directly, and a situtation that might occur is one application (let's say App A) is updated to understand multiple Networks, and another app (App B) is still using the old one Network mindset. It can get complicated when App A updates a Member to be in multiple Networks:
- Which Network will App B see in the _GetDetails procedure?
- What will happen when App B updates Member using a single Network (since that's all it knows)?
When App A calls the _GetDetails stored procedure, it should be updated to retrieve a second resultset: the *set* of Networks that a Member belongs to. When App B calls _GetDetails, if the Member has been updated to belong to multiple networks, then we can just pick an arbitrary NetworkID for now (and we'll deal with what happens when the Member then gets updated by App B shortly). With the _Create and _Update procedures, we can tell whether App A or App B is calling the procedure, because either @NetworkID will be populated or @NetworkList will be populated. However for _GetDetails, we don't have a parameter that tells us which App is being called, and we can't assume that we can make a minor change to the old application by adding a parameter. So, in order to make the procedure functionality different depending on whether it is App A or App B making the call, you have two choices: (1) add an optional parameter to signal that the newer functionality should be used, and update App A to pass the parameter, or (2) create separate procedures. I prefer (1) because it is less messy, easier to revert to non-forked behavior once the old functionality has been deprecated, and because I never have a good answer for the question, "What do you call the new copy of Member_GetDetails? Member_GetDetails_2?"
So let's see what Member_GetDetails will look like depending on which application has made the call (we'll use a new, optional parameter called "@MultipleNetworks" to provide the fork, and App A should be updated to pass this parameter in):
ALTER PROCEDURE dbo.Member_GetDetails @MemberID INT, @MultipleNetworks BIT = 1 AS BEGIN SELECT EMail, NetworkID = CASE WHEN @MultipleNetworks = 1 THEN NULL ELSE ( SELECT TOP 1 ArbitraryNetworkID = NetworkID FROM dbo.NetworkMembers WHERE MemberID = @MemberID ) END -- , ... FROM dbo.Members WHERE MemberID = @MemberID; IF @MultipleNetworks = 1 BEGIN SELECT NetworkID FROM dbo.NetworkMembers WHERE MemberID = @MemberID; END END GO
Again, this will cause a problem if App B has retrieved an arbitrary NetworkID, and then attempts to save changes (whether or not it has changed the NetworkID). We'll need to add some logic to the _Update procedure so that it knows this is the old app and shouldn't believe everything it says about Networks. You may decide different ways to change the behavior; the simplest way, in my opinion, is to add the Network if it is not already associated, and ignore it if it is. There is the weird possibility that a Member could be in Network Y and Network Z, and the arbitrary Network pulled to App A by _GetDetails is Network Y, and the user "thinks" that by changing the Member to Network Z, that they should no longer be associated with Network Z... but in fact, they will still be associated with both. Keep in mind that no matter what you do, the old apps that still think only one Network is valid will behave strangely depending on what the user has done and what they expect to see, in addition to the behavior that you decide is your preference (do you cater to the new app or the old app?). Here we'll assume we're going to favor the new app and let users discover on their own that the behavior in the old app is no longer exactly as they remembered:
ALTER PROCEDURE dbo.Member_Update @MemberID INT, @EMail VARCHAR(320), @NetworkID INT = NULL, @NetworkList VARCHAR(MAX) = NULL -- , ... AS BEGIN UPDATE dbo.Members SET EMail = @EMail WHERE MemberID = @MemberID; IF @NetworkID > 0 AND NOT EXISTS ( SELECT 1 FROM dbo.NetworkMembers WHERE NetworkID = @NetworkID AND MemberID = @MemberID ) BEGIN INSERT dbo.NetworkMembers ( NetworkID, MemberID ) SELECT @NetworkID, @MemberID; END IF @NetworkList IS NOT NULL BEGIN DELETE dbo.NetworkMembers WHERE MemberID = @MemberID; INSERT dbo.NetworkMembers ( NetworkID, MemberID ) SELECT [Value], @MemberID FROM dbo.SplitIntegers(@NetworkList); END END GO
At this point, it is as good a time as any to migrate all of the data for existing Members to the new table. At this point, there should be one row for each Member, but soon the applications that you update will be able to associate a Member with multiple Networks.
INSERT dbo.NetworkMembers ( NetworkID, MemberID ) SELECT NetworkID, MemberID FROM dbo.Members; UPDATE dbo.Members SET NetworkID = NULL; -- and, when you're sure you are ready: ALTER TABLE dbo.Members DROP COLUMN NetworkID;
That takes care of just about all of the issues involved with this data model change, again, assuming a fairly simple set of applications and business rules around the model. Not all changes can be deployed perfectly at a specific point in time, but with a little forethought, you should be able to introduce changes with a very small amount of interruption. With unlimited time, money and bodies, you can accomplish just about anything... unfortunately, in a lot of cases we are allowed to pick only two of those, and in some cases, just one.
- Examine planned database changes and determine if they could be introduced with minimal intrusion.
- If possible, split up your planned changes so that database modifications do not depend on application modifications.
- Plan to push database changes ahead of application changes, if necessary.
- Review the following tips and other resources:
- Making your database changes backward compatible: Adding a new column
- Making your database changes backward compatible: Dropping a column
- Making your database changes backward compatible: Renaming an entity
- Inserting, Updating, and Deleting Data by Using MERGE
- Table-Valued Parameters
- Erland Sommarskog's Arrays and Lists in SQL Server
About the author
View all my tips