SQL UPDATE from SELECT, JOIN or MERGE

By:   |   Updated: 2021-08-05   |   Comments   |   Related: More > TSQL


Problem

A SQL UPDATE query is used to alter, add, or remove data within some or all tuples in existing rows of a table. In the typical format, we usually update one tuple at a time in a table. We can also update multiple tuples at a time when we make use of the WHERE clause. Alternatively, we can update all the tuples in a column with a simple UPDATE statement. However, in the later scenario, all the rows will have the same value in that column. Now, what if you want to update all the tuples in one or more columns of a table with unique corresponding values in another table. Our simple UPDATE TABLE NAME SET COLUMN NAMES statement will not work in this scenario. Okay, actually it will, but it would be a long tedious script when you are working with a few hundred or a few thousand rows of data.

Solution

In this tutorial, we will explore three options that will update all the tuples in one or more columns with unique values that are stored in a separate table. In the first option, we will use the JOIN as a form of a SELECT statement and in the second option we will use the MERGE statement and finally, we will use a nested SELECT statement. Remember, you will need to make sure that the columns you are updating are of the same or compatible datatypes as the source columns. Example, you cannot put a VARCHAR value in an INT field.

Use a Test System

Always use a test server and a test database for testing and learning.  In my sample, I am using a test database named "AdventureWorks2014".

Getting a Test Database

For simplicity's sake and for keeping with a standard test database, we will be working with the AdventureWorks2014 database. If you already have this sample database installed, don’t worry, we will not be changing any of the tables or data. We will, however, create a couple of new tables along with a new schema to work with. Afterwards, we will simply dump both tables as well as the schema. (If you so choose.)

If you do not have the AdventureWorks2014 database installed already, you can get a backup (BAK) version for free. Once it’s downloaded, simply follow the basic steps to restore from a ".BAK" file.

Building our Test Tables

In the following query, we will create a schema named "Test" and two tables, "Test.Person" and "Test.PersonAddress".  We will also create a primary – foreign key constraint to link the two tables. Finally, we will populate the tables with data from the existing Person.Person and Person.Address tables provided in the Adventurworks2014 database. This would allow us to work with and manipulate the data without disturbing any tables that pre-exist in the database.  Here is the syntax we can run in SQL Server Management Studio (SSMS):

USE AdventureWorks2014;
GO
 
CREATE SCHEMA Test AUTHORIZATION dbo;
GO
 
CREATE TABLE Test.Person(
     BusinessEntityID INT NOT NULL
   , FirstName NVARCHAR(50)
   , LastName NVARCHAR(50)
   , City NVARCHAR(30)
   , PostalCode NVARCHAR(15)
   );
GO

ALTER TABLE Test.Person ADD CONSTRAINT PK_Person_BusinessEntityID PRIMARY KEY CLUSTERED(BusinessEntityID);
GO
 
INSERT INTO Test.Person
SELECT
     p.BusinessEntityID
   , p.FirstName
   , p.LastName
   , a.AddressID AS City
   , a.AddressID AS PostalCode
FROM Person.Person AS p
INNER JOIN Person.Address AS a ON p.BusinessEntityID = a.AddressID;
GO
 
CREATE TABLE Test.PersonAddress(
     BusinessEntityID INT
   , AddressID INT
   , City NVARCHAR(30)
   , PostalCode NVARCHAR(15)
   );
GO

ALTER TABLE Test.PersonAddress ADD CONSTRAINT FK_PersonAddress_AddressID FOREIGN KEY(AddressID)
REFERENCES Test.Person(BusinessEntityID);
GO
 
INSERT INTO Test.PersonAddress
SELECT 
     p.BusinessEntityID
   , a.AddressID
   , a.City
   , a.PostalCode
FROM Person.Person AS p
INNER JOIN Person.Address AS a ON p.BusinessEntityID = a.AddressID
ORDER BY BusinessEntityID;
GO

Results:

query results

Defining the issue

In our test environment, we want to update all the rows in the "City" and "PostalCode" columns in the "Test.Person" table with the correct and corresponding data from the "Test.PersonAddress" table. Since each tuple in each row is unique, we could use the standard form "update – set" method, but that would take too long since it will have to be done one row at a time. As luck would have it, we can accomplish this task with either the "JOIN" (technically known as the "INNER JOIN"), the "MERGE" or the "SELECT" commands as shown in the examples below. 

Update using the INNER JOIN Syntax

Using this method, we will update the values in the "City" and "PostalCode" columns of the customers table with the data held in the "City" and "PostalCode" columns of the "Test.PersonAddress" table. We will use the "BusinessEntityID" and "AddressID" columns as reference link between the two tables to keep our data organized.

USE AdventureWorks2014;
GO
 
UPDATE Test.Person
SET Test.Person.city = Addr.city, Test.Person.PostalCode = Addr.PostalCode
FROM Test.Person AS pers
INNER JOIN Test.PersonAddress AS addr ON pers.BusinessEntityID = addr.AddressID;
GO

Explaining how the INNER JOIN option works

The INNER JOIN clause links two or more tables by a relationship between two columns, one on each table. Both columns will have related (corresponding) data and compatible datatypes and often those columns will have a primary / foreign key relationship, although the keys are not necessary. In our sample tables, the common link is the "BusinessEntityID" column and the "AddressID" column.

With the SET keyword, we specified which columns in the target table we want updated and set them to equal the values found in the source table, matching the columns by aliases. Finally, we linked the source and target tables via an INNER JOIN by using the matching BusinessEntityID and AddressID columns respectively from each table.

We could take it a step further by adding a WHERE clause to filter out any columns from the referenced or updated tables. This will allow for updating only certain rows while leaving the other rows untouched. For example, if you only wanted to update the tuples in the Test.Person table that have a certain PostalCode or only in certain cities this would be specified in the WHERE condition.

Note: Instead of doing a simple SELECT statement to view the results, I’m adding an OFFSET / FETCH option to return a different set of data. The reason being that the first ten rows in the AdventureWorks2014 database will have the same city and postal code even though they have different "AddressID’s". So, I am returning rows 21 through 26 for the sake of clarity in the result set.

fyi iconGenerally, the OFFSET-FETCH can only be applied to an ordered set. A simple SELECT statement will not return an ordered set and consequently, we will need to add an ORDER BY clause. In the sample script below, I am applying the ORDER BY clause to the "BusinessEntityID" and "AddressID" columns (respectively) on each table.

Below is the SELECT query that I used to return this set of data. Feel free to alter the code to suit your needs.

USE AdventureWorks2014;
GO
 
SELECT *
FROM Test.Person
ORDER BY BusinessEntityID
OFFSET 20 ROWS
FETCH NEXT 6 ROWS ONLY;
GO
 
SELECT *
FROM Test.PersonAddress
ORDER BY AddressID
OFFSET 20 ROWS
FETCH NEXT 6 ROWS ONLY;
GO

Results:

query results

Update using the MERGE statement

The MERGE statement is used to manipulate (INSERT, UPDATE, DELETE) a target table by referencing a source table for the matched and unmatched rows. The MERGE statement can be very useful for synchronizing the target table with data from any source table that has the correct corresponding data and datatypes.

USE AdventureWorks2014;
GO
 
MERGE Test.Person AS Per 
USING (SELECT * FROM Test.PersonAddress) AS Addr   
ON Addr.AddressID=Per.BusinessEntityID 
WHEN MATCHED THEN
UPDATE SET  
  Per.PostalCode =Addr.PostalCode ,
  Per.city = Addr.city;
GO 

Explaining how the MERGE option works

We listed the Test.Person table just after the MERGE statement making it the target table. The USING statement declares our source table where we want to pull the new data from. Notice that we chose to select all (SELECT *), but we could have just as easily specified which columns to select by using (SELECT PostalCode, city) instead of the select all. In a production environment, the later would be the preferred method. For this test sample, we went with the simpler method of simply selecting all. Remember, just because you select all from a table, doesn’t mean you have to use all of the columns in your results. The ON statement sets the common denominator of the two tables using the BusinessEntityID and AddressID columns in each table respectively. Finally, when those match up, we update the target table with the corresponding values pulled from the source table by using the aliases assigned earlier in the script.

As in the previous example, "Update using the INNER JOIN method", we are going to use the "OFFSET-FETCH" clause to return only rows 21 through 26 for readability's sake.

USE AdventureWorks2014;
GO
 
SELECT *
FROM Test.Person
ORDER BY BusinessEntityID
OFFSET 20 ROWS
FETCH NEXT 6 ROWS ONLY;
GO
 
SELECT *
FROM Test.PersonAddress
ORDER BY AddressID
OFFSET 20 ROWS
FETCH NEXT 6 ROWS ONLY;
GO

Results:

query results

Update using the SELECT Subquery Statement

A subquery is a query that is nested inside another query. This allows for multiple SELECT statements within one script to run in the order they are listed. It also allows for fine tuning exactly what you want the select statements to do and how they should behave.

USE AdventureWorks2014;
GO
 
UPDATE Test.Person
SET Person.PostalCode =(SELECT PersonAddress.PostalCode
                         FROM Test.PersonAddress                           
                         WHERE PersonAddress.AddressID = Person.BusinessEntityID);
GO

UPDATE Test.Person
SET Person.city =(SELECT PersonAddress.city
                   FROM Test.PersonAddress                     
                   WHERE PersonAddress.AddressID = Person.BusinessEntityID);
GO

Explaining how the SELECT subquery statement works

This one is a relatively easy script to follow. One thing you will notice that is different in this sample as opposed to the previous two, is this one is actually two different scripts that are independent of each other. Each one updating the values of a single column. Unfortunately, SQL does not allow more than one SET condition per query. The UPDATE statement lists the target table, the SET statement specifies the column on that target table and the SELECT statement lists the source table and column. In this scenario, we use a WHERE clause to link the two tables via a common column.

Again, we are going to use the "OFFSET-FETCH" clause to return only rows 21 through 26 for readability's sake.

USE AdventureWorks2014;
GO
 
SELECT *
FROM Test.Person
ORDER BY BusinessEntityID
OFFSET 20 ROWS
FETCH NEXT 6 ROWS ONLY;
GO
 
SELECT *
FROM Test.PersonAddress
ORDER BY AddressID
OFFSET 20 ROWS
FETCH NEXT 6 ROWS ONLY;
GO

Results:

query results

Performance Results Compared

Option 1 – INNER JOIN

A look at the over-all execution plan for the JOIN option. This returns a simple, almost straight forward flow with the exception of the offset "Table Scan" with a cost of 5%.

query plan for inner join

Option 2 – MERGE

query plan for merge

Option 3 – SELECT (Sub-Query)

Notice that this returns in two parts. Query 1 displays the query plan for the first SELECT statement, and Query 2 for the second SELECT statement.

query plan for subquery

A quick look at the performance cost just on the "Clustered Index Scan" for each of these methods after updating the City and PostalCode columns. We can see that the cost is 6% for the JOIN option, 9% for the MERGE option and 2% for each step in the SELECT option. The combined cost of the two steps within the SELECT option, totaling 4%, is the lowest clustered index scan of the three options while the MERGE options 9% being the highest resource hog in that respect.

Cleanup (Optional)

Now for the cleanup process if you choose to do so. You are certainly welcome to keep these new tables and the corresponding schema for future testing. Should you decide to remove them, the following block of code will remove the tables and schema from your AdventureWorks2014 database.

USE AdventureWorks2014;
GO

DROP TABLE Test.PersonAddress; 
DROP TABLE Test.Person; 
DROP SCHEMA Test; 

fyi iconThe two "DROP TABLE" commands must be run in the order listed in the code block above. This is due to the Primary-Foreign key constraint between the two tables. Also, you cannot drop the schema while there are one or more objects using/referencing that schema. In this scenario, the two tables are Test.Person and Test.PersonAddress.

Summary

While any of these three options listed above are excellent ways of updating multiple rows at a time with distinct data, you the user, will find yourself leaning more toward one than another. So, go with what is comfortable for you. However, you also may want to consider the performance cost of the option you choose from this SQL tutorial. On this simple example, the performance cost is minimal regardless of which option you select. On a larger database, it could be a resource hog.

Generally, like with many aspects of SQL, it’s a best practice to pick one method and stick with it across the board - not just for yourself, but for all co-workers as well. The company as a whole, should implement the same usage/style practices to keep some degree of order on how things are accomplished.

Next Steps





get scripts

next tip button



About the author
MSSQLTips author Aubrey Love Aubrey Love has been a Database Administrator for about 8 years and is currently working as a Microsoft SQL Server Business Intelligence specialist.

View all my tips


Article Last Updated: 2021-08-05

Comments For This Article

















get free sql tips
agree to terms