By: John Miner | Comments (3) | Related: > JOIN Tables
Problem
The famous computer scientist, E.F. Codd, published a paper in 1970 on the relational model of data. In this model, a tuple is an ordered set of data. A relationship can be defined between the tuples of data. For instance, a person lives at a certain address. The person and address tuples are related by a key. To query data between the two tuples, Codd proposed a relational algebra. The operations in this algebra are the basis for all table join types defined by modern databases.
Microsoft has finally updated their online documentation to cover both logical JOINS as well as how the database engine retrieves the data pages using physical JOINS. The types of joins supported in SQL Server are: INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN and CROSS JOIN. Please see the online documentation for more details.
In this brief tutorial, we will focus examples and discussion on the LEFT OUTER JOIN clause. This JOIN syntax can be used in the SELECT statement as well as the UPDATE and DELETE statements.
Solution
Today, I will show you how to create a set of tables in TEMPDB that can be used to test sample queries. I would not try this technique on a production server since you probably do not have rights. Also, the database administrators will be displeased if they find permanent tables in TEMPDB. On the other hand, if you are using the Development version of SQL Server on your laptop and do not want to create a new database for testing queries, this action is fine. Just remember, a reboot of your laptop recreates TEMPDB from scratch. Thus, the tables will be lost.
One appealing fact about a simple dataset is the ability to see the results of a query given a limited number of rows. I have used similar tables in the past when posting answers to questions on Stack Overflow. The complete set of Transact SQL examples for LEFT OUTER JOIN conditions is enclosed at the end of the article.
People Dataset
We will be modeling three tables in a sample database for the Golden Age Pets company. The purpose of this company is to pair up people in retirement homes with rescued CATS and DOGS. The code below can be re-executed anytime since the table is deleted and recreated. It is important to note that [TEMPDB].[DBO] prefixes the [PEOPLE] table. If you have the rights, you will be creating tables in TEMPDB.
Each table will be using effective dating. The start date denotes the birthday of a person. The end date denotes when a person dies at the retirement home. Given those dates, we know how long the person was around. Since we have the start (birth) date of a given person, it would be nice to automatically calculate a person’s age. This can be done with a computed column. I chose not to persist the data in the table. The PEOPLE_ID is the primary key constraint on this table.
-- Drop existing table DROP TABLE IF EXISTS [Tempdb].[dbo].[PEOPLE] GO -- Create new table CREATE TABLE [Tempdb].[dbo].[PEOPLE] ( [PEOPLE_ID] [int] NOT NULL, [PEOPLE_FIRST_NAME] [varchar](64) NOT NULL, [PEOPLE_LAST_NAME] [varchar](64) NOT NULL, [PEOPLE_ADDRESS_LINE] [varchar](128) NOT NULL, [PEOPLE_CITY] [varchar](64) NOT NULL, [PEOPLE_STATE] [varchar](32) NOT NULL, [PEOPLE_POSTAL_CODE] [varchar](16) NOT NULL, [PEOPLE_COUNTRY] [varchar](64) NOT NULL, [START_DATE] [date] NULL, [END_DATE] [date] NULL, [PERSONS_AGE] AS CASE WHEN START_DATE IS NULL THEN NULL ELSE DATEDIFF(YY, START_DATE, GETDATE()) END ) GO -- Add new primary key ALTER TABLE [Tempdb].[dbo].[PEOPLE] ADD CONSTRAINT [PK_TBL_PEOPLE] PRIMARY KEY CLUSTERED ([PEOPLE_ID] ASC) GO
Now that we have a table, we need to fill it with data. The Transact SQL statement below uses a table value constructor to pass multiple tuples of data to the INSERT statement.
-- Insert sample data INSERT [Tempdb].[dbo].[PEOPLE] ( [PEOPLE_ID], [PEOPLE_FIRST_NAME], [PEOPLE_LAST_NAME], [PEOPLE_ADDRESS_LINE], [PEOPLE_CITY], [PEOPLE_STATE], [PEOPLE_POSTAL_CODE], [PEOPLE_COUNTRY], [START_DATE], [END_DATE] ) VALUES (1, N'Gail', N'Erickson', N'9435 Breck Court', N'Bellevue', N'Washington', N'98004', N'United States','1943-12-10', NULL), (2, N'Diane', N'Margheim', N'475 Santa Maria', N'Everett', N'Washington', N'98201', N'United States','1949-09-10', NULL), (3, N'Dylan', N'Miller', N'7048 Laurel', N'Kenmore', N'Washington', N'98028', N'United States','1945-02-18', NULL), (4, N'Rob', N'Walters', N'5678 Lakeview Blvd.', N'Minneapolis', N'Minnesota', N'55402', N'United States','1946-10-01', NULL), (5, N'Ken', N'Sánchez', N'4350 Minute Dr.', N'Newport Hills', N'Washington', N'98006', N'United States','1936-04-01', '2020-09-15'), (6, N'Roberto', N'Tamburello', N'2137 Birchwood Dr', N'Redmond', N'Washington', N'98052', N'United States','1950-07-30', NULL), (7, N'Terri', N'Duffy', N'7559 Worth Ct.', N'Renton', N'Washington', N'98055', N'United States','1947-06-08', NULL); GO -- Show the data SELECT * FROM [Tempdb].[dbo].[PEOPLE]; GO
The very last statement in the code snippet above does a SELECT all the records in the PEOPLE table. This image below shows the output from the SQL Server Management Studio (SSMS). Seven people have been added to the table.
The next step is to create a PETS table.
Pets Dataset
The [PETS] table keeps track of the key information for each rescued pet. The T-SQL code in the following statements will re-create the given table every time from scratch. The PET_ID is the primary key constraint on this table.
-- Drop existing table DROP TABLE IF EXISTS [Tempdb].[dbo].[PETS] GO -- Create new table CREATE TABLE [Tempdb].[dbo].[PETS] ( [PET_ID] [int] NOT NULL, [PET_TYPE] [varchar](16) NOT NULL, [PET_NAME] [varchar](16) NOT NULL, [PET_BREED] [varchar](16) NOT NULL, [PET_GENDER] [varchar](16) NOT NULL, [PET_AGE] [numeric](2, 1) NOT NULL, [START_DATE] [date] NULL, [END_DATE] [date] NULL ) -- Add new primary key ALTER TABLE [Tempdb].[dbo].[PETS] ADD CONSTRAINT [PK_TBL_PETS] PRIMARY KEY CLUSTERED ([PET_ID] ASC) GO
Again, we will use the INSERT statement with a table value constructor to load the table with six pets.
-- Insert sample data INSERT [Tempdb].[dbo].[PETS] ( [PET_ID], [PET_TYPE], [PET_NAME], [PET_BREED], [PET_GENDER], [PET_AGE], [START_DATE], [END_DATE] ) VALUES (1, N'CAT', N'SAMMY', N'ABYSSINIAN', N'MALE', 2.5,'2020-01-01', NULL), (2, N'DOG', N'HENRY', N'AMERICAN BULLDOG', N'MALE', 1.3, '2020-02-01', NULL), (3, N'CAT', N'LUCY', N'BOMBAY', N'FEMALE', 4.0, '2020-03-01', NULL), (4, N'DOG', N'CLEO', N'AMERICAN PITBULL', N'FEMALE', 6.9, '2020-04-01', NULL), (5, N'CAT', N'BELLA', N'PERSIAN', N'FEMALE', 8.7, '2020-05-01', '2020-09-05'), (6, N'DOG', N'ZAC', N'BASSET HOUND', N'MALE', 4.0, '2020-06-01', NULL); GO -- Show the data SELECT * FROM [Tempdb].[dbo].[PETS]; GO
The image below shows the data in the PETS table. If you have been paying attention, some records have end dates. This means a pet is deceased. This condition also occurs in the PEOPLE table.
Own Dataset
To keep track of the owners of each pet, one must create a OWN table that bridges both the PEOPLE and PETS table. In real life, people own pets! An associative table is used to bridge two tables together. This allows for a N to M relationship. For instance, one person could own multiple pets. Therefore, the PEOPLE_ID will show up multiple times in the table. On the other hand, a pet could be owned by multiple people. I am not talking about time sharing the pet. But we are talking about people whose age is over 70 years old. Therefore, a person might die, and another resident might adopt the pet. In this scenario, the PET_ID will show up multiple times in the table.
To implement the table, we will drop and create the table each time the script is executed. Unlike other tables, we will not have a primary key. Instead, we will have two foreign keys. See the script below for details.
-- Drop existing table DROP TABLE IF EXISTS [Tempdb].[dbo].[OWN] GO -- Create new table CREATE TABLE [Tempdb].[dbo].[OWN] ( [PEOPLE_ID] [int] NOT NULL, [PET_ID] [int] NOT NULL, [START_DATE] [date] NULL, [END_DATE] [date] NULL ) ON [PRIMARY] GO -- Add foreign key 4 people ALTER TABLE [Tempdb].[dbo].[OWN] WITH CHECK ADD CONSTRAINT [FK_TBL_OWN_PEOPLE] FOREIGN KEY([PEOPLE_ID]) REFERENCES [Tempdb].[dbo].[PEOPLE] ([PEOPLE_ID]) GO -- Add foreign key 4 pets ALTER TABLE [Tempdb].[dbo].[OWN] WITH CHECK ADD CONSTRAINT [FK_TBL_OWN_PETS] FOREIGN KEY([PET_ID]) REFERENCES [Tempdb].[dbo].[PETS] ([PET_ID]) GO
Finally, we have to match up 6 of the people with 6 of the pets. This can be done by inserting data into the associative table.
-- Insert sample data INSERT INTO [Tempdb].[dbo].[OWN] ( [PEOPLE_ID], [PET_ID], [START_DATE], [END_DATE] ) VALUES (1, 1, '2020-01-08', NULL), (2, 2, '2020-02-09', NULL), (3, 3, '2020-03-03', NULL), (4, 4, '2020-04-11', NULL), (5, 5, '2020-05-15', '2020-09-05'), (6, 6, '2020-06-03', NULL) GO -- Show the data SELECT * FROM [Tempdb].[dbo].[OWN]; GO
The image below shows the records that bridge the PEOPLE and PETS tables.
To recap the design of the Golden Age Pets company, I decided to use a table to describe the tuples and relationships. In this table, PK stands for primary key and FK stands for foreign key. For more complex data models, I suggest you use an entity relationship model tool.
Table Name | Description | Constraints |
---|---|---|
PEOPLE | Retired people that want pets. | (PEOPLE_ID, PK) |
PETS | Pets to be rescued by people. | (PET_ID, PK) |
OWN | Associate people with pets | (PEOPLE_ID, FK), (PET_ID, FK) |
Normal Forms
I was first introduced to Database Design and Normalization when I was a Junior at the University of Rhode Island (URI). My professor at the time was Joan Peckham. We talked about the Codd’s 3 normal forms in that URI class many years ago. Again, Edgar F. Codd proposed these normal forms within his 1970 paper on the relational model. Use of these forms reduce data redundancy and promote data integrity. Today, I see professionals at various companies create bad database designs. Therefore, a review of these concepts is appropriate before we talk about table JOINS. If the table design is bad, then queries get more complex and convoluted.
In the first normal form, each column (attribute) of the table (tuple) must be atomic. The PEOPLE tuple below shows a person record violating first normal form. The first and last names are one column. Also, the address is one big string that is comprised of street, city, state and zip code. We need to break up this data into separate columns. What happens if we want to update just address and city? We have to read up, parse the string, make the changes and store the data. This is definitely a bad design.
PEOPLE (0NF)
Name | Address |
---|---|
Gail Erickson | 9435 Breck Court, Bellevue, WA, 98004 |
A better table design that follows first normal form (1NF) is shown below.
PEOPLE (1NF)
People Id | First Name | Last Name | Address | City | State | Zip Code |
---|---|---|---|---|---|---|
1 | Gail | Erickson | 9435 Breck Court | Bellevue | WA | 98004 |
Second normal form is dependent upon the fact that the table is already in first normal form. A sample PET tuple below observes the atomic rule. In second normal form (2NF), every non candidate attribute is dependent upon the whole of the candidate key (primary key). In our case, our case a surrogate key named id is being automatically generated. Almost all columns are associated with the candidate key. The adoption fee is functional depend upon the animal type. Thus, second normal form is violated in this example.
PETS (1NF)
Pet Id | Type | Name | Breed | Gender | Adoption Fee |
---|---|---|---|---|---|
1 | CAT | SAMMY | ABYSSINIAN | MALE | $50.00 |
3 | DOG | CLEO | AMERICAN PITBULL | FEMALE | $100.00 |
To fix this design, we can move the Adoption fee to another table. The example below shows the PET tuples without any adoption fee information. The adoption data has been moved to is own table to support second normal form. The type column can be called a natural key when working with PETS.
PETS (2NF)
Pet Id | Type | Name | Breed | Gender |
---|---|---|---|---|
1 | CAT | SAMMY | ABYSSINIAN | MALE |
3 | DOG | CLEO | AMERICAN PITBULL | FEMALE |
ADOPTION FEE (2NF)
Id | Type | Adoption Fee |
---|---|---|
1 | CAT | $50.00 |
2 | DOG | $100.00 |
Third normal form is dependent upon the fact that the table is already in second normal form. In third normal form (3NF), all non-prime attributes depend only on the candidate keys and do not have a transitive dependency on another key. The example below shows the PEOPLE table who own PETS. However, we have to duplicate data since we include pet id. This column has a transitive dependency on the PETS table. Therefore, it violates third normal form. To fix this problem, we must use an associative table to allow for N to M relationships.
PEOPLE (2NF)
People Id | Pet Id | First Name | Last Name | Address | City | State | Zip Code |
---|---|---|---|---|---|---|---|
1 | 1 | Gail | Erickson | 9435 Breck Court | Bellevue | WA | 98004 |
2 | 3 | Gail | Erickson | 9435 Breck Court | Bellevue | WA | 98004 |
The final diagrams below show all tuples (tables) in third normal form.
PEOPLE (3NF)
People Id | First Name | Last Name | Address | City | State | Zip Code |
---|---|---|---|---|---|---|
1 | Gail | Erickson | 9435 Breck Court | Bellevue | WA | 98004 |
PETS (3NF)
Pet Id | Type | Name | Breed | Gender |
---|---|---|---|---|
1 | CAT | SAMMY | ABYSSINIAN | MALE |
3 | DOG | CLEO | AMERICAN PITBULL | FEMALE |
OWN (3NF)
People Id | Pet Id | Start Date | End Date |
---|---|---|---|
1 | 1 | 2020-01-08 | NULL |
1 | 3 | 2020-03-03 | NULL |
ADOPTION FEE (3NF)
Id | Type | Adoption Fee |
---|---|---|
1 | CAT | $50.00 |
2 | DOG | $100.00 |
Simple SQL LEFT JOIN
A LEFT OUTER JOIN between tables A and B can be described as always taking all records from table A and returning any matching rows from table B. When table B has no records, NULL values are returned as a portion of the result set. The use of the key word OUTER is optional when describing the join. Let us now use this join to solve some basic questions about the Golden Age Pets company. How many people want a pet? There is a rule at the Golden Age Pets company; a retiree can only have one pet.
-- How many people want pets? SELECT P.PEOPLE_ID, P.PEOPLE_FIRST_NAME, P.PEOPLE_LAST_NAME, P.PERSONS_AGE FROM [Tempdb].[dbo].[PEOPLE] AS P LEFT JOIN [Tempdb].[dbo].[OWN] AS O ON P.PEOPLE_ID = O.PEOPLE_ID WHERE O.PEOPLE_ID IS NULL GO
The above T-SQL statement LEFT joins the PEOPLE left table to the OWN right table on the PEOPLE_ID. Any records on the right-hand side of the join with NULL ids are without a pet. Currently, Terri Duffy is the only person waiting for a pet. See the image below for output from the T-SQL statement.
Another question that a company representative might want to know is "How many people currently have a pet?". This query is just the opposite of the one above. We want to find all PEOPLE that have a matching OWN record. Therefore, we want to look for the IS NOT NULL condition.
-- How many people have pets? SELECT P.PEOPLE_ID, P.PEOPLE_FIRST_NAME, P.PEOPLE_LAST_NAME, P.PERSONS_AGE FROM [Tempdb].[dbo].[PEOPLE] AS P LEFT JOIN [Tempdb].[dbo].[OWN] AS O ON P.PEOPLE_ID = O.PEOPLE_ID WHERE O.PEOPLE_ID IS NOT NULL GO
The image below shows that 6 people have pets. But this answer is time dependent. Unless you worked at the retirement home, you might not have know that Bella, the Persian female cat died in early September of 2020. The owner, Mr. Sanchez was so heart broken that he died shortly after. The screen shot below shows people who owned peoples in 2020.
How can we get an accurate count of people who owned pets by a time given period? We need to use the effective dating columns that are part of the tables to narrow down our selection by time.
More Complex LEFT JOIN Examples
In this section, we will explore hypothetical queries that a company representative of the Golden Age Pets company might ask the IT department. His first question in June is "Who owned pets between January and May of 2020?".
This a philosophical debate on where to place additional JOIN logic that is not directly related to the relationship between the two tables. It can be placed in either the ON clause or WHERE clause. We want to use a date range filter on both the start and end date. A NULL end date means the record is currently active while a valid end date means the PERSON not longer OWNS the PET.
-- Pet owners between 01 Jan & 31 May 2020 SELECT P.PEOPLE_ID, P.PEOPLE_FIRST_NAME, P.PEOPLE_LAST_NAME, P.PERSONS_AGE, O.* FROM [Tempdb].[dbo].[PEOPLE] AS P LEFT JOIN [Tempdb].[dbo].[OWN] AS O ON P.PEOPLE_ID = O.PEOPLE_ID AND (O.[START_DATE] >= '2020-01-01') AND O.[START_DATE] < '2020-06-01') WHERE O.PEOPLE_ID IS NOT NULL GO
We can see that five people owned pets during this 5 month time period.
The same company representative visits the retirement home on September 15, 2020 and asks the same question about the number of pet owners. We just need to change the date range to find people who have owned pets year-to-date. The T-SQL statement filters data between the start of the year and today. The GETDATE() function is key to finding these dates.
-- Pet owners year to date SELECT P.PEOPLE_ID, P.PEOPLE_FIRST_NAME, P.PEOPLE_LAST_NAME, P.PERSONS_AGE FROM [Tempdb].[dbo].[PEOPLE] AS P LEFT JOIN [Tempdb].[dbo].[OWN] AS O ON P.PEOPLE_ID = O.PEOPLE_ID AND (O.[START_DATE] >= DATEADD(yy, DATEDIFF(yy, 0, GETDATE()), 0)) AND (O.[END_DATE] IS NULL OR O.[END_DATE] < GETDATE()) WHERE O.PEOPLE_ID IS NOT NULL GO
The image lists the owners of pets using a year-to-date logic. Mr. Sanchez only had Bella for a very short time. The company representative heard about the deaths and wants a report on any deceased PEOPLE or PETS.
The query below UNIONS the deceased records between the PEOPLE and PETS table.
-- Any deceased pets or owners SELECT P.PEOPLE_ID AS ID, P.PEOPLE_FIRST_NAME + ' ' + P.PEOPLE_LAST_NAME AS NAMES, P.PERSONS_AGE AS AGE FROM [Tempdb].[dbo].[PEOPLE] AS P WHERE YEAR(P.END_DATE) = 2020 UNION SELECT P.PET_ID AS ID, P.PET_NAME AS NAMES, P.PET_AGE AS AGE FROM [Tempdb].[dbo].[PETS] AS P WHERE YEAR(P.END_DATE) = 2020 GO
We can see that Bella was a very young CAT when Mr. Sanchez was a very old PERSON.
During the month of October a new pet was adopted by Terri Duffy. The two INSERT statements add data to both the PETS and OWN tables.
-- Add a new pet INSERT [Tempdb].[dbo].[PETS] VALUES (7, N'CAT', N'SHIRLEY', N'RUSSIAN BLUE', N'FEMALE', 1.5, '2020-10-05', NULL); -- Add a new owner INSERT INTO [Tempdb].[dbo].[OWN] VALUES (7, 7, '2020-07-07', NULL);
During December, the same company representative wants to know the number of PETS that were owned in 2020. In real life, there are probably more people who adopted pets either this year or prior years. This toy database does not have that data. However, we want to account for that data by using the correct in-equality in our expression.
-- How many pets & people in 2020 SELECT 'PEOPLE' AS THING, COUNT(*) AS TOTAL FROM [Tempdb].[dbo].[PEOPLE] AS P LEFT JOIN [Tempdb].[dbo].[OWN] AS O ON P.PEOPLE_ID = O.PEOPLE_ID AND (YEAR(O.START_DATE) <= 2020) AND (O.[END_DATE] IS NULL OR YEAR(O.[END_DATE]) = 2020) UNION SELECT P.PET_TYPE AS THING, COUNT(*) AS TOTAL FROM [Tempdb].[dbo].[PETS] AS P LEFT JOIN [Tempdb].[dbo].[OWN] AS O ON P.PET_ID = O.PET_ID AND (YEAR(O.START_DATE) <= 2020) AND (O.[END_DATE] IS NULL OR YEAR(O.[END_DATE]) = 2020) GROUP BY P.PET_TYPE GO
The image below shows we had 4 cats and 3 dogs who were owned by people at the retirement home. So far, we have been using an equal statement when joining the PEOPLE table to the OWN table on ID. This is called an equijoin. However, we can get some interesting results when using another inequality symbol.
Theta Join
Relational Algebra has defined a theta join as a join that uses an inequality symbol. Please refer to Codd’s paper for more details. The company representative wants to show prior adoptions pictures on a pin board to any new prospective owners. How can we write a T-SQL statement to find what prior adoptions were shown to each owner before they adopted their given pet? That is where a theta join comes in and some knowledge of the data. The order of the people datasets guarantees that PETS are given out in ascending order. The greater than sign is used to join the PEOPLE table to the OWN table.
-- Theta Join - Prior Pet Adoptions SELECT P.PEOPLE_ID, P.PEOPLE_FIRST_NAME, P.PEOPLE_LAST_NAME, O.PEOPLE_ID, O.PET_ID, A.PET_TYPE, A.PET_NAME, A.PET_GENDER FROM [Tempdb].[dbo].[PEOPLE] AS P LEFT JOIN [Tempdb].[dbo].[OWN] AS O ON P.PEOPLE_ID > O.PEOPLE_ID LEFT JOIN [Tempdb].[dbo].[PETS] AS A ON O.PET_ID = A.PET_ID ORDER BY P.PEOPLE_ID, O.PET_ID GO
Since Gail Erickson was the first person to adopt a pet from the Golden Age Pet company, she did not see any prior pet adoption pictures. We can see the number of pets seen by a given adopter increases as time progresses. Terri on the other hand saw all six pets on the board before she adopted Shirley.
In general, most JOINS use a equality symbol. However, it is important to note that a theta join can solve interesting questions.
Table Cleanup
The three key tables that we worked with are physically defined in TEMPDB. If this was a production system, a new database called GOLDEN_AGE_PETS would have been created. However, this is a toy schema designed for just this article. Like any good guest, one should cleanup TEMPDB before one leaves.
The following T-SQL script drops the tables used in the article from TEMPDB. The OWN table must be dropped first due to the foreign keys that relate it to the other two tables. This is a typical issue when DELETING data from tables that have referential integrity.
-- Drop existing table 3 DROP TABLE IF EXISTS [Tempdb].[dbo].[OWN] GO -- Drop existing table 1 DROP TABLE IF EXISTS [Tempdb].[dbo].[PEOPLE] GO -- Drop existing table 2 DROP TABLE IF EXISTS [Tempdb].[dbo].[PETS] GO
Unplanned User Update
The next two sections are dedicated to showing users how remote tables can be accessed when using data manipulation languages (DML) statements: INSERT, SELECT, UPDATE, and DELETE. Please note, we are using three part notation in the first T-SQL statement to reference our tables.
The Adventure Works DW database is a star schema design with a lot of relationships. To simulate an update that effects one percent of the records, we will need to drop or disable existing table constraints. The code below creates a derived table named R. This table contains the composite key plus a random number between 1 and 100. We are going to update the Sales Territory Key in the Fact Internet Sales table to zero for any records with a random value of 1.
-- Drop Foreign Key ALTER TABLE [AdventureWorksDW].[dbo].[FactInternetSales] DROP CONSTRAINT [FK_FactInternetSales_DimSalesTerritory]; GO -- Update - random 1% of records UPDATE F SET F.SalesTerritoryKey = 0 FROM [AdventureWorksDW].[dbo].[FactInternetSales] AS F LEFT JOIN ( SELECT [SalesOrderNumber], [SalesOrderLineNumber], CAST(RAND(CHECKSUM(NEWID())) * 100 as INT) + 1 AS RN FROM [AdventureWorksDW].[dbo].[FactInternetSales] ) AS R ON F.[SalesOrderNumber] = R.[SalesOrderNumber] AND F.[SalesOrderLineNumber] = R.[SalesOrderLineNumber] WHERE R.RN = 1 GO
The image below shows 1 percent of the data being randomly changed. The current query window session is in the Adventure Works 2019 database but we are updating data in the Adventure Works DW database. This update sets 614 records to have a zero for a sales territory id. An end user has reported this invalid change to the IT department.
The manager of the IT department asks you to backup the sales order number, sales order line number and sales territory key columns to a backup table named [BadRecords]. Again, three dot notation is used to execute commands in one database from another. I am assuming the [bac] schema already exists.
-- Save keys of unmatched records SELECT F.[SalesOrderNumber], F.[SalesOrderLineNumber], F.[SalesTerritoryKey] INTO [AdventureWorksDW].[bac].[BadRecords] FROM [AdventureWorksDW].[dbo].[FactInternetSales] AS F LEFT JOIN [AdventureWorksDW].[dbo].[DimSalesTerritory] AS D ON F.SalesTerritoryKey = D.SalesTerritoryKey WHERE D.[SalesTerritoryKey] IS NULL GO
The IT department manager knows there is a pristine copy of the records in an Azure SQL database table that is used for reporting. We will use this good copy of the data to replace the bad data. Another teammate has been asked to delete the 614 records from the original table. They are more familiar with the OPENROWSET and OPENDATASOURCE functions. Unlike three dot notation, the example below can be used with tables that are not located in the same database server or database flavor (Oracle). In this example, we are just pointing to the local database server by name. Again, we have to delete or disable another constraint for this command to work.
-- Drop the constraint ALTER TABLE [dbo].[FactInternetSalesReason] DROP CONSTRAINT [FK_FactInternetSalesReason_FactInternetSales] GO -- Delete records DELETE FROM F FROM OPENROWSET( 'SQLNCLI', 'Server=vm4sql19;Trusted_Connection=yes;', AdventureWorksDW.dbo.FactInternetSales ) AS F JOIN OPENDATASOURCE('SQLNCLI', 'Server=vm4sql19;Trusted_Connection=yes;').AdventureWorksDW.bac.BadRecords AS B ON F.[SalesOrderNumber] = B.[SalesOrderNumber] AND F.[SalesOrderLineNumber] = B.[SalesOrderLineNumber] GO
The command below uses the SQL Server client native (SQLNCLI) driver to create a trusted connection. The double from clause is required when referencing a table alias. With these commands that return tables, there is no other way to write this query. The good thing is that the correct number of rows are deleted from the [FactInternetSales] table. In the next section, we will pull down a good records from a table copy that resides in Azure SQL database.
Reversing The Update
I am assuming that you have a linked server named MyAzureDb which is setup with the correct credentials to access the Azure SQL database. This prior tip goes shows you how to setup this connection. When using a linked server, we need to use 4 part notation: database server, database, schema and table name. The T-SQL below inserts records into our local database table using the results of a full join between the azure database table and the bad records table.
-- Insert records from cloud to on-premise INSERT INTO AdventureWorksDW.dbo.FactInternetSales SELECT AF.* FROM MyAzureDb.dbs4advwrks.dbo.FactInternetSales AS AF JOIN AdventureWorksDW.bac.BadRecords AS WF ON AF.[SalesOrderNumber] = WF.[SalesOrderNumber] AND AF.[SalesOrderLineNumber] = WF.[SalesOrderLineNumber] GO
The output below that the correct number of records have replace the bad records. At this time, you could re-create the drop constraints to preserve the integrity of the database going forward.
Summary
Before we can talk about JOINS, we must have a solid understanding of relational database design. Edgar Codd’s paper on the relational model of data in 1970 is the basis for all modern-day databases. In that paper, tables are described as tuples and relationships are used to ensure integrity. Three normal forms of data can be used during the logical design of a database. The first normal form (1NF) states that each column (attribute) must be unique. Both the second and third normal forms assume that the previous form has been satisfied. The second normal form states that all non-primary key columns are dependent upon the primary key. Last but not least, the third normal form states that any non-primary key columns cannot have a transitive dependency with another table.
The Golden Age Pets schema was implemented in TEMPDB. This is a quick technique for developers who do not want to create a new database and who are testing out sample queries or designs. For production releases, this technique should not be used since TEMPDB is the workspace of the database engine. Even on a development server, make sure you cleanup after you are done prototyping.
Two types of LEFT JOINS were covered in today’s article. An equijoin is accomplished when the equal sign is used to join the tables on the relationship. Most of the queries seen in the wild fall into this category. However, a theta join can be used to solve interesting problems. This type of join uses an inequality sign when joining the tables on the relationship. We solve the prior adopt pets problem quite easily with this join type.
Remote data can be used with the LEFT JOIN clause. Either three dot notation (same server) or four dot notation (linked server) can be used when crafting queries. If you want to use other types of local databases, please look at the OPENROWSET or OPENDATASORCE functions. Today, we reversed an unwanted user action by replace the local bad records with remote good records. These queries were crafted using the above syntaxes.
Next Steps
- In this article, a complete coverage of the LEFT JOIN clause has been reviewed. Enclosed is the full T-SQL script used during this SQL tutorial.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips