By: Haroon Ashraf | Last Updated: 2018-11-19 | Comments | Integration Services Development
A SQL Server table was created long ago without a primary key probably as a result of an import. It began to get populated over time. As a SQL Server Developer, I have been tasked to automate the process of adding a primary key to this table (without removing duplicate rows) in such a way that the Primary Key column becomes the first column of the table and there should be no data loss during this operation.
This solution is based on the idea of creating a new table based on existing table with an additional auto increment id column set as the primary key and then copying the data from the existing table to the new table which has a primary key.
The script can be created and run manually, but we can automate the process by using Integration Services.
This tip is focused on creating a backup table with primary key followed by creating and running an Integration Services package to achieve the goal.
Adding SQL Server Primary Key to a Table using an SSIS Package
We are ready to add a primary key to a table which does not have a primary key.
This tip assumes that SQL Server Data Tools (SSDT) with the latest version of Visual Studio has already been installed and the readers are familiar with SQL Server Data Tools (SSDT).
The steps can be summarized as follows:
- Creating a sample database
- Creating a table without primary key in the sample database
- Populating the table
- Renaming the table as old table
- Creating a new table based on old table
- Adding primary key with identity column to the new table
- Creating an SSIS Package to copy data from old table to new table with primary key
- Automating the process through SSIS Package by running the SSIS Package
This is illustrated as follows:
Sample Database Setup
Open SQL Server Object Explorer (SSOX) and Add New Database called “University” or alternatively use the following script to create the sample database:
-- Creating Sample Database University CREATE DATABASE University
Let us use University database created earlier on to create the Course table.
Right click University and click New Query…:
Writing the following code and execute to create the Course table:
-- Creating course table CREATE TABLE [dbo].[Course]( [Name] VARCHAR(30) NOT NULL, [Detail] VARCHAR(200) NULL, ); GO
Populate the table by using the following script:
-- Populating course table without primary key INSERT INTO [dbo].[Course]([Name], [Detail]) VALUES(N'DevOps for Databases', N'This is about DevOps for Databases') INSERT INTO [dbo].[Course]([Name], [Detail]) VALUES(N'Power BI Fundamentals', N'This is about Power BI Fundamentals') INSERT INTO [dbo].[Course]([Name], [Detail]) VALUES(N'T-SQL Programming', N'About T-SQL Programming') INSERT INTO [dbo].[Course]([Name], [Detail]) VALUES(N'Tabular Data Modeling', N' This is about Tabular Data Modeling') INSERT INTO [dbo].[Course]([Name], [Detail]) VALUES(N'Analysis Services Fundamentals', N'This is about Analysis Services')
View the data:
Renaming Course Table
Right click on the Course table and click New Query and write and execute the following script to rename the existing table Course as Course_Old:
-- Renaming Course table as Course_Old EXEC sp_rename "Course","Course_Old"
Creating New Course Table with a Primary Key
Create a new table using the Query window and execute the following script against the University database:
-- Creating new Course table with primary key and auto increment identity column CREATE TABLE [dbo].[Course]( [CourseId] INT IDENTITY(1,1), [Name] VARCHAR(30) NOT NULL, [Detail] VARCHAR(200) NULL, CONSTRAINT [PK_Course] PRIMARY KEY([CourseID]) ); GO
Create a New SSIS Project
Create a new Integration Services Project and call it PrimaryKeyProject:
Adding Data Flow Task
Add a Data Flow Task and drag and drop the OLE DB Source and OLE DB Destination:
Configure Source and Destination
Configure database connection to point to the University database:
Select Course_Old table as the Source:
Join the OLE DB Source with the OLE DB Destination to create a pipeline:
Configure the OLE DB Destination component to point to a newly created Course table which contains the primary key:
Check the column mappings by clicking Mappings on the left side and please note that CourseId in the mappings will be ignored since it is only present in the destination table:
Running the SSIS Package
Run the SSIS Package to start the data transfer from old table to new table:
Go to SQL Server Object Explorer (SSOX) and right click Course and click View Data:
Congratulations! You have successfully added a primary key column (in the first column) to the table without losing any data.
Once you are certain all rows have been successfully copied to the new table you can delete the old table.
- Please try adding duplicate records in original course table and then use Sort transformation in SSIS Package to remove duplicate rows during the data transfer and see if you can successfully add primary key to a table which was originally created without primary key along with removing duplicate rows. (Hint: please refer to Removing Duplicates Rows withSSIS Sort Transformation tip)
- Please refer to my previous tip Adding a Primary Key to a Prepopulated Table using SQL Server Object Explorer in SSDT and try adding Student table and link it with Course table through CourseId and then add primary keys to both tables using the solution mentioned in this tip
- Please refer to my previous tip Developing Similar Structured Multi-Customer Databases with SQL Server Data Tools (SSDT) and try to create Client, Service and ServiceOrder tables without primary and foreign keys and then populate these tables and finally add primary and foreign keys using the method followed by creating and running SSIS Package in this tip
- Please refer to my previous tip SQL Server Reference Data Best Practices - Part 1 and try creating reference tables CarMake and CarType without primary keys then add primary key by implementing the approach mentioned in this tip.
Last Updated: 2018-11-19
About the author
Haroon Ashraf's interests are Database-Centric Architectures and his expertise includes development, testing, implementation and migration along with Database Life Cycle Management (DLM).
View all my tips