Problem One nice feature of SQL Server that is used quite a bit is the use of identity columns. This function gives you a simple way of creating a unique value for every row in your table. Adding a new column and making it an identity column is an easy thing to do as well as dropping an existing column that is an identity column, but how can you modify an existing column to make it an identity column or remove the identity property from an existing column?
Solution Not sure how much you have researched this one so far, but there is no easy way to do this. By design there is no simple way to turn on or turn off the identity feature for an existing column. The only clean way to do this is to create a new column and make it an identity column or create a new table and migrate your data.
Let's take a look at a few examples:
Here is a simple table that has two columns and one column is the identity column.
If we use SQL Server Management Studio to get rid of the identity value on column "id", a new temporary table is created, the data is moved to the temporary table, the old table is dropped and the new table is renamed. This can be seen in the script below.
To get this script use Management Studio to make the change and then right click in the designer and select "Generate Change Script".
/* To prevent any potential data loss issues, you should review this script in detail before running it outside the context of the database designer.*/ BEGIN TRANSACTION SET QUOTED_IDENTIFIER ON SET ARITHABORT ON SET NUMERIC_ROUNDABORT OFF SET CONCAT_NULL_YIELDS_NULL ON SET ANSI_NULLS ON SET ANSI_PADDING ON SET ANSI_WARNINGS ON COMMIT BEGIN TRANSACTION GO CREATE TABLE dbo.Tmp_Test1 ( id INT NOT NULL, name NCHAR(10) NULL ) ON [PRIMARY] GO IF EXISTS(SELECT * FROM dbo.Test1) EXEC('INSERT INTO dbo.Tmp_Test1 (id, name) SELECT id, name FROM dbo.Test1 WITH (HOLDLOCK TABLOCKX)') GO DROP TABLE dbo.Test1 GO EXECUTE sp_rename N'dbo.Tmp_Test1', N'Test1', 'OBJECT' GO COMMIT
If we make this example a little more complicated, by having a primary key and creating a second table with a foreign key constraint referencing back to the first table we can see that even more work needs to be done.
CREATE TABLE [dbo].[Test2]( [id] [int] NULL, [name2] [nchar](10) NULL ) ON [PRIMARY] GO
ALTER TABLE [dbo].[Test2] WITH CHECK ADD CONSTRAINT [FK_Test2_Test1] FOREIGN KEY([id]) REFERENCES [dbo].[Test1] ([id]) GO
ALTER TABLE [dbo].[Test2] CHECK CONSTRAINT [FK_Test2_Test1] GO
If we do the same thing and use Management Studio to get rid of the identity value on column "id" in table "test1" and script out the change, we can see that even more steps need to take place.
First a temp table "Tmp_Test1" is created with the correct column attributes
The data is moved to "Tmp_Test1" from "Test1"
The FK constraint on "Test2" is dropped
Table "Test1" is dropped
Table "Tmp_Test1" is renamed to "Test1"
The primary key is created on table "Test1"
And lastly the FK constraint is recreated on table "Test2". That's a lot of steps.
/* To prevent any potential data loss issues, you should review this script in detail before running it outside the context of the database designer.*/ BEGIN TRANSACTION SET QUOTED_IDENTIFIER ON SET ARITHABORT ON SET NUMERIC_ROUNDABORT OFF SET CONCAT_NULL_YIELDS_NULL ON SET ANSI_NULLS ON SET ANSI_PADDING ON SET ANSI_WARNINGS ON COMMIT BEGIN TRANSACTION GO CREATE TABLE dbo.Tmp_Test1 ( id INT NOT NULL, name NCHAR(10) NULL ) ON [PRIMARY] GO IF EXISTS(SELECT * FROM dbo.Test1) EXEC('INSERT INTO dbo.Tmp_Test1 (id, name) SELECT id, name FROM dbo.Test1 WITH (HOLDLOCK TABLOCKX)') GO ALTER TABLE dbo.Test2 DROP CONSTRAINT FK_Test2_Test1 GO DROP TABLE dbo.Test1 GO EXECUTE sp_rename N'dbo.Tmp_Test1', N'Test1', 'OBJECT' GO ALTER TABLE dbo.Test1 ADD CONSTRAINT PK_Test1 PRIMARY KEY CLUSTERED ( id ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO COMMIT BEGIN TRANSACTION GO ALTER TABLE dbo.Test2 ADD CONSTRAINT FK_Test2_Test1 FOREIGN KEY ( id ) REFERENCES dbo.Test1 ( id ) ON UPDATE NO ACTION ON DELETE NO ACTION
The same holds true if we want to change an existing column and make one of the columns an identity column. This scenario is probably less likely, but there may be a need.
Another approach would be to add a new column and make it an identity column or add a new column without the identity property and migrate the data from the old column to the new column. After that you could drop the old column and then rename the column using the sp_rename stored procedure. If you have indexes, foreign keys and other constraints on these columns you would still need to drop these prior to making the changes, so this approach is not much faster.
As you can see there really is not any easy way to do this. There are some other approaches that you can find on the internet that modify values in the system tables. These approaches do work, but if you make a mistake you could totally mess up your data, so make sure you understand what is in store before modifying system tables.
I wish there was a simpler way of turning on and turning off the identity property. For small tables or for databases that are not that busy this approach works without much issue. But if you have large tables or your databases are very busy it is kind of hard to drop constraints and tables on the fly like this. I guess we will have to wait and see if there are any changes in SQL 2008.
Now that we have seen there is no easy way to do this, keep this in mind when designing your tables. Identity columns are great, but if you do need to change one it could be more painful then you think
Try to avoid using identity columns as your primary key and for foreign key constraints just for this reason. I know once you start using them it is difficult to stop using them, but be aware of potential issues you may face down the line.
Last Update: 1/4/2008
About the author
Greg Robidoux is the President of Edgewood Solutions and a co-founder of MSSQLTips.com.
I think this statement is a good topic for discussion
to avoid using identity columns as your primary key and for foreign key
constraints just for this reason. I know once you start using them it
is difficult to stop using them, but be aware of potential issues you
may face down the line.
I use identities for PKs and FK's I would expect that having natural keys would lead to even more headaches if natural keys in the database end up changing, no? You'd have to promote any changes through all child tables. I think of a customer table or an employee table. A company or a person who gets married can change their name which leads to cascading headaches with child tables. In addition, if I need my inserts to be fast, I'd prefer the table had a clustered identity PK as opposed to a clustered natural key to minimize the overhead of page splits and fragmentation. Applying a FILL FACTOR could help but then queries have to read more pages than necessary. I'm not sure if I agree with this statement as being a general rule - to me, it seems too broad.
I've never had the need to remove an identity but I suspect its removal would be due more to a flaw in the initial physical modeling.
As far as modifying system tables, while it can be done in SQL Server 2000 it can't be done in SQL Server 2005. Well, actually, it can but it's a secret. I was at Microsoft for some perf testing a year ago and the response to my question on modifying 2005 system tables was something along the lines of "We could tell you, but then we'd have to kill you". In general, modifying SQL Server system tables is not a good idea because (a) it'll cost you money to fix anything you screw up and (b) whatever you screw up could end up really corrupting your database big time.
Couple of points here. If you have enterprise edition, you have a method to change the identity property without much data movement. (Ofcourse you may have to drop and create the foreign keys of other tables referencing this Identity column.)
On question of whether we should have identity column for PK or not, we can argue for both sides. I have seen a lot of benifits of identity during inserts and updates as it reduces the new pages for inserts and page splits. Also, oftena complex multi column clustered index could be replaced with a PK and it reduces space usage. When the space needed for PK reduces it allows the non clustered indexes to be smaller, and makes the index scans, seeks to be faster.
I have also seen the downside of it. one of the major issue isssue is you will not know the value until you insert it. It needs an additional traffic over the network, and complex queries may suffer due to this.
I have used Identity from my first project with SQL Server 6.0, and I am happy about it. (Ofcourse I had a headache with Identity on those days, partially as I was new to this) So far, I have seen identity is a great option which gives the control to the designer.
Finally, one standard question: If you want to insert data, with an explicit identity value, you can do that. Why on earth you want to change the identity property? Is it because you of bad design?
Saturday, September 06, 2008 - 8:04:47 PM - aprato
I had a customer accidentally delete some child rows involved in a PK-FK relationship from their database. I had to take an older backup and reconstruct the rows. To do that, I had to re-insert them using IDENTITY_INSERT ON
Is your "Child" table work as a parent to another table?
If so, how can a person delete the records?
If not, why bother about the value of the column. It is inserted automatically, and you don't have to worry anythign about it. You can enter any value.
In case if your child record is some other database, or you dont maintain referential integrity, do you allow users to delete rows directly?
do the application take care of this validations, violations?
Apart from the fact that you need to type additional words to insert the rows, is there any problem with the row?
Let me give you a scenario, where you need Identity column: If you want to create full text index on a table, you need the table with unique, non nullable, single column index. Identity column is a good candidate to create such an index.
Yes, it was a parent to another table that was part of a DELETE CASCADE FK.
The delete occurred to a bug in the application code. I'll just summarize and say that these rows shouldn't have been deleted because a business rule was not coded correctly in the logic tier. I wanted to re-create these rows as they originally appeared rather than re-generate new identities.
I use identities as a clustered PK index on my high volume tables so I can avoid the expense that page splitting causes. I then leverage some of my key non-clustered indexes as covering indexes (and in some cases use included columns) so I can leverage the fact that the engine logically sorts the index data in key order. This way, if I have a query that requires a sort can use the covering index.
For instance, my model has a table that stores user scores when he/she takes an exam. The key reporting query wants the user's name, the exam name, and the score achieved sorted by date. Since tens of thousands of users are taking exams at any one time, I want my score inserts fast so I use the identity as the clustered index and I have an index on (userid, date_taken) include (exam_name, score). It works really well and is very fast.
[quote user="aprato"]Since tens of thousands of users are taking exams at any one time, I want my score inserts fast so I use the identity as the clustered index[/quote]
I totally agree. This is one of the big points of Identity. As consequent inserts go in the same page. this approach reduces the unnecessary page loads.
Additionally, any data created by user is subjected to errors. People may make spelling mistakes and some of them will be discovered much much later. Keeping them as Primary key duplicates the error into all referenced tables and they may appear before clients, and unless you do some costly cascade operations, they will not be eliminated. identity column eliminates this risk completely.
I'm sorry, I should've added that I needed these to appear as they originally did because we have a trigger action that inserts into an audit table that's used by a separate system that does data warehousing. This table stores the id of the row and the action (I,U,X- for Insert, Update, Delete).
Thursday, April 08, 2010 - 1:56:52 PM - peterruselmalinsvig
If I want to add a column to a millions records table, and also need to update the value of that new column. That table need to be able to access all the time. What would you recommend? Beside do the alter table and run the update command. Thank you.
Monday, November 15, 2010 - 5:28:56 PM - Greg Robidoux
You could do an update in several steps. Instead of doing one large update you could do the update in increments of 1000 rows or something like that. You could use the primary key or clustered index to use as the key to determine what group of rows to update. This way the locking could be done in smaller chunks instead of locking the entire table for the entire update duration.