Delete duplicate rows with no primary key on a SQL Server table

By:   |   Comments (47)   |   Related: > TSQL


Problem

Every once in awhile a table gets created without a primary key and duplicate records get entered. The problem gets even worse when you have two identical rows in the table and there is no way to distinguish between the two rows. So how do you delete the duplicate record?

Solution

One option that SQL Server gives you is the ability to set ROWCOUNT which limits the numbers of records affected by a command. The default value is 0 which means all records, but this value can be set prior to running a command. So let's create a table and add 4 records with one duplicate record.

Create a table called duplicateTest and add 4 records.

CREATE TABLE dbo.duplicateTest 
( 
[ID] [int] , 
[FirstName] [varchar](25), 
[LastName] [varchar](25)  
) ON [PRIMARY] 

INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith') 
INSERT INTO dbo.duplicateTest VALUES(2, 'Dave','Jones') 
INSERT INTO dbo.duplicateTest VALUES(3, 'Karen','White') 
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')

If we select all data we get the following:

SELECT * FROM dbo.duplicateTest
ID FirstName LastName
1 Bob Smith
2 Dave Jones
3 Karen White
1 Bob Smith

If we try to select the record for Bob Smith will all of the available values such as the following query:

SELECT * FROM dbo.duplicateTest WHERE ID = 1 AND FirstName = 'Bob' AND LastName = 'Smith'

We still get 2 rows of data:

ID FirstName LastName
1 Bob Smith
1 Bob Smith

DELETE Duplicate Records Using ROWCOUNT

So to delete the duplicate record with SQL Server we can use the SET ROWCOUNT command to limit the number of rows affected by a query. By setting it to 1 we can just delete one of these rows in the table. Note: the select commands are just used to show the data prior and after the delete occurs.

SELECT * FROM dbo.duplicateTest 

DECLARE @id int = 1

IF EXISTS (SELECT count(*) FROM dbo.duplicateTest WHERE ID = @id HAVING count(*) > 1 )
BEGIN
   SET ROWCOUNT 1 
   DELETE FROM dbo.duplicateTest WHERE ID = @id 
   SET ROWCOUNT 0 
END

SELECT * FROM dbo.duplicateTest
query results

Here is a note from Microsoft about using SET ROWCOUNT:

Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in the next release of SQL Server. Avoid using SET ROWCOUNT together with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. Also, for DELETE, INSERT, and UPDATE statements that currently use SET ROWCOUNT, we recommend that you rewrite them to use the TOP syntax.

I tested the ROWCOUNT option with SQL Server 2017 and this option still works.

DELETE Duplicate Records Using TOP

With SQL Server 2005 and later we can also use the TOP command when we issue the delete, such as the following. Note: the select commands are just used to show the data prior and after the delete occurs.

-- delete all records and add records again
DELETE FROM dbo.duplicateTest
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith') 
INSERT INTO dbo.duplicateTest VALUES(2, 'Dave','Jones') 
INSERT INTO dbo.duplicateTest VALUES(3, 'Karen','White') 
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')	
	
SELECT * FROM dbo.duplicateTest 

DECLARE @id int = 1

IF EXISTS (SELECT count(*) FROM dbo.duplicateTest WHERE ID = @id HAVING count(*) > 1 )
   DELETE TOP(1) FROM dbo.duplicateTest WHERE ID = @id 

SELECT * FROM dbo.duplicateTest
query results

So as you can see with SQL Server 2005 and later there are two options to allow you to delete duplicate identical rows of data in your tables.

DELETE Multiple Duplicate Records Using TOP

One of the downsides to the above approaches is that they only delete one record at a time. So if there are more than two duplicates you have to rerun the commands.

Here is another option submitted by one of our readers Basharat Bhat if there are more than two duplicates.

-- delete all records and add records again
DELETE FROM dbo.duplicateTest
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith') 
INSERT INTO dbo.duplicateTest VALUES(2, 'Dave','Jones') 
INSERT INTO dbo.duplicateTest VALUES(3, 'Karen','White') 
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')	
	
SELECT * FROM dbo.duplicateTest 

DECLARE @id int = 1

DELETE TOP (SELECT COUNT(*) -1 FROM dbo.duplicateTest WHERE ID = @id)  
FROM dbo.duplicateTest  
WHERE ID = @id

SELECT * FROM dbo.duplicateTest 

DELETE Multiple Duplicate Records Using CTE

Here is another option submitted by one of our readers. This approach checks all of the columns to make sure that each column is a duplicate versus just the ID column in the above examples. This will delete records when there are 2 or more duplicate rows.

DELETE FROM dbo.duplicateTest
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith') 
INSERT INTO dbo.duplicateTest VALUES(2, 'Dave','Jones') 
INSERT INTO dbo.duplicateTest VALUES(3, 'Karen','White') 
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')	

SELECT * FROM dbo.duplicateTest; 

with temp(rank1,id ,fname,lname)
as (
   select row_number() over ( partition by ID, FirstName, LastName order by ID, FirstName, LastName ) , * 
   from duplicateTest
)
delete from temp where rank1 > 1;

SELECT * FROM dbo.duplicateTest;

DELETE Multiple Duplicate Records Using %%lockres%%

Here is another option submitted by another one of our readers. This approach checks all of the columns to make sure that each column is a duplicate versus just the ID column in the above examples. This will delete records when there are 2 or more duplicate rows.

DELETE FROM dbo.duplicateTest
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith') 
INSERT INTO dbo.duplicateTest VALUES(2, 'Dave','Jones') 
INSERT INTO dbo.duplicateTest VALUES(3, 'Karen','White') 
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')
INSERT INTO dbo.duplicateTest VALUES(1, 'Bob','Smith')	

SELECT * FROM dbo.duplicateTest; 

DELETE FROM a
FROM dbo.duplicateTest a
JOIN
(
SELECT MAX(%%lockres%%) pseudoID, id, FirstName, LastName
FROM dbo.duplicateTest
GROUP BY id, FirstName, LastName
) b ON b.id = a.id AND b.LastName = a.LastName AND b.FirstName = a.FirstName AND b.pseudoID <> a.%%lockres%%

SELECT * FROM dbo.duplicateTest;
Next Steps
  • Take a look how the ROWCOUNT command can be used to affect the results of your query
  • Also take a look at the TOP command and changes that have been implemented with SQL Server
  • Start using TOP instead of ROWCOUNT for SQL Server


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Greg Robidoux Greg Robidoux is the President and founder of Edgewood Solutions, a technology services company delivering services and solutions for Microsoft SQL Server. He is also one of the co-founders of MSSQLTips.com. Greg has been working with SQL Server since 1999, has authored numerous database-related articles, and delivered several presentations related to SQL Server. Before SQL Server, he worked on many data platforms such as DB2, Oracle, Sybase, and Informix.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, March 22, 2024 - 1:28:03 PM - Brent Shaub Back To Top (92106)
Helped me out of a big data jam today. Duplicate rows were exact and no primary key differentiated them. Thank you, Greg.

Wednesday, January 10, 2024 - 7:36:38 AM - Davood Taherkhani Back To Top (91847)
That's Great.

Friday, September 11, 2020 - 9:40:19 AM - Greg Robidoux Back To Top (86458)
Hi Josh, that is an option as well. Several ways to solve the problem.

-Greg

Friday, September 11, 2020 - 9:38:37 AM - Greg Robidoux Back To Top (86457)
Thanks Chuck. Let me test it out and I can update the article with your code.

Thanks
Greg

Friday, September 11, 2020 - 8:04:51 AM - Chuck Back To Top (86456)
Delete *all* duplicates using undocumented pseudo column %%lockres%% which acts as sort of a pseudo primary key for this statement...

DELETE FROM a
FROM dbo.duplicateTest a
JOIN
(
SELECT MAX(%%lockres%%) pseudoID, id, FirstName, LastName
FROM dbo.duplicateTest
GROUP BY id, FirstName, LastName
) b ON b.id = a.id AND b.LastName = a.LastName AND b.FirstName = a.FirstName AND b.pseudoID <> a.%%lockres%%

Thursday, September 10, 2020 - 9:55:06 PM - Josh Back To Top (86453)
Couldn't you just solve this by creating a new table? INSERT INTO #newtable SELECT DISTINCT from the original.

Sunday, August 23, 2020 - 5:55:44 PM - Jeff Moden Back To Top (86346)
Awesome, Greg. Thanks for the feedback and for taking the time to make the safety modifications to the article.

Tuesday, August 18, 2020 - 11:48:38 AM - Greg Robidoux Back To Top (86325)
I made some updates to the article to make sure that a delete only occurs if there is more than one record. Also added a variable to use so the same value doesn't need to be put in multiple places.

-Greg

Tuesday, August 18, 2020 - 11:26:53 AM - Greg Robidoux Back To Top (86324)
Hi Jeff,

Thanks for your feedback. What you said makes total sense. I guess I was assuming that someone using this already knows they have a duplicate so would only delete the record in that case. But I see what you are saying that it would be easy to make a mistake.

I will make some updates to this.

-Greg

Tuesday, August 18, 2020 - 9:53:49 AM - Jeff Moden Back To Top (86322)
Since the first two methods demonstrate are actually straight DELETEs and will delete whether a row is a duplicate or not, I think that a very bold, red warning should be added to the examples ad as a comment in the code telling people that every run will delete a row whether it's a duplicate or not.

The third example is almost as dangerous because you have to update the ID of the duplicate rows in two spots. If you forget, it will delete rows that you didn't intend to delete and it deserves the same kind of warning emblazoned just above the code and as a comment in the code.

Personally, I wouldn't have published any of the 3 as a way to delete duplicates... rather I'd have included them in a well identified section of the article to say "DON'T EVER DO THIS BECAUSE...."

Wednesday, December 18, 2019 - 10:07:50 AM - Jason Back To Top (83468)

I haven't run into something like this since before CTEs and didn't even think of a CTE being updatable. Good little tidbit, thank you!


Tuesday, December 18, 2018 - 8:32:56 AM - Rick Dobson Back To Top (78522)

 nice read.  thanks.


Wednesday, October 31, 2018 - 3:08:41 AM - Shahriar Back To Top (78120)
very much effective this query,
Thanks.

"with
temp(rank1,id ,fname,lname) as ( select row_number() over ( partition by ID, FirstName, LastName order by ID, FirstName, LastName ) , * from duplicateTest ) delete from temp where rank1 > 1;"

 

 


Monday, July 13, 2015 - 9:49:29 AM - Kris Maly Back To Top (38191)

I would like to share the video below which gives another good example

Part 4 Delete duplicate rows in sql

https://www.youtube.com/watch?v=ynWgSZBoUkU


Monday, November 24, 2014 - 4:29:50 AM - Archana Back To Top (35378)

Sir please tell how to delete updated recod in sql 2008


Monday, May 26, 2014 - 11:09:01 AM - Rahul Back To Top (31944)

 

with temp(rank1,id ,fname,lname)
as (
select row_number()over ( partition by [ID]  ,
[FirstName] ,
[LastName] 
order by  ID ,[FirstName] ,
[LastName]     ),* from duplicateTest
)

delete from temp where rank1>1


Tuesday, May 14, 2013 - 8:52:24 AM - Greg Robidoux Back To Top (23939)

@Ajay

You can run the command twice. 

The first time it will remove the first duplicate and the second time it will remove the second duplicate, etc...


Tuesday, May 14, 2013 - 6:21:20 AM - Ajay Back To Top (23937)
First of All thank for your Post, I Need to remove both Duplicate Record in SQL any other way to resolve this.

Friday, December 21, 2012 - 8:42:12 AM - Greg Robidoux Back To Top (21079)

I guess this depends on what data is in the table.  If you only have these records you could simply do a DELETE as follows:

DELETE FROM table WHERE name <> 'Dorababu'

If there are lots of other variations this gets more difficult because you would need to know which records match and what to keep and what to delete.

If the tables had an ID and the ID was all the same, but the name was different you could do the following:

DELETE FROM table WHERE ID = XX and name <> 'Dorababu'

Not sure if this answers your question or not.

 


Friday, December 21, 2012 - 6:39:11 AM - Dorababu Back To Top (21076)

Hi recently I was asked by an interviewer and the question is as follows, I am having duplicate names irrespective of the original name. For example my name is Dorababu some of them inserted as Dhorababu or dorebabu like this. I would like to delete the remaining except Dorababu from the table how can I do this


Monday, November 5, 2012 - 12:05:11 AM - Rasika Back To Top (20210)

I need to remove duplicated records from the below table. HNO field has a running number and I need to retain the last record of a given reference number

 

Also before deleting records I need to take a count of all reference numbers that are having duplicates

 

 

 

Dept          Reference     HNO           Date                 Amount

0001          111112        1             1100224              15000000

0001          111112        2             1100224              15000000

0001          111112        3             1100224              15000000

0002          111115        1             1110912              34231566

0002          111115        2             1110912              11470000

0002          111115        3             1111024              67500000

0002          111115        4             1111025                300000

 

Your help in this regard is much appreciated

 


Sunday, July 29, 2012 - 5:46:48 AM - Surjit Lakhnotra Back To Top (18838)

 

another way to delete the duplicate rows when we dont know how many duplicate rows are in the table

 

SET

 

ROWCOUNT 1

 

DELETE

 

store1 FROM dbo.store1 a WHERE(selectcount(*)

 

from

 

store1 b where b.storenamenew=a.storenamenew and

b

.storeaddnew=a.storeaddnew)>1

 

while

 

@@rowcount>0

 

DELETE

 

store1 FROM dbo.store1 a WHERE(selectcount(*)

 

from

 

store1 b where b.storenamenew=a.storenamenew and

b

.storeaddnew=a.storeaddnew)>1

 

SET

 

ROWCOUNT 0


Wednesday, June 6, 2012 - 6:29:02 AM - bhaskar Back To Top (17813)

hi,

good article and easy to understand

 

bhaskar

http://csharpektroncmssql.blogspot.com


Friday, April 27, 2012 - 4:57:45 PM - varshini Back To Top (17165)

 

Nice Job

 


Friday, April 27, 2012 - 8:45:00 AM - Jeff Back To Top (17154)

Seems like you could also use select top 1 


Thursday, April 12, 2012 - 8:59:47 AM - Greg Robidoux Back To Top (16873)

Kisan - if you delete records from a table that uses an identity column you are going to have gaps in the numbering.

If there were only 5 records in the table 1,2,3,4,5 and you delete record number 5 the next identity value will still be 6.  You could use the RESEED option to fix this and make the next number 5 again.  Take a look at this tip: http://www.mssqltips.com/sqlservertip/1123/managing-and-maintaining-sql-server-identity-values/

But if you have records with values 1,2,3,4,5 and you delete record number 3 you will still have a row without a number 3 value.  To get around this you could use SET IDENTITY INSERT.  Take a look at this tip: http://www.mssqltips.com/sqlservertip/1061/using-identity-insert-to-keep-sql-server-table-keys-in-sync/

 


Thursday, April 12, 2012 - 3:00:31 AM - kisan Back To Top (16864)

 

sir I have One More Question

Suppose i create auto id like

create table demo

(

 id int identity(1,1) primary key

)

if i inserted  5 records and i delte  record Number 5

so agin if i inserted  record so it will give id no:6

but i want  id 5

so how will be do sir....


Wednesday, April 11, 2012 - 12:56:01 PM - kisan Back To Top (16858)

thanku...  sir.......

 

this  i try this query it will work fine....... thanku muchhhhhh..


Wednesday, April 11, 2012 - 12:55:19 PM - Greg Robidoux Back To Top (16857)

Gadi35 - yes you are correct that it will only delete one record.  I was referring to the example that "kisan" listed that just had two records, so this would delete one of those two records, but I agree if there were more than two duplicates only one record would be deleted.

 


Wednesday, April 11, 2012 - 12:14:13 PM - Gadi35 Back To Top (16856)

Greg, that last code will cancel anything that exists and only leave the duplicates.


Wednesday, April 11, 2012 - 8:49:12 AM - Greg Robidoux Back To Top (16850)

Kisan - if you run this query this will delete only one of these records. Since they are identical it doesn't matter which one you delete.

DELETE TOP(1) FROM dbo.duplicateTest WHERE ID = 1

 


Wednesday, April 11, 2012 - 8:13:02 AM - kisan Back To Top (16849)

I got your answer 

 if i  delete record it will delete both  beczz their is no. primary key use it...

if i m taking any column so i cant delete but thier is redudency  it will delete both record......

 

so telll me sir what synatx should i use in delete query to deleted secound record..


Tuesday, April 10, 2012 - 7:31:14 AM - Greg Robidoux Back To Top (16832)

Hi Kisan,

if the records are identical it doesn't really matter which record you are deleting also you have no control over what record SQL Server will present to you.

If there are other columns in the table that have different values then you can add that to the where clause if not you won't have any real control.

You can also look the post above from Gadi35.


Tuesday, April 10, 2012 - 6:50:11 AM - kisan Back To Top (16831)

I want To delete Secound Record  how i can delete:

ID FirstName LastName
1 Bob Smith
1 Bob Smith

 


Wednesday, March 14, 2012 - 1:48:47 PM - Greg Robidoux Back To Top (16396)

Hi Erni, not exactly sure what you need to do.

If you are trying to eliminate duplicate primary key values in a table, make sure the table has a PRIMARY KEY setup so there is not a chance to get duplicate records.  The front end application would need to handle the error message from SQL Server, but this would ensure you don't have duplicates.


Wednesday, March 14, 2012 - 1:42:54 PM - erni Back To Top (16395)

how to delete duplicate using phpmyadmin?is working when insert data,data can not read..that he publish 1234567 are duplicate primary key and when insert 3456789,he also publish 1234567?

how i can i do?plz


Friday, March 9, 2012 - 8:24:37 AM - Greg Robidoux Back To Top (16314)

Gopi, not sure what you are asking.

If the records already exist in both the FACT table and FACTERROR then can you just delete the records from the FACTERROR table?


Friday, March 9, 2012 - 6:24:41 AM - gopi Back To Top (16310)

There are duplicate records in Fact and Facterror table(both table have same records in them some 100 records).i need to delete the records from facterror table and insert it into Fact table.


Friday, March 2, 2012 - 1:49:40 AM - manolitobsj Back To Top (16234)

This post have been posted for more than two year but it helped me alot. Thanks.


Wednesday, February 22, 2012 - 12:30:20 AM - karthick Back To Top (16123)

hi,

im facing one problem in ur example you knows the duplicate data's key value so you can delete easily.

What my question in there is 1000's records with few duplicate (in composit key one is different) then how to extract the that particular duplicate  records and delete it

 


Friday, May 29, 2009 - 2:22:18 AM - kanagarajum Back To Top (3477)

Hai,  can u try this code?

 

DELETE FROM TestTable WHERE
EXISTS
(SELECT Sid FROM TestTable T WHERE T.Tname = TestTable.Tname AND I.SId < TestTable.SId)

 

 

 


Sunday, May 18, 2008 - 11:57:57 PM - balakumar.sk Back To Top (1007)

It didnt work Prakash,

gave me the follwoing error

Msg 156, Level 15, State 1, Line 2

Incorrect syntax near the keyword 'As'."

 


Thursday, May 15, 2008 - 7:31:44 AM - Prakash M Back To Top (995)

Hi Friend Pls test the following one, dis s a very simple method

Delete Top (Select Count(*)-1 From EmpTest Where EmpId=A.EmpId)
From EmpTest As A

 

 


Friday, May 9, 2008 - 2:29:02 PM - admin Back To Top (965)

Thank you both for the feedback.

Thank you,
The MSSQLTips.com Team


Tuesday, May 6, 2008 - 1:16:11 PM - Gadi35 Back To Top (949)

There is another very good way to do this, in SQL Server 2005, if you do not have the key set up, but want to effectively use a column or more as your primary key.

 ;with DelDup as (select row_number() over (partition by
FirstName, LastName
order by FirstName, LastName) as RowNofrom duplicateTest)
Delete from DelDup where RowNo> 1

--alternate--

 ;with DelDup as (select row_number() over (partition by
ID
order by ID) as RowNofrom duplicateTest)
Delete from DelDup where RowNo> 1

You can easily adjust this to use one column or as many as you wish, as long as you update the column(s) after the partition by and after the order by.


Friday, April 4, 2008 - 7:38:20 AM - admin Back To Top (829)

Bals,

Can you please post the code you are having an issue with?

Thank you,
The MSSQLTips.com Team


Tuesday, March 18, 2008 - 2:29:33 AM - Bals Back To Top (743)

The corelated query in TOP clause is not working .. Basharat Bhat has given only for single dup entry















get free sql tips
agree to terms