Problem
This tip demonstrates how find and list all duplicate rows in a dataset using a Common Table Expression (CTE).
Solution
CTEs provide a readable alternative to subqueries, especially when combined with window functions for row identification. Numerous prior MSSQLTips.com articles focused on different ways of describing CTE syntax and implementing specific kinds of tasks with CTEs. You can search MSSQLTips.com for a list of prior articles on CTEs.
Duplicate rows can show up in multiple environments, such as text and csv files. Failing to detect and remove duplicate rows from a dataset can result in improperly computed aggregates. SQL Server gives you several ways to identify and remove duplicates. Common Table Expressions (CTEs) offer one of the most readable SQL techniques for identifying duplicates in an underlying data source.
This tip demonstrates how to detect duplicates in a data source with one item per dataset row. Furthermore, each row contains both an identifier to reference the row as well as an item identifier to reference what is being ordered. This tip describes simple CTE code examples for a source dataset with one item per row to return non duplicated rows and duplicated rows in separate result sets.
Note on Datasets with One Item per Row
In many systems, datasets can contain multiple line items for a single logical entity, such as one order having multiple line items. However, some systems store data in a one-to-one structure where each dataset row represents exactly one item. Examples include subscription purchases (one subscription per transaction) and service based work requests where each request is for precisely one work item. For clarity and focus, this tip uses a simple orders table with a one-to-one structure (one item per order on each row). This keeps the CTE design easy to follow without the need to resolve parent/child relationships. A follow up tip will extend the scope of CTE demonstrations for de-duplicating by showing how to de-duplicate datasets that contain multiple line items for entities in a parent/child relationship, such as a mother having multiple children or an order having multiple line items for different products.
In this tip, the term “duplicate” refers to a dataset row that matches one or more other dataset rows across all column values. Because the sample dataset is truly one row per item, any repeated rows represent an exact duplicate set of column values. In the context of the sample dataset described in the next section, two rows are considered duplicates only when order_id, customer_id, order_date, product_id, quantity, and unit_price column values all match.
Sample Dataset
Many SQL Server applications deal with an orders table or some other kind of flat table. The following script defines and populates an orders table with a flat data source that includes no parent/child relationships.
The dataset includes two exact duplicate row sets and eighteen unique rows.
The duplicate row sets include one with two duplicate rows and another with three duplicate rows.
use tiny_cte_dataset_ver1
go
-- create orders table
-- source for de-dupe example
drop table if exists orders;
create table orders (
order_id int,
customer_id int,
order_date date,
product_id int,
quantity int,
unit_price decimal(10,2)
);
-- insert orders data
insert into orders values
-- exact duplicate set #1
(201, 1, '2024-01-05', 10, 2, 25.00),
(201, 1, '2024-01-05', 10, 2, 25.00),
(201, 1, '2024-01-05', 10, 2, 25.00),
-- exact duplicate set #2
(218, 18, '2024-03-10', 10, 2, 25.00),
(218, 18, '2024-03-10', 10, 2, 25.00),
-- remaining unique rows
(202, 2, '2024-01-10', 11, 1, 100.00),
(203, 3, '2024-01-12', 12, 3, 15.00),
(204, 4, '2024-01-15', 13, 4, 12.50),
(205, 5, '2024-01-20', 14, 2, 40.00),
(206, 6, '2024-01-25', 15, 1, 75.00),
(207, 7, '2024-01-28', 10, 5, 25.00),
(208, 8, '2024-02-01', 12, 5, 10.00),
(209, 9, '2024-02-05', 11, 2, 100.00),
(210, 10, '2024-02-10', 13, 3, 40.00),
(211, 11, '2024-02-12', 14, 2, 75.00),
(212, 12, '2024-02-18', 15, 2, 60.00),
(213, 13, '2024-02-20', 10, 4, 25.00),
(214, 14, '2024-03-01', 11, 1, 100.00),
(215, 15, '2024-03-03', 12, 6, 10.00),
(216, 16, '2024-03-05', 13, 1, 40.00),
(217, 17, '2024-03-08', 14, 3, 75.00),
(219, 19, '2024-03-12', 12, 3, 12.50),
(220, 20, '2024-03-15', 15, 2, 60.00); Each duplicate row set is identical across all column values. The goal of the following code is to return two result sets – one row for each unique row and one for all duplicated rows.
Detecting and Returning Non-duplicated Rows
The following script uses a CTE and a trailing select statement to return non duplicated rows from the orders table.
The count(*) function returns an order_count value representing how many rows match across all six columns in the table. Rows with an order_count value of one are non duplicated, and rows with an order_count value greater than one belong to duplicate row sets.
The join brings the order_count value from the CTE into the trailing select, allowing the query to filter on order_count = 1 while still returning all full original rows.
-- CTE that identifies unique rows based on all columns
with counts as (
select *,
count(*) as order_count
from orders
group by
order_id,
customer_id,
order_date,
product_id,
quantity,
unit_price
)
select o.*
from orders o
join counts c
on o.order_id = c.order_id
and o.customer_id = c.customer_id
and o.order_date = c.order_date
and o.product_id = c.product_id
and o.quantity = c.quantity
and o.unit_price = c.unit_price
where c.order_count = 1
order by o.order_id;Here is a result set for the non-duplicated or unique rows from the preceding script.

A much more compact way to return the same result is to use a having clause, such as having count(*) = 1.
-- finding non-duplicated rows with a having clause
select *
from orders
group by order_id, customer_id, order_date, product_id, quantity, unit_price
having count(*) = 1Get List of Duplicated Rows
Here is a result set based on the preceding CTE example with an alternative criterion of “where c.order_count > 1” instead of “where c.order_count = 1” for the trailing select statement. The result set displays all rows for each duplicate order.
-- CTE that identifies duplicated rows based on all columns
with counts as (
select *,
count(*) as order_count
from orders
group by
order_id,
customer_id,
order_date,
product_id,
quantity,
unit_price
)
select o.*
from orders o
join counts c
on o.order_id = c.order_id
and o.customer_id = c.customer_id
and o.order_date = c.order_date
and o.product_id = c.product_id
and o.quantity = c.quantity
and o.unit_price = c.unit_price
where c.order_count > 1
order by o.order_id;Here is the result set that shows all rows that are duplicates along with each occurence of the duplicated row.

We could also having, but this only lists the duplicate row one time instead of every occurence.
-- finding non-duplicated rows with a having clause
select *
from orders
group by order_id, customer_id, order_date, product_id, quantity, unit_price
having count(*) > 1In order to get a list of all of the duplicate rows we would need to join back to the table and match on all columns such as was done in the CTE above.
Next Steps
Here are some ways you can extend the example demonstrated in this tip within your own environment:
- Make sure your data source dataset has just one item per parent row. Datasets with more than one item per parent row require a different CTE design to separately list non-duplicated and duplicated rows.
- A follow up tip will demonstrate a CTE design to de duplicate datasets with multiple line items per parent.
- The grouped count approach in this tip is flexible, readable, and easy to adapt to many real world de duplication scenarios. If you encounter edge cases in your own environment, consider posting a comment requesting a follow-up tip.
- Here are some related articles:
Rick Dobson is a SQL Server professional with decades of T-SQL experience that includes authoring books, running a national seminar practice, working for businesses on finance and healthcare development projects, and serving as a regular contributor to MSSQLTips.com. He has been a practicing Python developer for more than the past half decade – with a special emphasis for data visualization and ETL tasks with JSON and CSV files. His most recent professional passions include financial time series data and analyses, AI models, and statistics. If you are interested in growing your skills in any of these areas especially as they relate to financial securities, consider visiting his blog at https://securitytradinganalytics.blogspot.com/2023/12/.
- MSSQLTips Awards: Leadership (200+ tips) – 2025 | Author of the Year Contender – 2017-2024



I came here to say exactly the same thing Henn. Here’s a specific example:
;With CTE_Duplicates As
(
Select ROW_NUMBER () Over (Partition By DateOfBirth, ClientGroupID Order By ClientID) RowNumber
From dbo.Client
)
Delete
From CTE_Duplicates
Where (RowNumber > 1)
You only need to specify your meaningful columns once without all those interminable join conditions.
I’m doing this same more simpler way
When I have in table full duplicates – actually there is now way to operate only with one of them
I need to identify – I need to remove or something
with Identify_Duplicates as (
select *,
row_number() over (partition by all_meaningfull_columns order by rand()) nr
)
select * from
— or after check
— delete
Identify_Duplicates
where nr > 1
I agree that this group by gives same or similar result but using CTE like this is only way (I’ve found) to remove those duplicates