SQL COUNT and SQL COUNT DISTINCT in SQL Server
By: Eric Blinn | Updated: 2022-11-22 | Comments | Related: More > TSQL
I need to know what COUNT DISTINCT means in SQL Server. What is the difference between COUNT and COUNT DISTINCT? How does it handle NULL values? Are there performance concerns when using COUNT DISTINCT in a SQL database?
COUNT is an aggregate function used in T-SQL code to count the number of rows. Unlike other aggregate functions such as SUM or AVG, COUNT doesn't care what the values are in the column(s)—caring only that the rows exist. However, it can be modified by using the DISTINCT keyword to return a value that coincides with the number of unique values in the column in question.
This tutorial shows several examples of COUNT and COUNT DISTINCT to show the difference this keyword can make. All demos will be run in SQL Server Management Studio (SSMS) and against SQL Server 2019.
SQL COUNT vs COUNT(DISTINCT)
The following code creates a temporary table with four rows. There are two string columns (varchar data type), one numeric column, and one date column. One of the string columns allows NULL values while the other does not. Here is the syntax:
CREATE TABLE #AGG_TEST ( a_number INT NOT NULL , nullable_text VARCHAR(10) NULL , not_nullable_text VARCHAR(10) NOT NULL , a_date DATETIME2); INSERT INTO #AGG_TEST VALUES (1, 'RED', 'CAT', '16-Dec-2023') ,(2, 'BLUE', 'DOG', '23-Jan-2017') ,(3, NULL, 'DOG', '19-Aug-2013') ,(4, NULL, 'MOUSE', '14-Sep-2004');
The rows in the table should look exactly like this:
Now that the table has been created, what would a COUNT function return when run against this table with a SELECT statement? The COUNT function is commonly run with the * (star or asterisk) character within the parentheses. That is the first column in this query below. It is run again for each column individually. Finally, the SQL query is run with the constant values 1 and "hello".
SELECT COUNT(*) CountStar , COUNT(a_number) Counta_number , COUNT(nullable_text) CountNullableText , COUNT(not_nullable_text) CountNotNullableText , COUNT(a_date) Counta_date , COUNT(1) Count1 , COUNT('hello') CountHello FROM #AGG_TEST;
When executed, the query returns exactly one row as expected. Each of the COUNT functions returns the number 4, matching the number of rows in the table. Except for the nullable text column, which returns a 2, the number of non-null values.
SQL COUNT DISTINCT
Using the DISTINCT clause with the aggregate function COUNT works by adding the keyword within the parentheses and before the column name to be counted in the SELECT statement.
SELECT COUNT(DISTINCT a_number) Counta_number FROM #AGG_TEST;
Note: COUNT DISTINCT must be paired with a single column and cannot be used with the * character.
COUNT DISTINCT will work exactly like COUNT except that it will only count rows that are non-null AND are unique within the row.
Consider the non-nullable text column of the sample table. The value DOG appears twice within the column. That value will only be counted once in the COUNT DISTINCT instead of twice as it was in the regular COUNT function.
The nullable text column has two NULL values. Just like COUNT, they will not be considered at all. The NULL value will not be the third distinct value after RED and BLUE.
The following query is exactly the same as the one above except for the DISTINCT keywords and the removal of the first column, which cannot be included.
SELECT -- COUNT(DISTINCT *) CountStar <-- Not allowed COUNT(DISTINCT a_number) Counta_number , COUNT(DISTINCT nullable_text) CountNullableText , COUNT(DISTINCT not_nullable_text) CountNotNullableText , COUNT(DISTINCT a_date) Counta_date , COUNT(DISTINCT 1) Count1 , COUNT(DISTINCT 'hello') CountHello FROM #AGG_TEST;
Check out the query results below. They are pretty different from the results above.
The count of a_number and a_date remains the same since all four values in each column were unique. The value for nullable text also remained the same as there are only two non-null values, and both are unique. The two hard-coded values, 1 and "hello", dropped to 1 as only one value was used on every table row. Finally, the non-nullable text column returned a 3 instead of a 4, as neither "DOG" rows were counted.
If a fifth row were added to the table that was also DOG, it would not change the value of the COUNT DISTINCT for that column. See the following SQL statement:
INSERT INTO #AGG_TEST VALUES (5, 'GREEN', 'DOG', '18-Mar-2019'); SELECT COUNT(not_nullable_text) TotalRows , COUNT(DISTINCT not_nullable_text) DistinctNotNullableText FROM #AGG_TEST;
Performance Differences Between COUNT and COUNT(DISTINCT)
To see the performance difference between COUNT and COUNT(DISTINCT), the query needs to examine more than five rows. To follow along, these next demos will depend upon the WideWorldImporters sample database, which can be downloaded for free from GitHub.
Open an SSMS query window, connect to WideWorldImporters, and run this statement:
SET STATISTICS TIME,IO ON;
This will allow some performance statistics to appear in the SSMS messages tab.
Next, run a simple count of two columns within the largest table in the database:
SELECT COUNT(ColdRoomSensorNumber) CountSensor ,COUNT(Temperature) CountTemp FROM Warehouse.ColdRoomTemperatures_Archive;
The results tab will show two identical numbers because the columns in question are not nullable; thus, all rows are counted. The messages tab should now show a series of performance metrics. The logical reads measure how much disk and/or memory were used to run the query. The total for this query was 9,554. The elapsed time indicates this query ran in almost exactly one second while using just over six seconds of CPU time.
Here is the same query, but with the DISTINCT keyword added to both COUNT functions:
SELECT COUNT(DISTINCT ColdRoomSensorNumber) CountSensor ,COUNT(DISTINCT Temperature) CountTemp FROM Warehouse.ColdRoomTemperatures_Archive;
Executing it gives very different results. However, this demo focuses on the messages tab and the performance metrics.
The reads for all tables combined were almost 11 million. Most of that was in a temporary table where SQL Server likely sorted the rows too quickly to find and eliminate duplicate values. The execution time increased from one second to just over 38 seconds while using over 56 seconds of CPU time.
Asking SQL Server to examine every row of a large table or result set can be a huge performance issue in some scenarios, especially those with large row counts. Use caution when implementing COUNT DISTINCT.
COUNT DISTINCT has a little bit of a limited use case, but when a case inevitably comes up, it will be imperative to have this tool in your TSQL toolbox.
- Using the SQL Server Group By clause
- Understanding Aggregate Functions in SQL Server
- Using STATS TIME and IO
- All SQL Functions on MSSQLTips.com
- SQL Server T-SQL Aggregate Functions
- Using Window Functions to Create Smart, Quick Queries
About the author
View all my tips
Article Last Updated: 2022-11-22