SQL GROUP BY Queries to Summarize and Rollup Sets of Records

By:   |   Updated: 2022-09-08   |   Comments   |   Related: > TSQL


Problem

The SQL GROUP BY clause allows you to group data together to summarize data as well as figure out other things like minimum and maximum values for a group in a SQL database.  In this SQL tutorial we look at examples of how to use the SQL GROUP BY clause. 

Solution

This SQL tutorial will cover several examples of the GROUP BY clause to show how to use it. To follow along, please make sure the following are installed:

  1. SSMS
  2. SQL Server Database Engine
  3. AdventureWorks database

Grouping Data and Getting Sum of Values

If we want to sum the values in a column, we use the SUM function as shown below with the following SELECT statement:

SELECT [OrderDate], SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]

But when the above query is run, we get an error message. The error message says that the OrderDate is not in the aggregate or the group by.

Column 'Sales.SalesOrderHeader.OrderDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

To fix the problem, the OrderDate should be included in a GROUP BY clause as follows.

SELECT [OrderDate], SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY OrderDate

The query will show the sales data including the OrderDate and the SUM of Freight. As you can see, the GROUB BY is used in conjunction with aggregate functions (in this case, the SUM) to group by OrderDate.  Here is the result set:

basic group by example

GROUP BY Multiple Columns

Whenever you do not use an aggregate function on a column, you need to make sure you include that column in the GROUP BY statement.

In the example below, we are using an aggregate function on Freight, but not on OrderDate or ShipDate, so both of these need to be included in the GROUP BY.

SELECT [OrderDate], [ShipDate], SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY [OrderDate], [ShipDate]

If we do the following:

SELECT [OrderDate], [ShipDate], SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY [OrderDate]

We get this error.

Msg 8120, Level 16, State 1, Line 1
Column 'Sales.SalesOrderHeader.ShipDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

GROUP BY with Multiple Aggregate Functions

You can have several aggregate functions in a query. The SQL SELECT statement example shows a SQL query with two aggregate functions (sum and avg) and the GROUP BY clause.

SELECT [OrderDate], SUM([Freight]) as Freight, AVG(TaxAmt) as TaxAmt
FROM [Sales].[SalesOrderHeader]
GROUP BY OrderDate

Here is the result set:

group by with 2 aggregate functions

GROUP BY with SQL HAVING

In addition to using GROUP BY we can also add a check to only include certain groups when a value matches some criteria. In the example below, we are summing the Freight value, but we only want to include rows when the sum of Freight is greater than 400.

SELECT OrderDate, SUM([Freight]) as Freight 
FROM [Sales].[SalesOrderHeader]
GROUP BY OrderDate
HAVING SUM([Freight]) > 400

Here is the result set:

 group by error with the having clause

It is important to put the HAVING clause after the GROUP BY. If we do the following.

SELECT [OrderDate], SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
HAVING SUM([Freight]) > 400
GROUP BY OrderDate

We will get this error message:

Msg 156, Level 15, State 1, Line 4
Incorrect syntax near the keyword 'GROUP'.

GROUP BY with Expressions

You can also use expressions in the GROUP BY.

The following example shows how to include an expression in the GROUP BY where we are grouping by just the date part:

SELECT FORMAT(OrderDate,'yyyy.mm.dd') as OrderDate, SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY FORMAT(OrderDate,'yyyy.mm.dd')
HAVING SUM([Freight]) > 400

Here is the result set:

group by with expressions

GROUP BY with SQL ORDER BY

To display the data in a particular order, we can use ORDER BY to arrange in ascending (ASC) or descending (DESC) order.

In this query, we want the order of the results to show the highest sum of Freight values first.

SELECT FORMAT(OrderDate,'yyyy.mm.dd') as OrderDate, SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY FORMAT(OrderDate,'yyyy.mm.dd')
HAVING SUM([Freight]) > 400
ORDER BY SUM([Freight])

Here is the result set:

group by order by example

Again, it is important to put the ORDER BY at the end. If we try this query:

SELECT FORMAT(OrderDate,'yyyy.mm.dd') as OrderDate, SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY FORMAT(OrderDate,'yyyy.mm.dd')
ORDER BY SUM([Freight])
HAVING SUM([Freight])>400

We will get this error message:

Msg 156, Level 15, State 1, Line 5
Incorrect syntax near the keyword 'HAVING'.

So, we need to use this order in the query GROUP BY, HAVING, ORDER BY.

Also, the use of HAVING is not needed. You could just use GROUP BY and ORDER BY.

Group Data with ROLLUP Example

ROLLUP is an option of GROUP BY and creates combinations of columns specified. The following example will create different combinations of Status and Freight values, including NULL values. It is commonly used to generate subtotals. This will also include a total summary value for all records.

SELECT OrderDate, [Status], SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY ROLLUP (OrderDate, [Status])
HAVING SUM([Freight]) > 400

Here is the result set:

group by rollup example

Group Data with CUBE Example

The CUBE option with GROUP BY, creates groups of all possible combinations of the column values.

In the following query, we will show the results of all combinations of OrderDate and Status columns. This will also include total summary values for all combinations.

SELECT OrderDate, [Status], SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY CUBE (OrderDate, [Status])
HAVING SUM([Freight]) > 400

Here is the result set:

GROUP BY CUBE Example

Group Rows with GROUPING SETS Example

The following query will group the data together using results from both ROLLUP and CUBE in the following SELECT clause:

SELECT OrderDate,[Status], SUM([Freight]) as Freight
FROM [Sales].[SalesOrderHeader]
GROUP BY GROUPING SETS (ROLLUP(OrderDate,[Status]), CUBE(OrderDate,[Status]))
HAVING SUM([Freight]) > 400

Here is the result set:

GROUPING SETS
Next Steps

For more information, refer to these links:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Daniel Calbimonte Daniel Calbimonte is a Microsoft SQL Server MVP, Microsoft Certified Trainer and 6-time Microsoft Certified IT Professional. Daniel started his career in 2001 and has worked with SQL Server 6.0 to 2022. Daniel is a DBA as well as specializes in Business Intelligence (SSIS, SSAS, SSRS) technologies.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2022-09-08

Comments For This Article

















get free sql tips
agree to terms