Introduction to Subqueries in SQL Server
How and when should I use subqueries while working with Microsoft SQL Server? Under what circumstances should I use subqueries in SQL Server? In this tip we look at the best way to use subqueries.
What are subqueries?
Subqueries are queries that are nested inside of another query or statement. They are permitted wherever SQL Server would allow an expression, and are indicated by enclosing the subquery in parenthesis. For instance, using the sample AdventureWorks database, we might create a query to find any employees who have more vacation available than the average:
SELECT BusinessEntityID, LoginID, JobTitle, VacationHours FROM HumanResources.Employee E1 WHERE VacationHours > (SELECT AVG(VacationHours) FROM HumanResources.Employee E2)
In that example, we used the subquery in the WHERE clause and it returned a single value.
The part below is the subquery from the example above. It is the part of the query inside the parentheses.
(SELECT AVG(VacationHours) FROM HumanResources.Employee E2)
We can put subqueries in the FROM clause and access the rows returned from them like any other table. For instance, if we want to know which employees have more vacation hours than average for their job title, we might write:
SELECT E1.BusinessEntityID, E1.LoginID, E1.JobTitle, E1.VacationHours, Sub.AverageVacation --Drawn from the subquery FROM HumanResources.Employee E1 JOIN (SELECT JobTitle, AVG(VacationHours) AverageVacation FROM HumanResources.Employee E2 GROUP BY JobTitle) sub ON E1.JobTitle = Sub.JobTitle WHERE E1.VacationHours > Sub.AverageVacation ORDER BY E1.JobTitle
Correlated subqueries are subqueries which reference the main query or statement. This allows them to do a wide array of comparisons or manipulation. For instance, if we want to know which employees had their pay rates changed effective in 2003, we could use a correlated subquery like:
SELECT * FROM HumanResources.Employee E WHERE EXISTS (SELECT * FROM HumanResources.EmployeePayHistory EPH WHERE E.BusinessEntityID = EPH.BusinessEntityID --Comparing a value from the outer query AND YEAR(EPH.RateChangeDate) = 2002)
That correlated subquery compares values from the base table of the outer query with values from the table in the inner query. It would be relatively easy to rewrite that particular query by joining the tables rather than using a subquery. That is fairly common and many uses for subqueries could be reformulated to avoid using the subquery and instead use techniques like joins.
A slightly more sophisticated use for correlated subqueries allows for concatenations across rows, at least in SQL Server 2005 and later. For instance, if we wanted a concatenated list of all the titles that were associated with each organizational level from the HumanResources part of Adventure Works, we could use:
SELECT E.OrganizationLevel, STUFF((SELECT DISTINCT ', ' + E2.JobTitle FROM HumanResources.Employee E2 WHERE E.OrganizationLevel = E2.OrganizationLevel --Correlation between inner and outer queries FOR xml PATH ('')), 1, 1, '') FROM HumanResources.Employee E GROUP BY E.OrganizationLevel
That type of query would be difficult to reframe as a JOIN, though it can be rewritten using user defined functions. Jeff Moden has looked at the performance implications of the two techniques and found that this style, using the correlated subquery, is substantially faster than using the user defined functions.
While correlated subqueries like that can perform quite well, they can often cause performance issues since they may force the server to run the subquery for every row considered by the outer statement. Jeff Moden has an article detailing why this occurs and refers to one possible scenario as creating a triangular join. For performance reasons, correlated subqueries need to be used cautiously.
Common Table Expressions (CTEs)
Common Table Expressions (CTEs) were introduced in SQL Server 2005 and they are essentially a different syntax for creating a subquery. They can sometimes make it easier to read and maintain the code if it will be reused. This is especially so when you have multiple subqueries.
A CTE is formatted as: WITH [CTE Name] as ([query]). The CTE can then be referenced by name, but only within the statement it was created in. Any statement coming immediately before a CTE is declared must be terminated with a semicolon (;). You can have more than one CTE within a statement if they are separated by commas.
For instance, if we wanted details on employees that have worked in more than one department or worked more than one designated shift, we could use CTEs to handle the concatenation.
--Any previous query in the same batch must be terminated by a semicolon (;) WITH ConcatDepartments AS (SELECT EDH.BusinessEntityID, STUFF((SELECT ', ' + D.name FROM HumanResources.EmployeeDepartmentHistory EDH2 JOIN HumanResources.Department D ON EDH2.DepartmentID = D.DepartmentID WHERE EDH2.BusinessEntityID = EDH.BusinessEntityID --correllate inner and outer FOR xml PATH ('')), 1, 1, '') AS Departments, COUNT(*) AS NumDepartments FROM HumanResources.EmployeeDepartmentHistory EDH GROUP BY EDH.BusinessEntityID), --The comma allows another CTE in the same statement ConcatShifts AS (SELECT EDH.BusinessEntityID, STUFF((SELECT ', ' + S.Name FROM HumanResources.EmployeeDepartmentHistory EDH2 JOIN HumanResources.[Shift] S ON EDH2.DepartmentID = S.ShiftID WHERE EDH2.BusinessEntityID = EDH.BusinessEntityID --correllate inner and outer FOR xml PATH ('')), 1, 1, '') AS Shifts, COUNT(*) AS NumShifts FROM HumanResources.EmployeeDepartmentHistory EDH GROUP BY EDH.BusinessEntityID) SELECT P.BusinessEntityID, P.FirstName, P.LastName, CD.Departments, CS.Shifts FROM Person.Person P INNER JOIN ConcatDepartments CD ON P.BusinessEntityID = CD.BusinessEntityID INNER JOIN ConcatShifts CS ON p.BusinessEntityID = cs.BusinessEntityID WHERE CD.NumDepartments > 1 OR CS.NumShifts > 1;
A CTE can also reference itself, making it recursive. Among other things, this can be used for counting, though it may not be efficient or scale well, as Jeff Moden explained. For a straightforward example:
WITH CountingCTE AS (SELECT 1 AS num UNION ALL SELECT num + 1 FROM CountingCTE WHERE num < 7) SELECT * FROM CountingCTE
In more sophisticated usage, recursive CTEs are useful for working with a parent-child hierarchy.
- As Arshad Ali explains, one use for subqueries is to remove duplicate rows, especially if there is an identity column on the table.
- Edwin Sarmiento explained how to use CTES for recursive queries.
- Andy Brown has provided examples of using CTES to make code with subqueries easier to read.
About the author
View all my tips