mssqltips logo

SQL Server Window Functions LEAD and LAG



By:
Overview

In this part of the tutorial, well look at offset functions, especially the LAG and LEAD functions. They were introduced in SQL Server 2012 and they made writing specific patterns in T-SQL much easier. With those functions, complex queries using self-joins or cursors can be replaced with easier queries.

LAG and LEAD

The LAG function has the ability to fetch data from a previous row, while LEAD fetches data from a subsequent row. Both functions are very similar to each other and you can just replace one by the other by changing the sort order.

Using the AdventureWorks data warehouse, well calculate the sales amount of the previous year.

SELECT
     [Year]                         = YEAR([OrderDate])
    ,[Sales Amount]                 = SUM([SalesAmount])
    ,[Sales Amount Previous Year]   = LAG(SUM([SalesAmount])) OVER (ORDER BY YEAR([OrderDate]))
FROM [dbo].[FactResellerSales]
GROUP BY YEAR([OrderDate])
ORDER BY [Year];

We can see that for each year, the data of the previous year has been fetched in the second column:

offset function

The LAG/LEAD function has also two optional parameters:

  • The offset. The default is 1, but you can jump back more rows by specifying a bigger offset. You cannot specify a negative value.
  • A default value. When there is no previous row (in the case of LAG), NULL is returned. You can see this in the screenshot in the first row. You can specify a default value to be returned instead of NULL.

If we would sort descending in the window function, LAG will fetch the next rows value instead of the previous one:

lag descending is lead

To show you the contrast, this is how the previous year values needed to be calculated before LAG/LEAD were introduced:

WITH CTE_Years AS
(
    SELECT
         [Year]                         = YEAR([OrderDate])
        ,[Sales Amount]                 = SUM([SalesAmount])
    FROM [dbo].[FactResellerSales]
    GROUP BY YEAR([OrderDate])
)
, CTE_PY AS
(
    SELECT
         y1.[Year]
        ,y1.[Sales Amount]
        ,[Sales Amount Previous Year] = y2.[Sales Amount]
    FROM CTE_Years y1
    LEFT JOIN CTE_Years y2 ON ([y2].[Year] + 1) = [y1].[Year]
)
SELECT * FROM [CTE_PY]
ORDER BY [Year];

As you can see, the query is a bit more elaborate since an extra step needs to be taken: first the sales amount per year needs to be calculated, then the results need to be joined to itself in order to fetch the data from the previous year.

Calculating Year-over-Year growth

Using LAG, its easy to calculate the year-over-year growth of sales. Lets reuse the query from the previous example:

WITH CTE_PY AS
(
    SELECT
         [Year]                         = YEAR([OrderDate])
        ,[Sales Amount]                 = SUM([SalesAmount])
        ,[Sales Amount Previous Year]   = LAG(SUM([SalesAmount]))
                                            OVER (ORDER BY YEAR([OrderDate]))
    FROM [dbo].[FactResellerSales]
    GROUP BY YEAR([OrderDate])
)
SELECT
     [Year]
    ,[Sales Amount]
    ,[Sales Amount Previous Year]
    ,[YoY Growth]                   = 100.0 * ([Sales Amount] - [Sales Amount Previous Year])
                                        / [Sales Amount Previous Year]
FROM [CTE_PY]
ORDER BY [Year];

This gives us the following result:

yoy growth

FIRST_VALUE and LAST_VALUE

Both functions are straight forward: they either return the first or the last value of an ordered set. Lets illustrate using the sample data introduced in the previous part of the tutorial:

SELECT
     [Group]
    ,[Value]
    ,FirstValue    = FIRST_VALUE([Value]) OVER (PARTITION BY [Group] ORDER BY [Value])
    ,LastValue     = LAST_VALUE([Value])  OVER (PARTITION BY [Group] ORDER BY [Value])
FROM [dbo].[Test];

The results:

offset functions

The FIRST_VALUE function returns nicely the first value for each group. The LAST_VALUE function however suffers from the same issue as the MAX function in part 4 of the tutorial: it returns the last value of the current frame, which is giving the ascending sort the same value as the current row. If you want to find the maximum value, you better use the MAX function with a descending sort or specify a correct frame extent (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for example).

Additional Information

Last Update: 5/31/2018




More SQL Server Solutions











Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name
*Email
Email me updates

Signup for our newsletter

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.






download





get free sql tips

I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



Learn more about SQL Server tools