SQL Servers Lag and Lead Functions to Help Identify Date Differences
By: Thomas LeBlanc | Updated: 2015-01-19 | Comments (3) | Related: More > Functions - System
Analytics in the Health Care industry requires locating previous visits or claims that indicate if a patient/member is being readmitted to a hospital or rehab facility. Basic T-SQL is very difficult to use in order compute these rates when not using temporary tables. The T-SQL statement does become hard to follow if you are not an expert in set based theory nor is it very readable. Are there any simple answers to find previous records for readmissions in our example?
SQL Server added the LAG and LEAD functions in T-SQL to help find a previous record related to the key fields. The addition of these functions help with readability and simplicity. No longer do you need to use temporary tables although using staging tables for further calculation could be required.
The following is a view of the Claims structure. Our date columns used will be StartDate and EndDate.
Here are the CREATE TABLE and INSERT statements for this example
SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO SET ANSI_PADDING ON GO CREATE TABLE [dbo].[ClaimHeader]( [ClaimNumber] [varchar](20) NOT NULL, [PatID] [varchar](50) NOT NULL, [ClaimStartDate] [date] NOT NULL, [ClaimEndDate] [date] NOT NULL ) ON [PRIMARY] GO SET ANSI_PADDING OFF GO INSERT INTO dbo.ClaimHeader VALUES ('123','AB0123','2014-01-21','2014-01-28')' INSERT INTO dbo.ClaimHeader VALUES ('124','AB0123','2014-03-01','2014-03-09')' INSERT INTO dbo.ClaimHeader VALUES ('125','AB0123','2014-03-28','2014-03-31')' INSERT INTO dbo.ClaimHeader VALUES ('126','AB0123','2014-03-31','2014-04-10')'
A Look at the Data
We are going to use the ClaimEndDate and ClaimStartDate to find our readmissions. Here is a view of some of the data and what we are trying to accomplish.
Using the SQL Server LAG Function to get Previous End Date
If you look closely, you will see that the first claims started on 1/21/14 and ended 1/28/14. The next claim starts on 3/1/14. This indicates that the second admission did not happen within 30 days of the previous claim. Thus, it is not a readmission within 30 days.
But, the third claim started on 3/28/14 and the previous (2nd claim) ended 3/9/14 which is within 30 days of each other.
Letís use the LAG functions to get the days between claims. The first parameter in the function is the date we are using in the current row - ClaimEndDate. The second parameter (i.e. 1) is the previous occurrence in the set returned with Null being used for the last parameter for the return value if nothing is found by the LAG function.
-- DateDiff( day, Previous End Date, Current Start Date) AS DiffDays /* LAG (scalar_expression [,offset] [,default]) OVER ( [ partition_by_clause ] order_by_clause ) */ SELECT ClaimNumber, PatID AS PatientID, ClaimStartDate, ClaimEndDate, LAG( ClaimEndDate, 1, Null) OVER (PARTITION BY PatID ORDER BY ClaimStartDate, ClaimEndDate) AS PreviousClaimEndDate FROM dbo.ClaimHeader
OVER divides the data into partitions (PARTITIONED BY - PatID) containing a rowset from the FROM (ClaimHeader) clause using ORDER BY (StartDate and EndDate) to sort the set. This is how we get claims for a Patient ordered by ClaimEndDate ascending.
This use of the LAG function gives us the Previous ClaimEndDate and with this we can calculate the day difference between Start and End.
-- DateDiff( day, Previous End Date, Current Start Date) AS DiffDays SELECT *, DATEDIFF ( day, PreviousClaimEndDate, ClaimStartDate) AS DiffDays FROM ( SELECT ClaimNumber, PatID AS PatientID, ClaimStartDate, ClaimEndDate, LAG( ClaimEndDate, 1, Null) OVER (PARTITION BY PatID ORDER BY ClaimStartDate, ClaimEndDate) AS PreviousClaimEndDate FROM dbo.ClaimHeader ) LagSelect
From this output, we can see the first Claim does not have a previous claim, so it is not a re-admit within 30 days. The second claim can be excluded from readmit count because the DiffDays is 32 days.
Now, the third claim is within 19 days of the previous, so it will be included in the 30 readmit count, but 4th claim has a DiffDays of 0. This was interesting because it would seem the patient went home and was re-admitted the same day of discharge. The Metrics team took note of this and examined the data more closely and was able to find other columns in the claim that indicated a transfer to a different facility, thus excluding it from the count.
So, the LAG (and possibly the LEAD) functions can really help in looking at before and after records to compute date differences from a single T-SQL statement that is more readable and logical than previous methods for gathering this information.
- Read more about LAG and LEAD functions
Last Updated: 2015-01-19
About the author
View all my tips