solving sql server problems for millions of dbas and developers since 2006



SQL Server DBA Tips SQL Server Developer Tips SQL Server Business Intelligence Tips SQL Server Career Tips SQL Server Tip Categories SQL Server Tutorials SQL Server Webcasts SQL Server Whitepapers SQL Server Tools SQL Server Questions and Answers MSSQLTips Authors About MSSQLTips SQL Server User Groups MSSLQTips Giveaways MSSQLTips Advertising Options

MSSQLTips Facebook Page MSSQLTips LinkedIn Page MSSQLTips RSS Feed MSSQLTips Twitter Page MSSQLTips Google+ Page





SQL Product Highlight

SQL Sentry, Inc. - SQL Sentry Performance Advisor for SQL Server

SQL Sentry Performance Advisor for SQL Server delivers an advanced performance dashboard with relevant Windows and SQL Server metrics in a single view along with detailed insight of heavy SQL, blocking, deadlocks, and disk bottlenecks. Performance Advisor is packed with ground-breaking features that are not found in any other performance monitoring software, all designed with the singular goal of simplifying the process of optimizing your SQL Server performance.

Learn more!




Using CROSS JOIN queries to find records out of sequence

By: | Read Comments (8) | Print



Related Tips: More

Problem
This topic is nothing new, but well worth revisiting for those who have not used CROSS JOINS. You may have faced a problem where you need to compare historical records for a particular set of records to ensure that records have been saved in the correct order. This could be for comparisons of dates or numbers or anything.  The need is to ensure that the historical records have been saved in the correct order and this tip will show you a simple technique for identifying records that have been stored out of order for a subset of records.

Solution
Let’s say I have table in which there are three fields:

  • PersonID
  • Version
  • DEDate  (date entry date)

Here, PersonID is an integer field and could be considered our primary ID. Version is again an integer field and could be considered a secondary ID. So, in all PersonID and Version will form the Composite Key. There can be many versions of the same PersonID and each Version would have a corresponding Data Entry Date (DEDate). Our problem is to compare the Data Entry Dates of all the versions of a particular person on an iterative basis using a single query to ensure the Data Entry Dates are in order based on the Version for a particular person.

Data Entry Date’s for all versions of a particular PersonID should be compared in in such a way that the succeeding version’s Data Entry Date should be greater than its preceding version’s Data Entry Date. If the records are stored correctly, then that PersonID’s version is said to be in proper order, otherwise it is said to be in an improper order.

I have tried lots of ways to tackle this problem, using cursors, loops, subqueries, etc... , but the best solution I have found is by using CROSS JOINS.

The following script will help you understanding the problem and its solution.  I am creating a new table, adding some data and then using the query to determine which records are out of order.

CREATE TABLE [dbo].[Person](
     [PersonID] [int] NOT NULL,
     [Version] [int] NOT NULL,
     [DEDate] [datetime] NULL,
CONSTRAINT [PK_Person] PRIMARY KEY CLUSTERED 
(
     [PersonID] ASC,
     [Version] ASC
))
GO
INSERT INTO Person VALUES(1,0,'03/10/2000')
INSERT INTO Person VALUES(1,1,'03/16/2000')
INSERT INTO Person VALUES(1,2,'03/19/2000')
INSERT INTO Person VALUES(1,3,'03/18/2000')
INSERT INTO Person VALUES(1,4,'03/17/2000')
INSERT INTO Person VALUES(2,0,'02/10/2000')
INSERT INTO Person VALUES(2,1,'02/11/2000')
INSERT INTO Person VALUES(2,2,'02/18/2000')
INSERT INTO Person VALUES(3,0,'03/25/2000')
INSERT INTO Person VALUES(3,1,'03/23/2000')
INSERT INTO Person VALUES(3,2,'03/26/2000')
INSERT INTO Person VALUES(3,3,'03/30/2000')
INSERT INTO Person VALUES(4,0,'08/19/2000')
INSERT INTO Person VALUES(4,1,'08/20/2000')
INSERT INTO Person VALUES(4,2,'08/23/2000')
INSERT INTO Person VALUES(4,3,'08/24/2000') 
GO
SELECT *,
     (SELECT 
          CASE WHEN (SUM(CASE WHEN B.DEDate<C.DEDate THEN 0 
               ELSE 1 
               END))>=1 THEN 1 
          ELSE 0 
          END AS IsNotProper 
     FROM Person2 B CROSS JOIN 
          Person2 C
     WHERE B.PersonID=A.PersonID 
          AND C.PersonID=A.PersonID 
          AND B.Version<>C.Version 
          AND B.Version<C.Version) 
     AS [OutOfOrder] 
FROM Person2 A 

In the output it will display all the records with a flag called OutOfOrder. If the flag is 1 that means there is a problem with the record order. Otherwise, the PersonID’s versions are in proper order.

In this example we can see that PersonID 1 has two records that are out of order.  The 2003-03-18 record came before the 2008-03-17 record, so the query picked these up as being out of order.  Also, for PersonID 3 the 2000-03-25 record came before the 2000-03-23 record, so again these are out of order.

So, instead of using loops and cursors to determine if there is an issue it handles everything in a single query.


In this next example we are looking at just one record (PersonID =1) and we can also see the execution plan that is generated.

SELECT *,
     (SELECT 
          CASE WHEN (SUM(CASE WHEN B.DEDate<C.DEDate THEN 0 
               ELSE 1 
               END))>=1 THEN 1 
          ELSE 0 
          END AS IsNotProper 
     FROM Person2 B CROSS JOIN 
          Person2 C
     WHERE B.PersonID=A.PersonID 
          AND C.PersonID=A.PersonID 
          AND B.Version<>C.Version 
          AND B.Version<C.Version) 
     AS [OutOfOrder] 
FROM Person2 A 
WHERE A.PersonID = 1 

Query Output

Statistics I/O

(5 row(s) affected)

Table 'Worktable'. Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Person'. Scan count 7, logical reads 14, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Execution Plan

 


Next Steps

  • In this article dates are compared, but the same process can be used if you have some other field that needs to be compared
  • Even multiple comparisons can be made on many fields at the same time.
  • You can even look at the performance impact by using a huge amount of data to see that this approach is still easier and faster than using loops and/or cursors.


Related Tips: More | Become a paid author


Last Update: 1/13/2009

Share: Share 






Comments and Feedback:

Tuesday, January 13, 2009 - 4:38:09 PM - jcelko Read The Tip

 let's first try to get the data element names into ISO-11179 rules and write it in Standard SQL instead of local dialect.  Let's get the sdates into ISO-8601 format, too.  Since Data Entry date is physical meteadata, that column should be the posting date or some other logical data elemetn name.  Actuaqlly, if the version number is temporal then it is redudant. But ignoring that

CREATE TABLE People  -- tabls have co9llective names
(person_id INTEGER NOT NULL,
 version_nbr INTEGER NOT NULL,
  posting_date DATE NOT NULL,
 PRIMARY KEY (person_id, version_nbr));

INSERT INTO People VALUES
VALUES (1, 0, '2000-03-10'),
       (1, 1, '2000-03-16'),
       (1, 2, '2000-03-19'),
       (1, 3, '2000-03-18'),
       (1, 4, '2000-03-17'),
       (2, 0, '2000-02-10'),
       (2, 1, '2000-02-11'),
       (2, 2, '2000-02-18'),
       (3, 0, '2000-03-25'),
       (3, 1, '2000-03-23'),
       (3, 2, '2000-03-26'),
       (3, 3, '2000-03-30'),
       (4, 0, '2000-08-19'),
       (4, 1, '2000-08-20'),
       (4, 2, '2000-08-23'),
       (4, 3, '2000-08-24');

With the new functions, your query is fast and easy:

SELECT DISTINCT X.person_id, X.version_nbr, X.posting_date
  FROM (SELECT person_id, version_nbr, posting_date
               ROW_NUMBER()
               OVER (PARTITION BY person_id
                         ORDER BY version_nbr) As version_seq,
               ROW_NUMBER()
               OVER (PARTITION BY person_id
                         ORDER BY posting_date) As posting_seq
         FROM People) AS X
 WHERE X.version_seq <> X.posting_seq;



Tuesday, January 13, 2009 - 4:42:09 PM - admin Read The Tip

Thanks for posting a modified version.


Wednesday, January 28, 2009 - 5:57:34 AM - Jeff Moden Read The Tip

Oh, be careful... the original code contains "qualified" Triangular Joins.  If you take a peek at the arrows in the execution plan, you'll find row counts of 107 and 66 to process just the 16 that are the original data.  For more on what a "Triangular Join" is, please see the following article...

 http://www.sqlservercentral.com/articles/T-SQL/61539/

 


Wednesday, January 28, 2009 - 6:54:50 AM - Sukhotinsky Read The Tip

[quote user="jcelko"]...With the new functions, your query is fast and easy:

SELECT DISTINCT X.person_id, X.version_nbr, X.posting_date
  FROM (SELECT person_id, version_nbr, posting_date
               ROW_NUMBER()
               OVER (PARTITION BY person_id
                         ORDER BY version_nbr) As version_seq,
               ROW_NUMBER()
               OVER (PARTITION BY person_id
                         ORDER BY posting_date) As posting_seq
         FROM People) AS X
 WHERE X.version_seq <> X.posting_seq;
[/quote] You seem to be solving a bit different problem, Consider the set of data:

INSERT INTO People VALUES
       (1, 1, '2099-01-01'),
       (1, 2, '2002-01-01'),
       (1, 3, '2003-01-01'),
       (1, 4, '2004-01-01'),
       (1, 5, '2005-01-01'),
       (1, 6, '2006-01-01'),
       (1, 7, '2007-01-01'),
       (1, 8, '2008-01-01'),
       (1, 9, '2009-01-01'),

Only first record is out of order according to the guy's requirement "the succeeding version’s Data Entry Date should be greater than its preceding version’s Data Entry Date.". All but the first record have succeeding version’s posting_date greater than its posting_date. But your query returns that all the records are out of order for that set of data.

Self joins seem to be a way to go for such kind of problems. "SQL Puzzles And Answers" by Joe Celko is incredible book on the subject. The only thing I'd like to add is that for really huge sets of data I would execute stored procedures page by page. Just remember current person_id and version_nbr between the executing the stored proc and handle only page_size (roughly) number of records per the execution.


Wednesday, January 28, 2009 - 7:30:27 AM - tonimm Read The Tip

Here is a self-join to get the rows out of order. 

 

select     p2.personid outoforderperson,p2.version outoforderver, p2.dedate outoforderdate,
    p1.personid inorderpers,p1.version inorderver, p1.dedate inorderdate

from person p1
join person p2
    on p1.personId = p2.personid and p1.version < p2.version and p1.dedate > p2.dedate

 

Pardon me if the format is screwy.  This is my first post to the site

 

Toni


Wednesday, January 28, 2009 - 7:33:39 AM - tonimm Read The Tip

 Or if you want just order based solely on the preceding id:

select     p2.personid outoforderperson,p2.version outoforderver, p2.dedate outoforderdate,
    p1.personid inorderpers,p1.version inorderver, p1.dedate inorderdate

from person p1
join person p2
    on p1.personId = p2.personid and p1.version = p2.version -1 and p1.dedate > p2.dedate

 

Toni


Wednesday, January 28, 2009 - 8:38:17 AM - ahains Read The Tip

I don't see why a cross join was used here. If I change the CROSS JOIN to INNER JOIN, and change the WHERE clause to ON, I get the same query plan and hence same result.


Wednesday, January 28, 2009 - 9:17:28 AM - ahains Read The Tip

 I rewrote it the following way. It is a different result than the original query because I want just the wrong versions flagged. If there are a great multitude of versions I would like to be able to see which specific version is wrong. Often times it would be better not to include columns like DEDate from the CTE (doesn't participate in the rowNum generation), but in this particular case it is preferable to leave it in since we have it "in hand" in the (clustered) index it is running over. I just guessed at how to handle null dates since it wasn't spelled out.


;with cte as
(SELECT
    PersonId
    ,Version
    ,DEDate
    ,rowNum=row_number() over (partition by personid order by version)
from Person
)
select
    a.PersonId
    ,a.Version
    ,a.DEDate
    ,OutOfOrder =
        case
            when b.PersonId is null then cast(0 as bit)
            when a.DEDate > isnull(b.DEDate, 0) then cast(1 as bit)           
            else 0
        end           
from cte as a
    left outer join cte as b on a.PersonId = b.PersonId
        and a.rowNum + 1 = b.rowNum



Post a Comment or Question

Keep it clean and stay on the subject or we may delete your comment.
Your email address is not published. Required fields are marked with an asterisk (*)

*Name   *Email   Notify for updates
Comments
*Enter Code refresh code


 
Sponsor Information
"SQL diagnostic manager delivers response in minutes, not hours!"

SQL Monitor – For database professionals who need results on Day One. Try it online.

Need SQL Server help and not sure where to turn? Reach out to expert consultants for a Health Check.

Get SQL Server Tips Straight from Kevin Kline.

Join the over million SQL Server Professionals who get their issues resolved daily.

Valuable SQL Server web casts on Performance Tuning, Development, Administration, SSIS and more...


Copyright (c) 2006-2012 Edgewood Solutions, LLC All rights reserved
privacy | disclaimer | copyright | advertise | about
authors | contribute | feedback | giveaways | user groups
Some names and products listed are the registered trademarks of their respective owners.


Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com