Checking for potential constraint violations before entering SQL Server TRY and CATCH logic


By: Aaron Bertrand   |   Read Comments (5)   |   Related Tips: More > Error Handling



Problem

With the advent of SQL Server 2005, many people adopted the belief that they could abandon all of their proactive error handling techniques and use this shiny new TRY/CATCH mechanism. While TRY/CATCH certainly makes for easier programming, and aligns in general with methodologies used in more mature languages, it isn't always a "free" way to perform error handling.

Solution

It is easy to test whether we should let the engine capture all of our exceptions for us, or whether it is worth the effort to prevent exceptions when we can. Let's consider a simple table with a primary key containing 100,000 unique values:

CREATE TABLE dbo.Numbers
(
  n INT PRIMARY KEY
);

INSERT dbo.Numbers(n)
SELECT TOP (100000) rn = ROW_NUMBER() OVER
  (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.objects AS s2;

Now given this table, let's consider a very contrived example where we have users trying to insert numbers into the table. They may not know there's a primary key, and they may not have the ability to check what values are already there. They just know that they need to call a stored procedure called something like dbo.InsertNumber and pass the number they want to add to the table. One way that dbo.InsertNumber could be coded is to just try and insert the parameter passed in, and bubble an exception back to the caller when it fails. But most of us have procedures that are a little more user friendly than that. We probably wrap our INSERT statement inside of TRY/CATCH like the following (I'll leave out friendlier error messages, logging, etc. for the purposes of this discussion, and we can pretend that there is more than just the INSERT going on inside the transaction):

CREATE PROCEDURE dbo.InsertNumber_JustInsert
  @Number INT
AS
BEGIN
  SET NOCOUNT ON;

  BEGIN TRY
    BEGIN TRANSACTION;
    INSERT dbo.Numbers(n) SELECT @Number;
    COMMIT TRANSACTION;
  END TRY
  BEGIN CATCH
    -- error handling goes here
    ROLLBACK TRANSACTION;
  END CATCH
END
GO

In most scenarios, if you are doing performance testing, you probably don't really test the throughput for this procedure when the exception is raised, because most people think an exception is the exception (no pun intended). I've long been a believer that exception handling is expensive, and as such I've usually written the procedure this way (the difference from the first version is the IF NOT EXISTS check wrapped around the original logic):

CREATE PROCEDURE dbo.InsertNumber_CheckFirst
  @Number INT
AS
BEGIN
  SET NOCOUNT ON;

  IF NOT EXISTS
  (
    SELECT 1 FROM dbo.Numbers WHERE n = @Number
  )
  BEGIN
    BEGIN TRY
      BEGIN TRANSACTION;
      INSERT dbo.Numbers(n) SELECT @Number;
      COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
      -- error handling goes here
      ROLLBACK TRANSACTION;
    END CATCH
  END
END
GO

Obviously it is still possible that, even with the check, in a highly concurrent system some other session may have inserted the same number I'm attempting after I've performed the existence check but before I've actually attempted the INSERT. I've purposely left non-default isolation levels out of the discussion to keep it simple, but for now we can just assume that in some cases we will still hit our CATCH block.
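As an aside, one common way to close that race without relying on the CATCH block at all is to hold an update/range lock during the existence check. This is only a sketch, not part of the original tip, and the procedure name is hypothetical:

```sql
-- Sketch: serialize the check-then-insert so a concurrent session performing
-- the same check on the same value blocks until this transaction commits.
CREATE PROCEDURE dbo.InsertNumber_Serialized  -- hypothetical name
  @Number INT
AS
BEGIN
  SET NOCOUNT ON;

  BEGIN TRANSACTION;

  IF NOT EXISTS
  (
    -- UPDLOCK + HOLDLOCK takes and holds an update/range lock on the key,
    -- so no other session can slip an insert in between the check and ours
    SELECT 1 FROM dbo.Numbers WITH (UPDLOCK, HOLDLOCK)
    WHERE n = @Number
  )
  BEGIN
    INSERT dbo.Numbers(n) SELECT @Number;
  END

  COMMIT TRANSACTION;
END
GO
```

The trade-off is reduced concurrency on that key range while the transaction is open, which is exactly why I've kept lock hints and isolation levels out of the main discussion.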

Now, you may not be convinced that this is actually faster, but think about this: the EXISTS check performs a seek, but so does the code that checks for a violation against the index. So that part is essentially a wash, and then add both the transaction and error handling scaffolding, and I suspect that in cases where the value already exists, that will make the initial check negligible in comparison.
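Before running a full test, a quick spot check (hypothetical, not part of the test harness below) with STATISTICS IO and TIME against a value that already exists can show that both paths perform a seek against the primary key:

```sql
-- Sketch: compare the work done for a value that already exists (n = 1)
SET STATISTICS IO, TIME ON;

EXEC dbo.InsertNumber_CheckFirst @Number = 1; -- seek in the EXISTS check; no insert attempted
EXEC dbo.InsertNumber_JustInsert @Number = 1; -- seek during constraint enforcement, plus
                                              -- transaction and exception-handling overhead
SET STATISTICS IO, TIME OFF;
```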

But let's not take my relatively unscientific word for it; let's test it! Let's say we want to test inserting 100,000 numbers, in four different scenarios:

  1. All inserts succeed
  2. 66% of inserts succeed
  3. 33% of inserts succeed
  4. No inserts succeed

To perform this testing, we need two things: (a) a procedure that loops and tries to insert unique and non-unique rows using either of the two procedures above, and (b) a log table that will keep track of how long each batch takes. Here is the log table:

CREATE TABLE dbo.InsertLog
(
  Test VARCHAR(32),
  Step TINYINT,
  UniqueRows INT,
  dt DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);
GO

And here is the procedure - which takes @Test and @UniqueRows as parameters, iterates through a cursor that combines the right mix of numbers that will and won't violate the constraint, and calls the relevant stored procedure based on the test (which, for simplicity, I'll call 'JustInsert' and 'CheckFirst'):

CREATE PROCEDURE dbo.InsertNumber_Wrapper
  @Test       VARCHAR(32),
  @UniqueRows INT
AS
BEGIN
  SET NOCOUNT ON;

  -- always clear out rows added by previous runs
  DELETE dbo.Numbers WHERE n > 100000;

  -- record a log entry for step 1 (start)
  INSERT dbo.InsertLog(Test, Step, UniqueRows) SELECT @Test, 1, @UniqueRows;

  DECLARE @n INT;

  DECLARE c CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR
    SELECT n FROM
    (
      -- to generate successes; unique values come from > 100000
      SELECT TOP (@UniqueRows) n = n + 100000 FROM dbo.Numbers
      ORDER BY n
    ) AS x(n)
    UNION ALL
    SELECT n FROM
    (
      -- to generate failures; duplicate values come from <= 100000
      SELECT TOP (100000 - @UniqueRows) n FROM dbo.Numbers
      ORDER BY n
    ) AS y(n)
    ORDER BY n;

  OPEN c;

  FETCH NEXT FROM c INTO @n;

  -- trudge through cursor
  WHILE (@@FETCH_STATUS <> -1)
  BEGIN
    IF @Test = 'JustInsert'
      EXEC dbo.InsertNumber_JustInsert @Number = @n;

    IF @Test = 'CheckFirst'
      EXEC dbo.InsertNumber_CheckFirst @Number = @n;

    FETCH NEXT FROM c INTO @n;
  END

  CLOSE c;
  DEALLOCATE c;

  -- record a log entry for step 2 (end)
  INSERT dbo.InsertLog(Test, Step, UniqueRows) SELECT @Test, 2, @UniqueRows;
END
GO

Now we have a table to operate on, a procedure to call, and a log table to store results. The following batch will test the inserts in each scenario I've described:

EXEC dbo.InsertNumber_Wrapper @Test = 'CheckFirst', @UniqueRows = 100000;
EXEC dbo.InsertNumber_Wrapper @Test = 'CheckFirst', @UniqueRows = 66000;
EXEC dbo.InsertNumber_Wrapper @Test = 'CheckFirst', @UniqueRows = 33000;
EXEC dbo.InsertNumber_Wrapper @Test = 'CheckFirst', @UniqueRows = 0;
EXEC dbo.InsertNumber_Wrapper @Test = 'JustInsert', @UniqueRows = 100000;
EXEC dbo.InsertNumber_Wrapper @Test = 'JustInsert', @UniqueRows = 66000;
EXEC dbo.InsertNumber_Wrapper @Test = 'JustInsert', @UniqueRows = 33000;
EXEC dbo.InsertNumber_Wrapper @Test = 'JustInsert', @UniqueRows = 0;

On my system, this took about four minutes to run. I ran the test 3 times to smooth out any anomalies. We can now look at the log table and see how long each test took, on average:

;WITH s AS
(
  SELECT Test, Step, UniqueRows, dt,
         rn = ROW_NUMBER() OVER (PARTITION BY Test, UniqueRows ORDER BY dt)
  FROM dbo.InsertLog
),
x AS
(
  SELECT s.Test, s.UniqueRows,
         [Duration] = DATEDIFF(MILLISECOND, s.dt, e.dt)
  FROM s INNER JOIN s AS e
    ON  s.Test = e.Test
    AND s.UniqueRows = e.UniqueRows
    AND s.rn = e.rn - 1
    AND s.Step = 1
    AND e.Step = 2
)
SELECT [Test], UniqueRows, [Avg] = AVG([Duration]*1.0)
FROM x
GROUP BY [Test], UniqueRows
ORDER BY UniqueRows, [Test];

Results:

Results of aggregate query

If we plot these on a graph, we can see that checking first yields significantly better throughput in all but one case (at least in terms of duration; we did not separately measure the I/O or CPU required to achieve these speeds):

Graph results of aggregate query

In the case where all insert attempts fail, we average over 11 inserts per millisecond if we check for the violation first, but only 3.5 inserts per millisecond if we let the engine raise the exception. As the number of successful inserts increases, the time spent writing rows grows faster than the time spent dealing with failures. In the case where 33% of the insert attempts succeed, the average drops to 4.4 inserts per millisecond when we check first, compared to 2.9 inserts per millisecond when we catch the error. At a 66% success rate, the two become very close - 2.7 inserts per millisecond when we check, and 2.4 inserts per millisecond when we don't. Only when we get to a 100% success rate (I imagine the tipping point is somewhere in the 90% range) do the scales tip, and preventing the exception finally costs more (1.9 inserts per millisecond) than just letting the system handle it (2.0 inserts per millisecond).
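The inserts-per-millisecond figures quoted above can be derived directly from the same log data; this variation on the aggregate query (a sketch, dividing the 100,000 attempts per run by the average duration) produces them:

```sql
;WITH s AS
(
  SELECT Test, Step, UniqueRows, dt,
         rn = ROW_NUMBER() OVER (PARTITION BY Test, UniqueRows ORDER BY dt)
  FROM dbo.InsertLog
),
x AS
(
  SELECT s.Test, s.UniqueRows,
         [Duration] = DATEDIFF(MILLISECOND, s.dt, e.dt)
  FROM s INNER JOIN s AS e
    ON  s.Test = e.Test
    AND s.UniqueRows = e.UniqueRows
    AND s.rn = e.rn - 1
    AND s.Step = 1
    AND e.Step = 2
)
SELECT [Test], UniqueRows,
       -- 100,000 attempts per run divided by the average duration in ms
       [InsertsPerMs] = 100000.0 / AVG([Duration]*1.0)
FROM x
GROUP BY [Test], UniqueRows
ORDER BY UniqueRows, [Test];
```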

Conclusion

I've shown that while checking for potential violations first can slightly hamper throughput in cases where you expect very few constraint violations, the benefit can be quite dramatic as the failure rate goes up.

I've obviously run a very isolated test here, and have used a lot of constants that should be variable in your own tests, including adjusting the isolation level, not using explicit transactions, playing with the various data types and constraints in your system, and dealing with concurrency. You should really stress test your application (on your hardware, with your schema, data and usage patterns) and specifically its error handling mechanisms in a worst case scenario in terms of failure rate. In some cases it might make sense to perform some additional checking before allowing a constraint to be violated, as the above tests should demonstrate; however, this isn't necessarily true in all cases.




About the author

Aaron Bertrand is a Senior Consultant at SQL Sentry, Inc., and has been contributing to the community for about two decades, first earning the Microsoft MVP award in 1997.






Tuesday, June 09, 2015 - 9:41:05 AM - Aaron Bertrand Back To Top

Jihad, I'm saying that in cases where the row already exists, the EXISTS check does the same work that the insert would have had to do in order to attempt to enforce the constraint.


Tuesday, June 09, 2015 - 8:41:53 AM - Jihad Back To Top

Thanks for this tip. I have a question though: you say that "the EXISTS check performs a seek, but so does the code that checks for a violation against the index. So that part is essentially a wash...". But is the code that checks for a violation not called by the INSERT, thus performing a seek twice? Or do you rely on the database server's efficiency, so that the second seek is negligible because it can reuse part of the result of the first seek performed by the EXISTS check?


Saturday, March 14, 2015 - 11:12:27 AM - Yawar Back To Top

Thanks, interesting tip. What if we do the existence check inside the transaction, and then either go ahead and insert or roll back depending on the result of that check? That should take care of highly concurrent systems.


Saturday, March 03, 2012 - 10:56:42 PM - Jon Back To Top

I'd be curious to see a comparison between the two options above, and a version such as below.  I wonder if the speed difference is primarily due to the TRY...CATCH, or the transaction.

I think it would also be a great test to determine the impact of the two options on concurrency.

IF NOT EXISTS
(
  SELECT 1 FROM dbo.Numbers WHERE n = @Number
)
BEGIN
  INSERT dbo.Numbers(n) SELECT @Number;
END


Tuesday, February 28, 2012 - 10:34:00 AM - John Fox Back To Top
You bring up a good point about speed, but there is another reason. We discovered a few years ago that a unique key constraint had been dropped (we never found out why; the DBA was no longer with us). Because the program depended on the system to report the error in order to do an update instead of an insert, we discovered thousands of rows in that table which should have been updates instead of inserts, and spent days resolving the issue. In a perfect world, this would never have happened. With proper change management and deployment procedures, it would have been caught. With adequate data integrity checks, it would have been caught. But the world is not perfect, and in my experience, depending on the database to report an error instead of coding proper error detection in the application is subject to breaking down. Having both lessens the risks when the perfect world meets reality.
