Avoiding SQL Server Query Disaster through Refactoring for Resilience

By: Derek Colley | Comments (4) | Related: > Testing

Problem

Occasionally, new development work submitted from your developers (or indeed written by you, the data professional) can fail due to unforeseen execution errors. These errors are despite testing on your dev/test/QA platform. Sometimes these errors can be catastrophic such as the infamous UPDATE or DELETE without WHERE, for example, or a rollback script that fails when executed. This tip aims to discuss a logical approach to testing almost any inbound script for logical and behavioral inconsistencies and aims to strengthen the credibility of your scripts, supplementing the testing phase of your development lifecycle.

Solution

To tackle the problem, we must first identify the scope of the problem. Many data professionals (and by this term, I include database administrators, data architects, database developers and other associated roles) will often find themselves with a brief (or otherwise) set of business requirements and the instruction to turn them into viable code, either as a one-time controlled batch operation or as a process of adding new functionality to the data layer. Here's an example of such a requirement:

1) Requirement: For all future customers, we need to capture the date of birth, where possible, in the database.

2) Solution Outline: Add a column to the dbo.Customers table called [dob], type DATE.

This is a fairly easy requirement. Why would it fail? Well, let's first look at the simplest method of implementing this change.

 ALTER TABLE dbo.Customers ADD dob DATE;

What could go wrong?

Let's take one example - code dependency - what if there exists a code dependency (in the language of your choice) that pulls back a result set from the Customers table? Depending on the flexibility of the code, it is uncertain whether this could cause a failure and therefore potentially a cascading set of code failures. Example (VB / pseudo code):

 ' Suppose conn is already defined as an ADO DB connection object
 ' and that recordSet is defined as an ADODB.RecordSet object (a special kind of derivative array object)
 ' and that variable custId is passed in as the unique customer reference number.
 conn.Open
 recordSet = conn.Execute("SELECT TOP 1 * FROM dbo.Customers WHERE CustID = " & custId & ";") ' expects 3 columns
 Dim resultArray(2) ' note arrays are 0-indexed so this has 3 allocations
 resultArray - recordSet.ToArray

Should we add a column as proposed, we will break this portion of application-side code since the resultArray object contains only three allocations in one dimension, and the table change means we will be returning four. It is quite possible this segment of code is integral to the operation of a customer-facing or otherwise business-critical application module, for example an 'Order Summary' page, which could consequently display an error or cause other effects.

This particular example underlines the need for rollback scripts - ideally such a change will be tested first on a suitable test platform (in this case ideally with a stage website or other application connected to the test platform to test for unforeseen application layer problems) and this testing will identify the error. Sadly, in real life (as many MSSQLTips.com readers will identify with this), complete change testing is often not possible thanks in part to the complexity of modern IT systems, the changing nature of software development (from structured methodologies like SSADM to unstructured, quick-win methodologies like Agile), and resource constraints. This emphasizes the need for rollback scripts, so we must make sure both our change scripts and rollback scripts are as watertight as possible. This will be the focus of this tip.

Temporarily putting aside the question of application-side errors and concentrating on the syntax and structure of our proposed statement, what other problems do we have with our proposed ALTER TABLE?

What happens if the column already exists? We need to take this into account.
Or we're running a version of SQL Server that doesn't support the DATE data type.
Or the dbo.Customers table doesn't exist.
Or the executing user doesn't have DDL rights on the database.
Or the exclusive lock attempt on the table fails due to deadlock, for example, and our transaction is chosen as the victim?
... ad infinitum.

This article won't deal with all of these potential issues individually, but it will *assume* that they are all viable concerns, and put in place measures to mitigate against them. We need to ensure that there's a valid path through our batch of transactions such that *for all probable paths, a known and controlled output exists*. Let me explain - for the ALTER TABLE statement, any one of a number of factors (as above) can cause a failure. For uncontrolled exceptions (i.e. exceptions handled by SQL Server rather than user-defined exception handling), we are reliant on the SQL Server engine to handle the exception and present it in the most appropriate way. However we can put in place measures to handle any exception ourselves, even at the most basic, catch-all level, as follows:

 BEGIN TRY
  ALTER TABLE dbo.Customers ADD [dob] DATE;
 END TRY
 BEGIN CATCH
  RAISERROR('Could not create table, unspecified error occurred!',16,1) WITH LOG
 END CATCH

We have placed an explicit TRY/CATCH block in our procedure. This adds assurance, especially with complex transactions or batches, that the attempt to alter the table, if it fails, will be handled and won't execute in part - it simply won't take place at all - an extension, or enforcement, of the atomicity principle (the 'A' in ACID). Now here's the beauty of sequential instructions in SQL Server (i.e. non-parallel execution) - choices can be mapped in a tree, with each 'path' through the procedure mapped to a node. We can then test the viability of each path of the node by following the constraints that govern the path of execution, to ensure each node is reachable. We can, in other words, test each probable path through the batch to ensure that for all viable choices, a controlled output is obtained. This has advantages not only in error handling, but in testing (since the batch can be put through multiple tests to gauge the probability of success), and in business logic. Here, for example, is one such tree for the short transaction above:

You can see it's actually fairly secure. We have drawn this out as a binary tree-like structure, with the assumption a given statement will either succeed or fail. We can see there is one problem only - what will happen if the RAISERROR... statement fails. All other eventualities lead to a known output. We know, thanks to SQL Server's ACID properties, that in this situation an error will return to console and the transaction will be aborted (the state of the DB will return to the pre-transaction state), so for our purposes this erroneous path can be safely ignored. Furthermore this erroneous output arguably has a negligible probability of occurring, so that this output does *not* form a probable path. Identifying and excluding improbable paths is an important part of limiting the scope of logical testing.

By far the best advantage about logically testing using this method is that it can be done mentally or on paper, in swift time with little to no resources. In computer science, this method is also called the 'dry run', where each probable path through a set of choices is mapped and the values calculated for testing purposes.

Let's look at a more realistic batch of SQL statements, to see if we can identify any anomalies.

 -- pseudo-random procedure to pick 10 lucky winners from a pool of 1,000,000 customers
 -- assumes dbo.Customers exists with column custID UNIQUE INT NOT NULL
 CREATE TABLE dbo.Winners (
 custID INT NOT NULL )
 
 DECLARE @rnd FLOAT
 
 WHILE ( SELECT COUNT(*) FROM dbo.Winners ) < 10 
 
 BEGIN
 
  SET @rnd = ROUND((RAND() * 1000000),0)
 
  IF NOT EXISTS ( SELECT custId FROM dbo.Winners WHERE custId = @rnd ) 
   
 INSERT INTO dbo.Winners (custId) VALUES (@rnd)
 
 END
 
 SELECT custId [winners] FROM dbo.Winners

At a glance, this procedure looks okay. We are creating a table with one column to hold the customer IDs of ten random winners from a customer pool (defined in dbo.Customers by unique custId) of 1,000,000. The procedure itself uses the RAND() function, which is a pseudo-random number generator (PRNG) built into SQL Server. This procedure runs and outputs ten unique winners, uniformly distributed across the customer base (with a negligible difference from true random caused by the PRNG). A typical test run from a harried and hurried developer will consist of executing this on a test box. It returns ten results, doesn't cause an error - he (or she) pushes it to live. What could go wrong? Let's map it out using our decision-tree / binary-tree-esque method, and also add annotations to show what could go wrong with each statement (which helps justify why we can't assume every statement will be successful, with the exception of those statements for which we have no direct control - the improbable paths, mentioned earlier):

Decision Tree for Previous Set of Random Logic

As you can see, despite a read-through of the batch looking good, there are some potential failure points marked as 'UNCONTROLLED EXCEPTION' where SQL Server picks up responsibility for parsing and failing the statement (for different reasons) rather than our code. I've circled in yellow the most important ones, and these are the points we can address by amending the script.

Let's take the first one - the CREATE TABLE statement. Let's think about what can go wrong with it - suppose the table already exists, or suppose the user doesn't have permissions to create the table. These are the two most likely possibilities. Let's mitigate by planning for this in the code:

 SET XACT_ABORT ON
 BEGIN TRY
  IF EXISTS ( SELECT t.name FROM sys.tables t
     INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
     WHERE s.name = 'Winners' AND s.name = 'dbo' )
  BEGIN   
   CREATE TABLE dbo.Winners (
    custID INT NOT NULL )
  END
  ELSE
   RAISERROR('Table already exists!',16,1) -- optional here whether you add a DROP TABLE statement.
 END TRY
 BEGIN CATCH
  IF XACT_STATE() <> 0
   BEGIN
    -- do something here, i.e. email DB support.  Build the error stack as below:
    DECLARE @msg VARCHAR(MAX)
    SET @msg = 'Error occurred: Error ' + CAST(ERROR_NUMBER() AS VARCHAR(50)) + ', Severity: '
    SET @msg = @msg + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ', State: ' + CAST(ERROR_STATE() AS VARCHAR(10))
    SET @msg = @msg + ', Message: ' + CAST(ERROR_MESSAGE() AS VARCHAR(8000))
   END
  ELSE
  BEGIN
   -- this should never get called, but should be here for logical completeness.
   RAISERROR('Unspecified error occurred.',16,1)
  END
 END CATCH

Now, as you can see, we've mitigated against the failure of CREATE TABLE. We have also unintentionally protected against failure in the next significant point of failure, the WHILE clause, which relies on SELECT COUNT(*) FROM dbo.Winners to return a result. The most obvious potential failure here is that dbo.Winners doesn't exist. Logically speaking, it is irrelevant what the result from dbo.Winners is, even if NULL, just that it parses and executes correctly. But our correction to the CREATE TABLE in the code snippet below mitigates also against SELECT COUNT(*) FROM dbo.Winners failing, since the table existence is tested.

Let's move on to our next potential failure point, IF NOT EXISTS ( SELECT custId FROM dbo.Winners WHERE custId = @rnd ). What could go wrong here?

dbo.Winners doesn't exist
custId is not a valid column
custId and/or @rnd are non-compatible data types
No permissions to query dbo.Winners

The first potential error is already mitigated against, as described. The second is a valid concern, likewise the third and fourth. Let's address the fourth point first, since it is a predicate for testing the other two. Should the user not have permissions to SELECT from the table, then an uncontrolled exception will occur. Happily there is a built-in system function we can use called sys.fn_my_permissions that takes two parameters, 'securable' and 'securable_class'. We can use it like so (note there's different ways of doing this, you could use nested IF statements, you could add this test as a pre-test to the main script - abstract it entirely - you could make an argument that if CREATE TABLE is granted and you have created Winner, you will automatically have access to SELECT from the table anyway. The solution below is for logical completeness - the aim of this article - and it's up to you whether you want to test to this extent):

 ...
 SET @rnd = ROUND((RAND() * 1000000),0)
 DECLARE @hasPermissions TINYINT
 SET @hasPermissions = ( SELECT COUNT(*) FROM sys.fn_my_permissions('dbo.Winners','OBJECT')
 IF @hasPermissions > 0
 BEGIN
  IF NOT EXISTS ( SELECT custId FROM dbo.Winners WHERE custId = @rnd ) 
   INSERT INTO dbo.Winners (custId) VALUES (@rnd)
 ...

This leaves us with mitigating whether custId is a valid column, and whether it's a valid comparable data type to @rnd. We already know we have defined @rnd as a sequence of nested system functions operating on RAND(), a PRNG. There is the possibility that we have mis-coded this so that @rnd is a floating-point number rather than an INT, after our transformations. We can ensure it becomes an INT by an explicit CAST:

...
 BEGIN TRY
  IF NOT EXISTS ( SELECT custId FROM dbo.Winners WHERE custId = CAST(@rnd AS INT) )
  ...
 END TRY
 BEGIN CATCH
  IF XACT_STATE() <> 0
   BEGIN
    -- do something here, i.e. email DB support.  Build the error stack as below:
    DECLARE @msg VARCHAR(MAX)
    SET @msg = 'Error occurred: Error ' + CAST(ERROR_NUMBER() AS VARCHAR(50) + ', Severity: '
    SET @msg = @msg + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ', State: ' + CAST(ERROR_STATE() AS VARCHAR(10))
    SET @msg = @msg + ', Message: ' + CAST(ERROR_MESSAGE() AS VARCHAR(8000))
   END
  ELSE
  BEGIN
   -- this should never get called, but should be here for logical completeness.
   RAISERROR('Unspecified error occurred.',16,1)
  END

We can also add a check on the column data type in dbo.Winners by adding a SELECT from sys.columns joining on object_id to sys.tables name column, although we won't - because we explicitly defined this column in our CREATE TABLE, so the probability of it 'spontaneously' becoming a different data type without an explicit ALTER TABLE is near-zero (unless we are exceptionally bad coders who can't tell the difference between an INT and a FLOAT). This means it 'drops off our radar' and becomes an event of negligible probability and can be safely ignored.

The final SELECT custId FROM dbo.Winners line could fail, but only if a) permissions on the table are not granted to the user or b) the table doesn't exist. We have addressed both these concerns by our tests and code modifications above, so these can also be ignored.

Testing of the INSERT has been mitigated already by inclusion of the line above in the TRY block, which automatically covers the error handling of the INSERT should something go wrong - for example, no permissions to INSERT, or row duplication, so it won't be addressed specifically in the example, although feel free to amend the code to do this.

If you have been following along with the code changes, you will be aware that now we have a different problem - the code is unwieldy, overly large and has the same block of error catching duplicated twice, and it doesn't explicitly roll back either. Here's the complete code, without refactoring, according to our changes above, tidied up a little:

 SET XACT_ABORT ON
 BEGIN TRY
  IF NOT EXISTS ( SELECT t.name FROM sys.tables t
     INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
     WHERE s.name = 'Winners' AND s.name = 'dbo' )
  BEGIN   
   CREATE TABLE dbo.Winners (
    custID INT NOT NULL )
  END
  ELSE
   RAISERROR('Table already exists!',16,1) -- optional here whether you add a DROP TABLE statement.
 END TRY
 BEGIN CATCH
  IF XACT_STATE() <> 0
   BEGIN
    -- do something here, i.e. email DB support.  Build the error stack as below:
    DECLARE @msg1 VARCHAR(MAX)
    SET @msg1 = 'Error occurred: Error ' + CAST(ERROR_NUMBER() AS VARCHAR(50)) + ', Severity: '
    SET @msg1 = @msg1 + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ', State: ' + CAST(ERROR_STATE() AS VARCHAR(10))
    SET @msg1 = @msg1 + ', Message: ' + CAST(ERROR_MESSAGE() AS VARCHAR(8000))
    RAISERROR(@msg1,16,1) WITH LOG
   END
 END CATCH
 DECLARE @rnd FLOAT
 WHILE ( SELECT COUNT(*) FROM dbo.Winners ) < 10 
 BEGIN
  SET @rnd = ROUND((RAND() * 1000000),0)
  DECLARE @hasPermissions TINYINT
  SET @hasPermissions = ( SELECT COUNT(*) FROM sys.fn_my_permissions('dbo.Winners','OBJECT') )
  IF @hasPermissions > 0
  BEGIN
   BEGIN TRY
   IF NOT EXISTS ( SELECT custId FROM dbo.Winners WHERE custId = CAST(@rnd AS INT) )
    INSERT INTO dbo.Winners (custId) VALUES (@rnd)
   END TRY
   BEGIN CATCH
    IF XACT_STATE() <> 0
     BEGIN
      -- do something here, i.e. email DB support.  Build the error stack as below:
      DECLARE @msg2 VARCHAR(MAX)
      SET @msg2 = 'Error occurred: Error ' + CAST(ERROR_NUMBER() AS VARCHAR(50)) + ', Severity: '
      SET @msg2 = @msg2 + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ', State: ' + CAST(ERROR_STATE() AS VARCHAR(10))
      SET @msg2 = @msg2 + ', Message: ' + CAST(ERROR_MESSAGE() AS VARCHAR(8000))
      RAISERROR(@msg2,16,1) WITH LOG
     END
   END CATCH
  END
  ELSE
  BEGIN
   RAISERROR('No permissions to query the dbo.Winners table!',16,1)
  END
     
  END
  SELECT * FROM dbo.Winners

So now we have created our code with failure points mitigated against (i.e. all logical paths end in a known and controlled output, excepting negligible events), we can reduce the code, refactoring it to be more efficient and readable. For this, I'm going to use BEGIN TRANSACTION and END TRANSACTION, and tidy up the code, correcting syntax errors and the like as I go. Here's the final, working code, with inbuilt tests:

 BEGIN TRANSACTION
 SET XACT_ABORT ON
 BEGIN TRY
  IF NOT EXISTS ( SELECT t.name FROM sys.tables t
     INNER JOIN sys.schemas s ON t.schema_id = s.schema_id
     WHERE s.name = 'Winners' AND s.name = 'dbo' )
  BEGIN   
   CREATE TABLE dbo.Winners (
    custID INT NOT NULL )
  END
  ELSE
   RAISERROR('Table already exists!',16,1) -- optional here whether you add a DROP TABLE statement.
 END TRY
 BEGIN CATCH
  IF XACT_STATE() <> 0
   BEGIN
    RAISERROR('Transaction failed.',16,1)          
     DECLARE @msg1 VARCHAR(MAX)
     SET @msg1 = 'Error occurred: Error ' + CAST(ERROR_NUMBER() AS VARCHAR(50)) + ', Severity: '
     SET @msg1 = @msg1 + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ', State: ' + CAST(ERROR_STATE() AS VARCHAR(10))
     SET @msg1 = @msg1 + ', Message: ' + CAST(ERROR_MESSAGE() AS VARCHAR(8000))
     RAISERROR(@msg1,16,1) WITH LOG
   END
 END CATCH
 DECLARE @rnd FLOAT
 WHILE ( SELECT COUNT(*) FROM dbo.Winners ) < 10 
 BEGIN
  SET @rnd = ROUND((RAND() * 1000000),0)
  DECLARE @hasPermissions TINYINT
  SET @hasPermissions = ( SELECT COUNT(*) FROM sys.fn_my_permissions('dbo.Winners','OBJECT') )
  IF @hasPermissions > 0
  BEGIN
   BEGIN TRY
   IF NOT EXISTS ( SELECT custId FROM dbo.Winners WHERE custId = CAST(@rnd AS INT) )
    INSERT INTO dbo.Winners (custId) VALUES (@rnd)
   END TRY
   BEGIN CATCH
    IF XACT_STATE() <> 0
     -- do something here, i.e. email DB support.  Build the error stack as below:
     DECLARE @msg2 VARCHAR(MAX)
     SET @msg2 = 'Error occurred: Error ' + CAST(ERROR_NUMBER() AS VARCHAR(50)) + ', Severity: '
     SET @msg2 = @msg2 + CAST(ERROR_SEVERITY() AS VARCHAR(10)) + ', State: ' + CAST(ERROR_STATE() AS VARCHAR(10))
     SET @msg2 = @msg2 + ', Message: ' + CAST(ERROR_MESSAGE() AS VARCHAR(8000))
     RAISERROR(@msg2,16,1) WITH LOG
     ROLLBACK TRANSACTION
   END CATCH
  END
  ELSE
  BEGIN
   RAISERROR('No permissions to query the dbo.Winners table!',16,1)
  END             
 END
 SELECT * FROM dbo.Winners  
 COMMIT TRANSACTION

As you can see, we have defended against a list of potential problems with all portions of the query without significantly extending the length of the original code, and dramatically reduced the likelihood of the script causing unintended behavior on the database by introducing error-catching, plugging logical path (decision tree) gaps caused by decision constructs, and transformed the batch to be more resilient. There are still existing problems - for example, what if dbo.Winners already has ten rows? - but these don't cause exceptions, and our original script did not take this into account either. In this example, the procedure is meant to be run once only, and perhaps the table truncated and re-ran by another process.

Next Steps

I hope this brief article has given an insight into how to go about refactoring for resiliency, and strengthening code to make it less brittle and better able to withstand failure. You can find further reading on related subjects below. I am open to comments and questions, please leave them below and I'll respond to you as soon as I can.
Check out these additional tips:

About the author

Derek Colley is a UK-based DBA and BI Developer with more than a decade of experience working with SQL Server, Oracle and MySQL.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Monday, February 18, 2013 - 11:56:07 AM - Derek Colley	Back To Top (22241)
Well reasoned! Thanks for the comment, you're quite right. The presence of the table is handled in the code so an exception will be thrown.

Thursday, February 14, 2013 - 1:30:40 PM - Gene Wirchenko	Back To Top (22124)
Near the end: "There are still existing problems - for example, what if dbo.Winners already has ten rows? - but these don't cause exceptions, and our original script did not take this into account either." If the winners table already has ten rows, then the winners table already exists and thus, the error that the winners table already exists will be thrown. Or am I missing something?

Avoiding SQL Server Query Disaster through Refactoring for Resilience

Problem

Solution

Next Steps

About the author

Comments For This Article