4 Factors When Choosing to Automate SQL Server Tasks



Problem

I work in an environment that emphasizes automation, and I'm noticing a trend of developers automating tasks in ways that sometimes increase the time it takes to solve a problem or add complexity that creates further problems. Also, some of our automation around manual data entry from clients who want to continue entering data doesn't add speed in our environment, because the bottleneck is still the manual data entry itself. When we consider whether it's worth our effort to automate a task, what considerations would you recommend?

Solution

In this tip, we'll look at four angles to consider when deciding whether or not to automate a task. These points offer a starting point rather than rules that apply to every environment; exceptional environments exist. The key question to ask is: when we weigh these four points, does the automation add value in our environment?

Automation for saving time

Imagine that we logged into SQL Server, checked the index fragmentation of each index on each table through its properties, and, if it met a certain threshold, rebuilt the index manually. This does not scale as we add more tables, databases and servers, and it wastes time when we could automate the entire process, for example with this tip that uses a configuration table (a setting from the configuration table drives Ola's automation). In this case, automation saves us time, and the configuration table scales to new servers and settings, which saves us even more time as we manage more resources.
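As a rough sketch of what that automation replaces, the T-SQL below queries sys.dm_db_index_physical_stats and rebuilds any index above an assumed 30 percent fragmentation threshold; the threshold and the page count filter are placeholders, and a configuration-table driven solution like the one referenced above handles this far more robustly. It only illustrates the repetitive manual check being folded into a script.

-- Minimal sketch: find indexes over an assumed 30% fragmentation threshold
-- and rebuild them. A production solution would read the threshold (and a
-- rebuild-versus-reorganize cutoff) from a configuration table instead.
DECLARE @sql NVARCHAR(MAX) = N'';

SELECT @sql = @sql
    + N'ALTER INDEX ' + QUOTENAME(i.name)
    + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
    + N' REBUILD;' + CHAR(13)
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ps
INNER JOIN sys.indexes i
    ON i.object_id = ps.object_id AND i.index_id = ps.index_id
INNER JOIN sys.objects o ON o.object_id = ps.object_id
INNER JOIN sys.schemas s ON s.schema_id = o.schema_id
WHERE ps.avg_fragmentation_in_percent > 30  -- assumed threshold
    AND ps.index_id > 0                     -- skip heaps
    AND ps.page_count > 1000;               -- skip very small indexes

EXEC sp_executesql @sql;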


However, the inverse is also true: if automation does not save us time, such as with a task where data must be entered manually one way or another, automation may not help us. We've all run into "automation" that actually takes more time or adds more complexity than the manual step it replaces.

In the sample script below, we "automate" a table creation in PowerShell: we can pass in any server, any database and any table definition, and the script connects with integrated security and runs it. This doesn't save time if the alternative is simply logging into the server and executing the CREATE TABLE statement ourselves. If we wanted to create a copy of this table on 200 servers, in 2 databases on each of those servers, and we had a configuration table like the earlier example listing all of those servers and databases, the script could save us time (an interface alternative is using grouped servers in Management Studio). Beyond this example, if developing the automation consumes hours of time that are never recovered by the automated task, that is another time to avoid it, such as spending 10 hours developing an automation script that saves us at most 15 minutes a decade.

# Runs a T-SQL command against the specified server and database using integrated security.
Function Execute-Sql {
    Param(
        [Parameter(Mandatory=$true)][string]$server
        , [Parameter(Mandatory=$true)][string]$database
        , [Parameter(Mandatory=$true)][string]$command
    )
    Process
    {
        ### Build the connection and command objects.
        $scon = New-Object System.Data.SqlClient.SqlConnection
        $scon.ConnectionString = "Data Source=$server;Initial Catalog=$database;Integrated Security=true"

        $cmd = New-Object System.Data.SqlClient.SqlCommand
        $cmd.Connection = $scon
        $cmd.CommandTimeout = 0
        $cmd.CommandText = $command

        try
        {
            ### Open the connection and execute the command.
            $scon.Open()
            $cmd.ExecuteNonQuery()
        }
        catch [Exception]
        {
            ### Surface any connection or execution error as a warning.
            Write-Warning $_.Exception.Message
        }
        finally
        {
            ### Always release the connection and command objects.
            $scon.Dispose()
            $cmd.Dispose()
        }
    }
}

$table = "CREATE TABLE myTable(
    Id INT,
    SavedKey VARCHAR(100),
    SavedValue VARCHAR(2000)
)
"

Execute-Sql -server "OurServer" -database "OurDatabase" -command $table

Automation to replace a repetitive action when there's only one or a small set of steps

Suppose we experience ASYNC_NETWORK_IO waits in the communication between a web server and a database server, and we find that, due to a limit in physical resources and a transaction load that cannot be reduced, our only option when this wait appears is to restart the web server. We can automate this step by defining a measurement standard and automating the restart when that threshold is met or crossed. This assumes the restart is the only way to correct the problem; if we had other alternatives, we would want to investigate them first. In some cases, a problem has only one solution, whether because of our resource limitations or because of the problem itself (such as a log reader agent being stopped when it should be running for replication or CDC).

When we apply the same solution multiple times and that action can be automated, in most cases it's worth our time to automate it so the problem self-resolves. If we need to warn people before the solution runs, we can automate that step as well, such as emailing an alert that something is about to be restarted or that a reader will be enabled.
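As a hedged sketch of that alerting step, the T-SQL below checks the cumulative ASYNC_NETWORK_IO wait time in sys.dm_os_wait_stats and, if it has crossed an assumed threshold, emails a warning through Database Mail. The threshold, mail profile and recipient are hypothetical, and the actual web server restart would be handled by whatever scheduled process we trust to do it.

-- Minimal sketch of the alert step: if ASYNC_NETWORK_IO wait time has crossed
-- an assumed threshold, email a warning that the web server will be restarted.
-- Note that wait_time_ms is cumulative since the last restart or counter reset;
-- a real implementation would compare snapshots taken over an interval.
DECLARE @waitMs BIGINT;

SELECT @waitMs = wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'ASYNC_NETWORK_IO';

IF @waitMs > 600000  -- assumed threshold
BEGIN
    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = 'DbaMailProfile',        -- assumed Database Mail profile
        @recipients   = 'oncall@ourcompany.com', -- assumed distribution list
        @subject      = 'ASYNC_NETWORK_IO threshold crossed',
        @body         = 'The web server will be restarted automatically.';
END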

Automation of data validation when manual entry is required

In some cases, data entry cannot be avoided; for example, if we use configuration tables or files, some of those values may need to be entered manually. Data entry invites errors even when that's not the intent. We can build business rules around data entry, but as long as manual entry exists in our application, database or ETL layer (and it's a reality), we will sometimes get erroneous values. Where part of the entry never needs to change, such as all of our database names ending with the word "OurDatabase", we can automate that part by appending it to whatever is entered. For the part the user does control, we can validate the input to prevent erroneous values. In the example below, a stored procedure inserts values into a table only after two of those values have been validated, the date and the unique key; otherwise an error is thrown. The date check throws a custom error, while the table definition prevents a duplicate key. Depending on where the data entry occurs and where it's saved, a higher layer (the front end) may also be able to reject values, such as a JavaScript popup invalidating a value before it ever reaches the back end.

CREATE PROCEDURE stpValveImport
   @ValveID VARCHAR(11), --unique clustered key will throw an error if the value already exists
   @ValveInput INT,
   @ValveOutput INT,
   @MeasurementTime DATETIME
AS
BEGIN
   IF @MeasurementTime < GETDATE()
   BEGIN
      INSERT INTO tbValveImport (ValveId,ValveInput,ValveOutput,MeasurementTime,ImportDate,ImportFlag)
      VALUES (@ValveID,@ValveInput,@ValveOutput,@MeasurementTime,GETDATE(),0)
   END
   ELSE
   BEGIN
      RAISERROR('Error stpValveImport: date validation failed.  The import date cannot be earlier than the measurement date.',16,1)
   END
END

EXEC stpValveImport '04e42273c2',90,7,'2017-01-01' --duplicate key error, 04e42273c2 already exists
EXEC stpValveImport '05e41173cc',90,7,'2018-03-03' --invalid date
EXEC stpValveImport 'abcdef7635',90,7,'2017-01-01' --valid entry
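The procedure leans on the table design to reject the duplicate key, so the procedure itself only validates the date. The definition below is an assumed sketch of tbValveImport; the column names come from the procedure, while the data types and the constraint name are placeholders.

-- Assumed definition of tbValveImport: the unique clustered constraint on
-- ValveID is what rejects the duplicate key in the first EXEC above.
CREATE TABLE tbValveImport (
    ValveID VARCHAR(11) NOT NULL,
    ValveInput INT NOT NULL,
    ValveOutput INT NOT NULL,
    MeasurementTime DATETIME NOT NULL,
    ImportDate DATETIME NOT NULL,
    ImportFlag BIT NOT NULL,
    CONSTRAINT UQ_tbValveImport_ValveID UNIQUE CLUSTERED (ValveID)
);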

Automation when security is not a key concern or when automation can be securely and strictly defined

Convenience is a popular sales word, and automation provides convenience. Unfortunately, convenience is seldom secure, and that matters in some data environments. We may be able to automate solutions that auto-resolve problems or send alerts, but both of these could run on servers or databases that hold private data, where we don't want information about those servers sent anywhere. It is incredibly convenient to store all credentials in one encrypted location with access to everything, so that we can automate alerting, building and automatic resolution. However, as many people discovered with the Spectre and Meltdown hardware security flaws (which existed for over a decade before people knew about them), just because we assume something is secure doesn't mean it is.

If I'm making the call on security, any data containing PII will live in a completely different area from everything else and will be heavily restricted in terms of access, including automated access. While it costs resources to keep some activities manual in that environment, such as local monitoring, the cost of a security breach is too high, and the lack of automation for some activities is a preventative measure. While it's hard for many DBAs and developers to hear, I predict that in a few decades we'll learn that many of our current "best" security practices are already undermined, just as we learned with Spectre and Meltdown. What we don't know can hurt us, and in some cases we should take precautions beyond best practices because the data are too important.

This isn't to say we should never automate. Many environments do not store PII data, and I expect this to continue. In addition, some companies shard or partition their PII data elsewhere, so that the majority of their environment does not hold risky data; this allows them to use automation freely for non-PII data and keep more manual activities and strictly defined automation on the PII data.
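As one way to picture strictly defined automation on sensitive data, the sketch below (all names hypothetical) grants an automation login permission to execute a single stored procedure and nothing else, so even if the credential is compromised, its reach stays narrow.

-- Hypothetical example: the automation account on the restricted database can
-- execute one stored procedure and nothing else.
CREATE LOGIN AutomationRestricted WITH PASSWORD = 'ReplaceWithAStrongGeneratedPassword1!';  -- run at the server level
CREATE USER AutomationRestricted FOR LOGIN AutomationRestricted;                            -- run inside the restricted database
GRANT EXECUTE ON OBJECT::dbo.stpPiiExportAudit TO AutomationRestricted;                     -- hypothetical procedure name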

Next Steps
  • While I listed security last, the first consideration before automating a task is how to strictly define the automation so that it complies with our security needs. If you're not in charge of security, check with your security team about whether the decision fits the company's security best practices.
  • If we can automate a task that saves us time, reduces errors, or automatically resolves a problem that requires only one or a few steps, it may add value to our environment.
  • If the automation doesn't add value because another bottleneck still limits us, or it adds complexity, eliminate it.





About the author
Tim Smith works as a DBA and developer and also teaches Automating ETL on Udemy.




