Five Things I Wish I Knew When I Started My SQL Server Career
As we gain experience in the technology we use, we inevitably learn things along the way that could have been useful long before we got that far. I certainly learned a lot of things the hard way, as I'm sure many of us do, and in my case that has definitely led to me implementing some sub-par solutions along the way.
In this tip, I wanted to point out a few of the concepts that I wish I had spent more time researching earlier on, and provide my reasoning and some links to further reading.
1. TESTING YOUR BACKUPS IS NOT ENOUGH
You may have heard this before, but a very appropriate quote I hear repeated often is, "You should have a restore plan, not a backup plan." So true, and I've seen so many customers get bitten in scenarios where they are validating that their backup jobs are succeeding, and that is it.
There is not much point in having a .bak file on hand if you have no idea whether it can even be restored, or how long that restore will take on a secondary or disaster recovery server. There are various ways to automate this process, but a good start would be to have periodic disaster recovery drills, where you pretend that the production server is offline, and then do whatever you need to do to stand that system up elsewhere (stopping at the re-point your apps step).
At my last job, I went so far as to actually use a log shipping secondary as a production reporting server. Part of the motive was to reduce read contention on the primary OLTP instance, but a much more important side effect was that this would actively be testing the restore of every single backup, and making production services rely on those restores succeeding. (In the event of failures, we could either revert to the previous day, or just point the reporting application to the OLTP server temporarily.)
Some further reading on this topic:
- Checking to make sure a SQL Server backup is useable
- Validate a SQL Server Backup can be Restored
- Verifying Backups with the RESTORE VERIFYONLY Statement
- Automate SQL Server RESTORE VERIFYONLY Process with Maintenance Tasks
2. INDEXING IS A BALANCE
Often I find that, over time, people will create an index for every problematic query they ever troubleshoot. I've been guilty of that, but have seen some extreme examples. My personal record has been a customer's table with 215 non-clustered indexes – pretty impressive when there were only around 70 columns. While this is a quick and easy way to solve a short-term problem with, but happens because it is a quick way to solve a performance problem with an individual, isolated query.
The problem is that every index takes space on disk and in memory, and adds overhead to the write portion of the workload, since every index needs to be maintained along with the underlying table data. This means that for every insert, update, and delete, all of the indexes have to be modified as well. A system filled with many, often redundant indexes is going to have performance issues even with balanced workloads, and will be more prone to blocking and deadlocking in high-write scenarios.
There are dynamic management views and other indicators for discovering indexes that are missing and would help performance, but there are also ways to find redundant and unused indexes that are more trouble than they're worth.
Some further reading on this topic:
- Using SQL Server DMVs to Identify Missing Indexes
- Be Careful with Key Order in SQL Server Missing Index Recommendations
- Avoid Index Redundancy in SQL Server Tables
- Identify SQL Server Indexes With Duplicate Columns
- Discovering Unused Indexes
- Deeper insight into used and unused indexes for SQL Server
- Find unused SQL Server indexes with Policy Based Management
- Safely Dropping Unused SQL Server Indexes
3. DATES CAN BE A PAIN
I grew up in Canada, so I had a leg up on the disaster that is the MM/DD/YYYY format for expressing dates. But little did I know how many other technical issues and gotchas I would come across when dealing with date/time data in SQL Server in my early career. Some of the things that took my younger, more stubborn self to get straight:
- BETWEEN IS AMBIGUOUS (AT BEST) - I used to write queries
that paraphrased "give me all the data between the beginning and end of February."
February is a thorny example because of leap years, but even that aside, what
do we mean by the "end" of the month? If the data is represented with datetime,
it's the 28th or 29th at 23:59:59.997. If it's smalldatetime,
it's 23:59:00. Datetime2? As late as 23:59:59.9999999. And date is just the
28th or 29th at midnight. It's very difficult to accurately
determine the end of a period when the underlying data type can change;
EOMONTH()doesn't help, because it returns the last day of the month at 12:00 AM. I learned after some time that it's much easier to determine the beginning of the next period than the end of the current one. So instead of:
WHERE column BETWEEN '20190201' AND '2019022[8|9] 23:59[:59.99[7|99999]]';
WHERE column >= '20190201' AND column < '20190301';
Again, if you know the beginning of the range, the beginning of the next range is very easy to find, and doesn't have any of the issues involved with being sure you've correctly calculated the end of the range.
- SHORTHAND IS ALSO AMBIGUOUS - Early in my career, I took a lot of pride in reducing character count. I loved that I could say:
EXEC sp_configure 'show adv';
EXEC sp_configure 'show advanced settings';
I mean, look at all those characters I saved, and how much time I got back as
a result! I think back on that sarcastically whenever I see people type
DATEDIFF(YYYY, <col1>, <col2>). In the first case, they
typed two cryptic characters to avoid typing three (
DAY), and in the
second case, they typed four characters that aren't a word to avoid typing four
characters that are (
YEAR). Making code self-documenting is a huge
plus, especially when the cost is negligible or zero. But in this case establishing
the better practice of typing the whole word instead of the shorthand can help avoid
issues in the future that can be hard to test against. For example, let's say you
want to write code that returns the week number or the year, and you think you can
use shorthand. You don't need to look it up, because what could be simpler, right?
DECLARE @d date = '20190204'; SELECT DATEPART(W, @d), DATEPART(Y, @d);
Most people would probably expect those to return
but instead they return
35. Why? Because
is shorthand for
is shorthand for
YEAR. If you ran your
W based on
GETDATE(), and you happened to
run it on February 8th instead
of the 4th, your tests would
have passed (at least under the default
SET DATEFIRST 7).
There are some other issues I've come across with dates and times, including difficulties with time zones, generating date ranges on the fly, memorizing convert style numbers, removing time from dates, and the horror that is Daylight Savings Time. For further reading, see:
- SQL Server DateTime Best Practices
- Creating a date dimension or calendar table in SQL Server
- Build a cheat sheet for SQL Server date and time formats
- Handle conversion between time zones in SQL Server Part 1 | Part 2 | Part 3
- FORMAT() is nice and all, but…
- What is the most efficient way to trim time from datetime?
4. GAPS ARE UNIMPORTANT
Early in my career, I spent more time than I care to admit coming up with obscure
ways to use all of the available numbers in an
IDENTITY column. A row
gets deleted? A transaction gets rolled back? Let's put a serializable lock on the
table, find the lowest unused number using a disastrous self-join, and insert that
SET IDENTITY_INSERT ON.
I try not to imagine now how much more I could have accomplished in place of
(a) engineering those solutions in the first place, and (b) dealing with persistent
concurrency issues afterward. It's more a business problem than a technical one,
as the requirement is usually driven by the business. But often it's because it's
what they think they want, not what they actually need. I know that there are some
scenarios where gaps do matter (e.g. invoices), but those edge cases probably shouldn't
be implemented with an
IDENTITY column anyway. In most cases, the value
produced by an
IDENTITY column should be a meaningless, surrogate value
that has zero meaning to end users, and therefore gaps should also become meaningless.
There was a bug early in SQL Server 2012 that made these gaps an issue (speeding
up exhaustion of values), but other than that scenario, you really shouldn't be
concerned about missing numbers.
In any case, some further reading about
- Gaps in SQL Server Identity Columns
- SQL Server 2017 IDENTITY_CACHE feature
- Insert missing SQL Server IDENTITY column values using SET IDENTITY_INSERT command
5. SQL SERVER IS OPTIMIZED FOR SETS
As a developer quite green to the database world, I remember trying many times to emulate a for loop in T-SQL (and usually a nested for loop, to boot). The line of thinking we adopt from procedural code is something like, "I need to loop through these things, and perform an action against each thing, one at a time."
Of course, you can accomplish row-by-row processing in SQL Server, using
DECLARE CURSOR (and let me be clear, there really is
no material difference here; a while loop is just a cursor that doesn't say so,
though you have some easy behavioral and performance flexibility with explicit cursors).
In most cases, row-by-row processing will throttle performance. There are some exceptions,
like running totals before SQL Server 2012, but as a general rule, your line of
thinking should be more like, "I need to perform an action against all of these
things, at the same time."
Further reading on set-based operations:
- SQL Server Convert Cursor to Set Based
- SQL Server Cursor Example
- What impact can different cursor options have?
Those are just a few examples from my own experience. If you're new to SQL Server, there are many tips, tutorials, and webcasts right here on this site that can help you avoid some future "I wish I knew back then" moments. Here are several on-demand webinars:
- Surviving The First Week As The New DBA
- SQL Server Database Administration Tips, Tricks, and Best Practices
- Common SQL Server development mistakes and how to avoid them
- Managing SQL Server Source Code
- DBA Code Reviews Done Dirt Cheap
- SQL Injection: What it is, how it happens and how to stop it?
- Don't Be the Next Data Breach - How to Secure SQL Server in the Cloud
- Introduction to SQL Server Indexing
- Fundamentals of SQL Server AlwaysOn Availability Groups
- Understanding Disaster Recovery Options for SQL Server
- SentryOne Tools for Productivity and Performance on Physical, Virtual, and Cloud Environments
- An Introduction to Auditing & Compliance for SQL Server
About the author
View all my tips