By: Tim Smith | Last Updated: 2014-10-24 | Comments (1) | Professional Development Career Planning
What are some good starting points for people interested in becoming a SQL Server database developer or people interested in SQL Server database development?
SQL Server database development will generally involve one (or a combination) of the three:
- ETL development. Sometimes you have to fill in this blank because the position will be described as a "database developer" when they're really looking for an ETL developer.
- Database Administrator. While we may know the difference, the job description may not reflect this difference.
- Architect. Depending on who you ask, this is what I think of when I hear developer, as the position involves building, maintaining and (or) troubleshooting the existing database structure.
For this tip, I'll exclude database administration and we'll assume that this environment wants you to architect a database and ETL solutions. What skills do you need to have?
The ETL Side
On the ETL side of things, developers should know how to get data into the database, manipulated to a format the client wants to see, and out of the source to a viewable format (we'll assume all UX is done by someone else). We can get data from a variety of sources and in many different formats and this can quickly grow into a problem if we don't focus on automation (the same is true with loading data). In the transform step, we must be familiar with our data in order to know how to transform it, what questions to ask that the end user may want to know, and what other sources of data might exist that are better (see the point below on data for more related information). "Data analysis" is assumed in the transform step, and at the heart of a developer is someone who's familiar with data and how to use it (see Dig Into Your Data below this).
Expect that ETL positions will sometimes involve interactions between databases, such as MongoDB to SQL Server, Oracle to SQL Server, or vice versa, etc. Moderate to large environments tend to use multiple platforms and knowing how to interact between databases can help with ETL.
Dig Into Your Data
One of my favorite interview questions to ask is, "Here's a simple data set" and I'll provide a generic data set, "tell me some things about it". I am looking for what the person immediately notices before asking questions because some questions, regarding data, can be answered by looking at data. Experienced developers know that answers to questions from others (especially concerning data) aren't always true. How many times have we heard, "Oh you'll never find that value in the data set because ..." and then we show the same person that it appears 30% of the time in the data set. Good developers must know things like data types, averages, minimums, repeating characters, data structure, lengths, differences between each value, etc. These tell us how we define our data and provide us with the tools to structure our processes around these data; for instance, mining for verbal changes will differ from measuring the outliers in a data set and predicting outcomes for each.
An immediate value add to any organization is comparable data - what and who offers a similar data set? As David Hume might suggest, how can we make assumptions with one data set, if we don't have a comparable data set to look at (that should be parallel) and compare to? Consider the lessons from the housing crash of 2008-2009 - the assumption that housing prices never fall is completely invalidated when looking at a few data outside data sets, for instance, how do housing prices rise when oil prices are also rising, incomes aren't rising, and the distribution of financial demand predicts incomes in high priced housing areas will fall? And yet some of the smartest people made this assumption. Take it from David Hume: always have comparable, related and parallel data, otherwise you have empty assumptions.
Know Your Client
Many years ago, I stumbled across a huge trading opportunity by identifying a pattern in several markets that appeared to be unknown by other traders. Unfortunately, I didn't know advanced development, efficient coding, or anything that would make famous developers proud, but I knew exactly what I wanted to accomplish and how to get there with (very) inefficient code. I also knew that trading opportunities do not always stay around: it's only a matter of time before the market recognizes securities are underpriced, so with the weekend I had before the markets opened on Monday, I built a very inefficient algorithm to confirm my trading idea. It worked and it turned out to be very profitable.
Why do I tell this story? Because we've all seen developers spend hours arguing about solutions, when the client has a sense of urgency - this needs to be done ... now, even if it's inefficient. Performance matters except when it doesn't, and features matter except when they don't; most clients will provide feedback if those are their greatest concern, or if you have time, definitely optimize and build on what you've created. The two most common approaches to development are:
- Solve the problem.
- Automate the solution.
- Improve the performance of the solution.
- Solve the problem.
- Improve the performance of the solution.
- Automate the solution.
In a perfect world, we hit step three every time. In the real world, we don't. Try telling a trader who can make $100 million off his query to identify a trade that his query needs to be SARGable; the math that developers do often involves thinking like if I spend a few hours tweaking this query, I can save fifteen seconds each time I call it whereas some clients think if I execute this query once a month on Sunday, I can make $100 million dollars. The best developers know their clients, how they think, and don't over apply development to problems, if they don't see it becoming a problem, if it isn't currently a problem, or if their limited by time. On the other hand, if you see something is done completely wrong, let the client know and under promise and over deliver the solution. I've been blessed to only work with clients concerned with accuracy, though I've heard horror stories where developers, who raised cautions about certain data assumptions, were ignored.
Assemble Your Minions
Organizations have a tendency to log issues or pay for software that does. This can provide a useful tool in some cases and also this can lead to a flood of data (leading to a noise problem), which may limit an organizations ability to use the tool. In addition, the approach of logging problems can be reactive - you run across the problem, you troubleshoot it, and you clear the log. Instead, note and correct the error and since it is fresh on your mind prevent it from ever happening again, or build architecture that will alert you when it is about to happen. Patience is only a virtue when a situation involving time is beyond your control - this is when you want to act immediately to ensure it doesn't happen again. In addition, you don't want other developers to learn to turn off the noise - are these errors actually errors, or are they noise? Consider the well-known story of the boy who cried wolf - if your log is crying wolf, whether you like it or not, it will eventually be ignored. Anyone can build a logging application; the challenge is creating one that doesn't generate noise and one that debugs most issues for you.
In addition to logging, handle issues with data inside your development. Waiting for QA to validate data means allowing bad data to go through an entire development process when a large portion, if not all, of that data could be validated before being added to a database, saving space as well as further development efforts. For instance, if I was importing SPY data, I would place an automated check on whether the price was (1) numeric and (2) above 0, (3) within a numeric range provided by comparable data, and (4) an alert with outlier changes in movement detected. The more important the value, the more checks and alerts should be built around it.
Many developers have a favorite way of doing things, and we can generally use our favorite tools. In some environments, these tools may contradict other operations and we should be able to use these other tools in these situations. You can find many great articles on the internet about DBAs vs. Developers, or internet debates about the best way of doing things. Debating and arguing consume time, in addition to the fact that others don't always know your clients or your environment. Worry less about debating and focus more on learning alternatives you find useful. Learning other ways of doing things will benefit any environment you're in because as you learn new ways of doing things, you'll also learn faster and more accurate techniques (in addition, integration is the key to success and integration involves seeing patterns among areas that appear to not overlap). In addition, SQL Server 2014 is different from SQL Server 2008R2, which is a factual way of saying that technology changes, so something slow and imprecise today may not be as slow and imprecise tomorrow.
Ask Better Questions
One key to effective learning, as well as developing creative solutions, involves asking better questions and challenging the current approach, even if it appears to be a best practice. Consider the following questions:
- Why does this always happen?
- How can I prevent this from happening?
- What should I do?
- I wonder what would happen if I ...
One set of questions opens the door innovation, while the other set opens the door to confusion and frustration. Consider that the best practice of using energy at one point in history didn't involve electricity and the invention of electricity came from questioning the current best practices. Despite the superlative, best isn't always best and while Keynes may be right (that we tend to follow what others are doing because it's safe - "animal spirits"), we should be careful because this mentality seldom leads to creativity. Ask questions that deepen curiosity and avoid questions that deepen frustration.
- What are your clients two largest concerns? Write them on paper.
- Write out (or write code for) three solutions for each concern.
- Is there a way to prevent these from being a concern, or a way to automatically troubleshoot these issues which require no human oversight?
Last Updated: 2014-10-24
About the author
View all my tips