By: Aaron Bertrand | Comments (5) | Related: 1 | 2 | > SQL Injection
Problem
In my last tip, I showed how to validate and protect yourself from user-supplied input instead of blindly concatenating the values into dynamic SQL strings. This included table, schema, server, database and procedure names, but there are other common scenarios that I wanted to address as well.
Solution
Some other areas where people often write dynamic SQL that is vulnerable to SQL
injection are when they allow users to pass in the entire output column list, an
IN (@list)
of string or numeric literals (usually separated by commas),
and entire WHERE
clause expressions. I have some ideas for at least
partially protecting yourself from some of these patterns, too.
Output column list
Sometimes you want to allow users to select which columns to include. So you may be expected to build this:
SET @sql = N'SELECT ' + @ColumnList + N' FROM dbo.Users;';
The column list might come in looking like this:
N'col1, col2, col3'
On the other hand if someone does something malicious, it may come in looking like this instead:
N'foo = 1; DROP TABLE dbo.Users; SELECT * '
To protect yourself against malicious behavior, you can use a split function to validate the existence of each column in the list (hopefully, though, you don't allow column names that contain things like commas or semi-colons). I like this function because it preserves the order (which might be important when reconstructing the column list, if end users care about column order):
CREATE FUNCTION dbo.ParseColumnList ( @List NVARCHAR(MAX) ) RETURNS TABLE AS RETURN ( SELECT i, c = LTRIM(CONVERT(SYSNAME, SUBSTRING(@List, i, CHARINDEX(',', @List + ',', i) - i))) FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id]) FROM sys.all_columns) AS n(i) WHERE i <= LEN(@List) AND SUBSTRING(',' + @List, i, 1) = ',' ); GO
Note: sys.all_columns was used coincidentally, and not because this problem has anything to do with columns; it just happens to be a reliably large-ish catalog view. On my system, the number of rows in this catalog view will support strings up to ~9K, which would allow over 70 columns at the maximum identifier length (128 characters). If you need to cover more than this, then you can use a cross join or a numbers table (see some other splitting techniques here).
As an example, given a comma-separated list like this:
SELECT c FROM dbo.ParseColumnList(N'col1, col2, col3');
This function would return a set like this:
c ---- col1 col2 col3
Now, you can choose to return an error if any column in the list doesn't exist in the source table:
IF OBJECT_ID(N'dbo.Users') IS NOT NULL AND EXISTS ( SELECT 1 FROM dbo.ParseColumnList(N'col1, col2, col3') AS f LEFT OUTER JOIN sys.columns AS c ON f.c = c.name AND c.[object_id] = OBJECT_ID(N'dbo.Users') WHERE c.name IS NULL ) BEGIN RAISERROR(N'Not all columns exist.', 11, 1); END
Or you can choose to build a column list anyway, ignoring any invalid columns and dropping them on the floor. You can guard against the case where this results in an empty list, too.
DECLARE @cols NVARCHAR(MAX), @sql NVARCHAR(MAX); SELECT @cols = STUFF((SELECT N',' + QUOTENAME(c.name) FROM sys.columns AS c INNER JOIN dbo.ParseColumnList(N'col1, col2, col3') AS f ON c.name = f.c AND c.[object_id] = OBJECT_ID(N'dbo.Users') ORDER BY f.i -- important to maintain same order FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N''); SELECT @cols = COALESCE(@cols, N' TOP (0) [Invalid column list]=''?'''); SET @sql = N'SELECT ' + @Cols + N' FROM dbo.Users;';
If you expose aliases to users and they're not actually selecting the underlying columns, you may want to create a table that represents that mapping, so you still have something tangible to validate against.
Another workaround, in SQL Server 2012 and above, would be to build the query with the column list supplied by the user and pass it into the new dynamic management function, sys.dm_exec_describe_first_result_set. This function is meant to show you all of the column names and data types that are output by an ad hoc query, but we can take advantage of the fact that it will expose (but not raise) an error message if the query is invalid - without having to run the actual query. A limitation is that the function only looks at the first query, so that would be easy to bypass with a semi-colon. So a quick example that eliminates semi-colons from the passed-in column list (if you allow semi-colons in column names, then you likely have different problems to prioritize):
DECLARE @cols NVARCHAR(MAX) = N'col1, col2, col3'; DECLARE @sql SYSNAME = N'SELECT ' + REPLACE(@cols, ';', '') + N' FROM dbo.table;'; IF EXISTS ( SELECT 1 FROM sys.dm_exec_describe_first_result_set(@sql, N'', 0) WHERE error_type IS NOT NULL ) BEGIN RAISERROR(N'Column list was invalid', 11, 1); END
IN (@list)
Often people will want to pass strings like this:
@list = '1,4,12' --or @list = N'Ford,Fritchey,White'
Into a query like this:
SELECT * FROM dbo.table WHERE column IN (@list);
Parameters just don't work this way in SQL Server; @list
in this
context is not an array or a set, but a single string value. So the next thing people
will often try is:
SET @sql = N'SELECT * FROM dbo.table WHERE column IN (' + @list + N');';
This works as is for the integer case, but for the string case, it doesn't work
until you manipulate the string to surround each individual value in @list
with its own string delimiters (and also handle any embedded delimiters). In both
cases, though, the code is still vulnerable to SQL injection.
There are two easy workarounds to this scenario. One is to use the same split function as above, and join against the output:
DECLARE @list NVARCHAR(MAX) = N'Ford,Fritchey,White'; SELECT t.* FROM dbo.table AS t INNER JOIN dbo.ParseColumnList(@list) AS f ON t.column = f.c;
Note: You would probably give the function a different name; I'm just reusing it here for convenience. And if you are passing in lists of different data types, you may want to consider a different function for each data type to avoid things like implicit conversions.
Another workaround (in SQL Server 2008 and above) is to use a table-valued parameter, and pass in a set in the first place. A common trend I see is people who already have set data in a structure like a DataTable in C#, and they will transform the rows into a single, comma-separated string (or worse, XML) and pass it in to SQL Server as a simple parameter, only to have to parse it and split it apart again. We can do better than that in modern versions of SQL Server (2008+). First, create a table type:
CREATE TYPE dbo.Usernames AS TABLE(Username NVARCHAR(128));
Then you can actually populate a local variable directly:
DECLARE @list dbo.Usernames; INSERT @list(Username) VALUES(N'Ford'),(N'Fritchey'),(N'White'); SELECT t.* FROM dbo.table AS t INNER JOIN @list AS u ON t.column = u.Username;
Or pass your DataTable from C# into a stored procedure parameter:
CREATE PROCEDURE dbo.FindUsers @list dbo.Usernames READONLY AS BEGIN SET NOCOUNT ON; SELECT t.* FROM dbo.table AS t INNER JOIN @list AS u ON t.column = u.Username; END GO
Then from C#, given a DataTable tvp
containing those same usernames:
DataTable tvp = new DataTable(); tvp.Columns.Add(new DataColumn("Username")) // populate DataTable using (connectionObject) { SqlCommand cmd = new SqlCommand("dbo.FindUsers", connectionObject); cmd.CommandType = CommandType.StoredProcedure; SqlParameter tvparam = cmd.Parameters.AddWithValue("@list", tvp); tvparam.SqlDbType = SqlDbType.Structured; // execute, consume results }
Yes, it's more code, but whichever method you choose is no longer vulnerable to SQL injection, and you don't have to worry about issues with string delimiters embedded inside actual string values (you may have to choose a value delimiter other than comma, though, if it's possible for a comma to be in your data and you are using the function to parse the comma-separated list).
WHERE Clause
One of the most common uses I see for dynamic SQL is to solve the "kitchen sink" problem. You want a stored procedure that can handle any number of optional parameters, let's say for a silly table like this:
CREATE TABLE dbo.[table] ( col1 INT, col2 NVARCHAR(MAX), col3 NVARCHAR(MAX), col4 DATETIME );
The first attempt to support optional parameters to search against all columns usually goes something like this (grossly simplified):
SELECT * FROM dbo.[table] WHERE (col1 = @p1 OR @p1 IS NULL) AND (col2 = @p2 OR @p2 IS NULL) AND (col3 = @p3 OR @p3 IS NULL) AND (col4 = @p4 OR @p4 IS NULL);
SQL Server has a terrible time optimizing this, because it builds a plan based
both on the first set of parameters that are populated and on the values of those
parameters. The whole point is flexibility, but reusing that same plan over and
over again for what ultimately are very different queries can lead to unpredictable
performance. You can add OPTION (RECOMPILE)
to eliminate plan reuse,
but this is sometimes a bit heavy-handed - it means you pay compilation costs every
time, even when the same parameters are sent with the same values, over and over
again. In most environments, it's likely to be a mix - for some sets of parameters,
you want a new plan every time, and for other sets of parameters that are more stable,
plan reuse is okay.
A compromise here can be dynamic SQL; you only include the clauses that are required given the parameters that are actually populated. Generally it looks like this (and I elaborate in this video):
SET @sql = N'SELECT * FROM dbo.[table] WHERE 1 = 1'; IF @p1 IS NOT NULL SET @sql += N' AND col1 = @p1'; IF @p2 IS NOT NULL SET @sql += N' AND col2 = @p2'; IF @p3 IS NOT NULL SET @sql += N' AND col3 = @p3'; EXEC sys.sp_executesql @stmt = @sql, @params = N'@p1 INT, @p2 DATETIME, @p3 SYSNAME', @p1 = @p1, @p2 = @p2, @p3 = @p3;
This is relatively safe because everything that is user input is not trusted
directly - the values are strongly typed and parameterized (please don't just concatenate
the parameter values into the SQL string). This works best with the server option
optimize for ad hoc workloads
, which doesn't cache a full plan until
the same query has been requested more than once (this prevents the plan cache from
filling up with single-use plans). You may still find it advantageous to optionally
include OPTION (RECOMPILE)
at the end of @sql
for certain
sets of parameters that tend to lead to poor performance due to parameter sniffing,
lack of updated statistics, or extreme data skew and volatility.
Now, it may be the case that you actually have to be more flexible than that
- let's say you have a LOT of columns and users can also specify the operators (>
,
<=
, LIKE
, etc.) that are used to compare the column to
the parameter values, and may have to be able to specify the same column more than
once (for example a date range query). You might break down and let people pass
in complete where clauses, with literals instead of parameters, like this:
@wc1 = N'col1 >= 5', @wc2 = N'col2 < 20', @wc3 = N'col3 LIKE ''%foo%''' @wc4 = N'col4 >= ''20140101'''
Now your query is probably going to be something more like this:
SET @sql = N'SELECT * FROM dbo.[table] WHERE 1 = 1' + COALESCE(N' AND ' + @wc1, N'') + COALESCE(N' AND ' + @wc2, N'') + COALESCE(N' AND ' + @wc3, N'') + COALESCE(N' AND ' + @wc4, N'');
You start getting into pretty dangerous territory here, because it is much harder
to validate these expressions and prevent things like semi-colons - what if they
are literally searching for LIKE '%;%'
? As long as the supported expressions
are relatively simple, you can still take steps to protect yourself, but nothing
is 100% foolproof. For example, you could parse each expression and ensure that
everything up to the first non-alphanumeric character represents a column in that
table (again, this relies on sensible naming conventions, as well as column names
that don't end in Unicode or other characters outside of the search range, and expressions
that list the column name first without applying any expressions to it):
DECLARE @cols TABLE(name NVARCHAR(MAX),expr NVARCHAR(MAX)); INSERT @cols(name,expr) VALUES (LEFT(@wc1,NULLIF(PATINDEX(N'%[^A-Za-z0-9]%',@wc1)-1,-1)),@wc1), (LEFT(@wc2,NULLIF(PATINDEX(N'%[^A-Za-z0-9]%',@wc2)-1,-1)),@wc2), (LEFT(@wc3,NULLIF(PATINDEX(N'%[^A-Za-z0-9]%',@wc3)-1,-1)),@wc3), (LEFT(@wc4,NULLIF(PATINDEX(N'%[^A-Za-z0-9]%',@wc4)-1,-1)),@wc4); -- if you want to raise an error: IF EXISTS ( SELECT 1 FROM @cols AS c WHERE NOT EXISTS ( SELECT 1 FROM sys.columns WHERE [object_id] = OBJECT_ID(N'dbo.Users') AND name = c.name ) ) BEGIN RAISERROR(N'Invalid experession.', 11, 1); END -- if you just want to concatenate the ones with valid column references: DECLARE @wc NVARCHAR(MAX) = N'', @sql NVARCHAR(MAX); SELECT @wc += N' AND ' + expr FROM @cols AS c WHERE EXISTS (SELECT 1 FROM sys.columns WHERE name = c.name AND [object_id] = OBJECT_ID(N'dbo.Users')); SET @sql += N'SELECT * FROM dbo.table WHERE 1 = 1' + @wc;
Like I said, though, this has many of holes, and the more flexibility you give your users, the tougher it gets to guard against possible exploits. This pattern is a good demonstration of that. Another way to solve this problem is to pass the column name, the operator, and the literal as three different columns in a TVP. You could even have a different column for each data type to prevent conversions and invalid predicates. For example:
CREATE TYPE dbo.WhereClauses AS TABLE ( ColumnName SYSNAME, Operator VARCHAR(10) NOT NULL CHECK (Operator IN ('>','<','<=','>=','<>','=','LIKE','NOT LIKE')), BigintValue BIGINT, DatetimeValue DATETIME, StringValue NVARCHAR(MAX) );
Now you could populate a table-valued parameter using this type, for example:
DECLARE @wcs dbo.WhereClauses; INSERT @wcs(ColumnName,Operator,BigintValue,DatetimeValue,StringValue) VALUES (N'col1', '<', 20, NULL, NULL), (N'col2', 'LIKE', NULL, NULL, N'''%foo%'''), (N'col3', 'NOT LIKE', NULL, NULL, N'''%bar%'''), (N'col4', '>', NULL, '20140101', NULL);
Now, it gets a little complex, because in order to continue passing strongly-typed
parameters based on the values inserted into our table type, we need to build up
a handful of strings and execute nested dynamic SQL (sp_executesql
calling sp_executesql
).
DECLARE @sql NVARCHAR(MAX), @mastersql NVARCHAR(MAX), @p NVARCHAR(MAX) = N'@p0 INT', @plist NVARCHAR(MAX) = N',@p0 = NULL'; SET @sql = N'SELECT * FROM dbo.[table] WHERE 1 = 1'; SELECT @sql += CHAR(13) + CHAR(10) + N' AND ' + QUOTENAME(c.name) + N' ' + t.Operator + N' @p' + CONVERT(VARCHAR(11), t.ParamIndex), @p += N',@p' + CONVERT(VARCHAR(11), t.ParamIndex) + CASE WHEN t.BigintValue IS NOT NULL THEN N' BIGINT' WHEN t.DatetimeValue IS NOT NULL THEN N' DATETIME' WHEN t.StringValue IS NOT NULL THEN N' NVARCHAR(MAX)' ELSE N'' END, @plist += N',@p' + CONVERT(VARCHAR(11), t.ParamIndex) + N' = ' + CASE WHEN t.BigintValue IS NOT NULL THEN CONVERT(VARCHAR(32), t.BigintValue) WHEN t.DatetimeValue IS NOT NULL THEN N'''' + CONVERT(CHAR(8), t.DatetimeValue, 112) + N' ' + CONVERT(CHAR(12), t.DatetimeValue, 114) + N'''' WHEN t.StringValue IS NOT NULL THEN t.StringValue ELSE N'NULL' END FROM @wcs AS t INNER JOIN sys.columns AS c ON t.ColumnName = c.name WHERE c.[object_id] = OBJECT_ID('dbo.[table]'); PRINT N'-- @sql:'; PRINT @sql; SET @mastersql = N'EXEC sys.sp_executesql @stmt = N''' + @sql + N';''' + CHAR(13) + CHAR(10) + N',@params = N''' + @p + N'''' + CHAR(13) + CHAR(10) + @plist; PRINT N'-- @mastersql:'; PRINT @mastersql; EXEC sys.sp_executesql @mastersql;
In this case the query is run against dbo.[table]
(returning no
results, because the table is empty), and the printed output is:
-- @sql: SELECT * FROM dbo.[table] WHERE 1 = 1 AND [col1] < @p1 AND [col2] LIKE @p2 AND [col3] NOT LIKE @p3 AND [col4] > @p4 -- @mastersql: EXEC sys.sp_executesql @stmt = N'SELECT * FROM dbo.[table] WHERE 1 = 1 AND [col1] < @p1 AND [col2] LIKE @p2 AND [col3] NOT LIKE @p3 AND [col4] > @p4;' ,@params = N'@p0 INT,@p1 BIGINT,@p2 NVARCHAR(MAX),@p3 NVARCHAR(MAX),@p4 DATETIME' ,@p0 = NULL,@p1 = 20,@p2 = '%foo%',@p3 = '%bar%',@p4 = '20140101 00:00:00:000'
Not the tidiest thing in the world, but it works, and again, is a lot safer than
just blindly appending entire ad hoc predicates onto the end of a query. I have
tested various combinations, and any manipulating you try to do to the strings stored
in the table type lead to syntax errors (which is better than injection!) or just
searching for the wrong string value. If you come up with any way to execute an
unchecked DROP TABLE
, for example, with access only to the values that
get inserted into the TVP, please let me know in the tip comments below so I can
investigate.
You could go ever further, like adding columns that indicate correlation and predicate nesting, as well as adding ordering, grouping, and various other constructs. Really, you can get as complicated as you want. But this is already pretty complex, and likely a lot to digest, so I'll leave it at that for now.
Next Steps
- Make a plan to review your usage of dynamic SQL and identify opportunities to make it safer.
- Review the following tips and other resources:
- Protecting Yourself from Dynamic SQL in SQL Server - Part 1
- Execute Dynamic SQL commands in SQL Server
- Using Parameters for SQL Server Queries and Stored Procedures
- Using Triggers to Mitigate Some SQL Injection Effects
- SQL Injection: Defense in Depth
- SQL Injection - the golden rule
- Everything you wanted to know about SQL injection (but were afraid to ask)
- Stored procedures and ORMs won't save you from SQL injection
- The Curse and Blessings of Dynamic SQL and Dynamic Search Conditions in T-SQL
- All Dynamic SQL Tips
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips