Problem Normalization is an important principal in database design, but it's often the opposite of what's needed in a good report. If a column is normalized, but the user really wants to see the values as a short comma separated list, how can I write a query that produces the list? Concatenating the values in a column would be pretty easy if SQL Server had a concatenate aggregate function, which it doesn't. What's more, for efficiency sake it's important to write the reporting queries without using cursors.
Solution SQL Server 2005 introduced user-defined aggregate functions that can be written in the SQLCLR. These are great when you need an aggregate that SQL Server doesn't supply. The funny thing is that we don't have to write the Concatenate aggregate. Microsoft has supplied it in the sample code that comes with SQL Server 2005.
During the installation of SQL Server there's an option to install Sample Code. If you select that option a generous quantity of samples are added to your system. If you're installing to the default location on the C: drive you'll find the samples in the directory "C:\Program Files\Microsoft SQL Server\90\Samples\". The sample with the Concatenate aggregate is in the directory "Engine\Programmability\CLR\StringUtilities\" under Samples. The SQLCLR samples come in both C# and Visual Basic.Net. For this tip I'll be using the VB version in the VB subdirectory. If you prefer C#, the code in the CS subdirectory has the same functionality.
This samples rely on a strong key to identify the code to SQL Server. To compile the project, you'll need to supply a strong key file, Microsoft doesn't ship one with the samples. There are instructions in the readme_stringutils.htm file that is in the StringUtilities directory on how to create the key. Let's walk through the process:
Open up a command window and navigate to the directory at the top of the sample code: "c:\program files\Microsoft SQL Server\90\Samples\"
Run the sn.exe program that creates a pair of cryptographic keys. sn.exe is part of the .Net SDK so you may need to use it's complete path.
"C:\Program Files\Microsoft Visual Studio 8\SDK\v2.0\Bin\"sn.exe -k Samplekey.snk
The snk file is used by several of the sample projects. Not just StringUtils so you won't have to take this step again.
Now it's almost time to start coding. Using Visual Studio 2005 Professional, or above, open the StringUtilities.sln file to get started. The samples ship without specifying a connection to a database so the next task is to make the connection to a database. Using the menu choose Project and then StringUtils Properties. Along the left side of the properties screen select the Database tab. This is what you'll see:
The green arrow points to the blank string. If you hide the Solution Explorer and scroll over to the right you can press the Browse button that is hidden in the picture above. Add a reference to any database that you like on your local machine. Here's the New Database Reference dialog as I add a connection to AdventureWorks on the local machine:
Press OK and then OK again on the Add Database Reference dialog and your connection string is set. Open the Solution Explorer again and take a look at the code in the Concat.vb file.
The sample declares the aggregate function Concatenate by creating the concatenate class and decorating it with the SqlUserDefinedAggregate attribute. I've re-wrapped the text so it's more readable and now the code for Concatenate starts like this:
Aggregate functions are implemented by writing four methods
Initializes the aggregate
This method is called for each row to be aggregated it provides the value to aggregate.
Called when two aggregates must be merged to produce a final result
Called by SQL Server when it wants the result of the aggregate.
Let's take a look at each of those functions in the Concat.VB file. The Init method is small and called only once each time the Concatenate aggregate is created. All it has to do is create a new StringBuilder object. StringBuilder is a .Net framework class that will do most of the concatenation work.
The accumulate method is called once for every value to aggregate. It should decide what to do when it's called with a null value. The Concatenate.Accumulate method in the sample just ignores nulls. If the string has a value it's Appended to the StringBuilder along with a comma. Concatenating with commas as the separator is hard-coded into the Accumulate method. In SQL Server 2005 aggregates take only one argument so there's no way to pass in the character used to separate the aggregated strings. Here's the Concatenate.Accumulate code:
The merge method is used when SQL Server has to combine two aggregate objects to produce a combined results. Using multiple aggregate methods allows SQL Server to process the query on multiple threads for better performance. It calls Merge to aggregate the intermediate results. The parameter to Merge is a second Concatenate object distinct from the one that is called. Since it's the same class, the Merge method can refer to the private members of the parameter and all it has to do is Append that string to the current intermediateResult StringBuilder. Here's Concatenate.Merge:
Finally the job of the Terminate method is to return the result of the aggregate to SQL Server. Concatenate.Terminate strips the final comma from the intermediate result and returns the result as a new SqlString.
After taking a look at the code it's time to Build and Deploy it to SQL Server with the Deploy menu command. If you run into an error message complaining that the compiler can't find the SampleKey.snk file you'll need to go back to the instructions above and be sure that the SampleKey.snk file has been created and that it's in the correct directory. This is the text of the error message that I got before getting everything correctly:
Error reading key file 'c:\Program Files\Microsoft SQL Server
\90\Samples\SampleKey.snk' -- The system cannot
find the file specified.
Once built and deployed the Concatenate aggregate can be used in a query. Here's a pretty basic one in the AdventureWorks database, where we built the StringUtilties assembly:
where TerritoryName is NOT NULL
group by TerritoryName
The following is the result of running the script in SQL Server Management Studio:
For short lists the Concatenate aggregate works great. But it must be used with caution. Like all User-Defined functions there is a performance penalty that has to be paid when using a function on tens of thousands of rows or more. I had one situation where I had to remove Concatenate for performance reasons when working with a query on 3 million rows.
Another limitation on aggregates in SQL Server 2005 is that their internal storage can only use up to 8000 bytes. Let's see what happens when the string concatenated grows to over 8000 bytes:
Here are the results:
Msg 6522, Level 16, State 2, Line 2
A .NET Framework error occurred during execution of user-defined
routine or aggregate "Concatenate":
System.Data.SqlTypes.SqlTypeException: The buffer is insufficient.
Read or write operation failed.
at System.Data.SqlTypes.SqlBytes.Write(Int64 offset,
Byte buffer, Int32 offsetInBuffer, Int32 count)
buffer, Int32 offset, Int32 count)
at System.IO.BinaryWriter.Write(String value)
at Microsoft.Samples.SqlServer.Concatenate.Write(BinaryWriter w)
The 8000 byte limitation has been addressed in SQL Server 2008.
Explore and build the sample StringUtilities project in either VB.Net or C#.
Use Concatenate in your queries.
Look for other aggregates that might make sense in your environment.
Last Update: 2/19/2009
About the author
Andy is a SQL Server Developer in the Boston area with 25 years of database and application development experience.
@List = COALESCE(@List + '; ', '') + [LegalName] + COALESCE('; ' + [ShortName], '') + COALESCE('; ' + [AlternateName], '')
FROM dbo.prime_Org_Contact c INNER JOINdbo.prime_Organization o ON c.OrgID = o.ID
WHERE c.ContactID = @ContactIDRETURN @List
This returns a CSV list of orgs that a contact is associated with. We populate a search table with a bunch of these UDFS so we can full text search. Unfortunately, a view using a UDF that does data access cannot be full text indexed.