By: Andy Novick | Comments (21) | Related: 1 | 2 | > Common Runtime Language
Problem
User written aggregates have proven very useful for solving a variety of problems that couldn't be handled easily before they became available in SQL Server 2005. The most useful for me has been the Concatenate aggregate from the article SQL Server CLR function to concatenate values in a column. I've used it frequently to create data files or in reports. In the original function the separator character is hard coded as a comma. What if you need a pipe as a separator? How about more than one character as a separator. And what if the concatenated string runs over 8000 characters?
Solution
SQL Server 2005 introduced user-defined aggregate functions but restricted them to have one input parameter and to have a result that was 8000 bytes or less. SQL Server 2008 lifts those restrictions and allows multiple parameters and results that are up to two gigabytes in size. This article will extend the original Concatenate aggregate to take advantage of these features using Visual Studio 2008 SP1 to create the CLR function.
The basics of writing a CLR aggregate are the same and you might want to go back to that article to review how it's done. Each aggregate exposes four methods Init, Accumulate, Merge and Terminate. SQL Server creates the CLR object and calls Init once for each object or each time it reuses an object. It then calls Accumulate for each value to be aggregated. Merge is used when aggregates are combined from multiple threads. Finally, Terminate is called for the CLR object to return a result to SQL Server.
The original article on the Concatenate function took the code it right out of the SQL Server 2005 sample code. For this article we'll create a new function with it's own assembly and a different name, concat. Start with Visual Studio 2008 and create a new project using the Database\SQL CLR\C# Sql Server Project template. Give the project a new name, location, and pick the database where you'd like to put the aggregate. Don't worry too much about the database because it turns out the DLL that the project creates will have to be manually deployed. Visual Studio creates the empty solution to which you add the aggregate using the menu command "Project\Add Aggregate..." I named the file concat.cs to keep the names consistent but that isn't required.
Visual Studio creates a C# struct named concat for the aggregate with the four required methods. We'll flesh out the struct to complete the aggregate. Here's the code:
using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text;
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(
Format.UserDefined, /// Binary Serialization because of StringBuilder
IsInvariantToOrder = false, /// order changes the result
IsInvariantToNulls = true, /// nulls don't change the result
IsInvariantToDuplicates = false, /// duplicates change the result
MaxByteSize = -1
)]
public struct concat :IBinarySerialize
{
private StringBuilder _accumulator;
private string _delimiter;
/// <summary>
/// IsNull property
/// </summary>
public Boolean IsNull { get; private set; }
public void Init()
{
_accumulator = new StringBuilder();
_delimiter = string.Empty;
this.IsNull = true;
}
public void Accumulate(SqlString Value, SqlString Delimiter)
{
if ( !Delimiter.IsNull
& Delimiter.Value.Length > 0)
{
_delimiter = Delimiter.Value; /// save for Merge
if (_accumulator.Length > 0) _accumulator.Append(Delimiter.Value);
}
_accumulator.Append(Value.Value);
if (Value.IsNull == false) this.IsNull = false;
}
/// <summary>
/// Merge onto the end
/// </summary>
/// <param name="Group"></param>
public void Merge(concat Group)
{
/// add the delimiter between strings
if (_accumulator.Length > 0
& Group._accumulator.Length > 0) _accumulator.Append(_delimiter);
///_accumulator += Group._accumulator;
_accumulator.Append(Group._accumulator.ToString());
}
public SqlString Terminate()
{
// Put your code here
return new SqlString(_accumulator.ToString());
}
/// <summary>
/// deserialize from the reader to recreate the struct
/// </summary>
/// <param name="r">BinaryReader</param>
void IBinarySerialize.Read(System.IO.BinaryReader r)
{
_delimiter = r.ReadString();
_accumulator = new StringBuilder(r.ReadString());
if (_accumulator.Length != 0) this.IsNull = false;
}
/// <summary>
/// searialize the struct.
/// </summary>
/// <param name="w">BinaryWriter</param>
void IBinarySerialize.Write(System.IO.BinaryWriter w)
{
w.Write(_delimiter);
w.Write(_accumulator.ToString());
}
}
The concat aggregate becomes multi-parameter by including multiple arguments to the Accumulate method. Delimiter is the second parameter for the aggregate and takes a SQLString so that multi-character separators can be used. It also allows a null separator, something the original Concatenate aggregate doesn't do.
For efficiency sake the concat aggregate is implemented with a CLR StringBuilder. That introduces a couple of additional requirements. The first is the using statement that brings in the System.Text namespace. The second requirement is that the struct must handle it's own serialization and deserialization. This is done by implementing the IBinarySerialize interface provide in the Microsoft.SqlServer.Server namespace. Binary serialization is needed because SQL Server can't serialize the StringBuilder object. The code tells SQL Server that it does serialization by including the Format.Userdefiend in the SqlUserDefinedAggregate attribute. The two methods IBinarySerialize.Read and IBinarySerialize.Write implement the interface. Some of the other parameters on the SqlUserDefinedAggregate are noteworthy:
Attribute Parameter |
Description |
IsInvariantToOrder | Tells the optimizer that changing the order of calls to Accumulate and Merge doesn't matter. For concat order matters so it's set to false. |
IsInvariantToNulls | Tells the optimizer that it doesn't have to call Accumulate when the Value is null |
IsInvariantToDuplicates | Tells the optimizer that it only has to call Accumulate once for each unique argument. |
MaxByteSize | When -1 it tells SQL Server that the object may be any size up to SQL Server's maximum of 2 gigabytes. When it's 1 to 8000 it gives the maximum size. |
Once the code is written it comes time to Build and Deploy the project and that's where Visual Studio holds a little surprise. Build works fine and compiles the code. When the Deploy menu is used you're greeted with the message:
The Accumulate method in user defined aggregate "concat" must have exactly one parameter. |
What? The aggregate must have one parameter? This is an article about implementing aggregates with multiple parameters? It turns out that although SQL Server supports aggregates with multiple parameters Visual Studio 2008 doesn't. That seems pretty peculiar since both SQL Server 2008 and Visual Studio 2008 SP1 were shipped in what seemed like coordinated fashion on the same day in August 2008. Shipping on the same day didn't make the features work together. I think this is a bug that should be fixed and Connect item 344093 is for this issue and the related message for the MaxByteSize attribute. There is a workaround, which will be discussed next, but if you think this should be fixed you might want to vote for this item.
The workaround to this problem is to deploy the assembly and create the function manually with your own T-SQL script It's not that difficult and has been described in earlier articles on creating SQLCLR objects such as Send Email from SQL Server Express Using a CLR Stored Procedure. Here's the script to create the assembly and the aggregate. Be sure to start out in the database of your choice.
-- Remove the aggregate and assembly if they're there
IF OBJECT_ID('dbo.concat') IS NOT NULL DROP Aggregate concat
GO
IF EXISTS (SELECT * FROM sys.assemblies WHERE name = 'concat_assembly')
DROP assembly concat_assembly;
GO
CREATE Assembly concat_assembly
AUTHORIZATION dbo
FROM 'c:\MSSQLTIPS\concat\concat\bin\Debug\SqlClassLibrary.dll'
WITH PERMISSION_SET = SAFE;
GO
CREATE AGGREGATE dbo.concat (
@Value NVARCHAR(MAX)
, @Delimiter NVARCHAR(4000)
) RETURNS NVARCHAR(MAX)
EXTERNAL Name concat_assembly.concat;
GO
You'll have to change the path to the path of the DLL That you created. Once the aggregate is created it's just a matter of using it. This example uses the AdventureWorks database and produces a short list of provinces delimited by pipe characters:
SELECT TerritoryName , dbo.concat(StateProvinceName, '|') StateProvinceList FROM AdventureWorks.Sales.vSalesPerson WHERE TerritoryName IS NOT NULL GROUP BY
TerritoryName GO
TerritoryName StateProvinceList -------------------- ---------------------------------- Australia Victoria Canada Alberta|Ontario Central Minnesota France Gironde Germany Hamburg Northeast Michigan Northwest Oregon|Washington|Massachusetts Southeast Tennessee Southwest Utah|California United Kingdom England
With the SQL Server 2005 version of this function, which was called Concatenate, we had to worry about creating aggregates that went over 8000 bytes. That's no longer a problem as seen in this query that use to throw an error:
SELECT DATALENGTH(dbo.concat(Name, ';')) [Concat Len] FROM AdventureWorks.Sales.Store GO
Concat Len -------------------- 29614
Next Steps
- Build the concat aggregate
- Use concat in your queries
- Create other aggregates with multiple arguments.
- Check out all of the CLR tips on MSSQLTips.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips