Concat Aggregates SQL Server CLR Function

By:   |   Comments (21)   |   Related: 1 | 2 | > Common Runtime Language


Problem

User written aggregates have proven very useful for solving a variety of problems that couldn't be handled easily before they became available in SQL Server 2005.  The most useful for me has been the Concatenate aggregate from the article SQL Server CLR function to concatenate values in a column.  I've used it frequently to create data files or in reports.   In the original function the separator character is hard coded as a comma.  What if you need a pipe as a separator?  How about more than one character as a separator.  And what if the concatenated string runs over 8000 characters? 

Solution

SQL Server 2005 introduced user-defined aggregate functions but restricted them to have one input parameter and to have a result that was 8000 bytes or less.   SQL Server 2008 lifts those restrictions and allows multiple parameters and results that are up to two gigabytes in size.  This article will extend the original Concatenate aggregate to take advantage of these features using Visual Studio 2008 SP1 to create the CLR function.

The basics of writing a CLR aggregate are the same and you might want to go back to that article to review how it's done.  Each aggregate exposes four methods Init, Accumulate, Merge and Terminate.  SQL Server creates the CLR object and calls Init once for each object or each time it reuses an object.  It then calls Accumulate for each value to be aggregated.  Merge is used when aggregates are combined from multiple threads.  Finally, Terminate is called for the CLR object to return a result to SQL Server.

The original article on the Concatenate function took the code it right out of the SQL Server 2005 sample code.  For this article we'll create a new function with it's own assembly and a different name, concat.   Start with Visual Studio 2008 and create a new project using the Database\SQL CLR\C# Sql Server Project template.  Give the project a new name, location, and pick the database where you'd like to put the aggregate.  Don't worry too much about the database because it turns out the DLL that the project creates will have to be manually deployed.  Visual Studio creates the empty solution to which you add the aggregate using the menu command "Project\Add Aggregate..."  I named the file concat.cs to keep the names consistent but that isn't required.

Visual Studio creates a C# struct named concat for the aggregate with the four required methods.  We'll flesh out the struct to complete the aggregate.  Here's the code:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text;
 
[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(
 
    Format.UserDefined, /// Binary Serialization because of StringBuilder
    IsInvariantToOrder = false/// order changes the result
    IsInvariantToNulls = true/// nulls don't change the result
    IsInvariantToDuplicates = false, /// duplicates change the result
    MaxByteSize = -1
    )]
 
public struct concat :IBinarySerialize
{
    private StringBuilder _accumulator;
    private string _delimiter;
 
    /// <summary>
    /// IsNull property
    /// </summary>
    public Boolean IsNull { get; private set; }
 
    public void Init()
    {
        _accumulator = new StringBuilder();
        _delimiter = string.Empty;
        this.IsNull = true;
    }
 
    public void Accumulate(SqlString Value, SqlString Delimiter)
    {
        if ( !Delimiter.IsNull 
            & Delimiter.Value.Length > 0)
        {
            _delimiter = Delimiter.Value; /// save for Merge
            if (_accumulator.Length > 0) _accumulator.Append(Delimiter.Value);
 
        }
        _accumulator.Append(Value.Value); 
        if (Value.IsNull == false) this.IsNull = false;
    }
    /// <summary>
    /// Merge onto the end 
    /// </summary>
    /// <param name="Group"></param>
    public void Merge(concat Group)
    {
        /// add the delimiter between strings
        if (_accumulator.Length > 0
            & Group._accumulator.Length > 0) _accumulator.Append(_delimiter); 
 
        ///_accumulator += Group._accumulator;
        _accumulator.Append(Group._accumulator.ToString());
 
    }
 
    public SqlString Terminate()
    {
        // Put your code here
        return new SqlString(_accumulator.ToString());
    }
 
    /// <summary>
    /// deserialize from the reader to recreate the struct
    /// </summary>
    /// <param name="r">BinaryReader</param>
    void IBinarySerialize.Read(System.IO.BinaryReader r)
    {
        _delimiter = r.ReadString();
        _accumulator = new StringBuilder(r.ReadString());
 
        if (_accumulator.Length != 0) this.IsNull = false;
    }
 
    /// <summary>
    /// searialize the struct.
    /// </summary>
    /// <param name="w">BinaryWriter</param>
    void IBinarySerialize.Write(System.IO.BinaryWriter w)
    {
        w.Write(_delimiter);
        w.Write(_accumulator.ToString());
    }
}

The concat aggregate becomes multi-parameter by including multiple arguments to the Accumulate method.  Delimiter is the second parameter for the aggregate and takes a SQLString so that multi-character separators can be used.  It also allows a null separator, something the original Concatenate aggregate doesn't do.

For efficiency sake the concat aggregate is implemented with a CLR StringBuilder.  That introduces a couple of additional requirements.  The first is the using statement that brings in the System.Text namespace.  The second requirement is that the struct must handle it's own serialization and deserialization.  This is done by implementing the IBinarySerialize interface provide in the Microsoft.SqlServer.Server namespace.  Binary serialization is needed because SQL Server can't serialize the StringBuilder object.  The code tells SQL Server that it does serialization by including the Format.Userdefiend in the SqlUserDefinedAggregate attribute.  The two methods IBinarySerialize.Read and IBinarySerialize.Write implement the interface. Some of the other parameters on the SqlUserDefinedAggregate are noteworthy:

Attribute Parameter

Description
IsInvariantToOrder Tells the optimizer that changing the order of calls to  Accumulate and Merge doesn't matter.  For concat order matters so it's set to false.
IsInvariantToNulls Tells the optimizer that it doesn't have to call Accumulate when the Value is null
IsInvariantToDuplicates Tells the optimizer that it only has to call Accumulate once for each unique argument.
MaxByteSize When -1 it tells SQL Server that the object may be any size up to SQL Server's maximum of 2 gigabytes.  When it's 1 to 8000 it gives the maximum size.

Once the code is written it comes time to Build and Deploy the project and that's where Visual Studio holds a little surprise.  Build works fine and compiles the code.  When the Deploy menu is used you're greeted with the message:

The Accumulate method in user defined aggregate "concat" must have exactly one parameter.
 

What? The aggregate must have one parameter? This is an article about implementing aggregates with multiple parameters?  It turns out that although SQL Server supports aggregates with multiple parameters Visual Studio 2008 doesn't.  That seems pretty peculiar since both SQL Server 2008 and Visual Studio 2008 SP1 were shipped in what seemed like coordinated fashion on the same day in August 2008.  Shipping on the same day didn't make the features work together.  I think this is a bug that should be fixed and Connect item 344093 is for this issue and the related message for the MaxByteSize attribute.  There is a workaround, which will be discussed next, but if you think this should be fixed you might want to vote for this item.

The workaround to this problem is to deploy the assembly and create the function manually with your own T-SQL script   It's not that difficult and has been described in earlier articles on creating SQLCLR objects such as Send Email from SQL Server Express Using a CLR Stored Procedure.  Here's the script to create the assembly and the aggregate.  Be sure to start out in the database of your choice.

-- Remove the aggregate and assembly if they're there

IF OBJECT_ID('dbo.concat'IS NOT NULL DROP Aggregate concat
GO

IF EXISTS (SELECT FROM sys.assemblies WHERE name 'concat_assembly')
       
DROP assembly concat_assembly;
GO     

CREATE Assembly concat_assembly
   
AUTHORIZATION dbo
   
FROM 'c:\MSSQLTIPS\concat\concat\bin\Debug\SqlClassLibrary.dll'
   
WITH PERMISSION_SET SAFE;
GO

CREATE AGGREGATE dbo.concat (

    
@Value NVARCHAR(MAX)
  , 
@Delimiter NVARCHAR(4000)

RETURNS NVARCHAR(MAX)
EXTERNAL Name concat_assembly.concat;
GO
 

You'll have to change the path to the path of the DLL That you created.   Once the aggregate is created it's just a matter of using it.  This example uses the AdventureWorks database and produces a short list of  provinces delimited by pipe characters:

SELECT TerritoryName 
     dbo.concat(StateProvinceName'|'StateProvinceList 
   FROM AdventureWorks.Sales.vSalesPerson 
    WHERE TerritoryName IS NOT NULL 
    GROUP BY TerritoryName 
GO     
  
TerritoryName        StateProvinceList
-------------------- ----------------------------------
Australia            Victoria
Canada               Alberta|Ontario
Central              Minnesota
France               Gironde
Germany              Hamburg
Northeast            Michigan
Northwest            Oregon|Washington|Massachusetts
Southeast            Tennessee
Southwest            Utah|California
United Kingdom       England

With the SQL Server 2005 version of this function, which was called Concatenate, we had to worry about creating aggregates that went over 8000 bytes.  That's no longer a problem as seen in this query that use to throw an error:

SELECT DATALENGTH(dbo.concat(Name';')) [Concat Len] 
    FROM AdventureWorks.Sales.Store 
GO       
          Concat Len
--------------------
               29614
Next Steps
  • Build the concat aggregate
  • Use concat in your queries
  • Create other aggregates with multiple arguments.
  • Check out all of the CLR tips on MSSQLTips.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Andy Novick Andy Novick is a SQL Server Developer in the Boston area with 25 years of database and application development experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, May 8, 2018 - 6:44:13 AM - Amit Rawat Back To Top (75892)

how to select two maximum no out of three columns in sql server


Wednesday, March 16, 2016 - 8:51:05 AM - Allison Back To Top (40951)

hi, it is possible to include parameter Sort?
for example: concat ordered by date descending.


Monday, October 5, 2015 - 8:54:37 AM - Roman Pekar Back To Top (38825)

Just fyi, I've also encountered issue with missing separators when using distinct keyword during concatenation. It was caused by parallel processing indeed (like Don Schaeffer mentioned in earlier comment).

After careful checking of the code, I've found the reason. If current instance is empty and added instance is not, the result will have correct _accumulator but empty _delimiter.

So I've modified Merge method like this:

 

public void Merge(Concat Group)

{

// get delimiter from not null group

if (IsNull && !Group.IsNull)

_delimiter = Group._delimiter;

 

// add the delimiter between strings

if (_accumulator.Length > 0 && Group._accumulator.Length > 0)

_accumulator.Append(_delimiter);

 

//_accumulator += Group._accumulator;

_accumulator.Append(Group._accumulator.ToString());

 

// modifying IsNull

IsNull = IsNull && Group.IsNull;

}


Tuesday, January 6, 2015 - 12:07:48 PM - Don Schaeffer Back To Top (35845)

We encountered a problem with using this concat aggregate function when concatenating tens of thousands of items for each column in the group column.  At various seemingly random points within the lengthy concatenated string there were glitches like missing separators.  

A developer suggested the issue was parallel processing and in fact our issue was resolved by preventing parallel processing in the query.  This function has otherwise worked great; it was only in this unusual case where we had very large number of items being concatenated that we saw this issue.

OPTION (MAXDOP 1)


Friday, September 20, 2013 - 12:22:53 AM - Jeff Moden Back To Top (26877)

How does this compare in performance to the STUFF(FOR XML PATH) method for creating string aggregates?


Sunday, November 4, 2012 - 12:45:46 PM - Veena Back To Top (20208)

Hi Andy,

  Thank you for the great explanation of CLR funtionality in your article.  I wanted to use this functionality to calculate daily returns of stock prices, annual retun, or weekly retuns and as this involves a data from the previous row, I am currently using the row number function with CTE to do this.  Is there a way to do it in CLR and would appreciate if you could guide me towards it.

 

Thanks,

Veena.


Thursday, January 12, 2012 - 8:50:13 AM - opc.three Back To Top (15622)

The use of a StringBuilder to store the concatenation value is terribly inefficient. The article expands nicely on the MSDN example but for a solution that will perform well for large data sets in a production environment consider using the concatenation aggregates available here: http://groupconcat.codeplex.com


Tuesday, August 16, 2011 - 10:09:15 AM - George Back To Top (14414)

Hi, for starters i must say it is very nice work.

At first i thought that the fact tha it does not handle nulls was not an issue, but after some time i was needed is desperatelly, so i slightly modified your code, so that i returned null when no not null value was avalibale and also so as to handle null values correctly.

 

[Serializable]
[Microsoft.SqlServer.Server.SqlUserDefinedAggregate(
    Format.UserDefined,
    IsInvariantToOrder = false, // order changes the result
    IsInvariantToNulls = false,  // nulls change the result
    IsInvariantToDuplicates = false, // duplicates change the result
    MaxByteSize = -1)]
public struct concat : IBinarySerialize, INullable
{
    private StringBuilder _accumulator;
    private string _delimiter;

    public void Init()
    {
        _accumulator = new StringBuilder();
        _delimiter = string.Empty;
        this.IsNull = true;
    }

    public void Accumulate(SqlString Value, SqlString Delimiter)
    {
        if (!Delimiter.IsNull & Delimiter.Value.Length > 0)
        {
            _delimiter = Delimiter.Value; /// save for Merge
            if (_accumulator.Length > 0)
                _accumulator.Append(Delimiter.Value);
        }

        if (Value.IsNull == false)
        {
            _accumulator.Append(Value.Value);
            this.IsNull = false;
        }
    }

    public void Merge(concat Group)
    {
        if (_accumulator.Length > 0 & Group._accumulator.Length > 0)
            _accumulator.Append(_delimiter);

        _accumulator.Append(Group._accumulator.ToString());
    }

    public SqlString Terminate()
    {
        return IsNull ? SqlString.Null : new SqlString(_accumulator.ToString());
    }

    void IBinarySerialize.Read(System.IO.BinaryReader r)
    {
        IsNull = r.ReadBoolean();
        _delimiter = r.ReadString();
        _accumulator = new StringBuilder(r.ReadString());
        if (_accumulator.Length != 0)
            this.IsNull = false;
    }

    void IBinarySerialize.Write(System.IO.BinaryWriter w)
    {
        w.Write(IsNull);
        w.Write(_delimiter);
        w.Write(_accumulator.ToString());
    }

    public Boolean IsNull { get; private set; }
}

 


Wednesday, November 10, 2010 - 1:01:39 PM - AaronLS Back To Top (10350)

Ah, makes total sense now!


Wednesday, November 10, 2010 - 12:10:54 PM - Andy Novick Back To Top (10349)

HI,

What may be confusing is that "InvariantToDuplicates" is a declaration by your code to SQL Server that it is "Invariant to Duplicates" so if you turn it on you're telling SQL Server that the code handles duplicates and will return the same results as if it SQL Server didn't send duplicates.  So when you say "Distinct s", SQL Server says "okay, the clr code will take care of the duplicates so it's okay to send duplicates even if there is a Distinct clause and in the process save the work of makeing the distinct data to pass to the aggregate.

HTH

Regards,
Andy

Andrew Novick
SQL Server MVP


Wednesday, November 10, 2010 - 9:57:03 AM - AaronLS Back To Top (10348)

Got it working, I had set the InvariantToDuplicates = true, and it was not working like that.  The code as-is has this:

IsInvariantToDuplicates = false, /// duplicates change the result

So if I read the comment then that implies the setting of false means that duplicates will change the result, i.e. a duplicates will be included.

I would have thought that setting it to true would mean that it does not change the result, i.e. is invariant, when duplicates are present, implying the duplicates are ignored. So both the name of Microsoft's setting and the comment mislead me.

Realizing I had not tried it with the original false setting, I gave it a whirl and that worked. Thanks.

Tuesday, November 9, 2010 - 6:25:10 PM - Andy Novick Back To Top (10346)

Hi,  I'm not having any problem.  What version of sql server are you using.  I'm using 2008 R2.

 


Tuesday, November 9, 2010 - 2:45:20 PM - AaronLS Back To Top (10345)

The distinct doesn't seem to make a difference, still get "a, b, a" as a result.


Tuesday, November 9, 2010 - 1:04:01 PM - Andy Novick Back To Top (10344)

That didn't paste correctly  try

select dbo.concat(distinct s, ',')

from (values ('a'), ('b'), ('a')) x(s)


Tuesday, November 9, 2010 - 11:08:14 AM - AaronLS Back To Top (10342)

Any idea how to get this to ignore duplicates?  The IsInvariantToDuplicates setting doesn't have an effect, I assume because it is only an optimizer hint and thus doesn't change the behavior.  Would the code need to be modified to track a list of values and only append a value if it is not already in the list?


Monday, May 31, 2010 - 1:17:03 PM - Cornan Back To Top (5613)

Interestingly enough, on one SQL 2008 server I deploy to (Developer Edition 2008) the parameters are set correctly, while using the VS2010 "Deploy" from exactly the same project to a SQL Server 2008 version VS2010 generates the erroneous nvarchar(4000) declarations.

I also had the same problem on the developer edition until some mysterious point in time, after which it began to work properly (!?)

I've written my own version of Concatenate(,,) with a third parameter to allow a place-holder delimited to be included for empty or NULL values, and to return NULL when there are no "input" rows (as with built-in aggregates such as SUM()).

@@VERSION on server that worked properly with VS2010:

Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64)   Mar 29 2009 10:11:52   Copyright (c) 1988-2008 Microsoft Corporation  Developer Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: )

@@VERSION on server that generated nvarchar(4000) declarations for the first parameter and return value:

Microsoft SQL Server 2008 (SP1) - 10.0.2531.0 (X64)   Mar 29 2009 10:11:52   Copyright (c) 1988-2008 Microsoft Corporation  Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7600: ) 

As the .sql script generated is a SQLCMD script that can be edited and run in SSMS, that's what I did for the Enterprise box.


Sunday, May 30, 2010 - 10:41:11 AM - Andrew Novick Back To Top (5581)

2)  Yes the connect item is for Visual Studiko 2008 SP1.   These problems are addressed in VIsual Studio 2010 and will be covered in a future article.

 


Sunday, May 30, 2010 - 5:46:27 AM - Cornan Back To Top (5580)

1) Can anyone tell me how to make a "top level" comment?  I didn't want to REPLY on a thread but I couldn't find any "Make a Comment" buttons on the form.

2)  I noticed that though this article is dated May 20, 2010, the link to Connect is to a 2008 connect item

    Is it possible that the issues discussed have already been fixed in a later release of Visual Studio?  2008 sp1?  2010?


Thursday, May 20, 2010 - 1:20:02 PM - GeriReshef Back To Top (5391)
Thank you!

Thursday, May 20, 2010 - 11:31:37 AM - Andrew Novick Back To Top (5389)

Hi,  The C# syntax is from version 3.5 so a newer compiler is necessary.  I've used Visual Studio 2008, which compiles the code correctly.  The command line would be:

c:\Windows\Microsoft.NET\Framework\v3.5\csc

I've verified that the 3.5 compiler works with the command line.

Regards,

Andy

 


Thursday, May 20, 2010 - 2:31:23 AM - GeriReshef Back To Top (5388)
I tried to compile: C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\Csc /target:library C:\Tmp\Concat.Cs And receive the following message: Concat.Cs(26,29): error CS0501: 'concat.IsNull.get' must declare a body because it is not marked abstract or extern Concat.Cs(26,42): error CS0501: 'concat.IsNull.set' must declare a body because it is not marked abstract or extern What's the problem? How should I fix it?














get free sql tips
agree to terms