Removing Duplicates from Strings in SQL Server

By:   |   Comments (15)   |   Related: > TSQL


Problem

I have seen many systems that accept or store multi-valued strings in parameters, variables, or columns, with no process set up for de-duplication. Typically these are comma-separated; but really, the delimiter isn't important, the problem is that there are multiple facts embedded in a single value.

Let's say we have a string like this:

 Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta

And we want to turn it into this (distinct values ordered by original appearance):

 Bravo/Alpha/Tango/Delta

Or this (distinct values ordered alphabetically):

 Alpha/Bravo/Delta/Tango

Assuming that we can't fix the application to remove duplicates before SQL Server ever sees them, and we can't fix the schema so that the data can be separated and normalized, how would we accomplish this cleanup in Transact-SQL?

Solution

In this tip, I will show you my approach to this kind of problem, which (if the application can't be fixed and the data can't be normalized) will almost always try to lean toward a set-based solution. The typical response to such a problem will often point toward some kind of looping strategy; SQL Server is not very efficient with loops.

First, we need a Numbers table in order to support string splitting. There are multiple approaches to breaking strings apart, but the Numbers table is my favorite, because it is simple to implement, has a small disk/memory footprint, performs well enough, and is more flexible than some of the other common approaches. One way to create a numbers table is as follows:

DECLARE @UpperLimit INT = 1000000;

;WITH n(rn) AS
(
  SELECT ROW_NUMBER() OVER (ORDER BY s1.[object_id])
  FROM sys.all_columns AS s1
  CROSS JOIN sys.all_columns AS s2
)
SELECT [Number] = rn
INTO dbo.Numbers FROM n
WHERE rn <= @UpperLimit;

CREATE UNIQUE CLUSTERED INDEX n ON dbo.Numbers([Number])
-- WITH (DATA_COMPRESSION = PAGE);

This will create a Numbers table with 1,000,000 rows, which by default will take about 17MB (with page compression, it will only require about 11MB). You may need more or fewer numbers depending on your usage and requirements; for this tip, you probably really only need 4,000 or 8,000 numbers to deal with the most common maximum string lengths.

As an aside, I typically create this table in a Utility database, so that a single copy can be used centrally from any other database. You can use synonyms to point to it so that your code does not need to use three-part names.

Now with the Numbers table in place, you can create a function that uses the table to break strings apart:

CREATE FUNCTION dbo.SplitString
(
  @List  NVARCHAR(MAX),
  @Delim NVARCHAR(32)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
  RETURN
  (
    SELECT 
      rn, 
      vn = ROW_NUMBER() OVER (PARTITION BY [Value] ORDER BY rn), 
      [Value]
    FROM 
    ( 
      SELECT 
        rn = ROW_NUMBER() OVER (ORDER BY CHARINDEX(@Delim, @List + @Delim)),
        [Value] = LTRIM(RTRIM(SUBSTRING(@List, [Number],
                  CHARINDEX(@Delim, @List + @Delim, [Number]) - [Number])))
      FROM dbo.Numbers
      WHERE Number <= LEN(@List)
      AND SUBSTRING(@Delim + @List, [Number], LEN(@Delim)) = @Delim
    ) AS x
  );

As a quick test, we can see how this splitting function outputs our original string (and how we can use the rn and vn output columns to sort and remove duplicates, respectively):

DECLARE @List NVARCHAR(MAX), @Delim NVARCHAR(32);

SELECT @List  = N'Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta', 
       @Delim = N'/';

SELECT rn, vn, [Value] FROM dbo.SplitString(@List, @Delim);

Results:

  rn      vn      Value
  ------  ------  -----
  2       1       Alpha
  7       2       Alpha
  1       1       Bravo
  3       2       Bravo
  6       3       Bravo
  5       1       Delta
  8       2       Delta
  4       1       Tango
  
--^ any vn <> 1 can be filtered out as a duplicate
        
--^ row number can be used to preserve original order

Finally, we need a function that will put the strings back together, using FOR XML PATH:

CREATE FUNCTION dbo.ReassembleString
(
  @List  NVARCHAR(MAX),
  @Delim NVARCHAR(32),
  @Sort  NVARCHAR(32)
)
RETURNS NVARCHAR(MAX)
WITH SCHEMABINDING
AS
BEGIN
  RETURN 
  ( 
    SELECT newval = STUFF((
     SELECT @Delim + x.[Value] 
     FROM dbo.SplitString(@List, @Delim) AS x
     WHERE (x.vn = 1) -- filter out duplicates
     ORDER BY CASE @Sort
       WHEN N'OriginalOrder' THEN CONVERT(INT, x.rn)
       WHEN N'Alphabetical'  THEN CONVERT(NVARCHAR(4000), x.value)
       ELSE CONVERT(SQL_VARIANT, NULL) END
     FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,1,N'')
  );
END

With these in place, removing duplicates from our original string and reassembling it in the desired order becomes quite trivial:

DECLARE @List NVARCHAR(MAX), @Delim NVARCHAR(32);

SELECT @List  = N'Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta', 
       @Delim = N'/';

SELECT OriginalOrder = dbo.ReassembleString(@List, @Delim, N'OriginalOrder'),
       Alphabetical  = dbo.ReassembleString(@List, @Delim, N'Alphabetical');

Results:

OriginalOrder              Alphabetical
-----------------------    -----------------------
Bravo/Alpha/Tango/Delta    Alpha/Bravo/Delta/Tango

Now, this is not going to be very efficient at scale, but should be quite acceptable for smaller sets, or one-time mass updates - certainly fast compared to looping constructs you may be using now.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a passionate technologist with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and also blogs at sqlblog.org.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, May 10, 2019 - 5:21:53 PM - Josh Yakubov Back To Top (80039)

It is pretty simple to change my code to function, again using string_slpit, string_agg even better and less code. I wrote it with an xml solution with way less code, but I did not check the performance.

We can avoid a Numbers table and we dont need to do string splitting function eaither, we can handle all as below, with xml.

create function Remove_dups(@x varchar(max))
returns varchar(max)
as begin 
 declare @xy nvarchar(max)
 ;with a as
 (
 select cast('<y>'+replace(@x,'/','</y><y>')+'</y>' as xml) n
 ) , c as
 (
 select b.value('.','varchar(max)') y,row_number()over(partition by b.value('.','varchar(max)') order by  b.value('.','varchar(max)'))rn  from a cross apply n.nodes('/y') node(b) 
 ) ,d as 
 (
 select (select '/'+y from c where rn=1 for xml path('')) x
 ) select  @xy=stuff(x,1,1,'') from d 
 return @xy
 end
 
declare @x nvarchar(max)='Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta'
select dbo.Remove_dups(@x)
 

If you want to see it as function with input and delimiter

alter function Remove_dups(@x varchar(max),@y varchar(max))
returns varchar(max)
as begin 
declare @xy nvarchar(max)         
;with a as
 (
 select cast('<y>'+replace(@x,@y,'</y><y>')+'</y>' as xml) n
 ) , c as
 (
 select b.value('.','varchar(max)') y,row_number()over(partition by b.value('.','varchar(max)') order by  b.value('.','varchar(max)'))rn  from a cross apply n.nodes('/y') node(b) 
 ) 
 select  @xy=stuff((select @y+y from c where rn=1 for xml path('')),1,1,'') 
 return @xy
 end
 
 
 declare @input varchar(max)='data,Bravo,Alpha,Bravo,Tango,Delta,Bravo,Alpha,Delta'
        , @dlmt varchar(max)=','
 select dbo.remove_dups(@input,@dlmt)
 

This has way less code, old but classic. We can even do recursive cte, we don't need row_num.

alter function Remove_dups(@x varchar(max),@y varchar(max))
returns varchar(max)
as begin 
declare @xy nvarchar(max)         
;with a as
 (
 select cast('<y>'+replace(@x,@y,'</y><y>')+'</y>' as xml) n
 ) , c as
 (
 select distinct b.value('.','varchar(max)') y  from a cross apply n.nodes('/y') node(b) 
 ) 
 select  @xy=stuff((select @y+y from c  for xml path('')),1,1,'') 
 return @xy
 end
 
 
 declare @input varchar(max)='data,Bravo,Alpha,Bravo,Tango,Delta,Bravo,Alpha,Delta'
        , @dlmt varchar(max)=','
 select dbo.remove_dups(@input,@dlmt) as 'string'
 

Here is a function using a CTE.

alter function Remove_dups(@x varchar(max),@y varchar(max))
returns varchar(max)
as begin 
declare @xy nvarchar(max)         
 ;with a as
 (
 select distinct  b.value('.','varchar(max)') y  from(select cast('<y>'+replace(@x,'/','</y><y>')+'</y>' as xml) n) as x cross apply n.nodes('/y') node(b) 
 )  
select  @xy=stuff((select @y+y from a  for xml path('')),1,1,'')
 return @xy
 end
GO 

declare @input varchar(max)='data/Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta'
      , @dlmt varchar(max)='/'
select dbo.remove_dups(@input,@dlmt) as 'string'

Here is another approach, we can simply rap it up with function if needed.

declare @x varchar(max)='bbd,aa,cc,aa,bbd,cc,dd'
       ,@text varchar(max)=''
;with a as  
(
select 1 as n ,substring(@x,1,1) letter, 1 as n1
union all
select n+1  ,substring(@x,n+1,1), iif(substring(@x,n,1)=',',n1,n1+1) from a
where n<len(@x)
) , b as 
(
    select distinct (select  ''+letter from a b where letter<>','  and b.n-b.n1=c.n-c.n1 for xml path ('')) grp from  a c group by n-n1
) 
select @text=@text+iif(@text='','',',')+grp from b 
select @text

Friday, May 10, 2019 - 2:10:47 PM - Aaron Bertrand Back To Top (80037)

Thanks Greg.

Josh, what's the advantage of this approach, except you didn't have to create the functions? I feel like it's rather complex (I'm pretty good and I wasn't able to discern how HTML had mucked with the code in any way that I could immediately fix it). So you've avoided creating functions, but now you have this complex chunk of XML PATH code that you need to put in every query that needs to do anything like this. IMHO string_split() and string_agg() are much clearer ways to express things - they don't offer full solutions to every scenario, but they are good replacements for clunky XML PATH syntax when possible. Again, IMHO.

A


Friday, May 10, 2019 - 1:50:40 PM - Greg Robidoux Back To Top (80035)

Josh and Aaron, 

this has been fixed.  The code below works.

-Greg


Friday, May 10, 2019 - 7:06:04 AM - Josh Yakubov Back To Top (80021)

Here is my solution, it is even simple with string_split,string_agg, i used xml node and xml path

 declare @x nvarchar(max)='Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta'
 ;with a as
 ( select cast('<y>'+replace(@x,'/','</y><y>')+'</y>' as xml) n ) ,
  c as 
( select b.value('.','varchar(max)') y,row_number()over(partition by b.value('.','varchar(max)') order by  b.value('.','varchar(max)'))rn  from a cross apply n.nodes('/y') node(b)  ) ,
  d as 
 ( select (select '/'+y from c where rn=1 for xml path('')) x
 ) select stuff(x,1,1,'') from d 


Thursday, November 8, 2018 - 4:19:40 PM - Aaron Bertrand Back To Top (78195)

@Dan can you please elaborate? Or show an example where STRING_SPLIT removes duplicates? Of course you can say SELECT Value FROM STRING_SPLIT() GROUP BY Value, but that is not exactly the same thing. I talked about these new functions in a more recent tip:

https://www.mssqltips.com/sqlservertip/5275/solve-old-problems-with-sql-servers-new-stringagg-and-stringsplit-functions/


Thursday, November 8, 2018 - 3:38:18 PM - Dan Carollo Back To Top (78194)

 The new STRING_SPLIT() function in SQL 2016 also handles dedups! 


Friday, January 22, 2016 - 8:49:51 AM - Aaron Bertrand Back To Top (40476)

 Steve, I posted some performance comparisons here:

http://sqlperformance.com/2016/01/t-sql-queries/comparing-splitting-concat

 


Wednesday, January 20, 2016 - 3:50:38 PM - Aaron Bertrand Back To Top (40460)

 @Steve Lots of utility in a numbers table. Some examples here:

http://dba.stackexchange.com/q/11506/1186

And even if this specific functionality is the only one you need it for, if it performs better, I still don't consider that over-engineering. It's one table...

 


Wednesday, January 20, 2016 - 3:28:40 PM - Steve Mangiameli Back To Top (40459)

 

My intent was to show that it could be done more simply without the need of the numbers table.  That is the part I deemed over engineered.  But then, if you have one hanging around for various other tasks, then I can just shut my mouth!  I went ahead and modified the Parse function to return a positional value as well and put the rest of the code in a function (using table variables) to mimic what you are doing.  I added the ability to order as received or by alphanumeric.  I still think this is a little easier to follow, though the looping may cause issues if the string became egregiously long.  That said, your way provides a lot more utility if you have need of the numbers table in other places.

New Function with Positional ID

USE master

GO

IF NOT EXISTS (SELECT * FROM sys.objects WHERE type in ('FN', 'IF', 'TF', 'FS', 'FT') AND name = 'gfn_ParseList')

  EXEC sp_executesql N'CREATE FUNCTION gfn_ParseList RETURNS @paresedIDs AS BEGIN SELECT 1 ParsedValue, 1 PositionID RETURN END'

GO

 

ALTER FUNCTION gfn_ParseList (@strToPars VARCHAR(8000), @parseChar CHAR(1))

RETURNS @parsedIDs TABLE

   (ParsedValue VARCHAR(255), PositionID INT IDENTITY)

AS

BEGIN

DECLARE 

  @startPos INT = 0

  , @strLen INT = 0

 

WHILE LEN(@strToPars) >= @startPos

  BEGIN

    

    IF (SELECT CHARINDEX(@parseChar,@strToPars,(@startPos+1))) > @startPos

      SELECT @strLen  = CHARINDEX(@parseChar,@strToPars,(@startPos+1))  - @startPos

    ELSE

      BEGIN

        SET @strLen = LEN(@strToPars) - (@startPos -1)

        

        INSERT @parsedIDs

        SELECT RTRIM(LTRIM(SUBSTRING(@strToPars,@startPos, @strLen)))

        BREAK

      END

      

    SELECT @strLen  = CHARINDEX(@parseChar,@strToPars,(@startPos+1))  - @startPos

    

    INSERT @parsedIDs

    SELECT RTRIM(LTRIM(SUBSTRING(@strToPars,@startPos, @strLen)))

    

    SET @startPos = @startPos+@strLen+1

  END

RETURN

END  

GO

New Dedupe function, using parse function above

USE TEST

GO

IF NOT EXISTS (SELECT * FROM sys.objects WHERE type in ('FN', 'IF', 'TF', 'FS', 'FT') AND name = 'ufn_DedupeString')

  EXEC sp_executesql N'CREATE FUNCTION ufn_DedupeString() RETURNS NVARCHAR(MAX) AS BEGIN RETURN (SELECT CAST(GETDATE() AS NVARCHAR(100))); END';

GO

 

ALTER FUNCTION ufn_DedupeString (@dupeStr VARCHAR(MAX), @strDelimiter CHAR(1), @maintainOrder BIT)

RETURNS NVARCHAR(MAX)

AS

BEGIN  

  DECLARE @tblStr2Tbl  TABLE (ParsedValue VARCHAR(255), PositionID INT);

  DECLARE @tblDeDupeMe TABLE (ParsedValue VARCHAR(255), PositionID INT);

   

  INSERT @tblStr2Tbl

  SELECT DISTINCT ParsedValue, PositionID FROM master.dbo.gfn_ParseList(@dupeStr,@strDelimiter);  

  WITH cteUniqueValues

  AS

  (

    SELECT DISTINCT ParsedValue

    FROM @tblStr2Tbl

  )

  INSERT @tblDeDupeMe

  SELECT d.ParsedValue

    , CASE @maintainOrder

        WHEN 1 THEN MIN(d.PositionID)

      ELSE ROW_NUMBER() OVER (ORDER BY d.ParsedValue)

    END AS PositionID

  FROM cteUniqueValues u

    JOIN @tblStr2Tbl d ON d.ParsedValue=u.ParsedValue

  GROUP BY d.ParsedValue

  ORDER BY d.ParsedValue

  

  DECLARE 

    @valCount INT

  , @curValue VARCHAR(255) =''

  , @posValue INT=0

  , @dedupedStr VARCHAR(4000)=''; 

  

  SELECT @valCount = COUNT(1) FROM @tblDeDupeMe;

  

  WHILE @valCount > 0

  BEGIN

    SELECT @posValue=a.minPos, @curValue=d.ParsedValue

    FROM (SELECT MIN(PositionID) minPos FROM @tblDeDupeMe WHERE PositionID  > @posValue) a

      JOIN @tblDeDupeMe d ON d.PositionID=a.minPos;

  

    SET @dedupedStr+=@curValue;

    SET @valCount-=1;

    

    IF @valCount > 0

      SET @dedupedStr+='/';

  END

 

  RETURN @dedupedStr;

 

END

GO

 

Function Calls with long repetitive string

USE TEST

GO

 

DECLARE 

  @dupeStr VARCHAR(4000) = 'Bravo/Alpha/Bravo/Tango/Delta/Bravo/Papa/Delta/Charlie/Bravo/Delta/Echo/Charlie/Juliet/Charlie/Foxtrot/Foxtrot/Yankee/Juliet/Golf/Alpha/Foxtrot/Golf/India/Delta/Zulu/Quebec/Alpha/Bravo/Tango/Hotel/Victor/Alpha/Delta/Charlie/Hotel/Delta/Echo/Charlie/Echo/Charlie/Foxtrot/Foxtrot/Alpha/Bravo/Golf/Alpha/Foxtrot/Golf/India/Delta/Zulu/Kilo/Alpha/Bravo/Tango/Delta/Romeo/Alpha/Delta/Uniform/Sierra/Delta/Whisky/Charlie/Papa/Charlie/Foxtrot/Foxtrot/Alpha/Xray/Golf/Victor/Foxtrot/Golf/India/Delta/Kilo/Bravo/Lima/Bravo/Tango/Delta/Bravo/Alpha/Delta/Papa/Lima/Delta/Quebec/Romeo/Kilo/Charlie/Foxtrot/Foxtrot/Alpha/Tango/Golf/Alpha/Lima/Golf/India/Victor/Zulu/Bravo/Alpha/Mike/Tango/Delta/Bravo/Alpha/Delta/Yankee/Bravo/Delta/Echo/Whisky/Echo/Charlie/Mike/Foxtrot/Alpha/Bravo/Golf/Alpha/Foxtrot/Golf/Xray/Delta/Mike/Bravo/Alpha/Bravo/November/Delta/Bravo/Alpha/November/Charlie/Bravo/Delta/Echo/Xray/Echo/November/Foxtrot/Foxtrot/Alpha/Bravo/Golf/Alpha/Foxtrot/Golf/India/Delta/Zulu/Quebec/Alpha/Xray/Tango/Oscar/Bravo/Alpha/Oscar/Oscar/Bravo/Delta/Echo/Charlie/Echo/Charlie/Foxtrot/Foxtrot/Alpha/Bravo/Golf/Alpha/Foxtrot/Golf/India/Delta/Zulu/Bravo/Alpha/Romeo/Yankee/Delta/Whisky/Alpha/Delta/Victor/Bravo/Delta/Tango/Charlie/Uniform/Charlie/Foxtrot/Foxtrot/Quebec/Bravo/Romeo/Alpha/Foxtrot/Golf/India/Delta/Zulu/Bravo/Alpha/Xray/Tango/Delta/Bravo/Alpha/Delta/Quebec/Bravo/Delta/Tango/Charlie/Echo/Charlie/Foxtrot/Tango/Quebec/Bravo/Romeo/Alpha/Foxtrot/Golf/India/Delta/Zulu/Bravo/Alpha/Romeo/Tango/Delta/Bravo/Uniform/Delta/Quebec/Bravo/Delta/Tango/Charlie/Echo/Charlie/Whisky/Foxtrot/Quebec/Bravo/Romeo/Alpha/Foxtrot/Golf/India/Sierra/Yankee/Bravo/Alpha/Romeo/Tango/Delta/Bravo/Alpha/Delta/Quebec/Bravo/Delta/Sierra/Charlie/Echo/Charlie/Foxtrot/Foxtrot/Quebec/Juliet/Romeo/Alpha/Foxtrot/Yankee/India/Delta/Zulu/Uniform/Alpha/Romeo/Tango/Delta/Bravo/Alpha/Delta/Quebec/Bravo/Xray/Tango/Charlie/Echo/Victor/Foxtrot/Victor/Quebec/Yankee/Romeo/Alpha/Tango/Yankee/India/Delta/Whisky/'

, @strDelimiter CHAR(1) = '/'

--, @maintainOrder BIT = 1; 

 

SELECT OriginalOrder  =dbo.ufn_DedupeString(@dupeStr, @strDelimiter, 1),

       AlphabeticOrder=dbo.ufn_DedupeString(@dupeStr, @strDelimiter, 0);

 


Wednesday, January 20, 2016 - 11:19:22 AM - Aaron Bertrand Back To Top (40454)

 Steve, also, half of my engineering is the reassemble function, which makes it very easy to pass in the delimited string and get a delimited string back, instead of all the code the user will have to write each time with your version. I thought about wrapping your code in a function, but because you have DDL with the #temp table, it can't be done. Maybe I'll try with a table variable instead of a #temp table.

 


Wednesday, January 20, 2016 - 11:09:03 AM - Aaron Bertrand Back To Top (40453)

Steve, I think you will find that it is over-engineered because it will prove to give better performance than a looping/iteration function; I also missed how yours guarantees that the returned string will be generated in the same order (or how a user could call the function and ask for alphabetical instead).


Tuesday, January 19, 2016 - 5:26:52 PM - Steve Mangiameli Back To Top (40444)

It seems to me this is over-engineered.  Parse the list into a table of distinct values and put it back together again via loop.

 

Parsing function

/********************************************************************************

PROCEDURE NAME   : gfn_ParseList

AUTHOR           : 

CREATED          : 02/14/2012

DESCRIPTION      : Takes a delineated list and parses it out.  Accepts a string and a delineator; assumes comma (',')

CALLED BY        : 

DEPENDS ON       : 

EXAMPLE          : gfn_ParseList (@assesmentIDs, ',')

 

DATE         NAME DESCRIPTION

02/14/2012 steve.mangiameli     Creation

********************************************************************************/

CREATE FUNCTION gfn_ParseList (@strToPars VARCHAR(8000), @parseChar CHAR(1))

RETURNS @parsedIDs TABLE

(ParsedValue VARCHAR(255))

AS

BEGIN

 

DECLARE 

@startPos INT = 0

, @strLen INT = 0

 

WHILE LEN(@strToPars) >= @startPos

BEGIN

 

IF (SELECT CHARINDEX(@parseChar,@strToPars,(@startPos+1))) > @startPos

SELECT @strLen  = CHARINDEX(@parseChar,@strToPars,(@startPos+1))- @startPos

ELSE

BEGIN

SET @strLen = LEN(@strToPars) - (@startPos -1)

 

INSERT @parsedIDs

SELECT RTRIM(LTRIM(SUBSTRING(@strToPars,@startPos, @strLen)))

BREAK

END

 

SELECT @strLen  = CHARINDEX(@parseChar,@strToPars,(@startPos+1))- @startPos

 

INSERT @parsedIDs

SELECT RTRIM(LTRIM(SUBSTRING(@strToPars,@startPos, @strLen)))

 

SET @startPos = @startPos+@strLen+1

END

RETURN

END

  

Loop

DECLARE 

 @dupeStr VARCHAR(4000) = 'Bravo/Alpha/Bravo/Tango/Delta/Bravo/Alpha/Delta'

, @strDelimiter CHAR(1) = '/'

 

IF OBJECT_ID('tempdb..#DeDupeMe','U') IS NOT NULL

DROP TABLE #DeDupeMe

 

SELECT DISTINCT ParsedValue 

INTO #DeDupeMe

FROM gfn_ParseList(@dupeStr,@strDelimiter)

 

DECLARE 

 @valCount INT

, @curValue VARCHAR(255) =''

, @dedupedStr VARCHAR(4000)=''

 

SELECT @valCount = COUNT(1) FROM #DeDupeMe

 

WHILE @valCount > 0

BEGIN

SELECT @curValue=MIN(ParsedValue)

FROM #DeDupeMe

WHERE ParsedValue > @curValue

 

SET @dedupedStr+=@curValue

SET @valCount-=1

 

IF @valCount > 0

SET @dedupedStr+='/'

END

 

SELECT @dedupedStr DeDupe_String

 


Monday, January 18, 2016 - 9:11:47 AM - J Back To Top (40437)

 It's splitting hairs, but page compression doesn't buy you anything in this case over row compression. I believe SQL Server is smart enough to not bother trying, but I'm not 100% sure because, rather than rely on it, I'd specify row compression.

 


Thursday, January 14, 2016 - 9:33:46 AM - Aaron Bertrand Back To Top (40422)

 

Paul, yes, you're right. Most of my code library has a hard-coded delimiter or only accepts char(1) because, in the real world, how often are we using multi-character delimiters? :-) But point taken, I'll try to get the script updated.


Thursday, January 14, 2016 - 6:30:16 AM - Paul Back To Top (40421)

Great code! Thanks.

One tiny error: The last '1' in your code should be len(@delim), so that it works with delimiters of other length than 1 too.

     FOR XML PATH, TYPE).value(N'.[1]',N'nvarchar(max)'),1,len(@delim),N'') 

 















get free sql tips
agree to terms