Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 
The Trade-off Between SQL Server Security and Performance - Free Webinar
 

Dealing with the single-character delimiter in SQL Server's STRING_SPLIT function


By:   |   Last Updated: 2017-06-22   |   Comments   |   Related Tips: More > Functions - System

Problem

SQL Server 2016 introduced a new built-in function, STRING_SPLIT(), which offers much better performance than the methods many of us have implemented ourselves. The function does come with some limitations, though; most notably, it only supports a single-character delimiter, which means you can’t use the function directly on data that has a separator using more than one character, like ^^^ or ~|~.

Solution

The most obvious solution would be to change your data so that it only requires a single-character separator. Of course, this is not practical for most users; the data – and all of its touchpoints – are not always in your control, and can’t always all be changed at once. So what else can be done?

Well, let’s say you have a table like this:

CREATE TABLE dbo.Pets
(
  StudentID int PRIMARY KEY,
  PetNames  nvarchar(4000)
);

And it currently stores data like this:

INSERT dbo.Pets(StudentID, PetNames) VALUES
  (1, N'Furface~|~Snowball~|~Max'),
  (2, N'Kirby~|~Quigley'),
  (3, N'Dax~|~Spike~|~Bowser~|~Rosco');

Depending on the balance of reads and writes, and how often you need the data in concatenated vs. separated forms, you have three general options:

1. Change the data on read

If you have an application or report that mostly relies on the concatenated data appearing as above (with the ~|~ delimiter), but still need to occasionally split the data out into a relational form, then you could create a view that STRING_SPLIT looks at, which first replaces the multi-character separator with a single-character separator before handing the data off to the function. The trick is to choose a Unicode character that is not possible to appear in the actual data, like NCHAR(9999), which is a pencil (✏). So your view would look like this:

CREATE VIEW dbo.AltPets
AS
    SELECT StudentID,
           PetNames = REPLACE(PetNames, N'~|~', NCHAR(9999))
    FROM dbo.Pets;

Now the application can still look at the table directly if it wants the 3-character delimiter version. But other code that wants to split the data can instead look at the view, which contains values that can be processed by STRING_SPLIT:

SELECT v.StudentID, s.Value
  FROM dbo.AltPets AS v
  OUTER APPLY STRING_SPLIT(v.PetNames, NCHAR(9999)) AS s;

You can do this without the view, of course, but you’d have to inline the REPLACE() expression into all the queries:

SELECT p.StudentID, s.Value
  FROM dbo.Pets AS p
  OUTER APPLY STRING_SPLIT(REPLACE(p.PetNames, N'~|~', NCHAR(9999)), NCHAR(9999)) AS s;

I talked about this recently in an answer on the Database Administrators Stack Exchange.

2. Change the data on write

If your application spends more time splitting the data than storing it, then you may want to handle this the other way around – replace the delimiters as you write the data to the table, either via a trigger or changing the data access layer. Then a view could be created to run the replacement the other way when you need to show the original format with the 3-character separator:

CREATE VIEW dbo.AltPets2
AS
    SELECT StudentID,
           PetNames = REPLACE(PetNames, NCHAR(9999), N'~|~')
    FROM dbo.Pets;

3. Change the storage altogether

The above two options work best when you can’t change all aspects – for example, the application might be hard-coded to concatenate the data with this 3-character delimiter (or expect it to be presented that way), and you can’t change the app due to resources or because the app belongs to a 3rd party.

If you do have control over the app, though, a better solution might be to stop splitting and concatenating in the first place. The schema could instead be:

CREATE TABLE dbo.StudentPets
(
  StudentID int,
  PetName nvarchar(255)
);

Then you could have a table-valued parameter used to insert/update delete data:

CREATE TYPE dbo.PetNameTVP
(
  PetName nvarchar(255)
);

Then the application – which is probably already using a structured data type, like a DataTable, to store the data before passing it to SQL Server anyway – can just send the DataTable to a stored procedure, such as:

CREATE PROCEDURE dbo.OverwriteStudentPets
  @StudentID int,
  @PetNames dbo.PetNameTVP READONLY
AS
BEGIN
  SET NOCOUNT ON;
  DELETE dbo.StudentPets WHERE StudentID = @StudentID;
  INSERT dbo.StudentPets(StudentID, PetName)
    SELECT @StudentID, PetName FROM @PetNames;
END
GO

Now there is no messy splitting happening, delimiters become irrelevant, and performance will probably improve, too.

Next Steps


Last Updated: 2017-06-22


next webcast button


next tip button



About the author
MSSQLTips author Aaron Bertrand Aaron Bertrand (@AaronBertrand) is a Product Manager at SentryOne, with industry experience dating back to Classic ASP and SQL Server 6.5. He is editor-in-chief of the performance-related blog, SQLPerformance.com, and serves as a community moderator for the Database Administrators Stack Exchange.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools