Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Using TOKEN and TOKENCOUNT in SQL Server Integration Services


By:   |   Read Comments (1)   |   Related Tips: More > Integration Services Development

Attend our free MSSQLTips Webcast - How to Simplify Routine SQL Server Administration Tasks


Problem

SQL Server 2012 introduced a few new functions for the Integration Services (SSIS) expression language. Two of those new functions are Token and TokenCount, which will be discussed in this tip. The new REPLACENULL function is discussed in the tip SQL Server Integration Services REPLACENULL Function.

Solution

Token and TokenCount are two functions introduced in the SQL Server 2012 release of SSIS and they assist you with the splitting of strings, something that was quite difficult in earlier versions of SSIS.

TOKEN

The Token function will split up a given string using the delimiters provided into separate tokens (substrings) and return the token corresponding with the number provided. Its syntax is the following:

TOKEN(string expression, delimiters, occurrence)

For example, TOKEN("2015-01-23","-",2) will return “01”.

Extracting the day part out of a date using TOKEN

Also note it is possible to pass more than one delimiter to the Token function. If the parameter occurrence is higher than the actual number of tokens in the string, NULL is returned.

It’s easy to see TOKEN is really useful for parsing codes, such as telephone numbers, but especially useful for parsing dates. SSIS can be a little difficult when trying to parse strings into dates, but this function makes it a bit more bearable.

Suppose we are reading a flat file with dates in European time. For example, “26/01/2015 17:43:05”. In SSIS 2012 you can cast this to a datetime data type with the following expression:

(DT_DBTIMESTAMP)Date

However, SSIS can sometimes give incorrect results when it fails to understand which format the input has, mm/dd/yyyy or dd/mm/yyyy. The following screenshot clearly illustrates the issue, where SSIS only detected the European format in the second row and mistakenly converted the first row to a date in February:

SSIS can incorrectly convert date time in European format

In order to make sure SSIS has a consistent output, we need to write an expression. In earlier versions of SSIS this was always a hassle with a lot of SUBSTRINGS and FINDSTRINGS, but with TOKEN an easy expression can be written:

(DT_DBTIMESTAMP)(
TOKEN([Date],"/ :",3) -- year
+ "-" + RIGHT("00" + TOKEN([Date],"/ :",2),2) -- month
+ "-" + RIGHT("00" + TOKEN([Date],"/ :",1),2) -- day
+ " " + RIGHT("00" + TOKEN([Date],"/ :",4),2) -- hour
+ ":" + RIGHT("00" + TOKEN([Date],"/ :",5),2) -- minute
+ ":" + RIGHT("00" + TOKEN([Date],"/ :",6),2) -- minute
)

This expression uses multiple delimiters to split up the datetime value into different tokens. The RIGHT("00" + token,2) expression is a useful shorthand to prefix single digits with a 0. For example, 4 becomes 04.

With this expression, SSIS returns a consistent format:

Correctly parse European format

TOKENCOUNT

The function TokenCount is pretty straightforward: it takes a string expression and given a set of delimiters, it returns the count of tokens found in that expression. Its syntax is as follows:

TOKENCOUNT(expression,delimiters)

You can use it for example to count the number of words in a sentence, using the expression TOKENCOUNT(expression,” ”). Here we assume words are always separated with a space.

Counting words

Token and TokenCount can also lead to interesting combinations together. For example, if we want to find the word that comes before the last word, we can use the following expression:

TOKEN(myText," ",TOKENCOUNT(mytext," ")-1)
Finding words

Conclusion

In this tip we have introduced the two functions Token and TokenCount. Both functions use delimiters to split up strings. Token can return a substring, while TokenCount returns the number of tokens found. They can be really useful for parsing strings, such as dates for example.

Next Steps


Last Update:


signup button

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


SQL tips:

*Enter Code refresh code     



Monday, February 16, 2015 - 11:52:00 AM - Kenny Wymore Back To Top

Good stuff Koen. Thanks!


Learn more about SQL Server tools