Using TOKEN and TOKENCOUNT in SQL Server Integration Services
SQL Server 2012 introduced a few new functions for the Integration Services (SSIS) expression language. Two of those new functions are Token and TokenCount, which will be discussed in this tip. The new REPLACENULL function is discussed in the tip SQL Server Integration Services REPLACENULL Function.
Token and TokenCount are two functions introduced in the SQL Server 2012 release of SSIS and they assist you with the splitting of strings, something that was quite difficult in earlier versions of SSIS.
The Token function will split up a given string using the delimiters provided into separate tokens (substrings) and return the token corresponding with the number provided. Its syntax is the following:
TOKEN(string expression, delimiters, occurrence)
For example, TOKEN("2015-01-23","-",2) will return 01.
Also note it is possible to pass more than one delimiter to the Token function. If the parameter occurrence is higher than the actual number of tokens in the string, NULL is returned.
Its easy to see TOKEN is really useful for parsing codes, such as telephone numbers, but especially useful for parsing dates. SSIS can be a little difficult when trying to parse strings into dates, but this function makes it a bit more bearable.
Suppose we are reading a flat file with dates in European time. For example, 26/01/2015 17:43:05. In SSIS 2012 you can cast this to a datetime data type with the following expression:
However, SSIS can sometimes give incorrect results when it fails to understand which format the input has, mm/dd/yyyy or dd/mm/yyyy. The following screenshot clearly illustrates the issue, where SSIS only detected the European format in the second row and mistakenly converted the first row to a date in February:
In order to make sure SSIS has a consistent output, we need to write an expression. In earlier versions of SSIS this was always a hassle with a lot of SUBSTRINGS and FINDSTRINGS, but with TOKEN an easy expression can be written:
(DT_DBTIMESTAMP)( TOKEN([Date],"/ :",3) -- year + "-" + RIGHT("00" + TOKEN([Date],"/ :",2),2) -- month + "-" + RIGHT("00" + TOKEN([Date],"/ :",1),2) -- day + " " + RIGHT("00" + TOKEN([Date],"/ :",4),2) -- hour + ":" + RIGHT("00" + TOKEN([Date],"/ :",5),2) -- minute + ":" + RIGHT("00" + TOKEN([Date],"/ :",6),2) -- minute )
This expression uses multiple delimiters to split up the datetime value into different tokens. The RIGHT("00" + token,2) expression is a useful shorthand to prefix single digits with a 0. For example, 4 becomes 04.
With this expression, SSIS returns a consistent format:
The function TokenCount is pretty straightforward: it takes a string expression and given a set of delimiters, it returns the count of tokens found in that expression. Its syntax is as follows:
You can use it for example to count the number of words in a sentence, using the expression TOKENCOUNT(expression, ). Here we assume words are always separated with a space.
Token and TokenCount can also lead to interesting combinations together. For example, if we want to find the word that comes before the last word, we can use the following expression:
TOKEN(myText," ",TOKENCOUNT(mytext," ")-1)
In this tip we have introduced the two functions Token and TokenCount. Both functions use delimiters to split up strings. Token can return a substring, while TokenCount returns the number of tokens found. They can be really useful for parsing strings, such as dates for example.
- Try to think of other ways where you could use these new functions.
- Another new function, REPLACENULL, was described in the tip SQL Server Integration Services REPLACENULL Function.
- For more Integration Services data flow tips, please check this overview on MSSQLTips.com.
About the author
View all my tips