Converting Invalid Date Formats Using SQL Server 2000 DTS
Importing data into SQL Server 2000 using Data Transformation Services isn't always picking a source and destination and letting it fly. Sometimes data is formatted in a way that SQL Server won't accept. In these cases the phrase "Extract, Transform, and Load" (ETL) need to be treated as "Extract, Load, and Transform", with dates being especially troublesome. There are a number of ways to transform data as it comes into the database, the most popular of which is either through a view or ActiveX script transformation. This tip will discuss using a view to manipulate the data as it gets moved to the final table.
Let's say we want to create a database application that reminds us of all the great upcoming concerts. We find a place that provides a text file of this information: event date, event name, and event location. Everything looks straightforward until we examine the date format. The dates in the text file look like this: "2006-31-10" (YYYY-DD-MM). To find out whether the date format is acceptable we open Query Analyzer and run the following query:
The IsDate keyword tells SQL Server to evaluate the string entered to see if it is an acceptable format. If it is valid then "1" will be returned, if not then "0" is returned. In this case it isn't, but we luck out because the date string is always 10 characters.
We decide on the following plan:
- Create a database named "concertdates"
- Create a "working" table named "tbltempconcertdates". The table has an eventdate column of varchar(20), an eventname column of varchar(200), and an eventlocation of varchar(100):
- Create a view named vwconcerttransform that is based on the working table. The eventdate column in the view allows a re-arrangement of the table column to "MM/DD/YYYY", a valid date format, and converts it from type varchar to smalldatetime. This is accomplished by concatenation using right(eventdate, 2) + '/' + substring(eventdate, 6, 2) + '/' + left(eventdate, 4):
- Instead of using temp tables, we will keep the "working" table in the database permanently
Now we create our DTS package. First we create a SQL Connection to connect to the concertdates database:
Then we add an Execute SQL Task that checks to see if the "tbltempconcertdates" table exists. If it exists then it is truncated, otherwise it is created.
At this point we have a choice on how to import the data-either a Bulk Insert Task or Transform Data Task. Since we will be performing the data transformation once the data is inside the database, the Bulk Insert Task is the fastest way to get the information there.
We create two database connections, both of which connect to the concertdates database. We choose the view as the source and the permanent table as the destination. Since the view is manipulating the data in the "working" table to a valid date format, the transformations are set to copy the data as is:
Here is a diagram of the package:
- When faced with invalid date formats that must be imported, consider the use of views to manipuate the format
- Stay tuned for a future tip focusing on using ActiveX Scripts to load and convert data
- Stay tuned for a future tip focusing on time the DTS data load time trails
- Read MSDN's DTS Overview
- Review information on the Bulk Insert Task and Execute SQL Task
- Review information on the DTS Connections
- Read additional tips on Data Transformation Services from MSSQLTIPS
About the author
View all my tips