Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Scripts          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Dynamically Import data from a foreign database using a SQL Server Linked Server


By:   |   Last Updated: 2009-12-08   |   Comments (2)   |   Related Tips: 1 | 2 | 3 | More > Linked Servers

Problem

You have a foreign database, different than SQL Server. This database can be Oracle, DB2, Progress DB, Sybase or any other database platform that can be connected to by using ODBC or OLE DB providers. You need to import data from many tables that reside in the remote database into your SQL Server database. The list of tables is dynamic (tables may be added or removed from the list or from the remote/destination databases) and the schema could also change. It is possible that the tables on the SQL Server side also include additional column, such as the datetime when the row was imported or a timestamp.

In this article, I show how this can be done by using Linked Servers. Note that it is also possible to be implemented via an SSIS package, but it's much more complicated. An example of such an implementation can be found here.

Solution

There are three possible requirements for the implementation of this import process:

  • Drop and recreate the tables in the destination database (SQL Server) based on the schema in the source database.
  • Truncate destination tables (SQL Server) and INSERT the data from the source database based on the schema in the destination database.
  • INSERT data into destination database (SQL Server) using a WHERE clause in the source database. This will append data to the existing tables, without cleaning up the tables before (inserting deltas).

I will provide scripts to implement each of the above options. The script will be running from SQL Server (which in our case is the destination database).

Solution components:

The solution requires the following:

  • A table in the local SQL Server database that will contain the list of the tables that need to be imported. There is also a flag column called IsEnabled, so that tables can be enabled/disabled from the list without being removed:
CREATE TABLE tblListOfTables  
   (TableName VARCHAR(200) NOT NULL PRIMARY KEY,  
    IsEnabled bit DEFAULT 1  
   )    
  • A working Linked Server to the source (external) database. Learn how to create the Linked Server here. In this article I use the names: MyLinkedServer.MyDB.MySchema for the Linked Server name, remote database name and remote schema name. You will need to change this to match your linked server objects. Also, I am using a fixed schema, so you can adjust this script to be more dynamic if needed.
  • In my solution, the local imported tables in SQL Server have the same name and contain column names that are identical to the names in the source table. If needed, you can extend the tblListOfTables table to include different table names or different column names, but this does complicate the solution.

Before you get started make sure you replace this code MyLinkedServer.MyDB.MySchema in the scripts below with your linked server name, source database name and source schema name.

Solution #1

Drop and recreate the tables in the destination database (SQL Server) based on the schema in the source database.

This will create an exact duplicate of the table and the data in the destination database. If the table exists it will first be dropped.

DECLARE @Loop INT,  
        @cmd VARCHAR(MAX),  
        @rc INT,   
        @TB VARCHAR(130)    

-- I am creating a new temporary table with a sequential ID (and no gaps),   
-- so I can loop by it:  
SELECT ROWID = IDENTITY(INT,1,1), TableName   
INTO #Tables   
FROM tblListOfTables  
WHERE IsEnabled = 1  
ORDER BY TableName   

SET @rc = @@ROWCOUNT    

SET  @Loop = 1    

-- Looping on table names, dropping and recreating each:   
WHILE @Loop <= @rc   
BEGIN  
   SELECT @TB = TableName FROM #Tables WHERE ROWID = @Loop     

   SET @cmd = ''          

   SELECT @cmd = 'IF EXISTS (SELECT 1 FROM sys.objects WHERE Type = ''U''' +  
                 'AND name = ''' + @TB  + ''') DROP TABLE [' + @TB + '];' +  
                 'SELECT * INTO  [' + @TB  + '] ' +  
                 'FROM MyLinkedServer.MyDB.MySchema.[' + @TB  + ']'      

   EXEC (@cmd)  
   SET @Loop = @Loop + 1   
END  

DROP TABLE #Tables   
GO   

Solution #2

Truncate destination tables (SQL Server) and INSERT the data from the source database based on the schema in the destination database.

Assumptions: table and column names are identical in source and destination tables. Destination tables already exist and may have additional columns that will be excluded from the insert (optional). The insert will be created on the fly based on the schema of the destination tables.

In the following example, the destination table includes the additional column shown below that does not exist in the source and should not be imported:

RowImported datetime NOT NULL default getdate()
DECLARE @Loop INT,  
        @cmd VARCHAR(MAX),  
        @rc INT,   
        @TB VARCHAR(130)    

-- I am creating a new temporary table with a sequential ID (and no gaps),   
-- so I can loop by it:  
SELECT ROWID = IDENTITY(INT,1,1), TableName   
INTO #Tables   
FROM tblListOfTables  
WHERE IsEnabled = 1  
ORDER BY TableName   

SET @rc = @@ROWCOUNT    

SET  @Loop = 1    

-- Looping on table names, truncating and inserting data to each:   
WHILE @Loop <= @rc   
BEGIN     
   SELECT @TB = TableName FROM #Tables WHERE ROWID = @Loop     

   SET @cmd = ''          

   SELECT @cmd = @cmd + ',[' + name + ']'     
   FROM sys.columns     
   WHERE OBJECT_NAME(OBJECT_ID) = @TB        
   AND name NOT IN ('RowImported') -- Excluding the additional column  
                                   -- (remove this line if not required  
                                   --  or add more columns if necessary)   

   SELECT @cmd = 'TRUNCATE TABLE [' + @TB+ '];' +  
                 ' INSERT INTO  [' + @TB  + '] ('  + SUBSTRING(@cmd,2,LEN(@cmd)) + ')' +  
                 ' SELECT ' + SUBSTRING(@cmd,2,LEN(@cmd)) +   
                 ' FROM MyLinkedServer.MyDB.MySchema.[' + @TB  + ']'     

   EXEC (@cmd)     
   SET @Loop = @Loop + 1   
END  

DROP TABLE #Tables  
GO  
  

Solution #3

INSERT data into destination database (SQL Server) using a WHERE clause in the source database. This will append data to the existing tables, without cleaning up the tables before (inserting deltas).

Assumption: table and column names are identical in source and destination tables. Destination tables already exist and may have additional columns that will be excluded from the insert (optional).

In the following example, destination table includes the additional column:

RowImported datetime NOT NULL default getdate()

In this example, the delta will be fetched based on the last SeqNumber column in the source table. (Note that if you use date datatypes in the WHERE clause, you will have to adjust your query to the date format in the source database, according to the format of the specific platform):

In this example you will need to replace SeqNumber and @SeqNo with the appropriate columns from your tables. Since this is hard coded for all tables you could also make this more dynamic by adding this logic to the control table.

DECLARE @Loop INT,  
        @cmd VARCHAR(MAX),  
        @rc INT,   
        @TB VARCHAR(130)    

-- I am creating a new temporary table with a sequential ID (and no gaps),   
-- so I can loop by it:  
SELECT ROWID = IDENTITY(INT,1,1), TableName   
INTO #Tables   
FROM tblListOfTables  
WHERE IsEnabled = 1  
ORDER BY TableName   

SET @rc = @@ROWCOUNT    
SET  @Loop = 1   

WHILE @Loop <= @rc   
BEGIN     
   SELECT @TB = TableName FROM #Tables WHERE ROWID = @Loop     

   SET @cmd = ''          

   SELECT @cmd = @cmd + ',[' + name + ']'     
   FROM sys.columns     
   WHERE OBJECT_NAME(OBJECT_ID) = @TB        
   AND name NOT IN ('RowImported') -- Excluding the additional column                                       
                                   -- (remove this line if not required  
                                   --  or add more columns if necessary)      

   -- First step would be to read the last fetched Sequential Number from which    
   -- remote data will be fetched:    
   SELECT @cmd = 'DECLARE @SeqNo int; SELECT @SeqNo = max(SeqNumber) from [' + @TB+ '];' +  
                 'INSERT INTO  [' + @TB  + '] ('  + SUBSTRING(@cmd,2,LEN(@cmd)) + ')' +  
                 ' SELECT ' + SUBSTRING(@cmd,2,LEN(@cmd)) +  
                 ' FROM MyLinkedServer.MyDB.MySchema.[' + @TB  + '] WHERE SeqNumber > @SeqNo'     

   EXEC (@cmd)     
   SET @Loop = @Loop + 1   
END  

DROP TABLE #Tables  
GO   
  

Summary

In this article I provided a simple solution to import data dynamically from a remote external database via a Linked Server.

You can expand your solution taking into consideration the following:

  • Encapsulate everything in a transaction, if it is required.
    • This is not recommended because it involves MSDTC transactions. The best approach would be to import everything into a staging database in SQL Server (with no transaction) and then use a local transaction to update the SQL Server Production tables.
  • If the number of tables to be imported is big, consider running a few processes in parallel. You can encapsulate the above logic in an SSIS package or a program to split the load between a few processes that will run in parallel.
Next Steps
  • Learn how to Create and Destroy Linked Server on demand.
  • Modify the above T-SQL by replacing the object names with the names in your environment.
  • Schedule the above T-SQL via a SQL Server Agent job according to your needs.
  • Implement a Log table in which you can capture failure/success of each import as well as the time each import took. Possibly, implement Try/catch blocks to assist with the error capturing.


Last Updated: 2009-12-08


get scripts

next tip button



About the author




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Thursday, February 04, 2010 - 2:24:47 PM - nosnetrom Back To Top

Great article... this SQL Server rookie spent a fair bit of time trying to figure how to connect to an Exchange 2003 data store using Linked Servers; I was able to connect, but never was able to see inside. From what I read, the most expedient way to connect was from an instance of SQL Server on the same box as the Exchange server, and we put SQL Server 2K5 Express Edition on just for that purpose. Does anyone have any experience trying to connect from SQL Server 2005 to Exchange 2003?

(FWIW, this all revolves around a Web app we are developing with a business partner; the only way they have to feed their data back to us is by email.  :-\  I'd like to programmatically pick off the incoming messages and extract the data with some regular expressions. I've got the regex working, but I need a better way to access the incoming data rather than opening Outlook and saving multiple messages as .TXT files!)


Tuesday, December 08, 2009 - 2:05:32 PM - Sum_Of_David Back To Top

Wow, I wish this article would have been out months ago.  I just had to accomplish this with a Teradata server.  I had originally created a DTS package in SQL 2000 that imported all 400 tables into my SQL Server.  I had to truncate and reimport all of the data every day.  The source Teradata schema changed every quarter or so and maintaining that package was quite a chore.

 We're migrating our SQL box to 2008 and I was having nightmares about having to convert that package to SSIS.  I came up with the exact solution you did, except I have some of the more complicated issues.

1) I only import particular columns from the source db.  Therefore, my metadata table also has a ColumnName column.

2) Some of the columns I import, I use Teradata SQL to transform the data.  It may be a simple CAST to a CASE statement to something more complicated.  So I have a ColumnFormula column also. 

3) Since as you mentioned importing all these tables serially takes the longest time, I added a Thread column so that I could use the SQL Agent to import all the tables for a particular thread, and then go off and start executing another parallel task for the other threads.

4) I've also added audit steps into everything so that I can determine 1) how many rows were imported and 2) how long the import took for each table.

This process makes adding new tables or columns or even changing the import of a column a simple step as managing the data within the metadata table(s).

Once again, excellent article that probably only applies to a limited group of people yet for those that it does...it's a definite timesaver.


Learn more about SQL Server tools