Compare PUSH vs PULL Data Copy Performance in SQL Server
By: Ben Snaidero | Updated: 2018-10-17 | Comments (1) | Related: More > Import and Export
As SQL Server database professionals we are always tasked with moving data around. One of the methods I use quite often when copying data between instances (especially when it is just a one-off table copy between test environments) is to setup a linked server between the instances and copy the data using a single INSERT/SELECT command. A question that arises with this method is which is faster, to push (INSERT INTO remotetable SELECT FROM localtable) or pull (INSERT INTO localtable SELECT FROM remotetable) the data. This tip will try to answer that question.
To test the performance difference between push and pull we will also look at 3 different methods for creating the connection between the servers so we can see if the difference is also in any way based on the type of connection used.
The 3 different methods we will used are:
- Linked server with 4-part name reference
- OPENROWSET with OLE DB data source
- OPENQUERY with linked server
One thing to note for this test is that to rule out any network anomalies I setup two SQL Server instances on the same server so that the data does not have to go over a network when it is copied. Also, in order for OPENROWSET to work properly the 'Ad Hoc Distributed Queries' option must be enabled on each SQL Server instance. This can be done by running the following T-SQL commands.
sp_configure 'show advanced options',1 reconfigure go sp_configure 'Ad Hoc Distributed Queries',1 reconfigure go
In order to run this test, we will first need to create a linked server. I won't go details on how to do this in this tip since it's pretty straight forward and depends on your environment, but if you need to you can read more on linked servers here.
For the actual data move we will simulate moving data from an online system to an archive table on another server. For my test case I used a source table that was loaded using a dump from a SQL Profiler trace that had approximately 28000 records in it. A similar table structure (minus the primary key) was created in the target instance as the archive table.
The T-SQL to create these tables is below.
-- run on source SQL instance CREATE TABLE [dbo].[onlinedata]( [RowNumber] [int] NOT NULL, [EventClass] [int] NULL, [TextData] [ntext] NULL, [ApplicationName] [nvarchar](128) NULL, [NTUserName] [nvarchar](128) NULL, [LoginName] [nvarchar](128) NULL, [CPU] [int] NULL, [Reads] [bigint] NULL, [Writes] [bigint] NULL, [Duration] [bigint] NULL, [ClientProcessID] [int] NULL, [SPID] [int] NULL, [StartTime] [datetime] NULL, [EndTime] [datetime] NULL, [BinaryData] [image] NULL, PRIMARY KEY CLUSTERED ([RowNumber] ASC)); -- run on target sql instance CREATE TABLE [dbo].[archivedata]( [RowNumber] [int] NOT NULL, [EventClass] [int] NULL, [TextData] [ntext] NULL, [ApplicationName] [nvarchar](128) NULL, [NTUserName] [nvarchar](128) NULL, [LoginName] [nvarchar](128) NULL, [CPU] [int] NULL, [Reads] [bigint] NULL, [Writes] [bigint] NULL, [Duration] [bigint] NULL, [ClientProcessID] [int] NULL, [SPID] [int] NULL, [StartTime] [datetime] NULL, [EndTime] [datetime] NULL, [BinaryData] [image] NULL);
The actual test script used to test moving the data is outlined below. It is divided into two sections. The first section is run from the source (online) SQL Server instance and pushes the data to the target (archive) SQL Server instance. The second section is run from the target (archive) SQL Server instance and pulls the data from the source (online) SQL Server instance. Since this is purely a performance test I turned on the timing statistics option in SSMS using "SET STATISTICS TIME ON" so that the duration of each statement was output after it completed.
-- PUSH data from online system (source) INSERT INTO [localhost\archive].master.dbo.archivedata SELECT * FROM onlinedata; INSERT OPENROWSET('SQLNCLI', 'Server=localhost\archive;uid=sa;pwd=####;','SELECT * FROM master.dbo.archivedata') SELECT * FROM onlinedata; INSERT INTO OPENQUERY([localhost\archive],'SELECT * FROM master.dbo.archivedata') SELECT * FROM master.dbo.onlinedata; -- PULL data from archive system (target) INSERT INTO archivedata SELECT * FROM [localhost].master.dbo.onlinedata; INSERT INTO archivedata SELECT a.* FROM OPENROWSET('SQLNCLI', 'Server=localhost;uid=sa;pwd=####;', 'SELECT * FROM master.dbo.onlinedata') AS a; INSERT INTO archivedata SELECT * FROM OPENQUERY([localhost],'SELECT * from master.dbo.onlinedata') b;
Below is a table that summarizes the script results. It's quite obvious that the pull method performs much faster than the push method in all 3 cases but why is this the case. Let's take a look at a SQL Profiler trace to see if we can answer that question.
|PUSH (ms)||PULL (ms)|
Test Results Explanation
Let’s first take a look at the SQL Profiler trace from the PULL method. Checking first the trace from the source (online) server we see that it is performing a simple SELECT of all columns just as we would suspect.
On the target (archive) server the PULL method is also just doing a simple insert.
Now let’s take a look at the slower PUSH option and see if we can find where/why we are getting this extra execution time. First looking at SQL Profiler trace from the source (online) server we see a similar query to what we saw on the target (archive) with the PULL method with only difference being the longer duration and no write activity (this would now happen on the target (archive) server.
Looking at the SQL Profiler trace from the target (archive) server we can now see where all the extra execution time is coming from for the PUSH method. The linked server is implicitly opening a cursor and running a separate cursor call for each record inserted. Note: I’ve only included a subset of the sp_cursor calls in the interest of saving space.
Here is additional output from the trace.
Based on this simple test it’s pretty easy to see why using the PULL method is definitely the best option for performance. Even with this small dataset we saw some really large gains in performance by pulling data to our archive server rather than pushing it from the online server.
- Read other tips on ETL technologies:
- Read other tips on linked servers:
- Read more on using OPENROWSET:
Last Updated: 2018-10-17
About the author
View all my tips