Access Amazon Redshift data from SQL Server via a Linked Server

Problem

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. There are often times when you need to access Redshift data from SQL Server. You could export and import data, but another option is to create a linked server from SQL Server to Redshift. In this tip we walk through how this can be done for AWS Redshift from SQL Server.

Solution

In this tip, we will show how SQL Server can access Redshift data via a linked server.

Install Amazon Redshift ODBC Driver

You can get a copy of the Amazon Redshift ODBC Driver 32-bit or 64-bit to install.

The steps to complete the installation is very simple, following these steps for the installation.

Configure Amazon Redshift ODBC Driver

Once the driver has been installed you will see the driver in the ODBC Data Source Administrator.

After you download and install the ODBC driver, you need to add a data source name (DSN) entry to the client machine. SQL client tools uses this data source to connect to the Amazon Redshift database.

Create a new data source like the example below.

Select the System DSN tab if you want to configure the driver for all users on the computer or the User DSN tab if you want to configure the driver for your user account only.

Specify the endpoint for your Amazon Redshift cluster. You can find this information in the Amazon Redshift console on the cluster’s details page. Type the port number that the database uses. By default, Amazon Redshift uses port 5439, but you should use the port that the cluster was configured to use when it was launched.

To test, click the Test button. If the client computer can connect to the Amazon Redshift database, you will see the following message: Connection successful.

How to create a SQL Server Linked Server to Amazon Redshift

In SQL Server Management Studio, open Object Explorer, expand Server Objects, right-click Linked Servers, and then click New Linked Server.
On the General Page, type the name of the instance of SQL Server that you area linking to.
Specify an OLE DB server type other than SQL Server.
Select an OLE DB data source from the list box (in this case ODBC Drivers).
Type the product name and the data source will be the ODBC Data Source we created before.

The first RPC setting is mainly for a legacy feature called Remote Server. The RPC OUT setting is very pertinent to linked servers on SQL Server. Think about an RPC (Remote Procedure Call) as being a stored procedure being run remotely from server 1 to linked server 2.

Also, you can create a linked server with these T-SQL commands:

USE [master]
GO
EXEC master.dbo.sp_addlinkedserver @server = N'REDSHIFT'
, @srvproduct=N'Redshift'
, @provider=N'MSDASQL'
, @datasrc=N'demo'
EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'REDSHIFT'
,@useself=N'False'
,@locallogin=NULL
,@rmtuser=NULL
,@rmtpassword=NULL
GO

Access Amazon Redshift data from SQL Server

After you connect to the linked server, you will be able to see the remote data in the SSMS Object Explorer.

Query Amazon Redshift data from SQL Server

I created an example below to show how to join and merge data with the Redshift environment, load data on SQL Server or insert data into Redshift.

Amazon Redshift is based on PostgreSQL 8.0.2. Amazon Redshift and PostgreSQL have a number of very important differences that you must be aware of as you design and develop your data warehouse application. You can see more examples here how to create tables on Redshift.

USE demo
GO
CREATE TABLE tb01 (
id INT IDENTITY
, first_name VARCHAR(50)
, last_name VARCHAR(50)
, age SMALLINT
)
GO
TRUNCATE TABLE dbo.tb01
GO
INSERT INTO dbo.tb01 (first_name, last_name, age) VALUES ('douglas', 'correa', 36)
GO
EXEC('create table demodb.public.tbdemo(
id integer not null
,name varchar(100)
,birth timestamp ) ') AT DEMO 
EXEC('insert into demodb.public.tbdemo values(1, ''douglas correa'', getdate() )') AT DEMO
EXEC('insert into demodb.public.tbdemo values(2, ''renato silvestre'', getdate() )') AT DEMO
GO
EXEC('delete from demodb.public.tbdemo') AT DEMO
GO
SELECT T.id, T.first_name, T.last_name, T.age
FROM OPENQUERY(DEMO,'select id, name, birth from demodb.public.tbdemo') AS OQ
INNER JOIN dbo.tb01 AS T ON OQ.id = T.id
GO
MERGE dbo.tb01 AS target
USING (SELECT * 
FROM OPENQUERY(DEMO,'select id, name, birth from demodb.public.tbdemo') AS OQ
) AS source (id, name, birth) 
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET target.age = DATEDIFF(yyyy, GETDATE(), source.birth)
WHEN NOT MATCHED THEN
INSERT (first_name, age) 
VALUES (SUBSTRING(source.name, 0, CHARINDEX(' ',source.name)), DATEDIFF(yyyy, GETDATE(), source.birth))
OUTPUT Inserted.id, Inserted.first_name, Inserted.age;
GO

Running the script above, we can see the output in the screenshots below:

The MERGE results:

SELECT with INNER JOIN results:

The results of these operations depend on type conversion rules and data type compatibility. In general, data types that fall into the same type category (such as different numeric data types) are compatible and can be implicitly converted.

If you compare numeric values with character strings, the numeric values are converted to character strings. When converting DATE or TIMESTAMP to TIMESTAMPTZ, DATE or TIMESTAMP are assumed to use the current session time zone.

Amazon Redshift ODBC Driver Performance

The Connector/ODBC driver has been optimized to provide very fast performance. If you experience problems with the performance of Connector/ODBC, or notice a large amount of disk activity for simple queries, there are a number of aspects to check:

Ensure that ODBC Tracing is not enabled. With tracing is enabled, a lot of information is recorded in the tracing file by the ODBC Manager.
You can check, and disable, tracing within Windows using the Tracing panel of the ODBC Data Source Administrator.

Next Steps

More about option parameters here.
Connector/ODBC errors FAQ.

Douglas Correa

Douglas Correa is a database professional focused on performance tuning database, high-availability, and infrastructure. He has been working with SQL Server for more than ten years. He has performance-tuned databases ranging from 1 gigabyte to terabytes working with PowerShell, SQL Server and Oracle. He has certifications from Microsoft (MCSA SQL Server, MTA Network) and Oracle (OCE, OCA 11g).

3 Comments

jd
October 10, 2023 / 3:52 am Reply
same issue w/ michael and Yan, is there any solution on this?
Cannot initialize the data source object of OLE DB provider “MSDASQL” for linked server “REDSHIFTTEST”.
OLE DB provider “MSDASQL” for linked server “REDSHIFTTEST” returned message “[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified”. (.Net SqlClient Data Provider)
Greg Robidoux
May 19, 2023 / 11:26 am Reply
Thank you.
The links have been updated.
murali
May 19, 2023 / 10:33 am Reply
The Redshift drivers links are broken