Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 
Simplify SQL Server Database Development Webcast    ===>    Register Now
 

Access Amazon Redshift data from SQL Server via a Linked Server


By:   |   Read Comments   |   Related Tips: More > Linked Servers

Problem

Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. There are often times when you need to access Redshift data from SQL Server. You could export and import data, but another option is to create a linked server from SQL Server to Redshift. In this tip we walk through how this can be done for AWS Redshift from SQL Server.

Solution

In this tip, we will show how SQL Server can access Redshift data via a linked server.

Install Amazon Redshift ODBC Driver

You can get a copy of the Amazon Redshift ODBC Driver 32-bit or 64-bit to install.

The steps to complete the installation is very simple, following these steps for the installation.

Amazon Redshift install
Amazon Redshift install

Configure Amazon Redshift ODBC Driver

Once the driver has been installed you will see the driver in the ODBC Data Source Administrator.

ODBC Data Source Administrator

After you download and install the ODBC driver, you need to add a data source name (DSN) entry to the client machine. SQL client tools uses this data source to connect to the Amazon Redshift database.

Create a new data source like the example below.

ODBC Data Source Administrator

Select the System DSN tab if you want to configure the driver for all users on the computer or the User DSN tab if you want to configure the driver for your user account only.

Amazon Redshift ODBC Driver DSN setup

Specify the endpoint for your Amazon Redshift cluster. You can find this information in the Amazon Redshift console on the cluster’s details page. Type the port number that the database uses. By default, Amazon Redshift uses port 5439, but you should use the port that the cluster was configured to use when it was launched.

To test, click the Test button. If the client computer can connect to the Amazon Redshift database, you will see the following message: Connection successful.

How to create a SQL Server Linked Server to Amazon Redshift

  1. In SQL Server Management Studio, open Object Explorer, expand Server Objects, right-click Linked Servers, and then click New Linked Server.
  2. On the General Page, type the name of the instance of SQL Server that you area linking to.
  3. Specify an OLE DB server type other than SQL Server.
  4. Select an OLE DB data source from the list box (in this case ODBC Drivers).
  5. Type the product name and the data source will be the ODBC Data Source we created before.
sql server new linked server

The first RPC setting is mainly for a legacy feature called Remote Server. The RPC OUT setting is very pertinent to linked servers on SQL Server. Think about an RPC (Remote Procedure Call) as being a stored procedure being run remotely from server 1 to linked server 2.

Also, you can create a linked server with these T-SQL commands:

USE [master]
GO
EXEC master.dbo.sp_addlinkedserver @server = N'REDSHIFT'
, @srvproduct=N'Redshift'
, @provider=N'MSDASQL'
, @datasrc=N'demo'

EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'REDSHIFT'
,@useself=N'False'
,@locallogin=NULL
,@rmtuser=NULL
,@rmtpassword=NULL
GO

Access Amazon Redshift data from SQL Server

After you connect to the linked server, you will be able to see the remote data in the SSMS Object Explorer.

sql server ssms object explorer

Query Amazon Redshift data from SQL Server

ssms sql query and results

I created an example below to show how to join and merge data with the Redshift environment, load data on SQL Server or insert data into Redshift.

Amazon Redshift is based on PostgreSQL 8.0.2. Amazon Redshift and PostgreSQL have a number of very important differences that you must be aware of as you design and develop your data warehouse application. You can see more examples here how to create tables on Redshift.

USE demo
GO

CREATE TABLE tb01 (
id INT IDENTITY
, first_name VARCHAR(50)
, last_name VARCHAR(50)
, age SMALLINT
)
GO

TRUNCATE TABLE dbo.tb01
GO

INSERT INTO dbo.tb01 (first_name, last_name, age) VALUES ('douglas', 'correa', 36)
GO

EXEC('create table demodb.public.tbdemo(
id integer not null
,name varchar(100)
,birth timestamp ) ') AT DEMO 

EXEC('insert into demodb.public.tbdemo values(1, ''douglas correa'', getdate() )') AT DEMO
EXEC('insert into demodb.public.tbdemo values(2, ''renato silvestre'', getdate() )') AT DEMO
GO

EXEC('delete from demodb.public.tbdemo') AT DEMO
GO

SELECT T.id, T.first_name, T.last_name, T.age
FROM OPENQUERY(DEMO,'select id, name, birth from demodb.public.tbdemo') AS OQ
INNER JOIN dbo.tb01 AS T ON OQ.id = T.id
GO

MERGE dbo.tb01 AS target
USING (SELECT * 
FROM OPENQUERY(DEMO,'select id, name, birth from demodb.public.tbdemo') AS OQ
) AS source (id, name, birth) 
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET target.age = DATEDIFF(yyyy, GETDATE(), source.birth)
WHEN NOT MATCHED THEN
INSERT (first_name, age) 
VALUES (SUBSTRING(source.name, 0, CHARINDEX(' ',source.name)), DATEDIFF(yyyy, GETDATE(), source.birth))
OUTPUT Inserted.id, Inserted.first_name, Inserted.age;
GO

Running the script above, we can see the output in the screenshots below:

ssms sql query and results

The MERGE results:

ssms sql query and results

SELECT with INNER JOIN results:

ssms sql query and results

The results of these operations depend on type conversion rules and data type compatibility. In general, data types that fall into the same type category (such as different numeric data types) are compatible and can be implicitly converted.

If you compare numeric values with character strings, the numeric values are converted to character strings. When converting DATE or TIMESTAMP to TIMESTAMPTZ, DATE or TIMESTAMP are assumed to use the current session time zone.

Amazon Redshift ODBC Driver Performance

The Connector/ODBC driver has been optimized to provide very fast performance. If you experience problems with the performance of Connector/ODBC, or notice a large amount of disk activity for simple queries, there are a number of aspects to check:

  • Ensure that ODBC Tracing is not enabled. With tracing is enabled, a lot of information is recorded in the tracing file by the ODBC Manager.
  • You can check, and disable, tracing within Windows using the Tracing panel of the ODBC Data Source Administrator.
Next Steps
  • More about option parameters here.
  • Connector/ODBC errors FAQ.


Last Update:


next webcast button


next tip button



About the author
MSSQLTips author Douglas Correa Douglas Correa is a database professional, focused on tuning, high-availability and infrastructure.

View all my tips
Related Resources





Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools