Access Amazon Redshift data from SQL Server via a Linked Server
Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. There are often times when you need to access Redshift data from SQL Server. You could export and import data, but another option is to create a linked server from SQL Server to Redshift. In this tip we walk through how this can be done for AWS Redshift from SQL Server.
In this tip, we will show how SQL Server can access Redshift data via a linked server.
Install Amazon Redshift ODBC Driver
The steps to complete the installation is very simple, following these steps for the installation.
Configure Amazon Redshift ODBC Driver
Once the driver has been installed you will see the driver in the ODBC Data Source Administrator.
After you download and install the ODBC driver, you need to add a data source name (DSN) entry to the client machine. SQL client tools uses this data source to connect to the Amazon Redshift database.
Create a new data source like the example below.
Select the System DSN tab if you want to configure the driver for all users on the computer or the User DSN tab if you want to configure the driver for your user account only.
Specify the endpoint for your Amazon Redshift cluster. You can find this information in the Amazon Redshift console on the cluster’s details page. Type the port number that the database uses. By default, Amazon Redshift uses port 5439, but you should use the port that the cluster was configured to use when it was launched.
To test, click the Test button. If the client computer can connect to the Amazon Redshift database, you will see the following message: Connection successful.
How to create a SQL Server Linked Server to Amazon Redshift
- In SQL Server Management Studio, open Object Explorer, expand Server Objects, right-click Linked Servers, and then click New Linked Server.
- On the General Page, type the name of the instance of SQL Server that you area linking to.
- Specify an OLE DB server type other than SQL Server.
- Select an OLE DB data source from the list box (in this case ODBC Drivers).
- Type the product name and the data source will be the ODBC Data Source we created before.
The first RPC setting is mainly for a legacy feature called Remote Server. The RPC OUT setting is very pertinent to linked servers on SQL Server. Think about an RPC (Remote Procedure Call) as being a stored procedure being run remotely from server 1 to linked server 2.
Also, you can create a linked server with these T-SQL commands:
USE [master] GO EXEC master.dbo.sp_addlinkedserver @server = N'REDSHIFT' , @srvproduct=N'Redshift' , @provider=N'MSDASQL' , @datasrc=N'demo' EXEC master.dbo.sp_addlinkedsrvlogin @rmtsrvname=N'REDSHIFT' ,@useself=N'False' ,@locallogin=NULL ,@rmtuser=NULL ,@rmtpassword=NULL GO
Access Amazon Redshift data from SQL Server
After you connect to the linked server, you will be able to see the remote data in the SSMS Object Explorer.
Query Amazon Redshift data from SQL Server
I created an example below to show how to join and merge data with the Redshift environment, load data on SQL Server or insert data into Redshift.
Amazon Redshift is based on PostgreSQL 8.0.2. Amazon Redshift and PostgreSQL have a number of very important differences that you must be aware of as you design and develop your data warehouse application. You can see more examples here how to create tables on Redshift.
USE demo GO CREATE TABLE tb01 ( id INT IDENTITY , first_name VARCHAR(50) , last_name VARCHAR(50) , age SMALLINT ) GO TRUNCATE TABLE dbo.tb01 GO INSERT INTO dbo.tb01 (first_name, last_name, age) VALUES ('douglas', 'correa', 36) GO EXEC('create table demodb.public.tbdemo( id integer not null ,name varchar(100) ,birth timestamp ) ') AT DEMO EXEC('insert into demodb.public.tbdemo values(1, ''douglas correa'', getdate() )') AT DEMO EXEC('insert into demodb.public.tbdemo values(2, ''renato silvestre'', getdate() )') AT DEMO GO EXEC('delete from demodb.public.tbdemo') AT DEMO GO SELECT T.id, T.first_name, T.last_name, T.age FROM OPENQUERY(DEMO,'select id, name, birth from demodb.public.tbdemo') AS OQ INNER JOIN dbo.tb01 AS T ON OQ.id = T.id GO MERGE dbo.tb01 AS target USING (SELECT * FROM OPENQUERY(DEMO,'select id, name, birth from demodb.public.tbdemo') AS OQ ) AS source (id, name, birth) ON target.id = source.id WHEN MATCHED THEN UPDATE SET target.age = DATEDIFF(yyyy, GETDATE(), source.birth) WHEN NOT MATCHED THEN INSERT (first_name, age) VALUES (SUBSTRING(source.name, 0, CHARINDEX(' ',source.name)), DATEDIFF(yyyy, GETDATE(), source.birth)) OUTPUT Inserted.id, Inserted.first_name, Inserted.age; GO
Running the script above, we can see the output in the screenshots below:
The MERGE results:
SELECT with INNER JOIN results:
The results of these operations depend on type conversion rules and data type compatibility. In general, data types that fall into the same type category (such as different numeric data types) are compatible and can be implicitly converted.
If you compare numeric values with character strings, the numeric values are converted to character strings. When converting DATE or TIMESTAMP to TIMESTAMPTZ, DATE or TIMESTAMP are assumed to use the current session time zone.
Amazon Redshift ODBC Driver Performance
The Connector/ODBC driver has been optimized to provide very fast performance. If you experience problems with the performance of Connector/ODBC, or notice a large amount of disk activity for simple queries, there are a number of aspects to check:
- Ensure that ODBC Tracing is not enabled. With tracing is enabled, a lot of information is recorded in the tracing file by the ODBC Manager.
- You can check, and disable, tracing within Windows using the Tracing panel of the ODBC Data Source Administrator.
Last Updated: 2018-04-05
About the author
View all my tips