SQL Azure Cross Database Querying

By:   |   Comments (6)   |   Related: > Azure


Problem

Azure SQL Database is a Platform As A Service (PAAS) offering from Microsoft. It is really great for databases that are less than or equal to 1 TB in size or larger databases that can be sharded (split) into smaller ones that meet this requirement. Here are the sizing details for both single and elastic Azure SQL Databases.

Many commercial off the shelf systems (COTS) contain modules that you can buy. For instance, a given SAP R/3 installation might have human resource, plant maintenance, material management, project systems, production planning, quality management, sales and distribution, financial accounting, asset accounting and fund management modules. It would not be uncommon for the database developer to query multiple related databases from a single database for reporting.

How can this same business requirement be solved with Azure SQL database?

Solution

Microsoft has provided the database developer with elastic database query that allows multiple databases to be accessed from a single end point. Before the release of this feature, an extract, transform and load (ETL) job would be required to duplicate the data.

Business Problem

This article continues with the Big Jon Investments sample database. Please see my article on Azure Create Database Copy for more details on how to setup this environment.

Like most companies, the business owners have decided to change the requirements of the copied database. Instead of capturing cumulative data for the GSPC mutual fund, they only want to save off the current monthly data. A new database that leverages elastic database queries will be created to join the data into one logical view.

Preparing Data

The image below shows the current proof of concept environment. The [PORTFOLIO-201510] database contains just October information and the [PORTFOLIO-201511] database contains both October and November information. This trend continues until the [PORTFOLIO-201609] database has 12 months of cumulative data.

Twelve Cummlative Databases

How can we reduce the amount of data in each Azure SQL Database (COPY) to one month?

One way to accomplish this goal is to write a custom DELETE statement for each database. The image below is the correct statement to leave just November data in the [PORTFOLIO-201511] database.

Just Monthly Data

Database Setup

The syntax for querying related databases (vertical partitioning) from one Azure SQL database using elastic database query resembles polybase. However, it is not!

PolyBase is a technology that accesses and combines both non-relational and relational data from within SQL Server 2016. Query results can be moved from either Hadoop or Azure blob storage to a SQL Server 2016 relational table or vice versa.

To solve our business problem, we are going to work through the following steps.

  1. For each database, do the following:
    1. Create a contained database user
    2. Give this user [db_owner rights].
  2. Create a new database.
  3. Create a new schema.
  4. Create a master encryption key.
  5. Create a database scoped credential.
  6. For each database, do the following:
    1. Define an external data source.
    2. Define an external table.
  7. Create one view to [UNION ALL] external tables.

The transact SQL code below creates a contained database user named [USR_CROSS_DBMS] and grants that user [db_owner] rights. Manually connect to each of the twelve databases and execute the code below.

--
-- Step 1 - Execute on all databases 
-- 

-- Remove existing user
DROP USER IF EXISTS [USR_CROSS_DBMS]
GO

-- Create new user
CREATE USER USR_CROSS_DBMS WITH PASSWORD = '4WhOjG4bgIW8w7d0',
  DEFAULT_SCHEMA=[ACTIVE] 
GO

-- Add user to the database owner role
EXEC sp_addrolemember N'db_owner', N'USR_CROSS_DBMS'
GO

When writing code, it is import that the Transact SQL is restartable. That means the existence of an object should be checked before creating the object. Remove any existing objects before creating a new object.

The Transact SQL code below creates a database called [ONE-SHEET], a schema referred to as [SHD], a master encryption key and a database scoped credential designated as [CRD_CROSS_DBMS].

For more information, see the MSDN articles on CREATE MASTER KEY ENCRYPTION and CREATE DATABASE SCOPED CREDENTIAL.

--
-- Step 2 - Create new database (run from master)
-- 

-- Delete existing database
DROP DATABASE IF EXISTS [ONE-SHEET]
GO

-- Create new database
CREATE DATABASE [ONE-SHEET]
(
EDITION = 'STANDARD',
SERVICE_OBJECTIVE = 'S0'
)
GO 


--
-- Step 3 - Create new schema (run from user defined database)
--

-- Delete existing schema.
DROP SCHEMA IF EXISTS [SHD]
GO
 
-- Add new schema.
CREATE SCHEMA [SHD] AUTHORIZATION [dbo]
GO


--
-- Step 4 - Create master key
-- 

-- Drop master key
IF EXISTS (SELECT * FROM sys.symmetric_keys WHERE name = '##MS_DatabaseMasterKey##')
DROP MASTER KEY;


-- Create master key
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Qkmof0SV3yxReKEP'; 
GO


--
-- Step 5 - Database Credential
--

-- Drop db credential
IF EXISTS(SELECT * FROM sys.database_credentials WHERE name = 'CRD_CROSS_DBMS')
DROP DATABASE SCOPED CREDENTIAL CRD_CROSS_DBMS ;  
GO 

-- Create db credential
CREATE DATABASE SCOPED CREDENTIAL CRD_CROSS_DBMS 
WITH IDENTITY = 'USR_CROSS_DBMS', 
SECRET = '4WhOjG4bgIW8w7d0';  
GO

At this point, if you followed all the directions correctly, you should have a empty database named [ONE-SHEET]. Next, we need to work on creating external data sources for each related database.

New Empty Database

The transact SQL code below creates an external data source for each of the twelve months. Dynamic Transact SQL will be used instead of having twelve blocks of code. We will be making sure the code is restartable by removing any existing objects before creating new ones. If you have not used the sp_executesql command before, it is really great for database management scripts that need to perform work on a dynamic number of objects.

For more information on the CREATE EXTERNAL DATA SOURCE statement, please see the following MSDN article.

--
-- Step 6A - Create 12 external data sources
--

-- Variables
DECLARE @VAR_NAME1 SYSNAME;
DECLARE @VAR_NAME2 SYSNAME;
DECLARE @VAR_DATE DATETIME;
DECLARE @VAR_STMT NVARCHAR(1024);

-- Starting value
SET @VAR_DATE = '10-01-2015';

-- For each month
WHILE (@VAR_DATE < '10-01-2016')
BEGIN

  -- Set object names
  SET @VAR_NAME1 = 'SRC_PORTFOLIO_' + FORMAT(@VAR_DATE, 'yyyyMM');
  SET @VAR_NAME2 = 'PORTFOLIO-' + FORMAT(@VAR_DATE, 'yyyyMM');
 
  -- Drop external data src
  SET @VAR_STMT = '' +
   'IF EXISTS ' + 
   '( ' + 
   '  SELECT * FROM sys.external_data_sources WHERE NAME = ' +
      CHAR(39) + @VAR_NAME1 + CHAR(39) +
   ') ' + 
   'DROP EXTERNAL DATA SOURCE [' + @VAR_NAME1 + '];';
  exec sp_executesql @VAR_STMT;

  -- Create external data src
  SET @VAR_STMT = '' +
    'CREATE EXTERNAL DATA SOURCE [' + @VAR_NAME1 + '] ' +
    'WITH ' +
    '( ' +
    '  TYPE = RDBMS, ' + 
    '  LOCATION = ''mssqltips16.database.windows.net'', ' +
    '  DATABASE_NAME = ' + CHAR(39) + @VAR_NAME2 + CHAR(39) + ', ' +
    '  CREDENTIAL = CRD_CROSS_DBMS' +
    '); ';
  EXEC sp_executesql @VAR_STMT;

  -- Increment the date
  SET @VAR_DATE = DATEADD(M, 1, @VAR_DATE);
END

If you followed all the directions correctly, you should have twelve external data sources. See the image below for details. Next, we need to work on creating external tables for each of the data sources.

External Data Sources

We will use Dynamic Transact SQL to create external tables for each of the months below. Unfortunately, there is no way to define a primary key using the [ST_SYMBOL] and [ST_DATE] fields. If we were using the on premise version of SQL Server, we could automatically leverage Partitioned Views to increase query performance for certain patterns.

For more information on the CREATE EXTERNAL TABLE statement, please see the following MSDN article.

The image below shows how primary key constraints are not supported with external tables.

Primary Keys Not Supported

The transact SQL code below creates an external table for each of the twelve months.

--
-- Step 6B - Create 12 external tables
--

-- Variables
DECLARE @VAR_NAME1 SYSNAME;
DECLARE @VAR_NAME2 SYSNAME;
DECLARE @VAR_DATE DATETIME;
DECLARE @VAR_STMT NVARCHAR(1024);

-- Starting value
SET @VAR_DATE = '10-01-2015';

-- For each month
WHILE (@VAR_DATE < '10-01-2016')
BEGIN

  -- Set object names
  SET @VAR_NAME1 = 'STOCKS_' + FORMAT(@VAR_DATE, 'yyyyMM');
  SET @VAR_NAME2 = 'SRC_PORTFOLIO_' + FORMAT(@VAR_DATE, 'yyyyMM') + '';
 

  -- Drop external table
  SET @VAR_STMT = '' +
    'IF EXISTS ' +
    '( ' +
    '  SELECT * FROM sys.external_tables WHERE NAME = ' + 
    CHAR(39) + @VAR_NAME1 + CHAR(39) + 
    ') ' +
    ' DROP EXTERNAL TABLE [SHD].[' + @VAR_NAME1 + '];';
  exec sp_executesql @VAR_STMT;


  -- Create external table
  SET @VAR_STMT = ' ' + 
   'CREATE EXTERNAL TABLE [SHD].[' + @VAR_NAME1 + '] ' +
   '( ' +
   '  [ST_ID] [int] NOT NULL, ' +
   '  [ST_SYMBOL] [varchar](32) NOT NULL, ' +
   '  [ST_DATE] [date] NOT NULL, ' +
   '  [ST_OPEN] [real] NULL, ' + 
   '  [ST_HIGH] [real] NULL, ' +
   '  [ST_LOW] [real] NULL, ' +
   '  [ST_CLOSE] [real] NULL, ' +
   '  [ST_VOLUME] [bigint] NULL, ' +
   '  [ST_ADJ_CLOSE] [real] NULL ' +
   ') ' +
   'WITH ' +
   '( ' +  
   '  DATA_SOURCE = ' + @VAR_NAME2 + ',  ' + 
   '  SCHEMA_NAME = ' + CHAR(39) + 'ACTIVE' + CHAR(39) + ', ' +
   '  OBJECT_NAME = ' + CHAR(39) + 'STOCKS' + CHAR(39) + 
   '); ';   
  EXEC sp_executesql @VAR_STMT;

  -- Increment the data
  SET @VAR_DATE = DATEADD(M, 1, @VAR_DATE)
END

If you followed all the directions correctly, you should have twelve external tables. See the image below for details. Next, we need to create a view that combines all the tables together.

External Tables

The transact SQL code below creates a view named [STOCKS] which is the UNION ALL of the twelve external tables.

--
-- Create view on all external tables
--

-- Drop existing view
DROP VIEW IF EXISTS [DBO].[STOCKS]
GO

-- Create new view
CREATE VIEW [DBO].[STOCKS]
AS
SELECT * FROM [SHD].[STOCKS_201510]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201511]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201512]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201601]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201602]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201603]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201604]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201605]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201606]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201607]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201608]
  UNION ALL
SELECT * FROM [SHD].[STOCKS_201609]
GO

There can be a lot of Transact SQL coding to setup the Azure SQL database for elastic queries. In our example, we want to combine all the monthly data into one view. However, once the external table is defined, a database developer can start writing queries using both the local and external tables.

Executing Elastic Queries

Getting back to our business problem, you have been asked to pull summary data for each month for the GSPC mutual fund. The query below returns the high trading value, low trading value, average close value and the number of trading days for a given month. This query will supply the business owners with their required data.

--
-- Show summary statistics
--

SELECT 
  year(st_date) as st_year,
  month(st_date) as st_month, 
  max(st_high) as st_high,
  min(st_low) as st_low,
  avg(st_close) as st_avg_close,
  count(*) as st_days
FROM 
    [DBO].[STOCKS]
GROUP BY 
  year(st_date),
  month(st_date)
ORDER BY
  year(st_date),
  month(st_date)
GO

The image below shows the results of the above query.

Query Results

We can see that elastics query can SELECT data from multiple Azure SQL Databases and return the results. However, on premise queries can actually INSERT, DELETE and UPDATE data in related databases using the three part notation: database name, schema name and table name.

Can external tables support such DML actions?

The answer to this question is not at this time! The image below shows a simple INSERT statement that fails.

No DML Statements

Summary

Big Jon's Investments wanted to modify the proof of concept project in which stock portfolio data was stored in a table in Azure on a daily basis. Each database was changed to hold one month's worth of data. A parent database was created to reference all the child databases by using external tables. A view was created to UNION ALL the data into one object. Last but not least, an elastic database query was used to return summary data on all of the trading days grouped by month.

Some of the current limitations of external tables is the inability to define constraints and/or apply data manipulation statements. However, this new feature closes a gap that previously existed. The database developer can now reference both local and remote tables when creating Azure SQL database queries.

If you want to learn more, check out Azure Documentation on this subject.

Next Steps
  • Right now, we are pulling just summary data for the GSPC mutual fund. In real life, the stock price is always changing from when the market opens at 9:30 am and closes at 4:00 pm. This pattern lends itself to a current day database that having a large number of inserts and other day's databases that might be queried on an ad-hoc basis. Such a pattern shows an imbalance of resource usage across a set of sharded (horizontally partitioned) databases.

    How can we implement elastic database pools for such databases?
  • There are many other techniques that can be used to perform horizontal partitioning. One technique is database sharding. Microsoft has released the Azure SQL database elastic scale library for .NET which is a sharding solution.

    How can we take advantage of the library for our own business problems?
  • There are many situations in which you need to change the schema for a set of database shards. Is there a way to perform this action without connecting to each database to make the necessary DDL changes?

    How can we take advantage of elastic database jobs for our own business problems?


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author John Miner John Miner is a Data Architect at Insight Digital Innovation helping corporations solve their business needs with various data platform solutions.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Thursday, January 24, 2019 - 5:34:09 PM - Sunny Back To Top (78874)

Hi John,

Thank you for the response. I would first try to test some performance on the Cloud with external table setup. But yes thanks for the additional information regarding MI.


Thursday, January 24, 2019 - 3:18:01 PM - john miner Back To Top (78870)

Hi Sunny,

Here is a link to the Microsoft documentation on external tables. 

https://docs.microsoft.com/en-us/sql/t-sql/statements/create-external-table-transact-sql?view=sql-server-2017

There is not support for constraints in the database (local) the references the database (remote).  It does not mean the remote database can not have constraints and indexes.  Performance or lack of comes from the remote database (table).  So if you are using a S0 with 10 DTUs, this is your max performance.

Bascially, an external table is a replacement for the three dot notation that is lost when moving from an on-premise database to in the cloud.

If you want and need this option, I would take a look at managed instances (MI).  It is a more pricier option than PAAS but it supports cross database queries and many other features you miss from on premise.  The total size of a MI can be a total of 8 TB in total size.  Check the documentation since things change in Azure frequently.

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-managed-instance

Sincerely

John Miner

The CraftyDba


Thursday, January 24, 2019 - 1:31:03 PM - Sunny Back To Top (78869)

 Hi John, this is a great article. I have a question that if Constraints/Identity/Primary key cannot be created on the external tables, would it affect the query performance and how is it different from the on-premise SQL ?

Again, thanks for this wonderful and detailed article.


Friday, August 24, 2018 - 9:53:45 AM - John Miner Back To Top (77299)

Hi Yash,

You can not have two objects with the same name. 

Objects are defined by server, database, schema and object name for on premise systems and just the last two elements for in cloud, Platform As A Service offerings.

Since both the user defined table and the external table have the same schema and name, you are having issues.

Looking at the on-line documentation, the sys.tables view shows this issue very well.

https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-tables-transact-sql?view=sql-server-2017

I suggest using schemas just like the example above to seperate the actual sources of the data.

Also, if you need a consolidated table, consider using a VIEW with a SELECT statement using a UNION ALL.  

In summary, external tables for cross database querying are read only.

Sincerely

John


Sunday, August 19, 2018 - 3:57:33 AM - Yash Back To Top (77227)

Hi,

I am able to to query cross DB after following above mentioned steps but i have question what if table name is same in both DB.

Like i have TB1 in DB1 and TB1 in DB2

for this scenario i am not able to create external table by the name of TB1 as its alreay present in DB1 as normal table.

Please advise on this scenario.

Thanks


Monday, September 18, 2017 - 2:05:11 AM - AB Back To Top (66378)

Hi,

In 1st point: Create new user

DEFAULT_SCHEMA=[ACTIVE]  

What does Active refers to?

Regards,

AB















get free sql tips
agree to terms