Learn more about SQL Server tools

 

Tutorials          DBA          Dev          BI          Career          Categories          Events          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Events

DBA    Dev    BI    Categories

 

Azure SQL Database - Table Partitioning


By:   |   Read Comments (11)   |   Related Tips: More > Azure

Problem

The Gartner Group specializes in surveying leading companies and determining industry trends in Information Technology. It is not surprising that cloud computing and big data (information explosion) are on top of the 2015 technology trending list.

In December of 2014, Microsoft release the preview version of Azure SQL Database update V12. The main purpose of this version is to narrow the syntactical differences between the on-premises and in-cloud database engines. The hope is that more companies will migrate their data to this software as a service platform.

Given these trends, the main question a database administrator might have is "How can I manage larger tables in Azure SQL Database?".

Solution

The new version of Azure SQL database has introduced table partitioning. This feature is part of the enterprise only version for on premises, but is available in all versions in the cloud.

To demonstrate this new feature we need to have a fictitious business problem. Since one of my majors in college was applied mathematics, I am going to solve a math problem.

Business Problem

Calculate and store the primes numbers from 1 to 1 million with ten data partitions. Thus, the primes numbers will be hashed in buckets at every one hundred thousand mark.

The trial division algorithm that I am going to introduce is a brute force method for calculating prime numbers. It is great for comparing the computing power of two machines by looking at overall execution times.

This routine consists of dividing a number n by each integer m which is greater than 1 and less than or equal to the square root of n. If the result of any of these divisions is an integer, then n is not a prime; otherwise, it is a prime.

Creating the database

I am using my MSDN ultimate license which comes with a free $150 per month Azure subscription. This is a great way to learn about what Azure has to offer without any real investment.


MSDN Ultimate License

This demonstration assumes you have a Azure Database Server already created with a valid login. The server login I created is named jminer. It is important to record the web address of the server (connection string) since this will be used in SSMS. The image below shows the V12 preview has been enabled.

AZURE SQL Databases

I will be using SQL Server Management Studio (SSMS) 2014 with cumulative update 5 installed to design and deploy the solution. I will be referring to the Azure Portal to review the results of our work.

To connect to our Azure Database server, enter the connection information using SQL Server standard authentication.

SSMS Connect To Server

One statement that is still not supported is the USE statement. This limitation can be overcome by selecting the correct database in the object explorer and right clicking to open a new query window. I will be leaving this statement in the code since they are a reminder of what database you should be in. Executing this statement in the wrong database generates an error.

To verify the server version and default database, we can use the db_name() and @@version statements.

SQL Version & Default Database

The code below recreates the MATH database.

/*  
 Create a database to hold the prime numbers
*/

-- Which database to use.
USE [master]
GO

-- Delete existing database
IF  EXISTS (SELECT name FROM sys.databases WHERE name = N'MATH')
DROP DATABASE MATH
GO

-- Create new database
CREATE DATABASE MATH
(
MAXSIZE = 20GB,
EDITION = 'STANDARD',
SERVICE_OBJECTIVE = 'S2'
)
GO   

It is interesting to note two new keywords have been introduced to describe database type. I will be investigating this new syntax in my next tip.

Creating the Partition Function and Scheme

The key concept behind any type of horizontal partitioning is to group similar records into a single file group and/or file. In turn, this changes major record operations into file operations. For instance, DELETE all data with partition value Y turns into a remove file operation. Searching for data with partition value Y as part of the WHERE clause directs the storage engine to retrieve data from that one file.

The overall benefits should result in increased speed. However, like most things in life your delta might vary.

The main question that comes to mind is "How do we do create a partition scheme in Azure since we have no control over file placement?

The product team has assured me that mapping ALL the partitions to the PRIMARY file group will be optimized by the storage engine in Azure.

The diagram below is a conceptual view of how table partitioning works for our example in Azure SQL database.

Partition Function & Schema

The code below creates a partition function named PF_HASH_BY_VALUE and partition scheme named PS_HASH_BY_VALUE.

/*  
 Use table partitioning
*/

-- Which database to use.
USE [MATH]
GO

-- Create the partition function
CREATE PARTITION FUNCTION PF_HASH_BY_VALUE (BIGINT) AS RANGE LEFT 
FOR VALUES (100000, 200000, 300000, 400000, 500000, 600000, 700000, 800000, 900000)
GO

-- Show the function
SELECT * FROM sys.partition_functions
GO

-- Create the partition scheme
CREATE PARTITION SCHEME PS_HASH_BY_VALUE 
AS PARTITION PF_HASH_BY_VALUE
ALL TO ([PRIMARY]);
GO

-- Show the scheme
SELECT * FROM sys.partition_schemes
GO

The output from querying the system tables is shown below.

Partition Function & Schema

A simple call to the $PARTITION system function can be used to test the hash index. The example below uses a derived table with key boundary values.

-- Test partition function
SELECT 
  MY_VALUE,
  $PARTITION.PF_HASH_BY_VALUE(MY_VALUE) AS HASH_IDX
FROM 
(
 VALUES 
   (1),
   (100001), 
   (200001), 
   (300001), 
   (400001), 
   (500001), 
   (600001), 
   (700001), 
   (800001), 
   (900001)
) AS TEST (MY_VALUE);
GO

The output from the test is shown below.

PARTITION System Function

Creating the Partitioned Table

The TBL_PRIMES table contains three columns. The first one is the value of the prime number. The second one is how many divisions were tried before the number was declared prime. The third one is the date and time the data was stored. The first column is chosen as the primary key for the table.

The code below creates the new table with the partitioning scheme implemented on the primary key.

/*  
 Create a table to hold the prime numbers
*/

-- Which database to use.
USE [MATH]
GO

-- Delete existing table
IF  EXISTS (SELECT * FROM sys.objects 
  WHERE object_id = OBJECT_ID(N'[DBO].[TBL_PRIMES]') AND type in (N'U'))
DROP TABLE [DBO].[TBL_PRIMES]
GO

-- Add new table
CREATE TABLE [DBO].[TBL_PRIMES] 
(
  [MY_VALUE] [bigint] NOT NULL,
  [MY_DIVISION] [bigint] NOT NULL CONSTRAINT [CHK_TBL_PRIMES] CHECK ([MY_DIVISION] - 0),
  [MY_TIME] [datetime] NOT NULL CONSTRAINT [DF_TBL_PRIMES] DEFAULT (GETDATE())
  CONSTRAINT [PK_TBL_PRIMES] PRIMARY KEY CLUSTERED ([MY_VALUE] ASC)
) ON PS_HASH_BY_VALUE ([MY_VALUE])
GO

User defined stored procedures

First, we need a procedure that takes a number as a parameter and determines if it is prime. In this example we will use an old fashion WHILE loop. Some relational algebraic purest might argue that we should use a TALLY table. However, this is only a simple example focused on table partitioning.

The code below creates the procedure named SP_IS_PRIME.

/*  
 Create a procedure to determine if number is prime
*/


-- Which database to use.
USE [MATH]
GO

-- Delete existing procedure
IF  EXISTS (SELECT * FROM sys.objects 
  WHERE object_id = OBJECT_ID(N'[dbo].[SP_IS_PRIME]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[SP_IS_PRIME]
GO

-- Create the stored procedure from scratch
CREATE PROCEDURE [dbo].[SP_IS_PRIME]
    @VAR_NUM2 BIGINT
AS
BEGIN
    -- NO DISPLAY
    SET NOCOUNT ON
 
    -- LOCAL VARIABLES
    DECLARE @VAR_CNT2 BIGINT;
    DECLARE @VAR_MAX2 BIGINT;

    -- NOT A PRIME NUMBER
    IF (@VAR_NUM2 = 1)
        RETURN 0;            

    -- A PRIME NUMBER
    IF (@VAR_NUM2 = 2)
        RETURN 1;            

    -- SET UP COUNTERS    
    SELECT @VAR_CNT2 = 2;
    SELECT @VAR_MAX2 = SQRT(@VAR_NUM2) + 1;

    -- TRIAL DIVISION 2 TO SQRT(X)
    WHILE (@VAR_CNT2 <= @VAR_MAX2)
    BEGIN
        -- NOT A PRIME NUMBER
        IF (@VAR_NUM2 % @VAR_CNT2) = 0
            RETURN 0;            

        -- INCREMENT COUNTER
        SELECT @VAR_CNT2 = @VAR_CNT2 + 1;
        
    END;

    -- A PRIME NUMBER
    RETURN 1;
    
END
GO

Second, we need a procedure that takes a starting and ending value as input and calculates and stores primes numbers between those two values as output. This procedure will allow us to run multiple calls in parallel against Azure SQL Database at the same time.

The code below creates the procedure named SP_STORE_PRIMES.

/*    
 Create a procedure to store primes from x to y.
*/

-- Which database to use.
USE [MATH]
GO

-- Delete existing procedure
IF  EXISTS (SELECT * FROM sys.objects 
  WHERE object_id = OBJECT_ID(N'[dbo].[SP_STORE_PRIMES]') AND type in (N'P', N'PC'))
DROP PROCEDURE [dbo].[SP_STORE_PRIMES]
GO

-- Create the stored procedure from scratch
CREATE PROCEDURE SP_STORE_PRIMES
    @VAR_ALPHA BIGINT,
    @VAR_OMEGA BIGINT
AS
BEGIN
    -- NO DISPLAY
    SET NOCOUNT ON
 
    -- DECLARE VARIABLES
    DECLARE @VAR_CNT1 BIGINT;
    DECLARE @VAR_RET1 INT;
    
    -- SET VARIABLES
    SELECT @VAR_RET1 = 0;
    SELECT @VAR_CNT1 = @VAR_ALPHA;

    -- CHECK EACH NUMBER FOR PRIMENESS
    WHILE (@VAR_CNT1 <= @VAR_OMEGA)
    BEGIN
        -- ARE WE PRIME?
        EXEC @VAR_RET1 = DBO.SP_IS_PRIME @VAR_CNT1;
        
        -- FOUND A PRIME
        IF (@VAR_RET1 = 1)
          INSERT INTO [DBO].[TBL_PRIMES] (MY_VALUE, MY_DIVISION) 
    VALUES (@VAR_CNT1, SQRT(@VAR_CNT1));
    
        -- INCREMENT COUNTER
        SELECT @VAR_CNT1 = @VAR_CNT1 + 1        
    END;
    
END
GO

Parallel execution

Many of the SQL Server tools that come with the on-premises edition work the same way for the cloud edition. I am going to leverage the SQLCMD utility in a batch program. The command line interpreter has the start keyword that can be used to launch a program asynchronously. Putting all this concepts together with the right calls to SP_STORE_PRIMES, we can calculate the prime numbers in ten even batches.

The command file below calls our user defined stored procedure to solve our business problem.

REM
REM  Calculate primes numbers <= 1M asynchronously.
REM 

REM [Partition 1]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 1, 100000;"

REM [Partition 2]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 100001, 200000;"

REM [Partition 3]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 200001, 300000;"

REM [Partition 4]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 300001, 400000;"

REM [Partition 5]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 400001, 500000;"

REM [Partition 6]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 500001, 600000;"

REM [Partition 7]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 600001, 700000;"

REM [Partition 8]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 700001, 800000;"

REM [Partition 9]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 800001, 900000;"

REM [Partition 10]
start cmd /c sqlcmd -S codf58h5ey.database.windows.net,1433 -U jminer -P SQLtip$2015 
  -d MATH -Q "EXEC SP_STORE_PRIMES 900001, 1000000;"

Solution validation

Even though I have been in the IT industry for a quarter century, I still test and re-test my solutions to make sure that my algorithms work correctly for both positive and negative testing.

One question that a tester might have is "How do I know the data was stored in the correct partition?"

The Azure preview V12 has exposed over 100 new dynamic management views that the database administrator can use for monitoring and troubleshooting. The sys.dm_db_partition_stats view can be used to answer such a question. However, I already introduced the $PARTITION system function that can obtain the same answer.

The code below shows how to investigate row counts by partition number.

/*  
 Validate data placement
*/


-- Use dmv to get partitions
SELECT 
  Partition_Number, Row_Count 
FROM sys.dm_db_partition_stats
WHERE object_id = object_id('TBL_PRIMES'); 


-- Using the $PARTITION function
SELECT 
    $PARTITION.PF_HASH_BY_VALUE([MY_VALUE]) as Partition_Number, 
    COUNT(*) as Row_Count
FROM 
    MATH.[dbo].[TBL_PRIMES]
GROUP BY 
    $PARTITION.PF_HASH_BY_VALUE([MY_VALUE]);

The output from the test is shown below.

Validate Data Placement

Next Steps
  • Azure V12 has introduced edition and service objective to provide different levels of processing power.
    How do these offerings compare to a on-premises edition?
  • Many times table partition goes hand in hand with a sliding window for data warehousing. The current solution for prime numbers less than or equal to 1 million is static and does not account for adding and subtracting partitions.
    How do you implement such a concept?
  • There are many other techniques that can be used to perform horizontal partitioning. One technique is database sharding. Microsoft has released the Azure SQL database elastic scale library for .NET which is a sharding solution.
    How can we take advantage of the library for our own business problems?


Last Update:






About the author
MSSQLTips author John Miner John Miner is currently a Microsoft Technology Solutions Professional (TSP) advising North East District (NED) corporations.

View all my tips
Related Resources


 









Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Notify for updates 


Get free SQL tips:

*Enter Code refresh code     



Tuesday, November 15, 2016 - 6:09:43 AM - Anil Back To Top

 

 Hi, 

     Great article about Azure table partitioning, Thanks!.

 I am wondering how can we keep table parrtitioing based on date/months for keep growing database. There is no end of dates and hence the bucket count. But say we may need to keep create bucket for last 3 months.

Regards,

Anil


Friday, April 08, 2016 - 8:48:05 PM - Rick Willemain Back To Top

 Excellent demo and article !  Tank you !!! Rick Willemain

 


Wednesday, October 14, 2015 - 3:04:25 PM - John Miner Back To Top

Good Afternoon Ivo,

I am sorry I was not able to help you before you found your own answer.  Things are changing really fast in the Azure playground.  I guess before, you had to have a premium sku P#.  Now it is part of standard sku S#.  I will have to retest this when I have a chance.

Sincerely

John


Wednesday, October 14, 2015 - 2:38:16 PM - Ivo Pels Back To Top

Never mind. I am a moron.

We were using test queries with literals and the queries were been considered for simple parameterization by SQL Server to promote query plan reuse.

It works.

And even better, it also works on the Standard Tier !

Regards,

Ivo

 


Tuesday, October 13, 2015 - 9:19:57 AM - Ivo Pels Back To Top

Hi John,

We just tried to use horizontal table partioning on Azure SQL Server Premium Tier and we can see the data being populated in different partitions. However the explain plans do not reflect the optimizer actually using the partitioning and we also do not see any performance difference between a partioned and unpartioned table.

We actually rolled out the partitioning on Standard Tier before rescaling to Premium and code seems to work fine on Standard Tier as well. We assume this has something to do with being able to scale up/down without breaking things and Standard tier will just not utilize the partioning, but Premium seems not to do also unfortunatly.

Any ideas?

Regards,

Ivo

 


Wednesday, May 20, 2015 - 1:02:05 AM - Isaac Carter Back To Top

Hey John,

You mentioned that you have been in the IT industry for a quarter of a decade.  That is 2.5 years, 30 months, not very long...   Perhaps you ment a quarter of a century?


Wednesday, March 04, 2015 - 7:38:12 PM - John Miner Back To Top

Hi Garrilla,

Thanks for the updated link stating that table partitioning is part of the premium edition.  I wrote this article in the beginning of January, before that update from Microsoft.

To recap, before using any feature cross check the edition against the feature.

Sincerely

John


Wednesday, March 04, 2015 - 4:27:53 AM - Garrilla Back To Top

Thanks for the tip

 

The new version of Azure SQL database has introduced table partitioning. This feature is part of the enterprise only version for on premises, but is available in all versions in the cloud

 

Just a quick correction:

 

According to https://msdn.microsoft.com/en-us/library/azure/ee336281.aspx Table POartioning is only avaiable in Premium Tiers on Azure


Sunday, February 01, 2015 - 4:00:41 AM - Kartar Rana Back To Top

Great Article John!!!!


Friday, January 30, 2015 - 3:27:32 PM - Matt B Back To Top

John,

 

As someone who is new to Azure SQL Database, this article is very well written, easy to follow and grasp, and provides tons of technical info.  Keep up the great work!!

 

Matt

 


Thursday, January 29, 2015 - 8:09:35 AM - Jeremy Kadlec Back To Top

John,

Congrats on your first tip.

Thank you,
Jeremy Kadlec
Community Co-Leader


Learn more about SQL Server tools