Getting Started with HDInsight - Part 3 - Creating an HDInsight Cluster in Microsoft Azure Cloud


Problem

I have read the previous tips in the Getting Started with HDInsight series and I am eager to jump into the action and learn more about HDInsight. I would like to know what the prerequisites for creating a cluster are, how to create a cluster, and what other aspects are involved in creating an HDInsight cluster.

Solution

In this tip we will take a look at the creation of an HDInsight cluster and other related aspects.

We will take a look at two different ways of creating an HDInsight Cluster:

  • Creating an HDInsight Cluster through Azure Management Portal
  • Creating an HDInsight Cluster through Windows Azure PowerShell

To be able to create an Azure HDInsight Cluster, we need an active Windows Azure Subscription. If you do not have one, you can sign up for a free trial here: Windows Azure - Free one-month trial.

For the purpose of this tip, we will create a cluster in its simplest form without any custom configurations. Let's get started.

Creating an HDInsight Cluster with the Azure Management Portal

Before we can create an HDInsight cluster, we need a Storage Account in place for the cluster to use. By default, the cluster stores its system files and user data files in the default container of this Storage Account.

Creating a Storage Account in Azure Management Portal

Let us create a Storage Account by following the steps below.

  • Go to https://manage.windowsazure.com

  • Go to Azure Management Portal

  • Type in the Email Address associated with the Azure Subscription, enter the credentials, and log in.

  • Enter the Credentials and Log into Azure Management Portal

  • After logging into the Portal, we can see an empty environment as we have not created any services as shown below.

  • Empty Environment in Azure Management Portal

  • Click on Storage in the Left Navigation Pane which lists all the available services.

  • Storage Services in the Left Navigation Pane

  • Click on Create a Storage Account and it will open up the Quick Create option from the bottom service provisioning pane as shown below. Enter the following information in the respective fields.

  • Property | Value | Description / Additional Information
    URL | hdidemosa
    • This is the name of the Storage Account to be created. The name must be globally unique, so choose it carefully to avoid conflicts with existing accounts.
    • We only need to specify the name of the Storage Account, not the entire URL.
    • Azure builds the fully qualified URL "hdidemosa.*.core.windows.net".
    • Only lowercase letters and numbers are allowed here, and the name must be between 3 and 24 characters long.
    LOCATION/AFFINITY GROUP | Central US
    • This is the location where you want the Storage Account to be created.
    • The Storage Account and the HDInsight Cluster should be located in the same location / data center, so choose this location carefully.
    REPLICATION | GEO-REDUNDANT
    • This setting determines how the data in the Storage Account is replicated. For mission critical applications / data, it is recommended to replicate the data across different geographical locations so that the data is available all the time. More information about Azure Storage Replication: Azure Storage Redundancy Options.
    • For the purpose of this tip, let us leave the default as "Geo-Redundant".

    Storage Account Quick Create

  • Click on Create Storage Account.
  • It will take a few minutes to provision the Storage Account. Once the Storage Account is provisioned, a success message is displayed and we can see the Storage Account in the list of items as shown below.

  • Storage Account Created Successfully
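Because the portal rejects Storage Account names that break the naming rules (3 to 24 characters, lowercase letters and numbers only), it can help to check a candidate name locally before typing it in. The following is a minimal sketch of such a check. The function name Test-StorageAccountNameFormat is hypothetical and not part of any Azure module, and the check only covers the format of the name, not whether the name is already taken by someone else.

```powershell
# Hypothetical helper (not an Azure cmdlet): checks only the FORMAT of a
# Storage Account name. It cannot tell whether the name is globally unique.
function Test-StorageAccountNameFormat {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string] $Name
    )

    # -cmatch is the case-sensitive match operator, so upper case letters fail.
    return $Name -cmatch '^[a-z0-9]{3,24}$'
}

Test-StorageAccountNameFormat -Name 'hdidemosa'   # True
Test-StorageAccountNameFormat -Name 'HDIDemoSA'   # False (upper case letters)
Test-StorageAccountNameFormat -Name 'sa'          # False (shorter than 3 characters)
```

A case-sensitive regular expression keeps the check to a single line. A name that passes can still be rejected by Azure if another subscription already uses it.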

Creating an HDInsight Cluster in Azure Management Portal

Now let us create an HDInsight cluster by following the steps below.

  • Click on HDInsight in the Left Navigation Pane which lists all the available services.

  • HDInsight Services in the Left Navigation Pane

  • Click on Create an HDInsight Cluster and it will open up the Quick Create option from the bottom service provisioning pane as shown below. This option is used to quickly create an HDInsight cluster with limited customization.

  • HDInsight Quick Create Option. Choose Custom Create.

  • Click on "CUSTOM CREATE" and it will launch the "New HDInsight Cluster" wizard. Enter the details from the table below into the respective fields on the 1st screen ("Cluster Details" screen) of the wizard as shown below.

  • Property | Value | Description / Additional Information
    CLUSTER NAME | HDIDemoCluster
    • This is the name of the HDInsight Cluster to be created.
    • This name must be globally unique. Choose it carefully and make it unique by adopting some standards and incorporating your project / organization specific keyword(s) in it. Only use keywords which are not sensitive or confidential.
    CLUSTER TYPE | Hadoop
    • This setting determines the type of cluster we want to create. At the time of writing this tip, Windows Azure offers three different types of clusters: Hadoop, HBase, and Storm.
    HDINSIGHT VERSION | default (3.1)
    • This setting determines the version of the cluster we want to create. It is generally recommended to use the latest version, as it includes the latest and most advanced features.
    • As HDInsight is based on Hortonworks Data Platform (HDP), each release / version of HDInsight maps to a particular release / version of HDP, which in turn maps to a particular release / version of Apache Hadoop.
    • More information on what features are included in each version of HDInsight: HDInsight versions

    New HDInsight Cluster Wizard - Cluster Configuration

  • Click on the Right Arrow and enter the details from the below table into the respective fields on the 2nd screen ("Configure Cluster" screen) of the wizard as shown below.

  • Property | Value | Description / Additional Information
    DATA NODES | 2
    • This determines the number of data nodes that we want in our cluster. There should be at least one data node in the cluster.
    • There is a maximum limit on how many nodes we can have in the cluster, and this limit varies by subscription. The limit can be raised by contacting Microsoft Support.
    • Apart from these data nodes, two head nodes are included in the cluster to ensure high availability, irrespective of how many data nodes we have in the cluster.
    • More information on pricing here: HDInsight Pricing Details
    REGION/VIRTUAL NETWORK | Central US
    • This setting determines the region / location where the Cluster is to be created.
    • Ensure that the location chosen here is the same as the location where the Storage Account has been created.

    New HDInsight Cluster Wizard - Configure Cluster

  • Click on the Right Arrow and enter the details from the below table into the respective fields on the 3rd screen ("Configure Cluster User" screen) of the wizard as shown below.

  • Property | Value | Description / Additional Information
    USER NAME | HDIAdmin
    • This is the account to be used for accessing the HDInsight cluster.
    PASSWORD | MSSQLTips@2014
    • This is the account password. Choose a strong password.
    Enter the Hive/Oozie Metastore | Leave Unchecked
    • This setting enables us to use an Azure SQL database as a metastore for Hive/Oozie. For the purpose of this demonstration, keeping things simple, leave this unchecked.
    • By using an Azure SQL Database as a Hive/Oozie metastore, we can retain the necessary metadata / configurations even after deleting the cluster.

    New HDInsight Cluster Wizard - Configure Cluster User

  • Click on the Right Arrow and enter the details from the table below into the respective fields on the 4th screen ("Storage Account" screen) of the wizard as shown below.

  • Property | Value | Description / Additional Information
    STORAGE ACCOUNT | Use Existing Storage
    • This is the Storage Account to be used by the HDInsight cluster for storing its installation / system files / data along with user data (default).
    • We can either have the wizard create a new Storage Account for us or choose an existing Storage Account. Since we have already created a Storage Account, choose the option "Use Existing Storage".
    ACCOUNT NAME | hdidemosa
    • This is the Storage Account to be used by the HDInsight cluster. Choose the Storage Account which we created earlier during this demonstration.
    DEFAULT CONTAINER | Leave it as "Create Default Container"
    • Since we have not created any containers, it defaults to "Create Default Container". If we had other containers in the selected storage account, then it would show us a drop down to either choose a default container from the available list of containers or create a new container to be used as the default container.
    • For the purpose of this demonstration, let's allow the wizard to create the default storage container for us.
    ADDITIONAL STORAGE ACCOUNTS | 0
    • This setting allows us to associate one or more other Storage Accounts with the cluster, apart from the Storage Account selected above.
    • This can be handy when the data to be processed is located in a different Storage Account, or when the processed data needs to be pushed to a particular Storage Account for other users / applications to consume.

    New HDInsight Cluster Wizard - Storage Account

  • Click on the "Right" symbol to start provisioning the HDInsight cluster.
  • It takes a few minutes to provision a cluster. Once the cluster is provisioned, Azure displays a notification message at the bottom and we can also see the cluster in the list of available clusters as shown below.

  • HDInsight Cluster Created Successfully
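The wizard settings above imply two simple rules: the cluster and its Storage Account should be in the same location, and the cluster ends up with two head nodes on top of the data nodes we ask for (with at least one data node). The sketch below captures those rules as a small pre-flight check. The function Get-ClusterNodePlan and its parameters are hypothetical names made up for illustration; nothing here calls Azure.

```powershell
# Hypothetical pre-flight check based on the rules described in this tip.
# It does not contact Azure; it only validates the values we plan to enter.
function Get-ClusterNodePlan {
    param(
        [Parameter(Mandatory)] [string] $StorageLocation,
        [Parameter(Mandatory)] [string] $ClusterLocation,
        [Parameter(Mandatory)] [int]    $DataNodeCount
    )

    # The Storage Account and the cluster should share one location.
    if ($StorageLocation -ne $ClusterLocation) {
        throw "Location mismatch: storage is in '$StorageLocation', cluster is in '$ClusterLocation'."
    }

    # The cluster needs at least one data node.
    if ($DataNodeCount -lt 1) {
        throw "At least one data node is required."
    }

    # Two head nodes are included regardless of the data node count.
    $headNodeCount = 2

    [pscustomobject]@{
        Location   = $ClusterLocation
        DataNodes  = $DataNodeCount
        HeadNodes  = $headNodeCount
        TotalNodes = $DataNodeCount + $headNodeCount
    }
}

# Values used in this walkthrough: Central US, 2 data nodes.
$plan = Get-ClusterNodePlan -StorageLocation 'Central US' -ClusterLocation 'Central US' -DataNodeCount 2
$plan.TotalNodes   # 4
```

Keeping the total node count in view is useful when estimating cost, since the head nodes are part of the cluster as well.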

Clicking on "ALL ITEMS" in the left navigation pane shows us the list of all the services created by us as shown below.


List of all Active Services

Next let us go ahead and create the Storage Account and Cluster using Windows Azure PowerShell.

Creating an HDInsight Cluster with Windows Azure PowerShell

Before proceeding with the creation of Storage Account and Cluster, make sure that you have Windows Azure PowerShell Cmdlets installed and the Windows PowerShell environment configured for your subscription as described in Getting Started with HDInsight - Part 2 - Introduction to Azure HDInsight PowerShell.

For the purpose of this tip, we will create the Storage Account and Cluster with the simplest configuration without any customizations. Let's get started.

Creating a Storage Account via Windows Azure PowerShell

To start with, let us create a Storage Account. For the purpose of our demonstration, let us create this Storage Account in the "East US" location.

Launch Windows Azure PowerShell ISE and copy-paste the below script. Replace the Subscription Name in the below code with the name of your Subscription and execute the script.

$subscriptionName = "Microsoft Azure HDInsight - Trial Subscription" # Name of Subscription
$storageAccountName = "hdipsdemosa" # Name of Storage Account to be Created
$storageAccountLocation = "East US" # Location where Storage Account should be Created
$storageAccountDescription = "HDInsight PowerShell Demo Storage Account" # Optional Description

Select-AzureSubscription -SubscriptionName $subscriptionName -Current

New-AzureStorageAccount `
    -StorageAccountName $storageAccountName `
    -Location $storageAccountLocation `
    -Description $storageAccountDescription

It takes a few minutes to create the Storage Account. Once the script execution is completed, an output message with the status of the execution is displayed as shown below.

Create Storage Account via Windows Azure PowerShell

Run the below script and verify the details in the output to ensure that the Storage Account was created successfully.

Get-AzureStorageAccount -StorageAccountName "hdipsdemosa"

The output of the above command looks as shown below.

Verify the Storage Account Creation via Windows Azure PowerShell

Now that we have created the Storage Account, let's go ahead with the creation of the cluster.

Creating an HDInsight Cluster via Windows Azure PowerShell

Now, let us create an HDInsight Cluster. Since we have created the Storage Account in "East US" location, we need to create our HDInsight Cluster also in this location to be able to associate this storage account with our cluster.

Copy-paste the below script into Windows Azure PowerShell ISE. Replace the Subscription Name, Cluster Name, and Cluster Node Count in the below code with appropriate values and execute the script.

$subscriptionName = "Microsoft Azure HDInsight - Trial Subscription" # Name of Subscription
$storageAccountName = "hdipsdemosa" # Name of Storage Account to be used
$storageAccountLocation = "East US" # Location of Storage Account to be used
$clusterName = "HDInsightDemoCluster" # Name of the Cluster to be Created
$clusterNodeCount = 2 # Number of Data Nodes in the Cluster (an integer, not a string)

Select-AzureSubscription -SubscriptionName $subscriptionName -Current

$storageAccountKey = Get-AzureStorageKey -StorageAccountName $storageAccountName | %{$_.Primary}
$blobStorageName = "$storageAccountName.blob.core.windows.net" # HDInsight uses Blob Storage. Construct Fully Qualified Blob Storage Account Name.

New-AzureHDInsightCluster `
    -Name $clusterName `
    -Location $storageAccountLocation `
    -DefaultStorageAccountName $blobStorageName `
    -DefaultStorageAccountKey $storageAccountKey `
    -ClusterSizeInNodes $clusterNodeCount

Once the above script is executed, you will be prompted to enter the missing information, which in our case are credentials and the name of the Cluster Root Container. Firstly, you are prompted for credentials. Enter the credentials as shown below.

Create HDInsight Cluster via Windows Azure PowerShell

Next, enter the name of the Cluster Root Container to be created and press enter.

Create HDInsight Cluster via Windows Azure PowerShell
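Both prompts described above can most likely be avoided by passing the missing values up front. The sketch below builds the full argument set as a hashtable and shows, in comments, how it could be splatted onto New-AzureHDInsightCluster together with a credential. It assumes that the -Credential and -DefaultStorageContainerName parameters are available in your installed version of the cmdlets, so verify this with Get-Help New-AzureHDInsightCluster first. The helper name New-ClusterArguments, the container name, and the storage key placeholder are made up for illustration.

```powershell
# Hypothetical helper: collects the arguments for New-AzureHDInsightCluster
# in one hashtable so they can be reviewed before anything is provisioned.
function New-ClusterArguments {
    param(
        [Parameter(Mandatory)] [string] $ClusterName,
        [Parameter(Mandatory)] [string] $Location,
        [Parameter(Mandatory)] [string] $StorageAccountName,
        [Parameter(Mandatory)] [string] $StorageAccountKey,
        [Parameter(Mandatory)] [string] $ContainerName,
        [Parameter(Mandatory)] [int]    $DataNodeCount
    )

    @{
        Name                        = $ClusterName
        Location                    = $Location
        # HDInsight uses Blob Storage, so build the fully qualified name.
        DefaultStorageAccountName   = "$StorageAccountName.blob.core.windows.net"
        DefaultStorageAccountKey    = $StorageAccountKey
        DefaultStorageContainerName = $ContainerName
        ClusterSizeInNodes          = $DataNodeCount
    }
}

# Placeholder values; in the real script the key comes from Get-AzureStorageKey.
$clusterArgs = New-ClusterArguments `
    -ClusterName 'HDInsightDemoCluster' `
    -Location 'East US' `
    -StorageAccountName 'hdipsdemosa' `
    -StorageAccountKey '<storage-account-key>' `
    -ContainerName 'hdinsightdemocluster' `
    -DataNodeCount 2

# With the Azure module loaded and the subscription selected (assumption:
# both parameters exist in your cmdlet version):
# $credential = Get-Credential
# New-AzureHDInsightCluster @clusterArgs -Credential $credential
```

Building the hashtable first makes it easy to inspect $clusterArgs before running the provisioning command, and the same table can be reused in an automated deployment script.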

It takes some time to provision the cluster. While the cluster is being provisioned, the status is reported to the user in the PowerShell window as shown below.

HDInsight Cluster Creation Status Reporting via Windows Azure PowerShell

Once the cluster is provisioned, the status is returned to the user as shown below.

HDInsight Cluster Created Successfully

Now run the below command and verify the output to ensure that the cluster has been provisioned successfully.

Get-AzureHDInsightCluster -Name "HDInsightDemoCluster"

The output of the above command looks as shown below.

Verify HDInsight Cluster Creation via Windows Azure PowerShell

We can also verify that the Storage Account and Cluster are created by logging into the Azure Management Portal and checking the list of active services as shown below.

List of Services created via Windows Azure PowerShell and active in Azure Management Portal

That's it. It's as simple as this to create the Storage Account and Cluster through Windows Azure PowerShell. PowerShell is a very powerful tool and is used extensively for automation of deployment, administration, management, data processing and various other activities on Azure / HDInsight.

Now that we have created the HDInsight Cluster, we can start exploring more about HDInsight. We will explore more about HDInsight in future tips. So, stay tuned!

Next Steps
  • Explore more about the above demonstrated approaches and see how you can customize the creation of a Storage Account and HDInsight Cluster.
  • Check out the tips on Microsoft Azure
  • Check out the tips on Windows PowerShell
  • Check out my previous tips
  • Stay tuned for the next tip in this series!



About the author
Dattatrey Sindol has 8+ years of experience working with SQL Server BI, Power BI, Microsoft Azure, Azure HDInsight and more.

This author pledges the content of this article is based on professional experience and not AI generated.
