Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Importing Complex JSON files using SQL Server Integration Services


By:   |   Last Updated: 2018-04-26   |   Comments (2)   |   Related Tips: More > Import and Export

Problem

How to import UK petition JSON files using SQL Server Integration Services?

Solution

In tip #1 we covered the UK petition file format and how to import the JSON files using SQL Server Integration Services. We have also learned about importing nested JSON files in tip #2.  These two tips form the basis for this tip. It is assumed that you have practiced those examples and are familiar with the data format.

In this tip, we are going to download the UK petition data (here) as a JSON file. This JSON file is very complex in nature and will be very difficult to understand and define classes to store the value at runtime. Hence initially, I have come up with simple example to explain the complex parts. Then I will showcase the actual working example to load the UK petition file directly using SSIS.

Reviewing JSON File Format

In the below example #1, two petition details have been consolidated and provided with attributes. This example is an advanced version of the example mentioned in tip #2.

In this example, I have an included additional object "attributes" for the data item. Each petition has attributes: "action", "background", "additional_details", "state", "signature_Count" and "created at" petition.

Sample Petition File - Description: Sample Petition File

After refreshing the JSON data content using the Notepad++ JSON viewer, the JSON object hierarchy has been displayed on the left-hand side. It is observed that the inner most object attribute has several attributes (like action, background, state, etc.).

In addition to the "links" object, the attributes object defined under the array. Each item in the array represent a petition. Hence a petition object will have type and id as simple attributes.

In addition, the petition object will have the link details and attribute details as objects. At the root level, a JSON object has been defined.

SSIS Output Columns

From the supplied sample petition file, it is assumed that a petition has column details like id, type, link, action, background, additional_details, state, signature_count and created_dt.

As a first step, let's create the output columns as mentioned in the picture below.

Creating Output - Description: Creating Output

Creating “DataItem” Class

To enable us to deserialize the JSON file content, we need to create a class definition which reflects the hierarchy of the supplied JSON file. The below represents the class definitions.

 using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using System.Text;
namespace PetitionNamespace
{
   public class DataItem
   { 
      public Collection<DataSubItem> data { get; set; }
   }
   public class DataSubItem
   {
      public string type { get; set; }
      public string id { get; set; }
      public Selflink links { get; set; }
      public AttributeSubItems attributes { get; set; }
   }
   public class Selflink
   {
      public string self { get; set; }
   }
   public class AttributeSubItems
   {
      public string action { get; set; }
      public string background { get; set; }
      public string additional_details { get; set; }
      public string state { get; set; }
      public string signature_count { get; set; }
      public string created_at { get; set; }
   }
}			
Creating DataItem Class - Description: Creating DataItem Class

The "AttributeSubItems" class is the inner most class to contain the attributes of a petition. In addition, the SelfLink class is also an inner most class to define the link (self).

The DataSubItem class contains the definition of a petition which defines the property of the petition like type, id as simple properties. It also contains other properties "link" and "Attribute" as objects. The root level class DataItem has the property "data" which returns a collection of DataSubItem.

Deserialization of JSON Data

We need to read the JSON file and store the file content as a string. The JSON file content can be deserialized using the JavaScript serializer. After the successful deserialization, the data content can be stored as a type of “dataitem” object. The DataItem contains a collection of “datasubItem”. Each DataSubItem represents a petition. The petition details like type and id can be directly availed from the “datasubitem”. The link "self" object can be derived from the "links" property. Also, the attributes of the petition can be derived from the attribute property of the “datasubitem”.

public override void CreateNewOutputRows()
{
   /*
     Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
     For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
   */
   String jsonFileContent = File.ReadAllText(@"E:\WorkArea\Analysis\SSIS_JSON\Petition.JSON");
   JavaScriptSerializer js = new JavaScriptSerializer();
   DataItem dataItem = js.Deserialize<DataItem>(jsonFileContent);
   foreach (DataSubItem dataSubItem in dataItem.data)
   {
      Output0Buffer.AddRow();
      Output0Buffer.type = dataSubItem.type;
      Output0Buffer.id = dataSubItem.id;
      Selflink selflink = dataSubItem.links;
      Output0Buffer.Link = selflink.self;
      AttributeSubItems attributeSubItems = dataSubItem.attributes;
      Output0Buffer.Action = attributeSubItems.action;
      Output0Buffer.Background = attributeSubItems.background;
      Output0Buffer.additionaldetails = attributeSubItems.additional_details;
      Output0Buffer.state = attributeSubItems.state;
      Output0Buffer.signaturecount = attributeSubItems.signature_count;
      Output0Buffer.createdat = attributeSubItems.created_at;
   }
}			
Creating output Rows - Description: Creating output Rows

After the successful execution of the package, we can see two records are getting generated by the script component. A data viewer has been setup between the script component and the union all to display the records.

Records in Data Viewer - Description: Records in Data Viewer

Downloading UK petition JSON file

The UK petition details can be downloaded as a JSON file using this URL. Let’s open the petition JSON file in Notepad++ and refresh the JSON viewer. The below image represents the JSON object hierarchy. The root level JSON object has a links object and an array of petition objects.

Extened Petition File - Description: Extened Petition File

After expanding the "Links" object, we can see the links object has properties (self, first, last, next and prep).

Extened Petition File - Description: Extened Petition File

Let’s expand the data attributes to understand the hierarchy. It is observed that the data array has multiple petition objects. Each petition object has attributes "type" and "id". Petition also has other properties "links" and "attributes" as objects.

It is evident that the supplied JSON has two datasets. The links dataset has only one record with columns "Self", "First", "Last", "Next" and "Prev". The Petition dataset contains all the petition details as records. Each petition record has the details like "type", "id", "links", "action", "background", "additional_details, "state", and "signature_count". As the "links" and "attributes" are defined as a nested JSON, we need to extract the details and add into the petition details.

Extened Petition File - Description: Extened Petition File

As we have two datasets, we need to create two outputs in the script component. The LinksOutput will be crated with the columns "self", "First", "Last", "Next" and "Prev".

The Petitionoutput will have the columns "id", "type", "link", "Action", "Background",  "additional_details", "state", "signature_count" and "created_dt". The below image represents the outputs with the columns.

Creating Output - Description: Creating Output

Once the outputs have been created, we can add two union all components as a “dump” destination and connect the outputs as shown in the picture below.

Data Flow Overview - Description: Data Flow Overview

Creating Petition Object in SSIS

The supplied petition JSON object has many nested JSON objects and arrays. Hence, we need to define multiple classes to store the JSON data contents at runtime using the C# libraries.

The innermost class "Attributes" has the details about the attributes of a petition. This class has the properties like "Action", "Background", "additional_details", "state", "signature_count" and "created_dt".

Another innermost class is "SelfLink" which contains the class definition of the link. The class DataSubItem represents a petition. This class has properties like "type", "id", "SelfLink" and "AttributeSubItems".

The class DataItem has a property data which will return a collection of DataSubItems. The root level class "Petition" has two properties namely "Links" and "Data". The "links" property represents the "Links" dataset and the "data" property represents the Petition dataset. Using these objects, we can navigate and extract the data from the supplied JSON file.

The below represents the petition class definition.

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Linq;
using System.Text;
namespace PetitionNamespace
{
   public class Petition
   {
      public LinkSubItem links { get; set; }
      public System.Collections.ObjectModel.Collection<DataSubItem> data { get; set; }
   }
   public class DataItem
   {
      public Collection<DataSubItem> data { get; set; }
   }
   public class DataSubItem
   {
      public string type { get; set; }
      public string id { get; set; }
      public Selflink links { get; set; }
      public AttributeSubItems attributes { get; set; }
   }
   public class Selflink
   {
      public string self { get; set; }
   }
   public class AttributeSubItems
   {
      public string action { get; set; }
      public string background { get; set; }
      public string additional_details { get; set; }
      public string state { get; set; }
      public int signature_count { get; set; }
      public string created_at { get; set; }
   }
   class Link
   {
      public LinkSubItem links { get; set; }
   }
   public class LinkSubItem
   {
      public string self { get; set; }
      public string first { get; set; }
      public string last { get; set; }
      public string next { get; set; }
      public string prev { get; set; }
   }
}			
Petition Class Definition - Description: Petition Class Definition

Deserialization of JSON Data

The UK petition file can be deserialized using the JavaScriptSerializer. The deserialized object can be stored as an object of type Petition. This petition object contains linkSubItem object which can be accessed using the Links property of the Petition object. The link attributes "Self", "First", "last", "Next", "Prev" are available as properties of the linksubitem object.

The petition details are available as a collection of DataSubItem object. This can be accessed using the data property of the petition object. Each item in the DataSubItem collection represent a petition.

The petition properties id and type can be accessed directly using the datasubitem object. The "self" property of the petition can be accessed using the "Links" object. The links property of the dataSubItem will return the selflink object.

The self-property of the Selflink object will return the actual link. The petition attributes can be accessed using the attributes property of datasubitem. The datasubitem will return attributesubitems. The actual attributes "action", "background" can be accessed using the attributesubitem object.

The below code will help you to deserialize the JSON file and will return links and Petition outputs.

public override void CreateNewOutputRows()
{
   /*
     Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
     For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
   */
   String jsonFileContent = File.ReadAllText(@"C:\SSIS_JSON\samples.JSON");
   JavaScriptSerializer js = new JavaScriptSerializer();
   Petition petition = js.Deserialize<Petition>(jsonFileContent);
   
   //Extract links details as a separate output
   LinkSubItem linkSubItem = petition.links;
   LinksOutputBuffer.AddRow();
   LinksOutputBuffer.Self = linkSubItem.self;
   LinksOutputBuffer.First = linkSubItem.first;
   LinksOutputBuffer.Last = linkSubItem.last;
   LinksOutputBuffer.Next = linkSubItem.next;
   LinksOutputBuffer.Prev = linkSubItem.prev;
 
   //Extract petition details in a Petition output
   System.Collections.ObjectModel.Collection<DataSubItem> dataSubItems = petition.data;
   foreach (DataSubItem dataSubItem in dataSubItems)
   {
      PetitionOutputBuffer.AddRow();
      PetitionOutputBuffer.type = dataSubItem.type;
      PetitionOutputBuffer.id = dataSubItem.id;
      Selflink selflink = dataSubItem.links;
      PetitionOutputBuffer.Link = selflink.self;
      AttributeSubItems attributeSubItems = dataSubItem.attributes;
      PetitionOutputBuffer.Action = attributeSubItems.action;
      PetitionOutputBuffer.Background = attributeSubItems.background;
      PetitionOutputBuffer.additionaldetails = attributeSubItems.additional_details;
      PetitionOutputBuffer.state = attributeSubItems.state;
      PetitionOutputBuffer.signaturecount = attributeSubItems.signature_count;
      PetitionOutputBuffer.createdat = attributeSubItems.created_at;
   }
}			
Creating Output Rows - Description: Creating Output Rows

Once the changes have been made, let's execute the package to see the results. As we have setup the data viewer on both the outputs, we can see the data in the data pipeline.

The LinksOutput has one record with the columns "Self", "First","Last","Next" and "Prev". The Petition output will have all the petition details with the columns "id", "type", "link", "Action", "Background", "additional_details", "state" and "siganture_count".

Data Viewer Results - Description: Data Viewer Results
Data Flow - Execution - Description: Data Flow - Execution

Summary

In this tip, we have learned how to import a UK petition JSON file using SQL Server Integration Services. We have also learned about analyzing nested JSON data files in a step by step manner.

Next Steps
  • Learn JSON basics with this tip
  • Challenge your JSON knowledge with this tip


Last Updated: 2018-04-26


next webcast button


next tip button



About the author
MSSQLTips author Nat Sundar Nat Sundar is working as an independent SQL BI consultant in the UK with a Bachelors Degree in Engineering.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Friday, November 16, 2018 - 10:51:59 PM - Fred A Back To Top

 Hello Nat,

thank you for taking the time to write this...i am having problems with my JSON following this tutorial.

are you still monitoring this tutorial?


Thursday, April 26, 2018 - 11:55:22 AM - JOSE FRANCISCO BRIONES FLORES Back To Top

 Hello. Thanks for the article, it's very helpfull, we'll going to try your example.

We have a question about the JSON and SSIS, we have known that JSON Support files are available from SQL Server 2016, thus we'd like to know if this example would work on SQL Server 2014 Installation.

 Thanks in advance.


Learn more about SQL Server tools