Analyzing Multiple Pages of UK Petition Data Using Power BI


By:   |   Updated: 2018-01-31   |   Comments   |   Related: More > Power BI


Problem

I have read the previous tip about analyzing UK Petition data using Power BI. I understand because of technical limitations, the UK parliamentary system provides only 50 petition details in a JSON feed. So how can I load all open petition details into Power BI?

Solution

UK Petition Data

Because of a technical limitation, the JSON feed can only provide 50 petition details. The next set of 50 petition details are available in the next JSON feed. Now let’s understand the format of the supplied JSON feed.

Understanding the JSON feed format

The JSON feed has two major groups. They are “Links” and “Data”. In the below mentioned example, the link for the first, last and next JSON feed details are available under the navigation “Links".  The first 50 petition details are available under the navigation “Data”. It is well understood that there were 65 pages of JSON feed petition details available.

JSON First Page - Description: JSON First Page

The “first” link has the details about the current URL and the next set of petition details can be extracted from the link "https://petition.parliament.uk/petitions.json?page=2&state=open". By looking at the link carefully, we can come to a conclusion that the page number is getting increased in ascending order till the last page.

JSON Page Details - Description: JSON Page Details

Based on this finding, we can modify the Power BI query through the advanced editor to extract petition details from the first page to the last page incrementally. As a first step, we need to extract the final page number from the JSON link.

Creating a query to find the last page number

As a first step, let’s try to extract the final page number from the JSON feed. For this purpose, I am going to extract data from the JSON feed "https://petition.parliament.uk/petitions.json?page=1&state=open".

Follow the below steps to extract data from the 1st page of the JSON feed for petition.

Click on GetData menu and "More" sub menu to open the "Get Data" dialog box as shown below.

Open GetData menu - Description: Open GetData menu

Now select on the "Other" sub menu and select "Web" and click on "Connect".

Select Web to load JSON - Description: Select Web to load JSON

On the dialog box, type the URL (https://petition.parliament.uk/petitions.json?page=1&state=open) as below and click OK.

Type web URL - Description: Type web URL
Type web URL - Description: Type web URL

Once the JSON feed has been loaded, you will be see the query properties.

Edit the Query - Description: Edit the Query

This highlights the fact that there are two possible ways to navigate. There are "Links" and "Data". Details about the links and the data can be explored by clicking on the header respectively.

By clicking on the "links" header, it is revealed that it has the links for "Self", "First","Last","Next" and "Prev" page details.

Preview Links navigation - Description: Preview Links navigation

Our main objective is to extract the last page number from the "Last" page link. So let’s navigate to the "Links" to see the details.

Load Links into table - Description: Load Links into table

The above image lists all the available links with the JSON feed. To enable us to apply data transformation rules, we need to convert the result of the query into a table.

This can be done by clicking on the "Convert into Table" ribbon. After converting into a table, the result set will be available as shown below.

Load Links into table - Description: Load Links into table

Now let’s apply the filter to list only the "Last" page as shown below.

Apply filter to select last page - Description: Apply filter to select last page

The resultant table will have only one record with the last page URL. The last page number 42 (in this example) is embedded in the URL. We may need to apply few transformations to the page number.

As a first transformation, I would like to delimit the value by the "?" character. This will split the column value into two different columns. Let’s apply this transformation using the ribbon menu "Split Column" by delimiter and supply the delimiter "?" in the given dialog box as shown below.

Last page URL - Description: Last page URL
Apply demiliter - Description: Apply demiliter

Once the transformation has been applied, this will split the column "value" into two columns namely "Value.1" and "Value.2".

Split columns - Description: Split columns

We need to split the column "Value.2" to get the actual page number. Let’s apply the transformation "Split column by delimiter" by the character "&" this time.

Split columns - Description: Split columns

After applying the transformation, we are able to split the Value.2 column into two columns namely "Value2.1" and "Value 2.2". Now we need to apply one more transformation to get the final page number. Let’s split the column further by using the delimiter "=".

Apply transformations - Description: Apply transformations

Now the final page number can be extracted by referencing the column "value.2.1.2". We are interested only the column which contains the last page number. So the other unwanted columns can be removed from the view.

Remove unwanted columns - Description: Remove unwanted columns

Also, I have renamed the column as "LastPageNumber". The below picture represents the final result set.

Column Value - Description: Column Value

Column renamed - Description: Column renamed

Though we have derived the final page number, it may not be useful as is, because we will not in be in a position to lookup this value to get the final page number every time. So let’s wrap the extraction and transformation logic in a function, so the function can be called from other queries to get the final page number.

Converting the transformation logic to a function

We can easily convert the query to a function using the below steps. Select and right click on the query to open the menu. Click on the "Create function" menu to create a function, as shown in the picture below.

Creating a function - Description: Creating a function

This function doesn’t have any parameters. So let’s create the function without the parameters.

Create a function without the parameters - Description: Create a function without the parameters

Type the name of the function in the text box as shown below.

Create function - Description: Create function

Now the function has been created.

Invoke function - Description: Invoke function

Now let’s click on the “Invoke” button to test the function. This will return a table with a column "LastPageNumber", as shown in the picture below.

Invoke function - Description: Invoke function

We are interested only in the value of the column. Hence the transformation logic can be updated as below to return only the value. Click on the “Advanced“ editor to see the transformation logic and update as below.

 
let
Source = () => let
Source = Json.Document(Web.Contents("https://petition.parliament.uk/petitions.json?page=1&&state=all")),
links = Source[links],
#"Converted to Table" = Record.ToTable(links),
#"Filtered Rows" = Table.SelectRows(#"Converted to Table", each ([Name] = "last")),
#"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Value", Splitter.SplitTextByDelimiter("?", QuoteStyle.Csv), {"Value.1", "Value.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Value.1", type text}, {"Value.2", type text}}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "Value.2", Splitter.SplitTextByDelimiter("&", QuoteStyle.Csv), {"Value.2.1", "Value.2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Value.2.1", type text}, {"Value.2.2", type text}}),
#"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type1", "Value.2.1", Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv), {"Value.2.1.1", "Value.2.1.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Value.2.1.1", type text}, {"Value.2.1.2", Int64.Type}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Name", "Value.1", "Value.2.1.1", "Value.2.2"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Value.2.1.2", "LastPageNumber"}}),
LastPageNumber = Table.Column(#"Renamed Columns","LastPageNumber"){0}
in
LastPageNumber
in
Source
open advanced query editor - Description: open advanced query editor

Once done, click OK and then invoke and now you will see only the actual value.

Invoke function - Description: Invoke function

Iterating through multiple pages

In the previous tip, we have learned to process a single JSON feed which contains only 50 petitions. Now we learned to extract the last page number in this tip. Let’s have a close look at the JSON feed for multiple pages.

Almost all the components of the URL are same except the page number. Hence the JSON link for multiple pages can be generated dynamically from the first page to the last page in Power BI. In the below steps, I will help you setup a query to iterate thru all the available JSON web pages to load data into Power BI.

Creating a blank query

Let’s create a blank query and edit the name of the query as "AllPetitions" as below.

Creating a blank query - Description: Creating a blank query

Now click on the advanced editor and paste the below transformation logic.

 
let
Source1 = GetLastPageNumber(),
Starts = List.Generate(()=>1, each _ < Source1 , each _ + 1),
#"Converted to Table" = Table.FromList(Starts, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Json.Document(Web.Contents("https://petition.parliament.uk/petitions.json?page="&[Column1]&"&state=all"))),
#"Expanded Custom" = Table.ExpandRecordColumn(#"Added Custom", "Custom", {"links", "data"}, {"Custom.links", "Custom.data"}),
#"Expanded Custom.data" = Table.ExpandListColumn(#"Expanded Custom", "Custom.data"),
#"Expanded Custom.data1" = Table.ExpandRecordColumn(#"Expanded Custom.data", "Custom.data", {"type", "id", "links", "attributes"}, {"Custom.data.type", "Custom.data.id", "Custom.data.links", "Custom.data.attributes"}),
#"Expanded Custom.data.attributes" = Table.ExpandRecordColumn(#"Expanded Custom.data1", "Custom.data.attributes", {"action", "background", "additional_details", "state", "signature_count", "created_at", "updated_at", "rejected_at", "opened_at", "closed_at", "moderation_threshold_reached_at", "response_threshold_reached_at", "government_response_at", "debate_threshold_reached_at", "scheduled_debate_date", "debate_outcome_at", "creator_name", "rejection", "government_response", "debate"}, {"Custom.data.attributes.action", "Custom.data.attributes.background", "Custom.data.attributes.additional_details", "Custom.data.attributes.state", "Custom.data.attributes.signature_count", "Custom.data.attributes.created_at", "Custom.data.attributes.updated_at", "Custom.data.attributes.rejected_at", "Custom.data.attributes.opened_at", "Custom.data.attributes.closed_at", "Custom.data.attributes.moderation_threshold_reached_at", "Custom.data.attributes.response_threshold_reached_at", "Custom.data.attributes.government_response_at", "Custom.data.attributes.debate_threshold_reached_at", "Custom.data.attributes.scheduled_debate_date", "Custom.data.attributes.debate_outcome_at", "Custom.data.attributes.creator_name", "Custom.data.attributes.rejection", "Custom.data.attributes.government_response", "Custom.data.attributes.debate"})
in
#"Expanded Custom.data.attributes"

The GetLastPageNumber function will calculate the "Last Page Number" and will return a variable "Source". As a next step, a list will be generated containing the numbers from 1 to Last Page Number (say 65).

Now for each number the JSON feed will get generated by adding the number to the link. For the first page, it will be https://petition.parliament.uk/petitions.json?page=1&state=open and then the data gets loaded into the query and then it will iterate thru the next page and so on. Like this it will iterate thru all the pages and all petitions details will get loaded into the query.

Query settings - Description: Query settings

Once the changes have been made and closed, then the final result set will look like the below.

Final resultset - Description: Final resultset

Summary

In this tip, we learned how to overcome the limitation of the UK petition system by leveraging a custom function in Power BI. Stay tuned to read the next tip to develop a custom UK parliamentary constituency map to display signature counts for each petition.

Next Steps


Last Updated: 2018-01-31


get scripts

next tip button



About the author
MSSQLTips author Nat Sundar Nat Sundar is working as an independent SQL BI consultant in the UK with a Bachelors Degree in Engineering.

View all my tips





Comments For This Article





download





Recommended Reading

Power BI Histogram Example using DAX

Using Power BI with JSON Data Sources and Files

Calculating MTD, QTD, YTD, Running and Cumulative Total in Power BI

Power BI Conditional Formatting for Matrix and Table Visuals

Deploy Reports from Development to Test to Production using the Power BI Deployment Pipelines








get free sql tips
agree to terms


Learn more about SQL Server tools