Understanding Power BI Dataflows and Connect to Various Data Sources

By:   |   Updated: 2023-07-12   |   Comments   |   Related: > Power BI


Problem

As the world moves from conventional desktop data transformation to a more cloud and enterprise-based solution, getting big data cleaned, well-structured, and ready for analytic insight is becoming a top priority for most enterprise firms.

Solution

The need for a self-service analytic solution for business intelligence (BI) users that allows them to perform the whole ETL process without the need for support from IT or BI experts is the goal. Users can thus analyze data, produce reports, and develop dashboards without being proficient in writing SQL or utilizing sophisticated statistical applications.

A business intelligence solution that provides all these features allowing developers to perform data transformation on the Power BI services is known as dataflows. In other words, dataflows are known as Power Query online. In this article, I will walk through different applications using Power BI dataflows and how to connect with various data sources.

What are Power BI Dataflows?

Power BI dataflows is cloud-based and enables the creation and administration of reusable data transformations to help Power BI reports and dashboards operate better and be more accurate. When using dataflows, the destination of the data is usually Azure Data Lake Storage, Dataverse, or other location, unlike Power BI desktop Power Query which loads data into the dataset.

Microsoft Dataflows Architecture

Microsoft Dataflows Architecture

When to Use Dataflows

Dataflows are called Power Query online and can be used in the following scenarios:

  • Reusability: This creates a reusable transformation logic that can be used by numerous datasets and reports. By encouraging the reuse of core data components, dataflows eliminate the need for separate connections to your cloud or on-premises data sources.
  • Storage: Data in Dataflows are usually stored in Azure Data Lake Gen 2, which provides a secure way of storing your data.
  • ETL Process: Dataflows with Power BI Premium scale more effectively and allow you more options if you want to work with massive data volumes and do ETL at scale.

Project Architecture

For this article, I will walk through how to connect Power BI dataflows to different data sources, both cloud and on-premises, and we will create a parameter that will help us switch between data sources easily if one source is down.

Project Architecture

Project Architecture

Create Dataflows

Dataflow is a collection of tables created and managed in a workspace. Directly from the workspace where your dataflow is created, you can add and change tables, control data schedules refresh, and more.

In your Power BI workspace, you can check our previous article on creating a workspace, Securities in Power BI. Click on the New icon at the top left corner, then select Dataflows. This will take you to another window.

Create dataflows

In the new window of dataflow, we are provided with four options. You are expected to select any option based on how you want to create your dataflow.

Create dataflows

Connect Dataflow to SharePoint Folder

SharePoint is a web-based collaboration and document platform naturally integrated with the Microsoft 365 suite. Launched in 2001, it is a central hub for businesses to store, arrange, exchange, and work together on many kinds of content, including files, lists, papers, and web pages.

Step 1: Get SharePoint URL Link

In your SharePoint site, navigate to the document tab at the left corner. On the URL tab, copy the URL till you get to the "Site name."

Note: You are not expected to copy the entire link on the path, just the part of the SharePoint site.

Get SharePoint URL Link

Step 2: Dataflow Get SharePoint Folder

In your Dataflow environment, click "Add new tables," which will open a new window.

Dataflow Get SharePoint Folder

In the new window, search for the SharePoint folder, then select the SharePoint folder.

Get SharePoint Folder

Step 3: Connection Settings

In the connection settings, you are expected to fill in the following:

  • Site URL: This is the link from the SharePoint site earlier.
  • Gateway: No gateway is needed since the data source is cloud-based.
  • Authentication kind: This is set to the organization account used in logging in.

Click on the Next button at the bottom right to load the data.

Connection Settings

Step 4: Transform Data

In the dataflow preview environment, click on the transform data. This will take you to the Power Query editor environment for dataflow.

Transform Data

We need just one table, the 1.1.xlsx file, so we will remove the unwanted table.

From your Home tab, click on Remove rows, then select top rows. Input the number of rows you want to remove and click OK.

Transform Data

Step 5: Expand Data

Click on the Combine file dropdown in the Content column. This will expand the entire data table. This should take a couple of minutes, depending on your internet connection.

Expand Data

After successfully combining the file, we need to remove the Source.Name column. This can be done by clicking on the column and selecting the Remove Columns icon.

Lastly, we can rename the table name from Query to SharePoint. This will help to identify the table easily.

Expand Data
Expand Data

Diagram View

The diagram view is a feature that allows you to see the connection between the various tables and columns in your dataflow. Understanding the structure of your data and addressing potential issues can benefit from this. It can be said that the diagram view simplifies the entire data-wrangling process.

In the dataflow Power Query editor, click on the View tab, then select the Diagram View. The diagram view also provides a means of performing data transformation using a graphical user interface.

Diagram View

Now that we are done transforming the data, let's save & close the dataflow Power Query environment by clicking the Save button in the bottom right corner.

Step 6: Save Dataflow

After clicking on the Save button, a new window appears. You must provide the dataflow with a name in the new window.

Save dataflow

Connect Dataflow to Azure Blob Storage

The Microsoft Azure Blob Storage is a fully managed object storage that provides high performance, scalability, durability, and security for your data. Blob storage supports structured and unstructured data such as text or binary data, images, documents, videos, and log files.

To create the Azure Blob storage, you can check out our previous article Mount Azure Blob Storage.

Step 1: Get the Property URL

In your Azure portal, navigate to your blob storage where the file to be used is uploaded. In your blob container, click on the 1.1.csv file. This should open another window.

Get the Property URL

In the new window, copy the property URL.

Get the Property URL

Step 2: Get Data from Blob Storage

In your Dataflow environment, click Add tables at the top right corner and add a new data source.

Get Data from Blob Storage

In the Choose data source window, search for blob storage, and select Azure Blobs storage.

Get Data from Blob Storage

Step 3: Configuration Settings

The property URL copied from the blob storage is divided into two parts. The first is the blob storage URL, while the other is the file name.

Configuration Settings

Get Account Key

In your Azure storage account, search for Access keys, then copy the key into a secure place.

Get account key

In the connection settings, you are expected to fill in the following:

  • Account name: Copy the Blob Storage URL to the Account name.
  • Data Gateway: This is optional since the data source is cloud-based.
  • Authentication Key: Paste the access key copied from the Storage account.
Get account key

Step 4: Transform Data

Perform the same amount of transformation for the SharePoint folder aspect.

Start by expanding the data by clicking on the Content dropdown column.

Transform Data

After expanding, drop the Source.name column, then save and close.

Connect Dataflow to SQL Server Database

The SSMS is an integrated environment for managing Microsoft SQL Server. The following parameter is needed to connect Power BI Dataflows to SQL Server, which is an on-premises data source.

Create and Insert Data into Table

The following steps are needed to create a data table in SSMS.

Step 1: Create a Table.  In your SQL Server, create a new table by writing the query below. They should provide the individual column headers.

create table sales_target(
   SN varchar(999),
   Date_ date,
   Branch varchar(50),
   Pizza_Sold varchar(50),
   Price int,
   Quantity int,
   Sales_Discount numeric,
   Time_ time,
   Time_Range varchar(30),
   Total numeric
);

Step 2: Insert Table Using SQL Server Import and Export Wizard. The Import and Export wizard is a graphical interface that helps users import data from various sources to a destination.

In your SSMS, right-click on the database, navigate to Tasks, and select Import Data. This should open another window for you.

Insert Table Using SQL Server Import and Export Wizard

In the Data Source, click on the dropdown and select Microsoft Excel, then navigate to the path in which the file is saved on your local machine and click the Next button when done.

Insert Table Using SQL Server Import and Export Wizard

For the destination, change the dropdown to SQL Server Native Client 11.0. The authentication can be Windows since the data source is on an on-premises source.

Insert Table Using SQL Server Import and Export Wizard

The table and view are important parts of the entire data import process. Ensure the destination the data has been imported to is the right table.

Insert Table Using SQL Server Import and Export Wizard

Click on Edit Mapping at the bottom right. In the new window, ensure each Destination has the right Data type.

Insert Table Using SQL Server Import and Export Wizard

Click on the Next tab until you get to the final process, with the message "The execution was successful."

Insert Table Using SQL Server Import and Export Wizard

Run the select all query to check if the data was successfully inserted into the sales_target table.

Insert Table Using SQL Server Import and Export Wizard

Set Data Gateway

A data gateway acts as a bridge between an on-premises and cloud-based data source. It enables rapid and safe data transfer between several Microsoft cloud services and on-premises data.

Step 1: Install and Setup Data Gateway.  Microsoft provides two types of gateways: the personal and the standard gateway. For this article, we will be using the standard gateway.

In our previous article on Power BI Data Gateway, we explained all you need to know on how to download the data gateway and configure it.

Your data gateway should be up and running at the end of your configuration.

Set Data Gateway

Note: Leave the data gateway location at the default location for this project.

Step 2: Add a New Connection.  We need to add a new connection to our Power BI service that will be used in connecting to the SQL Server.

In your Power BI service, click the top right corner gear icon and select Manage connections and gateways. This will take you to another window.

Add a New Connection

Step 3: Configure New Connection.  Before configuring the connection, go to your SQL Server and navigate to the Connection Properties. This was also explained in our previous article, Power BI Data Gateway.

In the manage connection environment, click on the New icon at the top left corn and fill in the following information:

  • Gateway Cluster Name: Click on the dropdown and select your data cluster currently running.
  • Connection Name: Provide your new connection with a name to easily identify the connection.
  • Server: The Server is from the Connection Properties in your SQL Server.
  • Database: This is the database in which our table is located.
  • Authentication Method: We will be using Windows Authenticator for the authentication method.
  • Username: You can get the username from your connection properties or Command prompt using the command whoami.
  • Password: Provide it with your SQL Server login password.
  • Single Sign-on: Use the direct and import query.
  • Privacy Level: For this, I am using Organizational and clicking on Create.
Add a New Connection

Get SQL Server Data

Now, let's connect our Power BI Dataflow to SQL Server. In your dataflows, click Add Tables and search for SQL Server database.

The following set below will be used in configuring the database:

Step 1: Connection Settings.  In the Connection Settings, you must provide the following settings from the image below, then click Next.

Connection Settings

In your Power Query preview, select the table you want to use, the Sales_Taget Table, then Transform data.

Connection Settings

Step 2: Data Transformation.  Perform all necessary transformations in your Power Query Editor, then Save & close.

Data Transformation

Step 3: Save Dataflow.  Click on the Close button at the top right and save the file.

Save Dataflow

Create Report Using Dataflow in Power BI Desktop

Now that we are done creating the dataflow, let's connect Power BI Desktop to dataflow.

Step 1: Get Dataflow Data

On your Power BI desktop, click Get Data and select Dataflows. Provide the same login credential as that of your Power BI Services.

Get Dataflow Data

In your Power Query editor, you will notice little or no transformation is required since the data has been transformed in Power BI Service dataflow, then click the Close & Apply button.

Get Dataflow Data

Step 2: Create a Report

In your Power BI Report, create a simple report in your report canvas.

Create Report

Step 3: Publish Report to Power BI Workspace

To migrate the report from the Power BI desktop to the Power BI service, you must first create a Power BI workspace. Workspaces are Power BI service containers that house content such as datasets, reports, dashboards, and dataflows. You can check out our previous article on how to create a Power BI workspace.

In your Home tab, click Publish and select the workspace where the report will be published.

Publish Report to Power BI Workspace

In your Power BI service workspace, two new contents have been added: the Report and Dataset.

Publish Report to Power BI Workspace

To better understand, let's change the view from a list view to a data lineage view. The data lineage displays which data sources are utilized by which datasets, the transformations applied, and which are subsequently used by which reports.

Publish Report to Power BI Workspace

Step 4: Refresh Dataflow

Refreshing the dataflow saves you the stress of refreshing the dataset due to the fact transformation is first carried out in the dataflow before going to the dataset.

Click refresh now to automatically refresh your datasets to the most recent one.

Refresh Dataflow

To track the refresh history of your dataflow, click on the refresh history.

Refresh Dataflow

Conclusion

This article taught us about Power BI dataflows, when to use it, and the different applications. We also covered how to get data using the dataflow from multiple data sources such as SQL Server, SharePoint, and Azure Blob Storage.

We further explained data gateway and a use case as our previous article on Microsoft Data Gateway.

Next Steps

Learn more about Power BI in this 3 hour training course.


Click here to start the Power BI course






sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Temidayo Omoniyi Temidayo Omoniyi is a Microsoft Certified Data Analyst, Microsoft Certified Trainer, Azure Data Engineer, Content Creator, and Technical writer with over 3 years of experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-07-12

Comments For This Article