Power BI Datasets
I want to create a reporting solution for my company using Power BI. I've been learning the tools, but I wonder what the difference is between a dataset and a workbook. When would I use which solution?
In this tip, we'll give an overview of the dataset and workbook objects of Power BI. It's not the goal to list and explain every possible feature, but rather give an introduction and explain their usage. You can follow along if you have a free, a Pro or a Premium Power BI subscription, but some features might not be available in all subscription models. The screenshots in this tip are made with Power BI Desktop August 2019 edition. You can download the latest version here. Since Power BI (PBI) Desktop is continuously evolving, it's possible the screenshots differ from the PBI Desktop version you're currently using.
Power BI Datasets
In PBI Desktop, there are 3 main ways to retrieve the data upon which you will build your visualizations:
- Live. Here you connect to a server which holds all the data. No data is transferred, but the metadata of the model is imported into PBI Desktop. When you create visualizations, a query is sent to the server where it will be executed. The results are then returned to Desktop and visualized. Live connections are typically used with SQL Server Analysis Services (SSAS) models, be it Tabular or Multidimensional. In this case, PBI Desktop acts like any other thin client, such as Excel or Reporting Services (SSRS). It's not possible to make fundamental changes to the model, although you can add new measures to the model which are then available in that .pbix file.
- DirectQuery. Similar to a Live connection, but here you can make more changes to the model. Like in Live, the data stays at the server and queries are executed on the server. But it's for example possible to create relationships in the PBI Desktop model.
- Import. Here the data is imported in the PBI Desktop file (.pbix) using Power Query queries. The data is highly compressed so it's possible to load millions of records into a file on your machine. Behind the scenes, a model is created which is very similar to an SSAS Tabular model. This mode is the most flexible, since you can combine data from all possible sources. All data however needs to be imported into your model which can cause long refresh times.
If you have Power BI Premium, you can create a hybrid of DirectQuery and Import.
In short, a Power BI dataset is the model created in PBI Desktop that holds all of your data. For Live and DirectQuery you connect to the data, for Import you import the data. For the remainder of the tip, we'll use the Import mode.
Creating a Power BI Dataset
A dataset is linked one on one with the .pbix file where it was created. You can create a new dataset by clicking on Get Data when you start PBI Desktop.
Or you can select a source from the dropdown menu:
Let's suppose we imported a couple of tables from the SQL Server sample database WideWorldImporters (you can download the .pbix file here). In the Model view, you can see the tables and their relationships:
In the Data view, you can see the actual data of one table at a time.
In the Report view, you can create, view and interact with visualizations build on top of the data and the model.
You could say the dataset comprises the data and the model view.
Publishing a Power BI Dataset
Once your content with your report, you can publish the .pbix file to the Power BI service. While PBI Desktop is free, you need a Power BI subscription to share your reports with other people. Simply click on Publish in the ribbon to start the process.
If necessary, you'll be asked to log into your Power BI account. Once logged in, you need to choose a workspace where you want to store your items. When publishing, the .pbix file will be split into two pieces: a report based on what you created in the report view and a dataset (the combination of the data and model view).
When publishing is complete, you can view the artifacts in the Power BI service.
Either click on the link as indicated in the screenshot, or go to app.powerbi.com to view the results. In the workspace, you can see there's a report and a dataset created with the name of the PBI Desktop file:
If you click on your dataset, the service will open a new blank canvas where you can start creating reports, just as in Power BI Desktop:
Another exciting feature is that you can re-use the dataset is a data source for a new Power BI Desktop file. In the Get Data dialog, you can select Power BI datasets as the data source.
After connecting to the service, you can choose any dataset from any workspace you have permission to:
This source behaves as a "live" source, meaning you only see the model. Just like you would when connecting to Analysis Services.
The biggest advantage is that you can create one model for your company, publish it to the service and anyone else in your company can benefit from your modeling efforts. You'll have a centralized model which leads to less mistakes and less ambiguity.
Scheduling a Power BI Dataset Refresh
It's not practical to refresh your dataset manually in PBI Desktop and then upload it yourself to the service in order to have up-to-date data. Luckily you can schedule a refresh so your dataset contains the latest data automatically. It's also possible to refresh the data manually from the service for any uploaded dataset:
If you click on the "schedule refresh" icon, you'll go to the properties page of the dataset:
There are two important source categories for a scheduled data refresh:
- Cloud data sources. The Power BI service can connect to those sources without any intervention. Examples are: Azure SQL Database, Azure SQL Datawarehouse, Azure Analysis Services and SharePoint Online.
- On-premises data sources. In this case, the Power BI service needs a gateway to connect to the data. The gateway is responsible for the connection between the data source and the PBI service. Keep in mind some cloud sources may still need a gateway, such as for example a data source inside an Azure Virtual Machine, but also some data sources from other parties, such as Snowflake for example.
Once you have a gateway configured, you can enter the data source credentials:
Finally, you can configure a schedule to refresh your data:
Depending on your subscription, you can refresh your dataset more frequently.
In this part of the tip we gave an overview of the Power BI dataset object. It's the data and model from a Power BI Desktop file and can be compared with an Analysis Services Tabular model. A dataset can be uploaded to the Power BI service. It can serve as a source for new reports created in the service, but also for new Power BI Desktop reports. Finally, you can schedule a refresh of the dataset. Depending on the source data, you might need a gateway to establish a connection.
- You can download the sample model here.
- More information about the gateways:
- Official documentation on data refresh.
- You can find more Power BI tips in this overview.
About the author
View all my tips