Power BI Desktop Data Source Considerations
Before you even start developing a data analytics solution using the Power BI platform, some thought must be put into the source data. Are you going to use a data warehouse? Do you need Analysis Services? How is the data going to be loaded into the Power BI Desktop model? In this tip, an overview is given of the considerations you need to take into account before you fire up Power BI Desktop for your project.
Power BI Desktop is a free tool from Microsoft, allowing you to create sophisticated data models which you can use to create compelling visualizations and interactive reports. Even though it's a self-service business intelligence solution at heart, it doesn't mean no thought should be put into the design of the model. Some practical considerations regarding the source data are provided in this tip. The term "source data" in this tip means the data that is loaded or presented in the Power BI Desktop file. If you have a data warehouse, the source data is the data from the data warehouse. These considerations are by no means an exhaustive list, but they can serve as a checklist for when you're creating your own Power BI model.
To Data Warehouse or To Not Data Warehouse?
When starting a data analytics project, the question often arises: do we need a data warehouse? A data warehouse can be complex, time-consuming and expensive. However, when the project is done right, a data warehouse has numerous advantages: data from multiple sources is integrated into a centralized location, you can combine batch and real-time data streams, complex business calculations can already be done (meaning you won't have to implement them in Power BI (PBI) Desktop et cetera. One of its main advantages is the historization of data. Using slowly changing dimensions, you can build up history of your dimensions members, allowing for much more rich data analysis. This is something that is very hard to even impossible to achieve in PBI Desktop.
For small ad-hoc analysis projects, such as analyzing the data from one text file, you most likely don't need a data warehouse. But if you see a particular data set being used all over the organization, or if multiple people make reports over the same data (sometimes with even different results!), it might be time for a centralized effort to create a data warehouse. This also rings true when modelling the data in PBI Desktop becomes too complex, the measures are too hard to write or it takes too long to process the model.
With or without a data warehouse, another question still stands: do we need a semantic model using Analysis Services? For small to medium data volumes, the answer is probably no. If this surprises you, keep in mind the model in PBI Desktop is in fact an Analysis Services model. PBI Desktop has most of modelling features you expect from SSAS. Only for data volumes too large for a PBI model or when you absolute need an Enterprise feature (such as generating partitions) you can choose between either an on-premises SSAS installation or Azure Analysis Services. Another advantage of using SSAS means you can refresh the data as many times as you want during the day, while in the Power BI service you need Premium to have this many refreshes. If you use Pro, you can only refresh up to 8 times a day.
The following diagram gives an overview of the options for storing your data before it is even loaded to PBI Desktop:
Keep in mind much more complex scenarios are also possible. For example, you can add a data lake before your data warehouse and even the data warehouse itself can be split into multiple layers (staging, enterprise data warehouse using data vault for example and smaller data marts as a presentation layer). Discussing all possible options is out of scope for this tip.
Having a centralized location for your Power BI source data means you have less configuration time in Power BI regarding data source credentials and gateways: you only have to configure one. If you directly import all of your different sources into PBI Desktop, this can be more of a challenge. If only one of the sources is inaccessible, it's possible your data set won't refresh in the Power BI service.
Loading Source Data
As described in the tip Power BI Datasets, there are three main ways to get data into PBI Desktop. Let's take a look at each.
Using a Live Connection to an SSAS model (Multidimensional or Tabular). In this case, you have almost no options. The only best practice is: use the actual live connection to the model, where the data stays in SSAS and the results of the query are sent to Power BI. Do not re-import the data into PBI Desktop. The SSAS model is typically made by IT professionals and importing the data into another tool will mean there's an uncurated copy of the data going around in the company.
If you use SSAS Tabular, model the data using a star schema. This means you'll have tables acting as fact tables and other tables as dimensions. Even though this is not a requisite in Tabular, it is certainly recommended since a star schema has the best performance and is the most intuitive for end users.
You can find some SSAS performance tips in this overview.
Using DirectQuery on top of a relational data store. Here the data also stays at the source and only the results of the queries are sent to Power BI. This is option is for example preferred when the volume of the data is too large to be contained in a PBI Desktop file. Again, your options here are fairly limited. Preferably the model is kept as simple as possible, meaning the source data is modelled using a star schema. DirectQuery models on top of normalized databases or data vaults might have query performance issues and are most likely not intuitive for the end user. Optimally the source is optimized for read access, for example by using indexes supporting the most common queries. Optionally you can extend the model with aggregate tables to further improve performance.
The most flexible option is the Import. Here you import data from one or more sources into the data model. You can either import the actual source data into PBI Desktop, or you can build a data warehouse first.
Best practices regarding the import with Power Query will be handled in an upcoming tip.
- For more information about DirectQuery, you can check out the tip DirectQuery Mode in SQL Server 2012 Analysis Services (SSAS) Tabular (the principle is the same for SSAS Tabular as for Power BI).
- If you want to improve the performance of DirectQuery or Import, the tip Improve Power BI Performance with SQL Server Indexing might give you inspiration.
- More Power BI tips can be found in this overview.
About the author
View all my tips