Power BI Desktop Data Source Considerations


By:   |   Updated: 2020-05-28   |   Comments   |   Related: More > Power BI


Problem

Before you even start developing a data analytics solution using the Power BI platform, some thought must be put into the source data. Are you going to use a data warehouse?  Do you need Analysis Services? How is the data going to be loaded into the Power BI Desktop model? In this tip, an overview is given of the considerations you need to take into account before you fire up Power BI Desktop for your project.

Solution

Power BI Desktop is a free tool from Microsoft, allowing you to create sophisticated data models which you can use to create compelling visualizations and interactive reports. Even though it's a self-service business intelligence solution at heart, it doesn't mean no thought should be put into the design of the model. Some practical considerations regarding the source data are provided in this tip. The term "source data" in this tip means the data that is loaded or presented in the Power BI Desktop file. If you have a data warehouse, the source data is the data from the data warehouse. These considerations are by no means an exhaustive list, but they can serve as a checklist for when you're creating your own Power BI model.

To Data Warehouse or To Not Data Warehouse?

When starting a data analytics project, the question often arises: do we need a data warehouse? A data warehouse can be complex, time-consuming and expensive. However, when the project is done right, a data warehouse has numerous advantages: data from multiple sources is integrated into a centralized location, you can combine batch and real-time data streams, complex business calculations can already be done (meaning you won't have to implement them in Power BI (PBI) Desktop et cetera. One of its main advantages is the historization of data. Using slowly changing dimensions, you can build up history of your dimensions members, allowing for much more rich data analysis. This is something that is very hard to even impossible to achieve in PBI Desktop.

For small ad-hoc analysis projects, such as analyzing the data from one text file, you most likely don't need a data warehouse. But if you see a particular data set being used all over the organization, or if multiple people make reports over the same data (sometimes with even different results!), it might be time for a centralized effort to create a data warehouse. This also rings true when modelling the data in PBI Desktop becomes too complex, the measures are too hard to write or it takes too long to process the model.

With or without a data warehouse, another question still stands: do we need a semantic model using Analysis Services? For small to medium data volumes, the answer is probably no. If this surprises you, keep in mind the model in PBI Desktop is in fact an Analysis Services model. PBI Desktop has most of modelling features you expect from SSAS. Only for data volumes too large for a PBI model or when you absolute need an Enterprise feature (such as generating partitions) you can choose between either an on-premises SSAS installation or Azure Analysis Services. Another advantage of using SSAS means you can refresh the data as many times as you want during the day, while in the Power BI service you need Premium to have this many refreshes. If you use Pro, you can only refresh up to 8 times a day.

The following diagram gives an overview of the options for storing your data before it is even loaded to PBI Desktop:

overview of possible data stores

Keep in mind much more complex scenarios are also possible. For example, you can add a data lake before your data warehouse and even the data warehouse itself can be split into multiple layers (staging, enterprise data warehouse using data vault for example and smaller data marts as a presentation layer). Discussing all possible options is out of scope for this tip.

Having a centralized location for your Power BI source data means you have less configuration time in Power BI regarding data source credentials and gateways: you only have to configure one. If you directly import all of your different sources into PBI Desktop, this can be more of a challenge. If only one of the sources is inaccessible, it's possible your data set won't refresh in the Power BI service.

Loading Source Data

As described in the tip Power BI Datasets, there are three main ways to get data into PBI Desktop. Let's take a look at each.

Live Connection

Using a Live Connection to an SSAS model (Multidimensional or Tabular). In this case, you have almost no options. The only best practice is: use the actual live connection to the model, where the data stays in SSAS and the results of the query are sent to Power BI. Do not re-import the data into PBI Desktop. The SSAS model is typically made by IT professionals and importing the data into another tool will mean there's an uncurated copy of the data going around in the company.

If you use SSAS Tabular, model the data using a star schema. This means you'll have tables acting as fact tables and other tables as dimensions. Even though this is not a requisite in Tabular, it is certainly recommended since a star schema has the best performance and is the most intuitive for end users.

You can find some SSAS performance tips in this overview.

ssas to power bi

DirectQuery

Using DirectQuery on top of a relational data store. Here the data also stays at the source and only the results of the queries are sent to Power BI. This is option is for example preferred when the volume of the data is too large to be contained in a PBI Desktop file. Again, your options here are fairly limited. Preferably the model is kept as simple as possible, meaning the source data is modelled using a star schema. DirectQuery models on top of normalized databases or data vaults might have query performance issues and are most likely not intuitive for the end user. Optimally the source is optimized for read access, for example by using indexes supporting the most common queries. Optionally you can extend the model with aggregate tables to further improve performance.

direct query to power bi

Import

The most flexible option is the Import. Here you import data from one or more sources into the data model. You can either import the actual source data into PBI Desktop, or you can build a data warehouse first.

import

Best practices regarding the import with Power Query will be handled in an upcoming tip.

Next Steps


Last Updated: 2020-05-28


get scripts

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips





Comments For This Article





download





Recommended Reading

Power BI Histogram Example using DAX

Using Power BI with JSON Data Sources and Files

Calculating MTD, QTD, YTD, Running and Cumulative Total in Power BI

Power BI Conditional Formatting for Matrix and Table Visuals

Deploy Reports from Development to Test to Production using the Power BI Deployment Pipelines








get free sql tips
agree to terms


Learn more about SQL Server tools