Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

Analytics with Power BI Desktop Dynamic Line References


By:   |   Last Updated: 2017-10-02   |   Comments   |   Related Tips: More > Power BI

Problem

In a previous tip, we learned how to represent a pre-calculated fixed statistic on visualizations using a constant line. In today’s time where data grows at faster speeds than ever, a constant line is useful only for parameters that remain constant like targets or for historical data that will not change. Predictive modeling has a considerable dependency on summary statistics to explain the structure of data. Some of the key summary / descriptive statistics are mean, median, standard deviation, variance, and percentiles / quantiles. In this tip, we will learn how to represent dynamic visualizations using Power BI Desktop.

Solution

Power BI dynamic reference lines can be used to create a reference line for a set of visualizations.

First, we need data to populate a visualization on which we to create reference lines. I recommend installing the AdventureWorks sample database, which has a number of tables to populate visualizations with sample data. Assuming the sample data is available, open Power BI Desktop and click on the Get Data menu. We need to connect to the data source where you have the sample data and import the data. In this tip, we will be using the ProspectiveBuyer table from the AdventureWorks database by following the steps mentioned below.

Click on the Get Data menu in the Power BI user interface as shown below. This should bring up options to select from a list of different data sources. In this case, the data is hosted in SQL Server in the AdventureWorks database. Therefore, we will select SQL Server as the data source.

PowerBI Menu to Get Data - Description: Menu

After selecting the data source, the next step is to provide connection details as shown below. Provide the server name and optionally the database name on which the data is hosted. It is recommended to provide the database name as the connecting string to SQL Server needs it as a part of the connection. The next option is Data Connectivity mode for which there are two options. Either we can import data into Power BI or the data can be read directly from the data source while the report is being processed. There are certain Power BI options that require data in a specific format to enable analytics related features, so consider selecting the Import option for now.

SQL Server Database Connection - Description: DB Connection

After you select the data source, you will be presented with a list of different tables and views hosted in the database to which the connection has been made.

Import Data into PowerBI - Description: Import Data

After the data is available, we need to create a clustered bar chart of Yearly Income by Occupation. Click on the clustered bar chart from the visualizations pane and click on the report layout to create a bar chart.

PowerBI Visualization Gallery - Description: Visualization Gallery

Select the Yearly Income and Occupation fields, to add these fields to the chart. After you select the relevant fields, the visualization should look as shown below.

Clustered Bar Chart in PowerBI - Description: Clustered Bar Chart

Now let's say we intend to represent measures of central tendency to analyze and explain the data visually. Mean, median, mode, percentiles are common statistics used to measure. You can learn more about these from the exploratory data analysis section of the Machine Learning Tutorial.

Now that you understood these statistic functions, now it is time to create reference lines for these on the chart.

Click on the analytics pane and you should be able to find options to create a constant line, min line, max line, average line, median line and percentile line. Note, the Area chart, Line chart, Scatter chart, Clustered Column chart, and Clustered Bar chart are the only charts that support full range of dynamic reference lines.

Here is a description of common properties for all these lines.

  1. Measure – This property sets the field for which the statistic is to be calculated. In our case, let's say that we want to add this line for the field that the bars are representing, which is the Yearly Income field.
  2. Color – You can select any color for the line. It is recommended to select a contrasting color compared to the color of the bars on the chart, so that the line is clearly visible on the chart.
  3. Transparency – This property determines how bright the line will appear on the chart. A transparency value of zero means the line is a pure solid color line on the chart. On the other hand, a transparency of 100% would make the line invisible on the chart. You can set the value of the property to 50%, so the line blends partially into the UI without visually obstructing any details, though retaining its significance.
  4. Style – This property can have one of the three values: Dashed, Dotted and Solid. I personally prefer to use the dashed line, which provides a perforated line type visual to show tear-out style of separation between different sections.
  5. Position – This property can have one of the two values – "In-Front" or "Behind". This means whether you want to line to be on the top of the bars of the chart or behind the bars. If you select "Behind", the line would be visible only from the bar separation sections. It is advisable to select the "In-Front" value for this property.
  6. Data Label – This property determines whether you want to display text in the form of a data label for the line. By default, this property is set to off. Once you set the value of this property as "On", other options to format the label like Color, Text, etc. will appear. You can then select appropriate values for these properties of data label to format the text as required.
Average Line Properties in PowerBI - Description: Average Line Median Line Properties in PowerBI - Description: Median Line

Mean is the average of the entire range of values in a field, and median is the exact middle value given all the values of the field in an ascending sorted order. The good part is that we do not have to calculate this, as the reference line has the built-in functionality to calculate this. We just need to specify the field and format options to display the line on the visualization.

There can be only one mean and median for a field. To explain the distribution of data, percentiles / quantiles are a good statistic. We can use a percentile line as shown below. Generally, in descriptive statistics data in analyzed in terms of quantiles and used in visualizations like box-plots. Therefore, we have to create at least three lines to represent 25, 50 and 75 percentiles. The 100 percentile is obvious and does not need an explicit line to represent it. Select the proper formatting options to display the three percentile lines.

Percentile Line 3 Properties in PowerBI - Description: Percentile Line

Once you have added the lines, the visualization should look as shown below. Looking at the below visualization, the following derivations become visually explicit:

  • Mean and median have a difference of approximately 2M, which is a possible indication that the data may have outliers.
  • Yearly income of all occupations falls under 75 percentile and only the Professional occupation category is ranging beyond it. This is a sign that this category of occupation may have outliers.
  • These kind of interpretations are not explicit just by looking at the scales of the visualization. One may though calculate it from the scales, but these dynamic reference lines eliminate the need for such calculations and makes the data analysis easier and faster.
report output

In the next part of the Power BI Analytics tip series, we would learn how to forecast data using Power BI Desktop.

Next Steps


Last Updated: 2017-10-02


next webcast button


next tip button



About the author
MSSQLTips author Siddharth Mehta Siddharth Mehta is an Associate Manager with Accenture in the Avanade Division focusing on Business Intelligence.

View all my tips




Post a comment or let the author know this tip helped.

All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

*Name    *Email    Email me updates 


Signup for our newsletter
 I agree by submitting my data to receive communications, account updates and/or special offers about SQL Server from MSSQLTips and/or its Sponsors. I have read the privacy statement and understand I may unsubscribe at any time.



    



Learn more about SQL Server tools