Data Forecasting and Analytics with Power BI Desktop
In the last tip, we learned how to represent summary statistics and measures of distribution on visualizations using dynamic reference lines. Reporting and visualizing data over time is the most applied form of data reporting. The most basic form of predictive analytics starts here, by studying the historical trend of data and forecasting the future trend based on the data. In this tip, we will learn how to forecast data and represent the data visually using Power BI Desktop.
Power BI provides options to forecast data in the analytics pane for line charts. First, we need data to populate a visualization on which we would create reference lines. I recommend installing the AdventureWorks sample database, which has a number of tables to populate visualizations with sample data. Assuming the sample data is available, open the Power BI Desktop and click on the Get Data menu. We need to connect to the data source where you have the sample data and import the data. In this tip, we will be using the FactInternetSales table from the AdventureWorks database by following the steps mentioned below.
Click on the Get Data menu in the Power BI user interface as shown below. This should bring up options to select from a list of different data sources. In this case, the data is hosted in SQL Server in the AdventureWorks database. Therefore, we will select SQL Server as the data source.
After selecting the data source, the next step is to provide connection details / authentication credentials as shown below. Provide the server name and optionally the database name on which the data is hosted. It is recommended to provide the database name as the connecting string to SQL Server since it needs to be a part of the connection. The next option is Data Connectivity mode for which there are two options. Either we can import data into Power BI or the data can be read directly from the data source while the report is being processed. There are certain options in Power BI that require data in a specific format to enable the analytics related features. So consider selecting the Import option for now.
After you select the option, you will be presented with a list of different tables and views hosted in the database to which the connection has been made. Select and load the FactInternetSales table as shown below.
Our intention is to create a chart where data is reported over time. Therefore, we need at least one field to be of date data type and another field has to be a numeric field, which is reported over time. Most of the business data in real life scenarios is reported over time. In addition, there is a very regular requirement to forecast the performance of the business for a defined period in future. This kind of forecasting is generally done by learning from the historical data using statistical algorithms, which are typically data mining algorithms or machine learning algorithms, deriving the probable results and projecting the results in an explicit manner with a graphical visualization.
After the data is available, click on the line chart from the visualizations pane, and click on the report layout to create a line chart. We want to create a visualization where the X-axis represents Order Date field and the Y-axis represents Sum of Sales Amount. Select the chart and then the respective fields, which should create a visualization as shown below.
Once you click on the Analytics pane, you should be able to find options to add a Trend line as well as Forecast. Our intention is to display the forecasted data on the visualization. Click on the Forecast option, and click on Add. The first option requires specifying the length of the forecast. Here we have selected 1 year. The next option allows us to ignore a specific range of data used as an input by the forecasting algorithm. The data fed to the forecasting algorithm is used to train the model, and based on this the forecasted value is projected. Let's say we want to ignore the last 2 years of data, which means we are feeding only the first year's worth of data to the model, and asking the model to predict the next year’s results. The confidence interval is a probability statistic that defines the range of area having 95% probability of finding the results. A grey range displays this area. Click on the apply button and you should be able to see the forecasted value as shown below.
If you analyze the grey area, you will find that the algorithm successfully forecasted the data within the 95% confidence of the interval range. You can see how easy this was to do without any coding and just selecting a few options.
Now let's try to ignore just 1 year of data, thereby providing 2 years’ worth of training data to the forecasting algorithm. The output looks as shown below. This time the forecast was not successful based on historical data. If you analyze the trend, it looks like sales made a near linear progress in the last 1 year compared to historical progress. The key take-away from this is that forecasting is not always precise. It depends on the kind of data fed to the forecasting option and the trend.
The option to ignore a certain duration of data is very helpful in testing and training the forecasting model. If you do not ignore any data and try to forecast the next 1 year, the forecast looks as shown below.
Again the key take-away here is that forecasting depends on the nature of data fed to the forecasting option / algorithm. In this way we can use the forecasting option effortlessly and display the output of the forecast analysis on the visualization.
In the next part of the Power BI Analytics tip series, we will learn how to analyze trends using Power BI Desktop.
- Consider using some time series based data to test forecasting capabilities in Power BI Desktop.
- Check out these previous tips:
About the author
View all my tips