SQL Server Integration Services Power Query Source


By:   |   Updated: 2020-02-19   |   Comments (2)   |   Related: More > Integration Services Development

Problem

Some transformation in SQL Server Integration Services (SSIS) are quite complex to configure and understand and some types of transformations are simply not provided out-of-the box in the SSIS data flow, such as finding the previous not empty value in a column. Implementing those with .NET script components is cumbersome and adds extra complexity to the project. On the other hand, Power Query makes everything seem so easy. Is there a way to combine the best of both worlds?

Solution

In 2019, Microsoft released a preview transformation: the Power Query source for SSIS. With this source component, you can execute an M language formula to read data into the SSIS data flow. For those unfamiliar with Power Query, it is the standard method of reading and transforming data in the Power BI ecosystem. You can look at it as a sort of self-service ETL tool. Power Query is very user friendly but at the same time really powerful. The tip Introduction to Power Query for Excel gives you a first impression of the tool. It's available in many different tools: it's the Get Data feature in Excel and Power BI Desktop. It's available in Power BI Dataflows. It's the basics of the wrangling data flow in Azure Data Factory. It's also the Get Data experience in Analysis Services Tabular 2017 and up. And now it's available in SSIS as well.

In this tip, we'll introduce the Power Query source while it is still in preview. This means not every possible action or source from Power BI Desktop might be supported, or the component itself might still have some bugs.

Installation of the Power Query Source

Initially, the Power Query source could only be used in Visual Studio 2019 for development and the Azure-SSIS Integration Runtime for executing a package containing this source. With version 3.3 of the SSIS Projects for Visual Studio 2019 (which is a Visual Studio extension), the Power Query source is available as a separate download. It is available in 32-bit (which you need for Visual Studio) and in 64-bit (which you need to run it on a server).

Once you downloaded the transformation, it's a simple "next-next-finish" set-up to install it on your machine. Keep in mind, if you want to install the 64-bit version, you need to have SQL Server 2019 (or newer) with SSIS installed on the same machine. In the Azure-SSIS IR, the preview component should already be installed so a custom set-up is not necessary.

64-bit installation error

Currently, there's a 2017 and a 2019 version available. In SSIS, you can change the TargetServerVersion property of the project. If you installed the 2019 version for example, this property needs to be set to SQL Server 2019. And the same for 2017.

target server version property

If you installed the 2019 version and the project is set to SQL Server 2017 (the current default), the Power Query Source will not show up in the SSIS Toolbox.

In a package, drag a new data flow to the control flow canvas. From the SSIS Toolbox, drag the Power Query Source to the data flow canvas. The source component can be found in the Common category of the toolbox.

add power query source

The source is now ready to be configured.

unconfigured power query source

Some Examples of Using the Power Query Source

Filling Items Down

The first problem can be quite common when reading Excel files created by users: grouped data is not repeated on each line, but rather only displayed on the first line. An example:

sample data fill down

In SSIS, you'd need to write a script component in .NET that keeps track of the last not empty value and if an empty value occurs, replace it with this last value. In Power Query, it's a lot easier. Unfortunately, the source doesn't have an editor, only a text field where you can paste an M formula. If you're not fluent in M (almost no-one is), a better option is to create your transformations in another tool, such as Power BI Desktop and then copy paste the M code to SSIS.

power query source without editor

Let's open up Power BI Desktop to read the sample Excel file. From the Get Data menu, choose Excel.

excel power bi desktop

Next, locate your file and choose the worksheet containing the data. Choose to transform the data.

power bi navigator

In the editor, set the first line as headers in the Transform menu.

set first row as header

The data should now look like this:

data with headers

In the Transform tab, click on Fill and then on Down, while the Continent column is selected.

fill down

The data now looks like this:

data filled down

Let's now find the M formula that makes all this happen. In the View tab, click on Advanced Editor.

This will show you all the different steps in the M formula language.

M formula

Copy the code and return to Visual Studio. In the Queries pane of the editor of the Power Query source, you can choose between a Single Query, or a query from a variable (for example if you want to dynamically construct the M formula in SSIS):

single query or variable

Let's keep it at Single Query, and paste the M formula in the Single Query Text textbox. The next step is to configure the connection manager. In the Connection Manager pane, click on Detect Data Source.

This will parse the M formula and search for external sources.

1 connection manager found

The wizard will add a new file connection (not an Excel connection as you might expect), but the actual Power Query connection manager still needs to be created:

create new connection manager

In the dialog, verify the data source kind, the path to the file and the authentication settings.

power query connection manager to file

In the Columns pane, the columns Country and Continent will be added, but the external columns are empty.

external columns empty

You need to map these against the output columns, otherwise the SSIS package will fail.

map external columns

In the data flow, add a multicast (as a dummy transformation) and add a data viewer so we can inspect the output of the source.

add data viewer

When we run the package, it's successful, but the output is hidden! This is because the source outputs the text columns as Unicode text stream (DT_NTEXT), which cannot be showed in the data viewer.

text is not shown

To fix this, we need to change the data types of the output columns in the advanced editor of the source. Right-click the source and open the advanced editor. Go to the Input and Output Properties tab and change the data types to Unicode String (DT_WSTR).

change data types in advanced editor

When we re-run the package, the data view now shows the actual output:

success output filldown

Unpivot all other columns

Suppose we have the following data about farms and their animals.

pivot sample data

This kind of data is presented in a user-friendly way, but it's not practical to store it like this in a database table. What if a new animal is introduced? Alter the table to add a new column each time. Ideally, we unpivot the data to three columns: one storing the name of the farm, one storing the type of animal and a final column storing the number of animals for that type. There's an unpivot transformation in SSIS, but the configuration is not straight forward. You could also use T-SQL, but again, the unpivot syntax is not the easiest. The downside of both options is the number of columns is fixed. If a new animal is added, you need to modify the transformation or the SQL statement. But not in Power Query, where we can select the pivot column (Company in our example) and all other columns are automatically normalized.

To get the M formula, open the sample Excel file in Power Desktop and go to the editor. If necessary, set the first row as headers. The data should look like this:

sample data in power bi desktop

Right-click the Company column and choose Unpivot Other Columns.

unpivot other columns

The data is now in the desired format. Rename the middle column to Animal.

unpivoted data

It's important to notice only the Company column is included in the formula. If new animals are added, the transformation will still work which is a huge improvement over the fixed metadata of SSIS.

m formula unpivot

Like in the previous example, go to the advanced editor, copy paste the M formula into the Power Query source and add a new connection manager to the Excel file. Make sure the data types are set correctly and the external columns are mapped to the output columns.

mapped columns unpivot

When we run our package, we can see the unpivot has done its work!

unpivot success
Next Steps


Last Updated: 2020-02-19


get scripts

next tip button



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a BI professional, specializing in the Microsoft BI stack with a particular love for SSIS.

View all my tips





Comments For This Article




Saturday, June 27, 2020 - 10:14:54 PM - Dr. Edward Hanna Back To Top

Great job--and a very important topic! You have been very thorough and clear in your explanation. Thank you for taking the time to do this.


Friday, April 10, 2020 - 8:16:00 AM - Eric Back To Top

Hello

probably my very bad english is the cause

" It is available in 32-bit (which you need for Visual Studio) and in 64-bit (which you need to run it on a server). "

Probably mean that we must install both x86 and x64 versions

The x86 installation solved my problem

Thanks

Eric



download


Recommended Reading

Import UTF-8 Unicode Special Characters with SQL Server Integration Services

SSIS Expression Examples for Dates, String Concatenation, Dynamic File Names and More

SSIS Toolbox is not visible in SQL Server Data Tools

Backwards Compatibility in SQL Server Data Tools for Integration Services

SSIS Variable and Expression Example - Part 1





get free sql tips
agree to terms


Learn more about SQL Server tools