Using Power Query in Excel for Data Extraction from a SQL Server Database

By:   |   Updated: 2023-05-17   |   Comments   |   Related: More > Microsoft Excel Integration


Problem

Microsoft Excel is one of the most famous self-service BI tools (Ehrenmueller-Jensen, 2020). When using Excel, users may need to use Power Query to extract data from various sources and reshape it as required. Then, the tool can perform data analysis and build reports. This way, business users can discover insights into the data without the help of IT. However, some domain experts may have little IT background and need assistance using Power Query. These experts may want to get their hands dirty and do some exercises showing the data preparation process. By practicing, they become familiar with Power Query. Then, they can independently perform data analysis.

Solution

Learning by doing is essential for developing many skills in the digital age (Bates, 2019). One of the most well-known ancient Chinese proverbs says: "I hear, and I forget. I see, and I remember. I do, and I understand." The learning-by-doing approach is always an effective method for studying a data analysis tool. After people (especially those with little background in IT) practice data analysis techniques, they know what the desired data set should look like, remember how to prepare for such a data set, and understand why they should apply those data preparation steps. Then, they become comfortable giving computer instructions and independently working on data analysis projects.

In this tip, we give practice in the use of Power Query to handle a real business problem. For example, when evaluating sales performance, sales managers want to know detailed information about the salesperson who made the highest monthly sales and how much the top performers contribute to the total monthly sales. For demonstration purposes, the tip uses data from the Microsoft sample database "AdventureWorks2019," a transactional database used by a fictitious multinational manufacturing company called Adventure Works Cycles. The report is created using Microsoft® Excel® for Microsoft 365 MSO (Version 2303 Build 16.0.16227.20202) 32-bit. The final report should look like Figure 1.

Excel data

Figure 1 The top performers and their contributions to the monthly total sales

When generating a report, you should analyze business requirements and discover where to find all the required information. In this exercise, the data in this report are from the three database tables [Person].[Person], [Sales].[SalesOrderHeader], and [HumanResources].[Employee]. Then, you can use Power Query (a built-in feature for Excel 2016 or later for Windows and Microsoft 365) to read data from these tables, perform data transformation, and load data into an Excel spreadsheet. While walking through the process of using Power Query to prepare data, you explore the following techniques:

  1. Extracting data from a Microsoft SQL server database
  2. Choosing and removing columns
  3. Extracting date or time elements from date columns
  4. Data profiling and cleansing
  5. Grouping and summarizing rows
  6. Adding custom columns
  7. Previewing and expanding structured columns
  8. Merging queries
  9. Merging text columns
  10. Creating calculated columns
  11. Reordering columns
  12. Loading data to an Excel workbook

To get started, open Excel and go to the Data tab. The Get & Transform Data section should look like Figure 2. Then, you are ready to manipulate data.

power query in Excel

Figure 2 The Get & Transform Data group of the Data tab in an Excel File

1 – Extracting Data from a Microsoft SQL Server Database

Power Query can easily connect to various data sources (Alexander, 2022). For example, the tool allows you to explore a SQL server database without writing SQL scripts. Furthermore, the processes of connecting different data sources and loading data into the Power Query Editor are similar (Shruti, 2023). The following steps walk you through the basic steps of connecting a SQL server database. Note that other data sources may use different authentication types.

1.1 Select the Get Data Command

The first step in every version of the Power Query tool is to select Get Data in the Get & Transform Data group on the Data tab (Janicijevic, 2021). The pull-down menu appears, as illustrated in Figure 3. Power Query can import data from many data sources. In this exercise, select From Database and then from SQL Server Database.

get data

Figure 3 A wide variety of connectors

1.2 Connect to the SQL Server Database

In the SQL Server database dialog box that appears, enter the SQL server instance name and the database name. Then, expand the Advanced options and make sure the Include relationship columns checkbox is unchecked. The dialog should look like Figure 4.

connect to sql server

Figure 4 Creating a database connection

Next, click the OK button to open another dialog, as shown in Figure 5. In this dialog, specify the database authentication method. Then, click on the Connect button to connect to the database.

connect to sql server

Figure 5 Specifying the authentication method

You may not know this information for establishing a database connection. You can copy these dialogs to database administrators (DBA) and ask them to help fill these dialog boxes.

1.3 Select the Three SQL Server Database Tables

After you connect to the database successfully, the Navigator dialog opens (see Figure 6). First, check the checkbox to select multiple items on the left pane. Then, select the three database tables:

  • [Person].[Person]
  • [Sales].[SalesOrderHeader]
  • [HumanResources].[Employee]
select tables

Figure 6 Selecting the tables in the Navigator dialog

When selecting a database table, you can preview data in the right pane of the dialog. You should choose all data tables required by the report, no more and no less.

1.4 Load Data into the Power Query Editor

Click on the Transform Data button in the Navigator to open the Power Query Editor, where you can perform data transformations. As illustrated in Figure 7, the Power Query Editor interface consists of the following major components (Raviv, 2018):

  1. The Ribbon: provides several ribbon tabs to add transformation steps or perform other actions to complete various tasks. If a button on the Ribbon has a downward pointing arrow underneath it, click that arrow to see other similar commands.
  2. Formula bar: shows the M code of the current transformation step. You can edit the M code in the formula bar.
  3. Queries pane: lists all the queries in the current editor. You can select, edit, or create queries in this pane.
  4. Data Preview pane: displays a preview of the data for your query. You can access many transformation commands from the arrow icons in the column headings or with a right-click on any element in the table.
  5. Query Settings pane: provides a view of the currently selected query with relevant information, such as query name, a chronological list of all the transformation steps applied to the data, and various indicators. You can modify the query here.
preview data in power query editor

Figure 7 Loading data into the Power Query Editor

When you select a query in the Queries pane, the Query Setting pane displays the query’s properties (for example, name). This way, you can edit the Name text box in the Properties section to rename the query. Next, press the Enter key to confirm the rename operation. You use this method to rename the three queries:

  • Person Person -> Person
  • Sales SalesOrderHeader -> SalesOrderHeader
  • HumanResources Employee -> Employee

2 – Choosing and Removing Columns

You loaded all data from the three database tables to the Power Query Editor. However, the report does not need all the information from these tables. Therefore, you can use Choose Columns or Remove Columns commands to specify what columns your table needs to keep and which ones it needs to remove. This section explores three methods to remove unneeded columns. It would help if you kept BusinessEntityID and SalesPersonID columns to merge the three queries later.

2.1 Use Choose Columns Command to Keep Columns

In the Queries pane, select the query Person. Next, select Choose Columns in the Manage Columns group on the Home tab. Then, the Choose Columns dialog box appears and contains all the available columns in the query. You need to select four columns, as shown in Figure 8. When selecting a few columns, you can uncheck the (Select All Columns) checkbox to clear the selection and then choose columns.

choose columns

Figure 8 The Choose Columns dialog

Next, click on the OK button to close the Choose Columns dialog. The query has four columns, as shown in Figure 9. In the Query Setting pane, a new query step (Removed Other Columns) was appended in the Applied Steps box. You can select the x icon next to the step to delete the step. You can also click on the gear icon to reconfigure the step. In addition, the Formula bar displays the M code of the newly generated query step. The power query editor records and converts all your transformations into the M code (MacDougall, 2018).

query columns

Figure 9 Removing columns from the Person query

2.2 Use Remove Columns Command to Remove Columns

In the Queries pane, select the query Employee. Next, press and hold the CTRL key. Then, click on the column headings in the Data Preview pane to choose multiple columns. You use this method to select all columns except for BusinessEntityID, and HireDate columns, as illustrated in Figure 10.

query columns

Figure 10 Selecting multiple columns

Then, on the Home tab, select Remove Columns in the Manage Columns group. The command removes all selected columns. As shown in Figure 11, the query has two columns in the query: BusinessEntityID and HireDate. You can also observe the new query step added in the Query Settings pane and the M formula in the Formula bar. Since the code corresponds to each step, you can try to understand the code and build confidence in learning a computer programming language (Infante, 2020).

power query formula

Figure 11 Removing multiple columns

2.3 Use Remove Other Columns Command to Keep Selected Columns

In the Queries pane, select the query SalesOrderHeader. Next, select multiple columns in the Data Preview pane in this order: SubTotal, SalesPersonID, and OrderDate. Then, select the Remove Columns > Remove Other Columns command on the Home tab, as shown in Figure 12. The command re-arranges the column order according to your selection sequence.

remove columns

Figure 12 The Remove Other Columns command

After selecting the Remove Other Columns command, the SalesOrderHeader query only contains the SubTotal, SalesPersonID, and OrderDate columns, as shown in Figure 13.

query columns

Figure 13 Removing the unselected columns

3 – Extracting Date Elements from Date Columns

Since the report needs to calculate the monthly sales, you must extract years and months elements from the OrderDate column in the SaleOrderHeader query. The Add Column and Transform tabs have a Date button and a down arrow underneath the button. Click that arrow to see many other date commands (for example, the Year and Month commands). There are several ways to obtain date and time elements from date columns. In this exercise, you use the Year and Month commands to pull years and months elements.

3.1 Extract the Years Element from the OrderDate Column to Create a New Column

In the Data Preview pane, click on the OrderDate column heading to select the column. Then, on the Add Column tab, in the From Date & Time group, select the Date > Year > Year command, as shown in Figure 14. After selecting the Year command, the command adds the Year column to the query.

extract year from date

Figure 14 Extracting the years element as a new column

3.2 Transform the Date Column into a Month Column

In the Data Preview pane, click on the OrderDate column heading to select the column. Then, on the Transform tab, in the Date & Time group, select the Date > Month > Month command. The command transforms a Date column to a Month column containing only numeric month values. After renaming the OrderDate column to Month, the query preview should look like Figure 15.

transform date column

Figure 15 Extracting years and months elements from the date column

4 – Performing Data Cleansing

The original data may contain some undesired records, such as incorrect, incomplete, duplicated, and irrelevant (Stedman, 2022). The data cleansing process identifies erroneous data items and corrects them. Unfortunately, the process usually consumes countless hours for experienced data analysts (Microsoft Learning, 2023). In this tip, you use the Column Profile command to discover and investigate data quality issues in the SalesPersonID column and the Remove Empty command to clean data.

4.1 Column Profile

Select the SalesOrderHeader query in the Queries pane. Then, select the SalesPersonID column in the Data Preview pane. Next, select the Column profile command in the Data Preview group on the View tab, as shown in Figure 16.

column profile

Figure 16 Column profiling

By default, the Data Preview pane displays Column statistics and Value distribution based on the top 1000 rows. To change the profile to operate over the entire dataset, select the Column profiling based on the entire data set option in the lower-left corner of the editor, as illustrated in Figure 17. The whole data set has 27659 empty cells in the SalesPersonID column. The rows with empty SalesPersonID represent Internet sales; therefore, they are irrelevant to the report. You should remove them.

column profile

Figure 17 Column profiling based on the entire data set

Column profile is an excellent feature for understanding data and checking data quality. However, it is not recommended to enable this feature all the time because this feature may make the Power Query Editor slower. Therefore, disable the feature by unchecking the Column profile checkbox on the View tab soon after you have done data profiling.

4.2 Filter by Values in a Column

You see an icon with an inverse triangle in a column header. The Sort and Filter menu appears when you select this icon, as shown in Figure 18. With this menu, you select the Remove Empty command to remove all rows with an empty value in this column. So, you use this technique to remove all rows in the SalesOrderHeader query with an empty value of SalesPersonID. The menu indicates that you can define several filters. By and large, you can apply or remove any filters to or from the selected column (ptyx507 et al., 2008).

filter values

Figure 18 The Sort and Filter menu

5 – Grouping and Summarizing Rows

The SalesOrderHeader query contains sales transactions for every salesperson. However, the report wants to find who made the highest monthly sales and how much the top salesperson contributes to the total monthly sales. Therefore, you first need to summarize the customer order at the year-month and salesperson level. Then, you need to summarize every salesperson’s monthly sales volume at the year-month level.

5.1 Use an Aggregate Function to Group by Columns

Select the SalesOrderHeader query in the Queries pane. Then, select SalesPersonID, Month, and Year columns. Next, select Group By in the Table group on the Transform tab. In the Group By dialog that opens, enter MonthlySales as the new column name, select Sum from the Operation dropdown list, and select SubTotal from the Column dropdown list. The dialog should look like Figure 19.

aggregate functions

Figure 19 Grouping by multiple columns

The Group By command calculates the monthly sales for every salesperson. The preview of the SalesOrderHeader query should look like Figure 20.

query data preview

Figure 20 Every salesperson's monthly sales

Next, you select the Month and Year column and then select the Group By command again. Use the same technique in Step 5.1 to create a new column TotalMonthlySales. Next, add a new allRows column using the All Rows operation (Hansberry, 2020). The Group By dialog should look like Figure 21.

group by in power query

Figure 21 Creating two new columns when grouping

Click on the OK button to close the dialog. The Group By command creates two new columns. The allRow column is a Table structured column that stores a table. Each table value contains all the rows grouped by the Year and Month columns from the previous query. You can select the cell to see a preview of the table’s contents at the bottom of the dialog box, as shown in Figure 22. The figure illustrates that a table cell value indicates a many-to-one relationship with the current table.

query data

Figure 22 A preview of the contents of the Table structured cell

5.2 Add a Custom Column to Extract the Top Performer in Every Group

Select the Custom Column in the General group on the Add Column Tab. The Custom Column dialog appears as shown in Figure 23. Use the default name of the new column and enter the following formula in the Custom column formula box:

Table.Max([allRows],"MonthlySales")
create custom column

Figure 23 The Custom Column dialog

The Table.Max() function returns the largest row in a table. Therefore, the formula finds the maximum value for the MonthlySales column of each value in the allRows column. This way, the new column is a Record structured column​​, which indicates a one-to-one table relationship with the current query. You can select a value in the Record structured column to see a preview of the record's contents at the bottom of the dialog box, as shown in Figure 22.

custom column added

Figure 24 A preview of the contents of the Record structured cell

5.3 Expand a Record Structured Column

Next, click the expand icon in the column header of the Record structured column. Then, in the Column Name drop-down list that appears, select the columns you want. You can uncheck the Use original column name as prefix checkbox, as shown in Figure 25, since the column names are unique.

expand a record

Figure 25 Expand a Record structured column

Then, click on the OK button to close the dialog. The Expand command adds two new columns to the SalesOrderHeader query. Remove the allRows column; the preview of the query should look like Figure 26.

data preview

Figure 26 Each row representing the maximum value from one column when grouping

6 – Merging Queries

You created three queries, and each contains some information about salespersons. You can merge data horizontally to enrich the SalesOrderHeader query with columns from the Person and Employee queries. This way, you create a query containing detailed information about the top performers. You can use the Power Query Merge Queries command to join the records from one query to the records in another by matching on a unique identifier. The unique identifier could be one column or a composite of multiple columns. In this exercise, the unique identifier is the BusinessEntityID column. The SalesPersonID column represents the same unique identifier as the BusinessEntityID column.

6.1 Merge the Person Query into SalesOrderHeader Query

In the Queries pane, select the query SalesOrderHeader. Then, on the Home tab, in the Combine group, select the Merge Queries command. In the Merge dialog box that appears, select the SalesPersonID column and select the Person query from the drop-down list. Then, select the BusinessEntityID column in the Person query. Next, select Left Outer (all from first, matching from second) for the Join Kind field. The Merge dialog should look like Figure 27.

merge data in power query

Figure 27 Left join two queries

After clicking the OK button, you can see a new column, Person, in the SalesOrderHeader query, as shown in Figure 28. The new column is a Table structured column.

data preview

Figure 28 Adding a Table structured column

6.2 Expand a Table Structured Column

Click the expand icon in the column header of the Table structured column. Next, select the columns you want in the Column Name drop-down list that opens. Then, uncheck the Use original column name as prefix checkbox, as shown in Figure 29, since the column names are unique.

expand a table column

Figure 29 The Column Names drop-down list

Click on the OK button to close the dialog. The Expand command adds new columns to the SalesOrderHeader query, as illustrated in Figure 30.

preview data

Figure 30 Expanding the Table structured column

6.3 Merge the Employee Query into SalesOrderHeader Query and Expand the Table Structured Column

Use the methods introduced in steps 6.1 and 6.2 to merge the Employee query into the SalesOrderHeader query. Then, remove the SalePersonID column. After you expand the Table structured column, the query preview should look like Figure 31.

preview data

Figure 31 The merged three queries

7 – Merging Text Columns and Creating a Calculated Column

Salesperson names occupy three columns: FirstName, MiddleName, and LastName. The final report only needs the full name of each salesperson. Therefore, you should combine these three columns into one column. You can use Power Query to combine text columns to replace them with one column or create a new one adjacent to your merged columns (Higgins, 2022). Since the report does not need the original columns, you can use the newly merged column to replace the original three columns.

In addition, the report needs to show how much percentage the top performers contribute to the total monthly sales. The ratio can be found by dividing the values in the MonthlySales column by the value in the TotalMonthlySales column. Use the Standard->Divide command in the From Number group on the Add Column tab.

7.1 Merge Multiple Text Columns

In the Data Preview pane, click on the column headings to select the three columns: FirstName, MiddleName, and LastName. Then, on the Transform tab, in the Text Column group, select the Merge Columns command. The Merge Columns dialog appears. As shown in Figure 32, separate the text by space and name the new column SalesPerson.

merge columns

Figure 32 The Merge Columns dialog

Click on the OK button to close the dialog. The query preview should look like Figure 33.

preview data

Figure 33 Merging the FirstName, MiddleName, and LastName columns into the SalesPerson column

7.2 Creating a Calculated Column

In the Data Preview pane, you first select the MonthlySales column, then select the TotalMonthlySales column. Next, you select the Standard > Divide command in the From Number group on the Add Column tab. The command creates a new column called Division. Rename the column to Percentage of Total. Then, delete the TotalMonthlySales column. The preview of the query should look like Figure 34.

preview data

Figure 34 The preview of the SalesOrderHeader query

8 – Reordering Columns

Reordering columns in a query is a typical process when preparing data. You can use the Move command or drag and drop columns to arrange columns. When you have dozens of columns or more, modifying the M function in the formula bar is another option (Acampora, 2021). The Move command provides four options: Left, Right, To Beginning, and To End. In this exercise, you arrange the columns of the query SalesOrderHeader in this order:

  1. Year
  2. Month
  3. SalesPerson
  4. HireDate
  5. MonthlySales
  6. Percentage of Total

8.1 Move the Year Column to the Beginning of the Table

In the Data Preview pane, click on the Year column heading to select the column. Then, on the Transform tab, in the Any Column group, select the Move > To Beginning command, as shown in Figure 35.

move columns

Figure 35 Moving the Year column to the beginning of the table

8.2 Drag the MonthlySales Column and Drop to the Left Side of the Percentage of Total Column

In the Data Preview pane, click and hold the MonthlySales column heading and drag the mouse to the left side of the Percentage of Total column. Then, release the mouse. The preview of the query should look like Figure 36.

data preview

Figure 36 Arranging the columns

9 – Loading Data to Excel Workbook

The Close & Load button in the Close group on the Home tab has two options: Close & Load and Close & Load To, as shown in Figure 37. To load to a worksheet, select Home > Close & Load > Close & Load. If you want to load to a data model or create a connection early, select Home > Close & Load > Close & Load To.

load data to excel

Figure 37 Load a query from the Power Query Editor to an Excel worksheet

If you select Home > Close & Load > Close & Load To command, the Import Data dialog opens, as shown in Figure 38. Since the queries Employee and Person are an intermediate step in the data preparation, you can select the Only Create Connection option. This way, the command creates connections with names the same as the query names. The connection only queries mean that there is no data output to the workbook, but you can still use this query in other queries.

import data

Figure 38 The Excel Import Data dialog

Click the OK button to close the Power Query Editor. Then, select the Queries & Connections command in the Queries & Connections group on the Data tab to show the Queries & Connections pane. Next, right-click the SalesOrderHeader query name in the Queries & Connections pane and select the Load To command from the context menu. The Import Data dialog opens. Select the Table option in the dialog and click on the OK button. After loading the data into a worksheet, you can use your Excel skills to format data and finalize the report, as shown in Figure 39.

final data preview

Figure 39 Load data to a table in your workbook

The report indicates that Shu K Ito made the highest monthly sales (i.e., $382066.95) in August 2011. This salesperson contributed 19% to the total monthly sales. You can also find out who are the top performers in other months. However, the data in May 2011 seems unreasonable because the salesperson joined the company on May 31, 2011. If the data is from an actual database, you should conduct further investigation.

Summary

Power Query is a business intelligence tool that allows you to import data from many sources and clean and transform data as needed. In this tip, you learned how to use Power Query to extract data from a SQL server database and perform data profiling, cleansing, and transformation. The article also explored the methods to load data to the Excel worksheet. According to the learning-by-doing approach, you should follow the steps in this article to create an Excel report. Such practice lets you familiarize yourself with the Power Query editor interface and these commands. Then, you can work on data analysis projects without assistance from IT.

Reference

Acampora, J. (2021). How to Reorder Multiple Columns in Power Query with Drag & Drop. https://www.excelcampus.com/powerquery/reorder-multiple-columns/.

Alexander, M. (2022). Excel Power Pivot & Power Query For Dummies, 2nd Edition. Hoboken, NJ: John Wiley & Sons.

Bates, A. W. (2019). Teaching in a Digital Age: Guidelines for designing teaching and learning. https://opentextbc.ca/teachinginadigitalage/

Ehrenmueller-Jensen, M. (2020). Self-Service AI with Power BI Desktop. New York, NY: Apress.

Hansberry, E. (2020). Return Row Based on Max Value From One Column when Grouping. https://www.ehansalytics.com/blog/2020/7/16/return-row-based-on-max-value-from-one-column-when-grouping.

Higgins, B. (2022). How To Use Power Query to Merge Columns. https://nexacu.com.au/insights-blog/how-to-use-power-query-to-merge-columns/

Infante, K. (2020). Why You Should Learn PowerQuery M as Your First Programming Language? https://medium.com/@kennethjhim/why-you-should-learn-powerquery-m-as-your-first-programming-language-5cb5c1ceb0ae

Janicijevic, A. (2021). Power Query Cookbook. Birmingham, UK: Packt Publishing.

MacDougall, J. (2018). The Complete Guide to Power Query. https://www.howtoexcel.org/the-complete-guide-to-power-query/.

Microsoft Learning. (2023). Automate data cleaning with Power Query. https://learn.microsoft.com/en-us/training/modules/automate-data-cleaning-power-query/.

ptyx507, Klopfenstein, D. & Howell. (2022). Filter by values in a column. https://learn.microsoft.com/en-us/power-query/filter-values.

Raviv, G. (2018). Collect, Combine, and Transform Data Using Power Query in Excel and Power BI, First Edition.

Shruti, M. (2023). Power Query in Excel: A Complete Guide. https://www.simplilearn.com/tutorials/excel-tutorial/power-query-in-excel

Stedman, C. (2022). DEFINITION: data cleansing (data cleaning, data scrubbing). https://www.techtarget.com/searchdatamanagement/definition/data-scrubbing.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Nai Biao Zhou Nai Biao Zhou is a Senior Software Developer with 20+ years of experience in software development, specializing in Data Warehousing, Business Intelligence, Data Mining and solution architecture design.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-05-17

Comments For This Article