Foreign Keys and Performance in a Microsoft Fabric Warehouse

Problem

In late 2024, I noticed a comment on the Microsoft Learn site stating that foreign keys could improve query performance on tables in a Fabric warehouse. That claim immediately caught my attention. I wanted to answer a simple question: Do relationships help, hurt, or have no effect when added to tables in a Fabric warehouse?

Let’s get more specific—do foreign keys improve query performance when reading data (not loading)? In other words, do they make queries run faster?

Before reading further, take a guess! What do you think will happen? Drop your answer in the comments, or just make a mental note before we dive in.

Solution

In this article, I will explore Microsoft Fabric performance effects of adding relationships to tables in a Fabric warehouse. We’ll start by looking at how to add relationships using T-SQL. Next, we’ll create two tables and load one billion rows. Then, we’ll add some relationships. Finally, using our trusty Query Insight view, we’ll determine what sort of effect they produce.

By the end of this article, I’ll answer the big question: Do foreign key relationships make queries run faster in a Fabric warehouse?

Foreign Keys

Foreign keys have long been a staple in databases, primarily serving as a data integrity tool rather than a performance booster. Like any constraint, they add overhead when inserting, updating, and deleting rows in a table. However, we’re often willing to accept that overhead in exchange for the certainty they provide in maintaining data integrity.

If you’re staring down a database with hundreds of tables and need to purge data, you’ll hope those foreign keys exist.

Foreign Key Performance Boost

In SQL Server, you can experience join elimination using foreign key constraints. Grant Fritchey covered this on his ScaryDBA blog. Also, Bert Wagner recorded a terrific YouTube video titled “What’s Join Elimination?”

Put simply, it’s when SQL skips scanning tables that aren’t needed without affecting the results. For example, if you join Table A to Table B only as a stepping stone to reach Table C, SQL Server may eliminate Table B from scanning if it determines Table B is unnecessary. However, if you need to retrieve any columns from the skipped table, elimination does not occur.

Beyond join elimination, I’m unaware of any further performance boost that foreign keys provide.

Foreign Key Relationships in a Warehouse

One big difference between foreign keys in a Fabric warehouse and a system like SQL Server is that you can’t enforce them when creating one. You need to use the NOT ENFORCED keyword like in the code below:

-- MSQLTips.com
 
ALTER TABLE Orders
   ADD CONSTRAINT FK_Orders_CustomerId 
   FOREIGN KEY (CustomerId) 
   REFERENCES Customers (CustomerId) 
   NOT ENFORCED;
GO

You might then be asking: What are they good for?

The main benefit seems to be helping the default semantic model understand how tables are related. The default semantic model sits between the raw tables in the warehouse and an analytics tool like Power BI. If you define relationships in the warehouse, Power BI automatically detects and applies them when using the semantic model. In other words, defining relationships is kind of like having a built-in ERD or road map. If you’ve worked with Power BI and a massive database, you know it’s sometimes difficult to establish relationships; having those in place is handy.

Warehouse Performance Booster?

As mentioned in the intro, at the end of 2024, a reference in the Microsoft Learn documents indicated that adding a relationship could help improve query performance. As far as I can tell that reference is now gone. However, I made a note to test this claim. Andy Cutler also referenced this claim in his article on ServerlessSQL.com.

Putting Relationships to the Test

To test this hypothesis, I will use my dedicated F2 capacity and create a fact table with one billion rows and a large dimension table with 10 million rows. I doubt we will see performance benefits with smaller tables, but you never know.

The F2 capacity is the smallest one available because I do not like spending money, and we get the same horsepower when running limited experiments like these.

I have tried using the query editor built into the Fabric portal, just like I tried using the newly “retired” Azure Data Studio but believe it or not, the portal’s editor is even worse. The user experience does not even match Synapse Studio. If you want to know why, try using it for 10-15 minutes.

In the words of the Comic Book Guy from The Simpsons: “Worst. Editor. Ever.”

For this reason, I will be using my trusty SQL Server Management Studio (SSMS).

Setting the Stage

Before opening SSMS, I created a fresh warehouse named RelationshipDemo. Microsoft makes it super easy to create new warehouses on the fly, a huge benefit of the SaaS offering.

Now, let’s move over to SSMS. To connect, you’ll need the connection string found in the settings section of the Home ribbon. I’m using the syntax below to create my two tables:

-- MSSQLTips.com
 
DROP TABLE IF EXISTS dbo.FactSales;
GO
CREATE TABLE dbo.FactSales
(
    SalesId INT NOT NULL,
    CustomerId INT,
    ProductId INT NOT NULL,
    StoreId INT,
    SalesPersonId INT,
    SalesDate DATE,
    ShipDate DATE,
    SalesQuantity INT,
    SalesAmount DECIMAL(10, 2)
);
GO
 
DROP TABLE IF EXISTS dbo.DimProducts;
GO
CREATE TABLE dbo.DimProducts
(
    ProductId INT NOT NULL,
    ProductName VARCHAR(260),
    ProductNumber VARCHAR(260),
    ProductCategory VARCHAR(50),
    CreatedDate DATE
);
GO

Then, I’ll use the COPY INTO statement to import data from a blob storage account. For the FactSales table, I’ll import the file 10 times, giving me one billion rows. The COPY INTO statement delivers the best performance when loading data and Microsoft recommends it, especially if you’re not transforming data during the load.

-- MSSQLTips.com
COPY INTO [dbo].[DimProducts] 
FROM 'https://[yoururl].blob.core.windows.net/mssqltipsfiles/DimProducts.txt' 
WITH
(
CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='[I am not telling you mine.]'),
FILE_TYPE = 'CSV',
MAXERRORS = 0,
FIELDTERMINATOR = ',',
FIRSTROW = 2
);
GO
 
COPY INTO [dbo].[FactSales] 
FROM 'https://[yoururl].blob.core.windows.net/mssqltipsfiles/FactSales.txt' 
WITH
(
CREDENTIAL=(IDENTITY= 'Shared Access Signature', SECRET='[I am not telling you mine.]'),
FILE_TYPE = 'CSV',
MAXERRORS = 0,
FIELDTERMINATOR = ',',
FIRSTROW = 2
);
GO

The dimension table, DimProducts, contains product data, including the name, number, and category. With 10 million products, we could rival Walmart.

Testing Without a Foreign Key

For our first test, I’ll run the query below 12 times. The first two runs will be without the label option to get the data into cache and build column statistics. Alternatively, I could create the statistics manually with a FULL SCAN. Adding the label option is helpful because we can easily find it in DMVs or Query Insight views.

-- MSSQLTips.com
 
SELECT COUNT(1) AS ProductsSold,
       p.ProductCategory,
       YEAR(s.SalesDate) AS SalesYear
FROM dbo.FactSales s
    INNER JOIN dbo.DimProducts p
        ON p.ProductId = s.ProductId
GROUP BY p.ProductCategory,
         YEAR(s.SalesDate)
OPTION (LABEL = 'Relationships');
GO

Let’s examine the estimated execution plan of our query and compare it to the one generated after we add the foreign key.

Through testing, I’ve found that data takes about three to five minutes to show up in Query Insights, so we must be patient. I used the query below to pull the performance metrics, focusing on the view queryinsights.exec_requests_history.

-- MSSQLTips.com
 
WITH TopQueries
AS (SELECT TOP 10
        r.total_elapsed_time_ms,
        r.data_scanned_memory_mb,
        r.data_scanned_disk_mb,
        r.data_scanned_remote_storage_mb,
        r.command,
        r.start_time,
        r.label
    FROM queryinsights.exec_requests_history AS r
    WHERE r.label = 'NO Relationships'
          AND r.start_time >= DATEADD(HOUR, -1, GETUTCDATE())
    ORDER BY r.start_time DESC
   )
SELECT t.total_elapsed_time_ms,
       AVG(t.total_elapsed_time_ms) OVER () AS avg_elapsed_time,
       SUM(t.total_elapsed_time_ms) OVER () AS total_elapsed_time,
       t.data_scanned_memory_mb,
       t.data_scanned_disk_mb,
       t.data_scanned_remote_storage_mb,
       t.command,
       t.start_time,
       t.label
FROM TopQueries AS t;
GO

Testing With a Foreign Key

With our first test complete, let’s add the foreign key relationship using an ALTER TABLE statement. Remember, you must also add a primary key to the DimProducts table to create a foreign key for FactSales. For completeness, I will add a primary key to the FactSales table.

-- MSSQLTips.com
 
ALTER TABLE dbo.DimProducts
ADD CONSTRAINT PK_DimProducts_ProductId
    PRIMARY KEY NONCLUSTERED (ProductId)
    NOT ENFORCED;
GO
 
ALTER TABLE dbo.FactSales
ADD CONSTRAINT PK_FactSales_SalesId
    PRIMARY KEY NONCLUSTERED (SalesId)
    NOT ENFORCED;
GO
 
ALTER TABLE dbo.FactSales
ADD CONSTRAINT FK_DimProducts_ProductId
    FOREIGN KEY (ProductId)
    REFERENCES dbo.DimProducts (ProductId)
    NOT ENFORCED;
GO

First, let’s look at the estimated execution plan:

It looks like the exact same plan, and to verify, the plan hashes match.

Like before, I’ll run the query twice and then 10 more times with a different label. After a few minutes, let’s collect our metrics from Query Insights.

Reviewing Results

Going into this test, I assumed we would not see any improvement in query performance after adding the foreign key, but I was hopeful. I based this assumption on the fact that we do not see a performance benefit in SQL Server except when join elimination comes into play. But I would love for someone to prove me wrong. Maybe there is something different about the warehouse query processor.

I have outlined the results for each run in the table below. Based on this data, adding the foreign key had no performance impact. I am sure we could continue testing this with several other designs, but like trying to disprove the existence of Santa Claus or the Easter Bunny, the burden of proof is on the person making the claim.

Query Strategy	Executions	Total Elapsed Time (ms)	Average Elapsed Time (ms)	Data Scanned (MB)
No Relationships	10	105,000	10,500	5418.224
With Relationships	10	108,000	10,800	5418.224

What did you predict at the start? What’s your biggest takeaway? Let me know in the comments below.

Key Points

As illustrated in this article, adding foreign keys does not appear to improve query performance. One way to think about them is as a helpful map showing how your tables are related when building reports or updating your ELT process.
While join elimination exists in SQL Server, it only works in specific use cases. For example, if you are not returning columns in the SELECT list of the eliminated table, it works like magic.
For a Fabric warehouse, you must rely more on your Extract, Load, and Transform (ELT) process to ensure referential integrity. Unless someone uses the warehouse as an OLTP system, you should have better control over what data goes into the tables.

Next Steps

Are you looking to start building a warehouse in Microsoft Fabric? Koen Verbeeck wrote the article, “Using Microsoft Fabric to Build a Data Warehouse,” to help you get started.
Andy Cutler has written several articles about warehouses on his blog, ServerlessSQL.com. If you are interested in learning more, check them out.
Besides the YouTube video mentioned above, Bert Wagner wrote the article, “Join Elimination: When SQL Server Removes Unnecessary Tables,” detailing the performance benefits.

Resources

Community

Subscribe