What is OneLake in Microsoft Fabric?

By:   |   Updated: 2023-12-15   |   Comments (1)   |   Related: > Microsoft Fabric


Problem

I've been reading some tips about Microsoft Fabric on this website with great interest, and the term "OneLake" was used in some of them. Is this a data lake or something? What is its purpose?

Solution

Microsoft Fabric is a new end-to-end unified analytics platform in the cloud. It can handle different workloads, such as Spark or warehousing with T-SQL. One of its key features is the separation of storage and compute, allowing you to run those different compute workloads on the same sets of data. The driving force behind this is OneLake. When we represent Fabric in a diagram, we can see OneLake is the storage layer for every possible compute in Fabric:

fabric overview

OneLake is a single logical data lake for the entire Fabric platform in your organization. Microsoft commonly compares OneLake with OneDrive: "OneLake is the OneDrive for your data." Just like OneDrive, OneLake is a single place to store all your analytics data. It's a logical lake because, from the user's perspective, it looks like there's only one storage account, but there may be more than one Azure Data Lake storage account involved. However, OneLake acts as an interface between you and the actual storage.

At the time of writing, Microsoft Fabric was in public preview, where features or user interfaces may change.

There are several distinct features associated with OneLake, including:

  • Open
  • OneCopy
  • Shortcuts
  • OneLake File Explorer for Windows
  • Connect Azure Storage Explorer to OneLake

OneLake is Open

All tables (both in the lakehouse and the warehouse) are stored behind the scenes as delta tables (which use a transactional layer above Parquet files). KQL tables are an exception as they use the proprietary format of Azure Data Explorer. But you can expose the tables as delta tables if you want. Both delta and Parquet are open-source, which means the risk of vendor lock-in is reduced. Any tool can be used to get data into Fabric if it writes delta tables that follow the standard. You can use any BI tool that can read from delta tables to do your reporting. Fabric provides you with all the options, but you are not obligated to use them all.

To clarify, like any data lake, you can store files of any format in OneLake. The lakehouse and the warehouse automatically create delta tables whenever you create a table. For example, in a lakehouse, you can see the delta tables at the top of the explorer and the actual files at the bottom:

tables and files in a Fabric lakehouse

Fabric uses a more efficient compression method for the Parquet files called V-ordering, but the resulting Parquet and delta table still follow the standard so that they can be read by other tools.

OneCopy

Since all tables are delta tables, if someone creates a table in a lakehouse and another developer uses the warehouse to create a table, they contribute to the same data lake. A great benefit of this strategy is that each analytical engine can read data from the other.

For example, you can write a T-SQL query in a warehouse that joins a table to a table from a lakehouse using the 4-part naming convention. This eliminates the need to copy data between two services. The following screenshot shows an example of a T-SQL query in a Fabric warehouse reading data from a Lakehouse table:

read data from lakehouse in a warehouse query

It is important to note that a warehouse and a lakehouse can currently read each other's delta tables, but they cannot write data to each other.

Shortcuts

Shortcuts are also a part of the OneCopy paradigm and play a pivotal role. Shortcuts reference data that is stored in other file locations. This can be a file in the same workspace or another Fabric workspace, or it can even be a folder in an external location, such as a blob container in Azure Blob Storage or an S3 bucket in AWS. You can create a shortcut in a Lakehouse or a KQL database.

create a shortcut in lakehouse

At the time of writing, internal and external shortcuts to Azure Data Lake Storage and Amazon S3 are supported.

possible shortcut options

When creating an internal shortcut, you can choose between the available objects in your different workspaces:

create internal shortcut

In the case of a warehouse as the source, you can choose to create a shortcut for one or more tables or even an entire schema:

choose for which tables you want to create a shortcut

In the Lakehouse explorer, you can tell a table (or file/folder) is a shortcut when it has a little link icon:

shortcut icon

Keep in mind there's a difference between creating a shortcut to files and a table. In the previous screenshots, I created a shortcut to a warehouse table from within the files section of a lakehouse. This results in a shortcut to the delta files and not to the "logical" table:

shortcut to a table in the files section

Shortcuts are an easy way to minimize the amount of data that needs to be copied between services. It can also play an important role in the data mesh architecture. If your organization has built data products, you can create shortcuts between the domains to make the data accessible.

OneLake File Explorer for Windows

Like OneDrive, you can view your OneLake files in Windows on your machine. You can download the tool here. After installing the tool and logging in, a new section will be available in your Windows Explorer:

onelake file explorer

It's almost exactly like OneDrive: you can view all the folders and files, but to open them, you need to sync them to your local machine first. In the explorer on the left, you can view all the Fabric workspaces in your tenant, and inside a workspace, you can view the different analytical services along with their folders and tables.

Connect Azure Storage Explorer to OneLake

You can also use the free Azure Storage Explorer tool to connect to your OneLake. In the documentation, you can learn more about connecting to your account. Once connected, you can view your files, folders, and tables (as the individual Parquet files and the delta logs):

onelake in azure storage explorer

Even here, OneLake acts as an interface. When you browse through your OneLake folders, there's no way of telling whether there are multiple storage accounts involved. Also, when you create a clone of a table, it will look like the Parquet files are copied, but in reality, they are just pointers to the original files, as explained in this blog post.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Koen Verbeeck Koen Verbeeck is a seasoned business intelligence consultant at AE. He has over a decade of experience with the Microsoft Data Platform in numerous industries. He holds several certifications and is a prolific writer contributing content about SSIS, ADF, SSAS, SSRS, MDS, Power BI, Snowflake and Azure services. He has spoken at PASS, SQLBits, dataMinds Connect and delivers webinars on MSSQLTips.com. Koen has been awarded the Microsoft MVP data platform award for many years.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-12-15

Comments For This Article




Friday, December 29, 2023 - 12:52:30 PM - David Plaut Back To Top (91819)
Thanks, that's more helpful than about ten Microsoft articles.