Introduction to Azure SQL Database Ledger for Data Tampering Protection
Recently on May 25, 2021, Microsoft announced new functionality – Azure SQL Database Ledger (preview) that includes the power of Blockchain technology in Azure SQL Database for tamper-evidence capabilities in the database. In this tip, we will explore different components, features, and benefits of Azure SQL Database Ledger.
Data integrity is a critical aspect of an organization managing sensitive data such as medical, financial, personal, etc. These databases complete regular audits and require legal compliance (SOX, PCCI, GDPR, etc.). Audits require evidence to proves your data is safe and has not been tampered with by a malicious party. There has been a recent trend of increased digital data breaches, according to UpGuard.com a few recent data breaches include:
- CAM4 data breach of over 10 billion records.
- Yahoo data breach 2017 – 3 billion accounts data
- Aadhaar data breach – Over 1.1 billion data breach
Therefore, Microsoft Azure announced Azure SQL Database Ledger for data protection from any malicious activity by an attacker, including a high privileged user (Database Administrator or System Administrator). It also provides tamper-evidence capabilities so that you can cryptographically attest data to auditors and business parties. It can work as proof that your data has not been tampered with by unauthorized access.
Note: Azure SQL Database Ledger is currently available in the West Central US at the time of writing this article.
Azure SQL Database ledger
The Ledger stores historical data in a history table. The ledger records all changes and retrieves all of them in a chronicle order over time. It can contain both ledger and historical data. Azure manages these data transparently without any application-level changes. You can access the historical data for compliance, audit, forensics, or any other purpose.
The following diagrams give a high-level overview of Azure SQL Database Ledger.
In the Azure ledger database, each transaction is cryptographically hashed using the SHA-256 algorithm. The hash function has an input of the hash value of the transaction along with the previous transaction hash. As shown in the above image, the database ledger links all transaction blocks similar to a blockchain. Azure also uses trusted storage such as Azure Storage immutable blobs or Azure Confidential Ledger for storing the database digests. The latest block hash is known as database digest. Azure stores these digests periodically and represents a database statement at a given point in time. Since Azure stores these digests into separate storage, it prevents tampering from malicious activity. Users also choose to generate the database digest manually and store them in a safe and secure location of their choice. This is very useful in the ledger database to verify the ledger table / data has not been tampered.
Is a collection of system tables for storing the cryptographic hash of processed transactions. Whenever a transaction commits, the database ledger keeps the following information.
- SHA-256 hash of the modified row of ledger table
- Metadata of the transaction – Timestamp and user identity
Azure repeats this process every 30 seconds, and it forms a Merkle tree data structure in a blockchain format, as shown below.
Azure SQL Database Ledger functionality is available in two forms:
- Updatable ledger tables
- Append-only ledger tables
Updatable ledger tables
These tables are the system versioned table that allows users to update and delete operations while maintaining tamper-evidence capability. Once the user connects to Azure SQL Database Ledger and performs an update or delete transaction, Azure stores the previous versions of a row in the secondary history table. The history table has the same schema as the updatable ledger table. Therefore, the updated row is in the ledger table, and the earlier version is in the history table.
Each updatable ledger table must have the following GENERATED ALWAYS columns in a BIGINT data type.
- TRANSACTION_ID START: Transaction ID that created the row version.
- TRANSACTION_ID END: Transaction ID that deleted the row version.
- SEQUENCE_NUMBER START: Sequence number within a transaction that created a row version.
- SEQUENCE_NUMBER END: Sequence number within a transaction that deleted a row version.
Azure ledger database automatically creates a history table for the updateable ledger table. This table stores previous versions (historical value) of the modified or deleted row in the updatable edge table. Azure uses system-generated names for these tables and its naming convention is below:
These history tables with automatic or system-generated names are also known as anonymous history tables. You can also specify schema and history table name to avoid these system-generated names.
The diagram (above) shows that the ledger view joins both the updatable ledge table and the history table. Azure ledger database creates this view automatically for every updatable ledger table. You can use this view to report all data modifications since it has the recent data (ledger table) and historical data (history table). Users can retrieve the transaction timestamp, user identity and operation details (delete or insert statement). For example, we can view chronicled banking transaction history using the ledger view instead of writing your query or view.
- It belongs to the similar schema of the updatable ledge and history table.
- It has the following ALWAYS GENERATED columns of BIGINT data type.
- ledger_transaction_id: Transaction id that created or deleted a row version.
- ledger_sequence_number: Sequence number of an operation inside a transaction.
- ledger_operation_type_id: Value 0 for INSERT and 1 for DELETE operation.
- ledger_operation_type_id: Describes the operation based on the above column.
Append-only ledger tables
The Append-only ledger table is suitable for applications that do only inserts, such as Security and Event Management applications. You cannot perform DELETE or UPDATE transactions even using a high privileged account. These tables do not have any history table because they are non-modified (no update or delete) data. It has a corresponding view to get detailed information about the inserted row into the append-only tables.
It adds two default GENERATED ALWAYS columns of BIGINT data type.
- ledger_start_transaction_id: Transaction ID that created a row version.
- ledger_start_sequence_number: Sequence number of an operation inside a transactions
Azure SQL Database Ledger limitations
- We cannot disable ledger functionality once it is enabled on Azure SQL Database.
- As explained earlier, the updateable ledger table created additional GENERATED ALWAYS columns, and these columns are counted in the maximum number of columns in an Azure SQL Database table.
- It does not support In-Memory Tables with Sparse Columns.
- It does not support XML, SqlVariant, User-defined and FILESTREAM data types.
- You cannot convert an existing tables (not ledger-enabled) over to ledger tables.
- Ledger tables cannot be moved to a different schema or renamed.
- These tables cannot use a FILETABLE or Full-Text Index.
- You cannot drop any columns in the ledger table.
- The ledger table cannot have a non-clustered index (Rowstore) once it has a clustered Columnstore index.
This article gives you a high-level overview of Azure SQL Database Ledger (preview) along with its functionality and components. The ledger is an exciting feature that adds a valuable and essential element in the Azure database for critical and sensitive application data.
- You can go through the Microsoft official announcement on Tech Community for Azure SQL Database Ledger.
- The next tip in this series will deploy an Azure SQL Database Ledger database with updatable and append-only ledger tables. We will also demonstrate how you can audit data in a chronicle way using the ledger view. Stay tuned!
About the author
View all my tips
Article Last Updated: 2021-06-08