PostgreSQL Static Data Masking with DataVeil

Problem

During our development cycle, it is common to restore a production database to a development/test environment. This environment is used by developers to create and test application changes. At particular stages, the developers showcase the application to potential customers. Most of the time, these databases are a subset of production data, in order to have some meaningful data for testing. Unfortunately, having sensitive data in a non-production environment without proper protection creates a security risk. This problem is compounded when potential customers access an application containing sensitive data. This is potentially a breach of many privacy laws such as GDPR in Europe. How can we build our applications in development and test environments with production-like data while protecting sensitive data?

Solution

The solution is to use a data masking product able to retain the meaning of the sensitive data. This should be achieved without revealing or allowing reverse engineering of the production data. In these development and test environments, a static data masking solution is needed to protect the data.

DataVeil is a solution that offers multiple advantages over bigger and very complex products. DataVeil is a Static Data Masking tool that stands out for its simplicity and its approach to protecting sensitive information. While many enterprise tools may have massive platforms that require significant infrastructure, DataVeil is designed as a lightweight, high-performance client that interacts directly with your RDBMS.

So, the trick is to have production-like data in test and development databases for a realistic experience and data distribution. This will prevent potential theft of sensitive data in non-production environments. This also helps ensure compliance with legal requirements, such as GDPR in Europe. For example, GDPR clearly states that all sensitive data such as date of birth, medical or any information regarding religion, political or trade unions enrollment must be properly protected and should not be shared. With DataVeil, we can share these environments, even with external consultants, without the risk of exposing confidential or sensitive data.

DataVeil has many interesting features to help us meet compliance requirements and prevent data leakage. First, DataVeil supports multiple RDBMS platforms: PostgreSQL, SQL Server, Oracle and MySQL. As I mentioned above, it is a Static Data Masking product. DataVeil permanently overwrites the sensitive data instead of just encrypting it with TDE (Transparent Data Encryption). Let me put some emphasis on this point: all the data will be overwritten in the development or test environment. Another important feature of DataVeil is that it ensures that all foreign keys in the masked tables are respected. This means the masking cascades the masked values to all referencing tables, preserving referential integrity.

DataVeil Value

Let me point out these key technical DataVeil features that makes it really stand out:

  • Discovery Engine: DataVeil can scan your database schemas to find suspected sensitive data (names, emails, SSN, National ID, credit cards, etc.). It uses pattern matching and checksum validation (like Luhn for credit cards) to minimize false positives.
  • Format-Preserving Masking: This is a major strong point. It doesn’t just replace “John Smith” with “XXXXX.” It replaces it with another realistic name like “David Stirling,” ensuring that your test applications don’t crash due to invalid data formats.
  • Referential Integrity: In every application database, you likely have complex foreign key relationships. DataVeil ensures that if, for example, “User ID 123” is masked to “User ID 999″ in one table, it is also changed to ” User ID 999″ in the other related tables automatically to maintain the proper relationship.
  • Deterministic Masking: This is one of the best features. DataVeil supports both Deterministic and Non-Deterministic masking in a single project. Deterministic masking ensures that “John Smith” always masks to a value such as “David Stirling” across every database and table in your environment by using a shared seed key. Non-Deterministic instead does not provide this consistency across the database. It can be more secure because there is no repeatable relationship between the original and masked data.
  • Projects: DataVeil offers the ability to encapsulate all logic in a single entity and the ability to reuse it. Also, an important point is that the whole project is stored in XML format, with an encrypted seed. The seed is the key for deterministic data masking.
  • Scripting and Automation: In DataVeil we can add SQL scripts in order to perform actions before and after the data masking process. Also, macro scripting is available that can be used for repetitive masking needs for a significant time savings.
  • Custom Masks: In terms of customization and reusability of components, DataVeil also offers custom masks. This enables each organization to create its own personalized masks for its particular types of sensitive data that can be reused in different projects.
  • Reusability: A masking project can be reused on other databases or copies of the same or similar database. For example, the same schema may be used for multiple clients, or a similar schema may be stored on another database platform.

Enough said, let’s jump into a practical example to show how easy it is to set up DataVeil. We will use PostgreSQL for our example, but the steps are the same for other database platforms

DataVeil Discovery Engine

An easy way to set up DataVeil is to let it do the hard work for you. The Discovery feature can be a great help. First, we define a connection to the database that we need to mask using the Database menu and choosing Add Database Connection:

DataVeil Discovery Engine add database connection
DataVeil Discovery Engine add database connection

We setup our connection using the postgres superuser, as we need a login with admin privileges. As always, I am using the chinook test database that is open source and available for most RDBMS platforms.

DataVeil Discovery Engine configure database connection

We test the connection to ensure it is correct, then we click on Get Schema to connect to our database and retrieve the schema.

DataVeil Discovery Engine database objects

At this point, we can click on the Discovery tab.

DataVeil Discovery Engine configure project query

Next, we can click on Configure on the right of the interface to use some of the search patterns for sensitive data. We can also adjust the default parameters based on our needs. For example, I changed the sample limit from 1000 to 2000 rows. Since I know that I have only a few tables I also raised the Minimum Confidence % to 25.

If we notice that a search pattern is missing, we can add some patterns to the actual search query:

DataVeil Discovery Engine search options

If you have specific search needs that are not covered by the built-in search patterns, then you can click on the Custom Patterns tab. There, you can define your own search patterns that will best search your specific data and you can also define a default mask for each search pattern.

DataVeil Discovery Assessment

We are now ready to press the Search button and see what DataVeil finds:

DataVeil Discovery Engine classification of objects
DataVeil Discovery Engine peformance of objects

What we can see from these two DataVeil screenshots is that it was able to automatically discover and classify all columns with sensitive data in our database. Particularly interesting are the Confidence % column and the Data Hit and Data Negative values. These show examples of data that conform to the pattern classification rules and those that do not. Not too bad and all in a matter of seconds! Yes, it’s a small database, but it is very representative of what DataVeil can do!

We can now check the proposed masking in the Masking tab. If there are any false positives, we can remove them before pushing the masks to the project. In fact, DataVeil marked a table with a list of consecutive IDs that I used for previous tests as account numbers. This can be easily resolved by right-clicking on the table and selecting Remove from Result:

DataVeil Discovery masking confidence

Finally, add the masking configuration to the project:

DataVeil Discovery move discovery results to project
DataVeil Discovery move discovery results to project

Let’s now save the project. When we click the Save icon, we need to specify a Project Key to encrypt the seed and password:

DataVeil Discovery create project key

As I said above, the project concept is another strong point of DataVeil. As we have seen, it gives us the ability to encapsulate all logic in a single entity.

DataVeil Project Review and Configuration

We can now review our masking project in the Diagram and Masks Summary tabs and make any changes if needed:

DataVeil Project Review and Configuration

Column Level Masking

We can override project-level masking settings at the column level. For example, if the project default is for Deterministic masking, we can choose to override the LastName column setting to use Non-Deterministic masking:

DataVeil column level masking

Or we can customize the Deterministic setting for the column, if needed:

DataVeil column level masking

As I described above, the deterministic approach ensures that the masking will be consistent across all tables in the database. This same seed can also be used across multiple databases to maintain consistency. In this case Smith will be always masked as Stirling.

Each mask can be adjusted to suit its column’s data. For example, let’s look at the FirstName column:

DataVeil column level masking

This column was identified as a Person Given Name and it was configured with the appropriate mask. We can also modify the parameters, which is very powerful!

But that is not all we can do. We can add SQL to be executed before or after masking each value. We can also specify a Where condition to narrow down the data to the range that we want to mask.

We also have the Components tab where we can create masks, macros and datasets that can be reused in other projects. This extends DataVeil with our own custom features.

DataVeil column level masking

Manually Adding a Mask

I mentioned above that it is possible to manually configure masking for columns. Suppose we need to mask the Composer column in our database. We can manually add it to the project, mark the column as sensitive and use the proper data mask:

DataVeil manually adding a mask
DataVeil manually adding a mask

Run the DataVeil Project

Finally, we are ready to perform the actual data masking in our development or test environment. I just want to repeat that DataVeil is a static data masking tool. Once you run the project, all the data in columns marked as sensitive data with a mask will be permanently overwritten based on the masking logic. DataVeil projects should never be run in a production environment.

Run the DataVeil Project

Notice the Compile button on the top right of the screen? This is an option to verify the project is ready to run. If you are confident the project is set up correctly, we can skip it and go straight to the preview run:

Run the DataVeil Project

After being prompted for the project key, we have this reminder, and we can proceed with a quick preview:

Run the DataVeil Project

We can see a preview of all the masking that will be applied with just a few sample rows. When we are finally ready, we can begin the run. If you need to keep a copy of the original development or test database, please run a database backup before proceeding.

Run the DataVeil Project warning message

Review Masked Data

After the masking completes, we can review the masked data. For example, let’s take a look at the masked values in the Customer table:

DataVeil Review Masked Data

And compare it to the original values:

DataVeil Review Masked Data

I showed the original and masked data above as you would normally expect to view it using a third-party data browser. This has the obvious downside of having to go back and forth to view the corresponding original and masked data separately. Furthermore, you won’t see the masked data until you actually commit the masking to the database.

This is where DataVeil provides another very useful productivity feature: an integrated Data Browser. This has the advantage of immediately showing the original and masked data side-by-side for easy comparison. It is configurable so you can include or omit specific columns in the view, and you can also specify which rows or ranges are displayed. It also lets you review the masking in preview mode, before actually committing the changes to the database.

DataVeil Review Masked Data

So that’s it. We performed our static data masking with DataVeil and we are now protecting our sensitive data in development and test environments.

Reusability

One final item I would like to cover is reusability. DataVeil projects are reusable. The projects are saved as XML and can be run multiple times against different databases. If the database schema is the same, simply modify the connection settings and you can run the project.

Let’s work through an example using the Migrate Masks feature. Right-click the database icon in the project explorer tree and select the Migrate Masks option:

DataVeil reuse project masking settings

At this point, we can define a new database connection for our project. Please note that the database platform can be different as long as the schema is the same or similar. For example, we can reuse our project on a SQL Server or Oracle database. For this example, we will use PostgreSQL version 17 instead of version 18 that I have been using thus far:

DataVeil reuse project masking settings

We are presented with a list of all the sensitive columns and their corresponding columns in the new database:

DataVeil reuse project masking settings

We can choose to keep, change or ignore a column if there are schema differences. Once reviewed, we can press the OK button to deploy the DataVeil masks to the new database as shown below:

DataVeil reuse project masking settings

Masking another database on another version or platform is so easy and powerful!

Summary

DataVeil is a lightweight enterprise-grade static data masking tool designed to secure sensitive information by replacing it with realistic, functional, but fictitious data. Unlike dynamic masking, DataVeil permanently alters the data of your database, making it safe for use in non-production environments or for sharing with third parties.

DataVeil offers a centralized GUI and better performance for massive datasets compared to other PostgreSQL extensions used for data anonymization. Furthermore, DataVeil offers robust auditing and compliance that is critical for GDPR, HIPAA, and PCI-DSS.

In my opinion what really makes DataVeil stand out are:

  • The Deterministic masking approach, yielding consistent masking across projects and deployments
  • A streamlined GUI to build and manage projects across mainstream database platforms
  • Time savings with the Discovery engine to identify sensitive data and suggest suitable masks
  • Project reusability, custom masks and mask reuse on additional platforms
  • Maintaining foreign key relationships and statistical cardinality consistent with the original database

Next Steps

The obvious next step is to download DataVeil and try it. A free Community license is available, so here are all the links to the product site and documentation:

Here are some additional articles about DataVeil for Microsoft SQL Server:

Leave a Reply

Your email address will not be published. Required fields are marked *