Retaining File Information for ETL Security Analysis

By:   |   Comments   |   Related: > Security


We've witnessed possible infiltrations against a few of our ETL servers and some of our files may have experienced data additions or loss with suspicious timeframes in the file's metadata. From looking at the timestamps of the files, these appear to be the same, but the actual size has changed.

In another case, the timestamp of the file appears to be earlier than our original backups. We have detected that some of these infiltrations affected the data we imported, but also some of our configuration files seem to have been tampered. Some of our team thinks that the information in the files may have been changed, but the timestamps of the files were updated or changed to match what we've historically seen with the files so that we wouldn't catch a change.

We're now considering retaining this information for security audits, as we may have experienced an internal compromise or an infiltration.


For the context of this tip, we'll be looking at retaining security information as it relates to ETL files, servers and configuration files that function with this architecture. Many of the points in this tip apply to other architecture; infiltrators may have just as much interest in a web server, as they do in an ETL server. With compromises happening on a regular basis, we should always be prepared and consider options we have for auditing changes and tracking these changes over time. When we consider our architecture, keep in mind that infiltrators may be interested in data, algorithms, architecture, or what we're looking for within our data.

We'll look at retaining metadata from files in SQL Server with some considerations on how we should think about security and file's information, if we have a situation where someone may want to edit a file while making it appear that the file has been unchanged. Using PowerShell as an example of editing metadata, we can edit the update the file create and last write time, which can affect filters.

In the below sample code, I demonstrate this by creating 2 new files, editing the create and last write time of one file, and then applying a time-based filter.

if (!(Test-Path "C:\Import\fileconsistent.txt")) { New-Item "C:\Import\fileconsistent.txt" -ItemType File }
if (!(Test-Path "C:\Import\fileinconsistent.txt")) { New-Item "C:\Import\fileinconsistent.txt" -ItemType File }

$samefile = Get-ChildItem "C:\Import\fileconsistent.txt"
$changefile = Get-ChildItem "C:\Import\fileinconsistent.txt"
$changefile.CreationTime = (Get-Date).AddDays(-301)
$changefile.LastWriteTime = (Get-Date).AddDays(-301)
$changefile.LastAccessTime = (Get-Date).AddDays(-301)

### Note that your timestamps will be different because of the date execution

Create and change metadata of files
File metadata in properties

The properties of CreateTime and LastWriteTime allow us to set them, just like we can get the values.

If we apply a filter based on time to get all the files created in the last 3 days, only fileconsistent.txt returns:

$filterimportfiles = Get-ChildItem "C:\Import\" -Filter *.txt

foreach ($filterfile in $filterimportfiles | Where-Object {$_.CreationTime -gt (Get-Date).AddDays(-3)})
Filter files by time

Things to Note

Consider that it is entirely possible for someone to edit a file by removing data and adding similar data with the same size, while re-stamping the time data with the information before the file was edited. A colleague of mine once shared a story where a data vendor updated a file post mistake, but made it look like no update had ever happened, even though the team had already imported the file and had the erroneous data. Since we know this is possible, we can look at ways to mitigate this from being an issue, outside of using third party software which may not be available depending on the company or situation.

What files may be at risk?

For any security scenario, any file could be at risk, as an infiltrator may use an existing file to store information that a corrupt service or application uses, so that a user doesn't identify any new file outside of a service. In other situations, we may be concerned about a set of files, whether due to security or due to disputes - for an example, configuration files may be a concern. Files that are a part of source control, such as source code files, may be less of a concern because we can sync the time of a check-in, check-out or update within source control to the time of the file (a configuration file being changed in source control, then a deployment after the change, but a update time for the configuration file of 387 days before the last deployment).

Evaluate the pros and cons of our existing software for security.

We may use software that tracks this information, such as source control which tells us the history of some of this information for files used in deployments (for environments that use source control). Even with tracking software, it may not raise alerts or concerns if an anomaly is spotted, such as the example above where a configuration file was changed in source control, with a deployment that followed, but an update time for the configuration file of 387 days before the last deployment. If an infiltration occurs by adding keys to a configuration file, which are used for a corrupt service or application even though the keys appear normal, a software tool like source control reflecting an inconsistent time frame for the file won't raise alerts unless we're looking for this. When considering software, we should keep this in mind, as some may allow us to track this information or automate tracking this information.

Tracking metadata information over time with our own custom retention and history.

We'll retain the file metadata by tracking the create, update and access time of the file along with the file's length - which uses the System.IO library and reports in bytes.

I caution people to be careful about "rounding up" to the nearest byte, as the below two images show - I added 13 bytes of data to an existing file, but on the file system, the file is reported as 1 kilobyte. As we see from the PowerShell output, the file's original size was 174 bytes, then 187 bytes. If I had tracked information to the nearest kilobyte, I would have missed this addition. Infiltrators may use existing files to plant configuration information or key-value pair information and changing an existing value may not be an option; they may try to add a small value that doesn't raise concerns.

Before and after data addition

By retaining the time and length information, we're combining both a concern of time information being changed with possible additions or updates to data within the files themselves. How often we run this script to track this information should be determined by how high the risk is of an infiltration. In addition, this provides us with a useful audit with other software to spot inconsistencies. With this information, we can write SQL queries (shown also in the below code) to alert when we find unusual patterns.

In the example T-SQL query below, we look for files that have the same create, update and access time over a period of time, while having a slight shift in the number of bytes. This may be indicative of someone keeping the original time information of a file to remain unspotted while adding data to a file.

### PowerShell: in this example, we only look at the below folder for text files:
$filtiles = Get-ChildItem "C:\import\" -Filter *.txt

foreach ($file in $filtiles)
    $insert = "INSERT INTO tbFlAdt VALUES ('" + $file.FullName + "','" + $file.CreationTime + "','"+ $file.LastWriteTime + "','"+ $file.LastAccessTime + "','"+ $file.Length + "',GETDATE())"
    Invoke-Sqlcmd -server "OurServer" -database "OurDatabase" -query $insert

--T-SQL example table and query

 FileFullName VARCHAR(100),
 CreationTime DATETIME,
 LastWriteTime DATETIME,
 LastAccessTime DATETIME,
 SizeInBytes BIGINT,


 , t.SizeInBytes
FROM tbFlAdt t
 INNER JOIN tbFlAdt tt ON t.FileFullName = tt.FileFullName
  AND t.CreationTime = tt.CreationTime
  AND t.LastWriteTime = tt.LastWriteTime
  AND t.LastAccessTime = tt.LastAccessTime
  AND t.SizeInBytes <> tt.SizeInBytes
Result of query

After changing the file to 55 bytes, saving the data, then changing the file to 61 bytes and adding the data while updating the data to retain the same date and time information (simulating a compromise - changing then updating the date metadata), we can see the results.

 In the above example query, we see that the file's time matches on create, update and access time, while the length of the file does not (fileinconsistent.txt). Sophisticated infiltrators may automate ways to mirror the original metadata of the file, such as keeping the timestamps identical. But during a compromise where an infiltrator accesses a file to write, this action may be caught during an audit.

In other cases, an infiltrator may commit an oversight, which is caught in an audit. For this reason, when we schedule these audits will also be important, if we determine they are necessary for our environment. If we're concerned about data size, we can evaluate some data removal when the data repeat, or when nothing suspicious appears to happen - only keeping when changes occur that are validated such as a creation time of January 1st, a last write time of February 2nd and February 19th.

Next Steps
  • Consider what servers may be at risk along with what information or architecture the infiltrator may want. When you consider this, you may also evaluate changing your design. For an example, with some ETL processes, the infiltrator seeks algorithms, not data.
  • With security, use a combination of techniques and evaluate mixing up their use. Like the game of chess, the most predictable patterns of strategy are usually the least successful, as infiltrators will adapt over time to predictable patterns.

sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip

About the author
MSSQLTips author Tim Smith Tim Smith works as a DBA and developer and also teaches Automating ETL on Udemy.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Comments For This Article

get free sql tips
agree to terms