PowerShell Scripts for Obtaining File Metadata


By:   |   Updated: 2017-08-30   |   Comments   |   Related: More > PowerShell


Problem

We want to obtain and save our files metadata for tracking files with data that we import. We've seen errors with failed file archives and faced importing multiple files, when the import should have only been one file. We've also seen issues where a file was dropped later than normal, which affected our end users. In other cases, we have an import that executes too early, as the file didn't have a confirmation line at the end of it. With PowerShell, what are some techniques for saving, retaining and tracking these data?

Solution

In this tip, we'll look at a few pieces of metadata that we can obtain from files with PowerShell: date information, size information, and count information. Using these techniques will make it easier for tracking other metadata if necessary for additional purposes.

With PowerShell, we can obtain some basic metadata for files, such as the creation time, last modified time, and size information. Later in this tip, we'll also look at extracting some data from within files, as some ETL files will "write" a last line indicating the file is complete. In the below code sample, we return some of this basic metadata information and we obtain the time values in UTC as well:

$file = Get-Item "C:\ETLFiles\Import\1.txt"

$file.CreationTime
$file.LastWriteTime
$file.CreationTimeUtc
$file.LastWriteTimeUtc
$file.IsReadOnly

### Denominator determining what measurement value is returned, such as Kb for Kilobytes
($file.Length)/(1Kb)
($file.Length)/(1Mb)
($file.Length)/(1Gb)

We can use this information to retain and\or filter the appropriate files. For example, if we import a file daily and the file fails to be archived, a filter on creation date could filter the old file so we avoid importing the file again. Another example, the time of the file being created or modified helps us identify when we expect to have data imported, transformed and sent, if we have users dependent on the data.

In situations where we must have data imported by a time, it may help to retain a record of past import create and modified times. If we notice a pattern of later imports, it may have reasons such as data growth and we may want to adjust users' expectations. If we experienced it just once, we may not be concerned. The below code retains these metadata in a log file, though we can also import it to a table if needed.

Function Get-FileData {
    Param(
        [Parameter(Mandatory=$true)][string]$file
        , [Parameter(Mandatory=$true)][string]$savefile
    )
    Process
    {
        $filedata = Get-Item $file
        $save = $filedata.CreationTimeUtc.ToString() + "|" + $filedata.LastWriteTime.ToString() + "|" + ($file.Length)/(1Kb).ToString()
        Add-Content $savefile $save
    }
}

Get-FileData -file "C:\ETLFiles\Import\1.txt" -savefile "C:\ETLFiles\log.txt"

Another problem we may experience is when a write process hasn't finished. In some environments, another file is created, allowing us to validate its existence. In other cases, a final line is written indicating the file is complete. The data indicating completion is within the file, so the below code validates whether this exists and returns true if this is the case.

Function Validate-File {
    Param(
        [Parameter(Mandatory=$true)][string]$file
        , [Parameter(Mandatory=$true)][string]$validtext
    )
    Process
    {
        $readfile = New-Object System.IO.StreamReader($file)
        [bool]$return = $false

        while ($line = $readfile.ReadLine())
        {
            if ($line.ToLower() -like "*$validtext*") 
            { 
                $return = $true
            }
        }
        $readfile.Dispose()
  return $return
    }
}

Validate-File -file "C:\ETLFiles\Import\1.txt" -validtext "finish writing"

Wrapping this around the import function, we could validate the final line, then import the file:

if (Validate-File -file "C:\ETLFiles\Import\1.txt" -validtext "finish writing")
{
    Write-Host "Beginning import ..."
 AutoImportCommaFlatFiles -location "C:\ETLFiles\Import\" -file "1" -extension ".txt" -server "OURSERVER\OURINSTANCE" -database "StagingDatabase" 
}
Next Steps
  • In this tip, we looked at some ways to obtain and save information about a file prior to importing the file.
  • Before importing a file with PowerShell, we can obtain the file's information that allow us to filter whether it's been complete, what time it was created and modified, and it's overall size.
  • To ensure that our data schedule is correct, tracking when files are complete may help, especially when data volume can grow and impact import times.


Last Updated: 2017-08-30


get scripts

next tip button



About the author
MSSQLTips author Tim Smith Tim Smith works as a DBA and developer and also teaches Automating ETL on Udemy.

View all my tips
Related Resources





Comments For This Article





download





Recommended Reading

Setting the PowerShell Execution Policy

How to find a specific text string in a SQL Server Stored Procedure, Function, View or Trigger

Using PowerShell to Work with Directories and Files

Bulk Copy Data from Oracle to SQL Server

Execute SQL Server Stored Procedures from PowerShell








get free sql tips
agree to terms


Learn more about SQL Server tools