PowerShell Scripts for Obtaining File Metadata


By: Tim Smith   |   Related Tips: More > PowerShell



Problem

We want to obtain and save metadata about the files whose data we import so that we can track them. We've seen errors where a file archive failed and we ended up importing multiple files when only one should have been imported. We've also seen issues where a file was dropped later than normal, which affected our end users. In other cases, an import executes too early because the file didn't yet have its confirmation line written at the end. With PowerShell, what are some techniques for saving, retaining and tracking this metadata?

Solution

In this tip, we'll look at a few pieces of metadata that we can obtain from files with PowerShell: date information, size information, and count information. Using these techniques will also make it easier to track other metadata if we need it for additional purposes.

With PowerShell, we can obtain some basic metadata for files, such as the creation time, last modified time, and size. Later in this tip, we'll also look at extracting some data from within files, as some ETL files will "write" a last line indicating the file is complete. In the below code sample, we return some of this basic metadata, including the time values in UTC:

$file = Get-Item "C:\ETLFiles\Import\1.txt"

$file.CreationTime
$file.LastWriteTime
$file.CreationTimeUtc
$file.LastWriteTimeUtc
$file.IsReadOnly

### The denominator determines which unit is returned, such as 1Kb for kilobytes
($file.Length)/(1Kb)
($file.Length)/(1Mb)
($file.Length)/(1Gb)

We can use this information to retain and/or filter the appropriate files. For example, if we import a file daily and the file fails to be archived, a filter on creation date can exclude the old file so we avoid importing it again. As another example, the time a file was created or modified helps us identify when we can expect data to be imported, transformed and sent, if we have users who depend on that data.
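A rough sketch of that first example, assuming the same import folder used above and a daily cadence: we keep only files created today, so a stale file left behind by a failed archive isn't imported again.

### Only pick up files created today; anything older was likely left behind by a failed archive
$importfiles = Get-ChildItem "C:\ETLFiles\Import\" -Filter "*.txt" | Where-Object { $_.CreationTime -ge (Get-Date).Date }

foreach ($importfile in $importfiles)
{
    Write-Host ("Importing " + $importfile.FullName)
}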

In situations where we must have data imported by a certain time, it may help to retain a record of past import create and modified times. If we notice a pattern of imports arriving later and later, there may be underlying causes such as data growth, and we may want to adjust users' expectations; if it happened only once, we may not be concerned. The below code retains this metadata in a log file, though we could also write it to a table if needed (a sketch of the table option follows the example call).

Function Get-FileData {
    Param(
        [Parameter(Mandatory=$true)][string]$file
        , [Parameter(Mandatory=$true)][string]$savefile
    )
    Process
    {
        $filedata = Get-Item $file
        ### Pipe-delimited line: creation time (UTC), last write time, size in kilobytes
        $save = $filedata.CreationTimeUtc.ToString() + "|" + $filedata.LastWriteTime.ToString() + "|" + (($filedata.Length)/1Kb).ToString()
        Add-Content $savefile $save
    }
}

Get-FileData -file "C:\ETLFiles\Import\1.txt" -savefile "C:\ETLFiles\log.txt"
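If we'd rather write this metadata to a table than to a flat log file, one option is Invoke-Sqlcmd. The sketch below is only illustrative; the FileMetadataLog table and its columns are assumptions, not objects from this tip, so adjust the names to your own schema:

Function Save-FileDataToTable {
    Param(
        [Parameter(Mandatory=$true)][string]$file
        , [Parameter(Mandatory=$true)][string]$server
        , [Parameter(Mandatory=$true)][string]$database
    )
    Process
    {
        $filedata = Get-Item $file
        ### FileMetadataLog is a hypothetical logging table - replace with your own table and columns
        $insert = "INSERT INTO FileMetadataLog (FileName, CreationTimeUtc, LastWriteTimeUtc, SizeKb) VALUES ('" +
            $filedata.Name + "','" +
            $filedata.CreationTimeUtc.ToString("yyyy-MM-dd HH:mm:ss") + "','" +
            $filedata.LastWriteTimeUtc.ToString("yyyy-MM-dd HH:mm:ss") + "'," +
            ($filedata.Length/1Kb) + ")"
        Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $insert
    }
}

Save-FileDataToTable -file "C:\ETLFiles\Import\1.txt" -server "OURSERVER\OURINSTANCE" -database "StagingDatabase"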

Another problem we may experience is a write process that hasn't finished. In some environments, another file is created once the write completes, allowing us to validate its existence (a sketch of that check follows the function below). In other cases, a final line is written inside the file indicating it is complete. The data indicating completion lives within the file, so the below code checks whether that text exists and returns true if it does.

Function Validate-File {
    Param(
        [Parameter(Mandatory=$true)][string]$file
        , [Parameter(Mandatory=$true)][string]$validtext
    )
    Process
    {
        $readfile = New-Object System.IO.StreamReader($file)
        [bool]$return = $false

        ### Read to the end of the file; an empty line would otherwise end the loop early
        while (($line = $readfile.ReadLine()) -ne $null)
        {
            if ($line.ToLower() -like "*$validtext*")
            {
                $return = $true
            }
        }
        $readfile.Dispose()
        return $return
    }
}

Validate-File -file "C:\ETLFiles\Import\1.txt" -validtext "finish writing"
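For the other scenario mentioned above, where the writing process drops a separate file once it finishes, a simple existence check is enough. This is only a sketch; the "1.txt.done" companion file name is an assumed convention, not something defined in this tip:

### Assumes the writer creates a companion "1.txt.done" file when the write completes - adjust to your convention
if (Test-Path "C:\ETLFiles\Import\1.txt.done")
{
    Write-Host "Write process finished - safe to import."
}
else
{
    Write-Host "Completion file not found - holding off on the import."
}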

Wrapping this around the import function, we could validate the final line, then import the file:

if (Validate-File -file "C:\ETLFiles\Import\1.txt" -validtext "finish writing")
{
    Write-Host "Beginning import ..."
    AutoImportCommaFlatFiles -location "C:\ETLFiles\Import\" -file "1" -extension ".txt" -server "OURSERVER\OURINSTANCE" -database "StagingDatabase"
}

Next Steps
  • In this tip, we looked at some ways to obtain and save information about a file prior to importing the file.
  • Before importing a file with PowerShell, we can obtain information about the file that allows us to check whether it's complete, what time it was created and modified, and its overall size.
  • To ensure that our data schedule is correct, tracking when files are complete may help, especially when data volume can grow and impact import times.


About the author
Tim Smith works as a DBA and developer and also teaches Automating ETL on Udemy.
