Learn more about SQL Server tools

 
 

Tutorials          DBA          Dev          BI          Career          Categories          Events          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Events

DBA    Dev    BI    Categories

 

Parsing all the files in a directory using PowerShell


By:   |   Read Comments (3)   |   Related Tips: More > PowerShell

Problem

I want to parse all the files in a given directory using PowerShell. I'm looking for a particular bit of information and need to search for it in all the files. Is there an easy way to do this?  Check out this tip to learn more.

Solution

Yes, there is. PowerShell is very powerful because it:

  • Handles everything as objects.
  • Supports regular expressions.
  • Has proper looping structures.

As a result, looping through a list of files to find a particular text string (or several text strings) is easy to do. For instance, consider the output from this previous tip where we audited for the members of the sysadmin role across multiple SQL Servers. This PowerShell script produces a text file per SQL Server instance. This is the perfect scenario for parsing for a particular group, such as BUILTIN\Administrators.

First, let's start by defining a variable containing a path to our directory as well as another variable, an object that is an array list (this is what New-Object System.Collections.ArrayList does), to hold our findings.

$fileDirectory = "c:\scripts\reports";
$parse_results = New-Object System.Collections.ArrayList;

Now we'll need a foreach loop combined with a Get-ChildItem cmdlet call to get a list of all the files in the directory.

# Use a foreach to loop through all the files in a directory.
# This method allows us to easily track the file name so we can report
# our findings by file.
foreach($file in Get-ChildItem $fileDirectory)
{
# Processing code goes here
}

This part so far is pretty straight forward. Now, to be able to parse the files, we will want to use the switch command. The switch command in PowerShell is similar in function to the same command in other languages. You can think of it as being capable of stacking multiple IF statements together.  This is what goes where we have the comment "# Processing code goes here" in the previous block.

# We will need to tell the Switch command exactly where to parse, so we'll put together
# the full file path.
$filePath = $fileDirectory + "\" + $file;
# parse all files using a regular expression
Switch -regex (Get-Content -path $filePath)
{
# send the counter to $null so it doesn't display on screen
'BUILTIN\\Administrators' { $parse_results.add($file.name + " > " + $switch.current `
+ "`r`n") > $null }
}

Note that for each pattern of a regular expression in single quotes followed by the curly braces, we have an evaluation. In this case I'm only looking for one case, when BUILTIN\Administrators is present (the backslash is doubled since we're using a regular expression and the backslash is an escape character). If I was looking for Users, I could add another line to parse this as well.

As to exactly what is being done, when the Switch statement detects a line that matches the condition I've specified, the line contains BUILTIN\Administrators, it's adding another entry to my array list. The entry will be a concatenated string of the file name where the text was detected,  along with the entire line (that's what $switch.current refers to).  Once all this is done, all that's left to do is write out what was captured. That's actually the reason for the "`r`n" added to the end of each string. That puts a carriage return/new line at the end of the string so that it'll output properly.

Our finished script looks like this:

 

$fileDirectory = "c:\scripts\reports";
$parse_results = New-Object System.Collections.ArrayList;

# Use a foreach to loop through all the files in a directory.
# This method allows us to easily track the file name so we can report
# our findings by file.
foreach($file in Get-ChildItem $fileDirectory)
{
# We will need to tell the Switch command exactly where to parse, so we'll put together
# the full file path.
$filePath = $fileDirectory + "\" + $file;
# parse all files using a regular expression
Switch -regex (Get-Content -path $filePath)
{
# send the counter to $null so it doesn't display on screen
'BUILTIN\\Administrators' { $parse_results.add($file.name + " > " + $switch.current `
+ "`r`n") > $null }
}
}

write-host $parse_results;

And the output will look like this for a couple of files that I have in the directory:

localhost,5555_sysadmins.txt >  BUILTIN\Administrators
localhost_sysadmins.txt >  BUILTIN\Administrators

 

Next Steps


Last Update:





About the author
MSSQLTips author K. Brian Kelley K. Brian Kelley is a SQL Server author and columnist focusing primarily on SQL Server security.

View all my tips
Related Resources





More SQL Server Solutions




Post a comment or let the author know this tip helped you.

All comments are reviewed, so stay on subject or we may delete your comment.

*Name    *Email    Notify for updates 

Note: your email address is not published. Required fields are marked with an asterisk (*)


Get free SQL tips:

*Enter Code refresh code     



Thursday, January 23, 2014 - 11:23:51 AM - drew.georgopulos@yahoo.com Back To Top

How could we split a 30 gig Unix text file into N text files split on a line width of 1700 characters?

Thanks


Wednesday, February 06, 2013 - 10:24:18 AM - snorkel Back To Top

You should probably point out that this method is going to consume a large amount of system memory if ran on a large directory, which if you find yourself writing a script to parse files in a directory, chances are you doing it because the directory is huge.

 

foreach($file in Get-ChildItem $fileDirectory) is going to grab the file object for every file in that directory and store it in memory prior to entering the for loop.  And since it returns an entire object containing a large amount of data per file this can build up to several GB of data in a hurry.  
Unfortunately, to my knowledge, there is no native to Powershell way around this.  What I've done in the past is execute a dir of the directory that I am interested in and pipe the results to a txt file . I then loop through that text file in Powershell so that I can do a get-childitem on a single file at a time.

Wednesday, August 22, 2012 - 6:44:06 PM - Laerte Junior Back To Top

Too Much Code. Try :

Select-String -Path 'c:\scripts\reports\*.*' -Pattern 'BUILTIN\\Administrators' -AllMatches


Learn more about SQL Server tools