Using PowerShell for SQL Server Replication Tracking

By: Tim Smith | Comments | Related: > Replication

Problem

We're trying to identify effective ways to track SQL Server snapshot and transactional replication counts for sources and destinations. We want reports that provide information when there may be problems, while reducing the amount of "wolf cries" in the data set.

Solution

SQL Server replication monitoring can generate too much noise, depending on the amount of articles that are replicated. In some cases, we may expect that we'll find differences between source and destination articles - for instance, a table with four hundred million records may at any given moment have a count difference of at least one percent of the full table. In the case of snapshot replication, we would want to know of miscounts after the snapshot should have been completed, whereas more than likely in the case of transactional replication we want to know if the behavior of any given article is off. Let's look at evaluating a mismatch using the RowCount property provided by management objects:

Function Loop-ReplicationTables {
    Param(
        [ValidateScript({Test-Path $_})][string]$smolibrary
        , [ValidateLength(4,30)][string]$articleserver
        , [ValidateLength(4,30)][string]$articledatabase
        , [ValidateLength(5,50)][string]$article
    )
    Process
    {
        $nl = [Environment]::NewLine
        Add-Type -Path $smolibrary

        $srv = New-Object Microsoft.SqlServer.Management.SMO.Server($articleserver)

        [int]$reptab_rwcnt = $srv.Databases["$articledatabase"].Tables["$article"].RowCount
        
        if ($reptab_rwcnt -ne $null)
        {
            return $reptab_rwcnt
        }
    }
}

$smo = "C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.SqlServer.Smo.dll"

Loop-ReplicationTables -smolibrary $smo -articleserver "" -articledatabase "" -article ""

Let's look at an example of this for a source and destination server with the same article name:

$smo = "C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.SqlServer.Smo.dll"

[int]$src = Loop-ReplicationTables -smolibrary $smo -articleserver "SourceServer\Instance" -articledatabase "SourceDatabase" -article "tbArticle"
[int]$dest = Loop-ReplicationTables -smolibrary $smo -articleserver "DestinationServer\Instance" -articledatabase "DestinationDatabase" -article "tbArticle"

if ($src -ne $dest)
{
  Write-Warning "Source count doesn't match destination count."
}

At this point, we only have a script that warns us if the counts don't match, by using .NET's property RowCount. This property does not necessarily match the actual row count that you may receive when running SELECT COUNT(*), though after testing it on 100,000+ cases, the statistical difference for alerting varies less than 2% of the time compared to the COUNT(*) method. It also runs much faster, as it pulls the property instead of running against the table, meaning that several hundred articles can be done in less than a minute compared to ten or twenty minutes for the COUNT(*) method. Still, developers should be warned that this won't provide the latest information; .NET's advantage is speed.

This would be useful for snapshot replication where we check the counts after the time required to replicate the articles. What about behavioral tracking of transactional replication? In this latter case, we want to know what the average and standard deviation miscounts we get every time we check the counts (think "tendency"). Initially this may provide us with inaccurate information because we don't have a history, however, as we track these miscounts, we'll be able to catch the outliers easier (three multiplied by the standard deviation, then in addition to the average generally). This can be a huge time saver in environments where we have thousands of articles being replicated transactionally, as we can report what's wrong, instead of a count of every article (the latter being a popular, yet noisy approach).

For this, we'll save the source and destination servers, source and destination objects (database and article names) and the count difference along with the timestamp of that measurement (for rolling historic changes if applicable):

Function LoopSave-TransReplication {
    Param(
        [ValidateLength(4,30)][string]$source_server
        , [ValidateLength(4,30)][string]$source_db
        , [ValidateLength(4,30)][string]$destination_server
        , [ValidateLength(4,30)][string]$destination_db
        , [ValidateLength(4,30)][string]$article          
    )
    Process
    {
        $smo = "C:\Program Files (x86)\Microsoft SQL Server\110\SDK\Assemblies\Microsoft.SqlServer.Smo.dll"

        [int]$src = Loop-ReplicationTables -smolibrary $smo -articleserver $source_server -articledatabase $source_db -article $article
        [int]$dest = Loop-ReplicationTables -smolibrary $smo -articleserver $destination_server -articledatabase $destination_db -article $article
        [int]$smosaveddiff = ($src - $dest)

        $smo_srcob = "$source_server.$article"
        $smo_dstob = "$destination_db.$article"

        $smo_savehistory = "INSERT INTO tbTransRepMismatchHistory (SourceServer,SourceObject,DestinationServer,DestinationObject,CountDifference,MismatchDate) VALUES ('$source_server','$smo_srcob','$destination_server','$smo_dstob',$smosaveddiff,GETDATE())"

        ### Either Invoke-SqlCmd or Execute-Sql to save this information to the table tbTransRepMismatchHistory
    }
}

The above provides an example of how we can use a historic table for behavioral tracking in order to reduce the amount of miscounts we receive on our reports; for instance:

SELECT 
	t.SourceServer
	, t.SourceObject
	, t.DestinationServer
	, t.DestinationObject
	, AVG(t.CountDifference) AverageDiff
	, STDEV(t.CountDifference) StandardDeviationDiff
	, ABS(AVG(t.CountDifference)+(3*STDEV(t.CountDifference))) Outlier
FROM tbTransRepMismatchHistory t
GROUP BY t.SourceServer, t.SourceObject, t.DestinationServer, t.DestinationObject

This helps us derive the outlier for each article replicated; if the absolute value of a given mismatch is greater than the outlier (if we have enough history), more than likely, we're experiencing an issue.

Next Steps

With snapshot replication, timing matters. Getting the source count before the replication and obtaining the destination count after the snapshot replication will help identify possible issues.
Transactional replication that reduces noises on possible mismatches requires some history before being able to accurately identify issues related to miscounts.

About the author

Tim Smith works as a DBA and developer and also teaches Automating ETL on Udemy.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Using PowerShell for SQL Server Replication Tracking

Problem

Solution

Next Steps

About the author

Comments For This Article