Monitoring SQL Server Transactional Replication Latency via PowerShell and RMO

By:   |   Comments (10)   |   Related: > Replication


Problem

I have many SQL Server transactional replication sessions to monitor, for some implementations, we have one strict requirement, i.e. the latency should be less than 90 seconds, otherwise, there may be timeout errors inside front-end application. Is there a good way to monitor the health of our SQL Server replication implementation?

Solution

There are many ways to monitor replication health, such as checking dbo.MSRepl_Errors table in the distribution database, check various replication SQL Server Agent Jobs (distribution jobs, log reader job, etc.), using various replication stored procedures, such as sp_replcounter, sp_repltrans and others, etc. Actually if you Google "Monitoring SQL Server Transactional Replication", you will see numerous links.

For replication latency monitoring, the best way is probably via a tracer token.

In this tip, I will use Replication Management Object (RMO) and PowerShell to send trace tokens on the publisher and then measure the latency of the token as it is sent from publisher to distributor and then from the distributor to the subscriber. All the latency data can be saved to a table for further analysis and processing. The latency value can be a good indicator for replication health.

There are two advantages with this RMO/PowerShell approach:

  1. High scalability: we can monitor multiple replication systems easily and quickly
  2. Low configuration requirement: only a list of distributor names is needed, so the solution is efficient with minimum administrative overhead

Solution Explanation

  1. With distributor names, loop through each distributor and enumerate the publishers that use the distributor
  2. For each publisher, check whether it has subscribers, if it has subscribers, enumerate the publication databases on this publisher
  3. For each publication database which has publication(s), enumerate its publications
  4. For each publication, post a tracer token
  5. Wait for ($threshold) seconds as defined in $threshold input parameter
  6. Same steps as 1, 2, 3
  7. For each publication, enumerate its tracer tokens
  8. For each tracer token, enumerate its history
  9. Log the history data into a datatable, and return the datatable
#version=11.0.0.0 means sql server 2012 client sdk installed
add-type -AssemblyName "Microsoft.SqlServer.Rmo, version=11.0.0.0, Culture=Neutral, PublicKeyToken=89845dcd8080cc91"
add-type -AssemblyName "Microsoft.SqlServer.Replication, version=11.0.0.0, Culture=Neutral, PublicKeyToken=89845dcd8080cc91";
add-type -AssemblyName "Microsoft.SqlServer.ConnectionInfo, version=11.0.0.0, Culture=Neutral, PublicKeyToken=89845dcd8080cc91";

#Import-Module sqlps -DisableNameChecking;
#Set-Location c:
<#
.SYNOPSIS
Test the transactional replication latency. Supports SQL Server versions 2005 and above.
Assume the distribution agent is running continuously.

.PARAMETER Distributors
One or more distributor sql instance names. Default value is the current server name where the cmdlet starts

.PARAMETER Threshold
The wait time in seconds, after tracer tokens are posted, but before the cmdlet tries to check whether the tracer token has arrived on subscriber
The default value is 15 (seconds)

.EXAMPLE
Test-ReplicationLatency -Distributor server1, server2 -threshold 10

Check the transactional replication latency on all publications of the publishers whose distributors are server1 or server2

.NOTES 
Author: 		Jeffrey Yao
Requires: 		PowerShell Version 3.0, SQL Server SMO, 


.OUTPUTS
A DataTable that contains distributor, publisher, subscriber and latency information

#>
Set-StrictMode -Version 3;

Function Test-ReplicationLatency
{   [CmdletBinding()]
    param (
        [Parameter(Mandatory=$false, ValueFromPipeLine=$true, Position=0)]
        [string[]]$Distributors=$env:ComputerName,

        [Parameter(Mandatory=$false, Position=1)] [int]$Threshold=15  #wait for $Threshold seconds, default 15 seconds, before retrieving the tracer token status
    )

    [string[]]$distributor_list= $Distributors;


    $dt = New-Object "system.data.datatable";
    $col = new-object "system.data.DataColumn" ('TokenID', [System.String]);
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('PublisherCommitTime', [System.DateTime]);
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('Distributor_Latency', [System.Int16]);
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('Subscriber', [System.String]);
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('Subscriber_DB', [System.String]);
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('Subscriber_Latency', [System.Int16])
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('Overall_Latency', [System.Int16])
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('Publisher', [System.String]);
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('Publisher_DB', [System.String]);
    $dt.Columns.Add($col);

    $col = new-object "system.data.DataColumn" ('Publication', [System.String]);
    $dt.Columns.Add($col);


    $col = new-object "system.data.DataColumn" ('Distributor', [System.String]);
    $dt.Columns.Add($col);


    $col = new-object "system.data.DataColumn" ('Threshold', [System.Int16]);
    $dt.Columns.Add($col);


    foreach ($dist in $distributor_list)
    { try 
        {
            #[string]$distributor = $dist; #'VHOL1415960\sql2014' #'CGYSQLP22'  #'cgysqlt06','trlsqlp90'
            
            $distsvr = new-object "Microsoft.SqlServer.Replication.ReplicationServer" $dist
            foreach ($dp in $distsvr.DistributionPublishers)
            {# each publisher on the distributor
                $pubsvr= new-object "Microsoft.SqlServer.Replication.ReplicationServer" $dp.name;
                if (($pubsvr.EnumRegisteredSubscribers()).count -gt 0) # the publisher has subscribers, so we will send tracer token
                {
                    foreach ($db in ($pubsvr.ReplicationDatabases | where {$_.HasPublications}))
                    { #each Publication Database, post token
                        $db.TransPublications | 
                        % { #post tracer toke on each publication 
                            $_.PostTracerToken() | Out-Null;

                          } #post tracer toke on each publication 
                    }#each Publication Database, post token
                 }
              } 
        }
        catch { $Error[0].Exception.Message; }
    }
  
    Start-Sleep -Seconds $threshold; #wait for $threshold seconds

    foreach ($dist in $distributor_list)
    { try 
        {
           #$dist;

            [boolean]$del_token = $true;
            [int]$TokenID =0;
            [datetime]$PubCommitTime=Get-Date;

            $distsvr = new-object "Microsoft.SqlServer.Replication.ReplicationServer" $dist
            foreach ($dp in $distsvr.DistributionPublishers)
            {# each publisher on the distributor
                $pubsvr= new-object "Microsoft.SqlServer.Replication.ReplicationServer" $dp.name;
                if (($pubsvr.EnumRegisteredSubscribers()).count -gt 0) # the publisher has subscribers, so we have sent tracer token in last loop
                {
                    foreach ($db in ($pubsvr.ReplicationDatabases | where {$_.HasPublications -eq $true}))
                    { #each Publication Database, collect token
                        $db.TransPublications | 
                        % { #collect token history of each publication
                            $pub = $_;
                            $PubMon=new-object "microsoft.sqlserver.replication.PublicationMonitor"($pub.name, $pubsvr.DistributionDatabase, $pubsvr.name, $db.name, $dist);
                            if ($PubMon.LoadProperties()) #LoadProperties() should return true, meaning $PubMon is correctly created
                            {
                                $PubMon.EnumTracerTokens() | 
                                % {
                                    $TokenID = $_.TracerTokenID;
                                    $PubCommitTime = $_.PublisherCommitTime;
                                    $PubMon.EnumTracerTokenHistory($TokenID).Tables[0] |
                                    % { 

                                        $r = $dt.NewRow();
                                        $r.TokenID = $TokenID;
                                        $r.PublisherCommitTime = $PubCommitTime;

                                        $r.Distributor_Latency = $_.distributor_latency;
                                        $r.Subscriber = $_.subscriber;
                                        $r.Subscriber_DB = $_.Subscriber_db;
                                        $r.Subscriber_Latency = $_.subscriber_latency;
                                        $r.Overall_Latency = $_.overall_latency;

                                        $r.Publisher = $pubsvr.name;
                                        $r.Publication = $pub.name;
                                        $r.Publisher_DB = $db.name;
                                        $r.Distributor = $dist;
                                        $r.Threshold = $Threshold;
                                        $dt.rows.add($r);    
                                        ## when we have multiple subscribers, we want to make sure 
                                        ## even if there is one subscriber that has not received the tracer token, we will not delete this tracer
                                        if ($r.Overall_Latency -is 'DBNull')
                                        { $del_token = $false;}
                            
                                       }
                       
                                    #clean up the tracer token after it arrives at the subscriber side already
                                    if ($del_token) { $PubMon.CleanUpTracerTokenHistory($TokenID);}
                                    else { #reset $del_token to true
                                            $del_token = $true; 
                                         } 
                                  }
                      
                            } #PubMon.LoadProperties() -eq $true;

                          } #collect token history of each publication

                    } #each Publication Database , collect token
                }
            } # each publisher on the distributor
        }#try 
        catch { $error[0].Exception.Message;}
    }
    Write-Output $dt -NoEnumerate;
}# Test-ReplicationLatency

The following is an example I run on my laptop, where I have two SQL Server instances, [tp_w520] serves as both a distributor and publisher and [tp_w520\SQL2K8R2] serves as a subscriber.

Latency example

We can actually set up a SQL Server Agent Job or a Windows Task to run every [X] minutes and then save the results to a database table, and based on this table, we can setup various alerts for latency trend analysis.

The following is an example to save the results from the previous Test-ReplicationLatency cmdlet into a SQL Server table

We first need to create a table:

use MSSQLTips

if object_id('dbo.tblReplicationLatency', 'U') is not null
	drop table dbo.tblReplicationLatency;
create table dbo.tblReplicationLatency (id int identity primary key
, Publisher varchar(100)
, Publisher_DB varchar(100)
, Publication varchar(100)
, Subscriber varchar(100)
, Subscriber_DB varchar(100)
, Distributor_Latency int
, Subscriber_Latency int
, Overall_Latency int
, Distributor varchar(100)
, LogDate datetime default current_timestamp);
go

We then need a PowerShell function to write data into the table:


#This Save-DataTable is used to write a datatable into a sql server table
function Save-DataTable {
    [cmdletbinding()]
    param ( [parameter(Mandatory=$true)]
        [string] $SQLInstance,

        [parameter (Mandatory=$true, ValueFromPipeline=$true)] $SourceDataTable,

        [parameter (mandatory=$true)] [string] $DestinationDB,   

        [parameter (mandatory=$true)]
        [string] $DestinationTable,  # can be two-part naming convention, i.e. [schema_name].[table_name]

        [hashtable] $ColumnMapping=@{}
    )
    try {
            $conn = New-Object System.Data.SqlClient.SqlConnection ("Server=$SQLInstance; Database=$DestinationDB; trusted_connection=TRUE");
            $conn.Open();
            $bulkcopy = New-Object System.Data.SqlClient.SqlBulkCopy($conn);
 
            $bulkcopy.DestinationTableName=$DestinationTable; #you may change to your own table

            if ($ColumnMapping.count -gt 0) {
                $ColumnMapping.keys | % {$bc_mapping = new-object System.Data.SqlClient.SqlBulkCopyColumnMapping($_, $ColumnMapping[$_]); $bulkcopy.ColumnMappings.Add($bc_mapping); } | Out-Null;
            }# mapping columns needed

            $bulkcopy.WriteToServer($SourceDataTable);
    }
    catch {
        Write-Error $_;
    }
    finally {
    	$bulkcopy.close();
        $conn.Close();
    }
    return;
} # Save-DataTable

In PowerShell IDE, we run the following script

[hashtable]$mapping=@{ 'Publisher'='Publisher'; 'Publisher_DB'='Publisher_DB'; 'Publication'='Publication';
                       'Subscriber'='Subscriber'; 'Subscriber_DB'='Subscriber_DB'; 
                       'Distributor_Latency'='Distributor_Latency'; 'Subscriber_Latency'='Subscriber_Latency';
                       'Overall_Latency'='Overall_Latency'; 'Distributor'='Distributor'};
                     
Test-ReplicationLatency -Distributors tp_w520 -Threshold 15 |
Save-DataTable -SQLInstance tp_w520 -DestinationDB MSSQLTips -DestinationTable 'dbo.tblReplicationLatency' -ColumnMapping $mapping;

After a few runs, we can check the table and see something like the following:

Table Result

Summary

This tip shows how to use PowerShell and RMO together to check the transactional replication health via tracer tokens.  With some additional coding, we can collect the result into a table, and then use the table to setup alerts if we find:

  • The latency is beyond a defined threshold
  • Analyze the latency trend at different time windows
  • Proactively check the publication database's transaction log size when long distributor latency is recorded because a large transaction log may mean a long read time for the log reader
  • Check network traffic load between the distributor and subscriber when long subscriber latency is detected
Next Steps

Replication monitoring can be done in multiple ways, but using RMO and PowerShell seems a better way in terms of scalability and efficiency. You may read other great tips about replication latency:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jeffrey Yao Jeffrey Yao is a senior SQL Server consultant, striving to automate DBA work as much as possible to have more time for family, life and more automation.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Friday, April 8, 2016 - 12:11:00 PM - jeff_yao Back To Top (41168)

Thanks Aaron for the feedback, really appreciate it.


Friday, April 8, 2016 - 9:56:56 AM - Aaron Back To Top (41167)

Thanks, Jeffrey. That was the issue and it works now!


Thursday, April 7, 2016 - 4:12:47 PM - jeff_yao Back To Top (41159)

Thanks @Greg, it looks good.


Thursday, April 7, 2016 - 3:36:56 PM - Greg Robidoux Back To Top (41157)

Jeffrey, I updated the code based on your comment below.  Please take a quick look to make sure everything is correct.

Thanks
Greg


Thursday, April 7, 2016 - 1:49:53 PM - jeff_yao Back To Top (41156)

@Aaron, thanks for reading the tip and trying the script. After another check, I believe it is my bad, I somehow missed a column [Distributor] in the destination table, please recreate the table as follows:

if object_id('dbo.tblReplicationLatency', 'U') is not null
	drop table dbo.tblReplicationLatency;
create table dbo.tblReplicationLatency (id int identity primary key
, Publisher varchar(100)
, Publisher_DB varchar(100)
, Publication varchar(100)
, Subscriber varchar(100)
, Subscriber_DB varchar(100)
, Distributor_Latency int
, Subscriber_Latency int
, Overall_Latency int
, Distributor varchar(100) , LogDate datetime default current_timestamp); go

My sincere apologies. I must load the wrong script.
Please let me know how it works.


Thursday, April 7, 2016 - 10:06:49 AM - Aaron Back To Top (41155)

I tried using your code but I'm getting the following error when I try to save to the database table:

Save-DataTable : Exception calling "WriteToServer" with "1" argument(s): "The given ColumnMapping does not match up with any column in the source or destination."
At line:6 char:64
+ Test-ReplicationLatency -Distributors ETL01 -Threshold 15 | Save-DataTable -S ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Save-DataTable

 

The table was created using your code I just put it in an existing admin database instead of creating a new MSSQLTips database.

Any ideas?


Thursday, October 15, 2015 - 2:04:01 PM - jeff_yao Back To Top (38904)

@Eric, unfortunately, there is no merge replication (MR) setup in my environment so I have never developed any monitoring mechanism for MR. On the other hand, I beleive tracer token is only valid in the context of TRANSACTIONAL replication instead of MR. (Please see this https://msdn.microsoft.com/en-us/library/ms188024.aspx)

 


Thursday, October 15, 2015 - 4:07:49 AM - Eric Back To Top (38899)

Hi,

very good script for transactional replication, but do you have same things for merge replication ?

 

regards,

Eric

 


Wednesday, May 27, 2015 - 2:55:13 PM - jeff_yao Back To Top (37296)

Thanks for your comment @Clayton Hoyt.

Yes, we can process token in whatever way that justifies our business requirement. I just want to clarify (for other readers' sake), the tracer token itself will not be lost (unless you purposely clean it out), it will eventually arrive at the subscriber side. So in my case, my business requirement is to log the replication performance, so all tokens will counted for latency info.


Wednesday, May 27, 2015 - 11:36:16 AM - Clayton Hoyt Back To Top (37287)

Excellent article Jeffery.

Depending on how often you run this process, if latency is high, you can easily go over the duration the token takes to finish it's journey. I run this process every 5 minutes but if the token has not finished by 4.5 minutes, I record the value as 9999, indicating that the token was effectively lost.















get free sql tips
agree to terms