Queries For Debugging Bi-Directional SQL Server Merge Replication
In the tip Queries To Debug SQL Server Replication Issues, I showed some queries to help debug replication. In the case of bi-directional merge replication, there are a few additional tables I look at as well, since merge replication has some pieces that differ from transactional and snapshot replication. In my view, merge replication is one of the most reliable tools SQL Server offers, but we still need to make sure that we design both our alerting and table structure to be compatible with what happens behind the scenes. Bi-directional merge replication is ideal for separate application processes where we want eventual data consistency, such as an ETL load on one server, data aggregation on the other server, and a final sync between servers.
First, when DBAs and developers talk about merge replication when it's set to be bidirectional, they will often use the words "Publisher" and "Subscriber", which can incorrectly give a person the impression that the publisher sends data to the subscriber, as the subscriber is subscribed to the publisher. If we carry this analogy to inserting data into the subscriber, we might inaccurately think this wouldn't affect the publisher (wrong); bi-directional merge replication will send the new records to the publisher as well. You can test this by inserting a number of records into the "Subscriber" and watch how those records appear in the "Publisher." While it's not called the publisher, the merge agent is really the entity that determines what data gets sent to what subscriber to it (which could be the publisher, if the subscriber that received new data first). This is a key difference as there is no log reader. Knowing this immediately helps us because we know that a merge agent must track both the publisher and the subscriber (I will sometimes refer to both of these as subscribers because, technically, they are subscribed to the merge agent). There is a synchronization time that the merge agent uses to check if both subscribers are in sync - and sometimes this stops, which would be something we would immediately want to know.
On the distribution database, I can run the below query:
SELECT [comments] MergeText , [time] MergeDate FROM MSmerge_history WHERE [time] BETWEEN DATEADD(MINUTE,-10,GETDATE()) AND GETDATE() ORDER BY [time] DESC
Both the MergeDate and MergeText will help me identify, in the case of the above query, what's happened in the last ten minutes, provided that my sync time is less than ten minutes. Suppose that sync between two servers every sixty seconds, and on server one, I load bitcoin data, which sends that data sixty seconds later to server two. If I update the data sent from server one on server two with Chinese bond and currency information, then those updates will be sent back to server one in sixty seconds. All of this information will be reflected in the merge replication history table, provided that something is not wrong. From an ETL perspective, the brilliance of bidirectional merge replication is that I can have separate processes running on the same data set (helpful in scale), with the data eventually syncing together.
But how would we identify if there's a conflict with data records, since there may be scenarios where servers get out of sync? We can look at the aggregate article history and collect valuable information that will help us identify quickly if there is an issue:
SELECT article_name , start_time , duration , inserts , updates , deletes , conflicts , rows_retried FROM MSmerge_articlehistory
First, we see a conflict column which tells us how many conflicts there's been. If you've never troubleshooted one conflict, that should be zero. This table also tells us valuable information about the count of inserts, updates, and deletes. If we know that our loading process first inserts bitcoin data on server one, then we'll expect to see inserts on server two. If we also know that our process updates Chinese bond and currency information to the same data source on server two, we'll expect to see updates here as well. Notice that the knowledge of what we're doing will help us identify if there may be an issue. In this case, if I saw a delete, I would be checking both the source and destination tables to determine why. Like the previous table, alerts can be configured on something we expect to see; we always want to know when there's been a conflict - this could be a major problem.
When setting up bidirectional merge replication, we should consider the publisher and subscriber roles carefully depending on our data structure, even though both will get the same data over the long run with inserts, updates and deletes. One issue that can arise with merge replication involves failed inserts due to an identity field on the subscriber running outside of the range of available values. The publisher dictates the range of identities, thus an insert into the subscriber may result in too many inserts falling outside the dictated range by the publisher and since the publisher can't issue more, as inserts are happening on the subscriber, we get insert errors. The most appropriate solutions to this problem are:
- Consider using another field outside of identities. Remember that merge replication adds a rowguid field, so you already have a unique field. Additionally, with some data sets that are time-based, like bitcoin, you can always use functions to order data for aggregating.
- If you must use an identity field, ensure that inserts occur on the publisher of the bidirectional merge replication, and not subscriber.
- If you cannot do either, make sure that the identity range in the properties settings of the bi-directional merge replication is large enough to handle three multiples of your largest load.
For monitoring identity ranges, use the below query:
---- Run this query on the appropriate distribution database: SELECT publisher_db , subscriber , time_of_allocation , range_begin , range_end , next_range_begin , next_range_end FROM MSmerge_identity_range_allocations ORDER BY time_of_allocation DESC
In the above query, what is the start and end range? If a subscriber receives inserts that falls outside of the range within a load, you will receive an insert error. The most appropriate design is to make sure that the publisher receives inserts, or broaden the range. In the below query, which we'll run on the publisher, we can see what the publisher range is along with the begin and end range:
SELECT is_pub_range , range_begin , range_end , next_range_begin , next_range_end FROM MSmerge_identity_range
Consider table structure and alternative designs if you have identity fields before adding bi-directional merge replication, as you may lose data from failed inserts.
These queries help track some of the issues with bi-directional merge replication in addition to the error logs and replication monitor. In certain situations related to a mis-sync, unresolved conflicts, suspicious number of insert, update or delete operations, or an identity field going out of range, these can assist to drill down into the problem.
- Like any tool, merge replication can be the "hammer and nail" problem, so consider where it's appropriate and where it's not.
- Test both monitoring and heavy loads before releasing bi-directional merge replication into a production environment. For a test load, I'd suggest at least fifteen multiples of what a normal load will be.
Last Updated: 2016-05-02
About the author
View all my tips