![]() |
|
|
|
By: Kun Lee | Read Comments (8) | Related Tips: More > Replication |
Problem
My company has a very replication intensive environment (mostly transactional replication) and recently I started to see that replicating data was taking longer and longer. In this tip I will show you some of the techniques that I used to troubleshoot the issue as well as what the final resolution was.
Solution
In addition to providing the solution, let me provide the symptoms to my issue and provide how I started troubleshooting the issue.
Basic Symptoms
Here are list of symptoms that I had.
Troubleshooting High CPU and High Disk IO
First, I wanted to know why I was getting high CPU and high Disk IO on the distributor server, so I setup Performance Monitor (also known as perfmon) on the server. You can read this article "Creating SQL Server performance based reports using Excel" on how I setup perfmon to collect data and how I found high CPU/Disk IO for about 7~8min every 10min. That means, we had a job running every 10min with a duration of 7~8min.
I was able to found out there are two jobs that are causing the issue and they were related to replication clean up.
So, I disabled these two jobs and ran them manually to see if that was causing the high CPU/Disk IO and I was able to find out the "Distribution clean up: distribution" job that runs "EXEC dbo.sp_MSdistribution_cleanup @min_distretention = 0, @max_distretention = 48" stored procedure was causing the problem and when I ran it manually, it ran for more than 7min and returned the below message.
Removed 0 replicated transactions consisting of 0 statements in 451 seconds (0 rows/sec). |
Troubleshooting Distribution cleanup issue
From previous research, I was able to find out why we were having high CPU issues and why the distributor database was getting bigger. The issue was that the clean up job was not removing the commands and I was able to see that table "dbo.MSrepl_commands" was 27GB and kept growing everyday.
So, I took a look at the Distributor Properties and I confirmed that retention was set correctly as you can see below.

Fixing Retention days for Each Publication
The next step is to check the retention days for each publication and that is where I found the problem. Many of our publication properties subscription expiration setup was set to "never expire" as you can see from the screen capture below.

Now, I can open each publication to see how it is setup, but as I mentioned we have many publications so I didn't want to waste my time opening each publication using SQL Server Management Studio. So instead, I used the query that I wrote in this tip "SQL Server Replication Scripts to Pull Replication Configuration Information" and added one more column to the script to pull retention days as shown below.
set transaction isolation level read uncommitted select distinct srv.srvname publication_server , a.publisher_db , p.publication publication_name , p.retention , ss.srvname subscription_server , s.subscriber_db from MSArticles a join MSpublications p on a.publication_id = p.publication_id join MSsubscriptions s on p.publication_id = s.publication_id join master..sysservers ss on s.subscriber_id = ss.srvid join master..sysservers srv on srv.srvid = p.publisher_id join MSdistribution_agents da on da.publisher_id = p.publisher_id and da.subscriber_id = s.subscriber_id ORDER BY p.retention |

Confirmation
After I identified and changed the retention days to 120 hours and ran the above job, the distributor database space used was reduced from 27GB to 300MB. After that I shrunk the database to 500MB and then I re-indexed all tables in the distribution database and ran the clean up job again. Now, it takes less 3 sec to run the clean up job. Also, replication as a whole is a lot faster.
Next Steps
| Tuesday, November 17, 2009 - 2:49:06 PM - mictiu | Read The Tip |
|
Hi, Thanks for the excellent tips .. but want to understand something more in details. In quite new to replication but from ur tips here .. i still can't get what ur meaning of your retention day is set to never expire here refer to Subscription expiration in publication properties or Transaction Retention in Distribution properties. Sorry to ask such a question but just to make myself clear :) .... Can this setting help to prevent my replication to become inactive and need to be reinitialized ? Coz im also having problem where replication didn't run for numbers of days like 3 days ... then it become inactive. Please advice and if can notify me through email .. mictiu@mafrica.com.my |
|
| Monday, May 03, 2010 - 4:43:31 PM - drsloat | Read The Tip |
|
That helped me zero in on my problem very quickly. My cleanup job had been running for 15 minutes. Thanks for the write up! |
|
| Thursday, April 12, 2012 - 8:21:25 PM - Daniel | Read The Tip |
|
Kun, Fantasic article. One of our Distribution DBs grew out of control just today. Your solution has helped me greatly to resolve this issue and to learn more about how replication works! Thank you so much!
|
|
| Friday, May 04, 2012 - 11:58:50 AM - sam | Read The Tip |
|
Thanks! Saved the busy day!
|
|
| Sunday, August 26, 2012 - 2:40:41 AM - Vijay | Read The Tip |
|
In replication ,when Backup is SIMPLE RECOVERY MODE. In simple recovry mode. t-log will be treuncate. then,how wll be working replication when, back up is SIMPLE RECOVERY MODE.
Vijay |
|
| Friday, April 26, 2013 - 5:57:16 PM - Salu | Read The Tip |
|
I am not seeing these two jobs on my SQL Server and I have transactional replication set up. Do you know why?
Thanks |
|
| Tuesday, April 30, 2013 - 4:52:29 PM - Kun Lee | Read The Tip |
|
Vijay, To make long story short, even if it is simple mode, the log file won't flush out the log until Log Reader reads the data and mark it. You can easily reproduce the situation (please don't do on production db), you can create a publisher with simple recovery mode and setup replication and stop the log reader and start loading data to the db and watch the open transaction and log space by using - DBCC SQLPERF(LOGSPACE) - use <db name> DBCC OPENTRAN And you will see that. As matter of fact, you can search many issue because of the log reader wasn't mark it soon due to heavy load if you google it..
|
|
| Tuesday, April 30, 2013 - 4:52:44 PM - Kun Lee | Read The Tip |
|
Salu, |
|
|
privacy | disclaimer | copyright | advertise | about authors | contribute | feedback | giveaways | user groups Some names and products listed are the registered trademarks of their respective owners. Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com |