Some thoughts of your typical data shepherd / data plumber / data dance teacher sort of person.
For various reasons which l have now forgotten, l set up transactional replication for some clients. The result of this is l am the caretaker of transactional replication for two of our clients, what l lucky person l am ! T-SQL code used to check Transaction replication Once l got the process up and running (a very, very long and stressful story). At that point l realised that I would have to monitor these processes. Following some goggling this article was found with some TSQL written by SQLSoldier here. This worked for me and l used this to monitor replication by running the script and checking the results manually. Nagios output Our SysAdmin uses a tool called nagios to monitor out IT estate. So they suggested that a script could be written to monitor the replication process and send an alert if anything needed to be looked at. This sounded like an excellent idea, how to do it? The approach that was arrived at, involved using a PowerShell script which would run a SQL query examine the results then respond to Nagios, with the following values
Next we decided to use PowerShell to return the results. Following some goggling we found this page http://www.madeiradata.com/cross-server-replication-health-check-using-powershell/ which runs a SQL script and returned a data set. First challenge was the TSQL script from SQLSoldier was rather long for my first powershell script, l wanted some thing smaller. So l created a stored procedure based on the script, and placed it in my Distribution database on the replicated database server. Doing this had a two reasons, first less TSQL in the PowerShell script, second changing one of the parameters meant it returns different amounts of data. The stored procedure takes the following parameters ;
When considering this error message it was primarily to give some guidance as to what might be happening with the replication process. It is not intended to give any guidance on the underlying reason that is causing the issue. All that is required is that the Nagios process shows that there is something wrong. What ever the reason it requires some form of human intervention. Once an error condition has been detected then the issue will be handed to me to resolve. At least now l do not have to check the process periodically, now l just have to wait for a message from our sysadmin.