2 May 2006 03:20
Alerts cause Smokeping to stop working
Craig Dibble <craig <at> rootdev.com>
2006-05-02 01:20:59 GMT
Hi all,

I've got two servers running Smokeping: one (Server A) monitoring 127 hosts with fping, the other (Server B), in another city, monitoring just 11 hosts. When one device stops responding on Server B, it stops logging data for all the hosts it is monitoring, but when the same thing happens on Server A it steps over the failures and carries on.

During outages the logs on both servers are filled with messages like this:

  May 1 10:55:20 mon01 smokeping[9540]: FPing: WARNING: smokeping took 130 seconds to complete 1 round of polling. It should complete polling in 60 seconds. You may have unresponsive devices in your setup.

The strange thing is that the configs are identical, apart from the Target definitions and the fact that Server B has

  concurrentprobes = yes

set (although we are only using one probe, so I'm not sure this is relevant, unless I have misunderstood). Server A is running version 1.4, compared to 2.0.4 on Server B.

I have seen a few mentions of a similar problem in the list archives, but I haven't found a satisfactory answer. I know I should probably upgrade Server A to a newer version, but I am obviously reluctant to do so when it works and the newer version doesn't.