Augie Schwer | 2 Apr 2008 02:02

Re: Monitoring for a hung NFS mount?

Thanks to everyone who replied (privately and on the list). Attached is
what I finally went with: it works well, doesn't stack up processes on hung
mounts, and works great with the snmpvar monitor.

--Augie

On Thu, Mar 27, 2008 at 3:27 PM, Augie Schwer <augie.schwer <at> gmail.com> wrote:
> Anyone have a good way to monitor for a hung NFS mount on a remote machine?
>
>  I've been at it all day trying to come up with a clever way to check
>  for the hung mount, keep the monitor itself from hanging, and return some
>  useful information, such as which mount is hung. I've come to a dead end,
>  and I think the best that can be done is to let the monitor time out and
>  then sound an alarm based on that timeout.
>
>  Anyone else have ideas?
>
>
>  --
>  Augie Schwer - Augie <at> Schwer.us - http://schwer.us
>  Key fingerprint = 9815 AE19 AFD1 1FE7 5DEE 2AC3 CB99 2784 27B0 C072
>

-- 
Augie Schwer - Augie <at> Schwer.us - http://schwer.us
Key fingerprint = 9815 AE19 AFD1 1FE7 5DEE 2AC3 CB99 2784 27B0 C072
Attachment (nfs_monitor.pl): application/x-perl, 1388 bytes
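
The attachment is not reproduced in this archive. As a rough sketch of the general idea discussed in the thread (probe each mount from a child process under a time limit, so the monitor itself never blocks), and not the attached script, something like the following could work. It assumes GNU coreutils timeout(1) and stat(1); the function name check_mounts and the 5-second limit in the usage line are made up for illustration.

```shell
#!/bin/sh
# Sketch only: this is NOT the attached nfs_monitor.pl.
# Assumes GNU coreutils timeout(1) and stat(1) are installed.

# check_mounts LIMIT DIR...
# Prints the DIRs that did not answer a stat within LIMIT seconds (or that
# do not exist) and returns non-zero if there were any.
check_mounts() {
    limit=$1
    shift
    failed=""
    for dir in "$@"; do
        # -k 1: follow up with SIGKILL one second after the initial
        # SIGTERM if stat is still running.
        if ! timeout -k 1 "$limit" stat "$dir" >/dev/null 2>&1; then
            failed="$failed $dir"
        fi
    done
    if [ -n "$failed" ]; then
        echo "failed:$failed"
        return 1
    fi
    return 0
}
```

Usage would be along the lines of `check_mounts 5 /mnt/data /mnt/home` (hypothetical mount points). One caveat: a process stuck on a hard NFS mount may not respond to signals at all, and this sketch does not handle that case.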

Augie Schwer | 2 Apr 2008 19:49

Re: Monitoring for a hung NFS mount?

I hear Jim is going to release the nfscheck.monitor he wrote into the
mon-contrib tree; it uses the same basic logic as what I wrote, but
implemented in a far cleaner way.

On the topic of NFS: the next step would be to compare mtab against
fstab and alert if anything you expected to be mounted actually
isn't. It seems pretty trivial, but does anyone already have
something written up?

--Augie
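
A minimal sketch of that comparison, assuming both files use the usual whitespace-separated layout (device, mount point, filesystem type, ...) found in Linux fstab and mtab. The function name missing_nfs_mounts is hypothetical, and the two paths are passed in as arguments rather than fixed in the script.

```shell
#!/bin/sh
# Sketch only, assuming Linux-style fstab/mtab field layout:
#   device  mount-point  fstype  options ...

# missing_nfs_mounts FSTAB MOUNT_TABLE
# Prints every NFS mount point listed in FSTAB that is absent from MOUNT_TABLE.
missing_nfs_mounts() {
    fstab=$1
    mounts=$2
    awk '
        # First file (the mount table): remember what is currently mounted.
        FILENAME == ARGV[1] { mounted[$2] = 1; next }
        # Second file (fstab): skip comments, blank and short lines.
        $1 ~ /^#/ || NF < 3 { next }
        # Report NFS entries that are not in the mounted set.
        ($3 == "nfs" || $3 == "nfs4") && !($2 in mounted) { print $2 }
    ' "$mounts" "$fstab"
}
```

On a typical Linux box this might be called as `missing_nfs_mounts /etc/fstab /etc/mtab`; any output means something expected is not mounted. Entries marked noauto would be reported too, which may or may not be what you want.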

On Tue, Apr 1, 2008 at 5:02 PM, Augie Schwer <augie.schwer <at> gmail.com> wrote:
> Thanks to everyone who replied (privately and on the list). Attached is
>  what I finally went with: it works well, doesn't stack up processes on hung
>  mounts, and works great with the snmpvar monitor.
>
>  --Augie
>
>
>
>  On Thu, Mar 27, 2008 at 3:27 PM, Augie Schwer <augie.schwer <at> gmail.com> wrote:
>  > Anyone have a good way to monitor for a hung NFS mount on a remote machine?
>  >
>  >  I've been at it all day trying to come up with a clever way to check
>  >  for the hung mount, keep the monitor itself from hanging, and return some
>  >  useful information, such as which mount is hung. I've come to a dead end,
>  >  and I think the best that can be done is to let the monitor time out and
>  >  then sound an alarm based on that timeout.
>  >
>  >  Anyone else have ideas?

Ed Ravin | 2 Apr 2008 20:19

Re: Monitoring for a hung NFS mount?

On Wed, Apr 02, 2008 at 10:49:00AM -0700, Augie Schwer wrote:
> On the topic of NFS: the next step would be to compare mtab against
> fstab and alert if anything you expected to be mounted actually
> isn't. It seems pretty trivial, but does anyone already have
> something written up?

No, but remember that the location and semantics of mount tables vary
drastically with the operating system. Solaris, for example (IIRC), keeps
the mount table in-kernel, and you need to call an API to see what's
mounted.  The equivalent of mtab is actually a device driver that calls
the API, not a regular file.  So don't hard-code any paths, and use
"test -e" (exists), not "test -f" (exists and is a regular file), when
scripting the sanity checks.
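
To illustrate the difference, here is a small sketch that picks the first candidate path that exists, whatever its file type. The function name first_existing and the MOUNT_TABLE override variable in the usage line are made up for this example.

```shell
#!/bin/sh
# Sketch only: choose a mount table without insisting on a regular file.

# first_existing PATH...
# Prints the first PATH that exists and returns 0; returns 1 if none does.
first_existing() {
    for path in "$@"; do
        # -e accepts any file type; -f would reject device nodes and
        # other special files.
        if [ -e "$path" ]; then
            echo "$path"
            return 0
        fi
    done
    return 1
}
```

A caller could let the administrator override the location and fall back to common candidates, for example `table=$(first_existing "$MOUNT_TABLE" /etc/mtab /etc/mnttab /proc/mounts)`. On Linux, /dev/null is a handy way to see the distinction: `test -e /dev/null` succeeds while `test -f /dev/null` fails.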

Augie Schwer | 2 Apr 2008 23:15

Re: Monitoring for a hung NFS mount?

On Wed, Apr 2, 2008 at 11:19 AM, Ed Ravin <eravin <at> panix.com> wrote:
> On Wed, Apr 02, 2008 at 10:49:00AM -0700, Augie Schwer wrote:
>  > On the topic of NFS: the next step would be to compare mtab against
>  > fstab and alert if anything you expected to be mounted actually
>  > isn't. It seems pretty trivial, but does anyone already have
>  > something written up?
>  No, but remember that the location and semantics of mount tables vary
>  drastically with the operating system. Solaris, for example (IIRC), keeps
>  the mount table in-kernel, and you need to call an API to see what's
>  mounted.  The equivalent of mtab is actually a device driver that calls
>  the API, not a regular file.  So don't hard-code any paths, and use
>  "test -e" (exists), not "test -f" (exists and is a regular file), when
>  scripting the sanity checks.

Noted; I think Jim's monitor does this already, so maybe I'll use that
as a basis.

-- 
Augie Schwer - Augie <at> Schwer.us - http://schwer.us
Key fingerprint = 9815 AE19 AFD1 1FE7 5DEE 2AC3 CB99 2784 27B0 C072

Rune Kristian Viken | 3 Apr 2008 10:09

mon.cgi very slow, communication protocol improvements?


I'm using mon to monitor > 600 hostgroups, with an average of 8 or so 
services each.  The total number of hosts is > 1000.

The main problem I've come across is that mon.cgi is slow, and after some
debugging, it seems that the communication with the mon server is what is
slow.  I have to wait about 12 seconds per page view on average.

I've tried digging around a bit, and it seems that two routines in
query_opstatus take quite a long time:

    %op_success = &mon_list_successes;
    %op_failure = &mon_list_failures;

Looking at the communication protocol, it seems that the main drawback is 
that mon has to spin through a *lot* of data-structures and present them in 
a nice way.

I was thinking that this might be accomplished faster by sharing the %watch
and maybe %groups data structures from mon, with the help of
http://perldoc.perl.org/Storable.html. But even though I feel I have a
decent knowledge of mon's internals, I don't feel it is quite good enough
to implement this.

Is it a good idea?  A horrible idea?  Am I barking up the wrong tree, with 
something else being the main problem here?

-- 
Rune Kristian Viken

Tom af Hällström | 6 Apr 2008 12:49

Mon not working on Centos5?

Hi,

I have been using Mon for some time now on Debian and other distros. Now
I've been trying to get it working on CentOS 5 but haven't been able to.
I've tried on 2 different CentOS machines with the same result. Mon starts
OK with the default setup but doesn't seem to do anything, and I can't find
any information about it in /var/log/messages either. I installed it from
the rpmforge repository. Somewhere I read about a patch for the mon file,
but that did not help either. Is mon compatible with CentOS 5, or what
could be the problem?

Thank you,
Tom

Jan-Frode Myklebust | 7 Apr 2008 08:55

Re: Mon not working on Centos5?

On 2008-04-06, Tom af Hällström <tom <at> viestintaverso.fi> wrote:
> any info of it in /var/log/messages either. I installed it via rpmforge 
> repository. Somewhere I read of a patch for the mon file, but that did 
> not help either. Is mon compatible with centos5 or what could be the 
> problem?

It's working fine for me on RHEL5. Maybe you're missing
some packages? I know we need to add "perl-Authen-PAM", which
isn't listed as a dependency in the RPM, but mon won't
start without it.

  -jf

aneeskA | 8 Apr 2008 07:34

general info on ' mon '

Hi there,

 I tried installing mon on my machine (Fedora 7) and ran into some difficulties, which is why I am writing this. I'll explain it step by step.

1. I installed mon from this rpm: mon-1.2.0-1.fc7.rf.i386.rpm. If 'rpm -ivh' doesn't work, try 'yum install'.

2. 'monshow' then failed with some errors. To fix that, I installed these:

    perl-Authen-PAM-0.16-1.2.fc7.rf.i386.rpm
    perl-Convert-BER-1.3101-1.fc7.rf.noarch.rpm
    perl-Mon-Client-1.0.0-0.pre2.0.2.noarch.rpm


3. After that, mon worked fine.

Regards
-- anees
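
A quick way to check for this kind of missing dependency before starting mon is to ask perl to load each module. This is only a sketch: the function name missing_perl_modules is made up, and the module names in the usage line (Authen::PAM, Convert::BER, Mon::Client) are assumed from the usual mapping of the package names listed above.

```shell
#!/bin/sh
# Sketch only: report Perl modules that the local perl cannot load.

# missing_perl_modules MODULE...
# Prints each MODULE that fails to load, one per line.
missing_perl_modules() {
    for mod in "$@"; do
        # "perl -MModule -e 1" exits non-zero if the module cannot be
        # loaded (or if perl itself is not installed).
        if ! perl -M"$mod" -e 1 >/dev/null 2>&1; then
            echo "$mod"
        fi
    done
}
```

For example, `missing_perl_modules Authen::PAM Convert::BER Mon::Client` prints nothing when all three load, and otherwise names the ones that still need installing.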

_______________________________________________
mon mailing list
mon <at> linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon

David Nolan | 10 Apr 2008 16:24

Re: mon.cgi very slow, communication protocol improvements?

Rune,

You didn't mention some important bits of information, most
significantly which versions of Mon, Mon::Client and mon.cgi you are
using.  There have been significant protocol changes between
versions.  Speed problems that occurred with 0.99.2 are pretty much
gone with 1.2.0, for example.  Also, what OS are you running on?

I'm using mon with well over 100 hostgroups without any performance
problems, with mon.cgi rendering a full page in under a second
typically.  I can't see how the performance would fail to scale to
600.

Off the top of my head I'm guessing that Storable would actually
increase the overhead in the mon server & cgi, as the data still has
to be transformed into the sharable form and then re-parsed.

-David

On Thu, Apr 3, 2008 at 4:09 AM, Rune Kristian Viken <runevi <at> basefarm.no> wrote:
>
>  I'm using mon to monitor > 600 hostgroups, with an average of 8 or so
>  services each.  The total number of hosts is > 1000.
>
>  The main problem I've come across is that mon.cgi is slow, and after some
>  debugging, it seems that the communication with the mon server is what is
>  slow.  I have to wait about 12 seconds per page view on average.
>
>  I've tried digging around a bit, and it seems that two routines in
>  query_opstatus take quite a long time:
>
>     %op_success = &mon_list_successes;
>     %op_failure = &mon_list_failures;
>
>  Looking at the communication protocol, it seems that the main drawback is
>  that mon has to spin through a *lot* of data-structures and present them in
>  a nice way.
>
>  I was thinking that this might be accomplished faster by sharing the %watch
>  and maybe %groups data structures from mon, with the help of
>  http://perldoc.perl.org/Storable.html. But even though I feel I have a
>  decent knowledge of mon's internals, I don't feel it is quite good enough
>  to implement this.
>
>  Is it a good idea?  A horrible idea?  Am I barking up the wrong tree, with
>  something else being the main problem here?
>
>  --
>  Rune Kristian Viken

aneeskA | 11 Apr 2008 07:59

alert on success rather than failure

hi all,

 Is there a way to raise an alert when something succeeds rather than when it fails? I was able to achieve this by reversing the return value that the monitors give back, but I don't think that's the proper way to do it. Any thoughts on this?

regards

-- anees
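
For reference, the workaround described above (reversing the monitor's return value) can be sketched as a small wrapper. It relies only on the convention that a monitor exits 0 on success and non-zero on failure; the function name invert_status and the monitor path in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch only: run a command and flip its exit status.

# invert_status COMMAND [ARG...]
# Returns 1 if COMMAND succeeded and 0 if it failed; COMMAND's own
# output is passed through untouched.
invert_status() {
    if "$@"; then
        return 1
    fi
    return 0
}
```

A wrapper monitor script would then end with something like `invert_status /path/to/real.monitor "$@"`, so that the wrapped check succeeding is what mon sees as a failure.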
