quiritt | 1 Mar 09:10 2005

Re: Problems starting MON

The Authen/PAM.pm is not a typo, it is the actual output.

I have looked but couldn't find where mon tries to load that module. Where
do I find it?

I don't think there is a problem with my version, though, as I used the mon
that was delivered with my distribution. Any other clues about what it
could be?

Regards,
John

---------------------------- Original Message ----------------------------
Subject: Problems starting MON
From:    quiritt.fragnations.com <at> linuxsystems.be
Date:    Mon, February 28, 2005 3:21 pm
To:      mon <at> linux.kernel.org
--------------------------------------------------------------------------

Hi,

I'm having problems starting mon on the host server of an HA cluster.

When I try to start mon, it gives me the following error:

Starting mon daemon: cf error: could not use PAM authentication: Can't
locate Authen/AM.pm in @INC (@INC contains:
/usr/lib/perl5/5.8.1/i386-linux-thread-multi /usr/lib/perl5/5.8.1
/usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl

Jim Trocki | 1 Mar 13:17 2005

Re: Problems starting MON

On Tue, 1 Mar 2005 quiritt <at> fragnations.com wrote:

> The Authen/PAM.pm is not a typo, it is the actual output.

the output in your mail said "Authen/AM.pm", not "Authen/PAM.pm", that's why
david asked if it was a typo. the name of the perl module is Authen::PAM, and
if you're going to use PAM authentication with mon you'll need it installed.
to find out if you have it installed you can do this:

perl -MAuthen::PAM -e exit

and if it complains, then you don't have it installed. either install it from
cpan or be sure you don't have the setting "authtype pam" in your mon.cf file.

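A minimal shell sketch of the two checks Jim describes, assuming a POSIX shell with perl and grep on the PATH. The function names (has_perl_module, uses_pam_auth, check_mon_pam) are hypothetical and not part of mon, and the pattern assumes the setting is written in mon.cf as either "authtype pam" or "authtype = pam".

```shell
#!/bin/sh
# Hypothetical helpers for diagnosing the Authen::PAM error; not part of mon.

# Succeeds if perl can load the named module, e.g. Authen::PAM.
# Same test as "perl -MAuthen::PAM -e exit", with the error text discarded.
has_perl_module() {
    perl -M"$1" -e exit 2>/dev/null
}

# Succeeds if the given config file has an uncommented "authtype" line
# that lists "pam" (accepts "authtype pam" and "authtype = pam").
# A missing file counts as "no" (-s hides grep's error message).
uses_pam_auth() {
    grep -Eqs '^[[:space:]]*authtype([[:space:]]*=[[:space:]]*|[[:space:]]+)([^#]*[[:space:]])?pam([[:space:]]|$)' "$1"
}

# Prints a one-line verdict for a mon.cf path; returns 1 only when
# PAM is requested but perl cannot load Authen::PAM.
check_mon_pam() {
    if ! uses_pam_auth "$1"; then
        echo "ok: $1 does not request PAM authentication"
    elif has_perl_module Authen::PAM; then
        echo "ok: $1 requests PAM and Authen::PAM is installed"
    else
        echo "problem: $1 requests PAM but Authen::PAM is not installed"
        return 1
    fi
}
```

Usage would be something like `check_mon_pam /etc/mon/mon.cf`, where the path is only a guess at where a given distribution keeps mon.cf.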
Marko Riedel | 7 Mar 17:56 2005

Upgrading to 1.1.0.


Hello there,

we upgraded to 1.1.0. So far everything seems to be okay, but traps no
longer work. We did not change the code at the machines that send
traps, except to install the latest version (1) of Mon::Client.

Now traps that used to work cause the following output:

trap trap 1 from  grp=somegroup svc=DYNDNS, sta=255 
failure for somegroup DYNDNS 1110213302 somehost DYNDNS OKA

As you can see the trap includes the output from the remote host,
which says that everything is okay. We did not change the return codes
at all. How can a trap that used to work suddenly turn into a failure?

Thank you for your help.

-- 
+------------------------------------------------------------+
| Marko Riedel, EDV Neue Arbeit gGmbH, mriedel <at> neuearbeit.de |
| http://www.geocities.com/markoriedelde/index.html          |
+------------------------------------------------------------+
David Nolan | 9 Mar 16:36 2005

Re: Upgrading to 1.1.0.


--On Monday, March 07, 2005 5:56 PM +0100 Marko Riedel 
<mriedel <at> neuearbeit.de> wrote:

>
> Hello there,
>
> we upgraded to 1.1.0. So far everything seems to be okay, but traps no
> longer work. We did not change the code at the machines that send
> traps, except to install the latest version (1) of Mon::Client.
>
> Now traps that used to work cause the following output:
>
> trap trap 1 from  grp=somegroup svc=DYNDNS, sta=255
> failure for somegroup DYNDNS 1110213302 somehost DYNDNS OKA
>
> As you can see the trap includes the output from the remote host,
> which says that everything is okay. We did not chage the return codes
> at all. How can a trap that used to work suddenly turn into a failure?
>
> Thank you for your help.
>

Marko,

I'm trying to track this down to see if there is a bug.  The output you
included is the syslog message that's sent when a trap is received.  The
only problem I see in that message is that the source IP address of the
trap isn't being filled in.


Aaron.Segura | 9 Mar 19:13 2005

Bug fixed?


[note -- sorry if this is a duplicate, I had trouble sending the first time]

Hi all,
        I think I fixed what may be considered a bug in mon.  It causes unexpected results, so that's how I would classify it.  We are currently testing mon 1.1.0-pre1, but the same problem existed in the previous version we used, 1.0.0-pre4.

        It appears that the environment variable MON_LAST_OUTPUT doesn't get set until after the first alert has already been called.  This was causing a problem where our alert script could not run because it couldn't find the (MON_LAST_)output that was supposed to have been generated by the monitor script until the SECOND alert was triggered.  We would never actually receive the first alert because our alert script couldn't find the MON_LAST_OUTPUT variable.

        I find it hard to believe that nobody else has had this problem.  Are we doing something wrong?  Was this done this way by design?

Here is a diff of the changes I made to v1.1.0-pre1...
       
3219a3220,3223
>     $sref->{"_last_output"} = $output;
>     $sref->{"_last_summary"} = $summary;
>     $sref->{"_last_detail"} = $detail;
>
3349,3355d3352
<
<     #
<     # save the output
<     #
<     $sref->{"_last_output"} = $output;
<     $sref->{"_last_summary"} = $summary;
<     $sref->{"_last_detail"} = $detail;
_______________________________________________
mon mailing list
mon <at> linux.kernel.org
http://linux.kernel.org/mailman/listinfo/mon
David Nolan | 9 Mar 20:28 2005

Re: Bug fixed?


--On Wednesday, March 09, 2005 12:13 PM -0600 Aaron.Segura <at> dtn.com wrote:

>         It appears that the environment variable MON_LAST_OUTPUT doesn't
> get set until after the first alert has already been called.  This was
> causing a problem where our alert script could not run because it
> couldn't  find the (MON_LAST_)output that was supposed to have been
> generated by the  monitor script until the SECOND alert was triggered.
> We would never  actually receive the first alert because our alert script
> couldn't find  the MON_LAST_OUTPUT variable.
>
>         I find it hard to believe that nobody else has had this problem.
> Are we doing something wrong?  Was this done this way by design?

Aaron,

I think you may be misunderstanding what MON_LAST_OUTPUT is for.  It's not
supposed to be the output of the most recent test; it's the output of the
previous run.  For example, if pass number 1 outputs 'Failure XYZ', and pass
number 2 outputs 'Status OK', then when the alert for pass 2 is called,
MON_LAST_OUTPUT should be set to 'Failure XYZ'.
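The ordering described here can be modelled in a few lines of shell. This is only a hypothetical model of the semantics, not mon's source (the simulate_pass function and last_output variable are invented for illustration): each pass hands the alert its own output on standard input, exposes the previous pass's output as MON_LAST_OUTPUT, and only afterwards records the current output as "last".

```shell
#!/bin/sh
# Hypothetical model of MON_LAST_OUTPUT timing; not mon's actual code.

last_output=""   # output saved from the previous pass (empty before pass 1)

simulate_pass() {
    current=$1
    # The "alert" reads the current output from stdin and sees the
    # previous pass's output in MON_LAST_OUTPUT.
    printf '%s\n' "$current" |
        MON_LAST_OUTPUT="$last_output" sh -c \
            'IFS= read -r now; echo "stdin=$now MON_LAST_OUTPUT=$MON_LAST_OUTPUT"'
    # Only after the alert has run does the current output become "last".
    last_output=$current
}

simulate_pass 'Failure XYZ'   # stdin=Failure XYZ MON_LAST_OUTPUT=
simulate_pass 'Status OK'     # stdin=Status OK MON_LAST_OUTPUT=Failure XYZ
```

In this model, recording last_output before the alert rather than after it (which is what the diff above appears to do) would make MON_LAST_OUTPUT identical to what the alert already gets on standard input.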

Does that explain the situation, or am I misunderstanding your explanation 
of the problem?

-David Nolan
 Network Software Designer
 Computing Services
 Carnegie Mellon University
Aaron.Segura | 9 Mar 21:01 2005

Re: Bug fixed?


David,

Thanks for the quick reply.  I got the impression from the man page that MON_LAST_OUTPUT should be the output of the monitor from the last time it was run...

       As with monitor programs, alert programs are invoked  with  environment
       variables defined by the user in the service definition, in addition to
       the following which are explicitly set by the server:
       
        [...clip...]

       MON_LAST_OUTPUT
              The entire output of the monitor from the last time it exited.

I read "...the last time it exited" as the run directly before the alert was called.  I guess I don't see how it would be useful to have the results from two runs ago.  How is the alert script supposed to send any sort of relevant information unless it gets the output from the most recent run?  Alternatively, is there another variable which would tell me the output from the most recent monitor run?

thanks



David Nolan | 9 Mar 21:13 2005

Re: Bug fixed?


--On Wednesday, March 09, 2005 2:01 PM -0600 Aaron.Segura <at> dtn.com wrote:

> With "...the last time it exited" being directly before the alert was
> called.  I guess I don't see how it would be useful to have the results
> from two runs ago.  How is the alert script supposed to send any sort of
> relevent information unless it gets the output from the most recent run?
> Alternatively, is there another variable which would tell me the output
> from the most recent monitor run?

The output from the most recent run comes on STDIN.

From the man page:
"The first line from standard input must be used as a brief summary of the 
problem, normally supplied as the subject line of an email, or text sent to 
an alphanumeric pager. Interpretation of all subsequent lines read from 
stdin is left up to the alerting program."

And the reason to get both is so the alert about the end of the failure can 
optionally include information about the failure.  Going back to my earlier 
example, with output messages of 'Failure of XYZ' and 'Status OK', when 
processing the upalert for the second pass, you might want your message 
contents to look something like:
Current Status:
Status OK
Previous Status:
Failure of XYZ
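A minimal sketch of an upalert that produces the message layout above, assuming a POSIX shell: the current run's output arrives on standard input (first line is the summary, the rest is left to the script), and the previous run's output is read from MON_LAST_OUTPUT. The format_upalert name and the "(none recorded)" fallback are hypothetical, and actual delivery (mail, pager) is left out.

```shell
#!/bin/sh
# Hypothetical upalert message formatter; delivery is left to the caller.

format_upalert() {
    # First line of stdin: brief summary of the current monitor run.
    IFS= read -r summary
    # Any remaining lines: detail, interpreted however the script likes.
    detail=$(cat)

    printf 'Current Status:\n%s\n' "$summary"
    if [ -n "$detail" ]; then
        printf '%s\n' "$detail"
    fi
    # MON_LAST_OUTPUT holds the previous run's output, i.e. the failure.
    printf 'Previous Status:\n%s\n' "${MON_LAST_OUTPUT:-(none recorded)}"
}

# Reproduces the four-line message shown in the example above.
printf 'Status OK\n' | MON_LAST_OUTPUT='Failure of XYZ' format_upalert
```

The output could then be piped into whatever the alert uses for delivery.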

-David Nolan
 Network Software Designer
 Computing Services
 Carnegie Mellon University
Aaron.Segura | 9 Mar 21:23 2005

Re: Bug fixed?


Here it comes...

D'oh!

I read that section before, but the lightbulb over my head never turned on until just now...

Thanks for the pointer.  Still, maybe the man page should be changed to be more specific regarding the contents of MON_LAST_OUTPUT and MON_LAST_SUMMARY...you know, for stupid people...not me :)



David Nolan | 9 Mar 22:07 2005

Re: Bug fixed?


--On Wednesday, March 09, 2005 2:23 PM -0600 Aaron.Segura <at> dtn.com wrote:

> Thanks for the pointer.  Still, maybe the man page should be changed to
> be  more specific regarding the contents of MON_LAST_OUTPUT and
> MON_LAST_SUMMARY...you know, for stupid people...not me :)

How's this?

MON_LAST_SUMMARY
The first line of the output from the last time the monitor exited.
This is not the summary of the current monitor run, but the previous
one.  This may be used by an alert script to provide historical
context in an alert.

MON_LAST_OUTPUT
The entire output of the monitor from the last time it exited.  This
is not the output of the current monitor run, but the previous one.
This may be used by an alert script to provide historical context in
an alert.
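As an illustration of the "historical context" use, MON_LAST_SUMMARY can be combined with the first line of standard input to build a short, pager-sized alert. A minimal sketch assuming a POSIX shell; the pager_line name, the message wording, and the fallback when MON_LAST_SUMMARY is empty or unset are all hypothetical.

```shell
#!/bin/sh
# Hypothetical one-line alert text from the current and previous summaries.

pager_line() {
    # Current run: the first line of stdin is its summary.
    IFS= read -r summary
    # Previous run: MON_LAST_SUMMARY is the first line of its output.
    if [ -n "${MON_LAST_SUMMARY:-}" ]; then
        printf '%s (previously: %s)\n' "$summary" "$MON_LAST_SUMMARY"
    else
        printf '%s\n' "$summary"
    fi
}

# Prints: Status OK (previously: Failure of XYZ)
printf 'Status OK\n' | MON_LAST_SUMMARY='Failure of XYZ' pager_line
```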

I'll commit that to CVS after I clarify some of the other variables as 
well...

-David Nolan
 Network Software Designer
 Computing Services
 Carnegie Mellon University
