Chang, Jason | 11 Sep 18:43 2014

[Check_mk (english)] Advise needed on monitoring CTDB cluster servers

I'm trying to find the best way to monitor CTDB cluster servers. When a node in a cluster goes offline, its IP is taken over by the other nodes to sustain the service level. I think this would cause some weird behavior in the monitoring. Has anyone successfully implemented CTDB monitoring through check_mk?
_______________________________________________
checkmk-en mailing list
checkmk-en@...
http://lists.mathias-kettner.de/mailman/listinfo/checkmk-en
Ivan Verstraeten | 11 Sep 17:09 2014

[Check_mk (english)] Inventorize ifindex 0 in a network device

Is it possible to inventorize ifindex 0 on a network device? When I 
inventorize, it starts with ifindex 1.

Thanks in advance,
Ivan
Stephen Berg (Contractor | 11 Sep 15:55 2014

[Check_mk (english)] MAC addresses from a switch

Is there any way to have a switch that's being checked via SNMP show 
the MAC address of the device plugged into each port?

For instance, in a Dell PowerConnect 5424 switch I see a service for 
each interface.  The services for those ports show a MAC address 
belonging to the switch.  I'd like to see the MAC of the device plugged 
into that port instead.

-- 
Stephen Berg
Systems Administrator
NRL Code: 7320
Office: 228-688-5738
stephen.berg.ctr@...
Henri Wahl | 11 Sep 08:12 2014

[Check_mk (english)] "Hostgroups the host is member of" empty in 1.2.4p5


Hello list,
I noticed by accident that in the status view of a host in 1.2.4p5 the
line "Hostgroups the host is member of" is empty even if the host
belongs to a group. Did I miss something or is this a bug?
Best regards
Henri

-- 
Henri Wahl

IT Department
Leibniz-Institut fuer Festkoerper- u.
Werkstoffforschung Dresden

tel: (03 51) 46 59 - 797
email: h.wahl@...
http://www.ifw-dresden.de

Nagios status monitor Nagstamon:
http://nagstamon.ifw-dresden.de

DHCPv6 server dhcpy6d:
http://dhcpy6d.ifw-dresden.de

IFW Dresden e.V., Helmholtzstrasse 20, D-01069 Dresden
VR Dresden Nr. 1369
Vorstand: Prof. Dr. Manfred Hennecke, Dr. h.c. Dipl.-Finw. Rolf Pfrengle
Steven McDowall | 11 Sep 01:29 2014

[Check_mk (english)] Chrome issue -- solved!


Yep, that did the trick -- upgraded to the Beta for Mac OS X -- nice to know it was easy!

Thanks!

/Steve

Steven McDowall

Galaxy Semiconductor Solutions
Mobile           +1.336.608.2001
Website        www.galaxysemi.com

 



Message: 1
Date: Wed, 10 Sep 2014 04:23:14 +0200
From: Marcel Schulte <schulte.marcel-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: checkmk-en-qhrM8SXbD5JGu4B6N57VcA@public.gmane.orgas-kettner.de
Subject: Re: [Check_mk (english)] Just installed 1.2.4p5 on a Centos
6.5 and getting near constant Chrome "Aw Snap" errors when accessing
the GUI
Message-ID:
<CAPyBW-z7oahbaFvBH8ZrdgYb_B=B5peUL05dAtcB4CnFr4B6Hw-JsoAwUIsXov1KXRcyAk9cg@public.gmane.orgl.com>
Content-Type: text/plain; charset="utf-8"

Hi Steven,

This has already been discussed in another thread, in short:

* Chrome issue, not CMK
* Fixed in Chrome 38 and 39

HTH,
Marcel
On 10.09.2014 02:38, "Steven McDowall" <steven.mcdowall <at> galaxysemi.com> wrote:


Subject pretty much explains it.. basically anytime I hover over anything
within 10-15 seconds the dreaded error message comes up.

I have the console on, but nothing there except a gray box saying
"Inspected Target Crashed" running on a Mac OS X

This is Chrome Version 37.0.2062.94

running on a Mac OS X

Some preliminary playing around with Safari doesn't SEEM to have the same
issue ...


*Steven McDowall*
Galaxy Semiconductor Solutions
Mobile           +1.336.608.2001
Website        www.galaxysemi.com




Tata, Joseph | 10 Sep 20:07 2014

[Check_mk (english)] Issue with rule based notifications and new OMD installs

Apologies if this is the wrong list, but I think I've discovered an issue with Rule Based Notifications (RBN) on servers that were installed from an innovation release rather than updated from a stable release. I have two servers. One was deployed with omd-1.2.4p3, then upgraded to 1.2.5i3 and subsequently to 1.2.5i5p2. This server had RBN turned on and working fine (awesome feature, by the way). I recently deployed a second server and installed 1.2.5i5p2 with no prior versions of OMD. When I attempt to turn on RBN, I get the following error while activating changes:

Error: Could not find any contactgroup matching 'check-mk-notify' (config file '/omd/sites/foo/etc/nagios/conf.d/check_mk_objects.cfg', starting on line 70)
Because this error caused Nagios to roll back the change, it took some digging to find out what was going on: there was no check_mk_objects.cfg left to look at. On my original server, however, the following is in check_mk_objects.cfg in the section that defines contact groups:

define contactgroup {
  contactgroup_name             check-mk-notify
  alias                         check-mk-notify
}

I've attempted to recreate whatever allowed me to have RBN turned on.  It appears that this occurs when you have a site which was deployed with an innovation release rather than updated from a stable.  I've tried 1.2.5i3 through 1.2.5i5p3 and all have the same issue.  The only way I've been successful is if I deploy a stable release (1.2.4p5), create at least one object (so the check_mk_objects.cfg file exists), then update to an innovation release (1.2.5i3).  It looks like the stable version is configured to define that contact group when you turn on RBN, but the innovation does not.

I know that innovation releases aren't officially supported but I wanted to share this information both to help improve the product and in case someone else runs into this issue. 



Markus.Weber | 10 Sep 16:13 2014

[Check_mk (english)] mk_jolokia specific mbeans

Hey List,

We use mk_jolokia to monitor our application server and are satisfied overall with the results and how it works.
For one application we need to monitor some specific MBeans that the mk_jolokia plugin does not provide.
How can I get these values via mk_jolokia?
I tried adding a line to the global_vars and specific_vars arrays in mk_jolokia, but with no luck.
Would it be possible to add a check for "generic" values to check_mk?

Regards Markus
nitin gupta | 10 Sep 15:29 2014
Picon

[Check_mk (english)] Regarding the pulling HEX-STRING from MIBS.

Hi All,

I am implementing a check_mk check in which I need to fetch the MAC address of a device. The type of the OID that holds the MAC is Hex-STRING. But when I fetch the plugin output, it shows me what look like arbitrary characters.

As shown below:

[['\n\x00>\xdf\x85\x0b']]

How can I fetch the correct value?

Thanks
 Nitin
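
Note on the output above: the value does not look corrupted. It appears to be the six raw bytes of the MAC address printed as a Python string, where '\n' is byte 0x0a and '>' is byte 0x3e. A minimal sketch of turning such a value into the usual notation follows; the function name format_mac is hypothetical and not part of check_mk, and it assumes the check receives the raw bytes exactly as shown:

```python
def format_mac(raw):
    """Render a raw 6-byte SNMP value as a colon-separated MAC address."""
    if isinstance(raw, bytes):
        data = bytearray(raw)
    else:
        # Text string in which every character stands for one byte.
        data = bytearray(raw.encode("latin-1"))
    return ":".join("%02x" % byte for byte in data)


# The value from the plugin output quoted above.
print(format_mac(b"\n\x00>\xdf\x85\x0b"))  # 0a:00:3e:df:85:0b
```

The same conversion could be applied to the first column of each row inside the check function.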

Václav Ovsík | 10 Sep 13:46 2014

[Check_mk (english)] megaraid_bbu: false alert right after battery learn cycle

Hi,
I have Check_MK 1.2.4p5 and received a false alert from the megaraid_bbu
check today. There is a problematic period when the learn cycle has
completed and the battery starts charging from empty to fully charged.

My hardware:
    Dell PowerEdge R710
    Product Name    : PERC H700 Integrated
    FW Package Build: 12.10.6-0001

BTW: I recently did a FW upgrade in the hope that the "Overcharged" BBU
     state would go away. Unfortunately "Overcharged" still occurs from
     time to time.

MegaCLI version: 8.07.14 Dec 16, 2013

These are the alerts (timestamps converted to human-readable form):

    [2014/09/10 10:30:54] SERVICE ALERT: bess.i.cz;RAID Adapter/BBU 0;CRITICAL;SOFT;1;CRIT - Charging Status is Charging (!) (Expected: None), Remaining Capacity Low is Yes (!) (Expected: No), Battery State is Degraded(Charging) (!!) (Expected: Operational), Charge is 3 %

    [2014/09/10 10:32:54] SERVICE ALERT: bess.i.cz;RAID Adapter/BBU 0;CRITICAL;HARD;2;CRIT - Charging Status is Charging (!) (Expected: None), Remaining Capacity Low is Yes (!) (Expected: No), Battery State is Degraded(Charging) (!!) (Expected: Operational), Charge is 5 %

    [2014/09/10 11:15:00] SERVICE ALERT: bess.i.cz;RAID Adapter/BBU 0;WARNING;HARD;2;WARN - Charging Status is Charging (!) (Expected: None), Charge is 39 %

Portion of the log from the RAID adapter (times are in UTC, 2 hours behind local time):
    09/10/14  6:08:42: EVT#19828-09/10/14  6:08:42: 151=Battery relearn started
    09/10/14  6:09:47: EVT#19829-09/10/14  6:09:47: 148=Battery is discharging
    09/10/14  6:09:47: EVT#19830-09/10/14  6:09:47: 152=Battery relearn in progress
    09/10/14  7:13:02: EVT#19831-09/10/14  7:13:02: 162=Current capacity of the battery is below threshold
    09/10/14  7:13:04: EVT#19832-09/10/14  7:13:04: 195=BBU disabled; changing WB virtual disks to WT, Forced WB VDs are not affected
    09/10/14  7:13:04: Change in current cache property detected for LD : 0!
    09/10/14  7:13:04: EVT#19833-09/10/14  7:13:04:  54=Policy change on VD 00/0 to [ID=00,dcp=0d,ccp=0c,ap=0,dc=0,dbgi=0,S=0|0] from [ID=00,dcp=0d,ccp=0d,ap=0,dc=0,dbgi=0,S=0|0]
    09/10/14  8:29:47: EVT#19834-09/10/14  8:29:47: 153=Battery relearn completed
    09/10/14  8:29:47: Learn completed successfully.
    09/10/14  8:29:47: Learn completed successfully
    09/10/14  8:29:47: Next Learn will start on 12 09 2014

    09/10/14  8:29:47:       *** BATTERY FEATURE PROPERTIES ***
    09/10/14  8:29:47:  _________________________________________________

    09/10/14  8:29:47:       Auto Learn Period     : 90  days
    09/10/14  8:29:47:       Next Learn Time       : 471428987
    09/10/14  8:29:47:       Battery ID            : 433001af
    09/10/14  8:29:47:       Delayed Learn Interval: 0  hours from scheduled time
    09/10/14  8:29:47:       Next Learn scheduled on: 12 09 2014   8:29:47
    09/10/14  8:29:47:  _________________________________________________

    09/10/14  8:30:02: EVT#19835-09/10/14  8:30:02: 162=Current capacity of the battery is below threshold
    09/10/14  8:30:02: EVT#19836-09/10/14  8:30:02: 147=Battery started charging
    09/10/14  9:14:27: EVT#19837-09/10/14  9:14:27: 163=Current capacity of the battery is above threshold
    09/10/14  9:14:27: EVT#19838-09/10/14  9:14:27: 194=BBU enabled; changing WT virtual disks to WB
    09/10/14  9:14:27: Change in current cache property detected for LD : 0!
    09/10/14  9:14:27: EVT#19839-09/10/14  9:14:27:  54=Policy change on VD 00/0 to [ID=00,dcp=0d,ccp=0d,ap=0,dc=0,dbgi=0,S=0|0] from [ID=00,dcp=0d,ccp=0c,ap=0,dc=0,dbgi=0,S=0|0]

It is apparent from the log above that during the discharge (learn
cycle) there was no alert from Check_MK. When the learn cycle ended at
8:29:47 UTC (10:29 local time), Check_MK raised an alert, but the RAID
adapter had already disabled the BBU and was in write-through mode. It
stayed that way until 9:14 UTC (11:14 local time).

So there is a problematic period during which the battery is below the
threshold. CMK should detect that the BBU is not in use and suppress alerts.
Maybe

    megacli -AdpGetProp WBSupport -aALL -NoLog

I don't know.
Unfortunately my battery is already above the threshold again.
Cheers
-- 
Zito
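
The suppression asked for above could be sketched roughly as follows. This is not the actual megaraid_bbu check: the function bbu_state and the dictionary keys are hypothetical, modelled only on the field names that appear in the alert text, and the rule that "Degraded(Charging)" while charging means a finished learn cycle is an assumption that would need confirming against real MegaCLI output.

```python
OK, WARN, CRIT = 0, 1, 2


def bbu_state(fields, expected_state="Operational"):
    """Return (status, text) for a dict of already parsed BBU fields.

    The keys "Battery State" and "Charging Status" are hypothetical names
    taken from the alert text, not from the real check's parsed section.
    """
    state = fields.get("Battery State", "")
    charging = fields.get("Charging Status", "")

    # Assumption: a degraded battery that is currently charging is
    # recovering from a learn cycle, so it is not reported as CRIT.
    if state == "Degraded(Charging)" and charging == "Charging":
        return OK, "Battery recharging after learn cycle (state: %s)" % state

    if state != expected_state:
        return CRIT, "Battery State is %s (Expected: %s)" % (state, expected_state)

    return OK, "Battery State is %s" % state


status, text = bbu_state(
    {"Battery State": "Degraded(Charging)", "Charging Status": "Charging"}
)
print(status, text)
```

A real implementation would probably want a time limit on this exception, since a battery that stays in the degraded state for many hours may well be faulty.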
Steven McDowall | 10 Sep 02:29 2014

[Check_mk (english)] Just installed 1.2.4p5 on a Centos 6.5 and getting near constant Chrome "Aw Snap" errors when accessing the GUI


Subject pretty much explains it.. basically anytime I hover over anything within 10-15 seconds the dreaded error message comes up.

I have the console on, but nothing there except a gray box saying "Inspected Target Crashed" running on a Mac OS X 

This is Chrome Version 37.0.2062.94

running on a Mac OS X

Some preliminary playing around with Safari doesn't SEEM to have the same issue ... 


Steven McDowall

Galaxy Semiconductor Solutions
Mobile           +1.336.608.2001
Website        www.galaxysemi.com

 


Andreas Döhler | 9 Sep 18:52 2014

Re: [Check_mk (english)] What exactly are "In errors" and do they have any significance?

The output for your module ports looks very strange: no port speed is detected correctly. Is the check if or if64?
Today I could open your error graph, and it shows a steady stream of only a few packets per second with errors.
With this data it is clear why most of your interfaces show errors. The 99% error rate is normal, as you had only 1 incoming packet per second in the last minute on this port, and one packet with errors is then 100% errors :)
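
To illustrate the arithmetic, here is a minimal sketch. It assumes the percentage is the error rate divided by the sum of error, unicast and non-unicast packet rates; the exact formula used by the if/if64 checks may differ, and the function name and sample rates are made up for the example.

```python
def error_percentage(error_rate, ucast_rate, nucast_rate):
    """Errored packets as a percentage of all packets seen per second."""
    total = error_rate + ucast_rate + nucast_rate
    if total == 0:
        return 0.0
    return 100.0 * error_rate / total


# Nearly idle port: one good packet per minute, one errored packet per second.
idle = error_percentage(error_rate=1.0, ucast_rate=1.0 / 60, nucast_rate=0.0)

# Busy port: the same error rate, drowned in normal traffic.
busy = error_percentage(error_rate=1.0, ucast_rate=5000.0, nucast_rate=50.0)

print("idle port: %.2f%%  busy port: %.4f%%" % (idle, busy))
```

With the same absolute number of errors, the idle port ends up far above a 0.1% critical threshold while the busy port stays below it.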

What happens if you use such a switch in a test environment without switches from other vendors connected?


2014-09-09 14:20 GMT+02:00 Simon Vargas <simonv4-KK0ffGbhmjU@public.gmane.org>:
Hello

CRIT - [Module 1 Port 5] (up) MAC: ec:cd:6d:75:eb:4f, 208.00MBit/s, in: 4.28B/s(0.0%), in-errors: 99.68%CRIT >= 0.1, out: 96.12kB/s(0.4%)
 
Well, these look far too abnormal to me. 99% errors? When I look at the error graph, it's basically an upside-down mirror of the OUT UNICAST traffic for the interface.
These switches can handle VLAN packets, jumbo frames etc. What packets could they possibly not handle? If there were real issues with the network we would notice; these look like completely bogus reports to me.

On the other hand, one of the Linksys switches has errors on only one interface (the uplink); all the others are green.

Could LLDP, IPv6 or IPX packets trigger these types of errors?

I made a new picture (this time in Jpeg format).

Thanks!

Sent: Monday, September 08, 2014 at 11:07 AM
From: "Andreas Döhler" <andreas.doehler-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
To: "Simon Vargas" <simonv4-KK0ffGbhmjU@public.gmane.org>
Cc: "checkmk-en-qhrM8SXbD5JpaB0eVFyvwnWFp+d4uDoM@public.gmane.org" <checkmk-en-qhrM8SXbD5JpaB0eVFyvwnWFp+d4uDoM@public.gmane.org>
Subject: Re: [Check_mk (english)] What exactly are "In errors" and do they have any significance?

It is possible that the Allied-Telesis switches drop all packets that they do not know how to handle.
Is the behavior the same on switch ports with more traffic than your [Module 1 Port 1]? There is almost no traffic on that port, so a few unknown packets immediately produce error messages.
I cannot open the picture.
The perfdata line you posted looks very normal, with no errors.
 
br
Andreas
 
 
2014-09-08 9:38 GMT+02:00 Simon Vargas <simonv4-KK0ffGbhmjU@public.gmane.org>:
Hello

After adding a couple of Allied-Telesis switches to our checkmk, it's flooded with red error messages for 'in errors' on the switch ports, such as:

CRIT    Interface 01    [Reschedule an immediate check of the 'Check_MK' service] [This problem has been acknowledged]
CRIT - [Module 1 Port 1] (up) MAC: ec:cd:6d:75:eb:4b, speed unknown, in: 4.25B/s, in-errors: 98.29%CRIT >= 0.1, out: 19.82kB/s

The graph shows:
http://i58.tinypic.com/2znmrrt.png

This is ridiculous; there is no way that there are so many consistent errors on all the switch ports. I think this value is somehow being interpreted incorrectly.
The workaround we used is to acknowledge these errors, but we still get new emails about them every day:

Perfdata: in=3425.901409;;;0;12500000 inucast=33.402694;;;; innucast=0.016735;;;; indisc=0;;;; inerr=0;0.01;0.1;; out=74351.617834;;;0;12500000 outucast=55.14122;;;; outnucast=2.192261;;;; outdisc=0;;;; outerr=0;0.01;0.1;; outqlen=0;;;0;

Monitoring these switches hurts more than it actually helps.
I cannot find much documentation about what this is, only something on Cisco's site: "In Errors is the sum of all error packets received on that port."


Does anybody have experience with this? Please share.

Thanks