Craig Small | 2 Jun 2009 03:31

Re: wrong SLA-messages

On Fri, May 22, 2009 at 09:18:54AM +0200, sgu@... wrote:
> I have a problem with SLA messages on jffnms0.83 since upgrading my host to ubuntu 9.04.
> Events are showing wrong values, e.g. 
> SLA   	   Ubuntu  <at> home    	 / Storage Used > 80%: 800 % (Home FixedDisk 42908311552) 
> or 
> SLA   	   Ubuntu  <at> home    	 CPU Usage > 80%: 91.67 % (Home Linux ubuntu 2.6.28-11-generic...
> 
> RRDTool graphs are looking ok, so JFFNMS calculates something wrong.
The first one looks wrong. Did you adjust the storage there? To me it
looks like your total storage didn't update.

The second one could be correct; it's only showing 91.67% load.
-- 
Craig Small      GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
http://www.enc.com.au/                             csmall at : enc.com.au
http://www.debian.org/          Debian GNU/Linux, software should be Free 

sgu | 2 Jun 2009 20:29

Re: wrong SLA-messages

The second one is wrong, too. The actual CPU usage is below 10%.
The RRDTool graph looks OK. I have a second JFFNMS running in a VM, and there everything looks perfect.

I did not change any storage. When I create a new element on the JFFNMS on my host, it goes wrong, too.

-------- Original Message --------
> Date: Tue, 2 Jun 2009 11:31:46 +1000
> From: Craig Small <csmall@...>
> To: jffnms-users@...
> Subject: Re: [jffnms-users] wrong SLA-messages

> On Fri, May 22, 2009 at 09:18:54AM +0200, sgu@... wrote:
> > I have a problem with SLA messages on jffnms0.83 since upgrading my host to ubuntu 9.04.
> > Events are showing wrong values, e.g.
> > SLA   	   Ubuntu  <at> home    	 / Storage Used > 80%: 800 % (Home FixedDisk 42908311552)
> > or
> > SLA   	   Ubuntu  <at> home    	 CPU Usage > 80%: 91.67 % (Home Linux ubuntu 2.6.28-11-generic...
> > 
> > RRDTool graphs are looking ok, so JFFNMS calculates something wrong.
> The first one looks wrong, did you adjust the storage there? To me that
> seems your total storage didn't update.
> 
> The second one could be possible, its only showing 91.67% load.
> -- 
> Craig Small      GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
> http://www.enc.com.au/                             csmall at : enc.com.au
> http://www.debian.org/          Debian GNU/Linux, software should be Free 

Leland Ray | 2 Jun 2009 20:51

Using jffnms through fastcgi

Is anyone using jffnms via fastcgi?

I've been trying to set it up and am getting something strange. The
first index page tries to redirect to admin/setup.php, and at that point
the page enters a redirect loop.

Craig Small | 9 Jun 2009 07:10

Re: wrong SLA-messages

On Tue, Jun 02, 2009 at 08:29:30PM +0200, sgu@... wrote:
> The second one is wrong, too. The CPU usage is lower than 10%.
> RRDTool graph looks ok. I have a second JFFNMS running in a VM, there everything looks perfect.
The graph looking fine is a good start; it means that the pollers, at
least, are behaving.

Can you run the rrd_analyzer on the command line and find the relevant
lines?  If you know the interface IDs of the problem interfaces, you will
find them at the start of the lines, e.g.:

20:14:38 I90 : Start: 2006-01-13 19:40:00 Stop: 2006-01-13 20:05:00 Measures: 5

I90 means it is for interface ID 90.

We can then see what the logic of the SLA is, what it is getting etc.
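
A minimal sketch of picking out those lines, assuming the rrd_analyzer output has been captured as text and that every relevant line starts with a timestamp followed by `I<id> :` as in the sample above. The function name and the second sample line (I91) are made up for illustration.

```python
import re

# Matches lines such as "20:14:38 I90 : Start: ..." (format taken from the
# sample line above); group 1 is the interface ID, group 2 the message.
LINE_RE = re.compile(r"^\d{2}:\d{2}:\d{2}\s+I(\d+)\s*:\s?(.*)$")

def lines_for_interface(output, interface_id):
    """Return the message part of every line tagged with the given interface ID."""
    matches = []
    for line in output.splitlines():
        m = LINE_RE.match(line)
        if m and int(m.group(1)) == interface_id:
            matches.append(m.group(2))
    return matches

# The I91 line is hypothetical and only there to show the filtering.
sample = """\
20:14:38 I90 : Start: 2006-01-13 19:40:00 Stop: 2006-01-13 20:05:00 Measures: 5
20:14:38 I91 : Start: 2006-01-13 19:40:00 Stop: 2006-01-13 20:05:00 Measures: 5
"""
for message in lines_for_interface(sample, 90):
    print(message)
```

Comparing the ID as an integer avoids I9 accidentally matching I90.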

- Craig

-- 
Craig Small      GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
http://www.enc.com.au/                             csmall at : enc.com.au
http://www.debian.org/          Debian GNU/Linux, software should be Free 


Craig Small | 9 Jun 2009 07:13

Re: Using jffnms through fastcgi

On Tue, Jun 02, 2009 at 11:51:57AM -0700, Leland Ray wrote:
> Is anyone using jffnms via fastcgi?
> 
> I've been trying to set it up and am getting something strange. The
> First index page tries to redirect to admin/setup.php, and at that point
> the page enters a redirect loop. 
Now that is an interesting idea. It might cut down on the load you get
if you hit a lot of web pages.

Redirecting to admin/setup.php sounds like it cannot find the config
files. I'm sorry I cannot help you more, but I have never tried fastcgi myself.

 - Craig
-- 
Craig Small      GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
http://www.enc.com.au/                             csmall at : enc.com.au
http://www.debian.org/          Debian GNU/Linux, software should be Free 

sgu | 12 Jun 2009 12:43

Re: wrong SLA-messages

Ok, thanks Craig. 

Here is an alarm from today:

12 Jun   12:32:00   	      	  SLA   	   Ubuntu  <at> home    	 / Storage Used > 80%: 600 % (Home FixedDisk 42908311552) 

Running rrd_analyzer.php manually gives:

12:32:00 I221 : ==========================================================================================
12:32:00 I221 : Start: 2009-06-12 12:00:00	Stop: 2009-06-12 12:25:00	Measures: 5
12:32:00 I221 : storage_block_size(4) storage_block_count(1) storage_used_blocks(6)
12:32:00 I221 : ------------------------------------------------------------------------------------------
12:32:00 New Event (16248): 2009-06-12 12:32:00 - 12 - 8 - / - alert - rrd_analizer_sla - Storage Used > 80%: 600 % - 9
12:32:00 I221 : sla : Cond0: ( 6 > ((1*80)/100))                    = 1  -TRUE- Used > 80%: 600 %
12:32:00 I221 : sla : Final Eval: True. INFO: Storage Used > 80%: 600 %

I hope this information helps.

> On Tue, Jun 02, 2009 at 08:29:30PM +0200, sgu@... wrote:
> > The second one is wrong, too. The CPU usage is lower than 10%.
> > RRDTool graph looks ok. I have a second JFFNMS running in a VM, there
> everything looks perfect.
> The graph looking fine is a good start, it means that the pollers are
> behaving at least.
> 
> Can you run the rrd_analyzer on the command line and find the relevant
> lines?  If you know your interface ID of the problem interfaces you will
> find that at the start of the lines, eg
> 

Craig Small | 13 Jun 2009 01:45

Re: wrong SLA-messages

On Fri, Jun 12, 2009 at 12:43:14PM +0200, sgu@... wrote:
> 12:32:00 I221 : Start: 2009-06-12 12:00:00	Stop: 2009-06-12 12:25:00	Measures: 5
> 12:32:00 I221 : storage_block_size(4) storage_block_count(1) storage_used_blocks(6)
OK, that is 5 samples, and the three values are the results over that time.
What is this telling us?
You:
  have a disk of 4 bytes (block size 4 * block count 1)
  are using 24 bytes (6 blocks * 4) of your 4-byte drive, which is 600%

Let's look at my disk:
09:35:24 I27 : storage_block_size(4096) storage_block_count(3844436) storage_used_blocks(2563235)
That is 4096 * 3844436 bytes, which is about a 15 Gig drive, or partition.

So we know the problem is in the fetching of the data, because I'm quite
sure you don't have a 4 byte drive.
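
The arithmetic above can be written out as a short sketch. It assumes the three values mean what their names suggest (bytes per block, total blocks, used blocks) and that the SLA condition has the shape shown in the analyzer log, `( 6 > ((1*80)/100))`. The function name and the `threshold_pct` parameter are made up for illustration.

```python
def storage_summary(block_size, block_count, used_blocks, threshold_pct=80):
    """Return total bytes, used bytes, percent used, and whether the SLA fires."""
    total_bytes = block_size * block_count
    used_bytes = block_size * used_blocks
    percent_used = 100.0 * used_blocks / block_count
    # Same shape as the logged condition "( 6 > ((1*80)/100))".
    sla_fires = used_blocks > (block_count * threshold_pct) / 100
    return total_bytes, used_bytes, percent_used, sla_fires

# Values reported by rrd_analyzer for the problem interface (I221).
print(storage_summary(4, 1, 6))
# Values from the healthy disk quoted above (I27).
print(storage_summary(4096, 3844436, 2563235))
```

With the I221 values the disk comes out at 4 bytes with 24 bytes used, so the condition is true; with the I27 values usage is roughly two thirds and the condition is false.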

Are you absolutely certain the rrd graph for this drive looks OK?
Find the host ID of the server the disk is on and let me see the
numbers coming from the poller. For me, with host=2 and interface=27, I use

su -s /bin/sh -c 'php -q poller.php 2 27' jffnms

My lines look like this:
09:40:52  :  H   2 :  I  27 :  P  10 : snmp_counter:storage_block_size(.1.3...31): 4096 -> buffer(): 1 (time P:1.73 | 0.46)
09:40:52  :  H   2 :  I  27 :  P  20 : snmp_counter:storage_block_count(.1.3...31): 3844436 -> buffer(): 2 (time P:1.31 | 0.11)
09:40:52  :  H   2 :  I  27 :  P  30 : snmp_counter:storage_used_blocks(.1.3...31): 2563312 -> buffer(): 3 (time P:1.35 | 0.1)

sgu | 14 Jun 2009 22:52

Re: wrong SLA-messages

Hello,
that looks very strange...
My root partition has a capacity of 64G, and 42% of the storage is used. The RRD graph shows this absolutely correctly.

To check, this evening I created new hosts for my hardware host on both my faulty JFFNMS and the JFFNMS running in parallel in a VM.

Here is the output from my faulty JFFNMS:

21:26:03 I262 : ==========================================================================================
21:26:03 I262 : Start: 2009-06-14 20:55:00	Stop: 2009-06-14 21:20:00	Measures: 5
21:26:03 I262 : storage_block_size(4) storage_block_count(1) storage_used_blocks(6)
21:26:03 I262 : ------------------------------------------------------------------------------------------
21:26:03 New Event (16462): 2009-06-14 21:26:03 - 12 - 34 - / - alert - rrd_analizer_sla - Storage Used > 80%: 600 % - 9
21:26:03 I262 : sla : Cond0: ( 6 > ((1*80)/100))                    = 1  -TRUE- Used > 80%: 600 %
21:26:03 I262 : sla : Final Eval: True. INFO: Storage Used > 80%: 600 %

sudo /bin/sh -c 'php -q poller.php 34 262' jffnms
21:29:05  :  H  34 : Poller Start : 6 Items.
21:29:05  :  H  34 :  I 262 :  P   5 : storage_verify(): 31 -> verify_interface_number(): index not changed (time P:8.77 | 1.98)
21:29:05  :  H  34 :  I 262 :  P  10 : snmp_counter:storage_block_size(.1.3...31): 4096 -> buffer(): 1 (time P:2.36 | 0.54)
21:29:05  :  H  34 :  I 262 :  P  20 : snmp_counter:storage_block_count(.1.3...31): 15556272 -> buffer(): 2 (time P:1.91 | 0.11)
21:29:05  :  H  34 :  I 262 :  P  30 : snmp_counter:storage_used_blocks(.1.3...31): 6551204 -> buffer(): 3 (time P:1.63 | 0.16)
21:29:05  :  H  34 :  I 262 :  P  60 : no_poller(): 0 -> rrd(*): storage_block_size:4096 - storage_block_count:15556272 - storage_used_blocks:6551204 (time P:0.28 | 38.98)

Craig Small | 16 Jun 2009 07:06

Re: wrong SLA-messages

On Sun, Jun 14, 2009 at 10:52:33PM +0200, sgu@... wrote:
> Following the prints from my faulty JFFNMS:
> 21:26:03 I262 : storage_block_size(4) storage_block_count(1) storage_used_blocks(6)
Yes, there is the strangely small disk and yet:
> 21:29:05  :  H  34 :  I 262 :  P  60 : no_poller(): 0 -> rrd(*): storage_block_size:4096 - storage_block_count:15556272 - storage_used_blocks:6551204 (time P:0.28 | 38.98)
These look correct.

> And here the printouts from my "good" JFFNMS:
> 21:34:19 I23 : storage_block_size(4) storage_block_count(1) storage_used_blocks(6)
? These are the same numbers!! Why is it a "good" JFFNMS? Was this a cut
and paste problem?

> 21:36:36  :  H   3 :  I  23 :  P  60 : no_poller(): 0 -> rrd(*): storage_block_size:4096 - storage_block_count:15556272 - storage_used_blocks:6551434 (time P:0.08 | 118.25)
Pretty close to the "bad" JFFNMS.

It's looking like either a strange JFFNMS bug or rrdtool is messed up.
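
To put the two sets of numbers side by side, here is a rough sketch that parses one rrd_analyzer line and one poller `rrd(*)` line from the "bad" JFFNMS output quoted in this thread. The helper names are made up, and the parsing assumes the lines keep exactly the layout shown in these posts.

```python
import re

def parse_analyzer(line):
    """Read name(value) pairs from an rrd_analyzer line."""
    return {name: int(value) for name, value in re.findall(r"(\w+)\((\d+)\)", line)}

def parse_poller_rrd(line):
    """Read name:value pairs that follow 'rrd(*):' in a poller line."""
    payload = line.split("rrd(*):", 1)[1].split("(time", 1)[0]
    return {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", payload)}

analyzer_line = "21:26:03 I262 : storage_block_size(4) storage_block_count(1) storage_used_blocks(6)"
poller_line = ("21:29:05  :  H  34 :  I 262 :  P  60 : no_poller(): 0 -> rrd(*): "
               "storage_block_size:4096 - storage_block_count:15556272 - "
               "storage_used_blocks:6551204 (time P:0.28 | 38.98)")

analyzed = parse_analyzer(analyzer_line)
polled = parse_poller_rrd(poller_line)
for name in polled:
    print("%-22s polled=%-10d analyzed=%d" % (name, polled[name], analyzed[name]))
```

One thing the comparison makes visible: each analyzer figure (4, 1, 6) is the leading digit of the corresponding polled figure (4096, 15556272, 6551204). That may or may not be significant; nothing in the thread so far establishes a cause.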

 - Craig

-- 
Craig Small      GnuPG:1C1B D893 1418 2AF4 45EE  95CB C76C E5AC 12CA DFA5
http://www.enc.com.au/                             csmall at : enc.com.au
http://www.debian.org/          Debian GNU/Linux, software should be Free 


Felipe Aceitón | 16 Jun 2009 09:51

No traffic Graphs (its not the : problem)

Hi everybody,

 

I have a very strange problem here. I have two machines with JFFNMS:

JFFNMS1
Debian, kernel 2.6.18-6-686
Jffnms 0.8.3

JFFNMS2
Lenny, kernel 2.6.26-2-686
Jffnms 0.8.3

 

I have been monitoring the same server from both machines. With the first one I get the traffic graphs, but with the second one I get the message “The RRDTool files for Interface ID 40 (from Host ID 7), has not been created by the Poller Process yet”.

I went to the rrd folder and checked that the files for that interface exist (on both servers):

interface-40-0.rrd
interface-40-10.rrd
interface-40-11.rrd
interface-40-1.rrd
interface-40-2.rrd
interface-40-3.rrd
interface-40-4.rrd
interface-40-5.rrd
interface-40-6.rrd
interface-40-7.rrd
interface-40-8.rrd
interface-40-9.rrd

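Since the files exist on both servers, a slightly more detailed listing might help when comparing them. The sketch below reports size, age and readability for each file of one interface. The directory `/opt/jffnms/rrd` is a guess based on the engine path used further down, the function name is made up, and `os.access` only answers for the user running the script, so it would have to be run as the web server's user to say anything about what the web interface can read.

```python
import glob
import os
import time

def describe_rrd_files(rrd_dir, interface_id):
    """Return (name, size_bytes, age_seconds, readable) for each RRD file of an interface."""
    pattern = os.path.join(rrd_dir, "interface-%d-*.rrd" % interface_id)
    now = time.time()
    rows = []
    for path in sorted(glob.glob(pattern)):
        info = os.stat(path)
        rows.append((os.path.basename(path), info.st_size,
                     now - info.st_mtime, os.access(path, os.R_OK)))
    return rows

if __name__ == "__main__":
    # Hypothetical location; adjust to wherever the rrd folder actually is.
    for name, size, age, readable in describe_rrd_files("/opt/jffnms/rrd", 40):
        print("%-22s %10d bytes  updated %6.0f s ago  readable=%s"
              % (name, size, age, readable))
```

Files that are unreadable or have not been updated for a long time on the second server, but not on the first, would be worth a closer look.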
 

I run the poller manually and everything seems to be correct:

 

jffnms:/opt/jffnms/engine# php poller.php 7 40 0 0

09:46:28  :  H   2 : Poller Start : 19 Items.
09:46:28  :  H   2 :  I  33 :  P  10 : verify_interface_number(): 65539 -> verify_interface_number(): interfacenumber not changed (time P:4.9 | 1.51)
09:46:28  :  H   2 :  I  33 :  P  15 : cisco_snmp_ping_start:cisco_snmp_ping_start(): -5 -> buffer(): 1 (time P:1.94 | 0.28)
09:46:28  :  H   2 :  I  33 :  P  16 : interface_oper_status(8): up -> alarm(3,,180): Nothing was done (time P:4.87 | 0.99)
09:46:28  :  H   2 :  I  33 :  P  17 : interface_admin_status(7): up -> db(show_rootmap,down=2|up=1,0): 0 (time P:2.13 | 0.35)
09:46:28  :  H   2 :  I  33 :  P  20 : snmp_counter:input(.1.3..539): 1037947777 -> buffer(): 2 (time P:1.44 | 0.1)
09:46:28  :  H   2 :  I  33 :  P  25 : snmp_counter:inpackets(.1.3..539): 45291819 -> buffer(): 3 (time P:1.05 | 0.15)
09:46:28  :  H   2 :  I  33 :  P  30 : snmp_counter:output(.1.3..539): 3299246156 -> buffer(): 4 (time P:1.1 | 0.1)
09:46:28  :  H   2 :  I  33 :  P  35 : snmp_counter:outpackets(.1.3..539): 27288241 -> buffer(): 5 (time P:1.13 | 0.07)
09:46:28  :  H   2 :  I  33 :  P  40 : snmp_counter:outputerrors(.1.3..539): 12921 -> buffer(): 6 (time P:0.96 | 0.07)
09:46:28  :  H   2 :  I  33 :  P  45 : snmp_counter:inputerrors(.1.3..539): 0 -> buffer(): 7 (time P:0.94 | 0.07)
09:46:28  :  H   2 :  I  33 :  P  46 : snmp_counter:drops(.1.3..539): 0 -> buffer(): 8 (time P:1.09 | 0.07)
09:46:28  :  H   2 :  I  33 :  P  47 : db:bandwidthin(band..tes): 125000000 -> buffer(): 9 (time P:0.26 | 0.07)
09:46:28  :  H   2 :  I  33 :  P  48 : db:bandwidthout(band..tes): 125000000 -> buffer(): 10 (time P:0.04 | 0.06)
09:46:28  :  H   2 :  I  33 :  P  50 : cisco_snmp_ping_wait(): -1 -> no_backend(): 0 (time P:0.27 | 0.15)
09:46:28  :  H   2 :  I  33 :  P  55 : cisco_snmp_ping_get_pl:packetloss(): 0 -> buffer(): 11 (time P:0.23 | 0.1)
09:46:28  :  H   2 :  I  33 :  P  60 : cisco_snmp_ping_get_rtt:rtt(): 0 -> buffer(): 12 (time P:0.26 | 0.07)
09:46:28  :  H   2 :  I  33 :  P  65 : cisco_snmp_ping_end(): 1 -> no_backend(): 0 (time P:1.95 | 0.04)
09:46:28  :  H   2 :  I  33 :  P  80 : no_poller(): 0 -> rrd(*): input:1037947777 - output:3299246156 - inputerrors:0 - outputerrors:12921 - rtt:0 - packetloss:0 - inpackets:45291819 - outpackets:27288241 - drops:0 - bandwidthin:125000000 - bandwidthout:125000000 (time P:0.18 | 24.32)
09:46:28  :  H   2 :  I  33 :  P LPD : last_poll_date(): 1245138388 -> db(last_poll_date): 1 (time P:0.3 | 2.56)
09:46:28  :  H   2 : Poller End, Total Time: 91.01 msec.

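In the run above, the command names host 7 and interface 40, while every output line is tagged H 2 and I 33. Whether that is a pasting slip or something more is not clear from the post. A small sketch for checking which IDs a poller run actually reports, assuming the `H <id>` and `I <id>` fields keep the layout shown above and that the first two arguments to poller.php are the host and interface IDs (as in the `poller.php 2 27` example earlier in this list). The function name is made up.

```python
import re

# ": H <host> :" optionally followed by " I <interface> :", as in the lines above.
FIELD_RE = re.compile(r":\s+H\s+(\d+)\s+:(?:\s+I\s+(\d+)\s+:)?")

def ids_in_poller_output(output):
    """Collect the host IDs and interface IDs that appear in poller output lines."""
    hosts, interfaces = set(), set()
    for line in output.splitlines():
        m = FIELD_RE.search(line)
        if not m:
            continue
        hosts.add(int(m.group(1)))
        if m.group(2):
            interfaces.add(int(m.group(2)))
    return hosts, interfaces

# Three lines copied from the output above.
sample = """\
09:46:28  :  H   2 : Poller Start : 19 Items.
09:46:28  :  H   2 :  I  33 :  P  20 : snmp_counter:input(.1.3..539): 1037947777 -> buffer(): 2 (time P:1.44 | 0.1)
09:46:28  :  H   2 : Poller End, Total Time: 91.01 msec.
"""
hosts, interfaces = ids_in_poller_output(sample)
requested_host, requested_interface = 7, 40
print("hosts seen:", sorted(hosts), "requested:", requested_host)
print("interfaces seen:", sorted(interfaces), "requested:", requested_interface)
```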
 

I have tried with many other machines (Windows XP, Windows 2003…) and I get no traffic graphs for any of them. All the other graphs (Utilization, RTT, Packets…) are working well.

Any idea? Could it be because of the Debian version (Lenny)?

 

Thanks!!

Felipe

_______________________________________________
jffnms-users mailing list
jffnms-users@...
https://lists.sourceforge.net/lists/listinfo/jffnms-users
