Ali ISIK | 2 Jan 2006 11:04

Feature Request: Correlation Function.

Hi, All,

Here is a typical situation that comes up during
the use of SmokePing:

(1) There is a problem somewhere between
    Subnet-S and Subnet-T.

(2) SmokePing is set up on a server named S0
    in Subnet-S.

(3) There is an important server T0 (target) in Subnet-T.
    SmokePing is probing that target and showing
    problems, but we can't be sure whether the problem
    is with the network or with the target server.

(4) To resolve this ambiguity, the admin sets up
    a clean, dependable Linux box inside
    Subnet-T, calls it T1 and adds T1 to the SmokePing
    target list.  Now S0 is probing T0 as well as T1.

(5) The admin then wants to know how correlated the
    data for T0 and T1 are.  If there is a high correlation,
    he figures, the problem must be due to something
    common to both -- most likely, the network.

(6) The admin can visually inspect the graphs and
    reach a judgement, but a statistically correct
    number would really help.  He figures he needs
    something like a moving correlation indicator.  He
    [...]
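
The "moving correlation indicator" from point (6) could be sketched roughly as below. This is only an illustration and not part of SmokePing: the function names `pearson` and `moving_correlation` are made up here, the two inputs are assumed to be equally spaced series (for example median RTT per probe round for T0 and T1), and plain Pearson correlation over a sliding window is just one possible choice of statistic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    n = len(xs)
    if n != len(ys) or n < 2:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a completely flat series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def moving_correlation(t0, t1, window):
    """Correlation of t0 and t1 over each window of `window` consecutive samples."""
    n = min(len(t0), len(t1))
    return [pearson(t0[i:i + window], t1[i:i + window])
            for i in range(n - window + 1)]
```

Values near 1 in a window would suggest that T0 and T1 degrade together (pointing at the shared network path), while values near 0 would suggest the trouble is specific to one host.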

Edinilson J. Santos | 2 Jan 2006 19:33

How to interpret tSmoke values?

Using smokeping.cgi I can see in a weekly report something like:

Median Ping RTT (160.4 ms avg)
Packet Loss: 67.74%  average   100.00%  maximum    11.03% current
Probe: 10 ICMP Pings (7300 bytes) Every 1800 seconds

BUT with tSmoke the same host returns:
Uptime: 6.19%

I don't know exactly how to interpret this value (6.19%).

Thanks

Edinilson
---------------------------------------------------------
ATINET-Professional Web Hosting
Tel Voz: (0xx11) 4412-0876
http://www.atinet.com.br

Dan McGinn-Combs | 3 Jan 2006 15:51

Re: How to interpret tSmoke values?

Good question.
tSmoke was designed to be a management summary reporting tool. For that
reason, the numbers are averages of RRD values showing downtime (i.e.
having lost more than 10% of the PINGS sent out).

The RRD command which extracts this data is
RRDs::graph "fake.png",
	"--start","-604800",
	"--end","-600",
	"--step","4320",
	"DEF:loss=$target:loss:AVERAGE",
	"CDEF:avail=loss,UN,0,loss,IF,$pings,GE,0,100,IF",
	"PRINT:avail:AVERAGE:%.2lf";

It doesn't really create a PNG file; it just reports the PRINTed
value. The CDEF turns the loss figure for each timeslot into another
number (avail): 0 if the slot had downtime (defined as above), 100
otherwise. The printed average of avail is therefore the percentage of
timeslots without downtime.
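
To make the RPN expression easier to follow, here is the same per-slot logic written out in Python. It is a sketch only: `threshold` stands in for `$pings` (whose value is not shown in this thread), the function names are invented for the illustration, and the sample numbers in the usage note are made up.

```python
import math


def slot_availability(loss, threshold):
    """Per-slot value of the CDEF: 0 if the slot counts as down, else 100."""
    # loss,UN,0,loss,IF  -> an unknown loss value is treated as 0 lost pings
    if loss is None or math.isnan(loss):
        loss = 0.0
    # ...,$pings,GE,0,100,IF  -> 0 when loss >= threshold, 100 otherwise
    return 0.0 if loss >= threshold else 100.0


def average_availability(losses, threshold):
    """What PRINT:avail:AVERAGE reports: the mean of the per-slot values."""
    if not losses:
        return None
    values = [slot_availability(loss, threshold) for loss in losses]
    return sum(values) / len(values)
```

For example, `average_availability([0, 0, 3, 20, float("nan")], threshold=2)` gives 60.0: three of the five slots count as available, including the unknown one.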

So if you've changed any of the Smokeping defaults, you will see a
difference. In addition, you might have a bad, lossy circuit that
pushes more slots over the 10% loss mark, which makes the
average availability look very bad.

I hope this helps.

Dan

Edinilson J. Santos | 3 Jan 2006 16:59

Re: How to interpret tSmoke values?

Mr. Dan, thanks for your explanation.
Is it possible to change tSmoke so that it returns only online values
instead of loss values?
What exactly should I change in the code below (inside tSmoke)?
RRDs::graph "fake.png",
	"--start","-604800",
	"--end","-600",
	"--step","4320",
	"DEF:loss=$target:loss:AVERAGE",
	"CDEF:avail=loss,UN,0,loss,IF,$pings,GE,0,100,IF",
	"PRINT:avail:AVERAGE:%.2lf";

Thanks for your help

Regards

Edinilson
---------------------------------------------------------
ATINET-Professional Web Hosting
Tel Voz: (0xx11) 4412-0876
http://www.atinet.com.br


Dan McGinn-Combs | 3 Jan 2006 22:58

Re: How to interpret tSmoke values?

Smokeping stores only the number of lost pings by default. If everything
goes well, the number in "loss" is 0. The CDEF converts this negative
view of the world into a positive one (AVAIL%).

UPTIME is a different measurement altogether. This is how long a
dynamically assigned address has kept that specific address. :(
Dan


Sami Chouaib | 5 Jan 2006 10:25

business hours

Hi list, I recently started using SmokePing to monitor our network latency and
it works very well. I also use tSmoke to send e-mail reports.
Now I want to use tSmoke to generate availability reports, but only
considering working hours (from 8:00 to 18:00).
Any suggestions?
Thanks in advance.

sami

Dan McGinn-Combs | 5 Jan 2006 19:21

Re: business hours

Using tSmoke to generate a report that targets specific hours daily
would take a good bit of work with CDEFs, but it is theoretically
possible. I also assume that you are concerned only with M-F (work week)
as well.

Rather than using this (rather oversimplified) method of selecting the
total time in question (here a WEEK's worth):
    my ($WAverage,$Wxsize,$Wysize) = RRDs::graph "fake.png",
        "--start","-604800",
        "--end","-600",
        "--step","4320",
        "DEF:loss=$target:loss:AVERAGE",
        "CDEF:avail=loss,UN,0,loss,IF,$pings,GE,0,100,IF",
        "PRINT:avail:AVERAGE:%.2lf";
    $ERR = RRDs::error;
    die "ERROR while reading $_: $ERR\n" if $ERR;

You would have to use some Fancy Perl(c) to determine which minutes
between --start and --end you wanted to use for your averages. For the
daily segment, you'd have a single data point. For the weekly segment, you'd
have five data points (assuming a five-day work week). For a month,
you'd have something like 20 data points; however, these would be
partially in one of possibly three months. And then for the quarterly
segment, you'd have 60 data points to fool with. I'm not sure what the
limits are to either RRDtool or the Perl bindings.
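
As a rough illustration of the selection step (in Python rather than Perl, and outside tSmoke), the filtering could look like the sketch below. It assumes the samples have already been pulled out of the RRD file as (Unix timestamp, lost pings) pairs with no unknown values, that "down" means lost pings >= `threshold` as in the CDEF above, and that business hours are Monday to Friday, 08:00 up to but not including 18:00, in the time zone passed as `tz`. All names are hypothetical.

```python
from datetime import datetime, timezone


def in_business_hours(ts, start_hour=8, end_hour=18, tz=timezone.utc):
    """True if the Unix timestamp falls on Mon-Fri between start_hour and end_hour."""
    t = datetime.fromtimestamp(ts, tz)
    return t.weekday() < 5 and start_hour <= t.hour < end_hour


def business_hours_availability(samples, threshold,
                                start_hour=8, end_hour=18, tz=timezone.utc):
    """Percentage of business-hours samples with fewer than `threshold` lost pings.

    samples: iterable of (unix_timestamp, lost_pings) pairs.
    Returns None when no sample falls inside business hours.
    """
    kept = [loss for ts, loss in samples
            if in_business_hours(ts, start_hour, end_hour, tz)]
    if not kept:
        return None
    up = sum(1 for loss in kept if loss < threshold)
    return 100.0 * up / len(kept)
```

Filtering per sample like this avoids building one graph call per working day, at the cost of doing the averaging outside RRDtool.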

When I first started writing tSmoke, I looked at SLAMON (now abandoned,
but begun by Frank Harper) as a potential way of getting the same
results. His code goes into a directory full of RRD Files and extracts
just the kind of data you want, 8 - 18, business days and excluding
[...]

Scott Moseman | 6 Jan 2006 18:52

Alerts cause Smokeping to stop working

Using an older version of Smokeping...

# ../bin/smokeping -v
$Id: Smokeping.pm,v 1.5 2004/10/21 21:10:51 oetiker Exp $

I created this rule...

+6pct3hrs
type = loss
pattern = >6%,*36*,>6%
comment = 6+% loss for 3 hours

For all devices, so I apply at the top...

menu = Top
alerts = 6pct3hrs

Everything is fine and dandy until 3 hours comes around.  At that
point I get my alarms for the various devices that have been down
(good) -- but Smokeping also stops collecting data for all of the
other devices now!  All graphs come to a halt.  The Smokeping daemon
is still running but it's not gathering any data.  I can stop/start
the daemon and this situation will happen again.

Am I doing something wrong?  Is this a bug fixed in a later version of
Smokeping?  I did a scan through the CHANGES for the newer versions
and could not find anything that seemed like it was a bug fix for a
related problem.  Maybe I'm missing something?

Thanks,

Marcin Kmetko | 7 Jan 2006 14:22

tSmoke : Of 13 Hosts, 13 Down

Hello !

tSmoke --morning reports a wrong status (every host down).

After running
# tSmoke --q --to=my <at> email --morning

I receive this mail:
Of 13 Hosts, 13 Down

But all hosts are UP, and everything is OK.
The weekly status works fine, and the graphs are OK.

Any suggestions?

regards

Kmet

-- 
+---------------------+
| Biuro Informatyki   | Marcin Kmetko
|  Grupa Concordia    | marcin.kmetko <at> grupaconcordia.pl
| tel. (61) 8584800   | tel.(61) 8584815
+---------------------+

Dan McGinn-Combs | 8 Jan 2006 03:29

Re: tSmoke : Of 13 Hosts, 13 Down

First, log in to your system and make sure tSmoke is looking at the
proper RRD files.

# tSmoke --checkrrds

This should list out all the RRD files in the configuration file.

Please note that tSmoke reports status at a point in time. It checks the
current value, and if a given host has lost 10% of the PINGS (defaults
to 2) it will report that host down. So check to make sure your
Smokeping reports aren't showing flaky links.
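
The point-in-time check described above amounts to something like this Python sketch. It is not tSmoke's code: the function names and the input dictionary (host name mapped to the current number of lost pings, all values known) are assumptions made for the illustration, and the default threshold of 2 is the figure quoted above.

```python
def down_hosts(current_loss, threshold=2):
    """Hosts whose current number of lost pings is at or above the threshold."""
    return sorted(host for host, loss in current_loss.items()
                  if loss >= threshold)


def summary(current_loss, threshold=2):
    """One-line status in the style of the mail subject line."""
    down = down_hosts(current_loss, threshold)
    return f"Of {len(current_loss)} Hosts, {len(down)} Down"
```

With `{"a": 0, "b": 20, "c": 1}` as input, `summary` returns "Of 3 Hosts, 1 Down". A single bad reading at the moment of the check is enough to mark a host down, which is why flaky links matter here.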

You can also send yourself a fairly detailed report of what tSmoke is
seeing for each host by using the "detail" feature in combination with
the "weekly" report.

# tSmoke --weekly --quiet --detail=2 --to=me <at> my.address

Dan


