real.melancon | 4 Jul 03:29 2007

problem experienced with ra client: unequal results

Hello List,

I use the latest argus daemon as well as the latest ra* clients.

We collect data with the argus daemon using:

/argus -d -S 60 -F /etc/argus/argus.conf -w /var/log/argus/argus.out -i eth1

We then rotate argus.out every hour (using argusarchive), which generates files in the format:

/var/log/argus/archive/YYYY/MM/DD/argus.YYYY.MM.DD.hh.mm.ss.gz

This works well. For example, to get top talkers and listeners, we use:

/usr/local/bin/racluster -m matrix -r /var/log/argus/argus.out -w - | /usr/local/bin/rasort -m bytes -w - | /usr/local/bin/ra -nu

For specific days, we use (e.g. July 1st, between 15:00 and 17:00):

/usr/local/bin/racluster -t 01.15:00-17:00 -m matrix -r /var/log/argus/archive/2007/07/01/* -w - |
/usr/local/bin/rasort -m bytes -w - | /usr/local/bin/ra -nu
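
The archive directory and the day prefix of the -t argument used above can both be derived from a date. A minimal sketch, assuming the archive root shown above and that -t takes DD.HH:MM-HH:MM exactly as in the commands in this message; the helper names archive_dir and time_arg are made up for illustration and are not part of argus:

```bash
#!/usr/bin/env bash
# Hypothetical helpers. The directory layout and the -t format are
# assumptions copied from the commands quoted in this message.

ARCHIVE_ROOT=/var/log/argus/archive

# archive_dir YYYY-MM-DD -> directory holding that day's hourly files
archive_dir() {
    y=${1%%-*}; rest=${1#*-}
    m=${rest%%-*}; d=${rest#*-}
    printf '%s/%s/%s/%s\n' "$ARCHIVE_ROOT" "$y" "$m" "$d"
}

# time_arg YYYY-MM-DD HH:MM-HH:MM -> value for -t (day of month, dot, range)
time_arg() {
    printf '%s.%s\n' "${1##*-}" "$2"
}

# Example use (not executed here):
#   racluster -t "$(time_arg 2007-07-01 15:00-17:00)" -m matrix \
#       -r "$(archive_dir 2007-07-01)"/* -w - | rasort -m bytes -w - | ra -nu
```

Deriving both values from one date avoids mismatches between the -t day and the directory being read.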

But here is the problem (sorry for the long introduction):

Sometimes the ra client just doesn't output any data, e.g.:

/usr/local/bin/racluster -t 02.15:00-17:00 -m matrix -r /var/log/argus/archive/2007/07/02/* -w - |
/usr/local/bin/rasort -m bytes -w - | /usr/local/bin/ra -nu

The syntax is the same as before, only for a different day. The data file is about the same size, but ra doesn't output [...]

Peter Van Epp | 4 Jul 04:45 2007

Re: problem experienced with ra client: unequal results

On Wed, Jul 04, 2007 at 01:29:34AM +0000, real.melancon <at> videotron.ca wrote:
> Hello List,
> 
> I use latest argus daemon as well as latest ra* clients.
> 
<snip>

	There look to be some bugs at the moment (and it's not clear whether
they are in argus, the clients, or both). There are some memory corruption
problems in argus (the latest code on the server has made them much better,
but there still seem to be some). This is the debug output from an ra client
reading from a remote sensor (with the addition of a debug statement to get
errno on the fault, which indicates 2, "no such file", presumably meaning
the socket has closed for some reason):

ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusReadStreamSocket (0x1804000) read 1460 bytes
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusReadStreamSocket (0x1804000) read 1460 bytes
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusReadStreamSocket (0x1804000) read 1460 bytes
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusReadStreamSocket (0x1804000) read 1460 bytes
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusReadStreamSocket (0x1804000) read 0 bytes
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusReadStreamSocket (0x1804000) read returned 0
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusReadStreamSocket (0x1804000) errno 2
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusCloseInput(0x1804000) closing
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusWriteConnection: write(4, 0x7e994, 6)
ra3[18741.a000ed88]: 07-07-03 19:01:14 ArgusWriteConnection(0x1804000, 0x7e994, 6) returning 6

	While this should mean that ra has exited, it hasn't: ra still exists,
it just isn't doing anything:

ps auxwwww | grep ra3

carter | 4 Jul 17:22 2007

Re: problem experienced with ra client: unequal results

Hey Réal,
Peter is right: we do have a little instability with the release candidates, and I've been extremely busy
with real work, so it's my fault for not getting the fixes out quickly!

The zero-length record is a persistent problem, and I still do not have enough data to fix it properly. If you
can share your platform, strategy (radium?), 64-bit vs 32-bit info, and of course a sample data file that
causes the error, that would go a long way toward fixing the problem.

The reason you get no output is the pipes: a failure anywhere along the set of pipes causes the downstream
processes to terminate early.
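
To see which stage of such a pipeline actually failed, bash records the exit status of every stage in its PIPESTATUS array. A minimal sketch, assuming bash rather than plain sh; stage_report is a made-up helper name, and `false | sort | cat` is only a stand-in for the racluster | rasort | ra pipeline:

```bash
#!/usr/bin/env bash
# stage_report STATUS... : report every non-zero stage status.
# Call it right after the pipeline, passing "${PIPESTATUS[@]}".
# (stage_report is a hypothetical helper, not part of argus.)
stage_report() {
    local i=1 rc failed=0
    for rc in "$@"; do
        if [ "$rc" -ne 0 ]; then
            echo "stage $i exited with status $rc" >&2
            failed=1
        fi
        i=$((i + 1))
    done
    return "$failed"
}

# Stand-in pipeline: the first stage fails, yet the later stages exit 0
# because they simply see an early end of input.
false | sort | cat
stage_report "${PIPESTATUS[@]}" || echo "pipeline had a failing stage"
```

Without this check, only the status of the last stage is visible, so an early failure upstream looks like an empty but successful run.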

I think that rather than filter the errors, I need to fix the problem. If at all possible, please share a file
that dies with the error; that would help!

Carter

Carter Bullard
QoSient LLC
150 E. 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax

-----Original Message-----
From: real.melancon <at> videotron.ca

Date: Wed, 04 Jul 2007 01:29:34 
To:argus-info <at> lists.andrew.cmu.edu
Subject: [ARGUS] problem experienced with ra client: unequal results


real.melancon | 5 Jul 14:50 2007

Re : Re: problem experienced with ra client: unequal results

Hi Carter.

I think I also see a problem with rasort (or racluster), but I am not sure. For example:

#> /usr/local/bin/racluster -m matrix -r /var/log/argus/archive/2007/06/30/argus.2007.06.30.15.00.01.gz -N 2
	e           ip        x.5.72.x           ->        x.6.163.x               1        0           62            0   INT
	e           ip        x.5.72.x           ->         x.6.166.x               1        0           62            0   INT

but "piping" to rasort fails:

/usr/local/bin/racluster -m matrix -r /var/log/argus/archive/2007/06/30/argus.2007.06.30.15.00.01.gz -w - | rasort -m bytes -N 2
rasort[12302]: 16:55:13.013960 ArgusGenerateRecord: time format incorrect:4

Obviously, I can see there is no time at the front of the record, but why? Is there any workaround for this?

FYI, we are running Debian Linux ("etch") with a 32-bit kernel:

#> uname -a
Linux GVAARGUS1 2.6.18-4-686 #1 SMP Mon Mar 26 17:17:36 UTC 2007 i686 GNU/Linux 
It's running on an "Intel(R) Core(TM)2 CPU" with 1 GB of RAM.
Ethernet interfaces are:
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
e1000: eth1: e1000_probe: Intel(R) PRO/1000 Network Connection (this is sniffing interface)
Running at 100Mb FD.
eth0: negotiated 100baseTx-FD flow-control, link ok
eth1: negotiated 100baseTx-FD flow-control, link ok

As mentioned previously, data is collected simply using the argus daemon, and rotated hourly by [...]

real.melancon | 5 Jul 15:32 2007

Re : Re: problem experienced with ra client: unequal results

Hi Carter. Additional information:

I think the problem is with specific records, or the *amount* of data to work on. But I am not sure.

For example, today I tried this and got a segmentation fault:

/usr/local/bin/racluster -V -m saddr daddr proto sport dport -r /var/log/argus/archive/2007/06/30/argus.2007.06.30.00.00.01.gz
Segmentation fault

This is data collected between 00:00 and 00:59. The file size is 23 MB, and if I try the whole day (350 MB)
it segfaults as well.

But if I run the next hour (from 01:00 to 01:59), it works:

/usr/local/bin/racluster -V -m saddr daddr proto sport dport -r /var/log/argus/archive/2007/06/30/argus.2007.06.30.01.00.01.gz

   23:59:30.657676  e          udp        10.6.104.17.netbio    ->          10.0.10.3.netbio     1290        0       141900            0   INT
etc...
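
Since one hourly file crashes racluster while the next one does not, running the same command over each file separately narrows down which archives trigger the fault. A minimal sketch, assuming bash; check_each and probe are made-up names for illustration, and the racluster invocation in the comment is copied from this message:

```bash
#!/usr/bin/env bash
# check_each PROBE FILE... : run "PROBE FILE" once per file, report failures.
# PROBE is the name of a command or shell function taking one file argument.
# (check_each is a hypothetical helper, not part of argus.)
check_each() {
    local probe=$1 f rc bad=0
    shift
    for f in "$@"; do
        rc=0
        "$probe" "$f" >/dev/null 2>&1 || rc=$?
        if [ "$rc" -ne 0 ]; then
            # Statuses above 128 mean the process died on a signal
            # (139 = 128 + 11, SIGSEGV on Linux).
            echo "$f: exit status $rc"
            bad=$((bad + 1))
        fi
    done
    [ "$bad" -eq 0 ]
}

# Example use (not executed here), with the racluster command from above:
#   probe() { /usr/local/bin/racluster -V -m saddr daddr proto sport dport -r "$1"; }
#   check_each probe /var/log/argus/archive/2007/06/30/*
```

The list of failing files is also a short list of candidates to share as sample data.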

Is there any way I can upload the crashing file via FTP, so you can look at it? SMTP won't allow me to send
anything over 20 MB uncompressed, and the file causing the segfault is 23 MB.

Thanks again,

Real.

____________________________
Réal Melançon


Peter Van Epp | 5 Jul 19:56 2007

argus instability

	A data point. Adding a debug statement to argus/Argus_client.c seems
to have made argus stable here. It hasn't died in the last 24 hours after 
this statement was added (I'm about to remove it again to see if it really
is having an effect :-)):

line 1223:

            if (((retn = write (asock->fd, asock->buf, cnt)) < cnt)) {
               if (retn < 0) {
                  if ((errno == EAGAIN) || (errno == EINTR) ||
                     ((errno == EPIPE) && !(asock->status & ARGUS_WAS_FUNCTIONAL)))
                     retn = 0;
                  else {
#ifdef ARGUSDEBUG
            ArgusDebug (2, "ArgusWriteSocket: errno %d\n", errno);
#endif
                     return (retn);
                  }
               }
            } else

#ifdef ARGUSDEBUG
            ArgusDebug (2, "ArgusWriteSocket: errno %d\n", errno);
#endif

was added above, and a number of the debug levels in this same routine
were set to 2.

Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada

Peter Van Epp | 5 Jul 23:41 2007

Re: problem experienced with ra client: unequal results

On Wed, Jul 04, 2007 at 03:22:32PM +0000, carter <at> qosient.com wrote:
> Hey Réal,
> Peter is right, we do have a little instability with the release candidates, and I've been extremely busy
> on real work, so, ...., my fault at not getting the fixes out quickly!!
> 
> The zero length record is a persistent problem, and I still do not have enough data to fix it properly.  If you
> can share platform, strategy (radium?) and 64-bit vs 32-bit info and of course a sample data file that
> causes the error, that would go a long way to fixing the problem.
> 

	A data point on this one:

FreeBSD 6.2 on a 32 bit Intel:

07-07-05 14:07:42  e          tcp       142.58.51.17.49680     ->       65.54.171.26.1863          2        2          120          508   CON
07-07-05 14:07:42  e          udp       72.24.222.42.10605     ->      142.58.51.152.19879         1        0          314            0   INT
07-07-05 14:07:42  e          udp      68.150.54.174.1095      ->      206.12.16.222.137           1        0           92            0   INT
07-07-05 14:07:42  e d        tcp      192.75.241.56.3932      ->        207.46.1.81.80            4        3         1573          935   CON
07-07-05 14:07:42  e          tcp      142.58.74.225.1960      ->     208.111.129.38.80            8        8         1420         2656   FIN
07-07-05 14:07:42  e          tcp      142.58.51.152.19879     ->     219.142.130.91.2061          1        0           60            0   FIN
07-07-05 14:07:42  e         icmp      206.12.16.134           ->      58.245.75.154               1        0           98            0   ECO
07-07-05 14:07:42  e d        tcp        142.58.83.7.4755      ->    203.188.205.191.80            4        2          866          312   CON
07-07-05 14:07:42  e          udp     209.193.22.238.11821     ->       142.58.65.30.13792         1        0          104            0   INT
07-07-05 14:07:42  e          udp       67.70.110.88.13075     ->      142.58.67.230.8519          1        0          140            0   INT
07-07-05 14:07:42  e         icmp      206.12.16.134           ->     203.211.83.124               1        0           98            0   ECO
07-07-05 14:07:42  e d        tcp      142.58.211.84.33916     ->     216.200.62.206.80            5        5          796          467   FIN

	And a Mac OS 10.4 box 64 bit PPC reading from the same argus sensor
(which exhibits the 0 length problem):


CS Lee | 6 Jul 08:59 2007

argus 3 md5sum

Carter,

It seems that you haven't generated the md5 checksum for the latest argus 3 source, which causes an md5 mismatch, and there is no md5 checksum for argus client rc 45 either. Just a little note.

--
Best Regards,

CS Lee<geekooL[at]gmail.com>

carter | 6 Jul 15:26 2007

Re: argus instability

When a debug statement works, it's usually a stack problem, but this looks like you're just slowing argus
down.  This helps a great deal!!!

Carter




carter | 6 Jul 15:23 2007

Re: Re : Re: problem experienced with ra client: unequal results

Hey Réal,
Yes, put the file in the incoming directory at qosient.com (anonymous ftp).

Carter


