Carter Bullard | 1 Jul 2009 01:49

Re: segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

Hey Peter,
Something is writing over something; I just can't seem to find a handle
on it.  ArgusLoadList() passes ArgusListRecords from the Modeler to
the Output processor, and it just takes the two linked lists and combines
them.  If there is nothing in the receive list, it's just a "move the
pointers" and there you go.  The receive list should be empty if the
output processor is keeping ahead of the load.
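
For readers following along, the "move the pointers" case can be sketched
roughly as below.  This is only an illustration in Python, not the argus C
code: Node, LinkedList and load_list are made-up names, and the real
ArgusLoadList() will differ in detail (locking, record types).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One queued record; a stand-in for an ArgusListRecord (hypothetical layout)."""
    value: object
    next: Optional["Node"] = None


class LinkedList:
    """Singly linked list that tracks head, tail and count."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0

    def push(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.count += 1

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


def load_list(source, receive):
    """Move every node from source onto the end of receive in constant time."""
    if source.head is None:
        return  # nothing to hand over
    if receive.head is None:
        # Empty receive list: just move the pointers.
        receive.head = source.head
    else:
        # Non-empty receive list: link the old tail to the new nodes.
        receive.tail.next = source.head
    receive.tail = source.tail
    receive.count += source.count
    # The source list no longer owns the nodes.
    source.head = source.tail = None
    source.count = 0
```

No record is copied in either branch, so a wrong record length or a stale
buffer would have to come from somewhere other than the splice itself.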

My guess is that we're getting the length of an output record wrong,
which can happen if you're sloppy forming a DSR that you rarely use,
so it could still be a packet-specific bug, or we are using a buffer
that has been deallocated/reallocated and we're stomping on the new
user's buffer.

This can happen in threaded applications, so turning off the .threads
tag may be a good test.

Hey Gunnar, any chance you can use valgrind() to see if we're doing
something wrong with memory?

Carter

On Jun 30, 2009, at 6:33 PM, Peter Van Epp wrote:

> On Mon, Jun 29, 2009 at 09:19:57AM +0200, Gunnar Lindberg wrote:
>> If you had asked me a week ago everything would have been just fine.
>> No crash since Jun 1. Our students left at the end of May which
>> probably changed the traffic pattern quite considerably.

Gunnar Lindberg | 1 Jul 2009 15:59

Re: segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

The idea that this may be strings is interesting, so just like
Carter I took out my old ASCII chart - but it didn't tell me much.
And the next crash, which occurred yesterday afternoon, made me go for
the 8-bit 8859 chart - i.e. I think it's more random data.

{start = 0x2029706f742e656c, end = 0x702e73696874202b,
{start = 0xb9d2bcfac6fa3f4c, end = 0x3d078cfe490497a0, 
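
A quick way to do the chart lookup mechanically is sketched below.  It
assumes the values were read from a little-endian x86-64 machine, so the
lowest byte of each 64-bit value is the first byte in memory; as_text is a
made-up helper name, and anything outside printable ASCII is shown as ".".

```python
def as_text(value, width=8):
    """Render an integer as the bytes it occupies in little-endian memory."""
    raw = value.to_bytes(width, "little")
    # Show printable ASCII as-is and everything else as '.'.
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


# The two start/end pairs quoted above, in the order they appear.
line_1 = (0x2029706f742e656c, 0x702e73696874202b)
line_2 = (0xb9d2bcfac6fa3f4c, 0x3d078cfe490497a0)

for label, pair in (("line 1", line_1), ("line 2", line_2)):
    print(label, [as_text(v) for v in pair])
```

Under that byte-order assumption the first pair reads as "le.top) " and
"+ this.p", which does look like text, while the second pair is mostly
non-printable bytes.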

>Hey Gunnar, any chance you can use valgrind() to see if we're doing
>something wrong with memory?

# ll -o .devel .threads
/bin/ls: .threads: No such file or directory
-rw-rw-r--  1 root 0 May 15 07:13 .devel

I assume you mean something like
  /usr/bin/valgrind [-???] /usr/local/sbin/argus >& /var/log/xxx.log &
and then we see what's in xxx.log after a crash. Right? We have some
kind of cron based watchdog and I guess we leave that as is, so that
we get xxx.log once and are back in ordinary business after.

Mixed news, good and bad... :-).

1) Sweden has longer holidays than the US and I'm looking forward
   to my 5 weeks, starting on Mon Jul 6; back Mon Aug 10. And, for
   once I'm going to stay away completely, not even email :-), so
   I'll resist the tempting "let's just set it up, only once...".

2) Since I'm not at all familiar with valgrind I would appreciate
   some advice on "-???".

Carter Bullard | 2 Jul 2009 22:13

New Argus web-site

Gentle people,
I've launched the new argus web site today.  New face with new pages.
Much of it is under construction, but I'm hoping that this is a better
platform to add things like "screenshots", "projects using argus", etc.

I have a Publications page that has books, scholarly pubs that reference
argus, along with web reports and presentations.

I also have icons and links to argus sponsors, vendors and friends.
Please take a look, and send lots of comments, as I'd like this to be a
good site for argus.

If you connect and get an old looking page, be sure and hit your refresh
button to update your local web cache.

I'll send an announcement to argus-announce tomorrow if all is well.

Thanks for all the support, and have a great American Holiday weekend
(for those Americans on the list) and a great summer weekend for those
in the Northern hemisphere, and I hope it's not too cold for you guys in
the Southern hemisphere.

Carter

Attachment (smime.p7s): application/pkcs7-signature, 3815 bytes
Carter Bullard | 6 Jul 2009 16:12

Re: segfault at 000000000311c000 rip 000000000040fb46 rsp 0000007fbffff830 error 4

Hey Gunnar,
Another strategy to try when you return, if at all possible, is to find
another platform, or a 32-bit machine, and look to see if it dies at the
same time, or in the same manner.  At least that would help to rule out
the "it's the 64-bit that's the problem" theory.

I am going to go through the code this week to see if I can find anything
related to poor memory management.  I use valgrind() this way:

    valgrind /usr/local/sbin/argus -d

(my argi run as daemons, and using the "-d" toggles that setting to off
for me).  Any memory complaints will come out pretty quickly.  If we are
doing something wrong with memory, we'll do it many many times before it
tickles a part of memory that kills argus.

The .devel tag in the root directory is important, as valgrind will tell
us what line numbers are involved if compiled with the options the
".devel" tag generates.

Carter

On Jul 1, 2009, at 9:59 AM, Gunnar Lindberg wrote:

Rodney McKee | 7 Jul 2009 00:05

ragraph with filter

Hello,

Just trying to generate some graphs and have got an issue with adding a filter to the command. I can run the command below without the filter and all works fine but when I add the filter I get the following message.


ragraph sbytes dbytes -M 5m -r  /var/log/argus/archive/mel/2009/07/bins/fw* -title "titile goes here" -w /var/log/argus/reports/mel-2009-07" -debug - host 203.89.202.163
DEBUG: RagraphProcessArgusData: exec: /usr/bin/rabins -M hard zero -p6 -GL0 -s ltime  sbytes dbytes -M 5m -r /var/log/argus/archive/mel/2009/07/bins/fw1-01-1d /var/log/argus/archive/mel/2009/07/bins/fw1-01-5m /var/log/argus/archive/mel/2009/07/bins/fw1-02-1d /var/log/argus/archive/mel/2009/07/bins/fw1-02-5m /var/log/argus/archive/mel/2009/07/bins/fw1-03-1d /var/log/argus/archive/mel/2009/07/bins/fw1-03-5m /var/log/argus/archive/mel/2009/07/bins/fw1-04-1d /var/log/argus/archive/mel/2009/07/bins/fw1-04-5m /var/log/argus/archive/mel/2009/07/bins/fw2-01-1d /var/log/argus/archive/mel/2009/07/bins/fw2-01-5m /var/log/argus/archive/mel/2009/07/bins/fw2-02-1d /var/log/argus/archive/mel/2009/07/bins/fw2-02-5m /var/log/argus/archive/mel/2009/07/bins/fw2-03-1d /var/log/argus/archive/mel/2009/07/bins/fw2-03-5m /var/log/argus/archive/mel/2009/07/bins/fw2-04-1d /var/log/argus/archive/mel/2009/07/bins/fw2-04-5m - host 203.89.202.163 > /tmp/filefFl5b9
sh: line 1: 14287 Segmentation fault      /usr/bin/rabins -M hard zero -p6 -GL0 -s ltime sbytes dbytes -M 5m -r /var/log/argus/archive/mel/2009/07/bins/fw1-01-1d /var/log/argus/archive/mel/2009/07/bins/fw1-01-5m /var/log/argus/archive/mel/2009/07/bins/fw1-02-1d /var/log/argus/archive/mel/2009/07/bins/fw1-02-5m /var/log/argus/archive/mel/2009/07/bins/fw1-03-1d /var/log/argus/archive/mel/2009/07/bins/fw1-03-5m /var/log/argus/archive/mel/2009/07/bins/fw1-04-1d /var/log/argus/archive/mel/2009/07/bins/fw1-04-5m /var/log/argus/archive/mel/2009/07/bins/fw2-01-1d /var/log/argus/archive/mel/2009/07/bins/fw2-01-5m /var/log/argus/archive/mel/2009/07/bins/fw2-02-1d /var/log/argus/archive/mel/2009/07/bins/fw2-02-5m /var/log/argus/archive/mel/2009/07/bins/fw2-03-1d /var/log/argus/archive/mel/2009/07/bins/fw2-03-5m /var/log/argus/archive/mel/2009/07/bins/fw2-04-1d /var/log/argus/archive/mel/2009/07/bins/fw2-04-5m - host 203.89.202.163 > /tmp/filefFl5b9
DEBUG: RagraphReadInitialValues(/tmp/filefFl5b9)
DEBUG: RagraphReadInitialValue(/tmp/filefFl5b9): start  stop  last  seconds  step  Columns  bins
DEBUG: RagraphGenerateRRDParameters(/tmp/filefFl5b9)
usage: /usr/bin/ragraph metric (srcid | proto [daddr] | dport) [-title "title"] [ra-options]
DEBUG: RagraphGenerateRRD: split 0 lower 0
DEBUG: RagraphGenerateRRD(/tmp/filefFl5b9)
DEBUG: RagraphGenerateRRD: RRDs::create /tmp/filefFl5b9.rrd, --start  --step  RRA:AVERAGE:0.5:1:
DEBUG: RagraphGenerateRRD: /tmp/filefFl5b9.rrd: start time: unparsable time:
/usr/bin/ragraph: unable to create `/tmp/filefFl5b9.rrd': start time: unparsable time:



Rodney McKee | 7 Jul 2009 00:06

Re: ragraph with filter

Sorted, sorry for the spam.

Carter Bullard | 7 Jul 2009 16:02

Re: Bug, TCP direction on unidirectional flows

Hey Nick,
Sorry to bug you again, but if you have that set of argus data, I'd love to check out the bug!?!?!

Carter

On Jun 19, 2009, at 1:50 PM, Nick Diel wrote:

I noticed an interesting bug today with Argus.  With unidirectional flows where only the server side is visible (syn-ack side), Argus incorrectly swaps the src and dst addresses.

Here is an example
 tcpdump -r interesting.pcap -nn
reading from file interesting.pcap, link-type EN10MB (Ethernet)
21:01:55.758204 IP X.X.X.X.25 > Y.Y.Y.Y.4442: S 3557037574:3557037574(0) ack 1284350011 win 0
21:01:55.786742 IP X.X.X.X.25 > Y.Y.Y.Y.4442: . ack 1 win 2920
21:01:55.793184 IP X.X.X.X.25 > Y.Y.Y.Y.4442: P 1:37(36) ack 1 win 2920
....
21:02:04.441692 IP X.X.X.X.25 > Y.Y.Y.Y.4442: F 537:537(0) ack 1257 win 49100
21:02:04.904895 IP X.X.X.X.25 > Y.Y.Y.Y.4442: . ack 1258 win 49100
21:05:05.260483 IP X.X.X.X.25 > Y.Y.Y.Y.1282: S 4103843404:4103843404(0) ack 1358349119 win 1460 <mss 1460,nop,nop,sackOK>
21:05:05.294729 IP X.X.X.X.25 > Y.Y.Y.Y.1282: P 1:37(36) ack 1 win 2920
...
21:05:08.777255 IP X.X.X.X.25 > Y.Y.Y.Y.1282: . ack 1075 win 49640

argus -r interesting.pcap -w - | ra -r - -z
   21:01:55.758204  e         tcp      X.X.X.X smtp      ->      Y.Y.Y.Y 4442         11       1166   SEf
   21:05:05.260483  e         tcp      X.X.X.X smtp      ->      Y.Y.Y.Y 1282         10       1024   SEf


ra -?
Ra Version 3.0.2.beta.8

argus -?
Argus Version 3.0.1.beta.3


Nick



Phillip Deneault | 8 Jul 2009 22:24

possible radium issue

I'm running the beta.8 code.  I have a single radium instance collecting 
data from dozens of locations and 3 rasplit processes connecting to that 
radium process, one for 10 minute slices, 1 for hourlies, and 1 for 
dailies.

It *seems* as if the data I'm recording is lower than what I should 
have.  I say this because I get drastically different counts when I 
check locally recorded data vs. radium recorded data.

Please yell at me if I am doing this wrong; I performed the racluster in
an attempt to normalize the flow counts a little.

Locally recorded data tallies like this (logs rotated daily, so I picked
a convenient hour).

# racluster -t 14 -M norep -r /var/log/argus/argus.out -w - | racount -r -
racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
     sum   52385       134978         134813         165            9982211            9970637            11574

However, when I run a tally on the hourlies and the slices collected by 
radium, I get two different flow counts, neither of which come anywhere 
close.

(SLICES)
# racluster -M norep -r argus-07.08.2009-14.50.00.out \
    argus-07.08.2009-14.40.00.out argus-07.08.2009-14.30.00.out \
    argus-07.08.2009-14.20.00.out argus-07.08.2009-14.10.00.out \
    argus-07.08.2009-14.00.00.out -w - | racount -r -
racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
     sum   631         1920           1397           523            507980             210286             297694

(HOURLIES)
# racluster -M norep -r argus-07.08.2009-14.00.00.out -w - | racount -r -
racount   records     total_pkts     src_pkts       dst_pkts       total_bytes        src_bytes          dst_bytes
     sum   252         447            348            99             95012              57022              37990

Is this a bug, or me doing something wrong?

Thanks,
Phil

Carter Bullard | 9 Jul 2009 05:29

Re: possible radium issue

No, this doesn't seem right at all.   A couple of suggestions.
Don't use the "-M norep" for this type of aggregation (basically it
just throws away the AGR dsr as the records are being written
out, and in some apps this is great, but not necessary  here).

How are your rasplit()s called, I suspect there may be an issue with that.

In most cases, you don't need the hourly and daily rasplit() processes,
because you can generate both of these from your 10 min split files.  All
depends on whether you want the hourly and daily files updated continuously,
or if you can get away with updating them, say every 10 minutes.

It looks like racluster() is faulting reading one of the files.   When it
does that, the pipe closes down, and your racount() reports just the records
it receives.  Just need to find the bad file, and then try to figure out how
it got corrupted (at least that is my guess).

What are the totals for each of the individual files in your example(s)
without the clustering?
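
One way to hunt for a bad file is to run the reader on each file by itself
and look at the exit status.  The sketch below is only an assumption about
how that could be scripted: check_files and failing are made-up helpers, and
it assumes the reader ("racount -r <file>" by default) exits non-zero or
dies on a signal when it hits a corrupt file.

```python
import subprocess


def check_files(paths, command=("racount", "-r")):
    """Run `command <path>` once per file and collect exit status and output."""
    results = {}
    for path in paths:
        proc = subprocess.run([*command, path], capture_output=True, text=True)
        # On POSIX a negative return code means the process died on a signal
        # (for example -11 for SIGSEGV).
        results[path] = (proc.returncode, proc.stdout)
    return results


def failing(results):
    """Return the paths whose reader did not exit cleanly."""
    return [path for path, (code, _) in results.items() if code != 0]
```

Calling failing(check_files(list_of_slice_files)) would then name any file
the reader could not get through, and the captured output gives the
per-file totals asked for above.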

Carter

On Jul 8, 2009, at 4:24 PM, Phillip Deneault wrote:


Phillip Deneault | 9 Jul 2009 16:17

Re: possible radium issue

Carter Bullard wrote:
> No, this doesn't seem right at all.   A couple of suggestions.
> Don't use the "-M norep" for this type of aggregation (basically it
> just throws away the AGR dsr as the records are being written
> out, and in some apps this is great, but not necessary  here).

Ok, I'll remove it as this issue presses forward.

> How are your rasplit()s called, I suspect there may be an issue with that.

/usr/bin/rasplit -d -S <radiumhost> -M time 10m -w \
    /data/argus/slices/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
/usr/bin/rasplit -d -S <radiumhost> -M time 1h -w \
    /data/argus/hourlies/\$srcid/argus-%m.%d.%Y-%H.%M.%S.out
/usr/bin/rasplit -d -S <radiumhost> -M time 1d -w \
    /data/argus/dailies/\$srcid/argus-%m.%d.%Y.out
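
To make the file naming concrete, here is a small sketch of how the
file-name part of the 10 minute pattern would expand for one hour.  It
assumes the % codes behave like standard strftime and that bins start on
clock boundaries; the \$srcid directory is left out, and slice_names is a
made-up helper.

```python
from datetime import datetime, timedelta

PATTERN = "argus-%m.%d.%Y-%H.%M.%S.out"  # file-name part of the -w option above


def slice_names(hour_start, step_minutes=10):
    """List the file names of the bins that make up one hour."""
    count = 60 // step_minutes
    return [
        (hour_start + timedelta(minutes=i * step_minutes)).strftime(PATTERN)
        for i in range(count)
    ]


for name in slice_names(datetime(2009, 7, 8, 14, 0, 0)):
    print(name)
```

For the 14:00 hour on 8 Jul 2009 this gives the six names used in the
SLICES test, from argus-07.08.2009-14.00.00.out through
argus-07.08.2009-14.50.00.out.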

> In most cases, you don't need the hourly and daily rasplit() processes,
> because you can generate both of these from your 10 min split files.  All
> depends on whether you want the hourly and daily files updated
> continuously, or if you can get away with updating them, say every 10
> minutes.

This is true.  Except I don't trust either of them right now. :-)  If 
the issue is that there is some strange blocking process which would 
require me to run only one rasplit, I would do it and hack up everything 
I'm doing on the programming end.  But neither CPU, memory, nor IO 
suggests this to be the case.

> It looks like racluster() is faulting reading one of the files.   When it
> does that, the pipe closes down, and your racount() reports just the
> records it receives.  Just need to find the bad file, and then try to
> figure out how it got corrupted (at least that is my guess).
>
> What are the totals for each of the individual files in your example(s)
> without the clustering?

Here's a breakdown using a few methods.  Things mostly make sense except 
the numbers seem so low by comparison to the locally recorded copies, 
and the difference from the collection of daily files with the hourlies.
The 'Clustering' field is with the '-M norep' off piped to an racount. 
The WithoutClustering field is just the straight racount.  I ran the 
files individually and totaled them, then in a single command line, and 
I also ran the hourly file the same way for comparison.

I got no errors when running any of these commands on any of these files.

File		Clustering	WithoutClustering
00		37		38
10		85		85
20		138		148
30		43		44
40		104		110
50		254		265
-----------------------------------------
SumTotal	661		690

Total with
all 6 files
in the '-r'	631		690

Hourly file	252		263
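
The arithmetic behind the table can be checked with a few lines; the
numbers are copied from the table above and the variable names are just for
illustration.

```python
# Record counts per 10-minute file: (with clustering, without clustering)
slices = {
    "00": (37, 38),
    "10": (85, 85),
    "20": (138, 148),
    "30": (43, 44),
    "40": (104, 110),
    "50": (254, 265),
}
combined = (631, 690)  # all six files in one '-r'
hourly = (252, 263)    # the hourly file

clustered_sum = sum(c for c, _ in slices.values())
plain_sum = sum(p for _, p in slices.values())

print("sum of slices:", clustered_sum, plain_sum)
print("slices minus combined run:",
      clustered_sum - combined[0], plain_sum - combined[1])
print("slices minus hourly file:",
      clustered_sum - hourly[0], plain_sum - hourly[1])
```

The unclustered counts agree (690 either way), so the six slice files read
the same individually and together.  The clustered count drops by 30 when
the files are read in one run, and the hourly file is short by 427
unclustered records compared with the slices.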

Can anyone else replicate this behavior?

Thanks,
Phil
