Wolfgang Barth | 1 Sep 12:08

radium fails (rc.50 from 2007-08-31)

Hi,

if I start radium, I get an output like

   007-09-01 11:38:35starteddium[7261]: ßø·ii

Radium stops immediately. Same effect with and without .threads.

Wolfgang
-- 
<wob (at) swobspace de> * http://www.swobspace.de

Peter Van Epp | 2 Sep 20:12

Re: radium fails (rc.50 from 2007-08-31)

On Sat, Sep 01, 2007 at 12:08:34PM +0200, Wolfgang Barth wrote:
> Hi,
> 
> if I start radium, I get an output like
> 
>    007-09-01 11:38:35starteddium[7261]: ???ii
> 
> Radium stops immediately. Same effect with and without .threads.
> 
> Wolfgang
> -- 
> <wob (at) swobspace de> * http://www.swobspace.de

	While Carter would be the final authority, I think radium may not yet
be updated. We just found a bug (or perhaps bugs) in memory allocation which
was causing the argus memory footprint to grow (that's what the various "small"
releases are about) and Carter just merged the changes back in on Friday (and
I think is now taking a well deserved rest :-)). 
	This is a copy of the latest argus (without threads) that has been 
running since Friday (even though traffic is well down):

ps auxwwww | grep argus
root     27229  3.8  9.1 364868 361352 ?       SL   Aug31 102:25 argus -P 560 -i eth0 -i eth1 -U 512 -m -F /scratch/argus.conf
vanepp   31939  0.0  0.0   3132   832 pts/0    S+   11:01   0:00 grep argus

previous versions would probably have been pegged at 4 gigs (all the memory
the machine has) by now.
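
For readers comparing these snapshots: on Linux the VSZ and RSS columns of `ps aux` are reported in KiB. A minimal Python sketch, not part of argus, that pulls those columns out of a row like the one above and converts them to MiB. The names `parse_ps_aux_line` and `kib_to_mib` are made up for this illustration.

```python
def parse_ps_aux_line(line):
    """Split one `ps aux` row into its ten fixed columns plus the full command."""
    fields = line.split(None, 10)  # the command itself may contain spaces
    if len(fields) < 11:
        raise ValueError("not a complete ps aux row: %r" % line)
    user, pid, cpu, mem, vsz, rss, tty, stat, start, cpu_time, command = fields
    return {
        "user": user,
        "pid": int(pid),
        "vsz_kib": int(vsz),   # virtual size, KiB
        "rss_kib": int(rss),   # resident set size, KiB
        "stat": stat,
        "start": start,
        "cpu_time": cpu_time,
        "command": command,
    }


def kib_to_mib(kib):
    """Convert KiB (the unit ps uses for VSZ/RSS) to MiB."""
    return kib / 1024.0


# The argus row quoted in the message above.
row = parse_ps_aux_line(
    "root     27229  3.8  9.1 364868 361352 ?       SL   Aug31 102:25 "
    "argus -P 560 -i eth0 -i eth1 -U 512 -m -F /scratch/argus.conf"
)
print("pid %d: rss %.1f MiB, vsz %.1f MiB"
      % (row["pid"], kib_to_mib(row["rss_kib"]), kib_to_mib(row["vsz_kib"])))
```

For the row above this works out to an RSS of roughly 353 MiB, well short of the 4 gigs on the machine.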
	Hmmm, I just checked the threads version that I butchered to avoid a 
deadlock in shutdown and it is using an amazingly small amount of memory
(although on a different link so this may be a traffic type issue, I guess I 

Peter Van Epp | 2 Sep 20:24

Re: radium fails (rc.50 from 2007-08-31)

> 	Hmmm, I just checked the threads version that I butchered to avoid a 
> deadlock in shutdown and it is using an amazingly small amount of memory
> (although on a different link so this may be a traffic type issue, I guess I 
> should start a non thread version and see what happens :-)):
> 
> ps auxwwwww | grep argus
> root     25148 12.2  0.4  19816 16112 ?        SL   Sep01 107:00 argus -P 560 -i eth2 -i eth3 -U 512 -m -D2 -F /spare/argus.conf
> vanepp   26736  0.0  0.0   3132   832 pts/1    S+   11:03   0:00 grep argus
> 
> perhaps after poking a bit more at the threads version I'll move it to the
> Internet link (where it is it won't see most of the port scans that sweep
> through which may be the difference). 
> 
> Peter Van Epp / Operations and Technical Support 
> Simon Fraser University, Burnaby, B.C. Canada

	And I just discovered that threads weren't in fact enabled, so memory 
usage is oddly low, although it seems to have been giving output. With threads
enabled just now it is behaving more as I'd expect (we will see if that includes
being able to shut down without deadlocking :-)):

 sniffer1:/spare # ps auxwwww | grep argus
root     29352 14.5  0.5  90268 20504 pts/1    SLl  11:20   0:22 argus -P 560 -i eth2 -i eth3 -U 512 -m -D2 -F /spare/argus.conf
root     29361  0.0  0.0   3132   832 pts/1    S+   11:23   0:00 grep argus

Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada

Carter Bullard | 3 Sep 17:17

Re: radium fails (rc.50 from 2007-08-31)

Sorry for the delayed response.  Peter is correct on the radium issues.  Radium shares a lot of code with
argus, so we're trying to figure out the memory issues first, and I'll merge the fixes into radium when we're happy
with them!!  Still a holiday here, but I'll start the radium work tonight, at least that is the plan.

So, it seems that we're pretty good, except for the "-R" option?  Have we got a 3.0.0 that has been running for days?

Hope everyone (US in particular) is having a great [ holi ] day !!!

Carter

Carter Bullard
QoSient LLC
150 E. 57th Street Suite 12D
New York, New York 10022
+1 212 588-9133 Phone
+1 212 588-9134 Fax

-----Original Message-----
From: Peter Van Epp <vanepp <at> sfu.ca>

Date: Sun, 2 Sep 2007 11:12:00 
To:argus-info <at> lists.andrew.cmu.edu
Subject: Re: [ARGUS] radium fails (rc.50 from 2007-08-31)


On Sat, Sep 01, 2007 at 12:08:34PM +0200, Wolfgang Barth wrote:
> Hi,
> 
> if I start radium, I get an output like
> 

Peter Van Epp | 3 Sep 18:40

Re: radium fails (rc.50 from 2007-08-31)

On Mon, Sep 03, 2007 at 03:17:18PM +0000, Carter Bullard wrote:
> Sorry for the delayed response.  Peter is correct on the radium issues.  Radium shares a lot of code with
> argus, so we're trying to figure out the memory issues first, and I'll merge the fixes into radium when we're happy
> with them!!  Still a holiday here, but I'll start the radium work tonight, at least that is the plan.
> 
> So, it seems that we're pretty good, except for the "-R" option?  Have we got a 3.0.0 that has been running for days?
> 
> Hope everyone (US in particular) is having a great [ holi ] day !!!
> 
> Carter
> 
> Carter Bullard
> QoSient LLC
> 150 E. 57th Street Suite 12D
> New York, New York 10022
> +1 212 588-9133 Phone
> +1 212 588-9134 Fax
> 

	Well, a build with no .threads, -R or -J has been working for me for 3 days now :-).
As I recall -J and -R both eventually grew, and .threads has at least the 
deadlock on a -HUP and I think other problems relating to not logging things,
since I don't see the statistics messages or the debug message saying it saw
the -HUP (which no .threads does) even with the anti-deadlock hack in place.
Putting a syslog message in the anti-deadlock code where it will bail if it
can't get a lock doesn't send anything to syslog either, which makes me wonder
if the threads are losing the ability to write to syslog and/or the debug file.
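
The anti-deadlock idea described above (give up instead of waiting forever for a lock, and say so in the log) can be sketched as follows. This is an illustration in Python, not the actual argus C code; `with_lock_or_bail`, the logger name and the timeout values are assumptions made for the example.

```python
import logging
import threading

log = logging.getLogger("antideadlock_sketch")  # hypothetical logger name


def with_lock_or_bail(lock, action, timeout=1.0):
    """Run action() while holding lock; give up if the lock is not free in time.

    Returns True if action() ran, False if we bailed out.
    """
    if not lock.acquire(timeout=timeout):
        # Log before bailing so a silent hang becomes a visible event.
        log.warning("could not get lock within %.1fs, bailing out", timeout)
        return False
    try:
        action()
        return True
    finally:
        lock.release()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    stuck = threading.Lock()
    stuck.acquire()  # simulate a holder that never releases the lock
    print(with_lock_or_bail(stuck, lambda: None, timeout=0.1))
```

If the warning never shows up in any log destination even though the function returns False, that would point at the logging path rather than at the lock itself, which is the suspicion raised above.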

ps auxwwww | grep argus
root     27229  3.9  9.1 364868 361352 ?       SL   Aug31 158:13 argus -P 560 -i eth0 -i eth1 -U 512 -m -F /scratch/argus.conf

Peter Van Epp | 3 Sep 22:58

Re: radium fails (rc.50 from 2007-08-31)

	Looks like there are issues with .threads (and maybe -J). The argus
I started on our Internet link has frozen, and doesn't seem to be responding
to a HUP.

root      5193  5.0 36.2 1468932 1426644 ?     SLl  10:04  10:36 argus -J -P 560 -i eth0 -i eth1 -U 512 -m -F /scratch/argus.conf
root      5536  0.0  0.0   3132   832 pts/1    S+   13:36   0:00 grep argus
hcids:/scratch # gdb64 argus 5193
GNU gdb 6.5
(gdb) where
#0  0x00000400002fa904 in ___newselect_nocancel ()
   from /lib64/power5+/libc.so.6
#1  0x0000000010019a3c in ArgusGetPackets (src=0x102006c0)
    at ArgusSource.c:1648
#2  0x0000000010006308 in main (argc=13, argv=0xfffffb24188) at argus.c:545
(gdb) where
#0  0x00000400002fa904 in ___newselect_nocancel ()
   from /lib64/power5+/libc.so.6
#1  0x0000000010019a3c in ArgusGetPackets (src=0x102006c0)
    at ArgusSource.c:1648
#2  0x0000000010006308 in main (argc=13, argv=0xfffffb24188) at argus.c:545
(gdb) where
#0  0x00000400002fa904 in ___newselect_nocancel ()
   from /lib64/power5+/libc.so.6
#1  0x0000000010019a3c in ArgusGetPackets (src=0x102006c0)
    at ArgusSource.c:1648
#2  0x0000000010006308 in main (argc=13, argv=0xfffffb241

	no complaints in /var/log/messages:

Sep  3 10:04:44 hcids kernel: RING: succesfully allocated 0 KB [tot_mem=12664896][order=12]

Peter Van Epp | 4 Sep 02:03

Re: radium fails (rc.50 from 2007-08-31)

	To follow up my own post as usual, the issue looks to be threads 
deadlocking. I started an argus with no -J but a -D2 and it hung the same way.
I then foolishly listed the debug output file before saving the two gdb
outputs and lost them, but it was the same as before: argus was sitting
at select doing nothing, and when HUPed there was a hung thread. A look through
the 900 meg debug file does indicate it is a new thread in an apparent list
allocation loop (which is likely eating memory and is in any case not working).
I've started it again and I expect it to fail again, at which time I'll get 
more data. So it looks like a no .threads version of argus-3.0.0 works but 
.threads currently doesn't. 
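
The earlier gdb trace shows the main thread parked in select(). A minimal sketch of a loop that waits in select() with a timeout and still notices a pending HUP between waits. It is written in Python for illustration and is only an assumption about one way to structure such a loop, not how ArgusGetPackets is implemented; `HupFlag` and `poll_once` are hypothetical names.

```python
import os
import select
import signal


class HupFlag:
    """Remembers that a SIGHUP arrived until the main loop collects it."""

    def __init__(self):
        self.pending = False

    def handler(self, signum, frame):
        # Keep the handler trivial: just record the event.
        self.pending = True

    def take(self):
        """Return True once per recorded HUP, then reset."""
        seen, self.pending = self.pending, False
        return seen


def poll_once(read_fds, flag, timeout=0.25):
    """Wait up to `timeout` seconds for input, then report any pending HUP."""
    ready, _, _ = select.select(read_fds, [], [], timeout)
    return ready, flag.take()


if __name__ == "__main__":
    flag = HupFlag()
    signal.signal(signal.SIGHUP, flag.handler)  # Unix only
    r, w = os.pipe()                            # stand-in for a packet source
    os.kill(os.getpid(), signal.SIGHUP)         # simulate `kill -HUP <pid>`
    print(poll_once([r], flag, timeout=0.05))
```

Because the wait is bounded, a loop built on `poll_once` comes back to check the flag even when no input arrives, so an idle source alone cannot keep it from reacting to a HUP.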

Peter Van Epp / Operations and Technical Support 
Simon Fraser University, Burnaby, B.C. Canada

Carter Bullard | 4 Sep 15:38

Re: radium fails (rc.50 from 2007-08-31)

Peter,
We should leave the threaded version for argus-3.1, and assume that
all testing should be on the non-threaded version from now on.  But
curiosity may require a little explanation of the threaded model.

When running with the threads stuff enabled, argus will have 2 threads,
at a minimum:

Attaching to program: `/usr/local/sbin/argus', process 13992.
Reading symbols for shared libraries . done
0x9001f888 in select ()

(gdb) info threads
  2 process 13992 thread 0xb03  0x90054388 in semaphore_timedwait_signal_trap ()
* 1 process 13992 thread 0x203  0x9001f888 in select ()

(the * indicates the current thread)

(gdb) where
#0  0x9001f888 in select ()
#1  0x000161d0 in ArgusGetPackets (src=0x30501c) at ArgusSource.c:1648
#2  0x000046ac in main (argc=2, argv=0xbffffb0c) at argus.c:545
#3  0x0000280c in _start ()
#4  0x00002510 in start ()

(gdb) thread 2
[Switching to thread 2 (process 13992 thread 0xb03)]
0x90054388 in semaphore_timedwait_signal_trap ()
(gdb) where

Peter Van Epp | 4 Sep 21:14

Re: radium fails (rc.50 from 2007-08-31)

On Tue, Sep 04, 2007 at 09:38:59AM -0400, Carter Bullard wrote:
> Peter,
> We should leave the threaded version for argus-3.1, and assume that
> all testing should be on the non-threaded version from now on.  But
<snip>

	That looks to be a fine idea :-). This copy of the current argus-3.0.0
without .threads but with -J has been running since yesterday in ~ 360 MB of
RAM (and load is back up to normal peak levels here today):

ps auxwwww | grep argus
root      9499  6.0  9.1 364200 360488 ?       SL   Sep03  54:03 argus -J -P 560 -i eth0 -i eth1 -U 512 -m -F /scratch/argus.conf
vanepp   10801  0.0  0.0   3132   832 pts/1    S+   12:05   0:00 grep argus

	I may take my life in my hands and try the -R flag and see what happens
too :-). I'd say we seem to have fixed the issue (at least on PPC which is 
what I'm on, testing on Intel would be good in case there is an endian problem)
with unlimited memory growth. It may be time to start again comparing output
from 3.0 with output from 2.0.6 on the same link to make sure they are at 
least close to each other (they were in good shape last I tried but there have
been many changes since then :-)).
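
One way to make "close to each other" concrete is to reduce each version's output to per-category counts and flag the categories that differ by more than a tolerance. A minimal Python sketch under that assumption; `compare_counts`, the 5% tolerance, the category names and the numbers in the usage lines are all hypothetical and not taken from any real argus run.

```python
def compare_counts(old, new, rel_tol=0.05):
    """Return {key: (old_value, new_value)} for counts that differ by more
    than rel_tol, measured relative to the larger of the two values."""
    mismatches = {}
    for key in sorted(set(old) | set(new)):
        a, b = old.get(key, 0), new.get(key, 0)
        biggest = max(abs(a), abs(b))
        if biggest and abs(a - b) / biggest > rel_tol:
            mismatches[key] = (a, b)
    return mismatches


# Made-up per-protocol record counts, for illustration only.
from_2_0_6 = {"tcp": 1000, "udp": 200, "icmp": 10}
from_3_0 = {"tcp": 1010, "udp": 150, "arp": 5}
print(compare_counts(from_2_0_6, from_3_0))
```

Categories that appear in only one of the two summaries are treated as a count of zero on the other side, so they are always reported.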

Peter Van Epp / Operations and Technical Support 

Carter Bullard | 4 Sep 21:26

Re: radium fails (rc.50 from 2007-08-31)

I wouldn't worry about the "-R" option, most of its function has been
integrated into the basic flow modeler now.  It will drastically increase
the amount of output records you generate if you have a lot of single
request response protocols, like NFS or AFS, so, probably not a requirement.
I'll go through to make sure I can eliminate it as an option.

OK, we're back to being stable on argus-3.0.0 and time to refocus on clients,
to get ready for the release, which is now 4 months behind my schedule :o(
We may have lost anyone that was interested, so let me get the clients
squared away and get back to using the data, rather than generating it ;o)

Thanks for all your help, and of course if there is any gotcha, be sure and
send it to the list!!!!!

Carter

On Sep 4, 2007, at 3:14 PM, Peter Van Epp wrote:

> On Tue, Sep 04, 2007 at 09:38:59AM -0400, Carter Bullard wrote:
>> Peter,
>> We should leave the threaded version for argus-3.1, and assume that
>> all testing should be on the non-threaded version from now on.  But