Carter Bullard | 2 Oct 21:01

Added printing country codes to ra* progs

Gentle people,
I've added printing country codes for IPv4 addresses to all the ra* programs.
The new printing fields are "sco" and "dco" for src country and dst country.
I'll add filtering, aggregation and sorting possibly before release, but
more than likely, after.

ragraph() will aggregate based on country code, so you will be able to
graph based on this field.

This feature needs a configuration file that is constructed from a set of files
that you get from the internet number registries.  I will have a working file,
called ./support/Config/delegated-ipv4-latest, that was created on 2007-10-01.
This file will be in the distribution, and will be installed by the package
Makefile.  I have included a shell script that creates a new config file by
fetching all the needed files and consolidating the data into a single
file suitable for the ra* programs.  This script is called
"./support/Config/ragetcountrycodes.sh".  It uses wget() to get
the files from ARIN, the American Registry for Internet Numbers, for all
the global registries and generates the composite file that the ra* programs
use.
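
For anyone who wants to poke at the data outside the ra* programs, here is
a minimal Python sketch of the lookup idea.  It assumes the consolidated
file keeps the registries' pipe-separated "delegated" layout
(registry|cc|type|start|count|date|status); that is an assumption about
the file, not a description of how the ra* programs read it.  The function
names and the sample blocks (documentation address ranges with made-up
country codes) are purely illustrative.

```python
import bisect
import ipaddress

# Sample lines in the pipe-separated RIR "delegated" statistics layout:
# registry|cc|type|start|count|date|status
# The blocks below are documentation ranges with made-up country codes.
SAMPLE = """\
# comment lines and summary lines are skipped
arin|*|ipv4|*|2|summary
arin|US|ipv4|192.0.2.0|256|20071001|assigned
ripencc|DE|ipv4|198.51.100.0|256|20071001|allocated
apnic|NZ|ipv6|2001:db8::|32|20071001|allocated
"""


def load_delegated(lines):
    """Return a sorted list of (first, last, cc) tuples for IPv4 records."""
    ranges = []
    for line in lines:
        fields = line.strip().split("|")
        # Skip comments, summary lines, and anything that is not IPv4.
        if len(fields) < 7 or fields[2] != "ipv4" or fields[1] == "*":
            continue
        first = int(ipaddress.IPv4Address(fields[3]))
        last = first + int(fields[4]) - 1  # for ipv4 the value is an address count
        ranges.append((first, last, fields[1]))
    ranges.sort()
    return ranges


def country_for(ranges, addr):
    """Return the country code covering addr, or None if no block matches."""
    value = int(ipaddress.IPv4Address(addr))
    # Find the last block whose first address is <= value.
    idx = bisect.bisect_right(ranges, (value, float("inf"), "")) - 1
    if idx >= 0 and ranges[idx][0] <= value <= ranges[idx][1]:
        return ranges[idx][2]
    return None


table = load_delegated(SAMPLE.splitlines())
print(country_for(table, "192.0.2.17"))   # inside the first sample block
print(country_for(table, "203.0.113.5"))  # not covered by the sample data
```

The sorted list plus bisect keeps each lookup logarithmic in the number of
blocks, which matters once the real registry data is loaded.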


Carter Bullard | 3 Oct 02:24

Re: irritating arugs behaviour

Hey Russell,
I'm thinking about this and trying to come up with a reasonable
solution to your problem.  Understanding that there is not enough
diskspace is tricky unless you generate a filesystem full error, and
I'm wondering if there isn't something a little more elegant.

I'll ponder this until tomorrow, and let's figure out what we need to do.

I was very glad to see that the program "arugs" was irritating you
instead of my program.

Carter

On Sep 24, 2007, at 5:18 PM, Russell Fulton wrote:

> Hi
>
> When a disk write from argus fails (due to diskfull for example) argus
> keeps running but never retries the output file.   I have a daily job
> that clears space in the archive and occasionally the cleared space is
> not enough for the next day's logs so the system runs out of disk late
> at night.
>
> I've just got  back after a few days away to find that the monitor ran
> out of disk on the first evening, of course!  :)
>
> When argus closes an output process due to errors it would be nice  
> if it
> could recheck the file periodically to see if it can proceed again.
>

Carter Bullard | 3 Oct 05:33

country code discussion and example

Gentle people,
I just wanted to document on the mailing list the support for printing country codes.
Why, what, how etc....

While country codes are not "truth", they are useful for categorizing traffic, and
so I've added a bunch of support for mapping IP addresses to publicly available
country code information.  The biggest win is when we can filter and aggregate based
on country codes, and so I spent the time today to get these features in the ra*
programs.

OK, there are two types of country code support in ra* programs.  The first
is printing support, which is a real-time local lookup of an IP address against
the RIR databases.  This doesn't modify any record content, and so this
support doesn't provide filtering or aggregation, just field printing.  This
is enabled automatically if you have either an "sco" or a "dco" in the print
field specification.

The ra* programs will read in the RIR database from the path specified
in the .rarc file.  There is a new rarc file variable, RA_DELEGATED_IP, where you
specify the path to a file of the type found in ./support/Config/delegated-ipv4-latest.
The client package Makefile will install this file in /usr/local/argus, by
default, and so the sample rarc file in ./support/Config has this path as
its value.

Because the RIR databases change weekly (maybe daily?), in order to make
this useful, we have scripts to generate current IPv4 delegated address maps.

The second form of support is where we add the country codes to the actual
argus records.  When they are a part of the record, ra* clients can filter,
aggregate, sort, strip and anonymize the country codes.  There is a new
client, called ralabel(), which will add country code labels to an argus data
stream, and from there you can actually do work with the codes.

If a record has embedded country codes, the ra* programs will not consult
newer databases when processing the country codes.  Basically by putting
them in the records, you fix the values in time, which is a good thing.

ralabel() uses several sources for country code information.  The first place
it looks is in the local RIR databases that are provided in the client
distribution.  These db's are not complete, so if ralabel() can't find a country
code, it performs a reverse DNS lookup, to see if there is a country code in
the IP address's fully qualified domain name (FQDN).  Currently we
use the first country code we get, first from the RIR database, then the DNS.
We can add other queries as well, such as the whois database, but we'll
start with this first to see how it goes.
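
To make the lookup order concrete, here is a rough Python sketch of "RIR
table first, then the reverse DNS name".  It is not ralabel()'s code; the
function names are hypothetical, and treating any two-letter final label
of the FQDN as a country code is a simplifying assumption.

```python
import socket


def country_from_fqdn(fqdn):
    """Return an upper-case code if the last label looks like a two-letter ccTLD."""
    labels = fqdn.rstrip(".").split(".")
    tld = labels[-1]
    if len(labels) > 1 and len(tld) == 2 and tld.isascii() and tld.isalpha():
        return tld.upper()
    return None


def lookup_country(addr, rir_lookup, resolve=None):
    """Use the first answer we get: the RIR table, then the reverse DNS name."""
    code = rir_lookup(addr)
    if code is not None:
        return code
    if resolve is None:
        # Default resolver: a real reverse DNS query.
        resolve = lambda a: socket.gethostbyaddr(a)[0]
    try:
        name = resolve(addr)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return country_from_fqdn(name) if name else None


# Stand-ins for the RIR table and the resolver, so the example runs offline.
fake_rir = {"192.0.2.1": "US"}.get
fake_dns = {"198.51.100.7": "host.example.de"}.get

print(lookup_country("192.0.2.1", fake_rir, fake_dns))     # answered by the table
print(lookup_country("198.51.100.7", fake_rir, fake_dns))  # falls back to the name
```

Passing the resolver in as a parameter keeps the fallback testable without
touching the network.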

Here is an example of some data that spans the summer for a new server.
(We reference the source country code as "sco"):

   ralabel -nnnR datadir -w - | racluster -m sco -w - | rasort -m bytes -s stime dur sco trans pkts bytes state

         StartTime              Dur sCo    Trans      TotPkts        TotBytes State
2007/06/08.13:24:3      9666858.000  US   472501    267263652    645752494338   CON
2007/07/26.04:02:5      2607663.000  DE    40624      5642307      6217372807   CON
2007/07/26.14:22:3      2142552.750  IT     7245       754204      1199815337   CON
2007/08/09.10:09:0        22408.656  AT     3903       577621       637107386   CON
2007/09/03.00:52:2       871139.062  IN     1204        44309        70867839   CON
2007/06/13.06:12:0      9136759.000  CA     2199        58517        37835084   CON
2007/09/12.04:52:1       609061.438  HK      716        20316        34759598   CON
2007/08/01.13:49:0       932323.688  AP    16591        44937        20652629   CON
2007/09/07.00:59:1          262.234  NZ       34         5564        11564321   CON
2007/09/13.15:45:4         1553.102  GY      203         7719         8524481   CON
2007/06/28.10:56:4      7929962.500  GB      792         4615         3718261   CON
2007/09/19.08:29:2       772173.500  SA       35         1742         3456232   CON
2007/06/11.10:28:4      8061142.500  EU        8          387          595600   CON
2007/07/17.23:35:0      4928945.000  CN       25          117           34923   CON
2007/08/10.14:33:3      4089591.750  FR       25          263           33814   CON
2007/07/19.10:41:0            8.146  AF        4           36           18270   CON
2007/08/06.10:06:2      1395395.500  SE        6           75           16833   CON
2007/07/30.18:10:0      2706994.000  KR        5           48           16309   CON
2007/09/08.03:49:0            0.362  BY        1           20           12092   CON
2007/07/02.15:53:0        46291.188  EE        2           17            1539   CON
2007/07/09.15:37:2            0.524  CZ        1           13            1269   CON
2007/09/01.07:24:1         6793.704  AU        2           12            1215   CON
2007/07/18.10:28:5            0.251  IS        1           11            1175   CON
2007/08/03.14:05:3            0.465  TW        1            6             358   CON
2007/06/13.16:34:4            0.000  RU        1            2             128   CON
2007/08/23.18:05:3            0.000  ES        1            1              62   INT
2007/07/29.11:14:3            0.000  JP        1            1              60   INT

ralabel() adds the country codes to the records, and racluster() simply
merges records that have a matching country code string.  rasort() creates
the ranks, and there you go, a decent activity table based on country.
The -nnn is there to guarantee that if we have to do a reverse DNS lookup
to find the country code, we actually get a name resolved, instead of an address.
I'll eliminate that on the next round so you don't have to remember.
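
The merge-then-rank idea in that pipeline can be sketched in a few lines
of Python.  This only illustrates the logic, not how racluster() or
rasort() work internally; the flow tuples are invented, and counting one
"trans" per merged record is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical labeled flow records: (source country, packets, bytes).
flows = [
    ("US", 10, 4000),
    ("DE", 4, 900),
    ("US", 6, 2500),
    ("IT", 2, 300),
    ("DE", 1, 100),
]


def cluster_by_country(records):
    """Merge records that share a country code, summing their counters."""
    totals = defaultdict(lambda: {"trans": 0, "pkts": 0, "bytes": 0})
    for sco, pkts, nbytes in records:
        entry = totals[sco]
        entry["trans"] += 1
        entry["pkts"] += pkts
        entry["bytes"] += nbytes
    return totals


def rank_by_bytes(totals):
    """Order the merged entries by byte count, largest first."""
    return sorted(totals.items(), key=lambda item: item[1]["bytes"], reverse=True)


ranked = rank_by_bytes(cluster_by_country(flows))
for sco, t in ranked:
    print(f"{sco:>3} {t['trans']:>6} {t['pkts']:>8} {t['bytes']:>10}")
```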

All country codes are 2 character ASCII strings, and so the data demand
to embed them in each record is not huge (8 bytes), but it is significant
when you have billions of records, so we'll want to be able to rastrip()
the codes out of the records.  And of course, with anonymization, I see a situation
where you would want to anonymize the IP addresses, but leave the country
codes intact.

Ok well, hopefully that is helpful.  I'll add more later this week.

Carter


Carter Bullard | 4 Oct 02:41

new clients rc.58 on the server

Gentle people,
ftp://qosient.com/dev/argus-3.0/argus-clients-3.0.0.rc.58.tar.gz

is now ready for testing.  This release candidate has a large number of
modifications to fix bugs as well as to finalize the argus-3.0.0 record
enhancement support.  This primarily is the addition of packet size
reporting and country code printing.

Of the bug fixes, one in particular is an unreported bug where radium()
would/could eat as much CPU as it could get.

There are a number of fixes to ragraph().  It should use modern rrdtool
distributions correctly (switching from the \J label to the \l label for left
justification of legend items) and I've changed the color selections so
that we repeat colors only after 24 items in the graph.  I also added
support for new rrd_graph options, such as -units si and the -units-exponent
option.

If anyone finds any issues, regardless of whether it's code, installation
problems, documentation errors, etc....   please send email!!!!!

Thanks for all the support!!

Carter

Carter Bullard | 4 Oct 04:12

Re: irritating arugs behaviour

Hey Russell,
So we have mechanisms in argus to periodically check the filesystem.
We're running stat() against the output file only once every second to
see if the file has been renamed and there is a need to recreate it.
We can use the same mechanism to do a statfs() to see if there is
any space to write out to the disk.

So the question is, should we have thresholds for the amount of free space
that needs to be there to write out, and free space to start writing
again after stopping?  Seems reasonable.  The system call statfs()
can differentiate free blocks available to non-super user and total
free blocks, and since argus seems to be running as root, we'll
worry about total free blocks?
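
For illustration, the two counts can be seen from Python, which exposes
the POSIX statvfs() call (a close relative of statfs()) as os.statvfs().
This is only a sketch of the distinction, not argus code; free_bytes() is
a hypothetical helper name.

```python
import os


def free_bytes(path, as_root=False):
    """Return free space in bytes on the filesystem holding path.

    f_bavail counts blocks available to unprivileged users, while
    f_bfree counts all free blocks, including any root-only reserve.
    """
    st = os.statvfs(path)
    blocks = st.f_bfree if as_root else st.f_bavail
    return blocks * st.f_frsize


print("non-root free bytes:", free_bytes("."))
print("total free bytes:   ", free_bytes(".", as_root=True))
```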

So how does this sound? When we reach the limit, or fail due to
a number of reasons, we close the output file.  Then every second
when we have records to write out, we'll test the filesystem for
space, and as soon as space becomes free, we'll reopen the file,
or recreate it, and start writing again.

We probably need to tally the number of records that weren't
written, so I'll have to figure out how to do that, possibly writing
the number in the initial MAR that is written to the new file.
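
Here is a small Python sketch of that proposal: stop below one threshold,
resume above another, and carry the count of unwritten records into the
first record of the new file.  Everything in it (the class name, the
threshold parameters, the tuple standing in for a MAR) is hypothetical and
only meant to show the state machine, not how argus would implement it.

```python
class GuardedOutput:
    """Output that stops when space runs low and resumes once it recovers."""

    def __init__(self, stop_below, resume_above):
        if resume_above < stop_below:
            raise ValueError("resume threshold must not be below stop threshold")
        self.stop_below = stop_below
        self.resume_above = resume_above
        self.writing = True
        self.dropped = 0      # records not written while stopped
        self.written = []     # stands in for the output file

    def check(self, free):
        """Run about once a second with the current free space."""
        if self.writing and free < self.stop_below:
            self.writing = False                         # close the output file
        elif not self.writing and free >= self.resume_above:
            self.writing = True                          # reopen or recreate it
            self.written.append(("MAR", self.dropped))   # report the gap
            self.dropped = 0

    def write(self, record):
        if self.writing:
            self.written.append(("REC", record))
        else:
            self.dropped += 1


out = GuardedOutput(stop_below=100, resume_above=500)
for free, record in [(1000, "a"), (50, "b"), (200, "c"), (600, "d")]:
    out.check(free)
    out.write(record)
print(out.written)
```

Using two thresholds rather than one keeps the output from flapping open
and closed when free space hovers around a single limit.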

Is that what you were thinking?

Carter

Russell Fulton wrote:
>> I was very glad to see that the program "arugs" was irritating you
>> instead of my program.
>>     
> :-P
>
> R
>
> Yes, I agree that deciding what the best way to handle this is not
> obvious and you certainly don't want to keep trying to write every
> record as you will almost certainly end up with corrupt files.
>
> Cheers, Russell
>
>
>   

Wolfgang Barth | 4 Oct 10:10

Re: new clients rc.58 / radium segfaults

Hi Carter,

> Of the bug fixes, one in particular is an unreported bug where radium()
> would/could eat as much CPU as it could get.

Oh yeah, now it consumes no more CPU, but memory ;-) (see below)

With version rc.58 I get a segfault:

 /usr/local/sbin/radium -f /etc/radium.conf

strace output:

...
mmap2(NULL, 397312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0xb7acf000
gettimeofday({1191482968, 288485}, NULL) = 0
read(3, "uivalent\n#\n  \n#RADIUM_FILTER=\"\"\n"..., 4096) = 3402
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Process 5530 detached

radium in the foreground works, so the debugging output is not really
helpful...

compiled with .debug:

% ./radium/radium -D8 -f /etc/radium.conf

... very long output ...

radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning
0xbfe4b008
radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning
0xbfeac008
radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning
0xbff0d008
radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning
0xbff6e008
radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning 0x0
radium[15272]: 2007-10-04 10:06:57 ArgusAddHostList((null)) ArgusCalloc
Cannot allocate memory
^^^^^^^^^^^^^^^^^^^^^^^^
radium[15272]: 2007-10-04 10:06:57 ArgusShutDown (3)
radium[15272]: 2007-10-04 10:06:57 ArgusFree (0x81fa080)
radium[15272]: 2007-10-04 10:06:57 ArgusDeleteQueue (0x81fa080) returning
radium[15272]: 2007-10-04 10:06:57 RaParseComplete(0)
radium[15272]: 2007-10-04 10:06:57 ArgusCloseInput(0xb7cfc008) closing
radium[15272]: 2007-10-04 10:06:57 ArgusWriteConnection(0xb7cfc008,
0x80a6689, 6) returning 6
radium[15272]: 2007-10-04 10:06:57 ArgusCloseInput(0xb7cfc008) done
radium[15272]: 2007-10-04 10:06:57 ArgusFree (0xb7cfc008)
radium[15272]: 2007-10-04 10:06:57 ArgusCloseInput(0xb7c88008) closing
radium[15272]: 2007-10-04 10:06:57 ArgusWriteConnection(0xb7c88008,
0x80a6689, 6) returning 6
radium[15272]: 2007-10-04 10:06:57 ArgusCloseInput(0xb7c88008) done
*** glibc detected *** double free or corruption (fasttop): 0x081fa780 ***
Abort

Wolfgang
-- 
<wob (at) swobspace de> * http://www.swobspace.de

Wolfgang Barth | 4 Oct 10:17

Re: new clients rc.58 / radium segfaults

On Thu, Oct 04, 2007 at 10:10:16AM +0200, wob wrote:

> Oh yeah, now it consumes no more CPU, but memory ;-) (see below)

Additional info: radium rc.56 works, rc.57 and rc.58 fail.

Wolfgang
-- 
<wob (at) swobspace de> * http://www.swobspace.de

Patrick Forsberg | 4 Oct 15:01

Re: new clients rc.58 on the server

Carter Bullard wrote:
> Gentle people,
> ftp://qosient.com/dev/argus-3.0/argus-clients-3.0.0.rc.58.tar.gz

> If anyone finds any issues, regardless of whether it's code, installation
> problems, documentation errors, etc....   please send email!!!!!

It would seem that rastrip is still broken.

We collect argus data on two nodes with two NICs each (IN and OUT).
We collect 12 bytes of user data.
We merge the data from the two nodes with racluster.

Because of policy we need to clean out user data after a period of time,
but right now we don't trust rastrip, since it looks as if it does things
with the data that it shouldn't.

Here's a representation of the problem:

#Sort the indata (since it comes from more than one collector)
rasort -r /var/log/argus/2007/10/03-0000 -w 100300.rasort
#Strip userdata from logfiles
rastrip -M -suser -M -duser -r 100300.rasort -w 100300.rasort.strip1
#Strip userdata again (should do nothing)
rastrip -M -suser -M -duser -r 100300.rasort.strip1 -w 100300.rasort.strip2
#Plaintext the stripped files
ra -r 100300.rasort.strip1 >strip1
ra -r 100300.rasort.strip2 >strip2

# I would expect the second run of rastrip to do nothing to the data since the
# only thing it's supposed to do is remove suser and duser and that should
# already be gone. But a simple comparison of the text output shows that
# something funky is happening to some of the flows.

#Compare the two plaintext files. There should be no difference.
diff strip1 strip2
10703c10703
<    23:55:11.467660            tcp     1.2.3.4.63212     ->       2.3.4.5.smtp          1        2           68          198   RST
---
>    23:54:59.085949            tcp     1.2.3.4.63212     ->       2.3.4.5.smtp          1        2           68          198   RST
10760c10760
<    23:55:11.579806    d       tcp      3.4.5.6.55954     ->      2.3.4.6.smtp          1        1           60           60   RST
---
>    23:54:58.833879    d       tcp      3.4.5.6.55954     ->      2.3.4.6.smtp          1        1           60           60   RST
19540c19540
<    23:55:29.428842    d       tcp    4.5.6.7.3014      ->       2.3.4.7.http          7       12          426        15047   FIN
---
>    23:54:57.897868    d       tcp    4.5.6.7.3014      ->       2.3.4.7.http          7       12          426        15047   FIN
23914c23914
<    23:55:37.793078  e         tcp    5.6.7.8.42565     ->      2.3.4.8.http          3        4          192          264   RST
---
>    23:55:09.038700  e         tcp    5.6.7.8.42565     ->      2.3.4.8.http          3        4          192          264   RST

<REST OF DATA STRIPPED>

Only 24 out of 121773 flows are affected, but that's bad enough.
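
A quick way to count this kind of change is to walk the two text dumps in
step and flag lines where only the first column (the timestamp) differs.
The Python sketch below assumes both dumps have the same number of lines
in the same order, as the single-line diff hunks above suggest; the
function name is made up, and the two sample lines are taken from the
first hunk.

```python
def timestamp_only_changes(before, after):
    """Return (line number, old time, new time) where only the first field differs."""
    changed = []
    for lineno, (a, b) in enumerate(zip(before, after), start=1):
        if a == b:
            continue
        fa, fb = a.split(), b.split()
        # Same fields after the timestamp, but a different timestamp.
        if fa and fb and fa[1:] == fb[1:] and fa[0] != fb[0]:
            changed.append((lineno, fa[0], fb[0]))
    return changed


strip1 = ["23:55:11.467660 tcp 1.2.3.4.63212 -> 2.3.4.5.smtp 1 2 68 198 RST"]
strip2 = ["23:54:59.085949 tcp 1.2.3.4.63212 -> 2.3.4.5.smtp 1 2 68 198 RST"]
print(timestamp_only_changes(strip1, strip2))
```

With real files, passing the two open file objects for strip1 and strip2
in place of the lists should work the same way.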

The collector nodes are running argus-3.0.0 64-bit on a RHEL-4ES 2.6.9-55.ELsmp
racluster is argus-3.0.0.rc.44 32bit on a RHEL-3AS 2.4.21-47.ELsmp
rasort, rastrip and ra above are argus-3.0.0.rc.58 32bit on a RHEL-3AS 2.4.21-47.ELsmp

Regards,

/Patrick

Carter Bullard | 4 Oct 16:40

Re: new clients rc.58 / radium segfaults

Man, ...,  Wolfgang,
You sure do find the bugs!!!   Now, that is a very good thing, except
that there aren't supposed to be any bugs    :o)

OK, so why would you be failing adding to the host list?  I suspect
the error message is bogus, but we'll see.

How many remote sources are you collecting from?  Are any of
them actually live? reachable?  radium should not fail here, of
course, but I suspect it may be either a bogus return code, or
a radium.conf problem???!!!

Carter

On Oct 4, 2007, at 4:10 AM, Wolfgang Barth wrote:

> Hi Carter,
>
>> Of the bug fixes, one in particular is an unreported bug where  
>> radium()
>> would/could eat as much CPU as it could get.
>
> Oh yeah, now it consumes no more CPU, but memory ;-) (see below)
>
> With version rc.58 I get a segfault:
>
>  /usr/local/sbin/radium -f /etc/radium.conf
>
> strace output:
>
> ...
> mmap2(NULL, 397312, PROT_READ|PROT_WRITE, MAP_PRIVATE| 
> MAP_ANONYMOUS, -1, 0)
> = 0xb7acf000
> gettimeofday({1191482968, 288485}, NULL) = 0
> read(3, "uivalent\n#\n  \n#RADIUM_FILTER=\"\"\n"..., 4096) = 3402
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
> Process 5530 detached
>
> radium in the foreground works, so the debugging output is not really
> helpful...
>
> compiled with .debug:
>
> % ./radium/radium -D8 -f /etc/radium.conf
>
> ... very long output ...
>
> radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning
> 0xbfe4b008
> radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning
> 0xbfeac008
> radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning
> 0xbff0d008
> radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576) returning
> 0xbff6e008
> radium[15272]: 2007-10-04 10:06:57 ArgusCalloc (1, 394576)  
> returning 0x0
> radium[15272]: 2007-10-04 10:06:57 ArgusAddHostList((null))  
> ArgusCalloc
> Cannot allocate memory
> ^^^^^^^^^^^^^^^^^^^^^^^^
> radium[15272]: 2007-10-04 10:06:57 ArgusShutDown (3)
> radium[15272]: 2007-10-04 10:06:57 ArgusFree (0x81fa080)
> radium[15272]: 2007-10-04 10:06:57 ArgusDeleteQueue (0x81fa080)  
> returning
> radium[15272]: 2007-10-04 10:06:57 RaParseComplete(0)
> radium[15272]: 2007-10-04 10:06:57 ArgusCloseInput(0xb7cfc008) closing
> radium[15272]: 2007-10-04 10:06:57 ArgusWriteConnection(0xb7cfc008,
> 0x80a6689, 6) returning 6
> radium[15272]: 2007-10-04 10:06:57 ArgusCloseInput(0xb7cfc008) done
> radium[15272]: 2007-10-04 10:06:57 ArgusFree (0xb7cfc008)
> radium[15272]: 2007-10-04 10:06:57 ArgusCloseInput(0xb7c88008) closing
> radium[15272]: 2007-10-04 10:06:57 ArgusWriteConnection(0xb7c88008,
> 0x80a6689, 6) returning 6
> radium[15272]: 2007-10-04 10:06:57 ArgusCloseInput(0xb7c88008) done
> *** glibc detected *** double free or corruption (fasttop):  
> 0x081fa780 ***
> Abort
>
> Wolfgang
> -- 
> <wob (at) swobspace de> * http://www.swobspace.de
>

Carter Bullard | 4 Oct 16:51

Re: new clients rc.58 / radium segfaults

Hey Wolfgang,
Could you run your failing radium() using the "-D3" option?  Preferably
as the first command line option?  Do you have any RA_ARGUS_SERVER
directives in your .rarc, and ..., can I get a copy of your radium.conf file?
You can corrupt anything you're not comfortable sharing, like making
IP addresses bogus, etc......!!!

Carter

On Oct 4, 2007, at 4:17 AM, Wolfgang Barth wrote:

> On Thu, Oct 04, 2007 at 10:10:16AM +0200, wob wrote:
>
>> Oh yeah, now it consumes no more CPU, but memory ;-) (see below)
>
> Additional info: radium rc.56 works, rc.57 and rc.58 fail.
>
> Wolfgang
> -- 
> <wob (at) swobspace de> * http://www.swobspace.de
>

