Gentle people,
I just wanted to document, here on the mailing list, the new support for
country codes in the ra* programs: why, what, how, etc.
While country codes are not "truth", they are useful for categorizing traffic,
so I've added a bunch of support for mapping IP addresses to publicly available
country code information. The biggest win comes when we can filter and aggregate
on country codes, so I spent today getting these features into the ra*
programs.
OK, there are two types of country code support in the ra* programs. The first
is printing support, which is a real-time, local lookup of an IP address against
the RIR databases. This doesn't modify any record content, so this support
doesn't provide filtering or aggregation, just field printing. It is enabled
automatically if you have either "sco" or "dco" in the print field
specification.
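For anyone curious how a local lookup like this can work, here is a minimal Python sketch, not the argus implementation. It assumes the delegated ranges have already been loaded as (start address, address count, country code) entries and does a binary search over the sorted, non-overlapping ranges. build_table() and lookup_cc() are hypothetical names.

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# (first address, last address, country code), addresses as integers,
# sorted by first address. Assumes the delegated ranges do not overlap.
Range = Tuple[int, int, str]

def build_table(entries: List[Tuple[str, int, str]]) -> List[Range]:
    """Convert (start address, address count, country code) entries into sorted ranges."""
    table: List[Range] = []
    for start, count, cc in entries:
        first = int(ipaddress.IPv4Address(start))
        table.append((first, first + count - 1, cc))
    table.sort()
    return table

def lookup_cc(table: List[Range], addr: str) -> Optional[str]:
    """Return the country code of the range containing addr, or None if unlisted."""
    ip = int(ipaddress.IPv4Address(addr))
    starts = [r[0] for r in table]  # rebuilt on every call for brevity; cache it in real use
    i = bisect.bisect_right(starts, ip) - 1  # last range starting at or below ip
    if i >= 0 and ip <= table[i][1]:
        return table[i][2]
    return None
```

For example, lookup_cc(build_table([("3.0.0.0", 16777216, "US")]), "3.1.2.3") returns "US", and an address outside every range returns None.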
The ra* programs read the RIR database from the path specified in the .rarc
file. There is a new rarc variable, RA_DELEGATED_IP, where you specify the path
to a file of the type found in ./support/Config/delegated-ip-latest. By default,
the client package Makefile installs this file in /usr/local/argus, so the
sample rarc file in ./support/Config uses that path as its value.
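A sketch of how a client might resolve that setting, assuming rarc lines of the form VARIABLE=value or VARIABLE="value" with '#' starting a comment. The default below assumes the installed file keeps the name delegated-ip-latest; that is a guess based on the Makefile behavior described above. delegated_ip_path() is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed default: the Makefile installs the file into /usr/local/argus,
# and we assume it keeps the name delegated-ip-latest.
DEFAULT_DELEGATED_IP = "/usr/local/argus/delegated-ip-latest"

def delegated_ip_path(rarc_text: str) -> str:
    """Return the RA_DELEGATED_IP value from .rarc text, or the assumed default."""
    path: Optional[str] = None
    for line in rarc_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        m = re.match(r'RA_DELEGATED_IP\s*=\s*"?([^"]*)"?\s*$', line)
        if m:
            path = m.group(1)  # a later setting overrides an earlier one
    return path if path else DEFAULT_DELEGATED_IP
```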
Because the RIR databases change weekly (maybe daily?), we have scripts to
generate current IPv4 delegated address maps, so that the lookups stay useful.
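The scripts themselves aren't shown here, so this is only a hypothetical Python sketch of the idea. It reads files in the RIR statistics exchange format (registry|cc|type|start|value|date|status), keeps the allocated/assigned IPv4 records, and merges all registries into one sorted map. Whether the argus delegated-ip-latest file uses exactly this layout is an assumption; parse_rir_ipv4() and merge_registries() are illustrative names.

```python
import ipaddress
from typing import Iterable, List, Tuple

def parse_rir_ipv4(lines: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Extract (first, last, cc) IPv4 ranges from RIR statistics-format lines.

    Record lines look like: registry|cc|type|start|value|date|status
    Version/header lines, summary lines ('*' as cc) and comments are skipped.
    """
    ranges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("|")
        if len(fields) < 7 or fields[2] != "ipv4" or fields[1] in ("", "*"):
            continue
        if fields[6] not in ("allocated", "assigned"):
            continue
        first = int(ipaddress.IPv4Address(fields[3]))
        # For ipv4 records, 'value' is the number of addresses in the block.
        ranges.append((first, first + int(fields[4]) - 1, fields[1].upper()))
    return ranges

def merge_registries(*files: Iterable[str]) -> List[Tuple[int, int, str]]:
    """Combine several registries' files into one map sorted by first address."""
    merged: List[Tuple[int, int, str]] = []
    for f in files:
        merged.extend(parse_rir_ipv4(f))
    merged.sort()
    return merged
```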
The second form of support adds the country codes to the actual argus
records. Once they are part of the record, ra* clients can filter, aggregate,
sort, strip and anonymize the country codes. There is a new client,
ralabel(), which adds country code labels to an argus data stream, and from
there you can actually do work with the codes.
If a record has embedded country codes, the ra* programs will not consult
newer databases when processing them. Basically, putting the codes in the
records fixes the values in time, which is a good thing.
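Here is a minimal sketch of that labeling rule, using hypothetical dict-based records rather than real argus records: codes are looked up only when the record doesn't already carry them, so an embedded code is never replaced by a newer database answer. label_record() and the field names are illustrative.

```python
from typing import Callable, Dict, Optional

# Hypothetical record: a dict with 'saddr', 'daddr' and optionally 'sco'/'dco'.
Record = Dict[str, str]

def label_record(rec: Record, lookup: Callable[[str], Optional[str]]) -> Record:
    """Return a copy of rec with sco/dco filled in only where they are missing."""
    out = dict(rec)
    for addr_key, cc_key in (("saddr", "sco"), ("daddr", "dco")):
        if out.get(cc_key):
            continue  # already embedded: keep the value fixed in time
        if addr_key in out:
            cc = lookup(out[addr_key])
            if cc:
                out[cc_key] = cc
    return out
```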
ralabel() uses several sources for country code information. It first looks
in the local RIR databases that are provided in the client distribution. These
databases are not complete, so if ralabel() can't find a country code, it
performs a reverse DNS lookup to see whether there is a country code in the IP
address's fully qualified domain name (FQDN). Currently we use the first
country code we get, first from the RIR database, then from the DNS. We can
add other queries as well, such as the whois database, but we'll start with
these to see how it goes.
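The lookup order can be sketched like this in Python, assuming the "country code in the FQDN" means a two-letter top-level domain. cc_from_fqdn(), reverse_dns_cc() and find_cc() are hypothetical names; socket.gethostbyaddr() is the standard library reverse lookup. Note that ccTLDs are not always ISO 3166 codes (the UK uses .uk, while its ISO code is GB), so a real implementation may need a mapping table.

```python
import socket
from typing import Callable, Optional

def cc_from_fqdn(fqdn: str) -> Optional[str]:
    """Treat a two-letter top-level domain as a country code ('host.example.de' -> 'DE')."""
    tld = fqdn.rstrip(".").rsplit(".", 1)[-1]
    if len(tld) == 2 and tld.isascii() and tld.isalpha():
        return tld.upper()
    return None

def reverse_dns_cc(addr: str) -> Optional[str]:
    """Reverse-resolve addr and extract a country code from the resulting name."""
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(addr)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None
    return cc_from_fqdn(name)

def find_cc(addr: str,
            rir_lookup: Callable[[str], Optional[str]],
            dns_lookup: Callable[[str], Optional[str]] = reverse_dns_cc) -> Optional[str]:
    """First answer wins: the RIR database first, then the DNS."""
    return rir_lookup(addr) or dns_lookup(addr)
```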
Here is an example using some data that spans the summer for a new server.
(We refer to the source country code as "sco"):
ralabel -nnnR datadir -w - | racluster -m sco -w - | rasort -m bytes -s stime dur sco trans pkts bytes state
StartTime                   Dur  sCo   Trans    TotPkts      TotBytes  State
2007/06/08.13:24:3  9666858.000  US   472501  267263652  645752494338  CON
2007/07/26.04:02:5  2607663.000  DE    40624    5642307    6217372807  CON
2007/07/26.14:22:3  2142552.750  IT     7245     754204    1199815337  CON
2007/08/09.10:09:0    22408.656  AT     3903     577621     637107386  CON
2007/09/03.00:52:2   871139.062  IN     1204      44309      70867839  CON
2007/06/13.06:12:0  9136759.000  CA     2199      58517      37835084  CON
2007/09/12.04:52:1   609061.438  HK      716      20316      34759598  CON
2007/08/01.13:49:0   932323.688  AP    16591      44937      20652629  CON
2007/09/07.00:59:1      262.234  NZ       34       5564      11564321  CON
2007/09/13.15:45:4     1553.102  GY      203       7719       8524481  CON
2007/06/28.10:56:4  7929962.500  GB      792       4615       3718261  CON
2007/09/19.08:29:2   772173.500  SA       35       1742       3456232  CON
2007/06/11.10:28:4  8061142.500  EU        8        387        595600  CON
2007/07/17.23:35:0  4928945.000  CN       25        117         34923  CON
2007/08/10.14:33:3  4089591.750  FR       25        263         33814  CON
2007/07/19.10:41:0        8.146  AF        4         36         18270  CON
2007/08/06.10:06:2  1395395.500  SE        6         75         16833  CON
2007/07/30.18:10:0  2706994.000  KR        5         48         16309  CON
2007/09/08.03:49:0        0.362  BY        1         20         12092  CON
2007/07/02.15:53:0    46291.188  EE        2         17          1539  CON
2007/07/09.15:37:2        0.524  CZ        1         13          1269  CON
2007/09/01.07:24:1     6793.704  AU        2         12          1215  CON
2007/07/18.10:28:5        0.251  IS        1         11          1175  CON
2007/08/03.14:05:3        0.465  TW        1          6           358  CON
2007/06/13.16:34:4        0.000  RU        1          2           128  CON
2007/08/23.18:05:3        0.000  ES        1          1            62  INT
2007/07/29.11:14:3        0.000  JP        1          1            60  INT
ralabel() adds the country codes to the records, racluster simply merges
records that have a matching country code string, and rasort() ranks the
results by bytes. There you go: a decent activity table based on country.
The -nnn is there to guarantee that if we have to do a reverse DNS lookup
to find the country code, we actually get a name resolved instead of an address.
I'll eliminate that requirement in the next round so you don't have to remember it.
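To make the pipeline concrete, here is a rough Python equivalent of the racluster -m sco / rasort -m bytes step over hypothetical dict records. Records with the same sco are merged (earliest start, latest end, summed packets and bytes, and a count of merged records standing in for Trans), then ranked by bytes. This approximates the idea and is not racluster's actual merge logic; country_table() and the field names are illustrative.

```python
from typing import Dict, List

def country_table(records: List[dict]) -> List[dict]:
    """Merge records by 'sco' and rank the merged rows by total bytes, largest first.

    Assumed record fields: sco, stime, ltime (start/last time in seconds), pkts, bytes.
    """
    groups: Dict[str, dict] = {}
    for r in records:
        g = groups.get(r["sco"])
        if g is None:
            groups[r["sco"]] = {"sco": r["sco"], "stime": r["stime"], "ltime": r["ltime"],
                                "trans": 1, "pkts": r["pkts"], "bytes": r["bytes"]}
        else:
            g["stime"] = min(g["stime"], r["stime"])
            g["ltime"] = max(g["ltime"], r["ltime"])
            g["trans"] += 1  # number of records merged into this row
            g["pkts"] += r["pkts"]
            g["bytes"] += r["bytes"]
    rows = list(groups.values())
    for row in rows:
        row["dur"] = row["ltime"] - row["stime"]
    rows.sort(key=lambda row: row["bytes"], reverse=True)
    return rows
```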
All country codes are 2-character ASCII strings, so the cost of embedding
them in each record is not huge (8 bytes), but it is significant when you have
billions of records. So we'll want to be able to rastrip() the codes out of the
records, and of course, with anonymization, I can see situations where you
would want to anonymize the IP addresses but leave the country codes intact.
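A quick sketch of the arithmetic and of the two operations, in Python with hypothetical dict records. At 8 bytes per record, a billion records carry about 8 GB of country code data. strip_cc() mirrors the effect of stripping the codes, and anonymize_keep_cc() replaces IPv4 addresses with keyed-hash pseudonyms while leaving sco/dco alone. The keyed BLAKE2 hash is a stand-in chosen for brevity, not the argus anonymization method, and it is not prefix-preserving (and 32-bit outputs can collide).

```python
import hashlib
import ipaddress

CC_BYTES_PER_RECORD = 8  # per-record cost quoted in the post

def cc_overhead_bytes(n_records: int) -> int:
    """Total bytes spent on embedded country codes."""
    return n_records * CC_BYTES_PER_RECORD

def strip_cc(rec: dict) -> dict:
    """Return a copy of rec without its embedded country codes."""
    return {k: v for k, v in rec.items() if k not in ("sco", "dco")}

def anonymize_keep_cc(rec: dict, key: bytes) -> dict:
    """Replace IPv4 saddr/daddr with keyed-hash pseudonyms; sco/dco are left intact."""
    out = dict(rec)
    for field in ("saddr", "daddr"):
        if field in out:
            packed = ipaddress.IPv4Address(out[field]).packed
            digest = hashlib.blake2b(packed, key=key, digest_size=4).digest()
            out[field] = str(ipaddress.IPv4Address(digest))  # 4 bytes -> dotted quad
    return out
```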
Ok well, hopefully that is helpful. I'll add more later this week.
Carter