Any feedback is appreciated. Thanks.
dbparser: add groupingby() parser
This patch adds a new parser that can perform simple correlation on log
messages, e.g. when multiple input log messages describe the same event.
In a way it is similar to the SQL GROUP BY operation, where an aggregate of
a set of input records can be calculated.
The major difference between SQL GROUP BY and groupingby() is that the former
_always_ operates on an enumerable list of records, whereas groupingby()
works on a stream of data.
groupingby() forms groups of related messages using a sliding time window,
i.e. it can be specified how far back in time to look when grouping related
messages.
As a specific use-case, let's look at Linux audit logs. A single audit event
tends to be split into several log lines. These lines are usually very
close in time, but multiple events may be logged at around the same time,
so their lines get interleaved in the output.
The example below is the audit log for an ntpdate execution:
type=SYSCALL msg=audit(1440927434.124:40347): arch=c000003e syscall=59 success=yes exit=0 a0=7f121cef0b88 a1=7f121cef0c00 a2=7f121e690d98 a3=2 items=2 ppid=4312 pid=4347 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="ntpdate" exe="/usr/sbin/ntpdate" key=(null)
type=EXECVE msg=audit(1440927434.124:40347): argc=3 a0="/usr/sbin/ntpdate" a1="-s" a2="ntp.ubuntu.com"
type=CWD msg=audit(1440927434.124:40347): cwd="/"
type=PATH msg=audit(1440927434.124:40347): item=0 name="/usr/sbin/ntpdate" inode=2006003 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
type=PATH msg=audit(1440927434.124:40347): item=1 name="/lib64/ld-linux-x86-64.so.2" inode=5243184 dev=08:01 mode=0100755 ouid=0 ogid=0 rdev=00:00 nametype=NORMAL
type=PROCTITLE msg=audit(1440927434.124:40347): proctitle=2F62696E2F7368002F7573722F7362696E2F6E7470646174652D64656269616E002D73
These lines are connected by their second field: in each of them, msg equals
audit(1440927434.124:40347). They can be processed by the groupingby()
parser, similarly to how db-parser() does correlation.
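To illustrate how the msg field ties the lines together, here is a minimal
Python sketch (not part of the patch, and not how syslog-ng implements it).
The names audit_key() and group_by_key() are hypothetical helpers, the sample
lines are shortened versions of the records above, and the record with serial
40348 is made up to show interleaving:

```python
import re
from collections import defaultdict

# Matches the msg=audit(<timestamp>:<serial>) field of an audit record.
AUDIT_ID = re.compile(r"msg=(audit\(\d+\.\d+:\d+\))")


def audit_key(line):
    """Return the audit(...) identifier of a line, or None if it has none."""
    m = AUDIT_ID.search(line)
    return m.group(1) if m else None


def group_by_key(lines):
    """Collect lines that share the same audit identifier."""
    groups = defaultdict(list)
    for line in lines:
        key = audit_key(line)
        if key is not None:
            groups[key].append(line)
    return dict(groups)


sample = [
    'type=CWD msg=audit(1440927434.124:40347): cwd="/"',
    # Hypothetical record from another event, interleaved with the first one.
    'type=CWD msg=audit(1440927434.130:40348): cwd="/"',
    'type=PATH msg=audit(1440927434.124:40347): item=0 name="/usr/sbin/ntpdate"',
]
groups = group_by_key(sample)
```

Unlike groupingby(), this sketch works on a complete list of lines, so it
needs no time window.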
These are the options for groupingby():
* key(): specifies the key for the grouping, i.e. the value that must be the
same for all messages in the group
* scope(): one of three values: "global", "host" or "process", with the same
meaning as in db-parser(): whether to group all messages received by
syslog-ng (global), only messages coming from the same host (host), or
only messages from the same process/pid combination (process).
* where(): specifies a filter condition; messages not matching the filter
will NOT be added to the group. where() only has access to a single
message, the current one being processed.
* having(): specifies a filter condition that must match in order for the
group to generate an aggregate message. having() has access to the
entire group through the "context".
* timeout(): specifies the maximum time to wait for all messages in the
group to arrive. After this time, the group is assumed to be complete
and aggregation is triggered.
* aggregate(): this specifies the aggregate message that's going to be
generated when the group is complete.
* inject-mode(): how the aggregate message is injected into the syslog-ng
message routing, can be one of: "pass-through", "internal".
* trigger(): trigger the closure of the group by matching an incoming
message. If the filter condition specified here matches the incoming
message, it will cause the aggregate message to be calculated and emitted,
and the group to be discarded from the state table.
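
How these options interact can be sketched with a toy model in Python. This
is only an illustration under stated assumptions, not the code in this patch:
the Grouper class and its method names are hypothetical, the timeout is
assumed to be counted from the first message of a group, time is passed in
explicitly as "now", and scope() and inject-mode() are left out (scope could
be folded into the key function).

```python
class Grouper:
    """Toy stream grouper modelling key/where/having/timeout/aggregate/trigger."""

    def __init__(self, key, timeout, aggregate,
                 where=None, having=None, trigger=None):
        self.key = key
        self.timeout = timeout
        self.aggregate = aggregate
        self.where = where or (lambda msg: True)
        self.having = having or (lambda context: True)
        self.trigger = trigger or (lambda msg: False)
        self.groups = {}  # key value -> (deadline, list of messages)

    def _close(self, k):
        # Drop the group from the state table; emit only if having() matches.
        _, context = self.groups.pop(k)
        return [self.aggregate(context)] if self.having(context) else []

    def expire(self, now):
        """Close every group whose deadline has passed."""
        out = []
        expired = [k for k, (deadline, _) in self.groups.items()
                   if deadline <= now]
        for k in expired:
            out.extend(self._close(k))
        return out

    def feed(self, msg, now):
        """Process one message and return any aggregates it produced."""
        out = self.expire(now)
        if not self.where(msg):  # where() sees only the current message
            return out
        k = self.key(msg)
        # Assumption: the deadline is set when the group is created.
        deadline, context = self.groups.get(k, (now + self.timeout, []))
        context.append(msg)
        self.groups[k] = (deadline, context)
        if self.trigger(msg):  # trigger() closes the group immediately
            out.extend(self._close(k))
        return out


g = Grouper(
    key=lambda m: m["id"],
    timeout=10,
    aggregate=lambda ctx: {"id": ctx[0]["id"],
                           "types": [m["type"] for m in ctx]},
    where=lambda m: m["type"] != "CWD",
    having=lambda ctx: len(ctx) >= 2,
    trigger=lambda m: m["type"] == "PROCTITLE",
)
events = []
events += g.feed({"id": "A", "type": "SYSCALL"}, now=0)
events += g.feed({"id": "A", "type": "CWD"}, now=0)       # dropped by where()
events += g.feed({"id": "B", "type": "SYSCALL"}, now=1)
events += g.feed({"id": "A", "type": "PROCTITLE"}, now=1)  # trigger closes A
late = g.expire(now=20)  # B times out, but having() rejects a 1-message group
```

In this model group A is emitted as soon as the trigger message arrives,
while group B is closed by the timeout and discarded without an aggregate
because it fails the having() condition.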
A few use-cases where this can be useful:
* Linux audit logs
* postfix logs
Signed-off-by: Balazs Scheidler <balazs.scheidler <at> balabit.com>