Gyepi SAM | 5 Jul 2003 22:01

bogofilter now has tdb support

Greetings!

I have added trivial database (http://sourceforge.net/projects/tdb)
support to bogofilter. The Berkeley DB backend has never truly pleased me,
and it pleases me less whenever I find my databases corrupted for the nth time.

Since I proposed the whole datastore concept, I also felt that it would
only be fitting to test the idea with a second implementation, which
will hopefully be faster, easier to use, and less corruptible.

The initial code is available in cvs under the 'datastore_tdb' branch.
It should build fine if you have tdb installed in a standard location.

If you are interested in using it, I would recommend checking out a
fresh copy to avoid confusion with your regular bogofilter setup.

To build a tdb bogofilter, pass '--with-tdb' to configure.
Note that BDB is not used in that instance.

Some areas that need further work/thought/discussion include:

1. datastore_db.c contained a lot of generic datastore code, which has
now been moved into datastore.c. We should take a look at the current
design and determine whether it is the best way.  I am not fond of the
db_getvalue, db_get_dbvalue and db_setvalue, db_set_dbvalue pairs. They
obscure the code path and are convoluted.

2. Ideally, one should be able to build in both databases and choose the
preferred one at run time, either by specifying it at the command line
or based on the database filename.
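
To make item 2 concrete, here is a rough sketch of run-time backend
selection. It is written in Python for brevity and is not bogofilter code:
the class names, the `pick_backend` helper and the ".tdb" suffix are all
assumptions made up for illustration.

```python
import os

# Hypothetical stand-ins for the two backends; bogofilter's real
# datastore code is C and is structured differently.
class BdbStore:
    name = "bdb"

class TdbStore:
    name = "tdb"

# Explicit backend names, and the filename suffixes assumed to imply them.
BACKENDS = {"bdb": BdbStore, "tdb": TdbStore}
SUFFIXES = {".db": "bdb", ".tdb": "tdb"}

def pick_backend(path, requested=None, default="bdb"):
    """Return the backend class to use for the database at `path`.

    An explicit request (e.g. from a command-line option) wins; otherwise
    the filename suffix decides, falling back to `default`.
    """
    if requested is not None:
        if requested not in BACKENDS:
            raise ValueError(f"unknown datastore backend: {requested!r}")
        return BACKENDS[requested]
    suffix = os.path.splitext(path)[1].lower()
    return BACKENDS[SUFFIXES.get(suffix, default)]
```

With this shape, `pick_backend("goodlist.db")` would select the BDB class,
while `pick_backend("goodlist.db", requested="tdb")` would override the
suffix.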

Greg Louis | 5 Jul 2003 23:53

Re: bogofilter now has tdb support

On 20030705 (Sat) at 1601:31 -0400, Gyepi SAM wrote:

> I have added trivial database (http://sourceforge.net/projects/tdb)
> support to bogofilter. The Berkeley DB backend has never truly pleased me,
> and it pleases me less whenever I find my databases corrupted for the nth time.

Sounds interesting.  I'll pull the code and the tdb package and take a
look -- I'm not familiar with tdb at all.

I'd just started thinking a few days ago about maybe supporting cdb, at
least for those of us who don't use -u.  The idea would be to
accumulate separate spam and nonspam register_me mailboxes, and then
run a registration tool daily (or every six hours, or whatever) that
would update the counts, build a new .cdb file and mv it into place
(I'm using the one-list patch, but pluralize as appropriate).  What do
you think?
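
The "build a new file and mv it into place" step could look roughly like
the sketch below. It is Python, not bogofilter code; `publish_atomically`
and the `build` callback are hypothetical names, and the actual .cdb
writing is left to the caller.

```python
import os
import tempfile

def publish_atomically(build, final_path):
    """Build a new database file next to `final_path`, then rename it into place.

    `build` is a caller-supplied function that writes the complete
    database to the open binary file it is given.
    """
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out:
            build(out)
            out.flush()
            os.fsync(out.fileno())        # get the data on disk first
        os.replace(tmp_path, final_path)  # atomic on POSIX, like mv
    except BaseException:
        # Leave the old database untouched if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Readers that open the file before the rename keep seeing the old
database; readers that open it afterwards see the new one.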

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis <at> consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |

---------------------------------------------------------------------
FAQ: http://bogofilter.sourceforge.net/bogofilter-faq.html
To unsubscribe, e-mail: bogofilter-dev-unsubscribe <at> aotto.com
For summary digest subscription: bogofilter-dev-digest-subscribe <at> aotto.com
For more commands, e-mail: bogofilter-dev-help <at> aotto.com

Gyepi SAM | 8 Jul 2003 23:19

Re: cdb support [was: bogofilter now has tdb support]

On Sat, Jul 05, 2003 at 05:53:28PM -0400, Greg Louis wrote:
> I'd just started thinking a few days ago about maybe supporting cdb, at
> least for those of us who don't use -u.  The idea would be to
> accumulate separate spam and nonspam register_me mailboxes, and then
> run a registration tool daily (or every six hours, or whatever) that
> would update the counts, build a new .cdb file and mv it into place
> (I'm using the one-list patch, but pluralize as appropriate).  What do
> you think?

Sure, that's doable. There are a couple of issues to keep in mind:

I would use a perl script to build the
databases, using bogolexer and bogoutil, and just add read-only cdb support
to bogofilter proper.

Rebuilding the databases from scratch may take a long time though,
depending on the message counts and file sizes so it may not be worth
it. Please try it and see if the speed is acceptable.

-Gyepi

Gyepi SAM | 9 Jul 2003 06:27

Re: bogofilter now has tdb support

On Sun, Jul 06, 2003 at 07:12:24PM +0200, Matthias Andree wrote:
> Gyepi SAM <gyepi <at> praxis-sw.com> writes:
> > 2. Ideally, one should be able to build in both databases and choose the
> > preferred one at run time, either by specifying it at the command line
> > or based on the database filename.
> 
> I doubt anyone besides developers and maybe integration testers would
> use that. These groups can have two executables around.

OK. Fine with me. Perhaps we should use program suffixes to distinguish
between the various types then, e.g. bogofilter-bdb and bogofilter-tdb.

Speaking of which, do we really need to build the static binaries in
addition to the dynamically linked versions? Why not build only the static
version if the user specifies --enable-static? Then we can generate
binaries like bogofilter-bdb-static and bogofilter-tdb-static.
That makes it easier to build rpms too.

In any case, we should create symlinks pointing to the correct binary
which users can twiddle if desired.
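
As an illustration of the symlink idea, here is a small Python sketch
(assuming a POSIX system; `point_symlink` is a made-up helper, not part of
any bogofilter install script) that repoints a generic name at one of the
suffixed binaries:

```python
import os

def point_symlink(link_path, target):
    """Make `link_path` a symlink to `target`, replacing any existing link.

    The new link is created under a temporary name and renamed over the
    old one, so the generic name never disappears in between.
    """
    tmp = link_path + ".new"
    if os.path.lexists(tmp):
        os.unlink(tmp)
    os.symlink(target, tmp)
    os.replace(tmp, link_path)
```

For example, `point_symlink("bogofilter", "bogofilter-tdb")` run in the
install directory would switch the default binary to the tdb build.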

> > This would probably require the use of a dynamic link loader (man dlopen)
> That harms portability. dlopen isn't available everywhere, and opens a
> whole can of worms.

I thought of this and decided on a better approach, but since we're only
supporting one database at a time, we can table the whole thing.

-Gyepi

Matthias Andree | 9 Jul 2003 13:03

Re: bogofilter now has tdb support

Gyepi SAM <gyepi <at> praxis-sw.com> writes:

> Speaking of which, do we really need to build the static binaries in
> addition to the dynamic linked versions. Why not build only the static
> version if the user specifies --enable-static. Then we can generate
> binaries like: bogofilter-bdb-static and bogofilter-tdb-static.
> Makes it easier to build rpms too.

Your patch :-)

I'm not in favour of shipping these static versions at all, they are an
update nightmare for the user, and we really shouldn't concern ourselves
with the differences between distros.

> In any case, we should create symlinks pointing to the correct binary
> which users can twiddle if desired.

May be a good idea, or we might factor out the executables, so we have:

bogofilter-base (documentation, scripts, all that), requires bogofilter-progs

bogofilter-dynamic, requires bogofilter-base, provides bogofilter-progs,
                    conflicts bogofilter-static

bogofilter-static, requires bogofilter-base, provides bogofilter-progs,
                   conflicts bogofilter-dynamic

> I thought of this and decided on a better approach, but since we're only
> supporting one database at a time, we can table the whole thing.

Matthias Andree | 9 Jul 2003 15:19

Re: cdb support

Attachment (bdbtocdb.pl): application/x-perl, 3321 bytes

Matthias Andree | 9 Jul 2003 15:27

Re: cdb support

WEEE -- ezmlm-idx removed my text/plain and left the application/x-perl in.

Adrian, is there any misconfiguration on the MIME filtering stuff on
this list?

Gyepi SAM <gyepi <at> praxis-sw.com> writes:

> On Sat, Jul 05, 2003 at 05:53:28PM -0400, Greg Louis wrote:
>> I'd just started thinking a few days ago about maybe supporting cdb, at
>> least for those of us who don't use -u.  The idea would be to
>> accumulate separate spam and nonspam register_me mailboxes, and then
>> run a registration tool daily (or every six hours, or whatever) that
>> would update the counts, build a new .cdb file and mv it into place
>> (I'm using the one-list patch, but pluralize as appropriate).  What do
>> you think?
>
> Sure, that's doable. There are a couple of issues to keep in mind:
>
> I would use a perl script to build the
> databases, using bogolexer and bogoutil, and just add read-only cdb support
> to bogofilter proper.
>
> Rebuilding the databases from scratch may take a long time though,
> depending on the message counts and file sizes so it may not be worth
> it. Please try it and see if the speed is acceptable.

I have a Perl script to dump the current Berkeley .db files in a format
that cdbmake understands.
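
For readers who have not seen it, cdbmake's input is a sequence of
"+klen,dlen:key->data" records, one per line, terminated by an empty
line. The sketch below writes that format in Python; it is not the
attached Perl script, and how bogofilter's counts would be encoded in the
data part is left open here.

```python
def write_cdbmake_records(records, out):
    """Write (key, value) byte-string pairs in cdbmake's input format.

    Each record is "+klen,dlen:key->data" followed by a newline; one
    extra newline marks the end of the input.
    """
    for key, value in records:
        out.write(b"+%d,%d:" % (len(key), len(value)))
        out.write(key + b"->" + value + b"\n")
    out.write(b"\n")
```

Because the lengths are given explicitly, keys and values may contain
arbitrary bytes, including newlines and "->".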

For testing, I use a .db file (btree format, DB 4.0, goodlist.db) that
has 551,633 entries and is 29,580 kByte in size.

To avoid any disk seeks distorting the result, I prime the caches by
doing cat goodlist.db >/dev/null

It takes 20.5 s (18.0 user 0.7 sys) to dump my BerkeleyDB to cdbdump
format. The resulting file is 14,758 kByte.

It takes 4.2 s (1.0 user 0.4 sys) to run cdbmake to convert the cdbdump
format to .cdb format. Most of the wallclock time is spent in
fsync(). The .cdb file is 23,060 kByte.

It takes 1.2 s (0.9 user 0.3 sys) to run cdbdump to convert the .cdb
file back to cdbdump format.
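
Anyone wanting to repeat this kind of measurement could use something
like the Python sketch below. It is only an assumption about a convenient
method, not how the figures above were taken; `prime_cache` and
`time_command` are made-up names, and the child CPU times from
`os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
import time

def prime_cache(path, chunk=1 << 20):
    """Read the file once and discard it, like `cat file >/dev/null`."""
    with open(path, "rb") as f:
        while f.read(chunk):
            pass

def time_command(argv):
    """Run `argv` and return (wall, user, sys) seconds for the child."""
    before = os.times()
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    wall = time.perf_counter() - start
    after = os.times()
    return (wall,
            after.children_user - before.children_user,
            after.children_system - before.children_system)
```

A large gap between wall-clock time and user plus system time, as in the
cdbmake run, points at time spent waiting for I/O rather than computing.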

The missing part is how .cdb would fare in bogofilter. Someone eager to
code this?

This all happens on a Linux 2.4 box with SYM53C875 SCSI, Fujitsu MAH
drive, ext3 file system, plenty of RAM and 700 MHz Duron CPU.

Note that building the .cdb needs plenty of disk space: twice the size of
the .cdb file plus the input data. In my case, I need 60 MB
(2 * 23 + 14) to use cdb, as opposed to 29 MB with BDB.
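
Spelled out with the exact file sizes quoted earlier (a trivial Python
sketch; the function name is made up):

```python
def peak_build_space_kb(cdb_kb, dump_kb):
    """Rule of thumb from above: twice the .cdb file plus the input dump."""
    return 2 * cdb_kb + dump_kb

# 23,060 kByte .cdb file and 14,758 kByte cdbdump-format input
print(peak_build_space_kb(23_060, 14_758))  # 60878 kByte, roughly 60 MB
```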

The Perl script is attached. Documentation, the modified BSD license and
a usage example are inside the script.

NB: DJB's original cdb isn't Free or Open Source software. "You may
distribute unmodified copies of the cdb package."

However, there are two alternatives: Debian ships a freecdb (I haven't
looked at it), and Michael Tokarev has a public domain "tinycdb" that
shares the file format with DJB's cdb. I presume this is also available
on Debian Linux.

http://www.corpit.ru/ftp/tinycdb/changelog
http://www.corpit.ru/ftp/tinycdb/tinycdb-0.72.tar.gz

Perl script, gzipped and uuencoded, just in case:

begin 644 bdbtocdb.pl.gz
M'XL("`<3##\"`V)D8G1O8V1B+G!L`*U6?V_C1!#]FWR*(1=$`VX;.'Z(E#O)
MB;?M2JD=;*>]"A!R[$VSJF,'KYU2`=^=-^LD30L'0E!%Z69GY\V\M[.S^^I#
M.FU,=3K7Q:DJ-K165=[IO*)Y-J_+-)N?K',ZIH1,6NEU375):5EL5%5C;J2J
M>Y6K1_)&UI#-5\F](EVLFYH69;5*ZLXK8(W+]6.E[Y8U'8W[]/E <at> \)KFCW25
MU/52)X;<(JN4PCHWS\FN,U0IHZJ-RDXL <at> ">B<2BGL0S\(1W_QS^+&"\U <at> C29
M7JFB3JK''3_,FF:]+HW*F%*EDNP%T:3(:%WI`FOQ*2 <at> !6LN5ZB6^=BHT1:8J
M4V.Y:4E(BUZ4-4&^QU;&0B,\!VJ,<AA0_;)6*?C72T5%LE)4+NSXR"AUG\QS
MU:<LJ3GF/#&*H-Y"(PHEU5W#5!R;G[XK2BA("00MX5[M[4^YF&79Y!G-%9,T
M90%PJ)`LU#Z=>5/;V!RP#5>S:ENE2E1*4BL#M+* <at> 58,DF!OPTF52W$&_AZ7.
MG_M436%.6NVW,RP5BT)J`1%K1D/X=9ZDBO(RO=_F*]ZY5].)H."<9I& <at> H\08
MT#$T#2+YKO\O:\(BI <at> F+G:S6N3K)YO3V-%.;TZ*!8."SKC4$R1VD!TW6"F2:
M]98Q+'#7"WM2GAV30[C=N$H>Z(Q59"?:%\>WA_;=&,;]N%ZMN:XTOJH5'2_H
MKQSL(JR8R+'P(_'?C\9>G5!EVM251 <at> F`KZTIE`27NRF;"GO#,V <at> 9?'*X^(U#
M#[I>4EG9_V6# <at> T^K,M,+G5K)4)>58LE6NJX5GZ!RHS,^9,NDK;)%F>?E <at> R[N
M^&1DFIV,=5JI>FB3^N1%6H8/QS:?M,Q46X25JA/DR9#)O-RP:==]4)\Z16%S
M20(O!Q9#',8KLA?)(&":)V <at> 3U<E[DD"P`R5V28! <at> UJ3J[_+ <at> 6N>C\"_SH"V[
MK$SMD=Z5)'Q.H7][WM&.5*63W#PI;3?(.AZDOR/E*VW]7O:=%TT:J3^ML?HS
M`23>`I85Z\J'!FV <at> V?9056285UP;R&55UHI:<=#FT"(UNCPM8-AVBG)1/_"V
M/]41#F#*A<1MEPNLXA(JVF(R9D\BOI011<%Y?..& <at> C">AL&U](1'HUL8!8V#
MZ6TH+RYCN <at> PFG <at>  <at> C<GT/LWX<RM$L#C#1=2-X=ODN <at> LGU;]%YIJ&(( <at> I"DFA!
M$G#`#UT_EB)R2/KCR<R3_H5# <at> "`_B'$:KV2,97' <at> <%CNMG]RY$9V)<+Q)7ZZ
M(SF1\:V->"YCGZ.=(YQ+4S>,Y7 <at> V<4.:SD+T. <at> 9C<IZ,QA-77 <at> GO!!D <at> *HEK
MX<<47;J3R0NNP8TO0D[_&=&1L'W#':&IVF" <at> ZLE0C&/F]#0:0T"D.'$HFHJQ
MY(%X)\#'#6\=H/+M'J#W?#?#,IC)<Z_<"Q`\^ <at> =EL#GC62BN..W <at> '##1;!3%
M,I[% <at> BZ"P+.*1R*\1F^+SF <at> 21-&V^SN($;L<G$& <at> &,P8CV:19.U8;S\683BS
M[X4^-OL&ZH"_"V?/RASXEC"$"L);AF4E["XX=',I,(_=]EMN<>BR%!&T&\>'
M"Q$34L8'3,D7%Q-Y(?RQ8&O`.#<R$GV[;3+B);(-?>,B[HRIV^WB2\T.#TK8
ML9M*\IQ<[UIRZNUB <at> *$4(KDM&TQ&L_'E5OCM]<Z7)4OK_ <at> ]/)HO8D]GP\+)S
M-C0X>6W?<J>#KT\'W]!GKX>#+X>#`:G5*B'QRYIZ\.1;0P;#X26Z4Z[.[&]O
M]-.YYA^=3"OJXO+G;M+%BRE7AGO"0A?H"3TWO+C^?O`CUO4\>D,U%G^T3,S2
MH8^W$!\[^U4.#?#YZHLO,`7K*`Z%Z.#&Y;9CPZ1)P>\31D%3VKD-J?=A%Q'0
MNNFHA^=:W1C$ZGG';XWZ^:AWKQX!N$GR!O=&^-.Y#*.X?V:!:;_\#0WHY=Q[
M(7SQ+N[W.[_:]>U#LOMIUZ%<%7?UTB[O.]1U#J:L+T\.V7K\MIWXH>AV/GC!
MCGNCY8?.B4?XEMWOG0[>*WMZW]* <at> 3[\^\\.E66FU:=]ZK1?!JZ?+-X5Z>+:#
MAVYX$16TM(;]]L'G^.TB8]/1`GM4E$=1[*'6F<!#M[_;]BW$(F_,TD;\T_Y;
J(#8_#[HPCT7Z? <at> ^V(D93[.OEK-/[#1ORV5EG*S>4.^O\`4[0#7OY#```
`
end

-- 
Matthias Andree

Greg Louis | 9 Jul 2003 17:03

Re: cdb support

On 20030709 (Wed) at 1527:23 +0200, Matthias Andree wrote:

> The missing part is how .cdb would fare in bogofilter. Someone eager to
> code this?

Wouldn't put it quite like that, but I was planning to do it anyway :)

> NB: DJB's original cdb isn't Free or Open Source software. "You may
> distribute unmodified copies of the cdb package."

Yeah, I know.  However, he also suggests that one incorporate code into
one's own app rather than using a library, and implies permission to do
so.  Philip Hazel did that in Exim, with code provided by Nigel
Metheringham, which code contains:

     Copyright (c) 1998 Nigel Metheringham, Planet Online Ltd

     This program is free software; you can redistribute it and/or modify it
     under the terms of the GNU General Public License as published by the
     Free Software Foundation; either version 2 of the License, or (at your
     option) any later version.

     This code implements Dan Bernstein's Constant DataBase (cdb) spec.
     Information, the spec and sample code for cdb can be obtained from
     http://www.pobox.com/~djb/cdb.html. This implementation borrows some code
     from Dan Bernstein's implementation (which has no license restrictions
     applied to it).

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis <at> consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |

Gyepi SAM | 9 Jul 2003 17:41

Re: cdb support

On Wed, Jul 09, 2003 at 03:27:23PM +0200, Matthias Andree wrote:
> > Rebuilding the databases from scratch may take a long time though,
> > depending on the message counts and file sizes so it may not be worth
> > it. Please try it and see if the speed is acceptable.
> 
> I have a Perl script to dump the current Berkeley .db files in a format
> that cdbmake understands.

This implies that bogofilter-cdb requires bogofilter-bdb.

Matthias Andree | 9 Jul 2003 18:18

Re: cdb support

Greg Louis <glouis <at> dynamicro.on.ca> writes:

[Nigel Metheringham's CDB implementation]
>      This code implements Dan Bernstein's Constant DataBase (cdb) spec.
>      Information, the spec and sample code for cdb can be obtained from
>      http://www.pobox.com/~djb/cdb.html. This implementation borrows some code
>      from Dan Bernstein's implementation (which has no license restrictions
>      applied to it).

If the stuff includes modified DJB code, I cannot distribute it, and I
doubt that David could either. I don't want to go into any of those "you
can ship the verbatim stuff and a patch" hassles if there is Michael
Tokarev's public domain implementation that does not borrow code.

We have a license to ship verbatim tarballs, not derivative works. If
there is evidence that DJB put cdb (or cdb-0.75, specifically) into the
public domain or allowed distribution in part, I'd like to see that.

  "Information for distributors

   You may distribute unmodified copies of the cdb package.

   Packages that need to read cdb files should incorporate the necessary
   portions of the cdb library rather than relying on an external cdb
   library." (Daniel J. Bernstein, in http://cr.yp.to/cdb.html)

No license = no permission to ship it; §106 UrhG (German Copyright Act)
threatens up to three years in prison or a fine. The attempt is culpable.

It may look like nitpicking to deny that "you should incorporate"
implies a license, but nitpicking is the lawyer's job. >:-)

-- 
Matthias Andree
