Matthias Andree | 2 May 00:59 2004

bogofilter hang

Hi,

I just had my machine rebooted by the watchdog (must have been a kernel
bug, it was an experimental kernel), which does the equivalent of reboot
-f, no chance for any processes to survive or finish their work.

This evidently didn't leave bogofilter a chance to have Berkeley DB
release its internal locks on the database, so bogofilter hung the next
time it was run after the reboot, waiting for an orphaned lock that no
process could free. Stopping the mail system (Postfix), then killing all
bogofilter processes and running db_recover rectified the problem.

I wonder if any of this should be automated, i.e. bogofilter should
time out after, say, 5 minutes without progress.

Opinions are welcome, but I don't have the time to write, let alone test
and debug, serious amounts of code in the near future.

Cheers,

-- 
Matthias Andree

Encrypted mail welcome: my GnuPG key ID is 0x052E7D95

Clint Adams | 5 May 15:35 2004

[bugreports <at> nn7.de: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]

I can't reproduce this, though I didn't try very hard.

----- Forwarded message from Soeren Sonnenburg <bugreports <at> nn7.de> -----

Date: Wed, 05 May 2004 07:50:52 +0200
From: Soeren Sonnenburg <bugreports <at> nn7.de>
To: Debian Bug Tracking System <submit <at> bugs.debian.org>
Subject: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.

Package: bogofilter
Version: 0.17.5-1
Severity: important
Tags: patch

bogofilter prints the message "Invalid buffer size, exiting." then segfaults on my daily:
( find spam/cur -type f -a -print0 | xargs -0 cat ) | bogofilter -s -d /tmp/.bogofilter_new

gdb analysis is:

Invalid buffer size, exiting.

Program received signal SIGABRT, Aborted.
0x4011b571 in kill () from /lib/libc.so.6
(gdb) bt
#0  0x4011b571 in kill () from /lib/libc.so.6
#1  0x4011b315 in raise () from /lib/libc.so.6
#2  0x4011c838 in abort () from /lib/libc.so.6
#3  0x0804ddbe in xfgetsl (buf=0x832a1bf "\n", max_size=0, in=0x807cae0, no_nul_terminate=1) at fgetsl.c:32
#4  0x0804c517 in buff_fgetsl (self=0xbffff8c0, in=0x0) at buff.c:63
#5  0x0804bd81 in mailbox_getline (buff=0xbffff8c0) at bogoreader.c:452

Clint Adams | 5 May 17:07 2004

[bugreports <at> nn7.de: Re: [bugreports <at> nn7.de: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]]

I'm guessing that this got eaten by ezmlm as usual.

----- Forwarded message from Soeren Sonnenburg <bugreports <at> nn7.de> -----

Date: Wed, 05 May 2004 15:41:27 +0200
From: Soeren Sonnenburg <bugreports <at> nn7.de>
To: Clint Adams <schizo <at> debian.org>
Cc: bogofilter-dev <at> aotto.com, 247434-forwarded <at> bugs.debian.org
Subject: Re: [bugreports <at> nn7.de: Bug#247434: bogofilter segfaults with
	Invalid buffer size, exiting.]

On Wed, 2004-05-05 at 15:35, Clint Adams wrote:
> I can't reproduce this, though I didn't try very hard.

Actually I can, but I don't know what causes it... it must be some
very malformed spam mail...

In any case it is a good idea to fix the bail-out on the zero read
condition. I hope you agree with me on that... however, I am not 100%
certain that the patch below won't do any harm...

Regards,
Soeren.

> ----- Forwarded message from Soeren Sonnenburg <bugreports <at> nn7.de> -----
> 
> Date: Wed, 05 May 2004 07:50:52 +0200
> From: Soeren Sonnenburg <bugreports <at> nn7.de>
> To: Debian Bug Tracking System <submit <at> bugs.debian.org>
> Subject: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.

David Relson | 5 May 18:53 2004

Re: [bugreports <at> nn7.de: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]

Clint,

"Invalid buffer size" is a message generated by the flex code.  In the
past I've seen it when the incoming message is b0rked in some way.  To
fully evaluate the patch, I need a copy of the original message. 
Sending it in gzipped would be good.

'Til then, I'll give the patch a quick test to ensure it does no
harm.

David

Matthias Andree | 5 May 23:16 2004

Re: [bugreports <at> nn7.de: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]

Clint Adams schrieb am 2004-05-05:

> #2  0x4011c838 in abort () from /lib/libc.so.6
> #3  0x0804ddbe in xfgetsl (buf=0x832a1bf "\n", max_size=0, in=0x807cae0, no_nul_terminate=1) at fgetsl.c:32

Ah, one of my abort()-sentinels triggered. max_size=0 looks evil.

fgetsl.c:

    21  int xfgetsl(char *buf, int max_size, FILE *in, int no_nul_terminate)
    22  {
    23      int c = 0;
    24      char *cp = buf;
    25      char *end = buf + max_size;                         /* Physical end of buffer */
    26      char *fin = end - (no_nul_terminate ? 0 : 1);       /* Last available byte    */
    27
    28      if (cp >= fin) {
    29          fprintf(stderr, "Invalid buffer size, exiting.\n");
    30          abort();
    31      }

> a patch that checks for this zero read condition fixes the
> "Invalid buffer size, exiting."

I think there is something wrong if that function tries a zero read, and
I wonder if covering the problem with such a patch as you suggest will
bring us another problem, of a potentially unterminated loop.
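
To make the failing condition concrete, here is a small sketch that
reproduces only the pointer arithmetic from the listing above.
usable_bytes is a hypothetical helper, not part of fgetsl.c.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the check in the listing:
 *   end = buf + max_size
 *   fin = end - (no_nul_terminate ? 0 : 1)
 * and the sentinel fires when buf >= fin.
 * Returns how many bytes could be stored; 0 means "would abort". */
static size_t usable_bytes(int max_size, int no_nul_terminate)
{
    int reserve = no_nul_terminate ? 0 : 1;   /* room for the trailing NUL */

    if (max_size <= reserve)
        return 0;                             /* cp >= fin in the original */
    return (size_t)(max_size - reserve);
}
```

The values from the backtrace (max_size=0, no_nul_terminate=1) give 0.
A caller that sees 0 here has a problem in its own buffer bookkeeping,
and simply retrying the read is what could produce the endless loop
mentioned above.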

-- 
Matthias Andree

David Relson | 5 May 23:18 2004

Re: [bugreports <at> nn7.de: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]

On Wed, 5 May 2004 23:16:06 +0200
Matthias Andree wrote:

> Clint Adams schrieb am 2004-05-05:
> 
> > #2  0x4011c838 in abort () from /lib/libc.so.6
> > #3  0x0804ddbe in xfgetsl (buf=0x832a1bf "\n", max_size=0, in=0x807cae0, no_nul_terminate=1) at fgetsl.c:32
> 
> Ah, one of my abort()-sentinels triggered. max_size=0 looks evil.
> 
> fgetsl.c:
> 
>     21  int xfgetsl(char *buf, int max_size, FILE *in, int no_nul_terminate)
>     22  {
>     23      int c = 0;
>     24      char *cp = buf;
>     25      char *end = buf + max_size;                         /* Physical end of buffer */
>     26      char *fin = end - (no_nul_terminate ? 0 : 1);       /* Last available byte    */
>     27
>     28      if (cp >= fin) {
>     29          fprintf(stderr, "Invalid buffer size, exiting.\n");
>     30          abort();
>     31      }
> 
> > a patch that checks for this zero read condition fixes the
> > "Invalid buffer size, exiting."
> 
> I think there is something wrong if that function tries a zero read,
> and I wonder if covering the problem with such a patch as you suggest

Matthias Andree | 5 May 23:43 2004

Re: [bugreports <at> nn7.de: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]

David Relson schrieb am 2004-05-05:

> As such, it doesn't add much to bogofilter.  I've requested a copy of
> the original message so that I can dig into what's actually going wrong.

I wonder if a line size exactly hits the (remaining at that time?)
buffer size and some part of the code tries to "read more" into an
exhausted buffer space.

-- 
Matthias Andree

Encrypted mail welcome: my GnuPG key ID is 0x052E7D95

David Relson | 6 May 13:35 2004

Re: [bugreports <at> nn7.de: Bug#247434: bogofilter segfaults with Invalid buffer size, exiting.]

Hello Soeren,

To summarize your finding:  bogofilter is fine with individual messages
and has a problem with mailboxes.

The situation is, I believe, that you have two messages "A" and "B".
Scored separately all is fine.  However if they're together in a
mailbox, i.e. "AB", bogofilter segfaults.  Probably what is happening is
that "A" is a mime multipart message with a block of encoded text (qp,
base64, or uuencoded) and the message isn't properly terminated.  This
causes bogofilter's mailbox processing code to be "confused" and results
in the segfault you've seen.  If possible, I'd like to have copies of
messages "A" and "B", and I'll give you a couple of fairly easy ways to
isolate them.

Run "cat foo2 | bogofilter -v", which will print an X-Bogosity line for
each message scored.  Counting the lines will give you the number "N" of
successfully read/scored messages.  Messages "A" and "B" are probably
messages "N+1" and "N+2", though they may be a bit earlier, i.e. "N", or
later, i.e. "N+2" or "N+3".  Extract those messages into their own
mailbox "AB" and run "cat AB | bogofilter -v" which should segfault.  If
it doesn't, try a slightly larger group of messages.  

The second method is a binary search.  Split foo2 into halves and run
bogofilter for each half (to find which has the problem).  Repeat the
split/run sequence until you can isolate "A" and "B".  I've successfully
used the "split" command, as in "split -l 100000", to divide a large
mailbox into chunks for this kind of search.  The splitting technique
is admittedly crude and somewhat complicated because the problem message
may be broken up during the splitting process.
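
Both methods need to cut a mailbox without breaking a message in two.
Here is a rough sketch of a helper for that, assuming the traditional
mbox convention that each message starts at a line beginning with
"From " (bogofilter's own mailbox detection may differ).
extract_messages is a hypothetical name, not part of bogofilter.

```c
#include <stdio.h>

/* Copy messages first..last (1-based, inclusive) from `in` to `out`.
 * A message is assumed to start at a line beginning with "From ".
 * Works byte by byte, so NUL bytes in malformed mail are preserved.
 * Returns the total number of messages found in `in`. */
static long extract_messages(FILE *in, FILE *out, long first, long last)
{
    static const char sep[] = "From ";
    const size_t sep_len = sizeof sep - 1;
    long count = 0;          /* messages seen so far */
    size_t held = 0;         /* separator bytes matched but not yet written */
    int at_line_start = 1;
    int c;

    while ((c = getc(in)) != EOF) {
        int is_sep_byte = at_line_start && c == sep[held];

        if (is_sep_byte && held + 1 < sep_len) {
            held++;          /* hold back until we know what the line is */
            continue;
        }
        if (is_sep_byte)
            count++;         /* complete "From ": a new message starts */

        if (count >= first && count <= last) {
            fwrite(sep, 1, held, out);   /* release held-back bytes */
            putc(c, out);
        }
        held = 0;
        at_line_start = (c == '\n');
    }
    if (held > 0 && count >= first && count <= last)
        fwrite(sep, 1, held, out);       /* input ended mid-prefix */
    return count;
}
```

Calling it with first = N and last = N+3 would produce the small "AB"
mailbox for the first method; calling it with each half of the message
range gives the halves for the binary search.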

David Relson | 10 May 03:36 2004

Bogofilter-0.90.0 - New Current Release

Greetings,

Bogofilter-0.90.0 has been released.  It provides minor code fixes and
clean-ups and documentation updates.  Robinson's Effective Size Factor
(ESF) is included in bogofilter and bogotune.  

ESF is described in Gary Robinson's "Rants" and in his article "Handling
Redundancy in Email Token Probabilities".  They can be found at
http://www.garyrobinson.net/2004/04/improved_chi.html and
http://garyrob.blogs.com//handlingtokenredundancy94.pdf

Current plans are for releases 0.91 and 0.92 to provide BerkeleyDB
Transactions (for database security and integrity) and use libtool for
determining proper pathing for BerkeleyDB's libraries.  

After that comes 1.0rc1 !!

The files are available at http://sourceforge.net/projects/bogofilter
for download.

Here are the md5sums for the release:

f656425233113f7f2541e4205dc07e74  bogofilter-0.90.0-1.i586.rpm
d575a4d89471a4975cd2280d0de61f8e  bogofilter-0.90.0-1.src.rpm
2e1b9f9c9e7569344391b4e297b2b1ad  bogofilter-0.90.0.tar.bz2
fa643169dd298a842dabd43b8ea5c4a1  bogofilter-0.90.0.tar.gz
a1659e7317a3951ced1a4096f579d279  bogofilter-static-0.90.0-1.i586.rpm

Also of note, the mailing lists are now hosted at bogofilter.org.  Many
thanks to Adrian Otto who has hosted the lists and been the postmaster

Ihunda | 12 May 18:15 2004

What about html_reorder ?

Hi all,

  I am working on getting the most out of the bogofilter lexer
speed-wise. The less malloc, the less parsing, the better :)
  That's why this line confuses me:

  <HTML>{TOKEN_12}({HTMLTOKEN})+/{NOTWHITESPACE}    { html_reorder(); return TOKEN;}

And then there is the code behind html_reorder, which mallocs, swaps
memory and calls yyunput.
I do all the MIME parsing before sending each decoded MIME part to the
lexer, setting the initial state myself. For example, for an HTML part,
the lexer is called with initial state HTML and the buffer is the HTML
part itself.

That sometimes gives me a nice bug not seen otherwise:

  flex scanner push-back overflow

which means that the unput went too far and stepped outside of the
buffer. That didn't raise an error before (when the parser handled the
whole email) because there was some data before the HTML part, but that
doesn't mean the bug didn't exist, it just didn't crash :).
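
To illustrate why data in front of the HTML part hides the problem,
here is a deliberately simplified model of a scanner buffer. It is not
flex's yyunput, which also tries to shift the buffer contents before
giving up; scan_buf and scan_unput are hypothetical names.

```c
#include <stddef.h>

/* Simplified model: bytes before `pos` are already consumed, so pushing
 * a character back overwrites the byte at pos - 1. */
struct scan_buf {
    char  *data;
    size_t pos;     /* index of the next byte to be scanned */
};

/* Hypothetical helper: push `c` back in front of the cursor.
 * Returns 0 on success, -1 if no consumed space is left. */
static int scan_unput(struct scan_buf *sb, char c)
{
    if (sb->pos == 0)
        return -1;              /* would write before the buffer start */
    sb->data[--sb->pos] = c;
    return 0;
}
```

When the buffer holds the whole email, pos is well past 0 by the time
the HTML part is reached, so push-backs succeed. When the buffer is the
HTML part alone, a push-back longer than the text consumed so far runs
out of room.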

To solve two problems at once (yeah, I am that kind of person), what
about getting rid of this whole html_reorder thing?
