David Relson | 1 Mar 2003 02:20

buff_t cleanup

Matthias,

I did a bit of cleaning up of the buff_t code.

Since struct field 'pos' is never used, I removed it.

The t.leng field shows how much data is actually _in_ the buffer and any 
additional data added to the buffer is at the end of the buffer's 
data.  Since this is so, buff_fgetsl() now uses buff->t.leng when it calls 
xfgetsl() to read more text.  This change also allowed removing any code 
(outside of buff.c) that set buff->read.

However, bogofilter does need to know about the most recently read line (to 
save it for passthrough and to print it for debugging), so buff_fgetsl() 
saves the start position of the line in buff->read.

These changes put more of the "bookkeeping code" for buff_t in buff.c and 
simplify the use of the struct.  This should help maintain the code.
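
The bookkeeping described above can be sketched roughly as follows. This is
only an illustration, not the code in buff.c: the sketch_* type and function
names are made up, the struct layout is assumed from the field names
mentioned in this message (t.leng, read), and plain fgets() stands in for
xfgetsl().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for bogofilter's types. Only the field names
 * t.leng and read come from the message; the rest is assumed. */
typedef struct {
    size_t leng;   /* bytes of valid data currently in text[] */
    char  *text;   /* start of the buffer storage */
} sketch_word_t;

typedef struct {
    sketch_word_t t;
    size_t read;   /* offset at which the most recently read line starts */
    size_t size;   /* total capacity of t.text */
} sketch_buff_t;

/* Append one line from fp after the data already in the buffer.
 * Returns the number of bytes added, or -1 on EOF, error, or full buffer.
 * fgets() is a stand-in for xfgetsl() and, unlike it, assumes the
 * input contains no NUL bytes. */
static int sketch_buff_fgetsl(sketch_buff_t *buff, FILE *fp)
{
    assert(buff != NULL && fp != NULL);
    assert(buff->t.leng <= buff->size);

    size_t start = buff->t.leng;          /* new data goes at the end */
    if (buff->size - start < 2)
        return -1;                        /* no room for data plus NUL */
    if (fgets(buff->t.text + start, (int)(buff->size - start), fp) == NULL)
        return -1;

    size_t added = strlen(buff->t.text + start);
    buff->read = start;                   /* remember where this line began */
    buff->t.leng = start + added;         /* buffer now holds more data */
    return (int)added;
}
```

With this arrangement a caller never sets buff->read itself; it only calls
the read function and, if it needs the latest line, looks at the bytes from
offset read up to t.leng.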

David

---------------------------------------------------------------------
FAQ: http://bogofilter.sourceforge.net/bogofilter-faq.html
To unsubscribe, e-mail: bogofilter-dev-unsubscribe <at> aotto.com
For summary digest subscription: bogofilter-dev-digest-subscribe <at> aotto.com
For more commands, e-mail: bogofilter-dev-help <at> aotto.com

Matthias Andree | 1 Mar 2003 02:52

Re: buff_t cleanup

On Fri, 28 Feb 2003, David Relson wrote:

> However, bogofilter does need to know about the most recently read line (to 
> save it for passthrough and to print it for debugging), so buff_fgetsl() 
> saves the start position of the line in buff->read.

OK. How do I use that buff->read, more precisely: when is buff->read
cleared (reset to 0)?

> These changes put more of the "bookkeeping code" for buff_t in buff.c and 
> simplify the use of the struct.  This should help maintain the code.

Thanks.

-- 
Matthias Andree


David Relson | 1 Mar 2003 03:07

Re: buff_t cleanup

At 08:52 PM 2/28/03, Matthias Andree wrote:

>On Fri, 28 Feb 2003, David Relson wrote:
>
> > However, bogofilter does need to know about the most recently read line
> > (to save it for passthrough and to print it for debugging), so buff_fgetsl()
> > saves the start position of the line in buff->read.
>
>OK. How do I use that buff->read, more precisely: when is buff->read
>cleared (reset to 0)?
>
> > These changes put more of the "bookkeeping code" for buff_t in buff.c and
> > simplify the use of the struct.  This should help maintain the code.
>
>Thanks.

Matthias,

It's set internally within buff.c when data is read into the buffer and is 
used for identifying the contents of the most recently read line.  The 
two uses in the code are when creating a textblock_t (for passthrough mode) 
and when displaying the line (for debugging purposes).
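
As a rough illustration of those two uses, a consumer could locate the most
recent line like this. The sketch assumes that the line occupies the bytes
from offset read up to t.leng, which is my reading of the description rather
than something taken from buff.c; the sketch_* names and the struct layout
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed, simplified layout; only t.leng and read are named in the thread. */
typedef struct { size_t leng; char *text; } sketch_word_t;
typedef struct { sketch_word_t t; size_t read; size_t size; } sketch_buff_t;

/* Return a pointer to the most recently read line and store its length
 * in *len. The line is not necessarily NUL-terminated. */
static const char *sketch_last_line(const sketch_buff_t *buff, size_t *len)
{
    assert(buff != NULL && len != NULL);
    assert(buff->read <= buff->t.leng);
    *len = buff->t.leng - buff->read;
    return buff->t.text + buff->read;
}

/* Debugging use: print only the latest line, using an explicit length
 * because the data may not end in a NUL byte. */
static void sketch_print_last_line(const sketch_buff_t *buff, FILE *out)
{
    size_t len;
    const char *line = sketch_last_line(buff, &len);
    fprintf(out, "%.*s", (int)len, line);
}
```

For a freshly created buffer read is zero, so the "latest line" is simply
the whole buffer; it only differs once a second line has been appended.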

A buff_t is created when flex calls yyinput() to ask for data.  At that 
point buff->read is zero because there's no previous data in the 
buffer.  It becomes non-zero only when data is added to a non-empty buff_t, 
which presently happens during the processing of long (multiline) html 
comments.  When the lexer rules start handling html comments, the C code 
for killing them will probably no longer be necessary.  At that time, 
(Continue reading)

Matthias Andree | 1 Mar 2003 03:32

Reversion to former xato?.c code.

Hi,

I've removed the string-processing additions from the xato?.c files, so
the xato?() functions will ONLY succeed when the full string is
parseable. Killing trailing whitespace, trailing "f" characters, or
comments doesn't belong in these functions; they are for conversion
only. Allowing trailing whitespace actually made these functions
accept the null string, which is not what we want, and special-casing
that is inefficient and unnecessary.
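
For illustration, a conversion-only function in this spirit might look like
the sketch below. It is not the code from the xato?.c files; the name
strict_atod and its signature are made up for this example, and it relies
only on the standard strtod().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Convert the whole string to a double.
 * Succeeds only if strtod() consumes every character; the empty string,
 * trailing whitespace, a trailing "f", and trailing comments all fail.
 * Note: strtod() itself skips leading whitespace, so that is accepted. */
static bool strict_atod(const char *in, double *out)
{
    char *end;

    assert(in != NULL && out != NULL);
    errno = 0;
    double value = strtod(in, &end);

    if (end == in)          /* nothing converted: empty or non-numeric */
        return false;
    if (*end != '\0')       /* something is left over after the number */
        return false;
    if (errno == ERANGE)    /* value out of range for a double */
        return false;

    *out = value;
    return true;
}
```

Because nothing is trimmed here, the empty string falls out naturally as a
failure (end == in) and needs no special case.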

The functionality we're looking for is now provided for by configfile.c.

This means the user will have to:

1. remove end-of-line comments (until the end-of-line starts in the
   leftmost column ;-)

2. remove trailing "f" characters after floats. They are misleading
   anyway because a) the floating point nature is clearly visible from
   the dot (or exponent), b) a trailing "f" in a C numeric constant means
   "float" as opposed to our expected format "double".

Along with the most recent format.c fix, the current configfile.c and
xato?.c files parse bogofilter.cf.example.

-- 
Matthias Andree


Matthias Andree | 1 Mar 2003 16:11

flex/bison config file parser

Hi,

I have some stuff to let bison and flex build our parser; all that's
left for us is to integrate it. Do we want to go that way and depend on bison
at packaging time and ship the ready-made parser_config.tab.[hc] just as
we ship ready-made parsers' C code (the lexer*.c stuff)?

Do we want to keep our current ad-hoc parser?

If you want to try that code, you'll need bison or yacc in addition to
the tools you usually need to build bogofilter from CVS.

To compile:

flex lexer_config.l

bison -d parser_config.y
# alternative: yacc -d -b parser_config parser_config.y

gcc -g -O -DYYDEBUG=1 -o parser_config \
    parser_config.tab.c lex.lexer_config_.c -I. xmalloc.o xmem_error.o

To try:

./parser_config <../bogofilter.cf.example

-- 
Matthias Andree

David Relson | 1 Mar 2003 16:36

Re: flex/bison config file parser

At 10:11 AM 3/1/03, Matthias Andree wrote:
>Hi,
>
>I have some stuff to let bison and flex build our parser; all that's
>left for us is to integrate it. Do we want to go that way and depend on bison
>at packaging time and ship the ready-made parser_config.tab.[hc] just as
>we ship ready-made parsers' C code (the lexer*.c stuff)?
>
>Do we want to keep our current ad-hoc parser?

It's been years since I used yacc.  In the early 90's I worked for a 
company writing CASE (Computer Aided Software Engineering) tools and used 
yacc in a project.  A few years later I wrote some bookkeeping software for 
charitable organizations and used yacc for parsing their bif (Bingo 
Information File) files, which contained the info on what they sold, what 
games they played, and their payouts.

I'm inclined to apply KISS at this time and vote for our simple ad-hoc 
parser.  If you don't mind, I'll add a function to trim comments from the 
end of the line before passing values to xatox().
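
Such a trimming function could be as small as the following sketch. The name
trim_comment is hypothetical, and the sketch assumes that '#' always starts
a comment, i.e. that it never appears inside a legitimate value.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Cut the line at the first '#' and then strip trailing whitespace,
 * modifying the string in place. Assumes '#' never occurs in a value. */
static void trim_comment(char *line)
{
    assert(line != NULL);

    char *hash = strchr(line, '#');
    if (hash != NULL)
        *hash = '\0';                 /* drop the comment itself */

    size_t len = strlen(line);
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        line[--len] = '\0';           /* drop whitespace before it */
}
```

After trimming, a value such as "0.22 # comment" becomes "0.22" and can be
handed to a strict conversion function, while "0.66x" is left untouched and
still fails to convert.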

>If you want to try that code, you'll need bison or yacc in addition to
>the tools you usually need to build bogofilter from CVS.
>
>To compile:
>
>flex lexer_config.l
>
>bison -d parser_config.y
># alternative: yacc -d -b parser_config parser_config.y
(Continue reading)

David Relson | 1 Mar 2003 16:42

memory usage during registration

Matthias,

I usually have a copy of top running in a window and have noticed that 
during registration of a large mbox file (tens of MB), the SIZE of 
bogofilter gets quite large (on the order of 10 times the mailbox 
size).  At the end of the run, the system recovers all the space.  As 
bogofilter's normal usage is to classify messages, not to register large 
mailboxes, I'm not sure whether we should worry about this or not.  What do 
you think?

David


David Relson | 1 Mar 2003 20:02

config file error message

Matthias,

I've modified the error reporting in config file handling.

Consider the following, simple config file:

$ cat test.cf
ham_cutoff=0.22 # comment
spam_cutoff'...'0.66x'

Here's the output of running the current code:

[relson@osage src]$ ~/cvs/src/bogofilter -c test.cf < /dev/null
cannot parse double value '0.22 # comment'
cannot parse double value '0.66x'

And the output of the new code:

[relson@osage src]$ ~/cvs/src/bogofilter -c test.cf < /dev/null
test.cf:2:  Error - bad parameter in 'spam_cutoff...0.66x'

FWIW, if bogofilter finds an error, it will always print a message.
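
A message in the "file:line:" style shown above could be built along these
lines. This is only a sketch: the function name and signature are invented
for illustration and are not taken from the config file code, and the file
name and option in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "<file>:<line>:  Error - bad parameter in '<text>'" into out.
 * A trailing newline in the offending line (as left by fgets) is not
 * copied. Returns the value of snprintf(). */
static int sketch_format_config_error(char *out, size_t outsize,
                                      const char *filename, int lineno,
                                      const char *line)
{
    assert(out != NULL && filename != NULL && line != NULL);

    int len = (int)strcspn(line, "\n");   /* length without the newline */
    return snprintf(out, outsize,
                    "%s:%d:  Error - bad parameter in '%.*s'",
                    filename, lineno, len, line);
}
```

Called with "example.cf", line 2 and the text "some_option=1.5x\n", this
yields "example.cf:2:  Error - bad parameter in 'some_option=1.5x'", which
tells the user where to look instead of only which value failed.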

David


Matthias Andree | 4 Mar 2003 01:04

FreeBSD sparc64-5-latest issue with bogofilter 0.10.3.1

Hi,

bogofilter hasn't passed "make check" on FreeBSD 5-LATEST on sparc64:

http://bento.freebsd.org/errorlogs/sparc64-5-latest/bogofilter-0.10.3.1.log

| Making check in bogofilter
| make  check-TESTS
| PASS: t.lexer.mbx
> FAIL: t.robx
| SKIP: t.valgrind
| PASS: t.split
| PASS: t.systest
| PASS: t.grftest
| ======================
| 1 of 5 tests failed
| (1 tests were not run)
| ======================
| *** Error code 1

I don't know the reason yet, I'll try to get some diagnosis code in and
I'll also try to break Solaris-8 with sparc64 code (SunPro for now).

4-STABLE (i.e. 4.8-RC1) for i386 is fine.

The other issue is that the FreeBSD ports tree will be frozen tomorrow
(March 5, 2100 PST, March 6 0600 GMT IIRC), so if we want a new
bogofilter version (newer than 0.10.3.1) in, we should release a new
version and send the ports diff right now (if that is to happen, Cc: me
privately so I see that in time!). Fixes may still be accepted later.

David Relson | 4 Mar 2003 02:05

Re: FreeBSD sparc64-5-latest issue with bogofilter 0.10.3.1

Matthias,

Like many of the other regression tests, t.robx can be run with a "-v" 
(verbose) flag.  Using "-v" saves all the generated files in 
"robx.YYYYMMDD".  If you can run the test and send me a tarball of the 
output, I can likely determine the cause of the failure.  From past 
experience with t.robx I've learned that most failures are due to slight 
differences in parsing, which leads to differences in token spamicity 
values.  A different value for even 1 token will cause a different robx 
value to be computed.  In turn, that will cause the test to fail.
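
To see why a single token matters, here is a deliberately simplified model.
It assumes robx behaves like an arithmetic mean of per-token spamicity
values; that is an assumption made for illustration only and is not a
description of how bogofilter actually computes robx. The function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model only: average the per-token spamicity values. */
static double sketch_mean_spamicity(const double *p, size_t n)
{
    assert(p != NULL && n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p[i];
    return sum / (double)n;
}
```

Under this model, changing one value in the input array changes the
average, so a test that compares the computed result against a stored
reference would fail even though every other token agrees.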

To summarize, I need the output files (most especially spam.2 and good.2) 
to determine what's wrong.

On the subject of a new build, I think we're about ready for 0.11.  I was 
going to give you and Nick a chance to get in a last set of changes and 
build later this week.  However, I can do a build now (tonight) or 
tomorrow.  Seems like if I do it by noon EST, i.e. 17:00 GMT, we can 
make the FreeBSD cutoff date.

David
