Anne Wilson | 21 May 15:45

Bogotrainer lost

For a very long time I've been training bogofilter with the command

bash /usr/share/bogofilter/contrib/contrib/trainbogo.sh -c \
  -H /home/anne/Maildir/.INBOX.Bogotrain_ham/cur/ \
  -S /home/anne/Maildir/.INBOX.Bogotrain_spam/cur

However, since I installed Scientific Linux 6 I find that I don't appear to have 
that file.  I'm running bogofilter-1.0.2-6.el6.x86_64.

I've got quite a store of training emails now.  Could you please help me get 
back on track?

Thanks

Anne
_______________________________________________
Bogofilter mailing list
Bogofilter <at> bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter

Jonathan Kamens | 1 Nov 03:41

New version of bogofilter-milter.pl

For those of you who use bogofilter-milter.pl, the Milter implementation 
of bogofilter filtering I wrote, there is a new version available at 
http://stuff.mit.edu/~jik/software/bogofilter-milter/bogofilter-milter.pl.txt 
Please replace the version you're using with it. (David, please take 
this new version for the contrib directory in the bogofilter 
distribution.)

This fixes a bug in the handling of incoming messages larger than one 
million bytes (or whatever value you have set $MAX_INCORE_MSG_LENGTH to 
in the script). In particular, the bug caused messages at least this 
large to be "chunked" into blocks of that many bytes, with only the last 
chunk preserved to be fed into bogofilter and/or archived in 
$archive_mbox or $ham_archive_mbox.
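
To make this class of bug concrete, here is a small hypothetical Python 
illustration. It is not the code from bogofilter-milter.pl (which is 
Perl and is not shown here); the names collect_buggy and collect_fixed 
and the tiny limit of 10 bytes are invented for the example. It only 
shows how overwriting a buffer per chunk, instead of appending to it, 
leaves just the final chunk of a large message.

```python
def collect_buggy(data, limit):
    """Hypothetical buggy collector: each chunk replaces the previous one."""
    kept = b""
    for start in range(0, len(data), limit):
        kept = data[start:start + limit]  # overwrites, so earlier chunks are lost
    return kept


def collect_fixed(data, limit):
    """Hypothetical fixed collector: every chunk is appended."""
    kept = b""
    for start in range(0, len(data), limit):
        kept += data[start:start + limit]  # appends, so the whole message survives
    return kept


message = bytes(range(25))  # stand-in for a message larger than the limit
print(len(collect_buggy(message, 10)), "bytes kept by the buggy version")
print(len(collect_fixed(message, 10)), "bytes kept by the fixed version")
```

Messages shorter than the limit pass through both versions unchanged, 
which is consistent with the bug only showing up on large messages.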

I am grateful to Stephen Davies for not only pointing out the bug to 
me, but also doing a great deal of troubleshooting on it and pointing me 
right at the problematic code. Even with his help, however, I had to 
stare at the code for several hours before I figured out the problem. I 
hate bugs like that. :-/

As always, please let me know if you have any questions, comments or 
suggestions about bogofilter-milter.

Regards,

Jonathan Kamens

Tweeks | 24 Oct 06:21

Are Bogon IP addresses used for bogosity, and where are they kept?

Couple questions...

Q1) I just started using bogofilter on my Kmail install, and it's flagging 
all mail coming from my mail server as spam. My server is not blacklisted 
anywhere, but IS in an old bogon IP range (98.129) that was released for use 
by the IANA some time back in 2006 (098/8)... and my fear is that my 
server's x-bogon is still being flagged by bogofilter as a bogon.

If this is not the case, someone please show me where any bogon lists are 
kept and how I can verify that my MTA's primary IP is not in such a list.

Q2) If I accidentally marked some HAM as spam, how do I fix this? I can't 
find any good tutorials or anything on using bogotune (if that's even the 
right tool).

Thanks!

Tweeks

David Relson | 3 Sep 23:11

Re: importing words from popfile in to bogofilter

On Sat, 3 Sep 2011 06:47:14 -0700 (PDT)
Joseph Harth wrote:

> Thanks David. What I did (and I don't know if it worked) was this: I
> copied all the words into a message inside an mbox file and then loaded
> them as spam/ham respectively, but this is probably wrong?

Hi Joseph,

I think what you've done may be sub-optimal, possibly not even useful.

Part of the Bayesian nature of bogofilter is to know how often words
appear in spam and ham.  In particular, bogofilter likes to know that
"xxx" appears in y% of spam messages and in z% of ham messages.  With
these numbers, the appearance of "xxx" can be judged as good or bad.

Consider the following:

   .MSG_COUNT 1000 100
   xxx 500 90

These values indicate that 
 1000 spam messages have been processed and 500 of them had xxx (50%);
  100 ham messages have been processed and 90 of them had xxx (90%).

With numbers like the above, the appearance of xxx indicates the
message is more likely good than bad.
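
The arithmetic above can be sketched in a few lines of Python. This is 
only an illustration: the function names are made up, and the ratio 
s / (s + h) is a simplified stand-in for a per-token score. Bogofilter's 
real scoring applies further smoothing and combines many tokens, none of 
which is modelled here.

```python
def token_frequencies(spam_count, ham_count, spam_msgs, ham_msgs):
    """Return (fraction of spam with the token, fraction of ham with the token)."""
    return spam_count / spam_msgs, ham_count / ham_msgs


def spamminess(spam_count, ham_count, spam_msgs, ham_msgs):
    """Simplified per-token weight: 0.5 is neutral, below 0.5 leans ham."""
    s, h = token_frequencies(spam_count, ham_count, spam_msgs, ham_msgs)
    if s + h == 0:
        return 0.5  # token never seen: treat as neutral in this sketch
    return s / (s + h)


msg_count = (1000, 100)  # .MSG_COUNT: spam messages, ham messages
xxx = (500, 90)          # token xxx: spam count, ham count

s, h = token_frequencies(*xxx, *msg_count)
p = spamminess(*xxx, *msg_count)
print(f"spam {s:.0%}, ham {h:.0%}, spamminess {p:.3f}")
```

Even though xxx has many more spam hits in absolute terms (500 vs 90), 
the weight comes out below 0.5 because it is normalised by the message 
counts.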

With the wordlist you've created, run "bogoutil -d wordlist.db" to
display your wordlist as text and see if you like the results.  

David Relson | 3 Sep 14:20

Re: importing words from popfile in to bogofilter

On Fri, 2 Sep 2011 09:54:56 -0700 (PDT)
Joseph Harth wrote:

> Hi David 
> I exported my word database to two files, spam.txt and ham.txt. These
> lists have all the words in my database repeated as many times as they
> were repeated in the database. How can I get these words into
> bogofilter? Each is just a plain text file with words. I also filtered
> the files for dictionary words with aspell. 
> 

"bogoutil -l wordlist.db token_file.txt" will load the entries in
token_file.txt into wordlist.db

The format of token_file.txt is:

token1 spam_count ham_count date
token2 spam_count ham_count date
.MSG_COUNT spam_messages ham_messages date

Here spam_count is the number of times the token has been seen in
spam messages and ham_count is the count for ham messages.

spam_messages and ham_messages are the numbers of spam and ham
messages processed in building the wordlist.  If popfile doesn't have
that information, you'll have to estimate it.  Note: a reasonable
estimate for spam_messages might be double (2x) the largest spam_count,
and likewise 2x the largest ham_count for ham_messages.

The date field is the date the tokens are entered into the wordlist.
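
As a rough sketch of how two repeated-word exports could be turned into 
the token file format described above, here is a small Python example. 
The function name build_token_lines is made up, and the YYYYMMDD date 
string is an assumption that should be checked against the output of 
"bogoutil -d" on a real wordlist. The .MSG_COUNT line uses the 
2x-largest-count estimate suggested above.

```python
from collections import Counter


def build_token_lines(spam_words, ham_words, date):
    """Turn two lists of repeated words into 'token spam ham date' lines."""
    spam = Counter(spam_words)  # token -> occurrences in the spam export
    ham = Counter(ham_words)    # token -> occurrences in the ham export
    lines = [
        f"{token} {spam[token]} {ham[token]} {date}"
        for token in sorted(set(spam) | set(ham))
    ]
    # Real message totals are unknown, so estimate them as 2x the largest count.
    spam_messages = 2 * max(spam.values(), default=0)
    ham_messages = 2 * max(ham.values(), default=0)
    lines.append(f".MSG_COUNT {spam_messages} {ham_messages} {date}")
    return lines


# In practice the word lists would be read from the exported files, for
# example open("spam.txt").read().split(); short inline lists are used here.
# The date format "20110903" (YYYYMMDD) is an assumption.
demo = build_token_lines(
    ["viagra", "viagra", "hello"], ["hello", "meeting"], "20110903"
)
print("\n".join(demo))
```

The resulting lines can be written to token_file.txt and loaded with the 
"bogoutil -l" command shown above.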

Anne Wilson | 23 Aug 18:08

Re: Getting bogotrain back

On Tuesday 23 Aug 2011 David Relson wrote:
> Hello Anne,
> 
> My Gentoo system has trainbogo.sh in /usr/share/bogofilter/contrib.
> Being too lazy to check trainbogo.sh for dependencies, I've zipped the
> complete contrib directory and have attached the zip file.
> 
> Have fun!
>
Thanks David.  That solves the problem :-)  Training continues, as always.

Anne

Anne Wilson | 23 Aug 11:31

Getting bogotrain back

My mail server has been upgraded to CentOS 6.  Bogofilter is installed and 
appears to be tagging correctly.  However, I'm having problems with 
retraining:

bash /usr/share/bogofilter/contrib/contrib/trainbogo.sh -c \
  -H /home/anne/Maildir/.INBOX.Bogotrain_ham/cur/ \
  -S /home/anne/Maildir/.INBOX.Bogotrain_spam/cur/
bash: /usr/share/bogofilter/contrib/contrib/trainbogo.sh: No such file or directory

On checking I find that there is no /usr/share/bogofilter directory, and no 
amount of searching is finding the scripts.  Unfortunately I don't appear to 
have archived the messages where you originally helped me set this up.

How can I get this working again?
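
For readers in the same position, a minimal Python sketch of training 
from two Maildir folders follows. It is not trainbogo.sh and does not 
reproduce that script's logic or its -c option; the function names are 
made up. It assumes bogofilter's -n (register as ham), -s (register as 
spam) and -I (read the message from a file) options behave as described 
in bogofilter(1), which is worth checking on your system before running 
with dry_run=False.

```python
import subprocess
from pathlib import Path


def training_commands(ham_dir, spam_dir):
    """Build one bogofilter command per message file in each folder."""
    commands = []
    for flag, folder in (("-n", ham_dir), ("-s", spam_dir)):
        for message in sorted(Path(folder).iterdir()):
            if message.is_file():
                commands.append(["bogofilter", flag, "-I", str(message)])
    return commands


def train(ham_dir, spam_dir, dry_run=True):
    """Print the commands, or run them when dry_run is False."""
    for command in training_commands(ham_dir, spam_dir):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)


# Example call (paths taken from the command above), left commented out
# so that nothing runs by accident:
# train("/home/anne/Maildir/.INBOX.Bogotrain_ham/cur/",
#       "/home/anne/Maildir/.INBOX.Bogotrain_spam/cur/")
```

Running with the default dry_run=True first shows exactly which files 
would be registered as ham or spam.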

Anne
-- 
New to KDE Software? - get help from http://userbase.kde.org

Jonathan Kamens | 23 Jun 03:23

bogotune claims too few messages despite >3000 ham and >5000 spam

I'm trying to tune bogofilter and can't get bogotune to work reliably. 
Below is a transcript.

Why is bogotune having trouble locking onto good settings, when I've got 
3121 ham messages and 5818 spam messages? Also, the settings I'm 
currently using, which were generated by an earlier, successful run of 
bogotune about four months ago, are working just fine, with my spam 
detection rate at above 99%. I've noticed a few more false positives 
than I prefer when I receive email from new entities, which is why I'm 
trying to retune.

Before bogotune was having this particular problem, it was having 
another one... it kept reporting that it couldn't read my wordlist.db. 
That problem went away after I used bogoutil to remove tokens from the 
word list that haven't been seen in 180 days, a maintenance task I do 
periodically to keep the size of the word list reasonable.

I've also recently dumped and reloaded the word list into a new file, 
which brought its size down from 9MB to 3MB, but that didn't help bogotune.

Oh, and I should mention that I actively, regularly retrain bogofilter 
with ham and spam, including fixing any mischaracterizations, so my word 
list is extremely accurate.

Thanks for any advice you can provide.

   jik

$ bogotune -v -T 0 -n /tmp/notspam -s /tmp/bogospam
Reading /home/jik/.bogofilter/wordlist.db

Doug | 19 Jun 05:55

How to troubleshoot new installation

I have been running Bogofilter for several years and really like it. It works very well.

I brought up bogofilter on a new system. This is the new system:

slate:/usr/local/bin # bogofilter -V
bogofilter-sqlite version 1.2.2
    Database: SQLite 3.7.5
Copyright (C) 2002-2010 David Relson, Matthias Andree
Copyright (C) 2002-2004 Greg Louis
Copyright (C) 2002-2003 Eric S. Raymond, Adrian Otto, Gyepi Sam

And this is the old system, which I am currently using:

bogofilter-sqlite version 1.2.2
    Database: SQLite 3.7.3
Copyright (C) 2002-2010 David Relson, Matthias Andree
Copyright (C) 2002-2004 Greg Louis
Copyright (C) 2002-2003 Eric S. Raymond, Adrian Otto, Gyepi Sam

I copied the database (wordlist.db) from the old system to the new. I am using the same procmail script. 

# filter mail through bogofilter, tagging it as Ham, Spam, or Unsure,
# and updating the wordlist

# :0fw: bogofilter.lock
# | /usr/local/bin/bogofilter -u -e -p -l -d /home/admin/.bogofilter

# if bogofilter failed, return the mail to the queue;

Martín Marqués | 12 May 16:54

DB mismatch

I found out the other day that I have some kind of database mismatch.

Trying to update spam and ham I get this:

$ /usr/bin/bogofilter -nS < ~/mail/newham
Program version 5.1 doesn't match environment version 4.8
bogofilter[10955]: cannot join environment: DB_VERSION_MISMATCH:
Database environment version mismatch
$ /usr/bin/bogofilter -sN < ~/mail/newspam
Program version 5.1 doesn't match environment version 4.8
bogofilter[11001]: cannot join environment: DB_VERSION_MISMATCH:
Database environment version mismatch

And I can't find a way to dump my wordlist with 4.8 and restore it with 5.1.

I'm on Debian testing and there is no db5.1-utils, which is quite odd.

Any solutions?

-- 
Martín Marqués
select 'martin.marques' || '@' || 'gmail.com'
DBA, Programador, Administrador

Seth David Schoen | 7 May 23:24

bogofilter 1.2.2 crashes with "flex scanner push-back overflow"

Hi,

(I sent this message yesterday but it didn't make it to the list,
maybe because the original version had an attachment.)

I'm using Ubuntu Maverick's bogofilter 1.2.2 package to filter spam.
Today I found that bogofilter was crashing with the error "flex scanner
push-back overflow" and not filtering spam.  I identified the two
particular spam messages that were causing this problem, and I found
that they would make bogofilter crash every time.  I've also confirmed
that they make bogofilter 1.2.2 crash on another machine, running FreeBSD,
with no wordlist.db, so I think there is a real bug here.

The spam messages seem to consist of a huge number of (long) separately
koi8-encoded tokens.  Their contents were identical except for the date
and recipient address.  I've posted one of the original messages at

http://www.loyalty.org/~schoen/spam.bz2

-- 
Seth Schoen
Senior Staff Technologist                         schoen <at> eff.org
Electronic Frontier Foundation                    https://www.eff.org/
454 Shotwell Street, San Francisco, CA  94110     +1 415 436 9333 x107