Tom Marshall | 1 Mar 2004 04:32

Re: Training

On Sun, Feb 29, 2004 at 05:00:21PM +0000, Mark wrote:
> Hello All,
> 
> The docs that come with bmf suggest using SpamAssassin to train from,
> which seems to make a lot of sense. But can anyone suggest when to
> stop training bmf? Is there a magical number of email messages that
> is sufficient training? Or can bmf be trusted to categorise email
> once a certain amount of data has been trained in?

I suggest stopping training when it starts to recognize the majority of your
spam.

-- 
Absurdity, n.: A statement or belief manifestly inconsistent with one's own
opinion.
        -- Ambrose Bierce, "The Devil's Dictionary"
Mark | 1 Mar 2004 11:21

Re: Training

Tom,

Thanks for replying to my email.

> I suggest stopping training when it starts to recognize the majority of your
> spam.

If I'm not manually training bmf, it would be nice for it to start
categorising email automatically once it has been trained.
Ultimately I would like to include it in a 'roll your own' spam
solution that would use Bayesian filtering as well as RBLs and spam
signatures.

Which leads me to my next question: is there a problem using one
.bmf directory for an entire domain? Since more than one domain
will probably be using the end product, it would be a lot neater
(and smaller) to have one good and one bad list per domain.

Thanks,

Mark. 

Tom Marshall | 2 Mar 2004 00:04

Re: Training

> > I suggest stopping training when it starts to recognize the majority of your
> > spam.
> 
> If I'm not manually training bmf, it would be nice for it to start
> categorising email automatically once it has been trained.
> Ultimately I would like to include it in a 'roll your own' spam
> solution that would use Bayesian filtering as well as RBLs and spam
> signatures.

If you can get a 30-day history of email for the user(s), that would
probably be sufficient.
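
As a rough sketch of what "a 30-day history" could look like in practice,
the Python below picks the messages from an mbox file whose Date header
falls within the last 30 days and pipes each one to bmf. The function names
and the mbox file names are hypothetical, and the "-s" (register as spam)
and "-n" (register as non-spam) flags are my reading of the bmf man page,
so check them against your installed version before relying on this.

```python
import mailbox
import subprocess
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def recent_messages(mbox_path, days=30, now=None):
    """Yield messages whose Date header is within `days` days before `now`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            try:
                sent = parsedate_to_datetime(msg["Date"])
            except (TypeError, ValueError):
                continue  # missing or unparseable Date header: skip it
            if sent.tzinfo is None:
                # Assumption: treat dates without a zone as UTC.
                sent = sent.replace(tzinfo=timezone.utc)
            if cutoff <= sent <= now:
                yield msg
    finally:
        box.close()


def train(mbox_path, flag, days=30):
    """Feed each recent message to bmf on stdin.

    `flag` is assumed to be "-s" (spam) or "-n" (non-spam).
    Returns the number of messages handed to bmf.
    """
    count = 0
    for msg in recent_messages(mbox_path, days):
        subprocess.run(["bmf", flag], input=msg.as_bytes(), check=True)
        count += 1
    return count
```

With one mbox of known spam and one of known good mail, usage would be
along the lines of train("spam.mbox", "-s") followed by
train("good.mbox", "-n").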

> Which leads me to my next question: is there a problem using one
> .bmf directory for an entire domain? Since more than one domain
> will probably be using the end product, it would be a lot neater
> (and smaller) to have one good and one bad list per domain.

There is no technical problem with using one database per domain.  If you
have only one user or a couple of users for that domain it's likely to work
pretty well.  But the more users sharing the database, the less likely it
will be to recognize spam.  Each user's email has a unique mix of words. 
Training is effectively finding that mix of word weights that the user
receives in their normal correspondence.  More users means a more ambiguous
set of word weights.  For example, one person may frequently receive HTML
email (HTML tokens are excellent spam identifiers) and skew the results for
others who rarely receive HTML email.
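
To put rough numbers on the HTML example, here is a small Python sketch.
It uses a simplified Graham-style per-token estimate, not bmf's actual
code, and the two users and all the counts are made up purely for
illustration.

```python
def token_spam_probability(good_hits, bad_hits, n_good, n_bad):
    """Simplified Graham-style estimate of P(spam | token).

    This is an illustrative formula, not necessarily what bmf computes.
    """
    good_rate = min(1.0, good_hits / n_good)
    bad_rate = min(1.0, bad_hits / n_bad)
    if good_rate + bad_rate == 0:
        return 0.5  # token never seen: no evidence either way
    p = bad_rate / (good_rate + bad_rate)
    return max(0.01, min(0.99, p))  # clamp away from certainty


# Hypothetical counts for an HTML token in 100 good and 100 spam messages
# per user. Alice gets a lot of legitimate HTML mail; Bob almost none.
alice = dict(good_hits=80, bad_hits=90, n_good=100, n_bad=100)
bob = dict(good_hits=1, bad_hits=90, n_good=100, n_bad=100)

# A shared database simply pools the counts of both users.
shared = {key: alice[key] + bob[key] for key in alice}

p_alice = token_spam_probability(**alice)
p_bob = token_spam_probability(**bob)
p_shared = token_spam_probability(**shared)

print(f"alice  {p_alice:.3f}")
print(f"bob    {p_bob:.3f}")
print(f"shared {p_shared:.3f}")
```

With these made-up counts the token is a strong spam indicator for Bob
alone, nearly neutral for Alice alone, and lands somewhere in between once
the two share a database, which makes it a weaker signal for Bob than his
own database would give him.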

-- 
"What are we going to do tonight, Bill?"
"Same thing we do every night Steve, try to take over the world!"

Andre Berger | 22 Mar 2004 09:48

Provider-added headers

Hi there!

My provider has finally installed an Amavis/SpamAssassin combination
on their server. Sometimes it catches the spam correctly and marks
it, sometimes it doesn't. I'm wondering now if this "unreliable"
recognition and marking will cause a problem with bmf's statistical
analysis in the long run, as all incoming messages go through "bmf -p".
Is it necessary to remove the headers first, like "formail -I
'X-Spam' | bmf -p"?

-Andre

Tom Marshall | 23 Mar 2004 04:56

Re: Provider-added headers

On Mon, Mar 22, 2004 at 09:48:30AM +0100, Andre Berger wrote:
> Hi there!
> 
> My provider has finally installed an Amavis/SpamAssassin combination
> on their server. Sometimes it catches the spam correctly and marks
> it, sometimes it doesn't. I'm wondering now if this "unreliable"
> recognition and marking will cause a problem with bmf's statistical
> analysis in the long run, as all incoming messages go through "bmf -p".
> Is it necessary to remove the headers first, like "formail -I
> 'X-Spam' | bmf -p"?

No, bmf will ignore and replace the X-Spam* header(s).
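
For a filter that does not skip these headers on its own, stripping them
before analysis might look like the Python sketch below. The function name
is hypothetical, and it assumes the provider's markers all live in header
fields whose names start with "X-Spam".

```python
from email import message_from_string


def strip_headers(raw_message, prefix="x-spam"):
    """Return the message text with every header whose name starts with
    `prefix` (case-insensitive) removed. The body is left untouched."""
    msg = message_from_string(raw_message)
    for name in set(msg.keys()):
        if name.lower().startswith(prefix):
            del msg[name]  # removes all occurrences of that header name
    return msg.as_string()


raw = (
    "From: someone@example.com\n"
    "X-Spam-Status: Yes\n"
    "X-Spam-Level: ***\n"
    "Subject: hello\n"
    "\n"
    "message body\n"
)
print(strip_headers(raw))
```

Markers placed anywhere else, such as in the Subject line, would not be
touched by this.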

-- 
I used to think I was a child; now I think I am an adult -- not because
I no longer do childish things, but because those I call adults are no
more mature than I am.
Andre Berger | 23 Mar 2004 10:42

Re: Provider-added headers

* Tom Marshall <tommy <at> home.tig-grr.com>, 2004-03-23 09:50 +0100:
> On Mon, Mar 22, 2004 at 09:48:30AM +0100, Andre Berger wrote:
> > Hi there!
> > 
> > My provider has finally installed an Amavis/SpamAssassin combination
> > on their server. Sometimes it catches the spam correctly and marks
> > it, sometimes it doesn't. I'm wondering now if this "unreliable"
> > recognition and marking will cause a problem with bmf's statistical
> > analysis in the long run, as all incoming messages go through "bmf -p".
> > Is it necessary to remove the headers first, like "formail -I
> > 'X-Spam' | bmf -p"?
> 
> No, bmf will ignore and replace the X-Spam* header(s).

Thanks. - I presume this goes for "*****Spam*****" headers as well?

-Andre
