3 Oct 2003 09:48
ignoring text part of multipart/alternative?
Allyn Fratkin <allyn <at> fratkin.com>
2003-10-03 07:48:09 GMT
hi, folks, long time no talk.

i have been getting pretty good results with bogofilter. in addition to the personal email for myself and my wife, i also have a multi-user bogofilter installation for 200+ users running at work, also with very good results.

i've been getting a few false negatives lately and they are almost always of the same type: a multipart/alternative message with a long non-spam story or textbook excerpt in the plain text part, followed by a spam in the html part. the non-spamminess and length of the text part cause the message to be misclassified. this is one case that the graham algorithm handles just fine, since many of the words in the spammy part are very strong spam indicators.

my mailer (mozilla) ignores the text part of a multipart/alternative. perhaps bogofilter should too? obviously this would be an option, and multipart/alternative would need to be handled differently from multipart/mixed, which is not the case today.

thoughts?

--
Allyn Fratkin
allyn <at> fratkin.com
Escondido, CA
http://www.fratkin.com/
It'd probably be best to pass an option to bogofilter that lets the user
specify which multipart/alternative subpart (s)he wants scored, and
preset the default to what I expect most spammers to aim for:
Outlook Express defaults.
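
To make the idea concrete, here is a rough sketch in Python using the standard library's email package. It is not bogofilter code: the function name select_alternative_parts and the "preferred" parameter are made up for illustration, and the text/html default is only an assumption about which part most HTML-capable mailers display. Inside multipart/alternative only the preferred subpart is kept; multipart/mixed and other containers keep all their children.

```python
from email import message_from_string

def select_alternative_parts(msg, preferred="text/html"):
    """Return the leaf parts that would be scored (hypothetical helper).

    Inside multipart/alternative only the subpart whose type equals
    `preferred` is kept; other containers keep every child.
    """
    if not msg.is_multipart():
        return [msg]
    children = msg.get_payload()
    if msg.get_content_type() == "multipart/alternative":
        chosen = [c for c in children if c.get_content_type() == preferred]
        if not chosen and children:
            # RFC 2046 orders alternatives by increasing faithfulness,
            # so fall back to the last one.
            chosen = [children[-1]]
        children = chosen
    parts = []
    for child in children:
        parts.extend(select_alternative_parts(child, preferred))
    return parts

# Made-up sample message: harmless text part, spammy html part.
RAW = """\
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b"

--b
Content-Type: text/plain

long harmless story
--b
Content-Type: text/html

<html><body>buy now</body></html>
--b--
"""

scored = select_alternative_parts(message_from_string(RAW))
```

With the sample message above, only the html part ends up in "scored", so the long story in the text part no longer dilutes the result.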
There may be more complicated schemes (combining spamicity of individual
subparts, HTML, plain, enriched, DOC, you name it). I won't likely have
the time to do the necessary R&D before spring 2004.
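
One very simple way to combine per-subpart results would be to score each subpart on its own and keep the worst score. The Python sketch below only illustrates that idea and is untested against real mail. toy_score is a deliberately naive stand-in (fraction of tokens found in a spam word list), not bogofilter's actual scoring, and both function names are invented here.

```python
def toy_score(text, spam_words):
    """Naive stand-in scorer: fraction of tokens that are known spam words."""
    tokens = text.lower().split()
    if not tokens:
        return 0.5  # no evidence either way
    return sum(t in spam_words for t in tokens) / len(tokens)

def combined_spamicity(part_texts, score_part):
    """Score each subpart separately and keep the highest (spammiest) score."""
    scores = [score_part(t) for t in part_texts]
    return max(scores) if scores else 0.5

# Made-up data: a harmless text part and a spammy html part.
spam_words = {"buy", "cheap", "pills"}
parts = [
    "once upon a time there was a quiet village",
    "buy cheap pills now",
]
per_part = combined_spamicity(parts, lambda t: toy_score(t, spam_words))
whole = toy_score(" ".join(parts), spam_words)
```

Scoring the concatenated text lets the long harmless part pull the result down, while taking the maximum over subparts does not. Whether the maximum is the right combination is exactly the kind of question that would need the R&D mentioned above.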