Allyn Fratkin | 3 Oct 2003 09:48

ignoring text part of multipart/alternative?

hi, folks, long time no talk.  i have been getting pretty
good results with bogofilter, and in addition to my personal
email for myself and my wife, i also have a multi-user bogofilter
installation for 200+ users running at work, also with very good results.

i've been getting a few false negatives lately and they are almost always
of the same type: a multipart/alternative message with a long non-spam
story or textbook excerpt in the plain text part, followed by a spam
in the html part.  the non-spamminess and length of the text part is
causing the message to be misclassified.  this is one case that the
graham algorithm handles just fine since many of the words in the
spammy part are very strong spam indicators.
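
A rough sketch of why the two approaches can disagree on such a message. It assumes Graham's scheme scores only the 15 tokens farthest from 0.5 and combines them naive-Bayes style, and it uses a plain average over all tokens as a stand-in for any method that lets every token vote. That stand-in is an assumption for illustration, not bogofilter's actual computation, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define GRAHAM_TOKENS 15  /* number of "most interesting" tokens Graham uses */

/* Distance of a token probability from the neutral value 0.5. */
static double extremeness(double p)
{
    double d = p - 0.5;
    return d < 0.0 ? -d : d;
}

/* qsort comparator: tokens farthest from 0.5 sort first. */
static int by_extremeness(const void *a, const void *b)
{
    double da = extremeness(*(const double *)a);
    double db = extremeness(*(const double *)b);
    return (da < db) - (da > db);
}

/* Graham-style score: combine only the most extreme tokens.
 * Sorts p in place; probabilities are assumed to lie strictly
 * between 0 and 1. */
double graham_score(double *p, size_t n)
{
    double spam = 1.0, ham = 1.0;
    size_t i, used = n < GRAHAM_TOKENS ? n : GRAHAM_TOKENS;

    assert(p != NULL && n > 0);
    qsort(p, n, sizeof *p, by_extremeness);
    for (i = 0; i < used; i++) {
        spam *= p[i];
        ham *= 1.0 - p[i];
    }
    return spam / (spam + ham);
}

/* Illustrative stand-in for a method in which every token counts. */
double average_score(const double *p, size_t n)
{
    double sum = 0.0;
    size_t i;

    assert(p != NULL && n > 0);
    for (i = 0; i < n; i++)
        sum += p[i];
    return sum / (double)n;
}
```

With many mildly hammy tokens from the text part and a handful of very spammy tokens from the html part, the first function is dominated by the spammy tokens while the second is pulled toward the long text part.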

my mailer (mozilla) ignores the text part of a multipart/alternative.
perhaps bogofilter should too?  obviously this would be an option
and multipart/alternative would need to be handled differently
from multipart/mixed, which currently is not.

thoughts?
-- 
Allyn Fratkin             allyn <at> fratkin.com
Escondido, CA             http://www.fratkin.com/

---------------------------------------------------------------------
FAQ: http://bogofilter.sourceforge.net/bogofilter-faq.html
To unsubscribe, e-mail: bogofilter-dev-unsubscribe <at> aotto.com
For summary digest subscription: bogofilter-dev-digest-subscribe <at> aotto.com
For more commands, e-mail: bogofilter-dev-help <at> aotto.com


David Relson | 3 Oct 2003 13:26

Re: ignoring text part of multipart/alternative?

On Fri, 03 Oct 2003 00:48:09 -0700
Allyn Fratkin <allyn <at> fratkin.com> wrote:

> hi, folks, long time no talk.  i have been getting pretty
> good results with bogofilter, and in addition to my personal
> email for myself and my wife, i also have a multi-user bogofilter
> installation for 200+ users running at work, also with very good
> results.
> 
> i've been getting a few false negatives lately and they are almost
> always of the same type: a multipart/alternative message with a long
> non-spam story or textbook excerpt in the plain text part, followed by
> a spam in the html part.  the non-spamminess and length of the text
> part is causing the message to be misclassified.  this is one case
> that the graham algorithm handles just fine since many of the words in
> the spammy part are very strong spam indicators.
> 
> my mailer (mozilla) ignores the text part of a multipart/alternative.
> perhaps bogofilter should too?  obviously this would be an option
> and multipart/alternative would need to be handled differently
> from multipart/mixed, which currently is not.
> 
> thoughts?
> -- 
> Allyn Fratkin             allyn <at> fratkin.com
> Escondido, CA             http://www.fratkin.com/

Greetings Allyn,

It _has_ been a long time and it's good to hear from you.  Glad to hear

Matthias Andree | 4 Oct 2003 14:37

Re: ignoring text part of multipart/alternative?

Allyn Fratkin <allyn <at> fratkin.com> writes:

> my mailer (mozilla) ignores the text part of a multipart/alternative.
> perhaps bogofilter should too?  obviously this would be an option
> and multipart/alternative would need to be handled differently
> from multipart/mixed, which currently is not.
>
> thoughts?

Which part of a multipart/alternative gets displayed depends on the
mailer and the user's preferences. My mailers are usually configured
to prefer the plain text part, with scripts and plugins off in Mozilla,
so I like those multipart/alternative spams that have a blank text part
with HTML junk ;-)

It'd probably be best to pass an option to bogofilter that lets the user
specify which multipart/alternative subpart (s)he wants scored, and to
preset the default to what I expect most spammers to aim for:
Outlook Express defaults.

There may be more complicated schemes (combining spamicity of individual
subparts, HTML, plain, enriched, DOC, you name it). I won't likely have
the time to do the necessary R&D before spring 2004.
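
A minimal sketch of what such an option might drive, assuming the subparts of a multipart/alternative have already been split out. The struct and function names are hypothetical and not part of bogofilter, and `content_type` is assumed to hold a bare type/subtype without parameters. When no subpart matches the preference, the sketch falls back to the last subpart, since RFC 2046 orders alternatives by increasing faithfulness to the original content.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical description of one subpart of a multipart/alternative. */
struct alt_part {
    const char *content_type;  /* e.g. "text/plain" or "text/html" */
    const char *body;
};

/* Case-insensitive comparison; MIME type names are not case sensitive. */
static int same_type(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Return the index of the subpart to score, or -1 if there are none.
 * The first subpart matching the user's preferred type wins; otherwise
 * the last subpart is chosen. */
int choose_alternative(const struct alt_part *parts, size_t n,
                       const char *preferred)
{
    size_t i;

    assert(preferred != NULL);
    if (parts == NULL || n == 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (same_type(parts[i].content_type, preferred))
            return (int)i;
    }
    return (int)(n - 1);
}
```

A user who reads HTML mail would pass "text/html" as the preference, and one who reads plain text would pass "text/plain"; combining the scores of several subparts would need a different interface.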

-- 
Matthias Andree

Encrypt your mail: my GnuPG key ID is 0x052E7D95


Matthias Andree | 5 Oct 2003 13:47

Re: [bugs][ bogofilter-Bugs-817817 ] BF should decode &#65

On Sat, 04 Oct 2003, SourceForge.net wrote:

> https://sourceforge.net/tracker/?func=detail&atid=499997&aid=817817&group_id=62265
> 
> Submitted By: Tim Freeman (timfreeman)
> Summary: BF should decode &#65
> 
> Initial Comment:
> I received the following spam with a style of HTML obfuscation I have
> not seen before:
>
> <font color="#FFFFFD">summon her allies), then the
...
> Y&#111;&#117;&#32;&#78;<!--18-->E&#69;D&#32;&#111;r&#32;R<!--1z-->e<!--7-->fi&#108;l&#115;!!<BR>

How about this patch? It emits those numeric entities as tokens in HTML:

Index: src/lexer_v3.l
===================================================================
RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
retrieving revision 1.94
diff -u -r1.94 lexer_v3.l
--- src/lexer_v3.l	5 Oct 2003 00:14:43 -0000	1.94
+++ src/lexer_v3.l	5 Oct 2003 11:46:34 -0000
@@ -174,6 +174,8 @@

David Relson | 5 Oct 2003 14:33

Re: [bugs][ bogofilter-Bugs-817817 ] BF should decode &#65

On Sun, 5 Oct 2003 13:47:06 +0200
Matthias Andree <matthias.andree <at> gmx.de> wrote:

> On Sat, 04 Oct 2003, SourceForge.net wrote:
> 

> > <font color="#FFFFFD">summon her allies), then the
> ...
> > Y&#111;&#117;&#32;&#78;<!--18-->E&#69;D&#32;&#111;r&#32;R<!--1z-->e
> > <!--7-->fi&#108;l&#115;!!<BR>
> 
> How about this patch? It emits those numeric entities as tokens in
> HTML:
> 
> Index: src/lexer_v3.l
> ===================================================================
> RCS file: /cvsroot/bogofilter/bogofilter/src/lexer_v3.l,v
> retrieving revision 1.94
> diff -u -r1.94 lexer_v3.l
> --- src/lexer_v3.l	5 Oct 2003 00:14:43 -0000	1.94
> +++ src/lexer_v3.l	5 Oct 2003 11:46:34 -0000
> @@ -174,6 +174,8 @@
>  
>  NOTWHITESPACE	[^ \t\n]
>  
> +HTML_ENTITY		"&#"[[:digit:]]+";"
> +
>  HTML_WI_COMMENTS	"<"[^\>]*">"
>  
>  HTML_WO_COMMENTS	"<"[^!][^\>]*">"|"<>"
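
For comparison with emitting the entities as tokens, here is a hedged sketch of decoding them instead, as the bug report asks. The function name is hypothetical and this is not bogofilter's lexer code. It handles only decimal entities of the form matched by the HTML_ENTITY pattern above, and only values in the ASCII range 1..127; anything else is copied through unchanged. The HTML comments interleaved in the sample would still need to be stripped separately.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy in to out, replacing decimal entities "&#NNN;" whose value is
 * in 1..127 with the corresponding character.  Output is truncated to
 * fit outsize and always NUL-terminated.  Returns the number of
 * characters written, not counting the terminator. */
size_t decode_numeric_entities(const char *in, char *out, size_t outsize)
{
    size_t o = 0;

    assert(in != NULL && out != NULL && outsize > 0);
    while (*in != '\0' && o + 1 < outsize) {
        if (in[0] == '&' && in[1] == '#' && isdigit((unsigned char)in[2])) {
            const char *p = in + 2;
            unsigned long value = 0;

            /* Stop accumulating once the value is clearly out of range. */
            while (isdigit((unsigned char)*p) && value < 1000) {
                value = value * 10 + (unsigned long)(*p - '0');
                p++;
            }
            if (*p == ';' && value >= 1 && value <= 127) {
                out[o++] = (char)value;
                in = p + 1;
                continue;
            }
        }
        out[o++] = *in++;  /* not a decodable entity: copy literally */
    }
    out[o] = '\0';
    return o;
}
```

Applied to the fragment `Y&#111;&#117;&#32;&#78;` from the report, this yields `You N`.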

Evgeny Kotsuba | 5 Oct 2003 15:52

About slashes and backslashes

Hi,

It seems that
-----------------
src\maint.c  should have
#include "common.h"
-----------------
configure.ac   should be changed slightly
=============
if test "$have_dosish_system" = no; then
AC_DEFINE(DIRSEP_C, '/', [Define directory separator (C character)])
AC_DEFINE(DIRSEP_S, "/", [Define directory separator (C string)])
else
AC_DEFINE(DIRSEP_C, '\\\\')
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
AC_DEFINE(DIRSEP_S, "\\\\")
                       ^^^^^^^^^^^^^^^^^^^^^^^^^
fi
--------------
Also, I don't understand what bool bf_abspath(const char *path) in
src\system.c should do.
Say the path is C:\bla\bla, the working directory is d:\somedir, and
d:\bla\bla also exists: bf_abspath will return \bla\bla.  This will
happen on all systems with drive letters.
--------------
Next, Windows, as well as many OS/2 applications, now works well with
both slashes and backslashes in paths. For example, Mozilla can show
the file
file:///M:\Evgen\Inet/Bogofilter/bogofilter/doc/bogofilter-faq.html#train_on_error


Evgeny Kotsuba | 5 Oct 2003 16:04

Re: About slashes and backslashes

Hi,
Evgeny Kotsuba wrote:

> It seems that
> -----------------
> src\maint.c  should have
> #include "common.h" 

Hmm... It has it already
 :-/

SY,
EK


David Relson | 5 Oct 2003 16:11

Re: About slashes and backslashes

Greetings Evgeny,

On Sun, 05 Oct 2003 16:52:25 +0300
Evgeny Kotsuba <evgen <at> shatura.laser.ru> wrote:

...[snip]...

> -----------------
> configure.ac   should be changed slightly
> =============
> if test "$have_dosish_system" = no; then
> AC_DEFINE(DIRSEP_C, '/', [Define directory separator (C character)])
> AC_DEFINE(DIRSEP_S, "/", [Define directory separator (C string)])
> else
> AC_DEFINE(DIRSEP_C, '\\\\')
>                        ^^^^^^^^^^^^^^^^^^^^^^^^^
> AC_DEFINE(DIRSEP_S, "\\\\")
>                        ^^^^^^^^^^^^^^^^^^^^^^^^^
> fi

You've overlooked the comment that autoconf eats one level of
backslashes.
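
To illustrate the escaping levels: assuming one level is stripped as described, the '\\\\' written in configure.ac would reach config.h as '\\', which the C compiler reads as a single backslash character. The sketch below hard-codes that assumed result for a DOS-like system; `last_component` is a hypothetical helper that only shows the macro in use.

```c
#include <assert.h>
#include <string.h>

/* Assumed to mirror what config.h would contain on a DOS-like system
 * after configure has run; defined here only if not already provided. */
#ifndef DIRSEP_C
#define DIRSEP_C '\\'   /* one backslash character */
#define DIRSEP_S "\\"   /* string holding one backslash */
#endif

/* Return a pointer to the last component of path, i.e. the text after
 * the final directory separator, or path itself if there is none. */
const char *last_component(const char *path)
{
    const char *sep;

    assert(path != NULL);
    sep = strrchr(path, DIRSEP_C);
    return sep != NULL ? sep + 1 : path;
}
```

Writing '\\' directly in configure.ac would, under the same assumption, leave a lone backslash in config.h and produce a character constant that does not compile.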

> --------------
> Also I don't understand what src\system.c -> bool bf_abspath(const
> char *path) should do.
> Say if path is  C:\bla\bla and work dir is d:\somedir and there is
> also d:\bla\bla  bf_abspath will return \bla\bla.  This thing will
> have place in all systems with drive letters.


Evgeny Kotsuba | 5 Oct 2003 16:34

Re: About slashes and backslashes

Hi,
David Relson wrote:

>Greetings Evgeny,
>
>On Sun, 05 Oct 2003 16:52:25 +0300
>Evgeny Kotsuba <evgen <at> shatura.laser.ru> wrote:
>
>...[snip]...
>  
>
>>-----------------
>>configure.ac   should be changed slightly
>>=============
>>if test "$have_dosish_system" = no; then
>>AC_DEFINE(DIRSEP_C, '/', [Define directory separator (C character)])
>>AC_DEFINE(DIRSEP_S, "/", [Define directory separator (C string)])
>>else
>>AC_DEFINE(DIRSEP_C, '\\\\')
>>                       ^^^^^^^^^^^^^^^^^^^^^^^^^
>>AC_DEFINE(DIRSEP_S, "\\\\")
>>                       ^^^^^^^^^^^^^^^^^^^^^^^^^
>>fi
>>    
>>
>
>You've overlooked the comment that autoconf eats one level of
>backslashes.
>  
>

David Relson | 5 Oct 2003 17:27

Re: About slashes and backslashes

On Sun, 05 Oct 2003 17:34:54 +0300
Evgeny Kotsuba <evgen <at> shatura.laser.ru> wrote:

...[snip]...

> >You've overlooked the comment that autoconf eats one level of
> >backslashes.
> >  
> >
> Ah... I don't use autoconf

Fair enough.  It's a great tool, though I find it difficult to
create/modify the main project file - configure.ac.  With configure.ac
properly set, it becomes very easy to configure projects for supported
operating systems.

> >>--------------
> >>Also I don't understand what src\system.c -> bool bf_abspath(const
> >>char *path) should do.
> >>Say if path is  C:\bla\bla and work dir is d:\somedir and there is
> >>also d:\bla\bla  bf_abspath will return \bla\bla.  This thing will
> >>have place in all systems with drive letters.
> >>    
> >>
> >
> >It's written for Unix systems where a leading '/' indicates an
> >absolute path.  Perhaps for OS/2 the check needs to be for "?:\". 
> >Would that work?
> >
> Something like

