A few questions about ifile features compared to other classifiers
Xavier DUTOIT <xavier <at> sydesy.com>
2003-09-22 15:59:54 GMT
Dear all,
I've been looking at a lot of the Bayesian filtering OSS around, and
yours seems to be one of the few multi-purpose classifiers (along with
POPFile), as opposed to "simple" antispam tools. I think ifile is the
most interesting one for what I want to do (a generic mail classifier
on the server), but compared to the others (mostly spam-oriented), it
seems less complete. I've seen some features that look quite useful.
Could you tell me what you think about them (e.g. "I don't find them
useful", "too complicated to include", "working on it", "since you've
asked, try to add such a feature"...):
1) Storage based on a real database (Berkeley DB, for instance) instead
of your file format?
Do you think it would improve performance?
2) Mail parsing.
Features like recognizing and decoding MIME attachments in
quoted-printable and base64 encoding, ignoring HTML tags in emails,
handling things like V'I'A'G'R'A (a randomly chosen example ;), scoring
only the Received, Subject, To, From, and Cc headers...
Well, if I'm correct, the only thing you can do right now is either
parse the headers in full or ignore them. Although it is arguable
where the mail parser code should live (should it be done somewhere
other than in ifile?), I feel that it is important to take the
specifics of mail formats into account when analysing a message's
content.
Have you tried to adapt the code written for bogofilter, for instance,
to add such features to ifile? Do you think it's worth trying (I'm
volunteering)?
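To make point 2 more concrete, here is a minimal sketch of the kind of
preprocessing I have in mind. It uses Python's standard email package
and is not taken from ifile or bogofilter; the names SCORED_HEADERS,
TAG_RE and text_to_classify are hypothetical, and the tag stripping is
a crude regular expression rather than a real HTML parser. It does not
attempt the V'I'A'G'R'A case.

```python
import re
from email import message_from_string

# Headers that would be passed on to the classifier (assumed list,
# taken from the feature description above).
SCORED_HEADERS = ("Received", "Subject", "To", "From", "Cc")

# Crude HTML tag matcher; a real implementation would use an HTML parser.
TAG_RE = re.compile(r"<[^>]+>")


def text_to_classify(raw_message: str) -> str:
    """Return the selected headers plus the decoded text parts of a message."""
    msg = message_from_string(raw_message)
    pieces = []

    # Keep only the selected headers, ignoring all others.
    for name in SCORED_HEADERS:
        for value in msg.get_all(name, []):
            pieces.append(f"{name}: {value}")

    # walk() visits the message itself and every MIME sub-part.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and binary attachments
        # decode=True undoes quoted-printable and base64 transfer encoding.
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "ascii"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name: fall back to ASCII with replacement.
            text = payload.decode("ascii", errors="replace")
        if part.get_content_subtype() == "html":
            text = TAG_RE.sub(" ", text)  # drop HTML tags, keep the words
        pieces.append(text)

    return "\n".join(pieces)
```

The resulting string could then be handed to the classifier in place of
the raw message, which would keep the parser outside of ifile itself.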
3) One last thing, about sorting accuracy.
I read on one page that some of you reached 96% accurate
classification. How did you calculate that? Do you all get such a
high percentage?
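For reference, the calculation I would have assumed (which may not be
how the 96% figure was obtained) is the fraction of messages filed
into the folder a human would have chosen. A minimal sketch, with a
hypothetical accuracy helper and made-up folder names:

```python
def accuracy(predicted, actual):
    """Fraction of messages whose predicted folder matches the real one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one message")
    # Count the positions where the classifier agreed with the human.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical example: 3 of 4 messages filed correctly.
print(accuracy(["spam", "work", "work", "lists"],
               ["spam", "work", "lists", "lists"]))
```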
Thanks in advance,
Xavier
_______________________________________________
Ifile-discuss mailing list
Ifile-discuss <at> nongnu.org
http://mail.nongnu.org/mailman/listinfo/ifile-discuss