Re: Xapian-assisted spam reporting for fun and glee
Jouni K. Seppänen <jks <at> iki.fi>
2006-12-01 13:11:30 GMT
Lars Magne Ingebrigtsen <larsi <at> gnus.org> writes:
> I narrowed the buffer to the From field that seemed to fit the
> pattern (to avoid messages that were talking about the spam (like
> this message),

This didn't work quite perfectly in all cases; check your spam report
approval logs for (a small number of) rejected reports. Some of the
messages discussed a piece of spam received by the mailing list,
e.g. warning people not to reply to the "remove" address, and in the
process quoted pieces of the spam. In at least one case, someone's
signature quoted a discussion on spam, and at least one message was
about debugging some mail software (Horde?) where the debug output
included parts of spams. There was also some mailing list (Cocoon
documentation or something like that) where the project's wiki sends
diffs of changed pages, and when someone had removed spam links from
the wiki, the diff included bad words.

On some mailing lists, there is already pretty good spam filtering,
and hitting them with this kind of semi-automatic reporting is likely
to cause false positives. I wonder if it would be feasible to check
somewhere in the process the historical spamminess rate of the mailing
list, and be more careful with less spammy groups. The check could be
a part of the automatic reporting process, or there could be some
hints to report approvers.
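
A minimal sketch of such a check, assuming per-list counts of confirmed
spam and total messages are available somewhere. The function names and
the 1% cutoff are hypothetical, chosen only for illustration:

```python
def historical_spam_rate(confirmed_spam, total_messages):
    """Fraction of a list's past messages that were confirmed as spam."""
    if total_messages == 0:
        return 0.0
    return confirmed_spam / total_messages


def needs_extra_care(confirmed_spam, total_messages, low_rate=0.01):
    """True if the list has historically seen little spam.

    low_rate is an assumed cutoff, not a value from any real system.
    Reports against such lists could be held for manual approval, or
    flagged as a hint to the report approvers.
    """
    return historical_spam_rate(confirmed_spam, total_messages) < low_rate
```

With these assumptions, a list with 2 confirmed spams in 5000 messages
would be handled carefully, while one with 500 in 5000 would not.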

One indicator of non-spamminess that I imagine would currently work is
if the message has an In-Reply-To or References header with a
message-id pointing to a previous message on the mailing list. Since
you already do threading in the web interface, you probably have the
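
The In-Reply-To/References check described above could look roughly like
this. It is only a sketch using Python's standard email module; the
function names and the known_ids set (message-ids already seen on the
list) are assumptions, not part of any existing system:

```python
import re
from email import message_from_string

# A message-id is an angle-bracketed token such as <abc@example.org>.
MSGID_RE = re.compile(r"<[^<>\s]+>")


def referenced_ids(raw_message):
    """Collect message-ids from the In-Reply-To and References headers."""
    msg = message_from_string(raw_message)
    ids = []
    for header in ("In-Reply-To", "References"):
        for value in msg.get_all(header, []):
            ids.extend(MSGID_RE.findall(value))
    return ids


def replies_to_known_message(raw_message, known_ids):
    """True if the message points at a message-id already seen on the list."""
    return any(mid in known_ids for mid in referenced_ids(raw_message))
```

A report for a message where replies_to_known_message() is true could
then be skipped or sent to manual approval instead of being filed
automatically.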