Re: [Puppet-dev] Trac Spam Fighting
David Schmitt <david <at> schmitt.edv-bus.at>
2007-10-02 21:25:40 GMT
On Tuesday 02 October 2007, Luke Kanies wrote:
> On Sep 29, 2007, at 4:32 PM, David Schmitt wrote:
> > After the recent spam attack, I searched around a bit for what is
> > currently hip regarding spam fighting for trac.
> > There seem to be two major approaches. First, block suspicious
> > activities at
> > the HTTP level with special rules for mod_security:
> > http://madwifi.org/wiki/FightingTracSpam
> > Then, reject edits, comments and reports by filtering for several
> > content and
> > meta criteria with the SpamFilter trac plugin:
> > http://trac.edgewall.org/wiki/SpamFilter
> > The latter includes filters for locally training a Bayes decider,
> > a user-editable regex filter, IP blacklisting and throttling, as well as
> > integration with WebAdmin.
> Hi David,
> I appreciate the work you've put into this.
> At this point, I'd love to defer to someone else on what to do -- I
> know I have to do the actual work, but if someone can, well, tell me
> what to do, that'd be great.
> Should I just implement these things? Is that the best option? Am I
> going to have to configure them much?
The mod_security approach filters spam attacks at a very basic, technical
level (rejecting "unusual" URL patterns, blocking the misused #!html, allowing
only a single URL per ticket, requiring cookies), which -- according to
the author of  -- is quite effective in combating trac spam. Setup
consists of installing the Apache module and configuring it with
the config from . The HOWTO recommends custom HTML error pages with a
unique ID from mod_unique_id (displayed via SSI) to ease tracking of false
positives. On the downside, mod_security is not available in Debian, only
from third-party sites of unknown trustworthiness.
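To make the kind of checks concrete, here is a rough Python sketch of three of the criteria listed above (cookie required, no #!html, at most one URL). This is not mod_security rule syntax and not the HOWTO's config; the function name, the reason strings and the max_urls parameter are made up for illustration.

```python
import re

# Hypothetical helper, for illustration only; the real filtering happens
# in Apache via mod_security rules, not in Python.
URL_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def rejection_reasons(body, has_cookie, max_urls=1):
    """Return the list of reasons a submission would be rejected."""
    reasons = []
    if not has_cookie:
        # Requests without any cookie are treated as suspicious.
        reasons.append("no cookie")
    if "#!html" in body:
        # The #!html wiki processor is what the spammers misuse.
        reasons.append("#!html block")
    if len(URL_PATTERN.findall(body)) > max_urls:
        # Only a single URL per ticket is allowed.
        reasons.append("too many URLs")
    return reasons
```

An empty list means the submission passes all three checks.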
The trac SpamFilter plugin, on the other hand, does all checking within trac.
It probably requires more fine-tuning and configuration, but in return
gives finer control. According to  you need to install "dnspython"
and "TracSpamFilter" with the "easy_install" tool from "setuptools". For
configuration I would propose enabling the ip_blacklist filter (using 
and  by default), the ip_throttle filter (e.g. giving out negative points
beyond the fifth posting in the same hour), the Bayes filter (which can be
trained via trac's WebAdmin) and the BadContent wiki page, which contains
globally blocked regular expressions.
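For the curious, a minimal sketch of the ideas behind the blacklist and throttle filters, not the plugin's actual code. DNS blacklists are conventionally queried by reversing the IPv4 octets and appending the list's zone; the zone in the usage note and the penalty of one point per excess posting are assumptions on my part.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    # Validates the address and normalises it to dotted-quad form.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def throttle_points(post_times, now, max_posts=5, window=3600, penalty=-1):
    """Score the current posting given earlier posting times (in seconds).

    Postings up to max_posts within the window score 0; each posting
    beyond that costs `penalty` points (assumed value).
    """
    recent = [t for t in post_times if 0 <= now - t < window]
    excess = len(recent) + 1 - max_posts  # +1 counts the current posting
    return penalty * excess if excess > 0 else 0
```

For example, dnsbl_query_name("192.0.2.10", "bsb.empty.us") yields "10.2.0.192.bsb.empty.us"; resolving that name (e.g. with dnspython) and getting an answer back would mean the address is listed.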
The latter approach might even be complemented by programming a little filter
whitelisting "known good" users (e.g. those with accepted bug reports, or whose
accounts were created long ago) to reduce the chance of false positives (e.g.
when mass-filing bugs or mass-editing the wiki).
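Such a whitelist could look roughly like the following sketch. It is standalone Python, not wired into the SpamFilter plugin API; the UserRecord type, the 180-day threshold and the bonus of five points are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class UserRecord:
    """Hypothetical view of what we know about a registered user."""
    name: str
    created: datetime
    accepted_reports: int


def whitelist_points(user, now, min_age=timedelta(days=180), bonus=5):
    """Return positive points for "known good" users, 0 otherwise."""
    if user is None:
        # Anonymous submissions get no bonus.
        return 0
    if user.accepted_reports > 0 or now - user.created >= min_age:
        return bonus
    return 0
```

The bonus would simply be added to whatever the other filters hand out, so a long-standing user doing a mass edit is less likely to drop below the rejection threshold.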
Personally I'd prefer the TracSpamFilter plugin. In contrast to the global and
crude nature of the mod_security filtering, SpamFilter gives fine-grained
control over the content rules. On the maintenance side, the established trac
community development wins for me over the non-standard, non-redistributable
mod_security.
Of course both alternatives might have a global impact on usability (to be
weighed against the general disturbance caused by spam reports, and thus the
image of the project), so comments/second opinions/encouragement from other
members of puppet-dev would be appreciated. :)
[Date: Sun, 12 Mar 2006 18:05:52 -0800] [ftpmaster: Jeroen van Wolffelaar]
Removed the following packages from unstable:
libapache-mod-security | 1.8.7-1 | source, alpha, arm, hppa, i386, ia64,
m68k, mips, mipsel, powerpc, s390, sparc
libapache2-mod-security | 1.8.7-1 | alpha, arm, hppa, i386, ia64, m68k,
mips, mipsel, powerpc, s390, sparc
mod-security-common | 1.8.7-1 | all
Closed bugs: 352344
------------------- Reason -------------------
RoM; undistributable for legal reasons
 I have _no_ fracking idea what that is
 Blog Spam Blocklist: http://bsb.empty.us/
 SpamCop message-body URI domains: http://www.surbl.org/lists.html#sc
The primary freedom of open source is not the freedom from cost, but the free-
dom to shape software to do what you want. This freedom is /never/ exercised
without cost, but is available /at all/ only by accepting the very different
costs associated with open source, costs not in money, but in time and effort.