1 Aug 2010 06:00
URIBL update
Jared Johnson <jjohnson <at> efolder.net>
2010-08-01 04:00:34 GMT
I've made a bunch more changes to the uribl plugin locally; man, we
_really_ need to get some kind of svn-to-git thing going. Or at least I
need to re-educate myself on git and start putting things in my github
again. If I haven't managed that by the time (soon) things have settled
down a bit and we have some production testing, I'll submit a new version
to the list again; hopefully by then I'll have found time to get git
going again and can point people to that instead.
I got permission from Dallas <at> URIBL to use the datafeed data, but also
got his opinion on the matter, which is that using tld_lists the way I am
is not going to gain much and introduces the risk that a new spammer
'haven' could be missed entirely. After talking with my team, we're going
to go with a full TLD list for now, and perhaps later we'll collect our
own stats to verify Dallas is right about the tiny benefit (he probably
is). tld_lists has been updated to reflect this, though if anyone feels
bolder than I do and wants updates to the 'pruned' list, let me know.
I modified the parse_mime plugin as discussed previously on the list; the
uribl plugin now does isa_plugin('mime_parser') and parses lazily.
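
For anyone who wants the lazy-parsing idea spelled out, here is a rough
sketch. It is in Python rather than qpsmtpd's Perl, and every class and
method name in it (MimeParser, Uribl, hosts, parse_count) is hypothetical,
not the actual plugin code. The point is only that the child inherits the
parser and the message is parsed on first access, then cached.

```python
import re
from email import message_from_string
from email.message import Message
from typing import List, Optional


class MimeParser:
    """Hypothetical base plugin: parses the message only on first use."""

    def __init__(self, raw: str) -> None:
        self._raw = raw
        self._parsed: Optional[Message] = None
        self.parse_count = 0  # lets us observe that parsing happens once

    @property
    def mime(self) -> Message:
        # Parse lazily and cache the result for later callers.
        if self._parsed is None:
            self._parsed = message_from_string(self._raw)
            self.parse_count += 1
        return self._parsed


class Uribl(MimeParser):
    """Hypothetical child plugin that inherits the lazy parser."""

    def hosts(self) -> List[str]:
        # Collect hostnames from http(s) URLs found in text parts.
        found: List[str] = []
        for part in self.mime.walk():
            if part.get_content_maintype() != "text":
                continue
            payload = part.get_payload(decode=True) or b""
            text = payload.decode("utf-8", errors="replace")
            found.extend(re.findall(r"https?://([A-Za-z0-9.-]+)", text))
        return found
```

A message that is rejected before any URI check runs never pays the
parsing cost, since nothing touches the mime property.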
I'll probably remove 'semicolon munging detection'; as Devin said, if real
(current) data doesn't show it's being used, why bother? I'd like to go
over a larger sampling of current data first though, which I plan to do
soon.
I've re-arranged the code slightly to allow not only the async plugin but
our own local plugin to easily take advantage of plugin inheritance to
avoid code duplication. Our own plugin is now just 40 lines or so, thus
it gets to inherit the other 600 lines of uribl without any forking :)
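
The inheritance arrangement looks roughly like the sketch below. Again
this is Python for illustration, not the Perl plugin itself, and the names
(Uribl, LocalUribl, lookup, check) are made up for the example. The shared
logic lives in the parent; the local variant overrides a single step.

```python
from typing import Iterable, List


class Uribl:
    """Hypothetical stand-in for the shared plugin logic."""

    def lookup(self, host: str) -> bool:
        # Placeholder: a real plugin would query a DNS blocklist here.
        return False

    def check(self, hosts: Iterable[str]) -> List[str]:
        # Shared driver: return the hosts that the lookup step flags.
        return [h for h in hosts if self.lookup(h)]


class LocalUribl(Uribl):
    """Hypothetical local variant: overrides only the lookup step."""

    def __init__(self, blocked: Iterable[str]) -> None:
        self.blocked = set(blocked)

    def lookup(self, host: str) -> bool:
        # Consult a local set instead of the parent's lookup.
        return host in self.blocked
```

The subclass stays small because check() and anything else in the parent
is reused unchanged.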