Evan Prodromou | 3 Feb 2008 21:55

Re: Site duplication... is this widespread?


On Tue, 2007-11-20 at 10:15 -0600, Tilghman Lesher wrote:

> Given that you control the CSS, you have a perfect opportunity to make his
> entire site un-viewable, simply by changing the CSS when the referer is set
> externally (white text on white background?)

My experience running Open Content sites is that people who create
mirrors generally have good intentions and would like to retain a
positive relationship. They don't abuse your bandwidth because they want
to screw you over; they do it because they think you're so big you won't
care.

Usually a quick email to the registered domain owner has been enough to
see some changes on the site -- reduced hotlinking, better attribution,
some differentiation of brand.

-Evan

_______________________________________________
linux-elitists mailing list
linux-elitists <at> zgp.org
http://allium.zgp.org/cgi-bin/mailman/listinfo/linux-elitists

Gerald Oskoboiny | 11 Feb 2008 08:06

web server software for tarpitting?

The other day we posted an article [1] about excessive traffic
for DTD files on www.w3.org: up to 130 million requests/day, with
some IP addresses re-requesting the same files thousands of times
per day. (up to 300k times/day, rarely)

The article goes into more details for those interested, but the
solution I'm thinking will work best (suggested by Don Marti
among others) is to tarpit the offenders.

I just followed up on slashdot [2] about the implementation I
have in mind, but that thread is pretty stale and this is
probably a better place to ask anyway, so:

Does anyone have specific web server software to recommend that
is able to keep tens of thousands of concurrent connections open
on a typical cheap Linux box? (Lighttpd? Nginx? Varnish? Yaws?)
It also needs to be able to proxy other requests to an Apache
server running elsewhere.

Bonus marks for:

   - ability to do content negotiation
   - ability to set different delays for different IP addresses
   - HTTP compliance
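As a rough illustration of the application-layer approach being asked
about, here is a minimal Python sketch using asyncio, where each held
connection costs one idle coroutine rather than a thread or process.
It is not one of the servers named above. The DELAYS table, the
addresses in it, the 503 response, and the names delay_for, handle and
main are all assumptions made for the example. Proxying to a backend
Apache and content negotiation are left out.

```python
import asyncio

# Hypothetical per-IP delays in seconds (documentation-range addresses).
# Addresses not listed here are answered without any added delay.
DELAYS = {"192.0.2.10": 30.0, "192.0.2.11": 120.0}

BODY = b"Too many requests; please cache this resource locally.\n"
# Assumed response for the sketch: a plain 503 with a Retry-After hint.
RESPONSE = (
    b"HTTP/1.1 503 Service Unavailable\r\n"
    b"Retry-After: 3600\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: " + str(len(BODY)).encode("ascii") + b"\r\n"
    b"Connection: close\r\n"
    b"\r\n" + BODY
)


def delay_for(ip, delays=DELAYS, default=0.0):
    """Return how many seconds to hold a connection from this address."""
    return delays.get(ip, default)


async def handle(reader, writer):
    """Read one request head, wait out the per-IP delay, then answer."""
    peer = writer.get_extra_info("peername")
    ip = peer[0] if peer else ""
    try:
        await reader.readuntil(b"\r\n\r\n")
        # While sleeping, the connection only occupies a socket and a
        # suspended coroutine; the event loop keeps serving other clients.
        await asyncio.sleep(delay_for(ip))
        writer.write(RESPONSE)
        await writer.drain()
    except (asyncio.IncompleteReadError, asyncio.LimitOverrunError,
            ConnectionError):
        pass  # client went away or sent an oversized/garbled head
    finally:
        writer.close()


async def main(host="127.0.0.1", port=8080):
    """Serve until cancelled; host and port are placeholder values."""
    server = await asyncio.start_server(handle, host, port)
    async with server:
        await server.serve_forever()
```

Running it is a matter of calling asyncio.run(main()). Holding tens of
thousands of connections this way would also need the process's open
file limit raised, which the sketch does not do.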

I'll research this myself as well, I'm just wondering if anyone
has recommendations based on experience.

thanks!


Robert Edmonds | 11 Feb 2008 19:38

Re: web server software for tarpitting?

On 2008-02-11, Gerald Oskoboiny <gerald <at> impressive.net> wrote:
> The other day we posted an article [1] about excessive traffic
> for DTD files on www.w3.org: up to 130 million requests/day, with
> some IP addresses re-requesting the same files thousands of times
> per day. (up to 300k times/day, rarely)
>
> The article goes into more details for those interested, but the
> solution I'm thinking will work best (suggested by Don Marti
> among others) is to tarpit the offenders.

I have no experience with application layer tarpitting, but for
extremely persistent IP addresses I'd suggest TCP zero window tarpitting
-- this can hang a TCP connection for 12-24 minutes or so with only a
few packets.  Check out the iptables TARPIT and ipset modules; relevant
Debian packages are netfilter-extensions and ipset.

-- 
Robert Edmonds
edmonds <at> debian.org

Evan Prodromou | 12 Feb 2008 18:37

Re: web server software for tarpitting?


On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
> The other day we posted an article [1] about excessive traffic
> for DTD files on www.w3.org: up to 130 million requests/day, with
> some IP addresses re-requesting the same files thousands of times
> per day. (up to 300k times/day, rarely)
> 
> The article goes into more details for those interested, but the
> solution I'm thinking will work best (suggested by Don Marti
> among others) is to tarpit the offenders.

...and not punish everybody else, right?

>      W3C's current traffic is something like:
> 
>        - 66% DTD/schema files (.dtd/ent/mod/xsd)
>        - 25% valid HTML/CSS/WAI icons
>        - 9% other

It sounds like W3C has been having a problem satisfying its promises,
then. When you publicize an URL, like a DTD or schema, you're giving
some tacit permission to use that URL. Why are you now trying to
penalize those people who actually bought the story and are using the
URL?

It seems to me the way to solve your problem is to:

     1. Clarify and publicize best practises for using W3C resources
        into a server use policy. How often is it OK to hit a W3C-hosted
        DTD? Once a day? Once an hour? Once a minute?

Gerald Oskoboiny | 12 Feb 2008 19:33

Re: web server software for tarpitting?

* Evan Prodromou <evan <at> prodromou.name> [2008-02-12 12:37-0500]
>On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
>> The other day we posted an article [1] about excessive traffic
>> for DTD files on www.w3.org: up to 130 million requests/day, with
>> some IP addresses re-requesting the same files thousands of times
>> per day. (up to 300k times/day, rarely)
>>
>> The article goes into more details for those interested, but the
>> solution I'm thinking will work best (suggested by Don Marti
>> among others) is to tarpit the offenders.
>
>...and not punish everybody else, right?

Right, just punish those who are abusive.

>>      W3C's current traffic is something like:
>>
>>        - 66% DTD/schema files (.dtd/ent/mod/xsd)
>>        - 25% valid HTML/CSS/WAI icons
>>        - 9% other
>
>It sounds like W3C has been having a problem satisfying its promises,
>then. When you publicize an URL, like a DTD or schema, you're giving
>some tacit permission to use that URL.

Yes, but a single IP address re-fetching the same URL thousands
or hundreds of thousands of times a day seems excessive.
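One way to find such clients is to count requests per (IP address,
URL) pair over a day of access logs. The following Python sketch
assumes Common Log Format input covering a single day; the function
name repeat_offenders and the default threshold of 1000 are
illustrative choices, not anything W3C is known to use.

```python
from collections import Counter


def repeat_offenders(log_lines, threshold=1000):
    """Return ((ip, url), count) pairs seen at least `threshold` times.

    Expects Common Log Format lines such as:
      192.0.2.10 - - [11/Feb/2008:08:06:00 -0800] "GET /x.dtd HTTP/1.1" 200 512
    Lines that do not parse are skipped.
    """
    counts = Counter()
    for line in log_lines:
        parts = line.split('"')
        if len(parts) < 3:
            continue  # no quoted request field
        prefix = parts[0].split()   # ip, ident, user, [date, tz]
        request = parts[1].split()  # method, url, protocol
        if not prefix or len(request) < 2:
            continue
        counts[(prefix[0], request[1])] += 1
    # most_common() sorts by count, highest first.
    return [(key, n) for key, n in counts.most_common() if n >= threshold]
```

Feeding it one day's log, for example with open("access.log") as f
and repeat_offenders(f), gives a list that could drive whatever
per-address policy is chosen. The file name here is a placeholder.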

>It seems to me the way to solve your problem is to:
>

Aaron Sherman | 12 Feb 2008 20:07

Re: web server software for tarpitting?

Evan Prodromou wrote:
> On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
>   
>
>>      W3C's current traffic is something like:
>>
>>        - 66% DTD/schema files (.dtd/ent/mod/xsd)
>>        - 25% valid HTML/CSS/WAI icons
>>        - 9% other
>>     
>
> It sounds like W3C has been having a problem satisfying its promises,
> then. When you publicize an URL, like a DTD or schema, you're giving
> some tacit permission to use that URL. Why are you now trying to
> penalize those people who actually bought the story and are using the
> URL?
>   

"Using" is the term in question here. We're not talking about 
constructive access to these URLs, but poorly implemented software that 
is resorting to polling W3C where it should not. Well-behaved software 
does not do this. Why should W3C foot the bill for poorly behaved software?

Making access to W3C degrade performance of poorly written software is a 
fine way to deal with this. Such software can be trivially fixed to 
avoid the degradation.
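To make the "trivially fixed" point concrete, here is a minimal Python
sketch of the client-side fix: keep a fetched DTD in a local cache and
reuse it instead of asking the server again. The class name
CachingFetcher, the one-day max_age, and the injected fetch function
are assumptions for illustration; real software might instead ship the
DTDs locally or honour HTTP caching headers.

```python
import time


class CachingFetcher:
    """Wrap a fetch(url) -> bytes callable with a simple in-memory cache."""

    def __init__(self, fetch, max_age=86400.0, clock=time.monotonic):
        self._fetch = fetch        # the function that really hits the network
        self._max_age = max_age    # seconds a cached copy stays fresh
        self._clock = clock        # injectable so the cache can be tested
        self._cache = {}           # url -> (fetched_at, body)

    def get(self, url):
        """Return the body for url, contacting the server only when stale."""
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._max_age:
            return hit[1]
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body
```

With urllib.request imported, something like
CachingFetcher(lambda url: urllib.request.urlopen(url).read()) would
turn thousands of daily requests for the same DTD into one.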

> It seems to me the way to solve your problem is to:
>
>      1. Clarify and publicize best practises for using W3C resources

Greg Folkert | 12 Feb 2008 21:07

Re: web server software for tarpitting?


On Tue, 2008-02-12 at 10:33 -0800, Gerald Oskoboiny wrote:
> * Evan Prodromou <evan <at> prodromou.name> [2008-02-12 12:37-0500]
> >On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
> >> The other day we posted an article [1] about excessive traffic
> >> for DTD files on www.w3.org: up to 130 million requests/day, with
> >> some IP addresses re-requesting the same files thousands of times
> >> per day. (up to 300k times/day, rarely)
> >>
> >> The article goes into more details for those interested, but the
> >> solution I'm thinking will work best (suggested by Don Marti
> >> among others) is to tarpit the offenders.
> >
> >...and not punish everybody else, right?
> 
> Right, just punish those who are abusive.
> 
> >>      W3C's current traffic is something like:
> >>
> >>        - 66% DTD/schema files (.dtd/ent/mod/xsd)
> >>        - 25% valid HTML/CSS/WAI icons
> >>        - 9% other
> >
> >It sounds like W3C has been having a problem satisfying its promises,
> >then. When you publicize an URL, like a DTD or schema, you're giving
> >some tacit permission to use that URL.
> 
> Yes, but a single IP address re-fetching the same URL thousands
> or hundreds of thousands of times a day seems excessive.
> 

Karsten M. Self | 12 Feb 2008 22:21

Re: web server software for tarpitting?

on Tue, Feb 12, 2008 at 10:33:17AM -0800, Gerald Oskoboiny (gerald <at> impressive.net) wrote:
> * Evan Prodromou <evan <at> prodromou.name> [2008-02-12 12:37-0500]
> >On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
> >> The other day we posted an article [1] about excessive traffic
> >> for DTD files on www.w3.org: up to 130 million requests/day, with
> >> some IP addresses re-requesting the same files thousands of times
> >> per day. (up to 300k times/day, rarely)
> >>
> >> The article goes into more details for those interested, but the
> >> solution I'm thinking will work best (suggested by Don Marti
> >> among others) is to tarpit the offenders.
> >
> >...and not punish everybody else, right?
> 
> Right, just punish those who are abusive.

<...>

> I'd expect most medium/large sites have some kind of defensive
> measures in place to deal with abuse. Google and Wikipedia block all
> access from generic user-agents like Java/x and Python-urllib/x.

Just out of curiosity, what's the user-agent distribution on this
traffic?

  - Are we dealing with poorly-written end-user software (browsers and
    the like)?  Are these typically proprietary or Free Software?  Are
    you getting hammered on account of legacy proprietary software,
    non-standards-compliant FS tools?
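For anyone wanting to answer the user-agent question from their own
logs, a minimal Python sketch follows. It assumes Apache "combined"
format, where the user-agent is the last quoted field, and it groups
agents by the product name before the first "/" so that version
numbers collapse together. The function name is made up for the
example, and agents containing escaped quotes will be miscounted.

```python
from collections import Counter


def user_agent_distribution(log_lines):
    """Return (product, count, share) tuples, most frequent first.

    Expects combined-format lines such as:
      192.0.2.10 - - [date] "GET /x.dtd HTTP/1.1" 200 512 "-" "Java/1.6.0_03"
    Lines without a user-agent field are skipped.
    """
    counts = Counter()
    for line in log_lines:
        # Splitting on quotes yields: prefix, request, status/size,
        # referer, separator, user-agent, remainder.
        parts = line.split('"')
        if len(parts) < 6:
            continue
        agent = parts[5].strip() or "-"
        product = agent.split("/", 1)[0]  # "Java/1.6.0_03" -> "Java"
        counts[product] += 1
    total = sum(counts.values())
    return [(name, n, n / total) for name, n in counts.most_common()]
```

Grouping on the product name is coarse: every browser that
identifies itself as "Mozilla/..." lands in one bucket, so a finer
breakdown would need to look inside the parenthesised comment.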


James Sparenberg | 14 Feb 2008 02:29

Re: web server software for tarpitting?

On Tuesday 12 February 2008 10:33:17 Gerald Oskoboiny wrote:
> * Evan Prodromou <evan <at> prodromou.name> [2008-02-12 12:37-0500]
>
> >On Sun, 2008-02-10 at 23:06 -0800, Gerald Oskoboiny wrote:
> >> The other day we posted an article [1] about excessive traffic
> >> for DTD files on www.w3.org: up to 130 million requests/day, with
> >> some IP addresses re-requesting the same files thousands of times
> >> per day. (up to 300k times/day, rarely)
> >>
> >> The article goes into more details for those interested, but the
> >> solution I'm thinking will work best (suggested by Don Marti
> >> among others) is to tarpit the offenders.
> >
> >...and not punish everybody else, right?
>
> Right, just punish those who are abusive.
>
> >>      W3C's current traffic is something like:
> >>
> >>        - 66% DTD/schema files (.dtd/ent/mod/xsd)
> >>        - 25% valid HTML/CSS/WAI icons
> >>        - 9% other
> >
> >It sounds like W3C has been having a problem satisfying its promises,
> >then. When you publicize an URL, like a DTD or schema, you're giving
> >some tacit permission to use that URL.
>
> Yes, but a single IP address re-fetching the same URL thousands
> or hundreds of thousands of times a day seems excessive.


Tony Godshall | 14 Feb 2008 18:55

Re: web server software for tarpitting?

Hi.

Could we clear something up?

I've always understood tarpitting to mean slowing down an attacker as
much as possible by stringing along their TCP connections for as long
as possible.  Specifically, I've seen it done against spammers (or
spammerbots), since it reduces spam to oneself and keeps them (at least
the single-threaded ones, I guess) from moving on to the next victim.
I.e., hold them for as long as possible[1]

It sounds, though, as if you want something different, which I
would call adaptive metering, i.e. slowing down the connections of IP
addresses that are causing too much traffic so that they are no longer
harmful.  Drop enough packets and the backoff kicks in[2], and their
traffic rate drops to "acceptable" levels.

By the way, there's really nothing unusual about having large amounts
of traffic coming from a single IP: many large organizations hide huge
numbers of machines behind a single IP.  Of course, they ought to be
running a transparent cache at their NAT point so repeats to the same
URL are cut way way back.  Presumably these organizations are not lost
causes and could be "taught".  And it sounds like you don't want to
disconnect them, just limit their harm.

So, do you really mean tarpitting/teergruben or do you want adaptive
metering?  And if the latter, are you interested in limiting the data
rate per TCP connection, or are you interested in limiting the number
of connections per minute per IP address?  Either has a much simpler
solution than a "retaliatory" one like tarpitting.
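To illustrate the second option, limiting connections per minute per
IP address, here is a minimal Python sketch of a sliding-window
counter. The class name PerIPLimiter and the default of 60
connections per 60 seconds are assumptions for the example; a
production setup would more likely do this in the firewall or the
front-end server than in application code.

```python
import time
from collections import defaultdict, deque


class PerIPLimiter:
    """Allow at most max_per_window connections per IP in any window."""

    def __init__(self, max_per_window=60, window=60.0, clock=time.monotonic):
        self._max = max_per_window
        self._window = window            # window length in seconds
        self._clock = clock              # injectable for testing
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted hits

    def allow(self, ip):
        """Return True and record the hit if ip is under its limit."""
        now = self._clock()
        hits = self._hits[ip]
        # Forget hits that have aged out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._max:
            return False
        hits.append(now)
        return True
```

A server would call allow(ip) on each new connection and refuse or
delay the ones that return False. Note that the table grows with the
number of distinct addresses seen, so a long-running process would
need to prune idle entries.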