[NBLUG/talk] webserver abuse

Troy Arnold troy at zenux.net
Mon Jan 24 16:18:10 PST 2005


On Mon, Jan 24, 2005 at 03:22:07PM -0800, Bob Blick wrote:
> 
> But I think it's time to switch to all-PHP for everything, and do some
> quotas.  What with brain-dead robots last month spidering the same file
> over and over, this is bumming me out.
> 
> Does anyone have any ideas how best to do this, or links?

I wrote something like this several years ago when I worked
for a content site that was constantly being hit by rude bots.

I'm too embarrassed to post it because:
1) it's really old
2) I didn't know what the hell I was doing
3) I'd do it differently now.

But the basic idea was:
You don't want a bunch of code at the top of each page, you want a
self-contained include file.
If you use PHP's auto_prepend_file setting, then you don't have to modify
all your pages.

The flow was:
1) check if remote ip is automatically allowed (i.e. someone from the
   office, or a legit bot.  Back in the day, altavista's scooter was
   really rude frequency-wise, but we let them in anyway because we wanted to
   be indexed)
2) if not always allowed, log the ip and a timestamp
3) check if the remote ip has exceeded X hits per Y amount of time,
    or X hits per day total.
4) if not allowed {
       generate a 403 header and a message telling them they're lame.
       exit();
    } else {
        continue as normal
    }

You could, perhaps, store the offending IPs somewhere and have a
cronjob come along and add an iptables rule for them.  Obviously, this
would have to run as root, and you'd want to seriously validate it
before letting root touch it.
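
As a sketch of what "seriously validate" might look like, the function
below takes untrusted lines of text and only builds an iptables command
for lines that parse as a single IP address. The names (NEVER_BLOCK,
iptables_commands) are hypothetical, and appending a DROP rule to the
INPUT chain is just one possible policy. It returns argument lists and
does not run anything itself.

```python
import ipaddress

# Hypothetical list of addresses that must never be firewalled.
NEVER_BLOCK = {ipaddress.ip_address("192.0.2.10")}


def iptables_commands(lines):
    """Turn untrusted lines of text into iptables argument lists.

    Anything that does not parse as exactly one IP address is skipped.
    """
    commands = []
    seen = set()
    for line in lines:
        try:
            addr = ipaddress.ip_address(line.strip())
        except ValueError:
            continue  # not an IP address: never hand it to root
        # Skip loopback, protected addresses, and duplicates.
        if addr.is_loopback or addr in NEVER_BLOCK or addr in seen:
            continue
        seen.add(addr)
        tool = "iptables" if addr.version == 4 else "ip6tables"
        # Use the normalized address, not the raw input text.
        commands.append([tool, "-A", "INPUT", "-s", str(addr), "-j", "DROP"])
    return commands
```

The cronjob could pass each list to something like subprocess.run()
without a shell, so a junk entry in the file never gets interpreted as a
command.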


You may find something lower-level at:
http://modules.apache.org/

I'm not sure if mod_throttle can do anything on a per-IP-address basis,
but it's worth checking out.

-troy



