Download of entire web site
David White
cavefish at pacbell.net
Sun Oct 6 01:38:49 PDT 2002
Has anyone on this list heard of web crawlers that download an entire
site - including hundreds of megabytes of images and thousands of
dynamically generated pages? It's almost as if someone were trying to
set up a duplicate site.
The download apparently came from one computer - but it used 4 IP
addresses, each with a different User-Agent: Win95, Mac_PowerPC, Win2000,
and Konqueror. The session cookie was the same for all 4 IP addresses,
suggesting that it was a single computer - and the different User-Agent
strings suggest that it was trying to make itself less conspicuous.
I've blocked the 4 addresses.
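For anyone who wants to check their own logs for the same pattern, here
is a rough sketch in Python of the idea: group requests by session
cookie and flag any cookie seen from more than one IP address. It
assumes the log has already been parsed into (ip, user_agent,
session_id) tuples. The function name, the sample addresses, and the
session IDs are all made up for illustration, not taken from my server.

```python
from collections import defaultdict


def find_shared_sessions(records, min_ips=2):
    """Return sessions whose cookie was seen from at least min_ips addresses.

    records is an iterable of (ip, user_agent, session_id) tuples.
    This tuple layout is an assumption; adapt it to your own log format.
    """
    ips = defaultdict(set)
    agents = defaultdict(set)
    for ip, user_agent, session_id in records:
        ips[session_id].add(ip)
        agents[session_id].add(user_agent)
    # Keep only the sessions that appeared from several addresses.
    return {
        sid: {"ips": sorted(ips[sid]), "agents": sorted(agents[sid])}
        for sid in ips
        if len(ips[sid]) >= min_ips
    }


# Hypothetical sample data (documentation-range addresses, invented cookies).
records = [
    ("192.0.2.1", "Win95", "abc123"),
    ("192.0.2.2", "Mac_PowerPC", "abc123"),
    ("192.0.2.3", "Win2000", "abc123"),
    ("192.0.2.4", "Konqueror", "abc123"),
    ("198.51.100.7", "Win2000", "zzz999"),
]

suspicious = find_shared_sessions(records)
for sid, seen in suspicious.items():
    print(sid, seen["ips"], seen["agents"])
```

A shared cookie is only a hint, not proof: a visitor behind a pool of
proxies can legitimately show up from several addresses, though the
User-Agent would normally stay the same in that case.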
This doesn't seem like legitimate web crawler behavior. Has anyone
encountered this before? I'm worried that someone is up to something
malicious - but so far I can't figure out what.
I'm sorry this isn't directly related to Linux - except that my server
is running Linux.
Thanks in advance,
David White
cavefish at pacbell.net
(I've been monitoring this list for years, but I can't remember whether
I've posted before.)