Not all spiders read the robots.txt file, nor do they all bother to advertise
their identification string. Suppose you notice that your site is being
hammered by something that is draining your resources and will not go
away.
To the host that is causing the problem, the trick described below makes it
appear that your site is no longer on the air! Of course, make certain that
you are not locking out a legitimate search engine.
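Before blocking anything, it helps to confirm which address is actually doing the damage. The following is a minimal sketch, assuming your web server writes an Apache-style access log in which the client address is the first field. The function name top_clients is a hypothetical helper, not part of the original setup.

```shell
# Sketch: count requests per client address in an access log.
# ASSUMPTION: Apache common/combined log format, client IP in field 1.
# top_clients is a hypothetical helper name.
top_clients() {
    log="$1"          # path to the access log
    n="${2:-10}"      # how many of the busiest addresses to show
    # Print field 1, count identical addresses, busiest first.
    awk '{ print $1 }' "$log" | sort | uniq -c | sort -rn | head -n "$n"
}
```

For example, `top_clients /var/log/httpd/access_log 5` lists the five busiest clients (that path is the usual Apache location on Red Hat, but check your own configuration). Look up a suspect address with `host` before blocking it, so that you do not shut out a legitimate search engine.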
Here is the BLACKHOLE trick. The idea behind a BLACKHOLE is to route
everything your server sends to a particular computer through an inactive
IP address. The offending host's requests still arrive, but your replies
never get back to it, so none of its connections can complete.
The syntax varies between systems, so consult your route man page, but this
is what I use on Red Hat Linux:
/sbin/route -n add -host 123.35.67.89 gw 168.1.0.219
DO NOT USE THE NUMBERS ABOVE; THEY ARE ONLY AN EXAMPLE!
The first address is that of the offending host. The second is an inactive
IP address on your local network. It must be inactive, so that no real
machine receives the traffic, and it must be on your local network, so that
route will accept it as a gateway.
You must also be root, or otherwise be able to run administrative programs,
to do this.
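The route command can be wrapped in a small shell function that checks both addresses before touching the routing table. This is a sketch under assumptions: blackhole_add, is_ipv4 and the DRY_RUN switch are hypothetical names, and the addresses in the usage line are documentation placeholders, not values to use.

```shell
# Sketch: validate the two addresses, then add the blackhole route.
# blackhole_add / is_ipv4 / DRY_RUN are hypothetical names.

# Succeeds if $1 is a dotted quad with every octet in 0..255.
is_ipv4() {
    printf '%s\n' "$1" | awk -F. '
        NF != 4 { bad = 1 }
        { for (i = 1; i <= NF; i++) if ($i !~ /^[0-9]+$/ || $i + 0 > 255) bad = 1 }
        END { exit (bad ? 1 : 0) }'
}

blackhole_add() {
    offender="$1"   # address of the misbehaving host
    dead_gw="$2"    # unused address on your local network
    if ! is_ipv4 "$offender" || ! is_ipv4 "$dead_gw"; then
        echo "usage: blackhole_add OFFENDER_IP UNUSED_LOCAL_IP" >&2
        return 2
    fi
    # With DRY_RUN=1, only print the command that would be run.
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "/sbin/route -n add -host $offender gw $dead_gw"
        return 0
    fi
    # Changing the routing table requires root.
    if [ "$(id -u)" -ne 0 ]; then
        echo "blackhole_add: must be run as root" >&2
        return 1
    fi
    /sbin/route -n add -host "$offender" gw "$dead_gw"
}
```

`DRY_RUN=1 blackhole_add 203.0.113.7 192.168.1.250` prints the command without running it; drop `DRY_RUN=1` to apply it as root. The function cannot tell whether the second address is really unused; that check is still up to you.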
A route added this way disappears when the machine reboots. If you want to
make the BLACKHOLE for this address permanent, add the same command to your
/etc/rc.d/rc.local file.
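If you blackhole several hosts over time, editing rc.local by hand makes duplicate entries easy. A minimal sketch, assuming the Red Hat path given in the text (the RC_LOCAL variable can override it) and a hypothetical helper named blackhole_persist:

```shell
# Sketch: append the blackhole route to the start-up script only once.
# RC_LOCAL defaults to the Red Hat path; blackhole_persist is hypothetical.
RC_LOCAL="${RC_LOCAL:-/etc/rc.d/rc.local}"

blackhole_persist() {
    line="/sbin/route -n add -host $1 gw $2"
    # -q quiet, -x whole-line match, -F fixed string (dots are literal)
    if grep -qxF "$line" "$RC_LOCAL" 2>/dev/null; then
        return 0    # entry already present, nothing to do
    fi
    printf '%s\n' "$line" >> "$RC_LOCAL"
}
```

Run it as root, e.g. `blackhole_persist 203.0.113.7 192.168.1.250` (placeholder addresses). If your rc.local ends with an `exit 0` line, an appended command would never run, so move the entry above that line.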
TO REMOVE THE BLACKHOLE
If you added the route command to /etc/rc.d/rc.local or an equivalent
start-up script, remove it from there. Then issue the command:
/sbin/route -n del -host 123.35.67.89 gw 168.1.0.219
This will remove the BLACKHOLE.
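Both removal steps can be combined in one function. This is a sketch: blackhole_remove and DRY_RUN are hypothetical names, RC_LOCAL defaults to the Red Hat path from the text, and only a line that exactly matches the add command for that address pair is deleted.

```shell
# Sketch: remove the start-up entry, then delete the live route.
# blackhole_remove / DRY_RUN are hypothetical; RC_LOCAL is overridable.
RC_LOCAL="${RC_LOCAL:-/etc/rc.d/rc.local}"

blackhole_remove() {
    line="/sbin/route -n add -host $1 gw $2"
    if [ -f "$RC_LOCAL" ]; then
        tmp=$(mktemp) || return 1
        # Keep every line except an exact match. grep exits 1 when it
        # selects no lines, which is fine; any other failure leaves
        # rc.local untouched.
        if grep -vxF "$line" "$RC_LOCAL" > "$tmp" || [ $? -eq 1 ]; then
            cat "$tmp" > "$RC_LOCAL"   # rewrite in place, keep permissions
        fi
        rm -f "$tmp"
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "/sbin/route -n del -host $1 gw $2"
    else
        /sbin/route -n del -host "$1" gw "$2"
    fi
}
```

Run it as root, e.g. `blackhole_remove 203.0.113.7 192.168.1.250` (placeholder addresses). The file is rewritten with `cat` rather than replaced with `mv`, so rc.local keeps its owner and execute permission.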