Apache

How to stop Googlebot from scanning my site?

Someone on my forum asked this question. On my forum, you can ask questions and you have a good chance of getting an answer from me. Some of the questions will be featured on my blog.

Good day all.

On my server, Apache is logging entries like this:

66.249.65.109 - - [26/Jan/2007:10:49:46 -0500] "GET /tmp/logs/etc/?C=D;O=D HTTP/1.1" 200 965 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"

This is Googlebot, as the user-agent string above shows. I would like to know how to stop it from scanning some folders (e.g., /var/www/tmp/ and /var/www/opt/) and others. I don't know if this means editing robots.txt or something like that.

Thanks for your time

-kuzew.
2007-01-26
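If you want to check which hits in your own logs claim to be Googlebot, here is a minimal Python sketch. It assumes Apache's standard "combined" log format (the format of the line above); the names parse_line and claims_googlebot are made up for this example. Keep in mind that the User-Agent is whatever the client sends, so it can be faked.

```python
import re

# Apache "combined" log format (assumed): host, ident, user, [time],
# "request line", status, bytes, "referer", "user-agent".
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def claims_googlebot(entry):
    """True if the User-Agent says Googlebot (self-reported, can be spoofed)."""
    return "Googlebot" in entry["agent"]

line = ('66.249.65.109 - - [26/Jan/2007:10:49:46 -0500] '
        '"GET /tmp/logs/etc/?C=D;O=D HTTP/1.1" 200 965 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"')
entry = parse_line(line)
print(entry["path"], claims_googlebot(entry))  # /tmp/logs/etc/?C=D;O=D True
```

Note that the logged path is a URL path: if your DocumentRoot is /var/www (an assumption), /tmp/logs/etc/ is the folder /var/www/tmp/logs/etc/ on disk.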

This is how to do the job:

Create a file named robots.txt in the root of your website, so that it can be accessed at
www.yourwebsite.com/robots.txt
and put these lines in it:

User-Agent: Googlebot
Disallow: /

Googlebot reads this and will stop crawling anything on your site. I mean ANYTHING.
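You can test what the file does before uploading it. Here is a minimal sketch using Python's standard urllib.robotparser module; www.yourwebsite.com is the placeholder host from above and "SomeOtherBot" is a made-up agent name. Python's parser is simpler than Google's own (it does not understand * and $ wildcards inside paths), so treat this as a rough check, not a guarantee of what Googlebot will do.

```python
from urllib.robotparser import RobotFileParser

# The two lines from the robots.txt above.
rules = [
    "User-Agent: Googlebot",
    "Disallow: /",
]

parser = RobotFileParser()
parser.parse(rules)  # parse the lines directly instead of fetching a live file

# Googlebot is blocked from every URL on the site...
print(parser.can_fetch("Googlebot", "http://www.yourwebsite.com/"))                 # False
print(parser.can_fetch("Googlebot", "http://www.yourwebsite.com/forum/index.php"))  # False
# ...while a crawler with no matching User-Agent group is still allowed.
print(parser.can_fetch("SomeOtherBot", "http://www.yourwebsite.com/"))              # True
```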
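But if you only want Googlebot to stay out of certain directories, try this. The paths are URL paths relative to your site root, not paths on disk: if your DocumentRoot is /var/www, the folder /var/www/tmp/ is /tmp/.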

User-Agent: Googlebot
Disallow: /dirname/
Disallow: /dirname2/somesubdir/
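The same kind of check works for the directory rules. Again, this is only a sketch with Python's urllib.robotparser; /dirname/ and /dirname2/somesubdir/ are the example paths from above, and the page names under them are invented.

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-Agent: Googlebot",
    "Disallow: /dirname/",
    "Disallow: /dirname2/somesubdir/",
]

parser = RobotFileParser()
parser.parse(rules)

site = "http://www.yourwebsite.com"  # placeholder host from the text
# Hypothetical paths, each paired with the result we expect for Googlebot.
expected = {
    "/dirname/page.html": False,           # inside /dirname/
    "/dirname2/somesubdir/a.html": False,  # inside /dirname2/somesubdir/
    "/dirname2/other.html": True,          # rest of /dirname2/ stays crawlable
    "/index.html": True,                   # everything else stays crawlable
}
for path, want in expected.items():
    got = parser.can_fetch("Googlebot", site + path)
    print(path, "allowed" if got else "blocked")
    assert got == want, path
```

Matching is by prefix, so keep the trailing slash: Disallow: /dirname without it would also block something like /dirname-old.html.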

Also, if you replace Googlebot with *, the rules apply to all bots: well-behaved crawlers will stop crawling those parts of your site, but bad bots that ignore robots.txt will not. Bad bots should be stopped using bot traps.
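For the folders in the question, a sketch with * might look like this. It assumes your DocumentRoot is /var/www (so /var/www/tmp/ is served as /tmp/), and "AnyOtherBot" is a made-up agent name. Like the other checks, it only shows what a polite crawler should do.

```python
from urllib.robotparser import RobotFileParser

# "*" matches any crawler; the paths assume DocumentRoot /var/www.
rules = [
    "User-Agent: *",
    "Disallow: /tmp/",
    "Disallow: /opt/",
]

parser = RobotFileParser()
parser.parse(rules)

url = "http://www.yourwebsite.com/tmp/logs/etc/"
for agent in ["Googlebot", "AnyOtherBot"]:
    print(agent, parser.can_fetch(agent, url))  # False for both

# Pages outside /tmp/ and /opt/ are still allowed.
print(parser.can_fetch("Googlebot", "http://www.yourwebsite.com/index.html"))  # True
```

A bot that ignores robots.txt can still request /tmp/ directly, which is why the bot traps are needed.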
