Bots

Bots get the Honey Pot--bees will do the justice

Harvesters and comment spam bots? Time for them to get pwned!
Akismet and Bad Behavior are doing their part, but one more recruit won't hurt.
Project Honey Pot just planted a pot on my site, and it is waiting for bots to fly in and get screwed... screwed well.
The project lets you download a small honey pot script and upload it to your website. After setting it up, you put a link to the script in your pages that normal visitors can't see, so only the evil bots will follow it, and voilà...
This is what happens to harvesters
This is what happens to spam bots

Blog owners who do not have PHP or other server-side scripting support can sign up for a link from the project site, so they don't have to host the script themselves.
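
To make the hidden-link idea concrete, here is a minimal sketch in Python. This is not Project Honey Pot's actual script; the trap path, `hidden_link_html`, and `BotTrap` are all made-up names for illustration, and the only assumption is that a visitor who requests the invisible link is probably a bot.

```python
# Hypothetical trap URL; the real Project Honey Pot script lives wherever you upload it.
TRAP_PATH = "/trap/guestbook.php"


def hidden_link_html(path: str = TRAP_PATH) -> str:
    """Return a link that browsers do not display but crawlers still find in the HTML."""
    return f'<a href="{path}" style="display:none" rel="nofollow">&nbsp;</a>'


class BotTrap:
    """Remember every IP address that has requested the trap path."""

    def __init__(self, trap_path: str = TRAP_PATH) -> None:
        self.trap_path = trap_path
        self.flagged = set()

    def record(self, ip: str, path: str) -> bool:
        """Log one request; return True if this IP has ever hit the trap."""
        if path == self.trap_path:
            self.flagged.add(ip)
        return ip in self.flagged


trap = BotTrap()
trap.record("203.0.113.7", "/index.html")  # normal page, not flagged
trap.record("203.0.113.7", TRAP_PATH)      # followed the hidden link, flagged
```

Once an address is flagged, what you do with it (block it, feed it junk, report it) is up to you; the sketch only does the bookkeeping.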

MyCyberTwin

MyCyberTwin provides a service to clone yourself in cyberspace. But again, it's just a bot that goes on your MSN.
You train the bot to respond to each question with three different answers, for example "What is your name?" answered with "Bob", "Smith", or "Bot". The bot can also read RSS feeds and output RSS to the user.
Each month, the bot can hold 500 conversations with people for free, which is enough for the average person.

How to stop GoogleBot scan my site?

Someone in my forum asked this question. In my forum, you can ask questions and have a good chance of getting an answer from me. Some of the questions will be featured on my blog.

How to stop GoogleBot scan my site?

Good day all.

On my server, I'm getting logged from apache:

66.249.65.109 - - [26/Jan/2007:10:49:46 -0500] "GET /tmp/logs/etc/?C=D;O=D HTTP/1.1" 200 965 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"

Which is Google Bot (as seen above). I would like to know how to stop this scanning of some folders (ie,/var/www/tmp/ and /var/www/opt) and other ones. I don't if this means editing robot.txt or something like that.

Thanks for your time

-kuzew.
2007-01-26

This is how to do the job:

Make a robots.txt file at your website root, so that it can be accessed at
www.yourwebsite.com/robots.txt
and put these lines in it:

User-Agent: Googlebot
Disallow: /

Googlebot sees this and will stop scanning anything on your site... I mean ANYTHING.
But if you only want Googlebot to stop scanning some items, you can try this:

User-Agent: Googlebot
Disallow: /dirname/
Disallow: /dirname2/somesubdir/

Also, if you replace Googlebot with *, all bots, except bad bots that do not follow robots.txt, will stop crawling your site. Bad bots should be stopped using bot traps.
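
If you want to check the rules before uploading the file, Python's standard `urllib.robotparser` module can read them and tell you what a well-behaved bot would be allowed to fetch. This is only a quick sketch: the helper name `allowed` and the test paths are made up, and the directory names are the placeholders from the example above.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the partial-block example (placeholder directory names).
RULES = """\
User-Agent: Googlebot
Disallow: /dirname/
Disallow: /dirname2/somesubdir/
"""


def allowed(rules: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` according to the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


for path in ("/index.html", "/dirname/page.html", "/dirname2/somesubdir/x.html"):
    print(path, allowed(RULES, "Googlebot", path))
```

Keep in mind this only shows what a polite crawler would do. A bot that ignores robots.txt will ignore these rules too.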
