Archive - Jan 28, 2007


OpenDomains

OpenDomains is an open source PHP script that lets you become a free domain provider.
I don't know how it works, but I'm posting it here because it SOUNDS COOL.


Bad Behavior

Bad Behavior just WORKS! The Bad Behavior block count in the footer of the page shows 231. That is pretty powerful, considering I installed it only 2 days ago. There is no spam in the comments, and only one spam has been found. I have just added Bad Behavior to my forum as well, so let's see how it does there.

Bad Behavior works by checking the HTTP user agent against its database of bad bots and, if there is a match, blocking the bot entirely. Because this happens before the bot can even get into your site, it saves server time by not loading the entire page. I think every CMS should incorporate this system into its script, because the basic function needs only one line of code to be added.
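To make the idea concrete, here is a rough sketch in Python of a user agent check that runs before the page is built. This is not Bad Behavior's actual code or rule set. The names BAD_AGENT_FRAGMENTS, is_blocked, render_page and handle_request, and the blocklist entries, are made up for illustration.

```python
from typing import Tuple

# Hypothetical blocklist fragments, invented for this sketch.
# A real tool such as Bad Behavior ships its own, much larger rule set.
BAD_AGENT_FRAGMENTS = ("examplespambot", "examplescraper")


def is_blocked(user_agent: str) -> bool:
    """Return True if the user agent contains a known bad fragment."""
    agent = user_agent.lower()
    return any(fragment in agent for fragment in BAD_AGENT_FRAGMENTS)


def render_page() -> str:
    """Stand-in for the expensive work of building the full page."""
    return "<html>full page</html>"


def handle_request(user_agent: str) -> Tuple[int, str]:
    """Check the user agent first, so blocked bots never trigger render_page()."""
    if is_blocked(user_agent):
        return 403, "Forbidden"
    return 200, render_page()
```

Calling handle_request("ExampleSpamBot/1.0") returns a 403 without rendering anything, while an ordinary browser user agent gets the full page.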

How to stop Googlebot from scanning my site?

Someone on my forum asked this question. If you ask a question on my forum, you have a good chance of getting an answer from me, and some of the questions will be featured on my blog.

How to stop GoogleBot scan my site?

Good day all.

On my server, I'm getting this logged by Apache:

66.249.65.109 - - [26/Jan/2007:10:49:46 -0500] "GET /tmp/logs/etc/?C=D;O=D HTTP/1.1" 200 965 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)"

Which is Googlebot (as seen above). I would like to know how to stop it from scanning some folders (e.g., /var/www/tmp/ and /var/www/opt) and other ones. I don't know if this means editing robots.txt or something like that.

Thanks for your time

-kuzew.
2007-01-26

Here is how to do it:

Create a robots.txt file at your website root, so that it can be accessed at
www.yourwebsite.com/robots.txt
and put this in it:

User-Agent: Googlebot
Disallow: /

Googlebot sees this and will stop scanning anything on your site. I mean ANYTHING.
But if you only want Googlebot to stop scanning some items, you can try this:

User-Agent: Googlebot
Disallow: /dirname/
Disallow: /dirname2/somesubdir/

Also, if you replace Googlebot with *, all bots will stop crawling your site, except bad bots that do not follow robots.txt. Bad bots should be stopped using bot traps.
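If you want to check your rules before uploading them, one option is Python's standard urllib.robotparser module. The sketch below reuses the example directories from this post. The helper build_parser and the bot name SomeOtherBot are made up for illustration, and this parser only approximates how a real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it from a server."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Rules that only keep Googlebot out of two directories.
PARTIAL_RULES = """\
User-agent: Googlebot
Disallow: /dirname/
Disallow: /dirname2/somesubdir/
"""

# Rules that ask every well-behaved bot to stay out of the whole site.
WILDCARD_RULES = """\
User-agent: *
Disallow: /
"""

partial = build_parser(PARTIAL_RULES)
wildcard = build_parser(WILDCARD_RULES)

# Googlebot is kept out of the listed directories only.
print(partial.can_fetch("Googlebot", "/dirname/page.html"))     # False
print(partial.can_fetch("Googlebot", "/index.html"))            # True
# Other bots are not affected by a Googlebot-only section.
print(partial.can_fetch("SomeOtherBot", "/dirname/page.html"))  # True
# With *, every bot that honours robots.txt is asked to stay out.
print(wildcard.can_fetch("SomeOtherBot", "/index.html"))        # False
```

Keep in mind this only tells you what a polite bot would do. A bad bot can ignore the file completely.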

Honey Pot that kills bots