CrawlSnare
The Ultimate Bad Bot Blocking Script
CrawlSnare is a script I developed out of my own need for a powerful, scalable way to block bad internet bots, crawlers, scrapers, and other nasty non-human visitors from viewing and stealing content from my site. Over the course of my web design adventures I've learned the hard way that not all of my visitors are human. If you own or run a website, I'm sure the same is true for you. In fact, in some cases not even MOST of your visitors may be human.
Those visitors who are not human are computer-controlled automatons. Made by a programmer and set loose upon the internet to do that programmer's bidding, these bots crawl the web looking for their goal: content, e-mail addresses, personal information, copyright information, or any number of other things that could be on our websites.
As far as most webmasters are concerned, there are two types of these bots: good and bad. Good bots are bots like Googlebot, MSNBot, and Yahoo's Slurp. These bots only want to index your pages for inclusion in their search engines, which can only help you get more visitors. You want these bots to visit your site. Bad bots, on the other hand, are like a cancer. They spread across the internet without regard to borders, rules, protocols, bandwidth, or privacy. They hit your site, steal your content, harvest your e-mail address and personal information, and may even bring your site to its knees.
This has been going on for years, and if you're a webmaster you absolutely need to find a way to stop it from happening to you! That's why I wrote CrawlSnare: I needed a way to keep these nasty bots out of my sites once and for all. There are several other scripts on the internet for this purpose, but none of them offered the robustness and scalability I needed to protect all my sites from all types of bad bots. I needed something I could package and easily plug in to each site I administer, and I needed it to be versatile enough to use on a site with very few visitors or on one with thousands of visitors a day. I also decided from the very beginning to write the script so that it would be easy for others to implement on their sites. The more bad bots we block, the better, so I offer CrawlSnare to you here for free.
CrawlSnare Features:
Almost all features are configurable: time limits, ban limits, logging, notifications, etc.
Automatically bans bots which violate robots.txt directives
Bans bots which access your site too fast (access throttling)
Speed banning is fully configurable: ban bots for only a short time or ban them outright
Bans bots which request too many pages within a 24 hour period (slow scraper protection)
Custom 403 page with the option for a human visitor to get unbanned by solving a CAPTCHA
Administration panel: view all banned bots and perform administrative functions
Allows you to whitelist any bot or bots that you wish the script to ignore
Cleanup script to unban IPs after a configurable time period
Includes a blacklist that is automatically filled with repeat offenders
Adds a User Agent to .htaccess for banning, so you don't have to edit the file yourself
Filenames were chosen to be unusual so that bots don't expect them
Allows you to change the main program directory so that bots can't learn its location
Sends you e-mail updates whenever a bot is caught
Keeps logs of all program actions
Only requires a basic knowledge of PHP, Apache, and MySQL to implement
Fully commented settings file and extensive readme
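The throttling, 24-hour limit, robots.txt, whitelist, timed unban and repeat-offender features listed above can be illustrated with a small in-memory model. This is a hedged Python sketch, not CrawlSnare's actual PHP/MySQL code. The class name `BotTracker`, every parameter name and default, the trap path `/hypothetical-trap/`, and whitelisting by IP address are assumptions made for illustration. The robots.txt check assumes a common honeypot approach: robots.txt contains `Disallow: /hypothetical-trap/` and a hidden link points there. Well-behaved crawlers never request that path, so any client that does is ignoring the directive.

```python
import time
from collections import defaultdict, deque

DAY = 86400  # seconds in the 24-hour slow-scraper window


class BotTracker:
    """Hypothetical in-memory sketch of CrawlSnare-style ban rules."""

    def __init__(self, max_hits=10, window=5.0, daily_limit=1000,
                 ban_seconds=3600, repeat_limit=3,
                 trap_path="/hypothetical-trap/", whitelist=()):
        self.max_hits = max_hits           # speed ban: hits allowed per window
        self.window = window               # speed window length in seconds
        self.daily_limit = daily_limit     # slow scraper: hits allowed per 24 h
        self.ban_seconds = ban_seconds     # length of a temporary ban
        self.repeat_limit = repeat_limit   # bans before permanent blacklisting
        self.trap_path = trap_path         # path disallowed in robots.txt
        self.whitelist = set(whitelist)    # IPs that are never banned
        self.hits = defaultdict(deque)     # ip -> request times in the last 24 h
        self.banned_until = {}             # ip -> expiry time of a temporary ban
        self.ban_count = defaultdict(int)  # ip -> how many times it was banned
        self.blacklist = set()             # permanently banned IPs

    def _ban(self, ip, now):
        """Ban temporarily, or blacklist once the repeat limit is reached."""
        self.ban_count[ip] += 1
        self.hits.pop(ip, None)  # start counting afresh after the ban
        if self.ban_count[ip] >= self.repeat_limit:
            self.blacklist.add(ip)
        else:
            self.banned_until[ip] = now + self.ban_seconds

    def is_banned(self, ip, now):
        if ip in self.blacklist:
            return True
        expiry = self.banned_until.get(ip)
        if expiry is not None and now >= expiry:
            del self.banned_until[ip]  # lazy cleanup of an expired ban
            return False
        return expiry is not None

    def allow(self, ip, path, now=None):
        """Return True to serve the request, False to answer with a 403."""
        now = time.time() if now is None else now
        if ip in self.whitelist:
            return True
        if self.is_banned(ip, now):
            return False
        if path.startswith(self.trap_path):  # ignored a robots.txt Disallow
            self._ban(ip, now)
            return False
        hits = self.hits[ip]
        hits.append(now)
        while hits[0] <= now - DAY:  # forget requests older than 24 h
            hits.popleft()
        recent = sum(1 for t in hits if t > now - self.window)
        if recent > self.max_hits or len(hits) > self.daily_limit:
            self._ban(ip, now)
            return False
        return True
```

For example, `BotTracker(max_hits=3, window=1.0)` serves the first three requests an IP makes within one second and bans it on the fourth. In a real deployment this state would live in the MySQL database rather than in process memory. Expired bans would also be cleared by a separate cleanup run. Here they are simply dropped the next time the IP is checked.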
CrawlSnare Requirements:
PHP
MySQL
Apache
Access to .htaccess file
Ability to change file permissions
Access to mod_rewrite (RewriteEngine); only required for User Agent banning
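User Agent banning through .htaccess, the feature that needs mod_rewrite, comes down to writing a few rewrite directives. The Python sketch below is a hedged illustration of what such generated rules could look like. The function name `user_agent_ban_rules`, its escaping policy, and the example agent names are hypothetical and are not taken from CrawlSnare, whose own output format may differ.

```python
# Regex metacharacters that must be escaped to match a User-Agent literally.
_REGEX_SPECIAL = set(".^$*+?()[]{}|")


def user_agent_ban_rules(agents):
    """Return .htaccess mod_rewrite lines that send a 403 to matching User-Agents.

    Each entry is matched case-insensitively as a literal substring of the
    User-Agent header. Empty names and names containing a double quote,
    backslash, or line break are rejected so they cannot break the file.
    """
    agents = list(agents)
    if not agents:
        return ""
    lines = ["RewriteEngine On"]
    for i, agent in enumerate(agents):
        if not agent or any(c in agent for c in '"\\\r\n'):
            raise ValueError(f"unsafe user agent: {agent!r}")
        pattern = "".join("\\" + c if c in _REGEX_SPECIAL else c for c in agent)
        # Every condition except the last is OR-ed with the next one.
        flags = "[NC,OR]" if i < len(agents) - 1 else "[NC]"
        lines.append(f'RewriteCond %{{HTTP_USER_AGENT}} "{pattern}" {flags}')
    # F = respond with 403 Forbidden, L = stop processing further rules.
    lines.append("RewriteRule .* - [F,L]")
    return "\n".join(lines) + "\n"
```

Calling `user_agent_ban_rules(["EvilScraper", "Bad.Bot/1.0"])` yields two `RewriteCond` lines followed by a single `RewriteRule .* - [F,L]`. Rejecting empty names matters because an empty pattern would match every visitor. Rules like these only take effect from .htaccess if the server's `AllowOverride` setting permits rewrite directives (`FileInfo`).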
CrawlSnare is still under development and in the testing phase.