How to Stop Unknown Robots from crawling my website?

Sam308

Neophyte
Joined
Sep 22, 2015
Messages
3
A PHP solution by Sam308

Block unwanted robot/spider visitors

Instructions:
Place the following PHP code at the beginning of your index.php file.

<?php
// ---------------------------------------------------------------------------------------------------------------

// Banned IP Addresses and Bots - Redirects banned visitors who make it past the .htaccess and/or robots.txt files to a URL.
// The $banned_ip_addresses array can contain both full and partial IP addresses, e.g. Full = 123.456.789.101, Partial = 123.456.789. or 123.456. or 123.
// Use a partial IP address to include all IP addresses that begin with it. A partial IP address must end with a period.
// The $banned_bots, $banned_unknown_bots, and $good_bots arrays should contain keyword strings found within the User Agent string.
// The $banned_unknown_bots array is used to identify unknown robots (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-).
// The $good_bots array contains keyword strings used as exemptions when checking for $banned_unknown_bots. If you do not want to use the $good_bots array, i.e.
// $good_bots = array(), then you must remove the keyword strings 'bot.','bot/','bot-' from the $banned_unknown_bots array or else the good bots will also be banned.
$banned_ip_addresses = array('41.','64.79.100.23','5.254.97.75','148.251.236.167','88.180.102.124','62.210.172.77','45.','195.206.253.146');
$banned_bots = array('.ru','AhrefsBot','crawl','crawler','DotBot','linkdex','majestic','meanpath','PageAnalyzer','robot','rogerbot','semalt','SeznamBot','spider');
$banned_unknown_bots = array('bot ','bot_','bot+','bot:','bot,','bot;','bot\\','bot.','bot/','bot-');
$good_bots = array('Google','MSN','bing','Slurp','Yahoo');
$banned_ip_address_url = 'http://english-1329329990.spampoison.com';

// Visitor's IP address and Browser (User Agent)
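$ip_address = $_SERVER['REMOTE_ADDR'];
// Some clients send no User-Agent header; fall back to an empty string so the stripos() checks below simply find nothing.
$browser = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';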

// Declared Temporary Variables
$ipfound = $piece = $banned_piece = $botfound = $gbotfound = $ubotfound = '';

// Checks for Banned IP Addresses and Bots
if($banned_ip_address_url != ''){
    // Checks for Banned IP Address
    if(!empty($banned_ip_addresses)){
        if(in_array($ip_address, $banned_ip_addresses)){$ipfound = 'found';}
        if($ipfound != 'found'){
            $ip_pieces = explode('.', $ip_address);
            foreach ($ip_pieces as $value){
                $piece = $piece.$value.'.';
                if(in_array($piece, $banned_ip_addresses)){$banned_piece = 'found'; break;}
            }
        }
        // Note: a full IP match sets $ipfound, but this test only checks $banned_piece. See the correction in the next post.
        if($banned_piece == 'found'){header("Location: $banned_ip_address_url"); exit();}
    }

    // Checks for Banned Bots
    if(!empty($banned_bots)){
        foreach ($banned_bots as $bbvalue){
            $pos1 = stripos($browser, $bbvalue);
            if($pos1 !== false){$botfound = 'found'; break;}
        }
        if($botfound == 'found'){header("Location: $banned_ip_address_url"); exit();}
    }

    // Checks for Banned Unknown Bots
    if(!empty($good_bots)){
        foreach ($good_bots as $gbvalue){
            $pos2 = stripos($browser, $gbvalue);
            if($pos2 !== false){$gbotfound = 'found'; break;}
        }
    }
    if($gbotfound != 'found'){
        if(!empty($banned_unknown_bots)){
            foreach ($banned_unknown_bots as $bubvalue){
                $pos3 = stripos($browser, $bubvalue);
                if($pos3 !== false){$ubotfound = 'found'; break;}
            }
            if($ubotfound == 'found'){header("Location: $banned_ip_address_url"); exit();}
        }
    }
}

// ---------------------------------------------------------------------------------------------------------------
?>
 

Sam308

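Minor correction to the PHP solution by Sam308
In the code above, a full IP address match set $ipfound, but the redirect tested $banned_piece, so only partial IP addresses were ever redirected. Replace the corresponding part of the PHP code above with the following: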


// Declared Temporary Variables
$ipfound = $piece = $botfound = $gbotfound = $ubotfound = '';

// Checks for Banned IP Addresses and Bots
if($banned_ip_address_url != ''){
    // Checks for Banned IP Address
    if(!empty($banned_ip_addresses)){
        if(in_array($ip_address, $banned_ip_addresses)){$ipfound = 'found';}
        if($ipfound != 'found'){
            $ip_pieces = explode('.', $ip_address);
            foreach ($ip_pieces as $value){
                $piece = $piece.$value.'.';
                if(in_array($piece, $banned_ip_addresses)){$ipfound = 'found'; break;}
            }
        }
        if($ipfound == 'found'){header("location: $banned_ip_address_url"); exit();}
    }
 

BirdOPrey5

#Awesome
Joined
Aug 14, 2008
Messages
4,218
But what about all the pages besides index.php? If a bot crawls forum.php, for example, index.php is never accessed.
 

Sam308

BirdOPrey5 said: "But what about all the pages besides index.php? If a bot crawls forum.php, for example, index.php is never accessed."

Then just place the code at the beginning of your forum.php file.

The idea here is to place the code in the site's main PHP page, the main entry point of the site.

If you have other PHP files that are accessed directly via a URL (as opposed to files pulled in with include or require), then place the code in those files as well.
For most PHP sites and CMS sites, the root index.php file is the main entry point of the site.
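
One way to avoid pasting the same code into several entry scripts is to keep the check in a single shared file and pull it in with require_once. The sketch below is only an illustration of that idea: the file name bot_block.php and the function name sam_is_banned_ip() are made up for this example, and the function covers only the IP part of the check (full addresses, plus partial addresses ending in a period).

```php
<?php
// bot_block.php (hypothetical file name): shared by every entry script.

/**
 * Returns true when $ip equals a banned full address, or starts with a
 * banned partial address (a prefix that ends with a period).
 * Hypothetical helper, not part of the original post.
 */
function sam_is_banned_ip($ip, array $banned_ip_addresses)
{
    foreach ($banned_ip_addresses as $banned) {
        if ($ip === $banned) {
            return true; // full address match
        }
        if (substr($banned, -1) === '.' && strpos($ip, $banned) === 0) {
            return true; // partial (prefix) match
        }
    }
    return false;
}

// At the top of index.php, forum.php, and any other directly requested script:
//
//   require_once __DIR__ . '/bot_block.php';
//   if (sam_is_banned_ip($_SERVER['REMOTE_ADDR'], $banned_ip_addresses)) {
//       header("Location: $banned_ip_address_url");
//       exit();
//   }
```

With this layout the banned lists live in one place, so a change to them applies to every entry point at once.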

Keep in mind that your site statistics (e.g. AWStats) will still log the hits under "Unknown robot (identified by 'bot' followed by a space or one of the following characters _+:,.;/\-)", but these bots will be blocked from accessing your site's content.
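
For reference, the "bot followed by a space or one of _+:,.;/\-" rule can also be written as a single regular expression instead of the $banned_unknown_bots array. This is a minimal sketch, not the code from the posts above; the function name looks_like_unknown_bot() is hypothetical, and the match is case-insensitive to mirror the stripos() calls.

```php
<?php
/**
 * Hypothetical helper: true when the User-Agent contains 'bot' followed by
 * a space or one of the characters _ + : , . ; / \ - (case-insensitive).
 */
function looks_like_unknown_bot($user_agent)
{
    // Character class: space _ + : , . ; / \ and a trailing literal hyphen.
    return preg_match('/bot[ _+:,.;\/\\\\-]/i', $user_agent) === 1;
}

// Example call with a made-up User-Agent string:
var_dump(looks_like_unknown_bot('ExampleBot/1.0'));
```

The $good_bots exemption still has to be applied separately before acting on the result.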
 

Nirjonadda

Aspirant
Joined
Feb 4, 2017
Messages
17
I currently use this on all my sites. Basically, it blocks all bad user agents, bad bots, and scrapers. Not only can it save your content from being mass harvested, but it will also save you a little bandwidth because fewer bots are running around your site. Hope it helps.

Where did you add this code? In your .htaccess?
 

pierce

Habitué
Joined
Apr 10, 2016
Messages
1,165
It's also important to remember that some bots are connected to AdSense adverts! So it's best to research what you're banning first!
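
One way to do that research is a dry run: feed sample User-Agent strings through the same keyword lists and see what would happen, without redirecting anyone. The sketch below reuses the lists from the first post; the function name classify_user_agent() and the two "Example" User-Agent strings are made up for illustration, so check each vendor's documentation for the real strings.

```php
<?php
/**
 * Hypothetical dry-run helper: reports how the keyword lists would treat a
 * User-Agent string, checking in the same order as the code in the first post.
 */
function classify_user_agent($ua, array $banned_bots, array $good_bots, array $banned_unknown_bots)
{
    foreach ($banned_bots as $keyword) {
        if (stripos($ua, $keyword) !== false) {
            return 'banned';
        }
    }
    foreach ($good_bots as $keyword) {
        if (stripos($ua, $keyword) !== false) {
            return 'allowed'; // exempt from the unknown-bot check
        }
    }
    foreach ($banned_unknown_bots as $keyword) {
        if (stripos($ua, $keyword) !== false) {
            return 'banned-unknown';
        }
    }
    return 'allowed';
}

// Lists copied from the first post.
$banned_bots = array('.ru','AhrefsBot','crawl','crawler','DotBot','linkdex','majestic','meanpath','PageAnalyzer','robot','rogerbot','semalt','SeznamBot','spider');
$banned_unknown_bots = array('bot ','bot_','bot+','bot:','bot,','bot;','bot\\','bot.','bot/','bot-');
$good_bots = array('Google','MSN','bing','Slurp','Yahoo');

$samples = array(
    'Mediapartners-Google',        // AdSense crawler token
    'ExampleBot/1.0',              // made-up unknown bot
    'Google-Example-Crawler/1.0',  // made-up: contains both 'Google' and 'crawl'
);
foreach ($samples as $ua) {
    echo classify_user_agent($ua, $banned_bots, $good_bots, $banned_unknown_bots), "\t", $ua, "\n";
}
```

With these lists, the made-up 'Google-Example-Crawler/1.0' is reported as banned, because the $banned_bots check runs before the $good_bots exemption. A keyword such as 'crawl' can therefore catch a bot you meant to allow.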
 