09-15-2005, 10:14 AM
#1
Senior Contributor
Join Date: Mar 2005
Posts: 677
Catch most of the malicious visitors
As a CMS developer I've found out that there are people on this earth who try to do anything with your website EXCEPT what you've built it for (reading the content as a normal human being would).
There are several ways to catch those people, and one of them is through their UA (User Agent) string.
However, there are bots that fake the UA to bypass your security; I will probably post about those in a later topic.
Through experience I've found out how the most common "real" browsers behave and what their UAs look like.
So if you want to block any unknown UA, use the following code. It took me 5 days of checking and identifying all the UAs I received on a website.
PHP Code:
<?php
function identify_ua($agent)
{
    # Each rule: array(pattern, browser, os, extra).
    # An integer means "use that capture group"; a string is used as-is.
    # Rules are tried top to bottom and the first match wins, so order matters.
    $rules = array(
        # Gecko family
        array('#^Mozilla/5\.0 \(([a-zA-Z0-9]+); U; (.*[^;])(; [a-zA-Z\-]{2,5})?; rv:[0-9\.]+.*?\) Gecko/[0-9]{8} .* (Firefox).*#', 4, 2, ''),
        array('#^Mozilla/5\.0 \(([a-zA-Z0-9]+); U; (.*[^;])(; [a-zA-Z\-]{2,5})?; rv:[0-9\.]+.*?\) Gecko/[0-9]{8} ([a-zA-Z\-]+)/[0-9\.]+.*#', 4, 2, ''),
        array('#^Mozilla/5\.0 \(([a-zA-Z0-9]+); U; (.*[^;])(; [a-zA-Z\-]{2,5})?; rv:[0-9\.]+.*?\) Gecko/[0-9]{8}$#', 'Mozilla', 2, ''),
        # Galeon alternate
        array('#^Mozilla/5\.0 (Galeon)/[0-9\.]+ \(([a-zA-Z0-9]+); (.*[^;]); U\)#', 1, 3, ''),
        # Konqueror
        array('#^Mozilla/5\.0 \(compatible; (Konqueror)/[0-9\.\-rc]+; (i686 )?(Linux|FreeBSD).*#', 1, 3, ''),
        # Lynx
        array('#^(Lynx)/2\.[0-9\.]+(rel|dev)[0-9\.]+ libwww-FM/.*#', 1, 'N/A', ''),
        # Safari family
        array('#^Mozilla/5\.0 \(Macintosh; U; PPC Mac OS X; [a-zA-Z\-]{2,5}\) AppleWebKit/.*? \(KHTML, like Gecko.*?\) ([a-zA-Z]+)/.*#', 1, 'Mac', ''),
        # w3m
        array('#^(w3m)/[0-9\.]+#', 1, 'N/A', ''),
        # Links
        array('#^(Links) \([0-9]\.[a-z0-9]+; (.*?);#', 1, 2, ''),
        # Voyager
        array('#^Mozilla/4\.0 \(compatible; (Voyager); (AmigaOS).*#', 1, 2, ''),
        # Opera (must come before IE, because Opera can pretend to be IE)
        array('#^(Opera)/[67]\.[0-9]{1,2} \((.*?); U\)[\ ]{1,2}\[[a-zA-Z\-]{2,5}\]#', 1, 2, ''), # Opera 6-7
        array('#^Mozilla/[45]\.0 \(compatible; MSIE [56]\.0; (.*?)\) (Opera) [567]\.[0-9]{1,2} \[[a-zA-Z\-]{2,5}\]#', 2, 1, ''), # Opera 6-7 faking IE
        array('#^Mozilla/5\.0 \((.*?); U\) (Opera) [67]\.[0-9]{1,2} \[[a-zA-Z\-]{2,5}\]#', 2, 1, ''), # Opera 6-7 faking Gecko
        array('#^(Opera)/8\.[0-9]{1,2} \((.*?); U; [a-zA-Z\-]{2,5}\)#', 1, 2, ''), # Opera 8
        array('#^Mozilla/4\.0 \(compatible; MSIE 6\.0; (.*?); [a-zA-Z\-]{2,5}\) (Opera) 8\.[0-9]{1,2}#', 2, 1, ''), # Opera 8 faking IE
        array('#^Mozilla/5\.0 \((.*?); U; [a-zA-Z\-]{2,5}\) (Opera) 8\.[0-9]{1,2}#', 2, 1, ''), # Opera 8 faking Gecko
        # IE (the text after the Windows version is kept in 'ext')
        array('#^Mozilla/4\.0 \(compatible; MSIE (4\.0|5\.0|5\.5|6\.0|7\.0)[b1]?(; .*[^;])?; (Windows) [A-Z0-9\ \.]+[;)](.*)?#', 'MSIE', 3, 4),
        array('#^Mozilla/2\.0 \(compatible; MSIE (3\.0|4\.0)[1]?(; .*[^;])?; (Windows) [A-Z0-9\ \.]+[;)](.*)?#', 'MSIE', 3, 4),
    );
    $data = array();
    foreach ($rules as $rule) {
        if (!preg_match($rule[0], $agent, $m)) {
            continue;
        }
        # Turn group numbers into the captured text (missing trailing groups become '')
        $found = array();
        foreach (array(1 => 'agent', 2 => 'os', 3 => 'ext') as $i => $key) {
            $value = $rule[$i];
            $found[$key] = is_int($value) ? (isset($m[$value]) ? $m[$value] : '') : $value;
        }
        if ($found['agent'] !== '') {
            $data = $found;
            break;
        }
    }
    if (!isset($data['agent'])) {
        # Nothing matched. identify_bot() is not included in this post.
        return function_exists('identify_bot') ? identify_bot($agent) : false;
    }
    if ($data['agent'] == 'MSIE') {
        # Detect bots that simulate MSIE
        preg_match('#(Fetch API Request|Microsoft Scheduled Cache Content Download Service|Have a nice day\!|Your Own World|Mozilla/|Medusa)#is', $data['ext'], $regs);
        if (!empty($regs[0])) {
            $data['bot'] = $regs[0];
            unset($data['agent']);
            return $data;
        }
        # Report IE-based shells by their own name (the last one listed wins)
        preg_match_all('#(iRider|Crazy Browser|NetCaptor|Maxthon|Avant Browser)#s', $data['ext'], $regs);
        if (!empty($regs[0])) {
            $data['agent'] = str_replace(' Browser', '', $regs[0][count($regs[0]) - 1]);
            $data['ext'] = '';
        }
    }
    # Reduce the OS string to a family name
    preg_match('#(Win|Mac|Linux|FreeBSD|SunOS|IRIX|BeOS|OS/2|AIX|Amiga)#is', $data['os'], $regs);
    $data['os'] = empty($regs[0]) ? 'Other' : $regs[0];
    if ($data['os'] == 'Win') $data['os'] = 'Windows';
    return $data;
}

$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$data = identify_ua($ua);
if (!$data || empty($data['agent'])) {
    die('We are sorry, but unidentified User Agents are not allowed on this website.');
}
Feel free to comment if you have the UA of a "real" browser that fails this test, and DON'T FORGET to mention which plugin/software you used that modifies the UA.
NOTE: I am also working on bot identification through HTTP_REFERER and IPs/networks to prevent things like referer spamming, harvesting, and image grabbing.
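The script above relies on the order of its patterns: each UA is checked against the rules top to bottom and the first match decides the browser. Below is a minimal, self-contained sketch of that idea; `first_match()` and the two shortened patterns are hypothetical simplifications of the Opera and IE rules in identify_ua(), not part of the original script.

```php
<?php
// Sketch: an ordered rule table where the first matching pattern wins.
// Both patterns are simplified, hypothetical versions of the Opera/IE rules.
function first_match(array $rules, string $ua): ?string
{
    foreach ($rules as $name => $pattern) {
        if (preg_match($pattern, $ua)) {
            return $name; // stop at the first hit, so rule order matters
        }
    }
    return null; // unknown UA: the caller decides whether to block it
}

$rules = [
    // Opera disguised as IE has to be tested before the generic IE rule.
    'Opera' => '#^Mozilla/4\.0 \(compatible; MSIE 6\.0; [^)]*\) Opera 8\.[0-9]{1,2}#',
    'MSIE'  => '#^Mozilla/4\.0 \(compatible; MSIE [4-7]\.[0-9]#',
];

$opera = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50';
$ie    = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)';

echo first_match($rules, $opera), "\n"; // Opera
echo first_match($rules, $ie), "\n";    // MSIE
var_dump(first_match($rules, 'curl/7.88.1')); // NULL
```

Swapping the two entries makes the Opera string report as MSIE, which is why the Opera rules sit above the IE rules in identify_ua().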
09-15-2005, 10:19 AM
#2
Senior Contributor
Join Date: Mar 2005
Posts: 677
Forgot to mention that $data will contain the browser information:
$data['agent'] = "real" browser name
$data['os'] = operating system
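For 'os', identify_ua() does not keep the raw platform text: it keeps only the first known OS token it finds, renames 'Win' to 'Windows', and falls back to 'Other'. A standalone sketch of that step is below; `normalize_os()` is a hypothetical helper, and the `\b` word boundary plus the name map are additions that are not in the original regex.

```php
<?php
// Sketch: reduce a raw OS string to a family name, like the last step of identify_ua().
// Hypothetical helper; the \b anchor and the canonical-name map are additions.
function normalize_os(string $raw): string
{
    $canonical = [
        'win' => 'Windows', 'mac' => 'Mac', 'linux' => 'Linux', 'freebsd' => 'FreeBSD',
        'sunos' => 'SunOS', 'irix' => 'IRIX', 'beos' => 'BeOS', 'os/2' => 'OS/2',
        'aix' => 'AIX', 'amiga' => 'Amiga',
    ];
    // \b: only match a token at the start of a word.
    if (preg_match('#\b(Win|Mac|Linux|FreeBSD|SunOS|IRIX|BeOS|OS/2|AIX|Amiga)#i', $raw, $m)) {
        return $canonical[strtolower($m[1])];
    }
    return 'Other';
}

foreach (['Windows NT 5.1', 'PPC Mac OS X Mach-O', 'AmigaOS', 'N/A'] as $raw) {
    echo $raw, ' => ', normalize_os($raw), "\n";
}
```

With the boundary and the map, "Darwin" gives 'Other' instead of the lowercase "win" the case-insensitive regex in the post would return, and "WINNT" becomes 'Windows' rather than "WIN".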
09-15-2005, 12:35 PM
#3
Jack of all trades
Join Date: Feb 2005
Location: Los Angeles
Posts: 598
Does this mean I can't browse your website with my UA string of Mozilla Firefox on a Smith Corona Typewriter?
09-15-2005, 01:32 PM
#4
Senior Contributor
Join Date: Mar 2005
Posts: 677
If you get Firefox running on your Smith Corona Typewriter, I definitely want a picture of it!
09-15-2005, 02:56 PM
#5
Moderator
Join Date: May 2002
Location: us.ca
Posts: 4,489
Cool... I've done some user-agent work too. That is how I show the icons for your OS and browser at the top. It recognizes version numbers too, but I don't do much with that right now. I guess it's for a different purpose, though.
I found this link to someone who maintains a list of agents: http://www.psychedelix.com/agents.html
Mike
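Mike mentions that his detector also recognizes version numbers. One possible way to pull those out is to collect every Product/Version token in the UA string; the sketch below uses a hypothetical `ua_versions()` helper and is not the code that runs on this site.

```php
<?php
// Sketch: collect "Product/Version" tokens from a UA string.
// Hypothetical helper, one possible way to read version numbers.
function ua_versions(string $ua): array
{
    $versions = [];
    // A product name, a slash, then something that starts with a digit.
    preg_match_all('#([A-Za-z][A-Za-z\-]*)/([0-9][0-9A-Za-z.\-]*)#', $ua, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $versions[$m[1]] = $m[2]; // a later token with the same name overwrites an earlier one
    }
    return $versions;
}

print_r(ua_versions('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7'));
```

IE and Opera-in-disguise put a space instead of a slash ("MSIE 6.0", "Opera 8.50"), so those versions would need their own patterns.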
09-15-2005, 10:11 PM
#6
Senior Contributor
Join Date: Mar 2005
Posts: 677
Thanks sde, but I already had that bookmarked.
This script is not about identifying specific bots; at the moment it just bans them all.
All times are GMT -8.
Copyright © 2000-2008, Milano Interactive