Code Newbie
09-15-2005, 10:14 AM   #1
DJMaze
Senior Contributor
Join Date: Mar 2005
Posts: 677
Catch most of the malicious visitors

As a CMS developer I've found that there are people on this earth who try to do anything with your website EXCEPT what you built it for (reading the content as a normal human being would).

There are several ways to catch those people, and one of them is through their UA (User Agent).
However, there are bots that fake the UA and try to bypass your security, which I will probably cover in a later topic.

Anyway, through experience I've found out how most "real" browsers behave and what their UAs look like.

So if you want to block any unknown UA, use the following code, which took me 5 days of checking and identifying all the UAs I received on a website.
PHP Code:
function check_ua($agent, $os, $extra, &$data)
{
    // Called from the /e replacement strings; fills $data when a pattern matched.
    if (!empty($agent)) {
        $data = array('agent' => $agent, 'os' => $os, 'ext' => $extra);
        return true;
    }
    return false;
}

function identify_ua($agent)
{
    $data = array();
    // NOTE: the /e (eval) modifier used here was deprecated in PHP 5.5 and removed in PHP 7.
    $pattern = array(
        # Gecko family
        '#^Mozilla/5.0 \(([a-zA-Z0-9]+); U; (.*[^;])(; [a-zA-Z\-]{2,5})?; rv:[0-9\.]+.*?\) Gecko/[0-9]{8} .* (Firefox).*#e',
        '#^Mozilla/5.0 \(([a-zA-Z0-9]+); U; (.*[^;])(; [a-zA-Z\-]{2,5})?; rv:[0-9\.]+.*?\) Gecko/[0-9]{8} ([a-zA-Z\-]+)/[0-9\.]+.*#e',
        '#^Mozilla/5.0 \(([a-zA-Z0-9]+); U; (.*[^;])(; [a-zA-Z\-]{2,5})?; rv:[0-9\.]+.*?\) Gecko/[0-9]{8}$#e',
        # Galeon alternate
        '#^Mozilla/5.0 (Galeon)/[0-9\.]+ \(([a-zA-Z0-9]+); (.*[^;]); U\)#e',
        # Konqueror
        '#^Mozilla/5.0 \(compatible; (Konqueror)/[0-9\.\-rc]+; (i686 )?(Linux|FreeBSD).*#e',
        # Lynx
        '#^(Lynx)/2.[0-9\.]+(rel|dev)[0-9\.]+ libwww-FM/.*#e',
        # Safari family
        '#^Mozilla/5.0 \(Macintosh; U; PPC Mac OS X; [a-zA-Z\-]{2,5}\) AppleWebKit/.*? \(KHTML, like Gecko.*?\) ([a-zA-Z]+)/.*#e',
        # w3m
        '#^(w3m)/[0-9\.]+#e',
        # Links
        '#^(Links) \([0-9].[a-z0-9]+; (.*?);#e',
        # Voyager
        '#^Mozilla/4.0 \(compatible; (Voyager); (AmigaOS).*#e',
        # Opera
        '#^(Opera)/[67].[0-9]{1,2} \((.*?); U\)[\ ]{1,2}\[[a-zA-Z\-]{2,5}\]#e',    # Opera 6-7
        '#^Mozilla/[45].0 \(compatible; MSIE [56].0; (.*?)\) (Opera) [567].[0-9]{1,2} \[[a-zA-Z\-]{2,5}\]#e',    # Opera 6-7 faking IE
        '#^Mozilla/5.0 \((.*?); U\) (Opera) [67].[0-9]{1,2} \[[a-zA-Z\-]{2,5}\]#e',    # Opera 6-7 faking Gecko
        '#^(Opera)/8.[0-9]{1,2} \((.*?); U; [a-zA-Z\-]{2,5}\)#e',    # Opera 8
        '#^Mozilla/4.0 \(compatible; MSIE 6.0; (.*?); [a-zA-Z\-]{2,5}\) (Opera) 8.[0-9]{1,2}#e',    # Opera 8 faking IE
        '#^Mozilla/5.0 \((.*?); U; [a-zA-Z\-]{2,5}\) (Opera) 8.[0-9]{1,2}#e',    # Opera 8 faking Gecko
        # IE
        '#^Mozilla/4.0 \(compatible; MSIE (4.0|5.0|5.5|6.0|7.0)[b1]?(; .*[^;])?; (Windows) [A-Z0-9\ \.]+[;)](.*)?#e',
        '#^Mozilla/2.0 \(compatible; MSIE (3.0|4.0)[1]?(; .*[^;])?; (Windows) [A-Z0-9\ \.]+[;)](.*)?#e',
    );
    
    // One replacement per pattern, in the same order as $pattern.
    $replacement = array(
        # Gecko family
        'check_ua(\'\\4\', \'\\2\', \'\', $data)',
        'check_ua(\'\\4\', \'\\2\', \'\', $data)',
        'check_ua(\'Mozilla\', \'\\2\', \'\', $data)',
        # Galeon
        'check_ua(\'\\1\', \'\\3\', \'\', $data)',
        # Konqueror
        'check_ua(\'\\1\', \'\\3\', \'\', $data)',
        # Lynx
        'check_ua(\'\\1\', \'N/A\', \'\', $data)',
        # Safari family
        'check_ua(\'\\1\', \'Mac\', \'\', $data)',
        # w3m
        'check_ua(\'\\1\', \'N/A\', \'\', $data)',
        # Links
        'check_ua(\'\\1\', \'\\2\', \'\', $data)',
        # Voyager
        'check_ua(\'\\1\', \'\\2\', \'\', $data)',
        # Opera
        'check_ua(\'\\1\', \'\\2\', \'\', $data)',
        'check_ua(\'\\2\', \'\\1\', \'\', $data)',
        'check_ua(\'\\2\', \'\\1\', \'\', $data)',
        'check_ua(\'\\1\', \'\\2\', \'\', $data)',
        'check_ua(\'\\2\', \'\\1\', \'\', $data)',
        'check_ua(\'\\2\', \'\\1\', \'\', $data)',
        # IE
        'check_ua(\'MSIE\', \'\\3\', \'\\4\', $data)',
        'check_ua(\'MSIE\', \'\\3\', \'\\4\', $data)',
    );
    
    preg_replace($pattern, $replacement, $agent);
    // identify_bot() is not included in this post.
    if (!isset($data['agent'])) return identify_bot($agent);
    if ($data['agent'] == 'MSIE') {
        # Detect bot that simulates MSIE
        preg_match('#(Fetch API Request|Microsoft Scheduled Cache Content Download Service|Have a nice day\!|Your Own World|Mozilla/|Medusa)#is', $data['ext'], $regs);
        if (!empty($regs[0])) {
            $data['bot'] = $regs[0];
            unset($data['agent']);
            return $data;
        }
        # Detect IE-based shells; the last one listed in the UA wins
        preg_match_all('#(iRider|Crazy Browser|NetCaptor|Maxthon|Avant Browser)#s', $data['ext'], $regs);
        if (!empty($regs[0])) {
            $data['agent'] = str_replace(' Browser', '', $regs[0][count($regs[0])-1]);
            $data['ext'] = '';
        }
    }
    # Normalise the OS string to a short family name
    preg_match('#(Win|Mac|Linux|FreeBSD|SunOS|IRIX|BeOS|OS/2|AIX|Amiga)#is', $data['os'], $regs);
    $data['os'] = empty($regs[0]) ? 'Other' : $regs[0];
    if ($data['os'] == 'Win') $data['os'] = 'Windows';
    return $data;
}

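// A request may carry no User-Agent header at all; treat that as an empty string.
$data = identify_ua(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
if (!$data || empty($data['agent'])) {
    die('We are sorry but unidentified User Agents are not allowed on this website');
}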
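
For anyone on a newer PHP: the /e modifier the patterns rely on was deprecated in PHP 5.5 and removed in PHP 7, so the code as posted only runs on older versions. Below is a minimal sketch of the same table-driven idea with preg_match() instead. The function name identify_ua_modern and the rule-table layout are made up for this example, and only three of the patterns are carried over, so treat it as a starting point rather than a drop-in replacement.

```php
<?php
// Hypothetical sketch, not the original script: the same table-driven
// matching done with preg_match(), so it does not need the /e modifier.
// Only three of the original patterns are carried over to keep it short.
function identify_ua_modern($agent)
{
    // Each rule: pattern, browser, OS.
    // An int means "use that capture group", a string is a fixed value.
    $rules = array(
        // Gecko family, e.g. "... Gecko/20050716 Firefox/1.0.6"
        array('#^Mozilla/5.0 \(([a-zA-Z0-9]+); U; (.*[^;])(; [a-zA-Z\-]{2,5})?; rv:[0-9\.]+.*?\) Gecko/[0-9]{8} ([a-zA-Z\-]+)/[0-9\.]+.*#', 4, 2),
        // Opera 8
        array('#^(Opera)/8.[0-9]{1,2} \((.*?); U; [a-zA-Z\-]{2,5}\)#', 1, 2),
        // Lynx reports no operating system
        array('#^(Lynx)/2.[0-9\.]+(rel|dev)[0-9\.]+ libwww-FM/.*#', 1, 'N/A'),
    );

    foreach ($rules as $rule) {
        list($pattern, $browser, $os) = $rule;
        if (!preg_match($pattern, $agent, $m)) {
            continue;
        }
        $data = array(
            'agent' => is_int($browser) ? $m[$browser] : $browser,
            'os'    => is_int($os) ? $m[$os] : $os,
            'ext'   => '',
        );
        // Same OS normalisation as the original function.
        if (preg_match('#(Win|Mac|Linux|FreeBSD|SunOS|IRIX|BeOS|OS/2|AIX|Amiga)#i', $data['os'], $regs)) {
            $data['os'] = ($regs[0] === 'Win') ? 'Windows' : $regs[0];
        } else {
            $data['os'] = 'Other';
        }
        return $data;
    }
    return false; // unknown UA; the original hands this case to identify_bot()
}
```

With this sketch, 'Opera/8.02 (Windows NT 5.1; U; en)' comes back as agent 'Opera' and os 'Windows', and anything that matches no rule returns false. Keeping the group numbers in the table next to each pattern avoids the two parallel arrays getting out of step.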

Feel free to add comments if you have a UA of a "real" browser that fails this test, and DON'T FORGET to mention which plugin/software you used that modifies the UA.

NOTE: I am also working on bot identification through HTTP_REFERER and IPs/networks to prevent things like referer spamming, harvesting and image grabbing.
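
As a rough idea of what a referer check could look like, here is a small sketch. It is only a guess at the approach and not the script mentioned in the note; is_foreign_referer and the host names in the usage line are hypothetical.

```php
<?php
// Hypothetical sketch: true when a Referer header is present and points at a
// host other than our own, e.g. an image embedded on someone else's page.
function is_foreign_referer($referer, $ownHost)
{
    if ($referer === '') {
        return false; // many real browsers and proxies send no referer at all
    }
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host)) {
        return true; // no usable host in the referer, treat it as foreign
    }
    return strcasecmp($host, $ownHost) !== 0;
}

// Usage with made-up host names:
var_dump(is_foreign_referer('http://spam.example/page', 'www.example.com'));
```

An empty referer is deliberately let through here, since blocking it would lock out visitors whose browser or proxy strips the header.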
09-15-2005, 10:19 AM   #2
DJMaze
Forgot to mention that $data will contain the browser information.
$data['agent'] = "real" Browser name
$data['os'] = Operating System
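
A small illustration of consuming that array. This is only a sketch: the helper name describe_visitor and the sample values are made up for the example.

```php
<?php
// Hypothetical helper: turn the array returned by identify_ua() into a label.
function describe_visitor($data)
{
    // identify_ua() unsets 'agent' for detected bots, so this covers them too.
    if (!is_array($data) || empty($data['agent'])) {
        return 'unidentified';
    }
    $os = empty($data['os']) ? 'Other' : $data['os'];
    return $data['agent'] . ' on ' . $os;
}

// Sample value shaped like identify_ua() output for a Firefox visitor.
echo describe_visitor(array('agent' => 'Firefox', 'os' => 'Windows', 'ext' => '')), "\n";
// prints "Firefox on Windows"
```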
09-15-2005, 12:35 PM   #3
teknomage1
Jack of all trades
Join Date: Feb 2005
Location: Los Angeles
Posts: 598
Does this mean I can't browse your website with my UA string of Mozilla Firefox on a Smith Corona Typewriter?
__________________
Stop intellectual property from infringing on me
09-15-2005, 01:32 PM   #4
DJMaze
If you get Firefox running on your Smith Corona typewriter, I definitely want a picture of it.
09-15-2005, 02:56 PM   #5
sde
Moderator
Join Date: May 2002
Location: us.ca
Posts: 4,489
Cool... I've done some user agent work too. That is how I show the icons for your OS and browser at the top. It recognizes version numbers too, but I don't do much with that right now. I guess it's for a different purpose, though.

I found this link to someone who maintains a list of agents: http://www.psychedelix.com/agents.html
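
On the version numbers mentioned above, a minimal sketch of pulling one out of a UA string. ua_version is a hypothetical helper written for this example and is not the code that drives the icons.

```php
<?php
// Hypothetical helper: return the version that follows "<name>/" or "<name> "
// in a UA string, or null when the name is not present.
function ua_version($agent, $name)
{
    $pattern = '#' . preg_quote($name, '#') . '[/ ]([0-9]+(?:\.[0-9]+)*)#';
    return preg_match($pattern, $agent, $m) ? $m[1] : null;
}

$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) Gecko/20050716 Firefox/1.0.6';
var_dump(ua_version($ua, 'Firefox')); // string(5) "1.0.6"
```

The same helper handles the "MSIE 6.0" style, where the name and version are separated by a space instead of a slash.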
__________________
Mike
09-15-2005, 10:11 PM   #6
DJMaze
Thanks sde, but I already had that bookmarked.
This script is not about specific bots; it just bans them all at the moment.

All times are GMT -8.