05-21-2006, 10:53 AM   #1
NirTivAal
Using regular expressions to clean HTML in user input

Essentially, what I'm trying to do is take input from a form and clean it up so that any < and > not meant as part of a tag are converted to &lt; and &gt;, so there's no chance of a browser misinterpreting them as tags.

Code:
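// ENT_NOQUOTES leaves " unescaped so the attribute part of the pattern can still match it
$input = htmlspecialchars($input, ENT_NOQUOTES);
// Restore anything shaped like a tag; \1 is the captured text between &lt; and &gt;
$input = preg_replace('@&lt;([a-z0-9]+(\s[a-z]+(="[^"]*")?)*/?|/[a-z0-9]+)&gt;@', '<\1>', $input);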
This is as far as I've gotten: I convert every < and > to the entity notation, then look for uses of them that look like tags and convert them back. The preg_replace looks for substrings that start with the escaped <, followed by either the tag name, optional attributes and an optional closing /, or a / and the tag name, and that end with the escaped >.
It works pretty well in most cases, except when the user enters something like <sfajklhsfsjk> or </rhfkj>, which gets converted back even though it isn't a real tag.
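
To make the behaviour described above concrete, here is a minimal sketch of the escape-then-restore idea. The function name restore_taglike is hypothetical, and two details are assumptions rather than part of the original snippet: ENT_NOQUOTES (so that double quotes survive for the attribute pattern) and [^"]* in place of .+ (so one attribute value cannot swallow the rest of the line).

```php
<?php
// Hypothetical helper wrapping the escape-then-restore approach from the post.
function restore_taglike(string $input): string
{
    // Assumption: ENT_NOQUOTES keeps " intact so ="..." in the pattern can match.
    $escaped = htmlspecialchars($input, ENT_NOQUOTES);
    // Turn anything shaped like <name attr="value"> or </name> back into a tag.
    return preg_replace(
        '@&lt;([a-z0-9]+(\s[a-z]+(="[^"]*")?)*/?|/[a-z0-9]+)&gt;@',
        '<\1>',
        $escaped
    );
}

// A stray < stays escaped while the real tags come back:
echo restore_taglike('1 < 2 and <b>bold</b>'), "\n";
// prints: 1 &lt; 2 and <b>bold</b>

// The problem case: nonsense "tags" are restored too, because they have the right shape.
echo restore_taglike('<sfajklhsfsjk> or </rhfkj>'), "\n";
// prints: <sfajklhsfsjk> or </rhfkj>
```

The second call shows why shape alone is not enough: the pattern has no way of knowing whether a tag name is real or whether it is ever closed.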

I was thinking I could solve this by another preg_replace that undoes the first one where there is a tag that isn't properly closed/opened. The problem is I'm not sure how to implement it, since I don't know how to do a regular expression that looks for the lack of a pattern.

Is there a way to solve the problem, or just a simpler way of doing everything?
05-21-2006, 11:14 AM   #2
DJMaze
http://php.net/htmlspecialchars
05-21-2006, 10:50 PM   #3
DJMaze
$input = preg_replace('#&lt;([a-z0-9]+)(.*?)&gt;(.*?)&lt;/\\1&gt;#', '<\\1\\2>\\3</\\1>', $input);
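
A rough sketch of how the pattern above behaves, assuming the input has already been run through htmlspecialchars. The wrapper name restore_paired and the ENT_NOQUOTES flag are assumptions added for the demonstration, not part of the one-liner.

```php
<?php
// Hypothetical wrapper: escape everything, then restore only open/close pairs.
function restore_paired(string $input): string
{
    $escaped = htmlspecialchars($input, ENT_NOQUOTES);
    // \1 in the pattern forces the closing tag name to equal the opening one.
    return preg_replace(
        '#&lt;([a-z0-9]+)(.*?)&gt;(.*?)&lt;/\\1&gt;#',
        '<\\1\\2>\\3</\\1>',
        $escaped
    );
}

// A tag with no matching close stays escaped:
echo restore_paired('<b>bold</b> and <sfajklhsfsjk>'), "\n";
// prints: <b>bold</b> and &lt;sfajklhsfsjk&gt;

// Nested tags: one pass restores only the outer pair, since replaced text is not rescanned.
echo restore_paired('<b><i>x</i></b>'), "\n";
// prints: <b>&lt;i&gt;x&lt;/i&gt;</b>
```

So the back-reference handles the unclosed-tag case from the first post, but a single pass does not reach tags nested inside a restored pair.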
05-23-2006, 10:19 PM   #4
teknomage1
If you want to start checking for properly balanced tags, you have to build more sophisticated machinery than just regular expressions. Arbitrarily nestable expressions like (1 + (2 * (6 - 3))) and HTML tag trees are not regular languages and cannot be accurately described by regular expressions.
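
As one possible shape for that "more sophisticated machinery", here is a small stack-based check. It is only a sketch: the function name tags_balanced is hypothetical, the regex is used solely to pick out tag-shaped tokens, and void elements written without a trailing slash (such as a bare br tag) are treated as unclosed.

```php
<?php
// Hypothetical helper: true when every opening tag is closed in the right order.
function tags_balanced(string $html): bool
{
    // The regex only finds tag-shaped tokens; the stack tracks the nesting.
    preg_match_all('@<(/?)([a-z0-9]+)[^<>]*?(/?)>@i', $html, $tokens, PREG_SET_ORDER);
    $stack = [];
    foreach ($tokens as [, $closing, $name, $selfClosing]) {
        $name = strtolower($name);
        if ($selfClosing === '/') {
            continue; // e.g. <br/> opens and closes itself
        }
        if ($closing === '') {
            $stack[] = $name; // opening tag: remember it
        } elseif (array_pop($stack) !== $name) {
            return false; // closing tag does not match the most recent open tag
        }
    }
    return $stack === []; // anything left on the stack was never closed
}

var_dump(tags_balanced('<b><i>x</i></b>')); // bool(true)
var_dump(tags_balanced('<b><i>x</b></i>')); // bool(false)
var_dump(tags_balanced('<sfajklhsfsjk>'));  // bool(false)
```

A check like this could decide whether tag-like text gets restored at all, leaving the regular expressions to do only the token matching.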

It seems like you can get away with replacing only the < and > signs that are not next to text on your first pass, e.g. 's/<([^A-Za-z])/&lt;\1/g' and 's/([^A-Za-z])>/\1&gt;/g'.
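
The same idea could be sketched in PHP roughly as follows. This is not a direct translation: it swaps the captured neighbour character for lookarounds, so a bracket at the very start or end of the string is handled too, and it widens the character classes (adding /, digits and quotes) so that closing tags and quoted attributes are left alone. Those adjustments and the function name escape_stray_brackets are assumptions.

```php
<?php
// Hypothetical helper: escape only the < and > that do not look like tag delimiters.
function escape_stray_brackets(string $input): string
{
    // A "<" not followed by a letter or "/" cannot start a tag.
    $out = preg_replace('@<(?![A-Za-z/])@', '&lt;', $input);
    // A ">" not preceded by a letter, digit, quote or "/" is unlikely to end one.
    return preg_replace('@(?<![A-Za-z0-9"\'/])>@', '&gt;', $out);
}

echo escape_stray_brackets('1 < 2 and 3 > 2, <b>bold</b>'), "\n";
// prints: 1 &lt; 2 and 3 &gt; 2, <b>bold</b>

// Limitation: a stray > directly after text is left untouched.
echo escape_stray_brackets('a>b'), "\n";
// prints: a>b
```

Like the original suggestion, this only looks at the neighbouring character, so tag-shaped nonsense such as <sfajklhsfsjk> still passes through unchanged.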