Code Newbie
01-19-2006, 01:30 PM   #1
sde
Moderator
Join Date: May 2002
Location: us.ca
Posts: 4,532
need help with a regex

Javascript Code:
var str = "abc ( 123 ( 456 ) ) ( 789 )";
I need to capture the first level of open/close parens in an array. I want my regex to match 2 items, creating an array like this:
Javascript Code:
array[0] = "( 123 ( 456 ) )";
array[1] = "( 789 )";
01-19-2006, 09:26 PM   #2
teknomage1
Jack of all trades
Join Date: Feb 2005
Location: Los Angeles
Posts: 598
I think this is best done with a stack rather than with regular expressions, because regexes break down on arbitrarily nested structures. Basically, every time you see an open paren, push it onto the stack, and every time you see a close paren, pop one off. When a pop leaves the stack empty, you have just closed an outermost parenthetical expression.
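A minimal JavaScript sketch of the stack idea above (not code from the thread). The function name `topLevelGroups` is hypothetical. The stack holds the index of each unmatched `(`; when a `)` pops the last one off and the stack becomes empty, the popped index is where an outermost group starts. A `)` with nothing on the stack is treated as stray and skipped, and any `(` that is never closed is dropped; both are assumptions about how malformed input should be handled.

```javascript
// Sketch of the push/pop approach (hypothetical helper, not from the thread).
function topLevelGroups(str) {
  const stack = [];  // indices of "(" characters that are still open
  const groups = []; // outermost parenthetical expressions, in order
  for (let i = 0; i < str.length; i++) {
    const ch = str[i];
    if (ch === "(") {
      stack.push(i);
    } else if (ch === ")" && stack.length > 0) {
      const start = stack.pop();
      // the stack just became empty, so this ")" closes an outermost group
      if (stack.length === 0) {
        groups.push(str.slice(start, i + 1));
      }
    }
    // a ")" seen while the stack is empty is a stray paren and is ignored
  }
  return groups; // any "(" left on the stack was never closed and is dropped
}

const str = "abc ( 123 ( 456 ) ) ( 789 )";
console.log(topLevelGroups(str)); // [ '( 123 ( 456 ) )', '( 789 )' ]
```

Since only the start index of the outermost group is ever used, a plain depth counter plus one saved start position would work just as well; the array simply mirrors the push/pop description.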
__________________
Stop intellectual property from infringing on me

Last edited by teknomage1; 01-19-2006 at 11:35 PM.
01-19-2006, 11:33 PM   #3
teknomage1
Jack of all trades
I think this is reasonably robust. It takes a string as input and returns an array of the top-level parenthetical expressions. It seems to handle stray parentheses gracefully.

PHP Code:
function parenreader($input) {
    $output = array();
    $inbuffer = $input;
    $outbuffer = "";
    $stack = 0; # how many open parens are still waiting for a close paren
    while (strlen($inbuffer) > 0) {
        $op = strpos($inbuffer, '('); # position of the next open paren (false if none)
        $cp = strpos($inbuffer, ')'); # position of the next close paren (false if none)
        # no close parens left, so no more full expressions to read
        if ($cp === false) {
            break;
        }
        if ($op !== false && $op < $cp) {
            # saw an open paren first
            $stack++;
            $outbuffer .= ($stack > 1 ? substr($inbuffer, 0, $op + 1) : '(');
            $inbuffer = substr($inbuffer, $op + 1);
        } else {
            # saw a close paren first
            $segment = substr($inbuffer, 0, $cp + 1); # save this in case we need it
            $inbuffer = substr($inbuffer, $cp + 1);   # move on in case it's a stray paren
            if ($stack > 0) {
                # this is part of an expression
                $stack--;
                $outbuffer .= $segment;
            }
        }
        # the stack is empty and we have an expression
        if ($stack == 0 && strlen($outbuffer) > 0) {
            array_push($output, $outbuffer);
            $outbuffer = "";
        }
    }
    return $output;
}
01-20-2006, 12:06 AM   #4
sde
Moderator
I was thinking regex, but this is an interesting approach. I'll port it to JavaScript and give it a try tomorrow. Thanks!
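A rough sketch of what such a port could look like, following the structure of the PHP `parenreader()` function (this is not sde's actual port, which isn't posted). `parenReader` is a hypothetical name; `indexOf()` returning -1 stands in for PHP's `strpos()` returning false, `slice()` stands in for `substr()`, and the loop is assumed to stop only once no `)` remains, so a group that closes at the very end of the input is still collected.

```javascript
// Hypothetical JavaScript port of parenreader() (a sketch, not the thread's code).
function parenReader(input) {
  const output = [];
  let inbuffer = input;
  let outbuffer = "";
  let depth = 0; // plays the role of $stack: how many "(" are still open
  while (inbuffer.length > 0) {
    const op = inbuffer.indexOf("("); // next open paren, -1 if none
    const cp = inbuffer.indexOf(")"); // next close paren, -1 if none
    if (cp === -1) break; // nothing left can close an expression
    if (op !== -1 && op < cp) {
      // open paren comes first: go one level deeper
      depth++;
      // at depth 1, drop any text before the "("; deeper, keep it
      outbuffer += depth > 1 ? inbuffer.slice(0, op + 1) : "(";
      inbuffer = inbuffer.slice(op + 1);
    } else {
      // close paren comes first
      const segment = inbuffer.slice(0, cp + 1);
      inbuffer = inbuffer.slice(cp + 1);
      if (depth > 0) {
        // part of an expression; a ")" at depth 0 is stray and dropped
        depth--;
        outbuffer += segment;
      }
    }
    if (depth === 0 && outbuffer.length > 0) {
      // back at the top level with a complete expression
      output.push(outbuffer);
      outbuffer = "";
    }
  }
  return output;
}

console.log(parenReader("abc ( 123 ( 456 ) ) ( 789 )")); // [ '( 123 ( 456 ) )', '( 789 )' ]
```

Keeping the same buffer-slicing structure as the PHP makes the two easy to compare line by line.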
01-20-2006, 01:22 AM   #5
teknomage1
Jack of all trades
I'm pretty sure it's impossible to write an accurate regex for an arbitrarily nested structure, due to the nature of regular expressions. Once compiled, a regular expression is basically a finite state machine: it has a fixed number of states, and each state change depends only on the current character of the input. Tricks like backreferences make them seem smarter, but they're really simplistic at their core. The machine simply can't keep count of how many parentheses (or HTML tags) are still open, so it can't balance them.
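A small JavaScript illustration of the point above (the patterns are written for this example, not taken from the thread). A lazy `\(.*?\)` stops at the first `)` it reaches, so it cuts the nested group short. A regex can only cope with nesting up to a depth fixed in advance, by spelling out each inner level by hand: the hypothetical `depth2` pattern allows one inner level, so it happens to work on the sample string, but with a third level it only finds the inner group.

```javascript
// Illustration only: two regexes tried against the input from post #1.
const input = "abc ( 123 ( 456 ) ) ( 789 )";

// Lazy match: ends at the FIRST ")" after each "(", so nesting breaks it.
const lazy = /\(.*?\)/g;
console.log(input.match(lazy)); // [ '( 123 ( 456 )', '( 789 )' ]

// Hand-unrolled for at most ONE inner level: "(" then any mix of
// non-paren characters or a flat "( ... )" group, then ")".
const depth2 = /\((?:[^()]|\([^()]*\))*\)/g;
console.log(input.match(depth2)); // [ '( 123 ( 456 ) )', '( 789 )' ]

// One level deeper and the same pattern only finds the inner group.
console.log("( 1 ( 2 ( 3 ) ) )".match(depth2)); // [ '( 2 ( 3 ) )' ]
```

Each extra level of nesting would need another hand-written copy of the inner pattern, which is why a regex can't cover arbitrary depth.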
All times are GMT -8.
Copyright © 2000-2008, Milano Interactive