Code Newbie
News     Forums     Search     Members     Sign Up    

My Code Newbie
Username

Password

Articles/Snippets
ASP Classic
ASP.NET
C
C#
C++
HTML / CSS
Java
Javascript
Linux / BSD
Perl
PHP
Python
Ruby
SQL
VB 6
VB.NET

C.N. Friends
  Planet Rome

Link to Us!
Code Newbie
  Code Newbie
    forums
Old 10-18-2005, 12:27 AM   #1 (permalink)
teknomage1
Jack of all trades
 
teknomage1's Avatar
 
Join Date: Feb 2005
Location: Los Angeles
Posts: 596
teknomage1 is on a distinguished road
Send a message via AIM to teknomage1
Help Optimizing a Tokenizer in Lisp

Okay so I'm working on a language interpreter to settle a bet with a friend. Unfortunately, right now the tokenizer I've writen is taxing the processor. Originally I had it set up to all be handled by the regex engine. It used almost no processing time, but it was dropping important characters, and seemed somewhat unpredicable (in the sense that it wouldn't always return the longest match). Any ideas (short of rewrite it in C with lex and yacc) would be appreciated.
Code:
;original
(defun tokenizer (str re-list &optional (output nil))
    (let ((target (string-trim '(#\Newline #\Tab #\Space) str)))
      (if (string= "" target)
	  output
	  (let
	      ((pos
		(car (sort (delete-if-not
			    (lambda (x) (and (numberp (car x)) (numberp (cadr x))))
			    (mapcar
			     (lambda (re)
				   (multiple-value-bind (start end)
				       (cl-ppcre:scan re target) 
                                       (list start end))) re-list))
			   (lambda (x y)
			     (cond ((< (car x) (car y)) t)
				   ((and (= (car x) (car y)) (> (cadr x) (cadr y))) t)
				   (t nil)))))))
	    (if (null pos)
		(append1 output target)
		(tokenizer
		 (subseq target (cadr pos)) re-list
		 (if (> (car pos) 0)
		     (nconc output (list (subseq target 0 (car pos))
					 (subseq target (car pos) (cadr pos))))
		     (append1 output (subseq target (car pos) (cadr pos))))))))))
EDIT: Got some help from the kind folks over at irc.freenode.net#lisp (second code snippet reflects some of their suggestions), and it now uses an acceptable amount of processing power (memory usage might need som work) but any other ideas would be appreciated.

Code:
;post suggestions from #lisp
(defun tokenizer (str re-list &optional (output nil))
    ;re-list is now a precompiled list of re's instead of text strings 
    ;that would be compiled by cl-ppcre:scan 
    (let ((target (string-trim '(#\Newline #\Tab #\Space) str)))
      (if (string= "" target)
	  output
	  (let
	      ((pos
                 ;reduce replaces the expensive calls to sort 
		(reduce (lambda (x y)
			     (cond ((< (car x) (car y)) x)
				   ((and (= (car x) (car y)) (> (cadr x) (cadr y))) x)
				   (t y)))
			(delete-if-not
			    (lambda (x) (and (numberp (car x)) (numberp (cadr x))))
			    (mapcar
			     (lambda (re)
				   (multiple-value-bind (start end)
				       (cl-ppcre:scan re target) 
                                       (list start end))) re-list)))))
	    (if (null pos)
		(append1 output target)
		(tokenizer
		 (subseq target (cadr pos)) re-list
		 (if (> (car pos) 0)
		     (nconc output (list (subseq target 0 (car pos))
					 (subseq target (car pos) (cadr pos))))
		     (append1 output (subseq target (car pos) (cadr pos))))))))))[
__________________
Stop intellectual property from infringing on me

Last edited by teknomage1; 10-18-2005 at 01:19 AM.
teknomage1 is offline   Reply With Quote
Old 11-08-2005, 07:44 PM   #2 (permalink)
teknomage1
Jack of all trades
 
teknomage1's Avatar
 
Join Date: Feb 2005
Location: Los Angeles
Posts: 596
teknomage1 is on a distinguished road
Send a message via AIM to teknomage1
Round 2 based on the chained lexer from "Higher Order Perl"
Code:
(defun return-matched-and-unmatched-strings (input-string search-pattern token-label make-token-function)
  (let ((outlist nil)
	(flag nil)
	(last-index 0))
    ;;Seperates the numeric indices return from all-matches into chunks of chars
    (mapc (lambda (x)
	    (if flag
		(progn (push
			     (funcall make-token-function (subseq input-string last-index x) token-label)
			     outlist)
			    (setf flag nil))
		(progn (push (subseq input-string last-index x) outlist)
		       (setf flag t)))
	    (setf last-index x)) (cl-ppcre:all-matches search-pattern input-string))
    (push (subseq input-string last-index) outlist)
    ;;return in the order tokens were found
    (nreverse (remove-if (lambda (x)
		 (and (stringp x)
		      (string= x ""))) outlist))))

(defun lexer (input token-label search-pattern make-token-function)
  (mapcan (lambda (x)
	    (cond ((listp x) (list x))
		  ((stringp x) (return-matched-and-unmatched-strings x search-pattern token-label make-token-function))))
	  input))

;(tokenizer "string" '((:name pattern func)
;		      (:name2 pattern2 func2)))
;   =>
;(lexer (lexer (list "string") :name pattern func) :name2 pattern2 func2)
;let input = (list string)
;mapc lambda patterns
;;(setf input (lexer input (car x) (cadr x) (caadr x)))
;input

(defun tokenizer (input-string token-lists)
  "Takes a string and a list of token patterns and returns the results of chain lexing the patterns.
Patterns are in the format (:name regex make-token-function)
Examples:
(tokenizer \"1 * 6 + 7\"
	   (list (list :whitespace \"\\\\s+\" (lambda (x y) \"\"))
		 (list :number \"[0-9]+\" (lambda (x y) (list y x)))
		 (list :operator \"[*/+-]\" (lambda (x y) (list y x)))))
((:NUMBER \"1\") (:OPERATOR \"*\") (:NUMBER \"6\") (:OPERATOR \"+\") (:NUMBER \"7\"))"
  (let ((input (list input-string)))
	(mapc (lambda (x)
		(setf input (lexer input (car x) (cadr x) (caddr x)))) token-lists)
	input))
__________________
Stop intellectual property from infringing on me
teknomage1 is offline   Reply With Quote
Reply

Bookmarks

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT -8. The time now is 02:14 PM.


Powered by vBulletin® Version 3.7.0
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.0.0 RC8





Copyright © 2000-2008, Milano Interactive
Web Hosting provided by Portal 360 Web Hosting