View Single Post
Old 10-18-2005, 12:27 AM   #1 (permalink)
teknomage1
Jack of all trades
 
teknomage1's Avatar
 
Join Date: Feb 2005
Location: Los Angeles
Posts: 598
teknomage1 is on a distinguished road
Send a message via AIM to teknomage1
Help Optimizing a Tokenizer in Lisp

Okay so I'm working on a language interpreter to settle a bet with a friend. Unfortunately, right now the tokenizer I've writen is taxing the processor. Originally I had it set up to all be handled by the regex engine. It used almost no processing time, but it was dropping important characters, and seemed somewhat unpredicable (in the sense that it wouldn't always return the longest match). Any ideas (short of rewrite it in C with lex and yacc) would be appreciated.
Code:
;original
(defun tokenizer (str re-list &optional (output nil))
    (let ((target (string-trim '(#\Newline #\Tab #\Space) str)))
      (if (string= "" target)
	  output
	  (let
	      ((pos
		(car (sort (delete-if-not
			    (lambda (x) (and (numberp (car x)) (numberp (cadr x))))
			    (mapcar
			     (lambda (re)
				   (multiple-value-bind (start end)
				       (cl-ppcre:scan re target) 
                                       (list start end))) re-list))
			   (lambda (x y)
			     (cond ((< (car x) (car y)) t)
				   ((and (= (car x) (car y)) (> (cadr x) (cadr y))) t)
				   (t nil)))))))
	    (if (null pos)
		(append1 output target)
		(tokenizer
		 (subseq target (cadr pos)) re-list
		 (if (> (car pos) 0)
		     (nconc output (list (subseq target 0 (car pos))
					 (subseq target (car pos) (cadr pos))))
		     (append1 output (subseq target (car pos) (cadr pos))))))))))
EDIT: Got some help from the kind folks over at irc.freenode.net#lisp (second code snippet reflects some of their suggestions), and it now uses an acceptable amount of processing power (memory usage might need som work) but any other ideas would be appreciated.

Code:
;post suggestions from #lisp
(defun tokenizer (str re-list &optional (output nil))
    ;re-list is now a precompiled list of re's instead of text strings 
    ;that would be compiled by cl-ppcre:scan 
    (let ((target (string-trim '(#\Newline #\Tab #\Space) str)))
      (if (string= "" target)
	  output
	  (let
	      ((pos
                 ;reduce replaces the expensive calls to sort 
		(reduce (lambda (x y)
			     (cond ((< (car x) (car y)) x)
				   ((and (= (car x) (car y)) (> (cadr x) (cadr y))) x)
				   (t y)))
			(delete-if-not
			    (lambda (x) (and (numberp (car x)) (numberp (cadr x))))
			    (mapcar
			     (lambda (re)
				   (multiple-value-bind (start end)
				       (cl-ppcre:scan re target) 
                                       (list start end))) re-list)))))
	    (if (null pos)
		(append1 output target)
		(tokenizer
		 (subseq target (cadr pos)) re-list
		 (if (> (car pos) 0)
		     (nconc output (list (subseq target 0 (car pos))
					 (subseq target (car pos) (cadr pos))))
		     (append1 output (subseq target (car pos) (cadr pos))))))))))[
__________________
Stop intellectual property from infringing on me

Last edited by teknomage1; 10-18-2005 at 01:19 AM.
teknomage1 is offline   Reply With Quote