07-21-2006, 11:36 PM   #4
waveclaw
Join Date: Jul 2006
Location: USA
Posts: 19
Quote:
Also I am not limited to C / C++ if this solution would be easier using a different language.
C and C++ are not much fun for writing string-manipulating programs. You will spend at least half your time dealing with memory allocation, leaks and buffer management (overflows, underflows, fixed vs. dynamic allocation, magic size numbers).

Perl is available for both Microsoft and other platforms (Linux, Solaris, etc.) and is well suited to text translation. There are some who will encourage you to learn Python instead. Either way, to use these languages effectively you will need to know regular expression syntax (arguably a language all its own).

Do you have access to a Unix environment?

There are several tools including tr, awk, sed, cut and paste that make text conversion easy if you can get predictable and regular strings. awk and sed have their own lightweight languages suited to line-by-line text editing. For example:

Code:
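#translate every space into a tab, squeezing (-s) each run into a single tab
#(tr works on single characters, so a lone space becomes a tab too)
tr -s ' ' '\t' < file.in > file.out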
or
Code:
#use the stream editor (sed) to replace stretches of whitespace with commas
#(-E turns on extended regexes; the trailing g replaces every match on the line)
sed -E 's/[[:space:]]{2,}/,/g' < file.in > file.out
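
awk can do the same job in its own little language. This is only a rough sketch: it assumes columns are separated by two or more spaces or tabs and that a single space belongs inside a field. The function name ws_to_csv is made up for illustration.

```shell
# Hypothetical helper: turn runs of 2+ spaces/tabs into commas.
ws_to_csv() {
    awk '{
        # gsub edits the current line ($0) in place; the pattern means
        # "a space or tab, followed by one or more spaces or tabs"
        gsub(/[ \t][ \t]+/, ",")
        print
    }' "$@"
}
# usage: ws_to_csv < file.in > file.out
```

The bracket form [ \t][ \t]+ is used instead of \s or {2,} because older awk versions do not all understand those.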

This particular 'project' is a common one used for training people new to Unix: given a text file with some symbol throughout it, replace it with another. Introductory Unix texts and most automation, scripting or system administration books include examples of how to do this. For example, off the top of my head, a book on Perl might have:

Code:
perl -p -e 's/\s\s+/\t/g' my_file.txt > my_file.tsv
or
Code:
perl -p -e 's/\s\s+/,/g'  my_file.txt > my_file.csv
as an example of redirecting stdout to a file in a Unix shell (> my_file) and of the -p (loop over the lines of a file, printing each line after the script has run on it) and -e 'text' (execute script fragment 'text' given on the command line) options for perl.
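
Whichever one-liner you use, it is worth checking that every line came out with the same number of columns. A rough sketch, assuming tab-separated output like my_file.tsv above; the function name count_columns is made up here.

```shell
# Hypothetical helper: report how many lines have each column count.
count_columns() {
    # -F '\t' splits fields on tabs; NF is the number of fields on the line.
    # sort + uniq -c then tallies how often each count occurs.
    awk -F '\t' '{ print NF }' "$@" | sort -n | uniq -c
}
# usage: count_columns my_file.tsv
```

If the input was regular you should see a single row in the tally. Extra rows point at lines where the whitespace did not split the way you expected.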

Note that 's/\s\s+/,/g' is written in Perl regular expression syntax. s/// is the substitution command in both Perl and sed. It has been given the global option (g) so that it replaces every match on a given line rather than just the first. The pattern describes a whitespace character (\s) followed by one or more further whitespace characters (\s+), in other words a run of two or more. Whitespace here means spaces, tabs, newlines, carriage returns and similar blank characters, not letters, digits or punctuation. Each run is replaced with the given character (a tab in the first example and a comma in the second). One catch: \s also matches the newline at the end of each line, so a line that ends in a space, tab or carriage return will be joined to the line after it. Strip trailing blanks (and DOS line endings) first if your input has them.

FYI, comma-separated values can be complicated, especially if you already have commas in your input. In that case, you will have to wrap entries in quotes or escape the comma. As I can bet this is being fed into Microsoft Excel, you won't be able to use escape codes, so quote the entries instead.
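
The quoting step could be sketched in awk along these lines. It assumes the usual CSV convention of wrapping every field in double quotes and doubling any double quote inside a field, and it assumes columns are separated by runs of two or more spaces or tabs. The name to_quoted_csv is hypothetical.

```shell
# Hypothetical helper: split on runs of 2+ spaces/tabs, emit quoted CSV.
to_quoted_csv() {
    awk '{
        # split the line into fields on runs of 2+ spaces/tabs
        n = split($0, field, /[ \t][ \t]+/)
        line = ""
        for (i = 1; i <= n; i++) {
            gsub(/"/, "\"\"", field[i])   # double any embedded quotes
            line = line (i > 1 ? "," : "") "\"" field[i] "\""
        }
        print line
    }' "$@"
}
# usage: to_quoted_csv < my_file.txt > my_file.csv
```

Quoting every field is more than strictly needed, but it keeps the sketch short and means commas inside a field are never mistaken for separators.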