Code Newbie
Old 07-13-2006, 09:05 AM   #1 (permalink)
heinz1218
Registered User
 
Join Date: Jul 2006
Posts: 9
Question can i clean up my data with C

hey all

i've got some data that i'm getting from SBA's website in comma-delimited form, and i'm trying to import into Excel.

http://dsbs.sba.gov/dsbs/dsp_dsbs.cfm

however some of the data includes someone hitting the "enter" key, which screws up the import into excel. I have thousands of lines to go through, and it happens every 10 lines or so, so i need to write something to take care of it. I've done a little bit of coding, though mostly web-based (php, javascript, asp, etc). Anyway it's been a while, but I know there should be a pretty quick way for me to do this....Any suggestions? maybe java would be easier?

also, would this be using reg exps, or are there such things in C?

I don't have any compilers on this computer so any help is much appreciated...thanks guys.
Old 07-13-2006, 09:20 AM   #2 (permalink)
redhead
Newbie
 
Join Date: Jun 2002
Location: Denmark
Posts: 1,705
What does it look like? What you're saying here doesn't exactly describe your problem. It would help to get some info on how your comma-separated data is formed, and whether "includes hitting the enter key" means the lines get broken, so it would be something like:
Quote:
first, second, third, forth
first, second, third, forth
first, second, th<enter>
ird, forth
first, second, third, forth
Then you'd have something useful to search for, say "all lines which have fewer than N commas must have been broken by a user-entered <enter>", so combine such a line with whatever follows on the next line..

Else if it's just something like "every tenth line is screwed, so combine that with the eleventh", it's just a matter of counting the lines and then merging the tenth/eleventh line...

But in order to see what we have here, we really need to take a closer look at what an occurrence of this might look like.
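A minimal sketch of the comma-counting idea, assuming no field ever contains a comma of its own (which may well not hold for real data). The names count_commas and looks_broken are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Count the commas in one line of text (hypothetical helper). */
size_t count_commas(const char *line)
{
    size_t n = 0;
    assert(line != NULL);
    for (; *line != '\0'; line++) {
        if (*line == ',')
            n++;
    }
    return n;
}

/* A line looks broken if it has fewer commas than a complete record.
   expected_commas is the number of fields minus one. */
int looks_broken(const char *line, size_t expected_commas)
{
    return count_commas(line) < expected_commas;
}
```

With four fields per record, expected_commas would be 3, so the fragment "first, second, th" is flagged and should be joined with the line after it.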
__________________
Don't worry Ma'am, We're university students, We know what We're doing.
-----
If you pull the pin, Mr.Grenade would no longer be your friend.
-----
01000111 01101111 00100000 01000011 00100000 00100001
Old 07-13-2006, 09:38 AM   #3 (permalink)
heinz1218
Ok my bad, didn't know how to properly explain it before; the data looks like this:

Quote:
"1175","MONTGOMERY, BILLY","SYSNETCORP","BILLY MONTGOMERY","Owner/CEO","8647 RICHMOND HWY STE 637","","ALEXANDRIA","VA","22309-4316","Workstation and Server<enter> Maintenance and Installation<enter>
Router Installation and Configuration<enter>
Firewall Installation and Configuration<enter>
LAN and WAN Design"

Quote:
"first","second","third","forth, 4th"
"first","second","third","forth - 4th"
"first","second","third","for<enter>
th"
"first","second","third","forth, 4th"
basically the last section has variations (dashes, slashes, commas) within the quote marks, but what seems to be screwing up the import is the <enter>'s. So whenever someone has entered data and hit <enter>, like in the example, the next line is messed up in excel.
Old 07-13-2006, 10:39 AM   #4 (permalink)
redhead
This might sound like a silly question... But can one safely rely on all content being embedded in quotes?
i.e.
Quote:
"first", "second", "third, with commas (other chars in)", "firth"
So it would be: start at the opening quote, search for the closing quote, and combine whatever was read between the quotes into a single line....
Old 07-13-2006, 10:43 AM   #5 (permalink)
heinz1218
Quote:
This might sound like a silly question... But can one safely rely on all content being embedded in quotes?
i.e.

Quote:
"first", "second", "third, with commas (other chars in)", "firth"
yes, because the user is entering this into a web application at that website:



and then the data is being generated in comma delimited format. So do you know of a way I can go about reformatting this text? Thanks man.
Old 07-13-2006, 10:59 AM   #6 (permalink)
redhead
Then something like this might be useful:
Code:
#include <stdio.h>

int main(int argc, char** argv)
{
  FILE *ifp, *ofp;
  int hold_eol = 0; /* 1 while we are inside a quote container */
  int ch;           /* int, not char, so EOF can be told apart from data */

  if(argc != 3){
    fprintf(stderr, "Usage: %s <infile> <outfile>\n", argv[0]);
    return -1;
  }
  if(!(ifp=fopen(argv[1], "r"))){
    fprintf(stderr, "Error opening file %s for reading\n", argv[1]);
    return -1;
  }
  if(!(ofp=fopen(argv[2], "w"))){
    fprintf(stderr, "Error opening file %s for writing\n", argv[2]);
    fclose(ifp); /* don't leak the input file on failure */
    return -1;
  }
  while(EOF!=(ch=fgetc(ifp))){
    if(ch == '"')
      hold_eol = !hold_eol; /* we enter or exit a quote container */
    if(ch == '\n' && hold_eol)
      ch=' '; /* replace eol inside quotes with space */
    fputc(ch, ofp); /* print current char to file */
  }
  fclose(ifp);
  fclose(ofp);
  return 0;
}
I think it is quite easy to transform into some PHP code... Since PHP syntax is very much like C/C++ and the PHP API provides almost identical library functions.
Old 07-13-2006, 11:13 AM   #7 (permalink)
heinz1218
forgive my ignorance, but i don't really understand what that does. i'm assuming somewhere i'd have to put in the filename of the actual data, and this would just replace certain characters and edit the file directly?

also, i'm downloading the bloodshed compiler, so once i install that i should run your code w/the correct filename of the source data and it'll work? it's importing "<stdio.h>"?
Old 07-13-2006, 11:26 AM   #8 (permalink)
redhead
stdio.h provides the file access functions, i.e. fopen(), fgetc(), fclose().
This code is ANSI compliant, so it should compile with bloodshed without any hassle.

Basically, what the program does is:
  • open the text file given as the first argument (input file)
  • open (and overwrite) the text file given as the second argument (output file)
  • read the input file one char at a time
  • check to see if it's a quote
    • if it's a start-quote, replace every <newline> read from the input file with a space, until the end-quote is read
  • print every char read to the output file
You would use it like:
Quote:
program my_input_data.txt altered_data.txt
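To see the effect without touching any files, here is a small sketch that applies the same quote-toggling rule to a string in memory. The function name join_quoted_lines is made up for illustration, and like the program it assumes every field is wrapped in double quotes:

```c
#include <assert.h>
#include <string.h>

/* Replace every newline that falls inside a double-quoted field
   with a space, editing the string in place (hypothetical helper). */
void join_quoted_lines(char *s)
{
    int in_quotes = 0;
    size_t i, len;
    assert(s != NULL);
    len = strlen(s);
    for (i = 0; i < len; i++) {
        if (s[i] == '"')
            in_quotes = !in_quotes; /* entering or leaving a field */
        else if (s[i] == '\n' && in_quotes)
            s[i] = ' ';             /* newline typed inside a field */
    }
}
```

Given the two physical lines "first","for<enter> and th", the newline between them sits inside quotes and becomes a space, while the newline that ends the record is left alone.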
Old 07-13-2006, 11:49 AM   #9 (permalink)
heinz1218
again, forgive my ignorance.

i've installed dev-c++, and created a new file for a "windows based environment." Then i renamed my data file "input.txt"

so then where do i put this line into the code:
Quote:
program input.txt output.txt
or if that's not what i put into the code, do i replace
Quote:
FILE *ifp, *ofp;
thanks again
Old 07-13-2006, 12:01 PM   #10 (permalink)
redhead
Create a new project (dos/console app) and place the code I've given into the suggested .c file.
Compile the project, which in turn will produce an executable file, usually called a.out or out.exe (can't remember what dev-c++ produces, mainly because I've never used it).
This output file is the actual "program" mentioned in my example of usage.

So there's no need to alter any part of the code I've provided.

The mentioned program input.txt output.txt is the actual way to use it from a console (start -> run -> cmd).
Old 07-13-2006, 12:26 PM   #11 (permalink)
heinz1218
awesome that worked. so here's what i did, to anyone else reading it:

-downloaded bloodshed, look at the sticky for the link in the main forum
-unzipped/installed dev-c++
-copy/pasted the code redhead so graciously provided, then compiled it (Execute>compile)
-loaded dos prompt (start>run>cmd then 'ok')
-navigated to the folder where the compiled program is (same name as you saved the project as)
-type name of your program input.txt output.txt

That worked, found my input.txt and outputted it as output.txt, sweeeet

out of curiosity, using C++ is there a way to make it user friendly so someone can just double click it in windows, and then it prompts them to find the file or something to that effect? (some other people besides me who don't know dos or anything might need this some day).

Thanks again
Old 07-13-2006, 12:33 PM   #12 (permalink)
redhead
Quote:
is there a way to make it user friendly so someone can just double click it in windows, and then it prompts them to find the file or something to that effect?
That would require a GUI, which would take quite a lot more code than the provided snippet...
It would also depend on platform-specific libraries outside the ANSI/ISO C standard, and thus wouldn't work "out of the box" for everyone intending to use it.
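A possible middle ground that stays within standard C, offered only as a sketch: when no file names are given on the command line, ask for them on the console. This assumes that double-clicking a console program on Windows opens a console window for it; the names read_filename and ask_for_filename are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line from `in` into buf and strip the trailing newline.
   Returns 1 on success, 0 on end of input or an empty line. */
int read_filename(FILE *in, char *buf, size_t size)
{
    assert(in != NULL && buf != NULL && size > 0);
    if (fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\r\n")] = '\0'; /* cut at the first CR or LF */
    return buf[0] != '\0';
}

/* Print a prompt and read the answer from the keyboard. */
int ask_for_filename(const char *prompt, char *buf, size_t size)
{
    fputs(prompt, stdout);
    fflush(stdout); /* make sure the prompt shows before we wait */
    return read_filename(stdin, buf, size);
}
```

In main, the argc != 3 branch could call ask_for_filename twice (input file, then output file) and pass the results to fopen instead of printing the usage message. The console window would still close as soon as the program exits.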

I'm tired, I'm going to bed.. Hope everyone had a good day.
Old 07-13-2006, 12:38 PM   #13 (permalink)
heinz1218
good to know, thanks a bunch.