04-27-2006, 02:57 AM
#1
Registered User
Join Date: Mar 2006
Posts: 25
Reading a file
I am trying to read in a file and get rid of unwanted lines like the ------ separators you can see below.
I was thinking of using split and strip, but if I do, surely I've got to store these lines as strings first; then
I can manipulate the strings and get rid of the -----------
ID |FIRSTNAME| SECONDNAME |Study |
------------------------------------------
12400 JOHN MAN IT
------------------------------------------
12500 NICE JOHN LAW
-------------------------------------------
12600 PATRICK MAN CS
------------------------------------------
So far my pseudocode is:
Read in the file
Store each line as a string in an array
Get rid of the ------ (i.e. get rid of the lines that have -------)
While loop: print each line
until there are no more lines
Which is poor.
I don't know what function I can use to get rid of the ------ and display the data in a nice format like:
ID FIRSTNAME SECONDNAME Study
12400 JOHN MAN IT
12500 NICE JOHN LAW
12600 PATRICK MAN CS
So what would be my next line in the pseudocode after "Read in file"? Do you think I should store the data in an array, then have a loop which displays the lines until EOF? And after that, what function do I use to remove the -----?
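Because the question mentions split and strip, here is a minimal sketch of the filtering step using only string methods. It is written for Python 3 (the code later in this thread is Python 2), and the helper name `is_separator` and the sample lines are illustrative assumptions rather than anything from the original post.

```python
def is_separator(line):
    """Return True for lines made up only of dashes (ignoring surrounding whitespace)."""
    stripped = line.strip()
    # Non-empty, and nothing left once every '-' is stripped from both ends.
    return bool(stripped) and stripped.strip("-") == ""


sample = [
    "ID |FIRSTNAME| SECONDNAME |Study |\n",
    "------------------------------------------\n",
    "12400 JOHN MAN IT\n",
    "-------------------------------------------\n",
    "12500 NICE JOHN LAW\n",
]

# Keep every non-separator line, minus its trailing newline.
kept = [line.rstrip("\n") for line in sample if not is_separator(line)]
for line in kept:
    print(line)
```

Blank lines are deliberately not treated as separators here; whether they should be dropped too is a design choice for the real file.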
Last edited by coolman; 04-27-2006 at 09:43 AM.

04-29-2006, 02:35 AM
#2
Jack of all trades
Join Date: Feb 2005
Location: Los Angeles
Posts: 598
I assume you're using Python? Python has a regular expression library (the re module), so you could use a regular expression like '^-+$' to match lines that contain only dashes. Or you could decide that every --- line will have exactly 12 dashes and use a string comparison to check whether the line is equal to '------------'.
__________________
Stop intellectual property from infringing on me
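A small Python 3 sketch comparing the two suggestions. The separator lengths (42 and 43 dashes) are assumptions modelled on the sample data, whose dashed lines are not all the same length; a fixed-width comparison misses one of them, while the regular expression catches both.

```python
import re

# One or more dashes and nothing else. Without re.MULTILINE, '$' also matches
# just before a trailing '\n', so lines read from a file work without stripping.
DASH_LINE = re.compile(r"^-+$")

lines = [
    "12400 JOHN MAN IT\n",
    "-" * 42 + "\n",
    "-" * 43 + "\n",
]

regex_hits = [bool(DASH_LINE.match(line)) for line in lines]
# A fixed-width comparison only works if every separator has the same length.
fixed_hits = [line.rstrip("\n") == "-" * 42 for line in lines]

print(regex_hits)  # [False, True, True]
print(fixed_hits)  # [False, True, False]
```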

04-29-2006, 03:19 AM
#3
Registered User
Join Date: Mar 2006
Posts: 25
Yeah, Python.
But don't I need to store each line of the file as a string and then get rid of the strings that contain -------------?

04-30-2006, 01:31 PM
#4
Jack of all trades
Join Date: Feb 2005
Location: Los Angeles
Posts: 598
So your pseudocode looks like this:
Code:
Open file
While there are lines to be read in file
    Store each line as a temporary string
    If string does not match '^-+$', add string to output array
Close file
For each entry in output array
    Print string
Now for the match I'd recommend something like re.compile('^-+$').search(tmpstr) (leave out a start position: with '^' in the pattern, search(tmpstr, 1) would never match). Here's a link to the pydoc for re: http://docs.python.org/lib/module-re.html
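The pseudocode above maps almost line for line onto Python. This Python 3 sketch reads from a list instead of a file so it runs on its own (`filter_lines` is a hypothetical helper name). It also shows why a start position matters with this pattern: `'^'` only matches at the real start of the string, so `search(tmpstr, 1)` finds nothing even on an all-dash line.

```python
import re

# Compile the separator pattern once: one or more dashes and nothing else.
pattern = re.compile(r"^-+$")


def filter_lines(lines):
    output = []  # output array
    for line in lines:  # while there are lines to be read
        tmpstr = line.rstrip("\n")  # store each line as a temporary string
        if not pattern.search(tmpstr):  # if it does not match '^-+$'...
            output.append(tmpstr)  # ...add it to the output array
    return output


for entry in filter_lines(["12400 JOHN MAN IT\n", "------------\n", "12500 NICE JOHN LAW\n"]):
    print(entry)  # print each entry in the output array

# '^' matches at the real start of the string, not at the start position 1.
print(pattern.search("------------", 1))  # None
```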

05-01-2006, 04:52 AM
#5
Registered User
Join Date: Mar 2006
Posts: 25
This is what I have so far, not sure it's right
This is what I have so far; I'm not sure it's right, as I am new to Python.
Code:
import re
f = open("StudentList.txt", "r")  # open file
output = []
for line in f:  # while there are lines to be read in file
    string = line.rstrip("\n")  # store each line as a temporary string
    if not re.match("^-+$", string):  # if string does not match '^-+$'...
        output.append(string)  # ...add string to output array
f.close()  # close file

05-01-2006, 06:28 AM
#6
Newbie
Join Date: Jun 2002
Location: Denmark
Posts: 1,720
Perhaps this small sample code will help you along:
Code:
#!/usr/bin/python
import re # make regular expression available
list = []
f = open("Studentlist.txt", "r") # open file
lines = f.readlines() # make every line available
for line in lines: # loop through lines
    if not re.match("^-", line): # if line doesn't start with '-'
        list.append(line[0:-1]) # add to line array without '\n'
del list[0] # remove the first line "ID | etc "
for item in list:
    print item
For reference, as teknomage noted, you might want to read up on the regular expression object in Python; I find the Regular Expression HOWTO useful. Starting from my code, what you need to do is find a way to split the lines in the list, so you end up with a list like:
Code:
[["12400", "JOHN MAN", "IT"],
["12500", "NICE JOHN", "LAW"],
["12600", "PATRICK MAN", "CS"]]
Right now the list in my code will end up containing
Code:
["12400 JOHN MAN IT",
"12500 NICE JOHN LAW",
"12600 PATRICK MAN CS"]
That can be accomplished by using the regular expression methods on the list created by my small piece of code.
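One way to get from the flat strings to the nested list, sketched in Python 3 with plain `str.split()` rather than a regular expression. It assumes the first field is always the ID, the last is always the study, and everything in between is the name; `to_record` is a hypothetical helper name.

```python
def to_record(line):
    """Split 'ID FIRST LAST STUDY' into [ID, 'FIRST LAST', STUDY]."""
    fields = line.split()  # split on any run of whitespace; no empty strings
    # Assumption: first field = ID, last field = study, the rest = name.
    return [fields[0], " ".join(fields[1:-1]), fields[-1]]


rows = ["12400 JOHN MAN IT", "12500 NICE JOHN LAW", "12600 PATRICK MAN CS"]
records = [to_record(row) for row in rows]
print(records)
```

Keeping the middle fields together means a name with more than two parts still ends up as a single entry.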

05-01-2006, 08:20 AM
#7
Registered User
Join Date: Mar 2006
Posts: 25
OK
I can't seem to print the final result, so I don't know if it has worked, but yeah, I agree I still need to make it better like you said. So I need to add the commas after each string in that list:
Code:
[["12400", "JOHN MAN", "IT"],
["12500", "NICE JOHN", "LAW"],
["12600", "PATRICK MAN", "CS"]]

05-01-2006, 09:23 AM
#8
Newbie
Join Date: Jun 2002
Location: Denmark
Posts: 1,720
Look at the split section; that might help you a bit.
Here is a quick implementation of it. It's not ideal, but it will give you a hint:
Code:
#!/usr/bin/python
import re # make regular expression available
list = []
f = open("Studentlist.txt", "r") # open file
lines = f.readlines() # make every line available
for line in lines: # loop through lines
    if not re.match("^-", line): # if line doesn't start with '-'
        list.append(re.split(r'[\W]+', line[0:-1])) # add split line
del list[0] # remove the first line "ID | etc "
for item in list:
    if len(item):
        print "--------------------------------"
        print "Student ID:\t", item[0]
        print "Student Name:\t", item[1], item[2]
        print "Student Class:\t", item[-1]
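A caveat worth checking with this approach (Python 3 sketch): `re.split` returns an empty string when the line ends in a separator, so if the data lines carry a trailing space, as the trailing comma in the output later in this thread suggests, `item[-1]` will be `''` rather than the study code. `str.split()` with no argument drops empty fields. The sample line below is an assumption.

```python
import re

# Assumed data line: doubled spaces between fields plus one trailing space.
line = "12400  JOHN  MAN  IT \n"

by_regex = re.split(r"[\W]+", line[0:-1])  # same pattern as the code above
by_split = line.split()  # any run of whitespace; no empty strings

print(by_regex)  # ['12400', 'JOHN', 'MAN', 'IT', '']
print(by_split)  # ['12400', 'JOHN', 'MAN', 'IT']
print(repr(by_regex[-1]))  # '' -> "Student Class" would print nothing
```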

05-02-2006, 11:04 AM
#9
Registered User
Join Date: Mar 2006
Posts: 25
I am going to take a different approach now
I am going to take a slightly different approach now. I mainly want to concentrate on splitting the data up using commas (,).
Current data:
ID |FIRSTNAME| SECONDNAME |Study |
------------------------------------------
12400 JOHN MAN IT
------------------------------------------
12500 NICE JOHN LAW
-------------------------------------------
12600 PATRICK MAN CS
------------------------------------------
Code:
f = open("List", "r") #open and read file
returnline = [] #empty list
for line in f.readlines(): #loop for making each line into a large string
Now I think the next step would be to split the data up with commas. In the file there is a white space between the fields, and I want the program to fill that white space with a comma, so the data is split by commas. Since I am using a list, I know I've got to add each string to the list called returnline using append.
I was thinking of using this code:
Code:
returnline.append(line.replace('\n', '').split(','))
# replace the '\n' in the line string with nothing ('')
The new data should look something like this:
------------------------------------------
12400, JOHN, MAN, IT,
------------------------------------------
12500, NICE, JOHN, LAW,
-------------------------------------------
12600, PATRICK, MAN, CS,
------------------------------------------
But I am stuck now and don't know what to do.
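To see why the `split(',')` idea doesn't produce the comma-separated format, here is a short Python 3 sketch. `split(',')` only breaks the string where a comma already exists, so a space-separated line comes back as a one-element list; splitting on whitespace and joining with `', '` gives the intended layout (without the trailing comma shown above).

```python
line = "12400 JOHN MAN IT\n"

# split(',') only breaks where a comma already exists, so nothing is split here.
print(line.rstrip("\n").split(","))  # ['12400 JOHN MAN IT']

# split() with no argument breaks on any run of whitespace (and drops the '\n').
fields = line.split()
print(fields)  # ['12400', 'JOHN', 'MAN', 'IT']

# join() puts ', ' between the fields.
print(", ".join(fields))  # 12400, JOHN, MAN, IT
```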

05-02-2006, 11:24 AM
#10
Newbie
Join Date: Jun 2002
Location: Denmark
Posts: 1,720
That is not how split() works. It splits the line up at the given separator: the way you are using it, it would split the line wherever it encounters a ',', but you want to replace every space with a ','. For that you would need something like this:
Code:
import re
f = open("List", "r") #open and read file
returnline = [] #empty list
for line in f.readlines():
    returnline.append(re.compile(' ').sub(',', line[0:-1]))
As an explanation, the re.compile(' ') part builds a pattern that matches every space found, and the
Code:
.sub(',', line[0:-1])
part substitutes every occurrence of that pattern (the space) within line[0:-1] (meaning the line without the trailing '\n') with a comma (',').
Since a string can be indexed and sliced like a char array, there's no point in trying to replace '\n' with anything: it will always be the very last char (index -1), so [0:-1] means the slice starting at index 0 (the first char) and stopping just before the last char. When reading a file line by line you always read up to EOL (end of line), which is where the '\n' resides. (The exception is the last line of a file that doesn't end with a newline: there [0:-1] cuts off a real character, and line.rstrip('\n') is the safer choice.)
Later on you can use split() to split at every ',' in your line, if that is what you wish. But then again, I can't see why my approach, where I split at every space, wouldn't be just as useful: it gives the same result, and your solution only adds a few more operations.
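A minimal Python 3 sketch of the trade-off between slicing with `[0:-1]` and `rstrip('\n')`. The two sample strings are assumptions: one line that ends in a newline, and a final line of a file that doesn't.

```python
with_newline = "12600 PATRICK MAN CS\n"  # a normal line read from a file
without_newline = "12600 PATRICK MAN CS"  # a last line with no trailing '\n'

print(with_newline[0:-1])  # 12600 PATRICK MAN CS
print(without_newline[0:-1])  # 12600 PATRICK MAN C  (real character lost)

# rstrip('\n') removes a newline only if one is actually there.
print(with_newline.rstrip("\n"))  # 12600 PATRICK MAN CS
print(without_newline.rstrip("\n"))  # 12600 PATRICK MAN CS
```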
Last edited by redhead; 05-02-2006 at 11:43 AM.

05-02-2006, 01:40 PM
#11
Registered User
Join Date: Mar 2006
Posts: 25
It says line is not defined
when I ran this piece of code:
returnline.append(re.compile(' ').sub(',', line[0:-1]))

05-02-2006, 07:50 PM
#12
Newbie
Join Date: Jun 2002
Location: Denmark
Posts: 1,720
Works for me:
Code:
#!/usr/bin/python
import re
returnline = []
f = open("List", "r")
for line in f.readlines():
    returnline.append(re.compile(' ').sub(',', line[0:-1]))
for item in returnline:
    print item
Quote:
> ./test.py
------------------------------------------
12400,,JOHN,,MAN,,IT,
------------------------------------------
12500,,NICE,,JOHN,,LAW,
-------------------------------------------
12600,,PATRICK,,MAN,,CS,
------------------------------------------
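The doubled commas suggest the file separates fields with runs of spaces (and ends lines with a trailing space); that is an inference from the output, not something confirmed in the thread. Under that assumption, this Python 3 sketch shows that a pattern of `' +'` (one or more spaces), applied after stripping the line, yields a single comma between fields.

```python
import re

# Assumed input: two spaces between fields and one trailing space.
line = "12400  JOHN  MAN  IT \n"

one_space = re.compile(" ").sub(",", line[0:-1])  # as in the code above
space_runs = re.compile(" +").sub(",", line.strip())  # strip ends, then collapse runs

print(one_space)  # 12400,,JOHN,,MAN,,IT,
print(space_runs)  # 12400,JOHN,MAN,IT
```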

05-03-2006, 11:26 AM
#13
Registered User
Join Date: Mar 2006
Posts: 25
Sorry, I still can't get it to work
I am sure it's just the indentation.
Code:
import re
returnline = []
f = open("StudentList.txt", "r")
for line in f.readline():
    returnline.append(re.compile(' ').sub(',', line[0:-1]))
for item in returnline: # it's this bit that is a pain; when I put it there, it says:
IndentationError: unindent does not match any outer indentation level (<pyshell#92>, line 3)
I don't know what I am doing wrong.
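Separately from the indentation error, the code above calls `f.readline()` (singular), which returns only the first line as a string, so `for line in f.readline():` loops over that line's characters. This Python 3 sketch uses `io.StringIO` as a stand-in for the open file to show the difference from `readlines()`.

```python
import io

data = "12400 JOHN MAN IT\n12500 NICE JOHN LAW\n"

f = io.StringIO(data)
first_line = f.readline()  # only the first line, as one string
chars = [c for c in first_line]  # iterating a string yields characters
print(chars[:5])  # ['1', '2', '4', '0', '0']

f = io.StringIO(data)
all_lines = f.readlines()  # a list with one string per line
print(all_lines)  # ['12400 JOHN MAN IT\n', '12500 NICE JOHN LAW\n']

# Iterating the file object directly also yields one line at a time.
f = io.StringIO(data)
print([line for line in f] == all_lines)  # True
```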
Last edited by coolman; 05-03-2006 at 11:27 AM.
Reason: spelling

05-04-2006, 03:34 AM
#14
Registered User
Join Date: Mar 2006
Posts: 25
It doesn't work fully
Current data:
NUM |FIRSTNAME| SURNAME |course |
------------------------------------------
12400 ERIC RONNEY CS
------------------------------------------
12500 MICHAEL OWEN CS
------------------------------------------
12600 SCOT MAN CS
------------------------------------------
12700 GEOGE YO IT
------------------------------------------
Code:
import re
returnline = []
f = open("List.txt", "r")
for line in f.readlines():
    returnline.append(re.compile(' ').sub(',', line[0:-1]))
    for item in returnline:
        print item
New data after the program is executed:
Code:
NUM,,,|FIRSTNAME|,SURNAME,|course,|
NUM,,,|FIRSTNAME|,SURNAMENAME,|course,|
------------------------------------------
NUM,,,|FIRSTNAME|,SECONDNAME,|course,|
------------------------------------------
12400 ERIC ROONEY CS
NUM,,,|FIRSTNAME|,SURNAME,|course,|
------------------------------------------
12400 ERIC ROONEY CS
------------------------------------------
NUM,,,|FIRSTNAME|,SURNAME,|course,|
------------------------------------------
12400 ERIC ROONEY CS
------------------------------------------
12500 MICHAEL OWEN CS
NUM,,,|FIRSTNAME|,SURNAME,|course,|
------------------------------------------
And the program just runs forever and doesn't stop.

05-04-2006, 07:12 AM
#15
Newbie
Join Date: Jun 2002
Location: Denmark
Posts: 1,720
It's the indentation. Where other programming languages use brackets to define scopes (where what starts and ends), Python uses indentation. In your example code you have the for loop printing the items from returnline inside the scope of your for loop reading from the file; in C/C++ it would be equivalent to writing something like this:
Code:
int j=0, i;
while(fgets(buffer, MAX_SIZE, file_pointer))
{
    buffer[strlen(buffer)-1] = '\0';
    strcpy(list[j], buffer);
    for(i=0; i< j; ++i)
    {
        printf("%s\n", list[i]);
    }
    ++j;
}
As you can see, the printing of your gathered lines should only occur once you've finished filling in the list of lines found in the file, so obviously we're looking for this solution:
Code:
int j=0, i;
while(fgets(buffer, MAX_SIZE, file_pointer))
{
    buffer[strlen(buffer)-1] = '\0';
    strcpy(list[j], buffer);
    ++j;
}
for(i=0; i< j; ++i)
{
    printf("%s\n", list[i]);
}
This should work, but it's not tested:
Code:
import re
returnline = []
f = open("List.txt", "r")
for line in f.readlines():
    returnline.append(re.compile(' ').sub(',', line[0:-1]))
for item in returnline:
    print item
Note that there's no indentation before the second for line.
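Pulling the thread together, here is one possible end-to-end sketch in Python 3 (not tested against the poster's actual file). It skips the dash lines with the `'^-+$'` pattern, splits each row on whitespace, keeps the middle fields together as the name, drops the header row, and prints only after the reading loop has finished. `read_students`, `print_students`, and the sample text are hypothetical; `io.StringIO` stands in for the file.

```python
import io
import re

DASH_LINE = re.compile(r"^-+$")  # a line made only of dashes


def read_students(lines):
    """Return [id, name, study] records, skipping blank and dash lines.

    `lines` can be an open file or any iterable of strings.
    """
    records = []
    for line in lines:
        text = line.strip()
        if not text or DASH_LINE.match(text):
            continue  # skip blank lines and separator lines
        fields = text.split()  # any run of whitespace; no empty strings
        records.append([fields[0], " ".join(fields[1:-1]), fields[-1]])
    # The first remaining row is the header ("ID |FIRSTNAME| ..."), so drop it,
    # much like `del list[0]` earlier in the thread.
    return records[1:]


def print_students(records):
    # This loop runs only after read_students() has finished reading.
    for student_id, name, study in records:
        print(student_id, name, study)


sample = io.StringIO(
    "ID |FIRSTNAME| SECONDNAME |Study |\n"
    "------------------------------------------\n"
    "12400 JOHN MAN IT\n"
    "------------------------------------------\n"
    "12500 NICE JOHN LAW\n"
    "-------------------------------------------\n"
    "12600 PATRICK MAN CS\n"
    "------------------------------------------\n"
)
students = read_students(sample)
print_students(students)
```

With a real file, something like `with open("StudentList.txt") as f: print_students(read_students(f))` should work, since `read_students` accepts any iterable of lines.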

All times are GMT -8.
Copyright © 2000-2008, Milano Interactive
Web Hosting provided by Portal 360 Web Hosting