I'm new to Python and I want my code to read a large CSV file line by line and, if the experiment ID is a specific number, write that line to another CSV. The only problem is that it only writes the first instance and then stops. Any suggestions? Thanks.
out = open('new.csv', 'w')
with open('exp.csv','r') as w:
    header =w.readline()
    out.write(header)
    for line in w:
        line = line.strip("\n")
        tokens = line.split(",")
        exp_id = tokens[0]
        if (exp_id=='2243920414'):
            out.write(line)
            continue
out.close()
You can just remove the line = line.strip('\n') and it works fine.
Because you strip the newline, every matching line is written onto one single line, which makes it look as if only one line had matched.
with open('exp.csv','r') as w, open('new.csv', 'w') as out:
    header = w.readline()
    out.write(header)
    for line in w:
        tokens = line.split(",")
        exp_id = tokens[0]
        if (exp_id=='2243920414'):
            out.write(line)
You should also check the csv module.
It's a great module for everything related to csv files.
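As a rough sketch of what the csv-module version could look like, assuming the experiment ID is in the first column and the file has a header row (the function name filter_rows is made up for this example and is not part of the answer above):

```python
import csv

def filter_rows(src_path, dst_path, wanted_id):
    """Copy the header and every row whose first column equals wanted_id.

    Returns the number of matching rows written.
    """
    # newline='' lets the csv module handle line endings itself
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)      # None if the file is empty
        if header is not None:
            writer.writerow(header)
        matched = 0
        for row in reader:               # rows are read one at a time
            if row and row[0] == wanted_id:
                writer.writerow(row)     # writerow adds the line ending
                matched += 1
    return matched
```

Calling filter_rows('exp.csv', 'new.csv', '2243920414') would then do the same job as the loop above, and because writerow() appends the line ending itself, the missing-newline problem cannot occur.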
Related
How do I remove lines that start with ">" from a txt file?
For example, the txt file has about 250k+ lines, and the code below takes quite some time.
data = ""
with open(fileName) as f:
    for line in f:
        if ">" not in line:
            line = line.replace("\n", "")
            data += line
An example of the txt file is:
> version 1.0125 revision 0... # This is the line to be removed
some random line 1
some random line 2
> version 1.0126 revision 0... # This is the line to be removed
...
I have tried using data = f.read(); it is instant, but the data will still contain the lines that start with ">".
Any help is appreciated. Thank you :)
Not knowing what you want to do with the data afterwards, this should be fast and correct:
with open(fileName) as f:
    data = "".join(line for line in f if not line.startswith(">"))
If you just want to remove these lines from the file, I would honestly not do it in Python, but in your shell directly, e.g. on Linux:
$ grep -v '^>' original_file.txt >fixed_file.txt
If you insist on Python, do it on a line-by-line basis:
with open(original_file) as f:
    with open(new_file, "w") as g:
        for line in f:
            if not line.startswith(">"):
                g.write(line)
Use two files, one for reading and a second one for appending:
with open(fileName, 'r') as f, open(fileName.replace('.txt', '_1.txt'), 'a+') as df:
    for line in f:
        if not line.startswith('>'):
            df.write(line)
I am trying to do what for many will be a very straightforward thing, but for me it is just infuriatingly difficult.
I am trying to search for a line in a file that contains certain words or phrases and modify that line... that's it.
I have been through the forum and the suggested similar questions and have found many hints, but none do quite what I want, or they are beyond my current ability to grasp.
This is the test file:
# 1st_word 2nd_word
# 3rd_word 4th_word
And this is my script so far:
############################################################
file = 'C:\lpthw\\text'
f1 = open(file, "r+")
f2 = open(file, "r+")
############################################################
def wrline():
    lines = f1.readlines()
    for line in lines:
        if "1st_word" in line and "2nd_word" in line:
            #f2.write(line.replace('#\t', '\t'))
            f2.write((line.replace('#\t', '\t')).rstrip())
    f1.seek(0)

wrline()
My problem is that the below inserts a \n after the line every time and adds a blank line to the file.
f2.write(line.replace('#\t', '\t'))
The file becomes:
1st_word 2nd_word
# 3rd_word 4th_word
An extra blank line between the lines of text.
If I use the following:
f2.write((line.replace('#\t', '\t')).rstrip())
I get this:
1st_word 2nd_wordd
# 3rd_word 4th_word
No new blank line is inserted, but there is an extra "d" at the end instead.
What am I doing wrong?
Thanks
Your blank line is coming from the original blank line in the file. Writing a line with nothing in it writes a newline to the file. Instead of not putting anything into the written line, you have to completely skip the iteration, so it does not write that newline. Here's what I suggest:
def wrline():
    lines = open('file.txt', 'r').readlines()
    f2 = open('file.txt', 'w')
    for line in lines:
        if '1st_word' in line and '2nd_word' in line:
            f2.write((line.replace('# ', ' ')).rstrip('\n'))
        else:
            if line != '\n':
                f2.write(line)
    f2.close()
I would keep read and write operations separate.
#read
with open(file, 'r') as f:
    lines = f.readlines()
#parse, change and write back
with open(file, 'w') as f:
    for line in lines:
        if line.startswith('#\t'):
            line = line[1:]
        f.write(line)
You have not closed the files, and there is no need for the \t.
Also get rid of the rstrip().
Read in the file, replace the data and write it back, opening and closing the file each time.
fn = 'example.txt'
new_data = []
# Read in the file
with open(fn, 'r+') as file:
    filedata = file.readlines()
# Replace the target string
for line in filedata:
    if "1st_word" in line and "2nd_word" in line:
        line = line.replace('#', '')
    new_data.append(line)
# Write the file out again
with open(fn, 'w+') as file:
    for line in new_data:
        file.write(line)
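Both symptoms from the question (the extra blank line and the extra "d") are consistent with overwriting the start of a file in "r+" mode with text that is shorter than the original: the leftover characters of the old line stay where they were. The sketch below reproduces this on a temporary file. It assumes the test file uses tabs between the fields, as the '#\t' in the script suggests; the helper name overwrite_in_place is made up for this illustration.

```python
import os
import tempfile

# Assumed content of the test file (tab-separated).
ORIGINAL = '#\t1st_word\t2nd_word\n#\t3rd_word\t4th_word\n'

def overwrite_in_place(replacement):
    """Write ORIGINAL to a temp file, overwrite its start via 'r+', return the result."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # newline='' disables newline translation so character counts are exact
        with open(path, 'w', newline='') as f:
            f.write(ORIGINAL)
        with open(path, 'r+', newline='') as f:
            f.write(replacement)         # overwrites from position 0, no truncation
        with open(path, newline='') as f:
            return f.read()
    finally:
        os.remove(path)

first_line = ORIGINAL.splitlines(keepends=True)[0]
uncommented = first_line.replace('#\t', '\t')    # one character shorter

# Newline kept: the old newline survives right behind it -> blank line
print(repr(overwrite_in_place(uncommented)))
# '\t1st_word\t2nd_word\n\n#\t3rd_word\t4th_word\n'

# Newline stripped: the old 'd' and newline survive -> "2nd_wordd"
print(repr(overwrite_in_place(uncommented.rstrip())))
# '\t1st_word\t2nd_wordd\n#\t3rd_word\t4th_word\n'
```

This is why reading everything first and then rewriting the whole file in "w" mode, as in the answers above, avoids the problem: "w" truncates the file, so nothing of the old content is left behind.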
I have the following problem. I am supposed to open a CSV file (it's an Excel table) and read it without using any library.
I have already tried a lot and now have the first row in a tuple, and that tuple in a list. But it is only the first line, the header, and no other row.
This is what I have so far:
with open(path, 'r+') as file:
    results=[]
    text = file.readline()
    while text != '':
        for line in text.split('\n'):
            a=line.split(',')
            b=tuple(a)
            results.append(b)
        return results
The output should be: every line in a tuple, and all the tuples in a list.
My question now is: how can I read the other lines in Python?
I am really sorry, I am new to programming altogether, so I have a really hard time finding my mistake.
Thank you very much in advance for helping me out!
This question has been asked many times on Stack Overflow, so you should be able to find working code.
It is much better, though, to use the csv module for this.
Your indentation is wrong: return results runs after the first line has been processed, so the function exits and never tries to read the other lines.
Even after fixing that, other problems remain, so it still will not read the next lines.
You call readline() only once, so you read only the first line, and the while loop keeps working on that same line. It may never end, because text is never set to ''.
You should use read() to get the whole text and then split it into lines with split("\n"), or use readlines() to get all lines as a list, in which case you don't need split(). Or you can simply use for line in file:. In none of these cases do you need the while loop.
def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        text = file.read()
        for line in text.split('\n'):
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results

def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        lines = file.readlines()
        for line in lines:
            line = line.rstrip('\n') # remove `\n` at the end of line
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results

def read_csv(path):
    with open(path, 'r+') as file:
        results = []
        for line in file:
            line = line.rstrip('\n') # remove `\n` at the end of line
            items = line.split(',')
            results.append(tuple(items))
        # after for-loop
        return results
None of these versions works correctly if an item contains a '\n' or a , that should not be treated as the end of a row or as a separator between items. Such items are wrapped in " ", and the quotes themselves then also have to be removed. The standard csv module resolves all of these problems.
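To make the limitation concrete, here is a small comparison between the naive split and csv.reader on data with quoted items. The sample string is invented for this illustration; io.StringIO just stands in for an opened file.

```python
import csv
import io

# Invented sample: a comma inside quotes, doubled quotes, and a newline inside quotes.
sample = 'name,comment\n"Smith, John","said ""hi"""\nDoe,"two\nlines"\n'

# Naive approach: split on '\n' and ',' without looking at the quotes.
naive = [tuple(line.split(',')) for line in sample.split('\n') if line]

# csv.reader understands the quoting rules and strips the quotes.
parsed = [tuple(row) for row in csv.reader(io.StringIO(sample))]

print(len(naive))   # 4 rows: the quoted newline was treated as a row end
print(naive[1])     # ('"Smith', ' John"', '"said ""hi"""')
print(len(parsed))  # 3 rows
print(parsed[1])    # ('Smith, John', 'said "hi"')
print(parsed[2])    # ('Doe', 'two\nlines')
```

If the exercise forbids libraries, the split-based versions are fine as long as the data is known to contain no quoted items.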
Your code is pretty good and you are close to the goal:
with open(path, 'r+') as file:
    results=[]
    text = file.read()
    #while text != '':
    for line in text.split('\n'):
        a=line.split(',')
        b=tuple(a)
        results.append(b)
    return results
Your Code:
with open(path, 'r+') as file:
    results=[]
    text = file.readline()
    while text != '':
        for line in text.split('\n'):
            a=line.split(',')
            b=tuple(a)
            results.append(b)
        return results
So enjoy learning :)
One caveat: the CSV must not end with a blank line (with read().split('\n'), even a single trailing newline counts), as this would result in an ugly tuple at the end of the list like ('',) (which looks like a smiley).
To prevent this, check for empty lines: an if line != '': right after the for will do the trick.
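Put together, a library-free version with the empty-line check might look like the sketch below. The function name read_csv_rows is made up for this example, and it assumes a simple file without quoted items.

```python
def read_csv_rows(path):
    """Return every non-empty line of a simple CSV file as a tuple of strings."""
    results = []
    with open(path, 'r') as file:
        for line in file:                # reads one line at a time
            line = line.rstrip('\n')     # drop the newline at the end
            if line != '':               # skip blank lines, including a trailing one
                results.append(tuple(line.split(',')))
    return results
```

For a file containing a,b then 1,2 then a blank line, this returns [('a', 'b'), ('1', '2')] with no ('',) at the end.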
I have a large 11 GB .txt file with email addresses. I would like to save only the part of each line up to the # symbol, one below the other. My output only contains the first line. I reused this code from an earlier project. I would like to save the output in a different .txt file. I hope someone can help me out.
My code:
import re
def get_html_string(file,start_string,end_string):
    answer="nothing"
    with open(file, 'rb') as open_file:
        for line in open_file:
            line = line.rstrip()
            if re.search(start_string, line) :
                answer=line
                break
    start=answer.find(start_string)+len(start_string)
    end=answer.find(end_string)
    #print(start,end,answer)
    return answer[start:end]

beginstr=''
end='#'
file='test.txt'
readstring=str(get_html_string(file,beginstr,end))
print readstring
Your file is quite big (11G) so you shouldn't keep all those strings in memory. Instead, process the file line by line and write the result before reading next line.
This should be efficient:
with open('test.txt', 'r') as input_file:
    with open('result.txt', 'w') as output_file:
        for line in input_file:
            prefix = line.split('#')[0]
            output_file.write(prefix + '\n')
If your file looks like this example:
user#google.com
user2#jshds.com
Useruser#jsnl.com
You can use this:
def get_email_name(file_name):
    with open(file_name) as file:
        lines = file.readlines()
    result = list()
    for line in lines:
        result.append(line.split('#')[0])
    return result

get_email_name('emails.txt')
Out:
['user', 'user2', 'Useruser']
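Since readlines() and the result list both hold the whole file in memory, this version is only practical for small files. For the 11 GB case, a streaming variant might look like the sketch below; the names iter_prefixes and write_prefixes are made up for this example, and it assumes one address per line.

```python
def iter_prefixes(lines, sep='#'):
    """Yield the part of each line before the first sep, one line at a time."""
    for line in lines:
        # partition() returns the whole line as its first element when sep is absent
        yield line.rstrip('\n').partition(sep)[0]

def write_prefixes(src_path, dst_path, sep='#'):
    """Stream src_path to dst_path, keeping only the text before sep."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for prefix in iter_prefixes(src, sep):
            dst.write(prefix + '\n')
```

With the three sample lines above, list(iter_prefixes(...)) gives the same ['user', 'user2', 'Useruser'], while write_prefixes() never keeps more than one line in memory.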
I'd like to remove the first column from a file. The file contains 3 columns separated by spaces, and the columns have the following titles:
'X', 'Displacement' and 'Force' (please see the image).
I have come up with the following code, but to my disappointment it doesn't work!
f = open("datafile.txt", 'w')
for line in f:
    line = line.split()
    del x[0]
f.close()
Any help is much appreciated !
Esan
First of all, you're attempting to read from a file (by iterating through the file contents) that is open for writing. This will give you an IOError.
Second, there is no variable named x in existence (you have not declared/set one in the script). This will generate a NameError.
Thirdly and finally, once you have finished (correctly) reading and editing the columns in your file, you will need to write the data back into the file.
To avoid loading a (potentially large) file into memory all at once, it is probably a good idea to read from one file (line by line) and write to a new file simultaneously.
Something like this might work:
f = open("datafile.txt", "r")
g = open("datafile_fixed.txt", "w")
for line in f:
    if line.strip():
        g.write("\t".join(line.split()[1:]) + "\n")
f.close()
g.close()
Some reading about python i/o might be helpful, but something like the following should get you on your feet:
with open("datafile.txt", "r") as fin:
    with open("outputfile.txt", "w") as fout:
        for line in fin:
            line = line.split(' ')
            if len(line) == 3:
                del line[0]
                fout.write(line[0] + ' ' + line[1])
            else:
                fout.write('\n')
EDIT: fixed to work with blank lines
print('\n'.join(' '.join(l.split()[1:]) for l in open('datafile.txt')))
or, if you want to preserve spaces and you know that the second column always starts at a fixed offset (index 11 in this example):
print(''.join(l[11:] for l in open('datafile.txt')))