I have a very simple Python script that runs a user-defined function (hetero) that joins sequences (strings of text) together over very large files, 2 sequences (rows) at a time. As I have it written, it prints to the screen, but I would like to write all output to a single file.
f = open ("new", "r")
while True:
    line1 = f.readline()
    line1a = line1.split()
    line2 = f.readline()
    line2a = line2.split()
    if not line2: break
    tri = "".join([hetero(b1, b2) for (b1, b2) in zip(line1a[2], line2a[2])])
    print line1a[1]+"_"+line1a[0],tri
This simply prints the results of the script to the terminal. So I tried to write the results (from the print command, "line1a[1]+.....") to another file opened for writing (appended to the end of the script):
out_file = open ("out.txt", "w")
out_file.write(line1a[1]+"_"+line1a[0],tri)
out_file.close()
But of course it does not work. I don't understand why, though. Do I need to open the file for writing along with the file for reading, so that it's outside the while loop? The thing that is tricky is that the script reads in two lines at a time over the entire file, and prints the ID info and the sequence in a single line, each time -- I want to print all those results to a single file.
This is a simple fix I'm sure, but I don't use python that often and always find the file system i/o difficult to deal with.
Every time you open the file for writing it gets truncated. If you want to append, you can open it at the beginning and keep it open, or open it in append mode instead ('a' instead of 'w').
Also, you should be using the with statement:
with open('new', 'r') as f, open('out.txt', 'w') as out:
    while True:
        ...
That will call close automatically for you after the block ends.
You can also clean up your "read a pair of lines and split them" code. Instead of while True:
from itertools import izip
pairs = ((l1.split(), l2.split()) for l1, l2 in izip(f, f))
for line1a, line2a in pairs:
    tri = ...
Note that in Python 2 you want to use izip instead of zip, or it'll just read the whole file into memory right away. (In Python 3 the built-in zip is already lazy and itertools.izip no longer exists.)
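For readers on Python 3, the same pairing idea might be sketched as follows. This is only an illustration: join_pairs is a made-up name, and the combine argument is a hypothetical stand-in for the asker's hetero function, which is not shown in the question.

```python
def join_pairs(in_path, out_path, combine):
    """Read in_path two lines at a time and write one result line per pair.

    combine is a stand-in for the asker's hetero(b1, b2) function (assumption).
    """
    with open(in_path) as f, open(out_path, "w") as out:
        # zip(f, f) pulls two consecutive lines from the same file iterator,
        # so only one pair of lines is held in memory at a time.
        for line1, line2 in zip(f, f):
            fields1, fields2 = line1.split(), line2.split()
            tri = "".join(combine(b1, b2) for b1, b2 in zip(fields1[2], fields2[2]))
            # write() takes a single string and does not add a newline itself.
            out.write(fields1[1] + "_" + fields1[0] + " " + tri + "\n")
```

Because the output file is opened once, before the loop, every pair ends up in the same file. A trailing unpaired line is silently dropped by zip, which matches the original loop's behaviour of stopping when the second line is missing.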
Not sure where you put your out_file code, but you likely put it in the loop, so the file was opened and closed on every pass. Try something like:
with open('out.txt', 'w') as outfile, open("new", "r") as f:
    while True:
        line1 = f.readline()
        line1a = line1.split()
        line2 = f.readline()
        line2a = line2.split()
        if not line2: break
        tri = "".join([hetero(b1, b2) for (b1, b2) in zip(line1a[2], line2a[2])])
        #print line1a[1]+"_"+line1a[0],tri
        # write() takes a single string and does not add a newline
        outfile.write(line1a[1] + "_" + line1a[0] + " " + tri + "\n")
EDIT: You'll notice I opened the file using a context manager. I'm a fan of this because you don't have to worry about closing the file later, and it seems clearer to me how long the file stays open.
You are using this code
out_file = open ("out.txt", "w")
out_file.write(line1a[1]+"_"+line1a[0],tri)
out_file.close()
at every iteration. Note the 'w' flag: it means you are reopening the file at each iteration and overwriting it from the start. If you want to append to it instead, you can use the 'a' flag.
But there is more: this code
out_file = open ("out.txt", "w")
[while ...]
out_file.close()
should be outside the while loop, since you only need to open and close this file once.
You can only open the file inside the loop if you open it like:
out_file = open ("out.txt", "a")
Notice the "a" for appending mode.
If you open it using "w" it will be overwritten every iteration of the loop.
You can check this Python files reference to learn more.
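The difference between the two modes can be seen in a small experiment. This is a sketch only: write_in_loop and read_back are illustrative helper names, and the file lives in a temporary directory so nothing real is overwritten.

```python
import os
import tempfile

def write_in_loop(path, mode, items):
    """Reopen path with the given mode for every item, as in the question."""
    for item in items:
        with open(path, mode) as out_file:
            out_file.write(item + "\n")

def read_back(path):
    """Return the whole contents of path as one string."""
    with open(path) as f:
        return f.read()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")

    write_in_loop(path, "w", ["one", "two", "three"])
    truncated = read_back(path)   # "w" truncates each time: only "three" survives

    os.remove(path)
    write_in_loop(path, "a", ["one", "two", "three"])
    appended = read_back(path)    # "a" keeps earlier writes: all three lines
```

Opening the file once outside the loop is usually still preferable to reopening in append mode, since it avoids the repeated open/close and does not accumulate output across separate runs of the script.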
Related
I want to load/read a text file and write it to two other text files "entirely". I will write other different data to the following of these two files later.
The problem is that the loaded file is only written to the first file, and no data from that loaded file is written to the second file.
The code I am using:
fin = open("File_Read", 'r')
fout1 = open("File_Write1", 'w')
fout2 = open("File_Write2", 'w')
fout1.write(fin.read())
fout2.write(fin.read()) #Nothing is written here!
fin.close()
fout1.close()
fout2.close()
What is happening and what is the solution?
I prefer using open instead of with open.
Thanks.
The first fin.read() reads the whole file, so the next fin.read() continues from where the previous one ended (the end of the file) and returns an empty string. To solve this, I would simply read once and reuse the result:
text_fin = fin.read()
fout1.write(text_fin)
fout2.write(text_fin)
fin = open("test.txt", 'r')
data = fin.read()
fin.close()
fout1 = open("test2.txt", 'w')
fout1.write(data)
fout1.close()
fout2 = open("test3.txt", 'w')
fout2.write(data)
fout2.close()
N.B. with open is the safest and best way, but at the very least you need to close the files as soon as they are no longer needed.
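The exhausted-read behaviour is easy to reproduce. In the sketch below, io.StringIO stands in for the opened input file (an assumption made so the example needs no file on disk); real file objects opened for reading behave the same way for read() and seek(0).

```python
import io

# io.StringIO stands in for the opened input file (assumption for the demo).
fin = io.StringIO("line 1\nline 2\n")

first = fin.read()    # consumes everything up to the end of the file
second = fin.read()   # nothing is left, so this is an empty string

fin.seek(0)           # rewind to the start of the file
third = fin.read()    # the full contents again
```

Rewinding with seek(0) is an alternative to storing the text in a variable, though reading once and reusing the string avoids a second pass over the file.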
You can try iterating through your original file line by line and appending each line to both files. You are running into the problem because the first fin.read() consumes the whole file, so the second call has nothing left to return.
fin = open("File_Read",'r')
fout1 = open("File_Write1",'a') #append permissions for line-by-line writing
fout2 = open("File_Write2",'a') #append permissions for line-by-line writing
for lines in fin:
    fout1.write(lines)
    fout2.write(lines)
fin.close()
fout1.close()
fout2.close()
*** NOTE: Not the most efficient solution.
I would like my script to open alan.txt, search the text for all instances of scholarly_theologian and, when one is found, add the word "test" under it. When I tried doing it this way:
## Script
with open('alan.txt', 'r+') as f:
    for line in f:
        if "scholarly_theologian" in line:
            f.write("test")
it wouldn't write anything. I'm in Windows 8.1
You can't modify a file like this. You can only append to it, overwrite existing characters, or rewrite it entirely. See "How do I modify a text file in Python?".
What you should do is create another file with the content you want.
EDIT:
Claudio's answer has the code for what I offered. It has the benefit (over manicphase's code) of not keeping the whole file in memory. This is important if the file is long. manicphase's answer, on the other hand, has the benefit of not creating a second file. It rewrites the original one. Choose the one that fits your needs.
Rewritten answer because the last one was wrong.
To get all the lines as a list, call .readlines() on the open file. Then there are a few ways you could insert "test".
## Script
with open('alan.txt', 'r') as f:
    lines = f.readlines()
for i in range(len(lines)):
    if "scholarly_theologian" in lines[i]:
        # each line keeps its trailing "\n", so add "test" as its own line
        lines[i] = lines[i].rstrip("\n") + "\ntest\n"
with open('alan.txt', 'w') as f:
    f.writelines(lines)
This should do the trick:
with open('output.txt', 'w') as o:
    with open('alan.txt', 'r') as f:
        for line in f:
            o.write(line)
            # str.find returns -1 (truthy) when there is no match, so test with "in"
            if 'scholarly_theologian' in line:
                o.write('test\n')
Like Ella Shar mentioned, you need to create a new file and add the new content into it.
If working with two files is not acceptable, the next step would be to delete the input file, and to rename the output file.
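The delete-and-rename step could look roughly like this in Python 3. It is a sketch under assumptions: add_line_after_matches is an illustrative name, and the ".tmp" suffix for the temporary file is an arbitrary choice. os.replace is used because it overwrites an existing destination, including on Windows.

```python
import os

def add_line_after_matches(path, needle, extra_line):
    """Rewrite path so that extra_line follows every line containing needle."""
    tmp_path = path + ".tmp"  # hypothetical temporary name next to the original
    with open(path) as src, open(tmp_path, "w") as dst:
        for line in src:
            dst.write(line)
            if needle in line:
                # Make sure the matched line is terminated before adding ours.
                if not line.endswith("\n"):
                    dst.write("\n")
                dst.write(extra_line + "\n")
    # Both files are closed here; swap the new file into place.
    os.replace(tmp_path, path)
```

A call such as add_line_after_matches('alan.txt', 'scholarly_theologian', 'test') would then process the file one line at a time without keeping it all in memory.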
I am trying to remove duplicates from a 3-column tab-delimited txt file. As long as the first two columns match, a row should be treated as a duplicate and removed, even if its third column differs.
from operator import itemgetter
import sys
input = sys.argv[1]
output = sys.argv[2]
#Pass any column number you want, note that indexing starts at 0
ig = itemgetter(0,1)
seen = set()
data = []
for line in input.splitlines():
    key = ig(line.split())
    if key not in seen:
        data.append(line)
        seen.add(key)
    file = open(output, "w")
    file.write(data)
    file.close()
First, I get error
key = ig(line.split())
IndexError: list index out of range
Also, I can't see how to save the result to output.txt
People say saving to output.txt is a really basic matter. But no tutorial helped.
I tried methods that use codec, those that use with, those that use file.write(data) and all didn't help.
I could learn MatLab quite easily. The online tutorial was fantastic and a series of Googling always helped a lot.
But I can't find a helpful tutorial of Python yet. This is obviously because I am a complete novice. For complete novices like me, what would be the best tutorial with 1) comprehensiveness, 2) lots of examples, and 3) line-by-line explanation that doesn't leave any line without explanation?
And why is the above code causing error and not saving result?
I'm assuming that, since you assign input to the first command line argument with input = sys.argv[1] and output to the second, you intend those to be your input and output file names. But you're never opening any file for the input data, so you're calling .splitlines() on a file name, not on file contents.
Next, splitlines() is the wrong approach here anyway. To iterate over a file line by line, simply use for line in f, where f is an open file. Those lines will include the newline at the end of the line, so it needs to be stripped if it's not supposed to be part of the third column's data.
Then you're opening and closing the file inside your loop, which means you'll try to write the entire contents of data to the file every iteration, effectively overwriting any data written to the file before. Therefore I moved that block out of the loop.
It's good practice to use the with statement for opening files. with open(out_fn, "w") as outfile will open the file named out_fn and assign the open file to outfile, and close it for you as soon as you exit that indented block.
input is a builtin function in Python. I therefore renamed your variables so no builtin names get shadowed.
You're trying to write data directly to the output file. This won't work since data is a list of lines. You need to join those lines first in order to turn them into a single string again before writing it to a file.
So here's your code with all those issues addressed:
from operator import itemgetter
import sys
in_fn = sys.argv[1]
out_fn = sys.argv[2]
getkey = itemgetter(0, 1)
seen = set()
data = []
with open(in_fn, 'r') as infile:
    for line in infile:
        line = line.strip()
        key = getkey(line.split())
        if key not in seen:
            data.append(line)
            seen.add(key)
with open(out_fn, "w") as outfile:
    outfile.write('\n'.join(data))
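For large inputs, a streaming variant that writes each kept line immediately, rather than collecting everything in data first, might look like this. It is a sketch, not the answer's method: dedupe_by_columns and its columns and sep parameters are illustrative, and it assumes the file really is tab-delimited, as the question states.

```python
def dedupe_by_columns(in_fn, out_fn, columns=(0, 1), sep="\t"):
    """Copy in_fn to out_fn, keeping only the first line seen for each key.

    Returns the number of distinct keys written.
    """
    seen = set()
    with open(in_fn) as infile, open(out_fn, "w") as outfile:
        for line in infile:
            fields = line.rstrip("\n").split(sep)
            if len(fields) <= max(columns):
                continue  # skip blank or short lines instead of raising IndexError
            key = tuple(fields[i] for i in columns)
            if key not in seen:
                seen.add(key)
                # Keep the line as read, making sure it ends with a newline.
                outfile.write(line if line.endswith("\n") else line + "\n")
    return len(seen)
```

Only the set of keys is held in memory, so the memory cost depends on the number of distinct keys rather than on the size of the file.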
Why is the above code causing error?
Because you haven't opened the file, you are trying to work with the string input.txt rather than with the file. Then when you try to access your item, you get a list index out of range because line.split() returns ['input.txt'], which has no element at index 1.
How to fix that: open the file and then work with it, not with its name.
For example, you can do (I tried to stay as close to your code as possible)
input = sys.argv[1]
infile = open(input, 'r')
(...)
lines = infile.readlines()
infile.close()
for line in lines:
    (...)
Why is this not saving result?
Because you are opening/closing the file inside the loop. What you need to do is write the data once you're out of the loop. Also, you cannot write a list directly to a file. Hence, you need to do something like this (outside of your loop):
outfile = open(output, "w")
for item in data:
    outfile.write(item)
outfile.close()
All together
There are other ways of reading/writing files, and they are pretty well documented on the internet, but I tried to stay close to your code so that you would better understand what was wrong with it.
from operator import itemgetter
import sys
input = sys.argv[1]
infile = open(input, 'r')
output = sys.argv[2]
#Pass any column number you want, note that indexing starts at 0
ig = itemgetter(0,1)
seen = set()
data = []
lines = infile.readlines()
infile.close()
for line in lines:
    print line
    key = ig(line.split())
    if key not in seen:
        data.append(line)
        seen.add(key)
print data
outfile = open(output, "w")
for item in data:
    outfile.write(item)
outfile.close()
PS: it seems to produce the result that you needed in "Python to remove duplicates using only some, not all, columns".
Running this code creates file2.txt as expected, but the file is empty. (Note: file1.txt just has the lines of a poem.) Why does this happen? How can I get it to write array a2 to a text file?
import copy
#Open input file, read it into an array, and remove every other line.
f = open('file1.txt','r')
a1 = f.readlines()
a2 = copy.deepcopy(a1)
f.close
for n in range(len(a1)):
    if n%2 == 0:
        a2.remove(a1[n])
# Open output file and write array into it.
fo = open('file2.txt','w')
fo.writelines(a2)
fo.close
you need a () after close:
fo.close()
Also consider using the with statement when working with files.
You do realise this is better written as:
from itertools import islice
with open('input') as fin, open('output','w') as fout:
    every_other = islice(fin, None, None, 2)
    fout.writelines(every_other)
Reasoning:
File isn't loaded into memory for no reason
islice can be used to create a generator for every other line
... which can then be passed to the output's .writelines()
the with statement (context manager) automatically closes the files afterwards
It's (IMHO) easier to read and understand what the intention is
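Which lines survive depends on the start argument passed to islice. The sketch below uses a plain list in place of the open file (an assumption for the demo; islice accepts any iterable, including a file object).

```python
from itertools import islice

lines = ["line0\n", "line1\n", "line2\n", "line3\n", "line4\n"]

# Start at index 0, step 2: keeps lines 0, 2, 4 (what the snippet above does).
even_indexed = list(islice(lines, None, None, 2))

# Start at index 1, step 2: keeps lines 1, 3, which is what the question's
# loop is aiming for when it removes the lines where n % 2 == 0.
odd_indexed = list(islice(lines, 1, None, 2))
```

So to reproduce the question's intent of dropping the first, third, fifth line and so on, islice(fin, 1, None, 2) would be the variant to pass to writelines.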
As the comment said, you're forgetting to call close, so the buffer is never flushed.
replace
fo.close
with
fo.close()
'close' is a method -- ie. use fo.close() in place of fo.close
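The difference between referring to the method and calling it can be checked through the file object's closed attribute. This is a small sketch that writes to a temporary directory rather than to the asker's file2.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file2.txt")
    fo = open(path, "w")
    fo.writelines(["a\n", "b\n"])

    fo.close                         # only looks up the method; nothing is called
    closed_without_call = fo.closed  # still False, the file is open

    fo.close()                       # actually flushes the buffer and closes
    closed_with_call = fo.closed     # now True

    with open(path) as check:
        contents = check.read()      # the data is on disk after the real close
```

Using a with block for the output file sidesteps the problem entirely, since the file is closed when the block ends.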
text_file = open("new.txt", "r")
lines = text_file.readlines()
for line in lines:
    var1, var2 = line.split(",");
    myfile = open('xyz.txt', 'w')
    myfile.writelines(var1)
    myfile.close()
text_file.close()
I have 10 lines of text in new.txt like Adam:8154,
George:5234, and so on. Now I want a text file which contains only the names. xyz.txt must contain Adam, George, and so on. The above code leaves me with the 10th name only.
How to have all the 10 names in a single text file?
That is because you are opening, writing and closing the file 10 times inside your for loop. Opening a file in w mode erases whatever was in the file previously, so every time you open the file, the contents written to it in previous iterations get erased.
myfile = open('xyz.txt', 'w')
myfile.writelines(var1)
myfile.close()
You should open and close your file outside the for loop.
myfile = open('xyz.txt', 'w')
for line in lines:
    var1, var2 = line.split(",");
    myfile.write("%s\n" % var1)
myfile.close()
text_file.close()
You should also use write rather than writelines here.
writelines writes a list of lines to your file, whereas write writes a single string.
Also, you should check out the answers posted here that use the with statement. That is the elegant way to do file read/write operations in Python.
The main problem was that you were opening/closing files repeatedly inside your loop.
Try this approach:
with open('new.txt') as text_file, open('xyz.txt', 'w') as myfile:
    for line in text_file:
        var1, var2 = line.split(",");
        myfile.write(var1+'\n')
We open both files at once, and because we are using with they will be automatically closed when we are done (or when an exception occurs). Previously your output file was repeatedly opened inside your loop.
We are also processing the file line-by-line, rather than reading all of it into memory at once (which can be a problem when you deal with really big files).
Note that write() doesn't append a newline ('\n') so you'll have to do that yourself if you need it (I replaced your writelines() with write() as you are writing a single item, not a list of items).
When opening a file for reading, the 'r' is optional since it's the default mode.
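The newline behaviour of write() and writelines() can be compared directly. In this sketch io.StringIO stands in for the output file (an assumption so that nothing is written to disk), and the names list is sample data modelled on the question.

```python
import io

names = ["Adam", "George"]

buf = io.StringIO()            # stands in for the output file (assumption)
buf.writelines(names)          # writelines adds no separators at all
joined_without_newlines = buf.getvalue()

buf = io.StringIO()
for name in names:
    buf.write(name + "\n")     # add the newline explicitly
one_per_line = buf.getvalue()

buf = io.StringIO()
# writelines accepts any iterable of strings, so the newlines can be added on the fly
buf.writelines(name + "\n" for name in names)
also_one_per_line = buf.getvalue()
```

Either of the last two forms gives one name per line; which one to use is mostly a matter of taste.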
It's preferable to use context managers to close the files automatically:
with open("new.txt", "r") as textfile, open('xyz.txt', 'w') as myfile:
    for line in textfile:
        var1, var2 = line.split(",")
        # write() takes one string; add the newline yourself
        myfile.write(var1 + "\n")