How to save output from Python as a TSV file - python

I am using the Biopython package and I would like to save the result as a TSV file, i.e. write the output of this print loop to a TSV file:
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
    print("%s %s %s" % (record.id, record.seq, record.format("qual")))
Thank you.

My preferred solution is to use the CSV module. It's a standard module, so:
Somebody else has already done all the heavy lifting.
It allows you to leverage all the functionality of the CSV module.
You can be fairly confident it will function as expected (not always the case when I write it myself).
You're not going to have to reinvent the wheel, either when you write the file or when you read it back in on the other end. (I don't know your record format, but if one of your records contains a TAB, CSV will escape it correctly for you; see the short demo after this list.)
It will be easier to support when the next person has to go in to update the code 5 years after you've left the company.
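A quick sketch illustrating the escaping point from the list above, writing a field that contains a TAB with a tab delimiter:
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, delimiter='\t')
writer.writerow(['field with\ttab', 'plain'])
print(repr(buf.getvalue()))  # the embedded tab is quoted: '"field with\ttab"\tplain\r\n'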
The following code snippet should do the trick for you:
#!/usr/bin/env python3
import csv
from Bio import SeqIO  # needed for SeqIO.parse below

with open('records.tsv', 'w', newline='') as tsvfile:
    writer = csv.writer(tsvfile, delimiter='\t', lineterminator='\n')
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        writer.writerow([record.id, record.seq, record.format("qual")])
Note that this is for Python 3.x. If you're using 2.x, the open() call and the writer = ... line will be slightly different.
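For reference, a sketch of what the 2.x variant would look like (in Python 2 the csv module wants the file opened in binary mode, and open() has no newline argument):
import csv
from Bio import SeqIO  # Biopython, as in the question

with open('records.tsv', 'wb') as tsvfile:  # binary mode instead of newline=''
    writer = csv.writer(tsvfile, delimiter='\t', lineterminator='\n')
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        writer.writerow([record.id, record.seq, record.format("qual")])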

That is fairly simple: instead of printing it, you need to write it to a file.
with open("records.tsv", "w") as record_file:
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
record_file.write("%s %s %s\n" % (record.id,record.seq, record.format("qual")))
And if you want to name the various columns in the file then you can use:
record_file.write("Record_Id Record_Seq Record_Qal\n")
So the complete code may look like:
with open("records.tsv", "w") as record_file:
record_file.write("Record_Id Record_Seq Record_Qal\n")
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
record_file.write(str(record.id)+" "+str(record.seq)+" "+ str(record.format("qual"))+"\n")

If you want to use the .tsv to label your word embeddings in TensorBoard, use the following snippet. It uses the CSV module (see Doug's answer).
#!/usr/bin/env python3
import csv

def save_vocabulary():
    label_file = "word2context/labels.tsv"
    with open(label_file, 'w', encoding='utf8', newline='') as tsv_file:
        tsv_writer = csv.writer(tsv_file, delimiter='\t', lineterminator='\n')
        tsv_writer.writerow(["Word", "Count"])
        for word, count in word_count:
            tsv_writer.writerow([word, count])
word_count is a list of tuples like this:
[('the', 222594), ('to', 61479), ('in', 52540), ('of', 48064) ... ]

The following snippet:
from __future__ import print_function
with open("output.tsv", "w") as f:
print ("%s\t%s\t%s" % ("asd", "sdf", "dfg"), file=f)
print ("%s\t%s\t%s" % ("sdf", "dfg", "fgh"), file=f)
Yields a file output.tsv containing
asd sdf dfg
sdf dfg fgh
So, in your case:
from __future__ import print_function
with open("output.tsv", "w") as f:
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
print ("%s %s %s" % (record.id,record.seq, record.format("qual")), file=f)

I prefer using join() in this type of code:
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
    print('\t'.join((str(record.id), str(record.seq), str(record.format("qual")))))
The 'tab' character is \t, and join() concatenates the three arguments with a tab between each pair.
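If you want that output in a file rather than on stdout, the same join() approach works with write(); a minimal sketch:
from Bio import SeqIO  # Biopython, as in the question

with open("records.tsv", "w") as out:
    for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
        out.write('\t'.join((str(record.id), str(record.seq), str(record.format("qual")))) + '\n')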

Related

How to write a list to a file in Python? [duplicate]

How do I write a list to a file? writelines() doesn't insert newline characters, so I need to do:
f.writelines([f"{line}\n" for line in lines])
Use a loop:
with open('your_file.txt', 'w') as f:
    for line in lines:
        f.write(f"{line}\n")
For Python <3.6:
with open('your_file.txt', 'w') as f:
    for line in lines:
        f.write("%s\n" % line)
For Python 2, one may also use:
with open('your_file.txt', 'w') as f:
    for line in lines:
        print >> f, line
If you're keen on a single function call, at least remove the square brackets [] so that the strings to be printed are made one at a time (a genexp rather than a listcomp); there's no reason to take up all the memory required to materialize the whole list of strings.
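In other words, a sketch of the single-call version with the brackets dropped:
with open('your_file.txt', 'w') as f:
    f.writelines(f"{line}\n" for line in lines)  # generator expression, not a list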
What are you going to do with the file? Does this file exist for humans, or other programs with clear interoperability requirements?
If you are just trying to serialize a list to disk for later use by the same Python app, you should be pickling the list.
import pickle
with open('outfile', 'wb') as fp:
    pickle.dump(itemlist, fp)
To read it back:
with open('outfile', 'rb') as fp:
    itemlist = pickle.load(fp)
Simpler is:
with open("outfile", "w") as outfile:
outfile.write("\n".join(itemlist))
To ensure that all items in the item list are strings, use a generator expression:
with open("outfile", "w") as outfile:
outfile.write("\n".join(str(item) for item in itemlist))
Remember that "\n".join(...) builds the entire output string in memory before writing, so take care about memory consumption with very large lists.
Using Python 3 and Python 2.6+ syntax:
with open(filepath, 'w') as file_handler:
    for item in the_list:
        file_handler.write("{}\n".format(item))
This is platform-independent. It also terminates the final line with a newline character, which is a UNIX best practice.
Starting with Python 3.6, "{}\n".format(item) can be replaced with an f-string: f"{item}\n".
Yet another way. Serialize to json using simplejson (included as json in python 2.6):
>>> import simplejson
>>> f = open('output.txt', 'w')
>>> simplejson.dump([1,2,3,4], f)
>>> f.close()
If you examine output.txt:
[1, 2, 3, 4]
This is useful because the syntax is pythonic, it's human readable, and it can be read by other programs in other languages.
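Reading it back is the symmetric operation; a sketch:
>>> f = open('output.txt', 'r')
>>> simplejson.load(f)
[1, 2, 3, 4]
>>> f.close()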
I thought it would be interesting to explore the benefits of using a genexp, so here's my take.
The example in the question uses square brackets to create a temporary list, and so is equivalent to:
file.writelines( list( "%s\n" % item for item in list ) )
which needlessly constructs a temporary list of all the lines that will be written out; this may consume significant amounts of memory, depending on the size of your list and how verbose the output of str(item) is.
Dropping the square brackets (equivalent to removing the wrapping list() call above) will instead pass a temporary generator to file.writelines():
file.writelines( "%s\n" % item for item in list )
This generator will create newline-terminated representations of your item objects on demand (i.e. as they are written out). This is nice for a couple of reasons:
Memory overheads are small, even for very large lists
If str(item) is slow there's visible progress in the file as each item is processed
This avoids memory issues, such as:
In [1]: import os
In [2]: f = file(os.devnull, "w")
In [3]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 385 ms per loop
In [4]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
ERROR: Internal Python error in the inspect module.
Below is the traceback from this internal error.
Traceback (most recent call last):
...
MemoryError
(I triggered this error by limiting Python's max. virtual memory to ~100MB with ulimit -v 102400).
Putting memory usage to one side, this method isn't actually any faster than the original:
In [4]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 370 ms per loop
In [5]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
1 loops, best of 3: 360 ms per loop
(Python 2.6.2 on Linux)
Because I'm lazy...
import json
a = [1,2,3]
with open('test.txt', 'w') as f:
    f.write(json.dumps(a))

# Now read the file back into a Python list object
with open('test.txt', 'r') as f:
    a = json.loads(f.read())
Serialize a list into a text file as comma-separated values:
mylist = dir()
with open('filename.txt', 'w') as f:
    f.write(','.join(mylist))
In Python 3 you can use print and * for argument unpacking:
with open("fout.txt", "w") as fout:
print(*my_list, sep="\n", file=fout)
Simply:
with open("text.txt", 'w') as file:
file.write('\n'.join(yourList))
In General
Following is the syntax for the writelines() method:
fileObject.writelines( sequence )
Example
#!/usr/bin/python

# Open a file for appending ("rw+" is not a valid mode; "a+" writes at the end)
fo = open("foo.txt", "a+")
seq = ["This is 6th line\n", "This is 7th line"]
# Write sequence of lines at the end of the file (writelines returns None)
fo.writelines(seq)
# Close opened file
fo.close()
Reference
http://www.tutorialspoint.com/python/file_writelines.htm
file.write('\n'.join(list))
Using numpy.savetxt is also an option:
import numpy as np
np.savetxt('list.txt', list, delimiter="\n", fmt="%s")
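To read the items back, np.loadtxt should work; a sketch, assuming the saved items contain no embedded whitespace:
import numpy as np

restored = np.loadtxt('list.txt', dtype=str)  # one item per line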
You can also use the print function if you're on Python 3, as follows (the file must be opened in text mode, not "wb"):
f = open("myfile.txt", "w")
print(mylist, file=f)
f.close()
with open ("test.txt","w")as fp:
for line in list12:
fp.write(line+"\n")
Why don't you try
file.write(str(list))
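Bear in mind this stores the list's repr, so reading it back means parsing that string; ast.literal_eval is a safer choice than eval for this. A sketch (outfile.txt is a hypothetical filename):
import ast

with open('outfile.txt') as f:
    restored = ast.literal_eval(f.read())  # parse the repr back into a list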
I recently found Path to be useful. It helps me get around having to open a file with with open('file') as f and then write to it. Hope this becomes useful to someone :).
from pathlib import Path
import json
a = [[1,2,3],[4,5,6]]
# write
Path("file.json").write_text(json.dumps(a))
# read
json.loads(Path("file.json").read_text())
You can also do the following:
Example:
my_list=[1,2,3,4,5,"abc","def"]
with open('your_file.txt', 'w') as file:
    for item in my_list:
        file.write("%s\n" % item)
Output:
In your_file.txt the items are saved like:
1
2
3
4
5
abc
def
Your list is saved exactly as shown above, one item per line.
Otherwise, you can use pickle
import pickle
my_list=[1,2,3,4,5,"abc","def"]
# to write
with open('your_file.txt', 'wb') as file:
    pickle.dump(my_list, file)

# to read
with open('your_file.txt', 'rb') as file:
    Outlist = pickle.load(file)
    print(Outlist)
Output:
[1, 2, 3, 4, 5, 'abc', 'def']
pickle dumps the list as a list, so when we load it back we can read it as the same object.
The same output is also possible with simplejson:
import simplejson as sj
my_list=[1,2,3,4,5,"abc","def"]
# To write
with open('your_file.txt', 'w') as file:
    sj.dump(my_list, file)

# To read
with open('your_file.txt', 'r') as file:
    mlist = sj.load(file)
    print(mlist)
This logic first converts each item in the list to a string (str). Sometimes the list contains tuples, like
alist = [(112, 'tiger'),
         (113, 'lion')]
This logic writes each tuple to the file on a new line. We can later use eval on each line to recover the tuple when reading the file:
outfile = open('outfile.txt', 'w') # open a file in write mode
for item in list_to_persistence:    # iterate over the list items
    outfile.write(str(item) + '\n') # write to the file
outfile.close() # close the file
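And the read-back step described above, evaluating each line back into a tuple (as noted, ast.literal_eval would be the safer alternative to eval):
infile = open('outfile.txt', 'r')         # open the file in read mode
loaded = [eval(line) for line in infile]  # turn each line back into a tuple
infile.close()                            # close the file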
Another way of iterating and adding newline:
for item in items:
    filewriter.write(f"{item}\n")
In Python 3 you can use this loop (file objects have no .print method, so use print with the file argument):
with open('your_file.txt', 'w') as f:
    for item in list:
        print(item, file=f)
Redirecting stdout to a file might also be useful for this purpose:
from contextlib import redirect_stdout

with open('test.txt', 'w') as f:
    with redirect_stdout(f):
        for item in mylst:
            print(item)
I suggest this solution:
with open('your_file.txt', 'w') as f:
    list(map(lambda item: f.write("%s\n" % item), my_list))
Let avg be the list, then:
import numpy as np

a = np.array(avg)
a.tofile('avgpoints.dat', sep='\n', format='%f')  # tofile takes format=, not dtype=
You can use %e or %s in the format argument, depending on your requirement.
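To read the values back, np.fromfile with the same separator should work; a sketch:
import numpy as np

vals = np.fromfile('avgpoints.dat', sep='\n')  # text mode because sep is non-empty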
I think you are looking for an answer like this:
f = open('output.txt', 'w')
values = [3, 15.2123, 118.3432, 98.2276, 118.0043]
f.write('a= {:>3d}, b= {:>8.4f}, c= {:>8.4f}, d= {:>8.4f}, e= {:>8.4f}\n'.format(*values))
f.close()
poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
use Python!
'''
f = open('poem.txt', 'w') # open for 'w'riting
f.write(poem) # write text to file
f.close() # close the file
How It Works:
First, open a file by using the built-in open function, specifying the name of the file and the mode in which we want to open it. The mode can be a read mode ('r'), write mode ('w') or append mode ('a'). We can also specify whether we are reading, writing, or appending in text mode ('t') or binary mode ('b'). There are actually many more modes available, and help(open) will give you more details about them. By default, open() considers the file to be a 't'ext file and opens it in 'r'ead mode.
In our example, we first open the file in write text mode, use the write method of the file object to write to the file, and then finally close the file.
The above example is from the book "A Byte of Python" by Swaroop C H (swaroopch.com).

Python csv reader // how to ignore enclosing char (because sometimes it's missing)

I am trying to import csv data from files where sometimes the enclosing char " is missing.
So I have rows like this:
"ThinkPad";"2000.00";"EUR"
"MacBookPro";"2200.00;EUR"
# In the second row the closing " after 2200.00 is missing
# also the closing " before EUR" is missing
Now I am reading the csv data with this:
csv.reader(
    codecs.open(filename, 'r', encoding='latin-1'),
    delimiter=";",
    dialect=csv.excel_tab)
And the data I get for the second row is this:
["MacBookPro", "2200.00;EUR"]
Aside from pre-processing my csv files with a unix command like sed, removing all enclosing chars " and relying on the semicolon to separate the columns, what else can I do?
This might work:
import csv
import io
file = io.StringIO(u'''
"ThinkPad";"2000.00";"EUR"
"MacBookPro";"2200.00;EUR"
'''.strip())
reader = csv.reader((line.replace('"', '') for line in file), delimiter=';', quotechar='"')
for row in reader:
    print(row)
The problem is that if there are any legitimately quoted lines, e.g.
"MacBookPro;Awesome Edition";"2200.00";"EUR"
Or, worse:
"MacBookPro:
Description: Awesome Edition";"2200.00";"EUR"
Your output is going to produce too few or too many columns. But if you know that's not a problem, then it will work fine. You could pre-screen the file by adding this before the reading part, which would flag any malformed line:
for line in file:
    if line.count(';') != 2:
        raise ValueError('No! This file has broken data on line {!r}'.format(line))
file.seek(0)
Or alternatively you could screen as you're reading:
for row in reader:
    if any(';' in _ for _ in row):
        print('Error:')
        print(row)
Ultimately your best option is to fix whatever is producing your garbage csv file.
If you're looping through all the lines/rows of the file, you can use the string .replace() method to get rid of the quotes (if you don't need them later on for other purposes).
>>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
... my_file = csv.reader(codecs.open(filename, 'r', encoding='latin-1')
... delimiter=";",
... dialect=csv.excel_tab)
... )
... for row in my_file:
... (model,price,currency) = row
... model.replace('"','')
... price.replace('"','')
... currency.replace('"','')v
... print 'Model is: %s (costs %s%s).' % (model,price,currency)
>>>
Model is: MacBookPro (costs 2200.00EUR).

write output with delimiter in python

I want to save my output as a csv file with a custom name and delimiter.
I tried this code but it does not work for me.
out = open('out.csv', 'w')
for row in l:
    for column in row:
        out.write('%d;' % column)
    out.write('\n')
out.close()
Here is my data
100A7E54111FB143
100D11CF822BBBDB
1014120EE9CCB1E0
10276825CD5B4A26
10364F56076B46B7
103D1DDAD3064A66
103F4F66EEB54308
104310B0280E4F20
104E80752424B1C3
106BE9DBB186BEC5
10756F745D8A4123
107966C82D8BAD8
I want to save it like this:
input.csv
input_id data
number 107966C82D8BAD8 | 10756F745D8A4123 | 106BE9DBB186BEC5
The delimiter would be '|'. The data is of dtype: object.
Any help will be appreciated.
Use a writer for the CSV object instead:
import csv
with open('out.csv', 'w', newline='') as out:
    spamwriter = csv.writer(out, delimiter='|')
    spamwriter.writerow(l)  # write the whole list as one row
I have omitted your for loop: writerow() takes the whole list l and writes it as a single |-delimited row.
One simple way is to use the print function of Python 3.x. If you are using Python 2.x, then import print_function from __future__:
from __future__ import print_function #Needed in python 2.x
print(value, ..., sep=' ', end='\n', file=sys.stdout)
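For example, applied to the data above (a sketch; ids is a hypothetical list holding your values):
from __future__ import print_function  # Python 2.x only

ids = ['107966C82D8BAD8', '10756F745D8A4123', '106BE9DBB186BEC5']
with open('input.csv', 'w') as f:
    print(*ids, sep='|', file=f)  # writes the values joined by |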
You can use this snippet of code as an example. I did something similar in my program, creating a text file with a comma separator, no csv module involved. Not sure about newlines; I just used it for one line in my case...
cachef_h = [a, b, c, d]
f = open('cachef_h.txt', 'x')
f.write(cachef_h[0])
for column_headers in cachef_h[1:]:
    f.write(',' + column_headers)
f.close()

Printing a random csv cell with string and inserted variable in Python 2

I am a python noob, so please take it easy on me.
I have an example csv file (actual csv file has 20 rows and 2 columns similar to what is shown below):
"I hate %s" % x, "I am a %s" % x
"I heart %s" % x, "I am not a %s" % x
My python 2.7 script:
from csv import *

x = "gorillas"
with open('csv_test.csv', 'rU') as csvfile:
    spamreader = reader(csvfile, quoting=QUOTE_MINIMAL)
    list = []
    for row in spamreader:
        list.append(row[0])
print list[1]
I would like my script to print:
I heart gorillas
instead it is printing:
"I heart %s" % x
So, my variable, x, is not being inserted into my string. I assume that my problem is that when I pull the contents of the cell from my csv, the whole cell content is considered a string. However, I do not know how to fix this issue.
As a bonus or follow-up, I would also like to be selecting a random cell from my csv file.
Thanks for the help.
One option could be to let the csv file's contents be interpreted as string-formatting code. That would require you to interpret the string as part of the script. You can do this with eval():
print eval(list[1])
ought to do it.
Depending on your application, eval can be useful, but generally I would not recommend reading input from somewhere and then running its contents through eval.
Consider the thought experiment where the text is posted by a user on a website: if the post contains valid Python code, they have just gained the ability to run their own scripts on your machine.
Instead, you could replace parts of the string and then drop the '% x' format specifier:
from csv import *

x = "gorillas"
with open('csv_test.csv', 'rU') as csvfile:
    spamreader = reader(csvfile, quoting=QUOTE_MINIMAL)
    list = []
    for row in spamreader:
        list.append(row[0])
print list[1].replace("%s", x)
You can do that using eval:
from csv import *

x = "gorillas"
with open('csv_test.csv', 'rU') as csvfile:
    spamreader = reader(csvfile, quoting=QUOTE_MINIMAL)
    list = []
    for row in spamreader:
        list.append(eval(row[0]))
print list[1]
There are two issues I saw when running the code:
You are not setting quoting to QUOTE_NONE in your reader. With QUOTE_MINIMAL (on my machine, both Python 3.4 and Python 2.7) the csv reader stripped the double quotes completely, so the strings in the list ended up without any quotes; with QUOTE_NONE the quotes are preserved, which is what the eval step needs.
When printing the strings, you have to evaluate the expression using the eval() function and print the result.
The code would look something like this:
>>> with open('csv_test.csv', 'rU') as csvfile:
... spamreader = reader(csvfile, quoting = QUOTE_NONE)
... list = []
... for row in spamreader:
... list.append(row[0])
...
>>> print(eval(list[1]))
I heart gorillas
Please note it's not good practice to keep such statements in a csv and then evaluate them with eval, because whoever writes the csv can put anything in it and cause potential issues for your program.

Nested with blocks in Python, level of nesting variable

I would like to combine columns from various csv files into one csv file, concatenated horizontally, with a new heading. I want to select only certain columns, chosen by heading. There are different columns in each of the files to be combined.
Example input:
freestream.csv:
static pressure,static temperature,relative Mach number
1.01e5,288,5.00e-02
fan.csv:
static pressure,static temperature,mass flow
0.9e5,301,72.9
exhaust.csv:
static pressure,static temperature,mass flow
1.7e5,432,73.1
Desired output:
combined.csv:
P_amb,M0,Ps_fan,W_fan,W_exh
1.01e5,5.00e-02,0.9e5,72.9,73.1
Possible call to the function:
reorder_multiple_CSVs(["freestream.csv","fan.csv","exhaust.csv"],
"combined.csv",["static pressure,relative Mach number",
"static pressure,mass flow","mass flow"],
"P_amb,M0,Ps_fan,W_fan,W_exh")
Here is a previous version of the code, with only one input file allowed. I wrote this with help from write CSV columns out in a different order in Python:
def reorder_CSV(infilename, outfilename, oldheadings, newheadings):
    with open(infilename) as infile:
        with open(outfilename, 'w') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            readnames = reader.next()
            name2index = dict((name, index) for index, name in enumerate(readnames))
            writeindices = [name2index[name] for name in oldheadings.split(",")]
            reorderfunc = operator.itemgetter(*writeindices)
            writer.writerow(newheadings.split(","))
            for row in reader:
                towrite = reorderfunc(row)
                if isinstance(towrite, str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
So what I have figured out, in order to adapt this to multiple files, is:
- I need infilename, oldheadings, and newheadings to be lists now (all of the same length)
- I need to iterate over the list of input files to make a list of readers
- readnames can also be a list, iterating over the readers
- which means I can make name2index a list of dictionaries
One thing I don't know how to do, is use the keyword with, nested n-levels deep, when n is known only at run time. I read this: How can I open multiple files using "with open" in Python? but that seems to only work when you know how many files you need to open.
Or is there a better way to do this?
I am quite new to python so I appreciate any tips you can give me.
I am only replying to the part about opening multiple files with with, where the number of files is not known beforehand. It shouldn't be too hard to write your own context manager, something like this (completely untested):
from contextlib import contextmanager

@contextmanager
def open_many_files(filenames):
    files = [open(filename) for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()
Which you would use like this:
innames = ['file1.csv', 'file2.csv', 'file3.csv']
outname = 'out.csv'
with open_many_files(innames) as infiles, open(outname, 'w') as outfile:
    for infile in infiles:
        do_stuff(infile)
There is also a standard-library function that does something similar, contextlib.nested, but it is deprecated.
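For what it's worth, in Python 3.3+ the standard library's contextlib.ExitStack covers exactly this case; a sketch:
from contextlib import ExitStack

filenames = ['file1.csv', 'file2.csv', 'file3.csv']
with ExitStack() as stack:
    files = [stack.enter_context(open(fname)) for fname in filenames]
    # every file is open here and will be closed when the with block exits
    for f in files:
        print(f.readline().strip())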
I am not sure if this is the correct way to do this, but I wanted to expand on Bas Swinckels' answer. He had a couple of small inconsistencies in his very helpful answer, and I wanted to give the corrected code.
Here is what I did, and it worked.
from contextlib import contextmanager
import csv
import operator
import itertools as IT

@contextmanager
def open_many_files(filenames):
    files = [open(filename, 'r') for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()

def reorder_multiple_CSV(infilenames, outfilename, oldheadings, newheadings):
    with open_many_files(filter(None, infilenames.split(','))) as handles:
        with open(outfilename, 'w') as outfile:
            readers = [csv.reader(f) for f in handles]
            writer = csv.writer(outfile)
            reorderfunc = []
            for i, reader in enumerate(readers):
                readnames = reader.next()
                name2index = dict((name, index) for index, name in enumerate(readnames))
                writeindices = [name2index[name] for name in filter(None, oldheadings[i].split(","))]
                reorderfunc.append(operator.itemgetter(*writeindices))
            writer.writerow(filter(None, newheadings.split(",")))
            for rows in IT.izip_longest(*readers, fillvalue=[''] * 2):
                towrite = []
                for i, row in enumerate(rows):
                    towrite.extend(reorderfunc[i](row))
                if isinstance(towrite, str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
