write output with delimiter in python - python

I want to save my output as csv file with custom name and delimiter.
I tried this code but now works for me.
out = open('out.csv', 'w')
for row in l:
for column in row:
out.write('%d;' % column)
out.write('\n')
out.close()
Here is my data
100A7E54111FB143
100D11CF822BBBDB
1014120EE9CCB1E0
10276825CD5B4A26
10364F56076B46B7
103D1DDAD3064A66
103F4F66EEB54308
104310B0280E4F20
104E80752424B1C3
106BE9DBB186BEC5
10756F745D8A4123
107966C82D8BAD8
I want to save like this
input.csv
input_id data
number 107966C82D8BAD8 | 10756F745D8A4123 | 106BE9DBB186BEC5
The delmiter would be '|'.The data is in dtype:object
Any help will be appreciated.

Use a writer for the CSV object instead:
import csv
with open('out.csv', 'w', newline='') as out:
spamwriter = csv.writer(out, delimiter='|')
spamwriter.writerow(column)
I have omitted your for loop

One simple way is to use print of python 3.x, If you are using python 2.x then import print from future
from __future__ import print_function #Needed in python 2.x
print(value, ..., sep=' ', end='\n', file=sys.stdout)

You can use this snippet of code as an example. I did something similar in my program, creating a text file with a comma separator, no csv module involved. Not sure about newlines. I just used it for one line in my case...
cachef_h = [a,b,c,d]
f = open('cachef_h.txt', 'x')
f.write(cachef_h[0])
for column_headers in cachef_h[1:]:
f.write(',' + column_headers)

Related

Number formatting a CSV

I have developed a script that produces a CSV file. On inspection of the file, some cell's are being interpreted not the way I want..
E.g In my list in python, values that are '02e4' are being automatically formatted to be 2.00E+04.
table = [['aa02', 'fb4a82', '0a0009'], ['02e4, '452ca2', '0b0004']]
ofile = open('test.csv', 'wb')
for i in range(0, len(table)):
for j in range(0, len(table[i]):
ofile.write(table[i][j] + ",")
ofile.write("\n")
This gives me:
aa02 fb4a82 0a0009
2.00E+04 452ca2 0b0004
I've tried using the csv.writer instead where writer = csv.writer(ofile, ...)
and giving attributes from the lib (e.g csv.QUOTE_ALL)... but its the same output as before..
Is there a way using the CSV lib to automatically format all my values as strings before it's written?
Or is this not possible?
Thanks
Try setting the quoting parameter in your csv writer to csv.QUOTE_ALL.
See the doc for more info:
import csv
with open('myfile.csv', 'wb') as csvfile:
wtr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
wtr.writerow(...)
Although it sounds like the problem might lie with your csv viewer. Excel has a rather annoying habit of auto-formatting data like you describe.
If you want the '02e4' to show up in excel as "02e4" then annoyingly you have to write a csv with triple-double quotes: """02e4""". I don't know of a way to do this with the csv writer because it limits your quote character to a character. However, you can do something similar to your original attempt:
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
ofile = open('test.csv', 'wb')
for i in range(0, len(table)):
for j in range(len(table[i])):
ofile.write('"""%s""",'%table[i][j])
ofile.write("\n")
If opened in a text editor your csv file will read:
"""aa02""","""fb4a82""","""0a0009""",
"""02e4""","""452ca2""","""0b0004""",
This produces the following result in Excel:
If you wanted to use any single character quotation you could use the csv module like so:
import csv
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
ofile = open('test.csv', 'wb')
writer = csv.writer(ofile, delimiter=',', quotechar='|',quoting=csv.QUOTE_ALL)
for i in range(len(table)):
writer.writerow(table[i])
The output in the text editor will be:
|aa02|,|fb4a82|,|0a0009|
|02e4|,|452ca2|,|0b0004|
and Excel will show:

Python csv reader // how to ignore enclosing char (because sometimes it's missing)

I am trying to import csv data from files where sometimes the enclosing char " is missing.
So I have rows like this:
"ThinkPad";"2000.00";"EUR"
"MacBookPro";"2200.00;EUR"
# In the second row the closing " after 2200.00 is missing
# also the closing " before EUR" is missing
Now I am reading the csv data with this:
csv.reader(
codecs.open(filename, 'r', encoding='latin-1'),
delimiter=";",
dialect=csv.excel_tab)
And the data I get for the second row is this:
["MacBookPro", "2200.00;EUR"]
Aside from pre-processing my csv files with a unix command like sed and removing all closing chars " and relying on the semicolon to seperate the columns, what else can I do?
This might work:
import csv
import io
file = io.StringIO(u'''
"ThinkPad";"2000.00";"EUR"
"MacBookPro";"2200.00;EUR"
'''.strip())
reader = csv.reader((line.replace('"', '') for line in file), delimiter=';', quotechar='"')
for row in reader:
print(row)
The problem is that if there are any legitimate quoted line, e.g.
"MacBookPro;Awesome Edition";"2200.00";"EUR"
Or, worse:
"MacBookPro:
Description: Awesome Edition";"2200.00";"EUR"
Your output is going to produce too few/many columns. But if you know that's not a problem then it will work fine. You could pre-screen the file by adding this before the read part, which would give you the malformed line:
for line in file:
if line.count(';') != 2:
raise ValueError('No! This file has broken data on line {!r}'.format(line))
file.seek(0)
Or alternatively you could screen as you're reading:
for row in reader:
if any(';' in _ for _ in row):
print('Error:')
print(row)
Ultimately your best option is to fix whatever is producing your garbage csv file.
If you're looping through all the lines/rows of the file, you can use string's .replace() function to get rid off the quotes (if you don't need them later-on for other purposes.).
>>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
... my_file = csv.reader(codecs.open(filename, 'r', encoding='latin-1')
... delimiter=";",
... dialect=csv.excel_tab)
... )
... for row in my_file:
... (model,price,currency) = row
... model.replace('"','')
... price.replace('"','')
... currency.replace('"','')v
... print 'Model is: %s (costs %s%s).' % (model,price,currency)
>>>
Model is: MacBookPro (costs 2200.00EUR).

How to save output from python like tsv

I am using biopython package and I would like to save result like tsv file. This output from print to tsv.
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
print ("%s %s %s" % (record.id,record.seq, record.format("qual")))
Thank you.
My preferred solution is to use the CSV module. It's a standard module, so:
Somebody else has already done all the heavy lifting.
It allows you to leverage all the functionality of the CSV module.
You can be fairly confident it will function as expected (not always the case when I write it myself).
You're not going to have to reinvent the wheel, either when you write the file or when you read it back in on the other end (I don't know your record format, but if one of your records contains a TAB, CSV will escape it correctly for you).
It will be easier to support when the next person has to go in to update the code 5 years after you've left the company.
The following code snippet should do the trick for you:
#! /bin/env python3
import csv
with open('records.tsv', 'w', newline='') as tsvfile:
writer = csv.writer(tsvfile, delimiter='\t', lineterminator='\n')
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
writer.writerow([record.id, record.seq, record.format("qual")])
Note that this is for Python 3.x. If you're using 2.x, the open and writer = ... will be slightly different.
That is fairly simple , instead of printing it you need to write that to a file.
with open("records.tsv", "w") as record_file:
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
record_file.write("%s %s %s\n" % (record.id,record.seq, record.format("qual")))
And if you want to name the various columns in the file then you can use:
record_file.write("Record_Id Record_Seq Record_Qal\n")
So the complete code may look like:
with open("records.tsv", "w") as record_file:
record_file.write("Record_Id Record_Seq Record_Qal\n")
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
record_file.write(str(record.id)+" "+str(record.seq)+" "+ str(record.format("qual"))+"\n")
If you want to use the .tsv to label your word embeddings in TensorBoard, use the following snippet. It uses the CSV module (see Doug's answer).
# /bin/env python3
import csv
def save_vocabulary():
label_file = "word2context/labels.tsv"
with open(label_file, 'w', encoding='utf8', newline='') as tsv_file:
tsv_writer = csv.writer(tsv_file, delimiter='\t', lineterminator='\n')
tsv_writer.writerow(["Word", "Count"])
for word, count in word_count:
tsv_writer.writerow([word, count])
word_count is a list of tuples like this:
[('the', 222594), ('to', 61479), ('in', 52540), ('of', 48064) ... ]
The following snippet:
from __future__ import print_function
with open("output.tsv", "w") as f:
print ("%s\t%s\t%s" % ("asd", "sdf", "dfg"), file=f)
print ("%s\t%s\t%s" % ("sdf", "dfg", "fgh"), file=f)
Yields a file output.tsv containing
asd sdf dfg
sdf dfg fgh
So, in your case:
from __future__ import print_function
with open("output.tsv", "w") as f:
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
print ("%s %s %s" % (record.id,record.seq, record.format("qual")), file=f)
I prefer using join() in this type of code:
for record in SeqIO.parse("/home/fil/Desktop/420_2_03_074.fastq", "fastq"):
print ( '\t'.join((str(record.id), str(record.seq), str(record.format("qual"))) )
The 'tab' character is \t and the join function takes the (3) arguments and prints them with a tab in between.

Python: Read fields of CSV File with a list of list

i just wondering how i can read special field from a CVS File with next structure:
40.0070222,116.2968604,2008-10-28,[["route"], ["sublocality","political"]]
39.9759505,116.3272935,2008-10-29,[["route"], ["establishment"], ["sublocality", "political"]]
the way that on reading cvs files i used to work with:
with open('routes/stayedStoppoints', 'rb') as csvfile:
spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
The problem with that is the first 3 fields no problem i can use:
for row in spamreader:
row[0],row[1],row[2] i can access without problem. but in the last field and i guess that with csv.reader(csvfile, delimiter=',', quotechar='"') split also for each sub-list:
so when i tried to access just show me:
[["route"]
Anyone has a solution to handle the last field has a full list ( list of list indeed)
[["route"], ["sublocality","political"]]
in order to can access to each category.
Thanks
Your format is close to json. You only need to wrap each line in brackets, and to quote the dates.
For each line l just do:
lst=json.loads(re.sub('([0-9]+-[0-9]+-[0-9]+)',r'"\1"','[%s]'%(l)))
results in lst being
[40.0070222, 116.2968604, u'2008-10-28', [[u'route'], [u'sublocality', u'political']]]
You need to import the json parser and regular expressions
import json
import re
edit: you asked how to access the element containing 'route'. the answer is
lst[3][0][0]
'political' is at
lst[3][1][1]
If the strings ('political' and others) may contain strings looking like dates, you should go with the solution by #unutbu
Use line.split(',', 3) to split on just the first 3 commas:
import json
with open(filename, 'rb') as csvfile:
for line in csvfile:
row = line.split(',', 3)
row[3] = json.loads(row[3])
print(row)
yields
['40.0070222', '116.2968604', '2008-10-28', [[u'route'], [u'sublocality', u'political']]]
['39.9759505', '116.3272935', '2008-10-29', [[u'route'], [u'establishment'], [u'sublocality', u'political']]]
That is not a valid CSV file. The csv module won't be able to read this.
If the line structure is always like this (two numbers, a date, and a nested list), you can do this:
import ast
result = []
with open('routes/stayedStoppoints') as infile:
for line in infile:
coord_x, coord_y, datestr, objstr = line.split(",", 3)
result.append([float(coord_x), float(coord_y),
datestr, ast.literal_eval(objstr)])
Result:
>>> result
[[40.0070222, 116.2968604, '2008-10-28', [['route'], ['sublocality', 'political']]],
[39.9759505, 116.3272935, '2008-10-29', [['route'], ['establishment'], ['sublocality', 'political']]]]

Read a CSV file and put double quote on each item in python

I'm really new to python and I have a simple question. I have a .csv file with the following content:
123,456,789
I want to read it and store it into a variable called "number" with the following format
"123","456","789"
So that when I do
print number
It will give the following output
"123","456","789"
Can anybody help?
Thanks!
Update:
The following is my code:
input = csv.reader(open('inputfile.csv', 'r'))
for item in input:
item = ['"' + item + '"' for item in item]
print item
It gave the following output:
['"123"', '"456"', '"789"']
Here's how to do it:
import csv
from io import StringIO
quotedData = StringIO()
with open('file.csv') as f:
reader = csv.reader(f)
writer = csv.writer(quotedData, quoting=csv.QUOTE_ALL)
for row in reader:
writer.writerow(row)
with reader=csv.reader(StringIO('1,2,3')) the output is:
print quotedData.getvalue()
"1","2","3"
Using the csv-module, you can read the .csv file line-by-line and process each element from a tuple you gain. You can then just enclose each element into double-quotes.
import csv
reader = csv.reader(open("file.csv"))
for line in reader:
# line is a tuple ...
If the whole file only contains numbers you can just open it as a regular file:
with open("file.csv") as f:
for line in f:
print ','.join('"{}"'.format(x) for x in line.rstrip().split(','))
It'd be better to append the lines to an array with append, tho. For example:
with open("file.csv") as f:
lines=[line.rstrip().split(',') for line in f]
There is a CSV module there that might help you as well.
import csv
spamReader = csv.reader(open('eggs.csv', 'rb'))
for row in spamReader:
this_row = ['"' + str(item) + '"' for item in row]
print this_row
import csv
csvr = csv.reader(open(<yourfile.csv>,'r'))
def gimenumbers():
for row in csvr:
yield '","'.join(row)

Categories