Number formatting a CSV - python

I have developed a script that produces a CSV file. On inspecting the file, some cells are not being interpreted the way I want.
E.g. in my list in Python, values such as '02e4' are being automatically formatted to 2.00E+04.
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
ofile = open('test.csv', 'wb')
for i in range(0, len(table)):
    for j in range(0, len(table[i])):
        ofile.write(table[i][j] + ",")
    ofile.write("\n")
This gives me:
aa02 fb4a82 0a0009
2.00E+04 452ca2 0b0004
I've tried using csv.writer instead, where writer = csv.writer(ofile, ...),
and passing options from the lib (e.g. csv.QUOTE_ALL)... but it's the same output as before.
Is there a way, using the csv lib, to automatically format all my values as strings before they are written?
Or is this not possible?
Thanks

Try setting the quoting parameter in your csv writer to csv.QUOTE_ALL.
See the doc for more info:
import csv
with open('myfile.csv', 'wb') as csvfile:
    wtr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    wtr.writerow(...)
Although it sounds like the problem might lie with your csv viewer. Excel has a rather annoying habit of auto-formatting data like you describe.
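For completeness, a minimal sketch applying that to the table from the question (written for Python 3; on Python 2 open the file with 'wb' as in the original code):
import csv
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
with open('test.csv', 'w', newline='') as csvfile:
    wtr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
    wtr.writerows(table)  # every field is written quoted: "aa02","fb4a82",...
Note that even with the quotes in place, Excel may still parse "02e4" as a number when it opens the file, which is why the triple-quote workaround below exists.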

If you want the '02e4' to show up in Excel as "02e4" then annoyingly you have to write a CSV with triple-double quotes: """02e4""". I don't know of a way to do this with the csv writer because it limits your quote character to a single character. However, you can do something similar to your original attempt:
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
ofile = open('test.csv', 'wb')
for i in range(0, len(table)):
    for j in range(len(table[i])):
        ofile.write('"""%s""",' % table[i][j])
    ofile.write("\n")
If opened in a text editor your csv file will read:
"""aa02""","""fb4a82""","""0a0009""",
"""02e4""","""452ca2""","""0b0004""",
This produces the desired result in Excel: the cells display 02e4 rather than 2.00E+04.
If you wanted to use some other single-character quote character, you could use the csv module like so:
import csv
table = [['aa02', 'fb4a82', '0a0009'], ['02e4', '452ca2', '0b0004']]
ofile = open('test.csv', 'wb')
writer = csv.writer(ofile, delimiter=',', quotechar='|', quoting=csv.QUOTE_ALL)
for i in range(len(table)):
    writer.writerow(table[i])
The output in the text editor will be:
|aa02|,|fb4a82|,|0a0009|
|02e4|,|452ca2|,|0b0004|
and Excel will show each value with the pipe characters included, since Excel only treats double quotes as quoting.

Related

Decoding breaks lines into characters in Python 3

I am reading a CSV file over a Samba share. My CSV file format is:
hello;world
1;2;
Python code
import csv
import urllib.request
from smb.SMBHandler import SMBHandler
PATH = 'smb://myusername:mypassword@192.168.1.200/myDir/'
opener = urllib.request.build_opener(SMBHandler)
fh = opener.open(PATH + 'myFileName')
data = fh.read().decode('utf-8')
print(data)  # This prints the data correctly
csvfile = csv.reader(data, delimiter=';')
for myrow in csvfile:
    print(myrow)  # This just prints ['h'], but it should print the fields of 'hello;world'
    break
fh.close()
The problem is that after decoding to utf-8, the rows are not the actual lines in the file
Desired output of a row after reading the file: hello;world
Current output of a row after reading the file: h
Any help is appreciated.
csv.reader takes an iterable that returns lines. Strings, when iterated, yield characters. The fix is simple:
csvfile = csv.reader(data.splitlines(), delimiter=';')
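For example, a minimal sketch with the decoded contents inlined (a stand-in for the data read over SMB in the question):
import csv
# Stand-in for data = fh.read().decode('utf-8') from the question
data = 'hello;world\n1;2;\n'
# csv.reader wants an iterable of lines, so split the string into lines first
reader = csv.reader(data.splitlines(), delimiter=';')
for row in reader:
    print(row)
# ['hello', 'world']
# ['1', '2', '']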

Python: how to get tweet data containing a specific word from a CSV file and put it in a new CSV file

I have Twitter data in a CSV file (that I'm mining with a Python API). I get around 1000 lines of data. Now I want to filter the tweet data using the specific Indonesian words "macet" or "kecelakaan" (in English "traffic" or "accident") and put the matching rows into a new separate CSV file, just like using find all in Excel.
The sample Twitter data is example1.csv and the new file to be created after searching for the words "macet" or "kecelakaan" is example2.csv. But there is no result.
import re
import csv
with open('example1.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)
    if re.search(r'macet', reader):
        for row in reader:
            myData = list(row)
            print(row)
newFile = open('example2.csv', 'w')
with newFile:
    writer = csv.writer(newFile)
    writer.writerows(myData)
print("Writing complete")
I use Spyder as the environment, with Python 3.6.
The CSV file is already in the same folder as Spyder. Here is a screen capture of my CSV Twitter data: myCSVtwitterData
Updated: sample of the CSV file. OS: Windows
There are a couple of problems with your code.
In your reading loop you are passing a csv.reader object to re.search, but it doesn't know how to search that object. You need to pass it text or byte strings.
The line
myData = list(row)
converts row into a new list and saves it to myData, but it's already a list, so no conversion is necessary. And that line replaces the previous contents of myData, but you actually want to save all the matching rows. However, there's no need to save the rows, you can just write them to the new file as you go.
Anyway, here's a repaired version of your code. From the screen shot it looks like you only want to search the text in column 2 of the input data (which corresponds to column C in your spreadsheet). I've created a regex that searches for the whole words "macet" and "kecelakaan", the "\b" matches at word boundaries so we don't get a match if "macet" or "kecelakaan" is part of a larger word.
import re
import csv
# Make a case-insensitive regex to match the words "macet" or "kecelakaan"
pattern = re.compile(r'\bmacet\b|\bkecelakaan\b', re.I)
with open('example1.csv', 'r', newline='') as csvFile, open('example2.csv', 'w', newline='') as newFile:
    reader = csv.reader(csvFile)
    writer = csv.writer(newFile)
    for row in reader:
        # Skip empty rows
        if not row:
            continue
        if pattern.search(row[2]):
            print(row)
            writer.writerow(row)
print("Writing complete")
I've just made a couple of improvements to that code. It now uses the newline='' arg to open the CSV files, and it skips any empty lines in the input CSV. And the regex now ignores the case when looking for matching words.
This is not a Python answer, but if you have a Linux OS you can do it with one command line:
grep -i "macet" example1.csv > example2.csv
-i is for ignore case, so it will also match "Macet"
How about this? This code visits the rows one by one, finds the cells that contain a word from word_list, and writes the list of matching values for each row.
import re
import csv
word_list = ['macet', 'kecelakaan']
with open('example1.csv', 'r') as csvFile, open('example2.csv', 'w') as newFile:
    reader = csv.reader(csvFile)
    writer = csv.writer(newFile, lineterminator='\n')
    for row in reader:
        new_row = [content for content in row if any(map(lambda word: word in content, word_list))]
        if new_row != []:
            print(new_row)
            writer.writerow(new_row)
print("Writing complete")

Python csv write a list to file

I am writing a script to write a tab-separated list, as below, to a CSV file. But I am not getting the proper output.
out_l = ['host\tuptime\tnfsserver\tnfs status\n', 'node1\t2\tnfs_host\tok\n', 'node2\t100\tnfs_host\tna\n', 'node3\t59\tnfs_host\tok\n']
code:
out_f = open('test.csv', 'w')
w = csv.writer(out_f)
for l in out_l:
    w.writerow(l)
out_f.close()
The output csv file reads as below.
h,o,s,t, ,s,s,h, , , , , ,s,u,d,o,_,h,o,s,t, , , , , , , ,n,f,s,"
"1,9,2,.,1,6,8,.,1,2,2,.,2,0,1, ,o,k, ,n,f,s,h,o,s,t, ,o,k,"
"1,9,2,.,1,6,8,.,1,2,2,.,2,0,2, ,f,a,i,l,e,d, ,n,a, ,n,a,"
"1,9,2,.,1,6,8,.,1,2,2,.,2,0,3, ,o,k, ,n,f,s,h,o,s,t, ,s,h,o,w,m,o,u,n,t, ,f,a,i,l,e,d,"
"
I have also tried csv.writer options like delimiter and dialect=excel, but no luck.
Can someone help me format the output?
With the formatting you have in out_l, you can just write it to a file:
out_l = ['host\tuptime\tnfsserver\tnfs status\n', 'node1\t2\tnfs_host\tok\n', 'node2\t100\tnfs_host\tna\n', 'node3\t59\tnfs_host\tok\n']
with open('test.csv', 'w') as out_f:
    for l in out_l:
        out_f.write(l)
To properly use csv, out_l should just be lists of the columns and let the csv module do the formatting with tabs and newlines:
import csv
out_l = [['host', 'uptime', 'nfsserver', 'nfs status'],
         ['node1', '2', 'nfs_host', 'ok'],
         ['node2', '100', 'nfs_host', 'na'],
         ['node3', '59', 'nfs_host', 'ok']]
#with open('test.csv', 'wb') as out_f:            # Python 2
with open('test.csv', 'w', newline='') as out_f:  # Python 3
    w = csv.writer(out_f, delimiter='\t')  # override for tab delimiter
    w.writerows(out_l)  # writerows (plural) doesn't need a for loop
Note that with will automatically close the file.
See the csv documentation for the correct way to open a file for use with csv.reader or csv.writer.
The csv writer's writerow method takes an iterable and writes the values it produces into CSV fields separated by the specified delimiter:
out_f = open('test.csv', 'w')
w = csv.writer(out_f, delimiter='\t')  # set tab as delimiter
for l in out_l:                # l is a string (an iterable of chars!)
    w.writerow(l.split('\t'))  # split to get the correct tokens
out_f.close()
As the strings in your list already contain the necessary tabs, you could just write them directly to the file, no csv tools needed. If you have built/joined the strings in out_l manually, you can omit that step and just pass the original data structure to writerow.
The delimiter parameter
The delimiter parameter controls the delimiter in the output. It has nothing to do with the input out_l.
Why your output is garbled
csv.writer.writerow iterates the input. In your case you are giving it a string ('host\tuptime\tnfsserver\tnfs status\n', etc.), so the function iterates the string, giving you a sequence of chars.
How to produce the correct output
Give it a list of fields instead of the full string by using str.split(). In your case the string ends with \n, so use str.strip() as well:
import csv
out_l = ['host\tuptime\tnfsserver\tnfs status\n',
         'node1\t2\tnfs_host\tok\n',
         'node2\t100\tnfs_host\tna\n',
         'node3\t59\tnfs_host\tok\n']
out_f = open('test.csv', 'w')
w = csv.writer(out_f)
for l in out_l:
    w.writerow(l.strip().split('\t'))
out_f.close()
This should be what you want:
host,uptime,nfsserver,nfs status
node1,2,nfs_host,ok
node2,100,nfs_host,na
node3,59,nfs_host,ok
Reference: https://docs.python.org/3/library/csv.html
Very simple:
with open("test.csv" , 'w') as csv_file:
writer = csv.writer(csv_file, delemeter='\t')
for item in out_l:
writer.writerow([item,])

Python: writing an entire row to a CSV file. Why does it work this way?

I had exported a csv from Nokia Suite.
"sms","SENT","","+12345678901","","2015.01.07 23:06","","Text"
Going by the Python docs, I tried:
import csv
with open(sourcefile, 'r', encoding='utf8') as f:
    reader = csv.reader(f, delimiter=',')
    for line in reader:
        # write entire csv row
        with open(filename, 'a', encoding='utf8', newline='') as t:
            a = csv.writer(t, delimiter=',')
            a.writerows(line)
It didn't work until I put brackets around line, i.e. [line].
So in the last part I had
a.writerows([line])
Why is that so?
The writerows method expects an iterable of rows. line is a single row (a list of strings), so writerows treats each string in it as a row of its own and writes its characters as separate fields. Wrapping it as [line] turns it into a list containing that one row.
What you probably want to use instead is writerow.
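To illustrate the difference, a small sketch using an in-memory buffer (not from the question):
import csv
import io
line = ['sms', 'SENT', '+12345678901']
buf = io.StringIO()
w = csv.writer(buf)
# writerows expects an iterable of rows; given a single row (a list of
# strings), it treats each string as a row and writes its characters
# as separate fields.
w.writerows(line)
# Wrapping the row in a list, or using writerow, writes one record.
w.writerows([line])
w.writerow(line)
print(buf.getvalue())
# s,m,s
# S,E,N,T
# +,1,2,3,4,5,6,7,8,9,0,1
# sms,SENT,+12345678901
# sms,SENT,+12345678901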

Python: Read fields of a CSV file containing a list of lists

I am just wondering how I can read a special field from a CSV file with the following structure:
40.0070222,116.2968604,2008-10-28,[["route"], ["sublocality","political"]]
39.9759505,116.3272935,2008-10-29,[["route"], ["establishment"], ["sublocality", "political"]]
The way I usually read CSV files is:
with open('routes/stayedStoppoints', 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
The first 3 fields are no problem; in
for row in spamreader:
I can access row[0], row[1], row[2] without any issue. But for the last field, I guess that csv.reader(csvfile, delimiter=',', quotechar='"') also splits on the commas inside each sub-list, so when I try to access it I only get:
[["route"]
Does anyone have a solution for handling the last field as a full list (a list of lists, in fact):
[["route"], ["sublocality","political"]]
so that I can access each category?
Thanks
Your format is close to JSON. You only need to wrap each line in brackets and quote the dates.
For each line l just do:
lst = json.loads(re.sub(r'([0-9]+-[0-9]+-[0-9]+)', r'"\1"', '[%s]' % l))
results in lst being
[40.0070222, 116.2968604, u'2008-10-28', [[u'route'], [u'sublocality', u'political']]]
You need to import the json parser and regular expressions
import json
import re
Edit: you asked how to access the element containing 'route'. The answer is
lst[3][0][0]
'political' is at
lst[3][1][1]
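Putting those pieces together, a minimal sketch (assuming the file path from the question):
import json
import re
rows = []
with open('routes/stayedStoppoints') as infile:
    for l in infile:
        # Wrap the line in brackets and quote the date so it parses as JSON
        rows.append(json.loads(re.sub(r'([0-9]+-[0-9]+-[0-9]+)', r'"\1"', '[%s]' % l)))
print(rows[0][3])        # [['route'], ['sublocality', 'political']]
print(rows[0][3][0][0])  # 'route'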
If the strings ('political' and others) may contain strings looking like dates, you should go with the solution by @unutbu.
Use line.split(',', 3) to split on just the first 3 commas:
import json
with open(filename, 'rb') as csvfile:
    for line in csvfile:
        row = line.split(',', 3)
        row[3] = json.loads(row[3])
        print(row)
yields
['40.0070222', '116.2968604', '2008-10-28', [[u'route'], [u'sublocality', u'political']]]
['39.9759505', '116.3272935', '2008-10-29', [[u'route'], [u'establishment'], [u'sublocality', u'political']]]
That is not a valid CSV file. The csv module won't be able to read this.
If the line structure is always like this (two numbers, a date, and a nested list), you can do this:
import ast
result = []
with open('routes/stayedStoppoints') as infile:
    for line in infile:
        coord_x, coord_y, datestr, objstr = line.split(",", 3)
        result.append([float(coord_x), float(coord_y),
                       datestr, ast.literal_eval(objstr)])
Result:
>>> result
[[40.0070222, 116.2968604, '2008-10-28', [['route'], ['sublocality', 'political']]],
[39.9759505, 116.3272935, '2008-10-29', [['route'], ['establishment'], ['sublocality', 'political']]]]
