I have a CSV file that I want to apply a regex replacement to with Python.
So far I have the following:
reader = csv.reader(open('ffrk_inventory_relics.csv', 'r'))
writer = csv.writer(open('outfile.csv','w'))
for row in reader:
    reader = re.sub(r'\+','z',reader)
Which is giving me the following error:
Script error: Traceback (most recent call last):
  File "ffrk_inventory_tracker_v1.6.py", line 22, in response
    getRelics(data['equipments'], 'ffrk_inventory_relics')
  File "ffrk_inventory_tracker_v1.6.py", line 72, in getRelics
    reader = re.sub(r'\+','z',reader)
  File "c:\users\baconcatbug\appdata\local\programs\python\python36\lib\re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
After googling without much luck, I would like to ask the community here how to open the CSV file correctly so that I can use re.sub on it and then write the altered CSV back to the same filename.
csv.reader(open('ffrk_inventory_relics.csv', 'r')) yields each row as a list of strings. Your loop passes the reader object itself to re.sub (and passing row would still pass a list), but re.sub expects a single string. Apply re.sub to each cell instead:
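import re
import csv

# Read every row first, applying the substitution to each cell.
with open('ffrk_inventory_relics.csv', 'r', newline='') as f:
    final_data = [[re.sub(r'\+', 'z', cell) for cell in row] for row in csv.reader(f)]

# Reopen the same file in write mode only after it has been fully read.
with open('ffrk_inventory_relics.csv', 'w', newline='') as f:
    csv.writer(f).writerows(final_data)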
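To see why the original call fails, it helps to look at what csv.reader actually yields. The sketch below uses an in-memory sample instead of the real file; the cell values are made up for illustration.

```python
import csv
import io
import re

# Made-up sample standing in for ffrk_inventory_relics.csv.
sample = io.StringIO("name,level\nBuster Sword+,5\nMythril Rod++,10\n")

# Each row produced by csv.reader is a list of strings, not a single string.
rows = list(csv.reader(sample))

try:
    re.sub(r'\+', 'z', rows[1])  # passing a whole row reproduces the TypeError
except TypeError as exc:
    error_message = str(exc)

# Applying re.sub to each cell works, because every cell is a str.
cleaned = [[re.sub(r'\+', 'z', cell) for cell in row] for row in rows]
print(cleaned)
```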
If you don't need the csv module, you can use str.replace with a regular open:
with open('ffrk_inventory_relics.csv', 'r') as reader, open('outfile.csv','w') as writer:
    for row in reader:
        writer.write(row.replace('+','z'))
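Since the question asks to write back to the same filename, one possible approach (not taken from the answers above) is to write to a temporary file and then swap it into place, so the original is never truncated while it is still being read. The function name rewrite_csv_in_place is hypothetical.

```python
import csv
import os
import re
import tempfile

def rewrite_csv_in_place(path, pattern, repl):
    """Apply re.sub to every cell of the CSV at `path`, then replace the file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the final swap
    # stays on one filesystem.
    with open(path, 'r', newline='') as src, \
         tempfile.NamedTemporaryFile('w', newline='', dir=directory,
                                     delete=False, suffix='.tmp') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([re.sub(pattern, repl, cell) for cell in row])
        tmp_name = dst.name
    # Both files are closed at this point; move the new file over the old one.
    os.replace(tmp_name, path)
```

For the file in the question, the call would be rewrite_csv_in_place('ffrk_inventory_relics.csv', r'\+', 'z').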
Could not determine delimiter error for valid csv file
I'm trying to use the prettytable module to print data from a CSV file, but it fails with the following exception:
>>> import prettytable
>>> with file("/tmp/test.csv") as f:
...     prettytable.from_csv(f)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "build/bdist.linux-x86_64/egg/prettytable.py", line 1337, in from_csv
  File "/usr/lib/python2.7/csv.py", line 188, in sniff
    raise Error, "Could not determine delimiter"
_csv.Error: Could not determine delimiter
The CSV file:
input_gps,1424185824460,1424185902788,1424185939525,1424186019313,1424186058952,1424186133797,1424186168766,1424186170214,1424186246354,1424186298434,1424186376789,1424186413625,1424186491453,1424186606143,1424186719394,1424186756366,1424186835829,1424186948532,1424187107293,1424187215557,1424187250693,1424187323097,1424187358989,1424187465475,1424187475824,1424187476738,1424187548602,1424187549228,1424187550690,1424187582866,1424187584248,1424187639923,1424187641623,1424187774567,1424187776418,1424187810376,1424187820238,1424187820998,1424187916896,1424187917472,1424187919241,1424188048340,dummy-0,dummy-1,Total
-73.958315%2C 40.815569,0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),13.0 (42%)
-76.932984%2C 38.992186,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),1.0(100%),0.0(nan%),1.0(100%),0.0(nan%),1.0(100%),0.0(nan%),0.0(nan%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),0.0(0%),17.0 (55%)
null_input-0,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0 (0%)
null_input-1,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),1.0(100%),1.0 (3%)
Total,0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),0.0(0%),1.0(3%),0.0(0%),0.0(0%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),31.0(100%)
If anyone can tell me how to work around the problem, or suggest an alternative, it would be very helpful.
According to PyPI, prettytable is only at alpha level, and I could not find a way to give it configuration to pass on to the csv module. In that case, you should probably read the CSV file yourself, explicitly declaring the delimiter, and build the PrettyTable row by row:
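import csv
from prettytable import PrettyTable

with open('/tmp/test.csv') as fd:
    rd = csv.reader(fd, delimiter=',')
    pt = PrettyTable(next(rd))  # the first row supplies the field names
    for row in rd:
        pt.add_row(row)
# pt is still available here; a with block does not create a new scope.
print(pt)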
I got the same error while working on some messier CSVs. I ended up implementing a fallback method with a manual search:
import csv
from collections import Counter
from string import punctuation, whitespace

def get_delimiter(contents: str) -> str:
    # contents = f.read()
    try:
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(contents)
        return dialect.delimiter
    except csv.Error:
        # The sniffer could not decide; fall back to a frequency count.
        return fallback_delimiter_search(contents)

def fallback_delimiter_search(contents: str) -> str:
    # eliminate space in case of a lot of text
    content_chars = [x for x in contents if (x in punctuation or x in whitespace) and x != ' ']
    counts = Counter(content_chars)
    tgt_delimiter = counts.most_common(1)[0][0]
    return tgt_delimiter
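A frequency count over all punctuation can pick a character that merely appears often inside cells, such as '.' in numeric data. A hedged variation, assuming the delimiter is one of a small known set, restricts the count to those characters. The function name pick_delimiter and the default candidates tuple are illustrative and not part of the answer above.

```python
from collections import Counter

def pick_delimiter(contents, candidates=(',', ';', '\t', '|')):
    """Return the most frequent character among `candidates` in `contents`."""
    counts = Counter(ch for ch in contents if ch in candidates)
    if not counts:
        raise ValueError("none of the candidate delimiters occur in the input")
    return counts.most_common(1)[0][0]

# '.' occurs more often than ',' in this sample, but it is not a candidate.
print(pick_delimiter("1.5,2.5\n3.5,4.5\n"))
```

The trade-off is that an unusual delimiter outside the candidate set is never found, so the set has to match the files you expect.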
I have an ASCII file that I want to read into a numpy array. With numpy.genfromtxt the read fails: the first number in the file comes back as 'NaN'. I then tried reading the file into an array this way:
lines = file('myfile.asc').readlines()
X = []
for line in lines:
    s = str.split(line)
    X.append([float(s[i]) for i in range(len(s))])
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
ValueError: could not convert string to float: 15.514
When I print the first line of the file, it looks like this:
>>> s
['\xef\xbb\xbf15.514', '15.433', '15.224', '14.998', '14.792', '15.564', '15.386', '15.293', '15.305', '15.132', '15.073', '15.005', '14.929', '14.823', '14.766', '14.768', '14.789']
How can I read such a file into a numpy array without problems and without any assumptions about the number of rows and columns?
Based on @falsetru's answer, I want to provide a solution that uses NumPy's file-reading capabilities:
import numpy as np
import codecs
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    X = np.loadtxt(f)
This opens the file with the correct encoding and passes the open file object to NumPy. np.loadtxt accepts this kind of handle (it can also use handles from open()) and works seamlessly, as in any other case.
The file is encoded as UTF-8 with a BOM. Use codecs.open with the utf-8-sig encoding to handle it correctly (this excludes the BOM, \xef\xbb\xbf).
import codecs
X = []
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    for line in f:
        s = line.split()
        X.append([float(s[i]) for i in range(len(s))])
UPDATE: You don't need to use an index at all:
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    X = [[float(x) for x in line.split()] for line in f]
BTW, unless you have a special reason to use the unbound method str.split(line), use line.split() instead.
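In Python 3 the built-in open() accepts the same encoding argument, so codecs is not strictly required there. The following self-contained sketch writes a small BOM-prefixed file and reads it back; the temporary file and its values are made up for the demonstration and only mimic the start of myfile.asc.

```python
import os
import tempfile

# Create a throwaway file that starts with the UTF-8 BOM, like myfile.asc.
fd, path = tempfile.mkstemp(suffix='.asc')
with os.fdopen(fd, 'wb') as f:
    f.write(b'\xef\xbb\xbf15.514 15.433\n14.998 14.792\n')

# 'utf-8-sig' drops the BOM, so the first token parses as a float.
with open(path, encoding='utf-8-sig') as f:
    X = [[float(x) for x in line.split()] for line in f]

os.remove(path)
print(X)
```

If every row has the same number of values, the resulting list of lists can be passed to numpy.array(X).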