How to apply regex sub to a csv file in python - python

I have a CSV file that I want to apply a regex replacement to with Python.
So far I have the following:
reader = csv.reader(open('ffrk_inventory_relics.csv', 'r'))
writer = csv.writer(open('outfile.csv','w'))
for row in reader:
    reader = re.sub(r'\+','z',reader)
Which is giving me the following error:
Script error: Traceback (most recent call last):
File "ffrk_inventory_tracker_v1.6.py", line 22, in response
getRelics(data['equipments'], 'ffrk_inventory_relics')
File "ffrk_inventory_tracker_v1.6.py", line 72, in getRelics
reader = re.sub(r'\+','z',reader)
File "c:\users\baconcatbug\appdata\local\programs\python\python36\lib\re.py",
line 191, in sub
return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
After googling without much luck, I would like to ask the community here how to open the CSV file correctly so that I can use re.sub on it and then write the altered CSV back to the same filename.

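csv.reader(open('ffrk_inventory_relics.csv', 'r')) returns a reader object that yields each row as a list of strings. Your loop passes the reader itself to re.sub, and even passing row would hand it a list, not a string, which is why you get the TypeError. Apply re.sub to each field instead. Try this:
import re
import csv

# Read every row into memory first, replacing '+' in each field
with open('ffrk_inventory_relics.csv', 'r', newline='') as f:
    final_data = [[re.sub(r'\+', 'z', b) for b in i] for i in csv.reader(f)]

# Reopen the same file in write mode (this truncates it) and write the rows back
with open('ffrk_inventory_relics.csv', 'w', newline='') as f:
    csv.writer(f).writerows(final_data)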

If you don't need the csv module, you can use str.replace with a regular open:
with open('ffrk_inventory_relics.csv', 'r') as reader, open('outfile.csv','w') as writer:
    for row in reader:
        writer.write(row.replace('+','z'))
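Since the question asks to write back to the same filename, here is a minimal sketch of one way to do that without holding the whole file in memory: write the modified rows to a temporary file in the same directory, then swap it in with os.replace. The function name sub_in_csv and its parameters are made up for illustration, and it assumes Python 3 and the platform's default text encoding.

```python
import csv
import os
import re
import tempfile


def sub_in_csv(path, pattern, repl):
    """Apply re.sub to every field of the CSV at `path`, then replace the file."""
    regex = re.compile(pattern)
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final swap
    # stays on one filesystem.
    with open(path, "r", newline="") as src, tempfile.NamedTemporaryFile(
        "w", newline="", dir=directory, delete=False, suffix=".tmp"
    ) as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([regex.sub(repl, field) for field in row])
        tmp_name = dst.name
    # Both files are closed here; swap the finished copy in for the original.
    os.replace(tmp_name, path)
```

With the file from the question this would be called as sub_in_csv('ffrk_inventory_relics.csv', r'\+', 'z').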

Related

trim leading and trailing whitespaces along with commas in csv file with python

This is how my data looks when opened in Microsoft Excel.
As can be seen, all the cell contents except 218 (aligned to the right of the cell) are parsed as strings (aligned to the left of the cell). This is because they start with a white space (it is " 4,610" instead of "4610").
I would like to remove the white space at the beginning and also remove those commas (not the ones that delimit the CSV fields), because if the comma remains, 4 and 610 may be read into different cells.
Here's what I tried, with inspiration from this Stack Overflow answer:
import csv
import string
with open("old_dirty_file.csv") as bad_file:
    reader = csv.reader(bad_file, delimiter=",")
    with open("new_clean_file.csv", "w", newline="") as clean_file:
        writer = csv.writer(clean_file)
        for rec in reader:
            writer.writerow(map(str.replace(__old=',', __new='').strip, rec))
But, I get this error:
Traceback (most recent call last):
File "C:/..,,../clean_files.py", line 9, in <module>
writer.writerow(map(str.replace(__old=',', __new='').strip, rec))
TypeError: descriptor 'replace' of 'str' object needs an argument
How do I clean those files?
You just need to separate the replacement from the stripping, because Python doesn't know which string the replacement should be made in. Also pass the arguments to replace positionally; it does not accept __old and __new as keyword arguments.
for rec in reader:
    rec = (i.replace(',', '') for i in rec)
    writer.writerow(map(str.strip, rec))
or combine them into a single function:
repstr = lambda string, old=',', new='': string.replace(old, new).strip()
for rec in reader:
    writer.writerow(map(repstr, rec))
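For a version that can be tried without any files on disk, here is a small sketch of the same idea using in-memory file objects. The names clean_cell and clean_csv and the sample data are invented for illustration; only the " 4,610" and 218 cells come from the question.

```python
import csv
import io


def clean_cell(cell):
    """Strip surrounding whitespace and remove thousands separators."""
    return cell.strip().replace(",", "")


def clean_csv(src, dst):
    """Copy CSV rows from file object `src` to `dst`, cleaning every cell."""
    writer = csv.writer(dst)
    for rec in csv.reader(src):
        writer.writerow([clean_cell(cell) for cell in rec])


# Hypothetical sample: the quoted cell has a leading space and a comma.
sample = io.StringIO('name,amount\nfoo," 4,610"\nbar,218\n')
out = io.StringIO()
clean_csv(sample, out)
print(out.getvalue())
```

Because csv.reader has already split the fields, the comma removed inside clean_cell can only be one that was inside a quoted cell, never a field delimiter.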

How to read a CSV column as a string in Python

I wrote code with pandas in order to pass in a CSV and retrieve a column, and then I have more code that is supposed to split the data using the re library, but it throws an error stating "TypeError: expected string or bytes-like object."
I believe I just need to convert the CSV into a string before running re on it, but I can't figure out how.
The column in the CSV has data which look like: 'HB1.A1D62no.0016, HB31.N33NO.89, HB 54 .N338'
import pandas as pd
data = pd.read_csv('HB_Lib.csv', delimiter = ',')
s = [data[['Call Number']]]
import re
pattern = r"(^[a-z]+)\s*(\d+(?:\.\d+)?)"
print(list(map("".join, [re.findall(pattern, part, flags=re.I)[0] for part in s])))
Traceback:
Traceback (most recent call last):
File "C:/Python/test2.py", line 8, in <module>
print(list(map("".join, [re.findall(pattern, part, flags=re.I)[0] for part in s])))
File "C:/Python/test2.py", line 8, in <listcomp>
print(list(map("".join, [re.findall(pattern, part, flags=re.I)[0] for part in s])))
File "C:\Python37\lib\re.py", line 223, in findall
return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
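Convert the column to strings first:
data['Call Number'] = data['Call Number'].astype(str)
I think the first thing you should do is to remove the outer square brackets when declaring s. They wrap the DataFrame in a one-element list, so re.findall receives a DataFrame rather than a string.
So you obtain something like:
a = data[['something']]
Note that double brackets still return a DataFrame; single brackets (data['something']) return a Series whose values you can iterate over.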
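The underlying point is that re functions work on one string at a time, so the pattern has to be applied to each value in the column. Here is a minimal sketch in plain Python; the helper name class_prefix is made up, and the list below stands in for the 'Call Number' column on the assumption that each call number is a separate value.

```python
import re

# Same pattern as in the question: leading letters, optional spaces, a number.
PATTERN = re.compile(r"(^[a-z]+)\s*(\d+(?:\.\d+)?)", flags=re.I)


def class_prefix(call_number):
    """Return the letters and number joined together, or None without a match."""
    match = PATTERN.search(str(call_number).strip())
    return "".join(match.groups()) if match else None


# Hypothetical values standing in for the 'Call Number' column.
values = ["HB1.A1D62no.0016", "HB31.N33NO.89", "HB 54 .N338", None]
print([class_prefix(v) for v in values])  # ['HB1', 'HB31', 'HB54', None]
```

With pandas, the same function could presumably be applied per value with something like data['Call Number'].map(class_prefix).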

search for a word in file and replace it with the corresponding value from dictionary in python

I want to search a file for certain words and replace each one with the corresponding string from a dictionary of keywords. It's basically just a text replacement.
My code below doesn't work:
keyword = {
    "shortkey":"longer sentence",
    "gm":"goodmorning",
    "etc":"etcetera"
}
with open('find.txt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        if re.search(keyword.keys(), line):
            file.write(line.replace(keyword.keys(), keyword.values()))
            break
The error message I get:
Traceback (most recent call last):
File "py.py", line 42, in <module>
if re.search(keyword.keys(), line):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 146, in search
return _compile(pattern, flags).search(string)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 237, in _compile
p, loc = _cache[cachekey]
TypeError: unhashable type: 'list'
Editing a file in place is dirty; you would be better off writing to a new file and replacing your old file afterward.
You are attempting to use a list as a regex, which isn't valid. I'm not sure why you are using regex in the first place, as it's not necessary. You also cannot pass a list into str.replace.
You can iterate over the keywords, apply each replacement to the line, and then write the line once.
import os

keyword = {
    "shortkey": "longer sentence",
    "gm": "goodmorning",
    "etc": "etcetera"
}
with open('find.txt', 'r') as file, open('find.txt.new', 'w') as newfile:
    for line in file:
        # Apply every replacement to the line before writing it,
        # so each line is written exactly once
        for word, replacement in keyword.items():
            line = line.replace(word, replacement)
        newfile.write(line)
# Replace your old file afterwards with the new one
os.rename('find.txt.new', 'find.txt')
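If regex is wanted after all, one possible approach is to build a single pattern from the dictionary keys and let re.sub look up each match. This is only a sketch, and it makes an assumption the answer above does not: keys are matched as whole words, so "etc" inside "sketch" is left alone. The function name expand is made up.

```python
import re

keyword = {
    "shortkey": "longer sentence",
    "gm": "goodmorning",
    "etc": "etcetera",
}

# Longest keys first so a longer key wins over a shorter one that is its prefix.
alternation = "|".join(re.escape(k) for k in sorted(keyword, key=len, reverse=True))
# \b restricts matches to whole words (an assumption, see the note above).
pattern = re.compile(r"\b(?:" + alternation + r")\b")


def expand(line):
    """Replace every whole-word key in `line` with its dictionary value."""
    return pattern.sub(lambda m: keyword[m.group(0)], line)


print(expand("gm, here is a sketch etc"))  # goodmorning, here is a sketch etcetera
```

A single pass also means a replacement value is never itself scanned for further keys, which can happen with chained str.replace calls.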

python prettytable module raise Could not determine delimiter error for valid csv file

I'm trying to use the prettytable module to print data from a CSV file, but it fails with a "Could not determine delimiter" exception even though the CSV file is valid:
>>> import prettytable
>>> with file("/tmp/test.csv") as f:
...     prettytable.from_csv(f)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "build/bdist.linux-x86_64/egg/prettytable.py", line 1337, in from_csv
File "/usr/lib/python2.7/csv.py", line 188, in sniff
raise Error, "Could not determine delimiter"
_csv.Error: Could not determine delimiter
The CSV file:
input_gps,1424185824460,1424185902788,1424185939525,1424186019313,1424186058952,1424186133797,1424186168766,1424186170214,1424186246354,1424186298434,1424186376789,1424186413625,1424186491453,1424186606143,1424186719394,1424186756366,1424186835829,1424186948532,1424187107293,1424187215557,1424187250693,1424187323097,1424187358989,1424187465475,1424187475824,1424187476738,1424187548602,1424187549228,1424187550690,1424187582866,1424187584248,1424187639923,1424187641623,1424187774567,1424187776418,1424187810376,1424187820238,1424187820998,1424187916896,1424187917472,1424187919241,1424188048340,dummy-0,dummy-1,Total
-73.958315%2C 40.815569,0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),13.0 (42%)
-76.932984%2C 38.992186,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),1.0(100%),0.0(nan%),1.0(100%),0.0(nan%),1.0(100%),0.0(nan%),0.0(nan%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),1.0(100%),1.0(100%),0.0(nan%),0.0(0%),17.0 (55%)
null_input-0,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0 (0%)
null_input-1,0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(0%),0.0(nan%),0.0(nan%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),0.0(0%),0.0(0%),0.0(nan%),1.0(100%),1.0 (3%)
Total,0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),0.0(0%),1.0(3%),0.0(0%),0.0(0%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),1.0(3%),0.0(0%),1.0(3%),31.0(100%)
If anyone can tell me how to work around the problem, or suggest alternatives, it would be very helpful.
According to PyPI, prettytable is only alpha level. I could not find a way to give it configuration to pass on to the csv module. So in that case, you should probably read the CSV file yourself, explicitly declaring the delimiter, and build the PrettyTable row by row:
import csv
from prettytable import PrettyTable

pt = None  # declared up front; filled in below
with open('/tmp/test.csv') as fd:
    rd = csv.reader(fd, delimiter=',')
    pt = PrettyTable(next(rd))  # the first row supplies the field names
    for row in rd:
        pt.add_row(row)
I got the same error while working on some messier CSVs. I ended up implementing a fallback method with a manual search:
import csv
from string import punctuation, whitespace
from collections import Counter

def get_delimiter(contents: str) -> str:
    # contents = f.read()
    try:
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(contents)
        return dialect.delimiter
    except csv.Error:
        return fallback_delimiter_search(contents)

def fallback_delimiter_search(contents: str) -> str:
    # Ignore spaces, which dominate when there is a lot of text
    content_chars = [x for x in contents if (x in punctuation or x in whitespace) and x != ' ']
    counts = Counter(content_chars)
    # Take the most frequent remaining punctuation/whitespace character
    tgt_delimiter = counts.most_common(1)[0][0]
    return tgt_delimiter
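One caveat with a pure frequency count is that data like the file in the question is full of '.', '(' and '%', so the most common punctuation character may not be the delimiter. A possible alternative, sketched here under the assumption that the delimiter is one of a small candidate set and that fields contain no quoted delimiters, is to pick the character that appears the same nonzero number of times on every line. The names CANDIDATES and guess_delimiter are made up.

```python
from collections import Counter

# Assumed candidate set; extend it if your files use something else.
CANDIDATES = ",;\t|"


def guess_delimiter(contents, candidates=CANDIDATES):
    """Pick the candidate with the same nonzero count on every non-empty line."""
    lines = [line for line in contents.splitlines() if line]
    best, best_count = None, 0
    for char in candidates:
        per_line = Counter(line.count(char) for line in lines)
        # Consistent means exactly one distinct per-line count.
        if len(per_line) == 1:
            count = next(iter(per_line))
            if count > best_count:
                best, best_count = char, count
    return best  # None when no candidate is consistent
```

This ignores quoting entirely, so it is meant as a last resort after csv.Sniffer has failed, not as a replacement for it.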

read an ascii file into a numpy array

I have an ascii file and I want to read it into a numpy array. But it was failing and for the first number in the file, it returns 'NaN' when I use numpy.genfromtxt. Then I tried to use the following way of reading the file into an array:
lines = file('myfile.asc').readlines()
X = []
for line in lines:
    s = str.split(line)
    X.append([float(s[i]) for i in range(len(s))])
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
ValueError: could not convert string to float: 15.514
when I printed the first line of the file it looks like :
>>> s
['\xef\xbb\xbf15.514', '15.433', '15.224', '14.998', '14.792', '15.564', '15.386', '15.293', '15.305', '15.132', '15.073', '15.005', '14.929', '14.823', '14.766', '14.768', '14.789']
How can I read such a file into a numpy array without problems and without any assumptions about the number of rows and columns?
Based on @falsetru's answer, I want to provide a solution with NumPy's file reading capabilities:
import numpy as np
import codecs
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    X = np.loadtxt(f)
It opens the file as a file object using the correct encoding. NumPy accepts this kind of handle (it can also use handles from open()) and works seamlessly, as in every other case.
The file is encoded as UTF-8 with a BOM. Use codecs.open with the utf-8-sig encoding to handle it correctly (this excludes the BOM \xef\xbb\xbf).
import codecs
X = []
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    for line in f:
        s = line.split()
        X.append([float(s[i]) for i in range(len(s))])
UPDATE: You don't need to use an index at all:
with codecs.open('myfile.asc', encoding='utf-8-sig') as f:
    X = [[float(x) for x in line.split()] for line in f]
BTW, instead of using the unbound method str.split(line), use line.split() unless you have a special reason not to.
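On Python 3, the built-in open accepts an encoding argument, so the same utf-8-sig trick works without codecs. A small self-contained sketch follows; the helper name read_rows and the demo file contents are made up, apart from the BOM bytes and the first numbers shown in the question.

```python
import os
import tempfile


def read_rows(path):
    """Read whitespace-separated floats, skipping a UTF-8 BOM if present."""
    with open(path, encoding="utf-8-sig") as f:
        # Blank lines are skipped so they don't produce empty rows.
        return [[float(x) for x in line.split()] for line in f if line.strip()]


# Hypothetical demo file that starts with the BOM bytes from the question.
demo = os.path.join(tempfile.mkdtemp(), "myfile.asc")
with open(demo, "wb") as f:
    f.write(b"\xef\xbb\xbf15.514 15.433\n15.224 14.998\n")

print(read_rows(demo))  # [[15.514, 15.433], [15.224, 14.998]]
```

If every row has the same length, the nested list can then be passed to numpy.array to get a 2-D array.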
