This is my first attempt at subclassing, so I need some hints from you experts.
I'm trying to subclass csv.DictReader / Writer to have a higher level class to do something like this :
a = CsvRdr('filename.csv')
for row in a.rows:
    # do something with dict returned in row
a.close()
I've come up with a subclass like this :
class CsvRdr(csv.DictReader):
    def __init__(self, filename):
        self.__fo = open(filename, 'rb')
        self.__delim = '\t'
        self.rows = csv.DictReader(self.__fo, self.__delim)
        self.rows.__init__(self.__fo, self.__del)
    def close(self):
        self.__fo.close()
but when I do :
for i in a.rows:
    print i
it returns an unformatted dict containing the delimiter \t as key :
{'\t': 'seriesid\tseriesname\tstatus\tquality\tgroup\tpath'}
{'\t': '80337\tMad Men\tAiring\thdtv\tTV Shows\t/share/MD0_DATA/SORT/TV Shows/Mad Men'}
{'\t': '271910\tHalt and Catch Fire\tHiatus\thdtv\tTV Shows\t/share/MD0_DATA/SORT/TV
instead of a dict containing the proper field names and the corresponding values split by the delimiter.
But when I instantiate DictReader from another function, all I need to do is:
fo = open(filename, 'rb')
reader = csv.DictReader(fo, delimiter='\t')
and the reader object correctly gives you the desired output.
Any suggestion ?
The subclassing process isn't clear in my mind, and what I've found online so far didn't help me.
TIA
Enrico
Your posted code would barf with an AttributeError: you have self.__del where you mean self.__delim.
Beyond that, your other issue is that you invoke the constructor incorrectly:
self.rows = csv.DictReader(self.__fo, self.__delim)
should be
self.rows = csv.DictReader(self.__fo, delimiter = self.__delim)
Looking at the constructor signature for DictReader we see what actually happened:
csv.DictReader(self, f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
Your self.__delim argument was bound to the fieldnames parameter. Python assigns positional arguments to parameters in signature order, so the second positional argument lands in fieldnames (the first parameter after f), even though that parameter has a default value and is usually passed by keyword.
So you're telling DictReader "Hey, this CSV has only one column, and its name is '\t'". DictReader then does the logical thing: it keeps the default ',' delimiter and returns one value per row, that value being the entire line.
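A small sketch of the difference, run against an in-memory sample instead of a file. It uses Python 3 syntax, and the SAMPLE data and the read_rows helper are made up for illustration:

```python
import csv
import io

# Hypothetical two-line sample standing in for the real tab-separated file.
SAMPLE = "seriesid\tseriesname\n80337\tMad Men\n"

def read_rows(*args, **kwargs):
    """Build a DictReader over SAMPLE and return all rows as plain dicts."""
    reader = csv.DictReader(io.StringIO(SAMPLE), *args, **kwargs)
    return [dict(row) for row in reader]

# Positional: '\t' is bound to fieldnames, so the default ',' delimiter is used
# and even the header line comes back as data under the single key '\t'.
wrong = read_rows('\t')

# Keyword: '\t' is the delimiter and the first line supplies the field names.
right = read_rows(delimiter='\t')

print(wrong[0])   # {'\t': 'seriesid\tseriesname'}
print(right[0])   # {'seriesid': '80337', 'seriesname': 'Mad Men'}
```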
Finally this line:
self.rows.__init__(self.__fo, self.__del)
isn't doing anything useful; you are just repeating the constructor call in a more explicit way.
Here's how I would re-write what you were trying to do:
class CsvRdr(object):
    def __init__(self, filename):
        self.__fo = open(filename, 'rb')
        self.__delim = '\t'
        self.rows = csv.DictReader(self.__fo, delimiter = self.__delim)
    def close(self):
        self.__fo.close()
Notice I changed csv.DictReader to object. This is because the pattern you are using is actually delegation, not subclassing or inheritance. You are just setting one of your object's attributes to an instance of the class you are interested in using, and your methods just call that instance's methods in more convenient ways.
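For contrast, if real inheritance is what you want, a minimal Python 3 sketch could look like the following. The class name TsvDictReader is hypothetical; the point is only that the subclass initialises its parent through super() instead of creating a second DictReader:

```python
import csv

class TsvDictReader(csv.DictReader):
    """Hypothetical DictReader subclass that opens (and can close) its own file."""

    def __init__(self, filename, delimiter='\t'):
        # newline='' is what the Python 3 csv docs recommend when opening csv files.
        self._fo = open(filename, newline='')
        # Let the parent class set itself up; no second DictReader is created.
        super().__init__(self._fo, delimiter=delimiter)

    def close(self):
        self._fo.close()
```

Because the instance is itself a DictReader, you iterate over it directly (for row in TsvDictReader('filename.csv')) rather than over a rows attribute.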
In the end, I solved it this way:
class CsvRdr(object):
    def __init__(self, filename, delimiter=None):
        self.__fo = open(filename, 'rb')
        self.__delim = ( delimiter if delimiter else '\t' )
        self.__rows = csv.DictReader(self.__fo, delimiter = self.__delim)
    def __iter__(self):
        return self.__rows
    def close(self):
        self.__fo.close()
The class is called by this function:
def CsvRead(filename):
    try:
        reader = CsvRdr(filename)
        return reader
    except IOError, e:
        print "Error reading file : %s ERROR=%s" % (filename, e)
        sys.exit(2)
In this second attempt, I added the __iter__ magic method to mimic the original behaviour of csv.DictReader, so you can loop through the data in the usual way instead of going through the object's rows attribute:
reader = CsvRead(catalog)
seriesnames = [ row['seriesname'].lower() for row in reader ]
reader.close()
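One possible refinement, sketched here in Python 3 and not part of the solution above: giving the wrapper __enter__ and __exit__ lets a with block close the file even if the loop raises. The class name ManagedCsvRdr is hypothetical:

```python
import csv

class ManagedCsvRdr:
    """Hypothetical delegating reader that also works as a context manager."""

    def __init__(self, filename, delimiter='\t'):
        self._fo = open(filename, newline='')
        self._rows = csv.DictReader(self._fo, delimiter=delimiter)

    def __iter__(self):
        # Delegate iteration to the wrapped DictReader.
        return self._rows

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions raised inside the with block

    def close(self):
        self._fo.close()
```

Usage would then be: with ManagedCsvRdr(catalog) as reader: followed by the same list comprehension, with no explicit close() call.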
Related
I need to strip the whitespace from a CSV file that I read
import csv
aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    # I need to strip the extra white space from each string in the row
    return(aList)
There's also the built-in formatting parameter skipinitialspace (the default is False). Set it to True to ignore whitespace immediately following a delimiter:
http://docs.python.org/2/library/csv.html#csv-fmt-params
aList=[]
with open(self.filename, 'r') as f:
    reader = csv.reader(f, skipinitialspace=True, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        aList.append(row)
    return(aList)
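A quick Python 3 check of what skipinitialspace covers, using a made-up one-line sample. It only drops whitespace right after a delimiter, so trailing spaces survive and still need str.strip():

```python
import csv
import io

# Hypothetical sample line with spaces on both sides of a delimiter.
data = "a, b ,c\n"

plain = next(csv.reader(io.StringIO(data)))
skipped = next(csv.reader(io.StringIO(data), skipinitialspace=True))
stripped = [cell.strip() for cell in plain]

print(plain)     # ['a', ' b ', 'c']
print(skipped)   # ['a', 'b ', 'c']
print(stripped)  # ['a', 'b', 'c']
```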
In my case, I only cared about stripping the whitespace from the field names (aka column headers, aka dictionary keys), when using csv.DictReader.
Create a class based on csv.DictReader, and override the fieldnames property to strip out the whitespace from each field name (aka column header, aka dictionary key).
Do this by getting the regular list of fieldnames, and then iterating over it while creating a new list with the whitespace stripped from each field name, and setting the underlying _fieldnames attribute to this new list.
import csv
class DictReaderStrip(csv.DictReader):
    @property
    def fieldnames(self):
        if self._fieldnames is None:
            # Initialize self._fieldnames
            # Note: DictReader is an old-style class, so can't use super()
            csv.DictReader.fieldnames.fget(self)
            if self._fieldnames is not None:
                self._fieldnames = [name.strip() for name in self._fieldnames]
        return self._fieldnames
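In Python 3 every class is new-style, so the same override can probably be written with super() and without touching the private _fieldnames attribute. A sketch under that assumption (the name StrippedDictReader is hypothetical):

```python
import csv
import io

class StrippedDictReader(csv.DictReader):
    """Hypothetical Python 3 variant: strips whitespace from header names only."""

    @property
    def fieldnames(self):
        # The parent getter reads the header row on first access.
        names = super().fieldnames
        if names is None:
            return None
        return [name.strip() for name in names]

# Hypothetical sample with padded column headers.
reader = StrippedDictReader(io.StringIO(" id , name \n1,Ann\n"))
rows = [dict(row) for row in reader]
print(rows)  # [{'id': '1', 'name': 'Ann'}]
```

The trade-off is that the names are re-stripped on every access, and that this read-only property no longer allows assigning to reader.fieldnames.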
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    return [[x.strip() for x in row] for row in reader]
You can do:
aList.append([element.strip() for element in row])
The most memory-efficient method to format the cells after parsing is through generators. Something like:
with open(self.filename, 'r') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    for row in reader:
        yield (cell.strip() for cell in row)
But it may be worth moving this into a function, so that you can keep adding cleanup steps in one place and avoid iterating over the data again later. For instance:
nulls = {'NULL', 'null', 'None', ''}
def clean(reader):
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield clean(row)
Or it can be used to build a factory that yields one dict per row:
def factory(reader):
    fields = next(reader)
    def clean(row):
        for cell in row:
            cell = cell.strip()
            yield None if cell in nulls else cell
    for row in reader:
        yield dict(zip(fields, clean(row)))
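Here is a self-contained Python 3 sketch of the same generator idea with a usage example. The names NULLS and iter_records and the sample data are invented, and unlike factory above this version also strips the header names:

```python
import csv
import io

NULLS = {'NULL', 'null', 'None', ''}

def iter_records(reader):
    """Lazily yield one dict per data row, with cells stripped and nulls mapped to None."""
    fields = [name.strip() for name in next(reader)]
    for row in reader:
        cells = (cell.strip() for cell in row)
        yield dict(zip(fields, (None if cell in NULLS else cell for cell in cells)))

# Hypothetical sample with padded cells and a NULL marker.
sample = io.StringIO(" id , name \n 1 , NULL \n 2 , Bob \n")
records = list(iter_records(csv.reader(sample)))
print(records)  # [{'id': '1', 'name': None}, {'id': '2', 'name': 'Bob'}]
```

Nothing is read until the generator is consumed, so only one row is held in memory at a time unless you collect the results into a list as the demo does.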
You can create a wrapper object around your file that strips away the spaces before the CSV reader sees them. This way, you can even use the CSV file with csv.DictReader.
import re
class CSVSpaceStripper:
    def __init__(self, filename):
        self.fh = open(filename, "r")
        self.surroundingWhiteSpace = re.compile("\s*;\s*")
        self.leadingOrTrailingWhiteSpace = re.compile("^\s*|\s*$")
    def close(self):
        self.fh.close()
        self.fh = None
    def __iter__(self):
        return self
    def next(self):
        line = self.fh.next()
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line
Then use it like this:
o = csv.reader(CSVSpaceStripper(filename), delimiter=";")
o = csv.DictReader(CSVSpaceStripper(filename), delimiter=";")
I hardcoded ";" to be the delimiter. Generalising the code to any delimiter is left as an exercise to the reader.
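One way the exercise might be approached in Python 3 is a generator that takes the delimiter as a parameter and escapes it for the regex. The function name strip_around_delimiter is hypothetical, and the sketch assumes the delimiter is not itself a space or tab and that no quoted field contains the delimiter:

```python
import csv
import io
import re

def strip_around_delimiter(lines, delimiter=';'):
    """Yield each line with spaces/tabs around the delimiter and at both ends removed."""
    pattern = re.compile(r"[ \t]*" + re.escape(delimiter) + r"[ \t]*")
    for line in lines:
        yield pattern.sub(delimiter, line.strip())

# Hypothetical sample; any iterable of strings works, including an open file.
sample = io.StringIO("a ; b;c \n 1;2 ; 3\n")
rows = list(csv.reader(strip_around_delimiter(sample), delimiter=';'))
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3']]
```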
Read a CSV (or Excel file) using Pandas and trim it using this custom function.
# Definition for stripping whitespace
def trim(dataset):
    trim = lambda x: x.strip() if type(x) is str else x
    return dataset.applymap(trim)
You can now apply trim(CSV/Excel) to your code like so (as part of a loop, etc.)
dataset = trim(pd.read_csv(dataset))
dataset = trim(pd.read_excel(dataset))
And here is Daniel Kullmann's excellent solution adapted to Python 3:
import re
class CSVSpaceStripper:
    """strip whitespaces around delimiters in the file
    NB has hardcoded delimiter ";"
    """
    def __init__(self, filename):
        self.fh = open(filename, "r")
        self.surroundingWhiteSpace = re.compile(r"\s*;\s*")
        self.leadingOrTrailingWhiteSpace = re.compile(r"^\s*|\s*$")
    def close(self):
        self.fh.close()
        self.fh = None
    def __iter__(self):
        return self
    def __next__(self):
        # next() raises StopIteration at end of file; readline() would keep
        # returning '' forever, so the iteration would never end
        line = next(self.fh)
        line = self.surroundingWhiteSpace.sub(";", line)
        line = self.leadingOrTrailingWhiteSpace.sub("", line)
        return line
I figured out a very simple solution:
import csv
with open('filename.csv') as f:
    reader = csv.DictReader(f)
    rows = [ { k.strip(): v.strip() for k,v in row.items() } for row in reader ]
The following code may help you:
import pandas as pd
aList = pd.read_csv(r'filename.csv', sep=r'\s*,\s*', engine='python')
I have a file, memory.txt, and I want to store an instance of the class Weapon() in a dictionary, on the second line.
with open(memorypath(), "r") as f:
    lines = f.readlines()
inv = inventory()
if "MAINWEAPON" not in inv or inv["MAINWEAPON"] == "":
    inv["MAINWEAPON"] = f"""Weapon(sw, 0, Ability(0, "0"), ["{name}'s first weapon."], dmg=30, cc=20, str=15)"""
lines[1] = str(inv) + "\n"
with open(memorypath(), "w") as f:
    f.writelines(lines)
(inventory and memorypath are from another file I have for utility functions)
With what I have, though, if I get inv["MAINWEAPON"] I just get the string, not the class. And I have to store it as a string, or else I get something like <__main__.Weapon object at (hexadecimal path thing)>.
How do I get the class itself upon getting inv["MAINWEAPON"]?
One more thing: I suspect I'm mishandling newlines, because memory.txt has 6 lines but gets shortened to 5. Please tell me if I'm doing anything wrong.
If you have a class, you can represent an instance as a dict and save it in JSON format.
class Cat:
    name: str
    def __init__(self, name: str):
        self.name = name
    def dict(self):
        return {'name': self.name}
    @classmethod
    def from_dict(cls, d):
        return cls(name = d['name'])
Now you can save the class as a json to a file like this:
import json
cat = Cat('simon')
with open('cat.json', 'w') as f:
    json.dump(cat.dict(), f)
And you can load the json again like this:
with open('cat.json', 'r') as f:
    d = json.load(f)
cat = Cat.from_dict(d)
Update
Dataclasses have been available since Python 3.7, so here is an example of how you can use them to save instances in JSON format.
If you want to use the JSON file as a database and be able to append new entities to it, you have to load the file into memory, append the new data, and finally overwrite the old JSON file. The code below does exactly that.
from dataclasses import dataclass, asdict
import json

@dataclass
class Cat:
    name: str

def load_cats() -> list[Cat]:
    try:
        with open('cats.json', 'r') as fd:
            return [Cat(**x) for x in json.load(fd)]
    except FileNotFoundError:
        return []

def save_cat(c):
    data = [asdict(x) for x in load_cats() + [c]]
    with open('cats.json', 'w') as fd:
        json.dump(data, fd)

c = Cat(name='simon')
save_cat(c)
cats = load_cats()
print(cats)
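Applied to the layout in the question, where the inventory lives on one line of a text file, the same idea might look like the sketch below. The helper names save_on_line and load_from_line are hypothetical, and the inventory is assumed to be a plain dict that json can encode (for example one built by a dict() method like the one above). Rejoining the lines with exactly one newline each keeps the line count stable:

```python
import json

def save_on_line(path, index, data):
    """Replace line `index` (0-based) of the file with `data` encoded as JSON."""
    with open(path) as f:
        lines = f.read().splitlines()       # lines without their newline characters
    lines[index] = json.dumps(data)         # compact JSON stays on a single line
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')    # exactly one newline per line

def load_from_line(path, index):
    """Decode the JSON stored on line `index` back into Python objects."""
    with open(path) as f:
        return json.loads(f.read().splitlines()[index])
```

The loaded value is still a dict, so turning it back into an object needs a from_dict-style constructor on the class.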
The simplest approach I can suggest is dataclasses.asdict, as mentioned, or else a serialization library that supports dataclasses. There are a lot of good ones out there, but for this purpose I might suggest dataclass-wizard. Further, if you want to transform an arbitrary JSON object to a dataclass structure, you can use the included CLI tool. When serializing, it will automatically apply a key transform (snake_case to camelCase), but this is easily customizable as well.
Disclaimer: I am the creator (and maintainer) of this library.
I've done the following CSV reader class:
class CSVread(object):
    filtered = []
    def __init__(self, file):
        self.file = file
    def get_file(self):
        try:
            with open(self.file, "r") as f:
                self.reader = [row for row in csv.reader(f, delimiter = ";")]
            return self.reader
        except IOError as err:
            # errno and strerror are attributes of the exception object
            print("I/O error({0}): {1}".format(err.errno, err.strerror))
            return
    def get_num_rows(self):
        print(sum(1 for row in self.reader))
Which can be used with the following example:
datacsv = CSVread("data.csv") # ; separated file
for row in datacsv.get_file(): # prints all the rows
    print(row)
datacsv.get_num_rows() # number of rows in data.csv
My goal is to filter out the content of the csv file (data.csv) by filtering column 12 by the keyword "00GG". I can get it to work outside the class like this:
with open("data.csv") as csvfile:
    reader = csv.reader(csvfile, delimiter = ";")
    filtered = []
    filtered = filter((lambda row: row[12] in ("00GG")), list(reader))
Code below returns an empty list (filtered) when it's defined inside the class:
def filter_data(csv_file):
    filtered = filter((lambda row: row[12] in ("00GGL")), self.reader)
    return filtered
Feedback for the existing code is also appreciated.
Could it be that in the first filter example you are searching for 00GG whereas in the second one you are searching for 00GGL?
Regardless, if you want to define filter_data() within the class, you should write it as a method of the class. That means that it takes a self parameter, not a csv_file:
def filter_data(self):
    filtered = filter((lambda row: row[12] in ("00GGL")), self.reader)
    return filtered
Making it more general:
def filter_data(self, column, values):
    return filter((lambda row: row[column] in values), self.reader)
Now you can call it like this:
datacsv.filter_data(12, ('00GGL',))
which should work if the input data does indeed contain rows with 00GGL in column 12.
Note that filter_data() should only be called after get_file(), otherwise there is no self.reader. Unless you have a good reason not to read in the data when the CSVread object is created (e.g. you are aiming for lazy evaluation), you should read it in then. Otherwise, set self.reader = [] in __init__, which will prevent failures in other methods.
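Putting those suggestions together, a Python 3 sketch could read the file in the constructor and return a list from the filter method. The names CsvTable and filter_rows are hypothetical. The two print lines show why the trailing comma in ('00GGL',) matters: without it the parentheses just wrap a string, and in performs a substring test:

```python
import csv

class CsvTable:
    """Hypothetical reader that loads all rows when the object is created."""

    def __init__(self, path, delimiter=';'):
        with open(path, newline='') as f:
            self.rows = list(csv.reader(f, delimiter=delimiter))

    def filter_rows(self, column, values):
        # Return a list rather than a lazy filter object, so it can be reused.
        return [row for row in self.rows if row[column] in values]

print('00GG' in ('00GGL'))    # True: plain string, so this is a substring test
print('00GG' in ('00GGL',))   # False: one-element tuple, so this is an exact match
```

Note that in Python 3 the built-in filter() returns a lazy iterator, which prints as a filter object and can only be consumed once.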
I'd like to yield results from a large array (coming from the database) to the browser (with Flask) using the method shared in their documentation :
@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            yield ','.join(row) + '\n'
    return Response(generate(), mimetype='text/csv')
With a twist: instead of generating the CSV myself (joining with ',' and adding a line break), I'd like to use the csv package.
Now, the only way I found to return only one written line is to do the following :
@app.route('/large.csv')
def generate_large_csv():
    def generate():
        for row in iter_all_rows():
            dest = io.StringIO()
            writer = csv.writer(dest)
            writer.writerow(row)
            yield dest.getvalue()
    return Response(generate(), mimetype='text/csv')
But creating a new io.StringIO and csv.writer for every row just does not seem right at all!
I took a look at the documentation of the package, but I wasn't able to find something that would only return one line.
You can do it easily with a custom file object. If you create an object with a write method that simply stores its input, and give it to a csv writer, you are done:
class keep_writer:
    def write(self, txt):
        self.txt = txt

@app.route('/large.csv')
def generate_large_csv():
    def generate():
        kw = keep_writer()
        wr = csv.writer(kw) # add optional configuration for the csv.writer
        for row in iter_all_rows():
            wr.writerow(row) # just write the row
            yield kw.txt # and yield the line build by the csv.writer
    return Response(generate(), mimetype='text/csv')
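A variant of the same trick, shown without Flask so it runs on its own: in Python 3, writerow() returns whatever the file object's write() returns, so a write() that simply returns its argument lets you yield the formatted line directly. The names Echo and iter_csv_lines are hypothetical:

```python
import csv

class Echo:
    """Minimal file-like object whose write() hands the text straight back."""

    def write(self, value):
        return value

def iter_csv_lines(rows):
    """Yield one formatted CSV line per input row, using a single writer."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() returns the return value of Echo.write(), i.e. the line.
        yield writer.writerow(row)

lines = list(iter_csv_lines([['a', 'b'], ['1', 'x,y']]))
print(lines)  # ['a,b\r\n', '1,"x,y"\r\n']
```

In the Flask view, the generator returned by iter_csv_lines(iter_all_rows()) could then be passed to Response in place of generate().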