Factory for different parsers - Python

Problem:
Write a Factory class that takes a path to a file and/or a file format
and returns the data from that file as a dict.
Write an abstract class Reader that has just one method, "reader",
without an implementation.
Write three classes, CSVReader, XMLReader and JSONReader. They
inherit from Reader and must implement the "reader" method to
parse csv, json and xml. Each must return the data in
dict format to the factory.
So I have the following problem: I don't understand how to write all these classes correctly. I wrote a solution, but it raises an error (code below).
My question is: how do I write all these classes correctly?
Also, please recommend a good book about OOP.
class FactoryRader:
    def __init__(self, fileName,frmt=None):
        self.frmt = frmt
        self.fileName = fileName
    def __str__(self):
        return Reader.openFile(self.fileName, self.frmt)

class Reader:
    def openFile(fileName, frmt):
        try:
            with open(fileName, 'rU') as data:
                if fileName.endswith('.csv') or frmt == 'csv':
                    return CSVReader.reader(data)
                if fileName.endswith('.xml') or frmt == 'xml':
                    return XMLReader()
                if fileName.endswith('.js') or (frmt == 'json' or frmt == 'js'):
                    return JSONReader()
                else:
                    return 'Incorrect File!'
        except IOError:
            print('Cant open')
    def reader(data):
        pass

class CSVReader(Reader):
    def reader(data):
        dialect = csv.Sniffer().sniff(data.readline(), [',',';'])
        data.seek(0)
        reader = csv.DictReader(data, dialect=dialect)
        for row in reader:
            print (row)

class JSONReader(Reader):
    def reader(data):
        pass

class XMLReader(Reader):
    def reader(data):
        pass

if __name__ == '__main__':
    data = FactoryRader('CsvExamples/price.csv')
    print(data)
Error:
Traceback (most recent call last):
  File "ClassParsers.py", line 62, in <module>
    print(data)
TypeError: __str__ returned non-string (type NoneType)

This is by no means a complete solution; it's just a bunch of remarks.
As I understand it, the try/except block in Reader should go in FactoryReader. Python does not enforce abstract classes unless you use the abc module, so your class Reader could be empty. Or, if you prefer, just
class Reader:
    def reader(self, data):
        pass
(If you are using Python 2, it's better to use new-style classes: class Reader(object).)
You are asked for FactoryReader to return a dict, not a string, so the __str__ function is not what you need. The error you get is telling you that the __str__ method in FactoryReader must return a string, but openFile returned None, for instance because CSVReader.reader only prints the rows and returns nothing. It would be better not to implement __str__ and to use another name for that function, say get_reader. Then you should return the data as a dict. So it would be
def get_reader(self):
    if self.fileName.endswith('.csv') or self.frmt == 'csv':
        return CSVReader().reader(self.fileName)
    # etc. for the other formats
Then,
class CSVReader(Reader):
    def reader(self, filename):
        # code to open filename, read it and parse it
        # code to convert the parsed data into a dict
        return data_dict
Similarly for JSONReader and XMLReader.
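To make these remarks concrete, here is a minimal Python 3 sketch of how the pieces could fit together. It is only one possible layout, not the required solution: the names ReaderFactory, READERS and read are made up for illustration, the abstract base uses the standard abc module, and the shape of the returned dicts (row index to row for CSV, child tag to text for XML) is an assumption, since the task does not specify it.

```python
import csv
import json
import os
import xml.etree.ElementTree as ET
from abc import ABC, abstractmethod


class Reader(ABC):
    """Abstract base: subclasses must implement reader()."""

    @abstractmethod
    def reader(self, path):
        """Parse the file at path and return its data as a dict."""


class CSVReader(Reader):
    def reader(self, path):
        with open(path, newline="") as f:
            # Assumed dict shape: row index -> {column: value}
            return {i: dict(row) for i, row in enumerate(csv.DictReader(f))}


class JSONReader(Reader):
    def reader(self, path):
        with open(path) as f:
            data = json.load(f)
        # Wrap non-object top-level values so the result is always a dict
        return data if isinstance(data, dict) else {"data": data}


class XMLReader(Reader):
    def reader(self, path):
        root = ET.parse(path).getroot()
        # Assumed flat layout: child tag -> text (later duplicates overwrite)
        return {child.tag: child.text for child in root}


class ReaderFactory:
    # Hypothetical registry mapping a format name to its reader class
    READERS = {"csv": CSVReader, "json": JSONReader, "xml": XMLReader}

    def __init__(self, file_name, frmt=None):
        self.file_name = file_name
        # An explicit format wins; otherwise fall back to the file extension
        ext = os.path.splitext(file_name)[1].lstrip(".")
        self.frmt = (frmt or ext).lower()

    def read(self):
        try:
            reader_cls = self.READERS[self.frmt]
        except KeyError:
            raise ValueError("Unsupported format: %r" % self.frmt)
        return reader_cls().reader(self.file_name)
```

With this layout, ReaderFactory('CsvExamples/price.csv').read() would return a dict, and adding a new format only means writing one more Reader subclass and registering it in READERS.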


Is it really impossible to unpickle a Python class if the original python file has been deleted?

Suppose you have the following in a file hey.py:
import pickle

class hey:
    def __init__(self):
        self.you = 1

ins = hey()
temp = open("cool_class", "wb")
pickle.dump(ins, temp)
temp.close()
Now suppose you delete the file hey.py and you run the following code:
pkl_file = open("cool_class", 'rb')
obj = pickle.load(pkl_file)
pkl_file.close()
You'll get an error. I understand that you probably can't unpickle the object if the file hey.py, with the class defined at its top level, no longer exists. But surely I can find out what the attributes of the serialized class are, reconstruct the deleted file, and then load the pickle. I have pickles that are 2 years old, I have deleted the file that I used to construct them, and I just need to find out what the attributes of those classes are so that I can reopen these pickles.
#####UPDATE
I know from the error messages which module originally contained the old class; let's just call it 'hey.py'. And I know the name of the class; let's call it 'you'. But even after recreating the module and building a class called 'you', I still can't get the pickle to open. So I wrote this code in the hey.py module:
class hey:
    def __init__(self):
        self.hey = 1
    def __setstate__(self):
        self.__dict__ = ''
        self.you = 1
But I get the error message: TypeError: init() takes 1 positional argument but 2 were given
#########UPDATE 2:
I changed the code from
class hey:
to
class hey():
I then got an AttributeError but it doesn't tell me what attribute is missing. I then performed
obj= pickletools.dis(file)
And got an error on the pickletools.py file here
def _genops(data, yield_end_pos=False):
    if isinstance(data, bytes_types):
        data = io.BytesIO(data)
    if hasattr(data, "tell"):
        getpos = data.tell
    else:
        getpos = lambda: None
    while True:
        pos = getpos()
        code = data.read(1)
        opcode = code2op.get(code.decode("latin-1"))
        if opcode is None:
            if code == b"":
                raise ValueError("pickle exhausted before seeing STOP")
            else:
                raise ValueError("at position %s, opcode %r unknown" % (
                                 "<unknown>" if pos is None else pos,
                                 code))
        if opcode.arg is None:
            arg = None
        else:
            arg = opcode.arg.reader(data)
        if yield_end_pos:
            yield opcode, arg, pos, getpos()
        else:
            yield opcode, arg, pos
        if code == b'.':
            assert opcode.name == 'STOP'
            break
At this line:
code = data.read(1)
saying: AttributeError: 'str' object has no attribute 'read'
I will now try the other methods in the pickletools
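The AttributeError above is what pickletools raises when it is handed a path string: as the source shows, it accepts bytes or an object with a read() method, so the file has to be opened in binary mode first. A minimal sketch of that, under the assumption that the pickle was written with protocol 3 or lower (where classes appear as GLOBAL opcodes); the helper names referenced_globals and listing are made up for illustration:

```python
import io
import pickletools


def referenced_globals(path):
    """Return the "module name" strings that GLOBAL opcodes refer to."""
    found = []
    # Open in binary mode: a plain path string has no .read() method
    with open(path, "rb") as f:
        for opcode, arg, pos in pickletools.genops(f):
            if opcode.name == "GLOBAL":
                found.append(arg)  # e.g. "hey hey" for class hey in hey.py
    return found


def listing(path):
    """Return the full disassembly as text instead of printing it."""
    out = io.StringIO()
    with open(path, "rb") as f:
        pickletools.dis(f, out=out)
    return out.getvalue()
```

Nothing is imported or executed while disassembling, so this works even when the original module is gone. Pickles written with protocol 4 or later use STACK_GLOBAL instead, where the module and class names are pushed as ordinary strings; for those, read the names (and the attribute names in the state dict) from the text returned by listing.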
########### UPDATE 3
I wanted to see what happened when I saved an object composed mostly of dictionaries, where some of the values in the dictionaries were classes. Here is the class in question:
class fss(frozenset):
    def __init__(self, *args, **kwargs):
        super(frozenset, self).__init__()
    def __str__(self):
        str1 = lbr + "{}" + rbr
        return str1.format(','.join(str(x) for x in self))
Now keep in mind that the object pickled is mostly a dictionary and that class exists within the dictionary. After performing
obj= pickletools.genops(file)
I get the following output:
[two images of the pickletools output, not preserved]
I don't see how I would be able to construct the class referred to with that data if I hadn't known what the class was.
############### UPDATE #4
@AKX
Thanks for helping me out. I can see how your code works, but I cannot open my pickled file, which was saved 2 years ago and whose module and class have long since been deleted, as a bytes-like object, which seems to be necessary.
So the path of the file is
file ='hey.pkl'
pkl_file = open(file, 'rb')
x = MagicUnpickler(io.BytesIO(pkl_file)).load()
This returns the error:
TypeError: a bytes-like object is required, not '_io.BufferedReader'
But I thought the object was a bytes object since I opened it with open(file, 'rb')
############ UPDATE #5
Actually, I think with AKX's help I've solved the problem.
So using the code:
pkl_file = open(name, 'rb')
x = MagicUnpickler(pkl_file).load()
I then created two blank modules which once contained the classes found in the saved pickle, but I did not have to put the classes in them. I was getting an error in the file pickle.py here:
def load_reduce(self):
    stack = self.stack
    args = stack.pop()
    func = stack[-1]
    try:
        stack[-1] = func(*args)
    except TypeError:
        pass
dispatch[REDUCE[0]] = load_reduce
So after excepting that error, everything worked. I really want to thank AKX for helping me out. I have actually been trying to solve this problem for about 5 years because I use pickles far more often than most programmers. I used to not understand that if you alter a class then that ruins any pickled files saved with that class so I ran into this problem again and again. But now that I'm going back over some code which is 2 years old and it looks like some of the files were deleted, I'm going to need this code a lot in the future. So I really appreciate your help in getting this problem solved.
Well, with a bit of hacking and magic, sure, you can hydrate missing classes, but I'm not guaranteeing this will work for all pickle data you may encounter; for one, this doesn't touch the __setstate__/__reduce__ protocols, so I don't know if they work.
Given a script file (so72863050.py in my case):
import io
import pickle
import types
from logging import Formatter
# Create a couple empty classes. Could've just used `class C1`,
# but we're coming back to this syntax later.
C1 = type('C1', (), {})
C2 = type('C2', (), {})
# Create an instance or two, add some data...
inst = C1()
inst.child1 = C2()
inst.child1.magic = 42
inst.child2 = C2()
inst.child2.mystery = 'spooky'
inst.child2.log_formatter = Formatter('heyyyy %(message)s') # To prove we can unpickle regular classes still
inst.other_data = 'hello'
inst.some_dict = {'a': 1, 'b': 2}
# Pickle the data!
pickle_bytes = pickle.dumps(inst)
# Let's erase our memory of these two classes:
del C1
del C2
try:
    print(pickle.loads(pickle_bytes))
except Exception as exc:
    pass # Can't get attribute 'C1' on <module '__main__'> – yep, it certainly isn't there!
We have now successfully created some pickle data that we can't load anymore, since we forgot about those two classes. Since the unpickling mechanism is customizable, we can derive a magic unpickler that, in the face of certain defeat (or at least an AttributeError), synthesizes a simple class from thin air:
# Could derive from Unpickler, but that may be a C class, so our tracebacks would be less helpful
class MagicUnpickler(pickle._Unpickler):
    def __init__(self, fp):
        super().__init__(fp)
        self._magic_classes = {}
    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except AttributeError:
            return self._create_magic_class(module, name)
    def _create_magic_class(self, module, name):
        cache_key = (module, name)
        if cache_key not in self._magic_classes:
            cls = type(f'<<Emulated Class {module}:{name}>>', (types.SimpleNamespace,), {})
            self._magic_classes[cache_key] = cls
        return self._magic_classes[cache_key]
Now, when we run that magic unpickler against a stream from the aforebuilt pickle_bytes that plain ol' pickle.loads() couldn't load...
x = MagicUnpickler(io.BytesIO(pickle_bytes)).load()
print(x)
print(x.child1.magic)
print(x.child2.mystery)
print(x.child2.log_formatter._style._fmt)
prints out
<<Emulated Class __main__:C1>>(child1=<<Emulated Class __main__:C2>>(magic=42), child2=<<Emulated Class __main__:C2>>(mystery='spooky'), other_data='hello', some_dict={'a': 1, 'b': 2})
42
spooky
heyyyy %(message)s
Hey, magic!
The error in the function load_reduce(self) can be reproduced with:
class Y(set):
    pass
pickle_bytes = io.BytesIO(pickle.dumps(Y([2, 3, 4, 5])))
del Y
print(MagicUnpickler(pickle_bytes).load())
AKX's answer does not handle cases where the class inherits from built-in base classes such as set, dict or list.
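One possible way around that limitation, instead of swallowing the TypeError inside pickle.py, is to give the stand-in class a constructor that accepts whatever arguments the pickle replays, plus the append/extend/__setitem__ hooks that pickle uses to refill list and dict subclasses. The sketch below is an untested-in-the-wild assumption, not part of AKX's answer: Emulated, TolerantUnpickler and the attribute names _ctor_args, _items and _mapping are all invented for illustration, and it does not cover classes with custom __setstate__ or __reduce__ logic.

```python
import io
import pickle
import sys
import types


class Emulated:
    """Hypothetical stand-in for a class whose source file is gone."""

    def __init__(self, *args, **kwargs):
        # REDUCE calls cls(*args); remember what the real constructor received
        self._ctor_args = args

    def append(self, item):
        # Items of list subclasses are replayed through append()/extend()
        self.__dict__.setdefault("_items", []).append(item)

    def extend(self, items):
        self.__dict__.setdefault("_items", []).extend(items)

    def __setitem__(self, key, value):
        # Entries of dict subclasses are replayed through obj[key] = value
        self.__dict__.setdefault("_mapping", {})[key] = value

    def __repr__(self):
        return "<%s %r>" % (type(self).__name__, self.__dict__)


class TolerantUnpickler(pickle.Unpickler):
    def __init__(self, fp):
        super().__init__(fp)
        self._emulated = {}

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except (ImportError, AttributeError):
            # Missing module or missing class: synthesize one per (module, name)
            key = (module, name)
            if key not in self._emulated:
                self._emulated[key] = type("Emulated_" + name, (Emulated,), {})
            return self._emulated[key]


# Demo: pickle a set subclass living in a throwaway module, then forget the module
lost = types.ModuleType("lost_module")  # hypothetical module name
lost.Y = type("Y", (set,), {"__module__": "lost_module"})
sys.modules["lost_module"] = lost
payload = pickle.dumps(lost.Y([2, 3, 4, 5]))
del sys.modules["lost_module"]

restored = TolerantUnpickler(io.BytesIO(payload)).load()
print(sorted(restored._ctor_args[0]))
```

The restored object is not a real set; it only preserves the data so that the original class can be rewritten and the values migrated by hand.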

exposing the inner methods of a class, and using them

Let's say I have a class like so:
class Shell:
    def cat(self, file):
        try:
            with open(file, 'r') as f:
                print f.read()
        except IOError:
            raise IOError('invalid file location: {}'.format(file))
    def echo(self, message):
        print message
    def ls(self, path):
        print os.listdir(path)
In a JavaScript context, you might be able to do something like "Class"[method_name](), depending on how things were structured. I am looking for something similar in Python to make this a "simulated operating system". E.g.:
import os
def runShell(user_name):
    user_input = None
    shell = Shell()
    while user_input not in ('exit', 'quit'):
        user_input = raw_input('$'+ user_name + ': ')
        ...
Now, the idea is that they can type in something like this...
$crow: cat ../my_text
... and behind the scenes, we get this:
shell.cat('../my_text')
Similarly, I would like to be able to print all method definitions that exist within that class when they type help. EG:
$crow: help\n
> cat (file)
> echo (message)
> ls (path)
Is such a thing achievable in Python?
You can use the built-in function vars to expose all the members of an object. That's maybe the simplest way to list those for your users. If you're only planning to print to stdout, you could also just call help(shell), which will print your class members along with docstrings and so on. help is really only intended for the interactive interpreter, though, so you'd likely be better off writing your own help-outputter using vars and the __doc__ attribute that's magically added to objects with docstrings. For example:
class Shell(object):
    def m(self):
        '''Docstring of C#m.'''
        return 1
    def t(self, a):
        '''Docstring of C#t'''
        return 2

for name, obj in dict(vars(Shell)).items():
    if not name.startswith('__'): #filter builtins
        print(name, '::', obj.__doc__)
To pick out and execute a particular method of your object, you can use getattr, which grabs an attribute (if it exists) from an object, by name. For example, to select and run a simple function with no arguments:
fname = raw_input()
if hasattr(shell, fname):
    func = getattr(shell, fname)
    result = func()
else:
    print('That function is not defined.')
Of course, you could first tokenize the user input to pass arguments to your function as needed, as in your cat example:
user_input = raw_input().split() # tokenize
fname, args = user_input[0], user_input[1:] # on Py3 you could write: fname, *args = user_input
if hasattr(shell, fname):
    func = getattr(shell, fname)
    result = func(*args) # the *args call syntax is available back to at least 2.6
else:
    print('That function is not defined.')
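Putting the two pieces together, a Python 3 sketch of the help listing and the dispatch step might look like the following. It is only an illustration under a few assumptions: the helper names public_methods, help_lines and dispatch are made up, the Shell methods return values instead of printing so they are easy to check, and shlex.split is used so that quoted arguments stay in one piece.

```python
import inspect
import os
import shlex


class Shell:
    def echo(self, message):
        """Return the message unchanged."""
        return message

    def ls(self, path="."):
        """List the entries of a directory."""
        return sorted(os.listdir(path))


def public_methods(obj):
    # Bound methods whose names do not start with an underscore
    return {name: member
            for name, member in inspect.getmembers(obj, inspect.ismethod)
            if not name.startswith("_")}


def help_lines(obj):
    # inspect.signature on a bound method already leaves out self
    return ["%s %s" % (name, inspect.signature(method))
            for name, method in sorted(public_methods(obj).items())]


def dispatch(obj, line):
    parts = shlex.split(line)
    if not parts:
        return None
    name, args = parts[0], parts[1:]
    method = public_methods(obj).get(name)
    if method is None:
        return "unknown command: %s" % name
    try:
        return method(*args)
    except TypeError as exc:
        # Wrong number of arguments (this also catches TypeErrors raised
        # inside the method itself)
        return "usage error: %s" % exc
```

Looking commands up through public_methods rather than a bare getattr means the user cannot reach private attributes such as __class__ by typing their names.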

Error CSV: coercing to Unicode: need string or buffer, S3BotoStorageFile found

I'm getting the following error when trying to read the row and column count of a CSV:
> coercing to Unicode: need string or buffer, S3BotoStorageFile found
import csv
class CSV:
    def __init__(self, file=None):
        self.file = file
    def read_file(self):
        data = []
        file_read = read_file(self.file)
        return file_read
    def get_row_count(self):
        return len(self.read_file())
    def get_column_count(self):
        new_data = self.read_file()
        return len(new_data[0])
    def get_data(self, rows=1):
        data = self.read_file()
        return data[:rows]
    def read_file(self):
        with open(self.file, 'r') as f:
            data = [row for row in csv.reader(f.read().splitlines())]
        return data
How do I resolve this?
Well, after reading your code my first reaction was OMG! How many times does he open that poor file?
Here's a new version of your class:
class CSV:
    def __init__(self, file=None):
        self.file = file
        with open(self.file, 'r') as f:
            self.data = [row for row in csv.reader(f)]
    def get_row_count(self):
        return len(self.data)
    def get_column_count(self):
        return len(self.data[0])
    def get_data(self, rows=1):
        return self.data[:rows]
I also fixed your csv.reader() handling. It accepts a file object, no need to .read() or .read().splitlines(), it can only lead to errors. Which may be the reason why it failed.
OK, from what you say, you're working on AWS, and your file is not a string path but already a file object, so you don't need the open() part as is. You may want to modify your code as follows:
import csv
from django.core.files.base import File

class CSV:
    def __init__(self, f=None):
        self.file = f
        if isinstance(self.file, str): # if the file is a string, it's a path that has to be opened
            with open(self.file, 'r') as f:
                self.data = [row for row in csv.reader(f)]
        elif isinstance(self.file, (File, file)): # if that's a file object, no need to open
            self.data = [row for row in csv.reader(self.file)]
        else: # otherwise, I don't know what to do, so aaaaaaaargh!
            raise Exception("File object type unknown: %s %s" % (type(self.file), self.file,))
    def get_row_count(self):
        return len(self.data)
    def get_column_count(self):
        return len(self.data[0])
    def get_data(self, rows=1):
        return self.data[:rows]
Reading S3BotoStorage.py, the S3BotoStorageFile class inherits from django.core.files.base.File, which inherits from django.core.files.utils.FileProxyMixin, which exposes the attributes of Python's built-in file class.
So a File object is not an instance of file, but it has a compatible interface. Therefore, in the previous code I test whether self.file is a str; if so, it is a path that we open() to get a file() and parse. Otherwise, if self.file is a File object or a file() object, we just need to parse it. If it's neither of those, it's an error and we raise an exception.
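For readers on Python 3, where the built-in file type no longer exists, the same idea can be expressed by checking for a read() method instead of a concrete class. The function below is a rough sketch under that assumption; load_rows and its encoding parameter are illustrative names, and it has not been tried against S3BotoStorageFile itself.

```python
import csv
import io


def load_rows(source, encoding="utf-8"):
    """Return all CSV rows from a path or an already-open file-like object."""
    if isinstance(source, str):
        # A string is treated as a path that we open (and close) ourselves
        with open(source, newline="", encoding=encoding) as f:
            return list(csv.reader(f))
    if hasattr(source, "read"):
        # Anything file-like: read it, decoding if it hands back bytes
        data = source.read()
        if isinstance(data, bytes):
            data = data.decode(encoding)
        return list(csv.reader(io.StringIO(data, newline="")))
    raise TypeError("Unsupported source type: %r" % type(source))
```

Row and column counts then follow from the returned list, for example len(rows) and len(rows[0]).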

conditionally close file on exit from function

I have a (recursive) function which I would like to accept either a string or an opened file object. If the argument is a string, then the function opens a file and uses that file object. It seems best to close this opened file object explicitly when I return from the function, but only if a string was passed in. (Imagine the surprise from the user when they pass in an opened file object and find that their file object was closed somewhere). Here's what I'm currently using:
def read_file(f, param):
    do_close = isinstance(f,basestring)
    f = open(f, 'rb') if do_close else f
    try:
        info = f.read(4)
        #check info here
        if info == Info_I_Want(param):
            return f.read(get_data(info))
        else:
            f.seek(goto_new_position(info))
            return read_file(f,param)
    except IKnowThisError:
        return None
    finally:
        if do_close:
            f.close()
You can assume that IKnowThisError will be raised at some point if I don't find the info I want.
This feels very kludgy. Is there a better way?
Why not put a wrapper around your recursive function, so the type check runs only once?
def read_file(f, param):
    if isinstance(f, basestring):
        with open(f, 'rb') as real_f:
            return read_file2(real_f, param)
    else:
        return read_file2(f, param)

def read_file2(f, param):
    # Now f should be a file object
    ...
How about calling your function recursively?
def read_file(f, param):
    if isinstance(f, basestring):
        with open(f, 'rb') as real_f:
            return read_file(real_f, param)
    else:
        # normal path
        ...
Python 3.3 and later offer a more general solution for this kind of problem, namely contextlib.ExitStack. It allows you to conditionally add context managers to the current with-block:
from contextlib import ExitStack

def read_file(f, param):
    with ExitStack() as stack:
        if isinstance(f, str): # str replaces basestring on Python 3
            f = stack.enter_context(open(f, 'rb'))
        # Your code here
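As a small runnable illustration of the ExitStack approach, the sketch below reads a fixed-size header from either a path or an open binary file. The function read_header and its size parameter are invented for the example; the point is only that a file opened inside the function is closed on exit, while a file object passed in by the caller is left open.

```python
import io
from contextlib import ExitStack


def read_header(f, size=4):
    """Read the first `size` bytes from a path or an open binary file."""
    with ExitStack() as stack:
        if isinstance(f, str):
            # Only files opened here are registered, so only they get closed
            f = stack.enter_context(open(f, "rb"))
        return f.read(size)


buf = io.BytesIO(b"abcdefgh")
header = read_header(buf)
still_open = not buf.closed  # the caller's object was not closed
```

Because nothing is registered on the stack when a file object is passed in, leaving the with-block is a no-op in that case.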

Using one class from another class

I wrote a simple program that reads through a log, finds the lowest beginning number (the head), and prints it. I am now editing that program and combining it with a class I wrote to parse an actual logfile. Essentially, instead of sorting on the simple number from the log as my previous program did, I now need to use the parsed information from one class in another class. I was wondering what the most convenient way to do this is. I am a beginner Python programmer and don't know if I can explicitly reference the class.
Here are the classes.
Parser
class LogLine:
    SEVERITIES = ['EMERG','ALERT','CRIT','ERR','WARNING','NOTICE','INFO','DEBUG']
    severity = 1
    def __init__(self, line):
        try:
            m = re.match(r"^(\d{4}-\d{2}-\d{2}\s*\d{2}:\d{2}:\d{2}),?(\d{3}),?(\s+\[(?:[^\]]+)\])+\s+[A-Z]+\s+(\s?[a-zA-Z0-9\.])+\s?(\((?:\s?\w)+\))\s?(\s?.)+", line)
            timestr, msstr, sevstr, self.filename, linestr, self.message = m.groups()
            self.line = int(linestr)
            self.sev = self.SEVERITIES.index(sevstr)
            self.time = float(calendar.timegm(time.strptime(timestr, "%Y-%m-%d %H:%M:%S,%f"))) + float(msstr)/1000.0
            dt = datetime.strptime(t, "%Y-%m-%d %H:%M:%S,%f")
        except Exception:
            print 'error',self.filename
    def get_time(self):
        return self.time
    def get_severity(self):
        return self.sev
    def get_message(self):
        return self.message
    def get_filename(self):
        return self.filename
    def get_line(self):
        return self.line
Sorter
class LogFile:
    def __init__(self,filepath):
        self.logfile = open(filepath, "r")
        self.head = None
    def __str__(self):
        return "x=" + str(self.x) + "y="+str(self.y)
    def readline(self):
        if self.head != None:
            h = self.head
            self.head = None
            return h
        else:
            return self.logfile.readline().rstrip(' ')
    def get_line(self):
        if self.head == None:
            self.head = self.readline().rstrip(' ')
            return self.head.get.line()
        else:
            return self.head.get.line()
    def close (self):
        self.logfile.close()
I have begun to edit my second class by adding the get_line function. Don't know if I'm on the right track.
In simpler terms, I need the head to become "LogLine"
It is okay to use one class from another class. You have one class that parses a single line from a log file and builds an object that represents the line; and you have another class that reads lines from a log file. It would be very natural for the second class to call the first class.
Here is a very simple class that reads all lines from a log file and builds a list:
class LogFile(object):
    def __init__(self,filepath):
        with open(filepath, "r") as f:
            self.lst = [LogLine(line) for line in f]
You can see that self.lst is being set to a list of lines from the input log file, but not just the text of the line; the code is calling LogLine(line) to store instances of LogLine. If you want, you can sort the list after you build it:
self.lst.sort(key=LogLine.get_line)
If the log files are very large, it might not be practical to build the list. You have a .get_line() method function, and we can use that:
class LogFile(object):
    def __init__(self,filepath):
        self.logfile = open(filepath, "r")
    def get_line(self):
        try:
            line = next(self.logfile) # get next line from open file object
            return LogLine(line)
        except StopIteration: # next() raises this when you reach the end of the file
            return None # no more lines
    def close(self):
        self.logfile.close()
An open file object (returned by the open() function) can be iterated. We can call next() on this object and it will give us the next input line. When the end of file is reached, Python will raise StopIteration to signal the end of the file.
Here the code will catch the StopIteration exception and return None when the end of the log file is reached. But I think this isn't the best way to handle this problem. Let's make the LogFile class work in for loops and such:
class LogFile(object):
    def __init__(self,filepath):
        self.f = open(filepath)
    def __iter__(self): # a for loop calls this first; the object is its own iterator
        return self
    def __next__(self): # Python 3.x needs this to be named "__next__"
        try:
            line = next(self.f)
            return LogLine(line)
        except StopIteration:
            # when we reach the end of input, close the file object
            self.f.close()
            # re-raise the exception
            raise
    next = __next__ # Python 2.x needs this to be named "next"
A for loop in Python first calls .__iter__() to get an iterator, then repeatedly calls the .__next__() method function (Python 3.x) or else the .next() method function (Python 2.x) until the StopIteration exception is raised. Here we have defined both method function names so this code should work in Python 2.x or in Python 3.x.
Now you can do this:
for ll in LogFile("some_log_file"):
    ... # do something with ll, which will always be a LogLine instance
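A shorter variant of the same idea is to make __iter__ a generator, which lets a with-block handle closing the file. This is a sketch rather than a drop-in replacement: ParsedLine is a hypothetical stand-in for LogLine so the example runs on its own, and the line_class parameter is an assumption added to show where the real parser class would be plugged in.

```python
class ParsedLine:
    """Hypothetical stand-in for LogLine: just keeps the stripped text."""

    def __init__(self, line):
        self.text = line.rstrip("\n")


class LogFile:
    def __init__(self, filepath, line_class=ParsedLine):
        self.filepath = filepath
        self.line_class = line_class

    def __iter__(self):
        # Generator method: the file is opened for each loop and closed
        # when the loop finishes or the generator is discarded
        with open(self.filepath) as f:
            for line in f:
                yield self.line_class(line)
```

Since the file is opened inside __iter__, the same LogFile object can be looped over more than once, and sorted(LogFile(path), key=...) works without building the list by hand.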
