Best practice for a class used as a collection of functions with shared data - Python

I have code that takes some dirty data as input, then parses it, cleans it, munges it, etc., and finally returns a value.
At the moment it is structured as a class whose __init__ method receives the input and calls the other methods in a given sequence.
My class at the moment looks something like this:
class myProcedure:
    def __init__(self, dirty_data, file_name):
        self.variable1, self.variable2 = self.clean_data(dirty_data)
        self.variable3 = self.get_data_from_file(file_name)
        self.do_something()

    def clean_data(self, dirty_data):
        # clean the data
        return variable1, variable2

    def get_data_from_file(self, file_name):
        # load some data
        return loaded_data

    def do_something(self):
        # the interesting part goes here
        self.result = the_result
Using a class instead of scattered functions makes it easier to share data. In my real code a few dozen variables get shared. The alternatives would be to put them all in a dict, or to have each function take 10-20 inputs; I find both of these solutions a bit cumbersome.
At the moment I must call it as:
useless_class_obj = myProcedure(dirty_data, file_name)
interesting_stuff = useless_class_obj.result
My concern comes from the fact that, once run, useless_class_obj serves no purpose anymore and is just a useless piece of junk.
I think it would be more elegant to be able to use the class as:
interesting_stuff = myProcedure(dirty_data, file_name)
however, this would require __init__ to return something other than self.
Is there a better way to do this?
Am I doing this in a bad or hard-to-read way?

Well... you could also do...
interesting_stuff = myProcedure(dirty_data, file_name).result
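If you'd rather not expose the throwaway instance at all, another option is to hide it behind a plain function. A minimal sketch using the question's names (my_procedure is a hypothetical wrapper):

def my_procedure(dirty_data, file_name):
    # The instance is created, used, and discarded inside the call.
    return myProcedure(dirty_data, file_name).result

interesting_stuff = my_procedure(dirty_data, file_name)

Callers then never see the class, and the class stays free to share as many attributes internally as it likes.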

Related

Elegant way of doing post-processing using Python

Consider the following example of post-processing using inheritance in Python (from this website):
class FileCat(object):
    def cat(self, filepath):
        f = open(filepath)
        lines = f.readlines()
        f.close()
        return lines

class FileCatNoEmpty(FileCat):
    def cat(self, filepath):
        lines = super(FileCatNoEmpty, self).cat(filepath)
        nonempty_lines = [l for l in lines if l != '\n']
        return nonempty_lines
Basically, when we are post-processing, we don't really care about the original invocation, we just want to work with the data returned by the function.
So ideally, in my opinion, there should be no need to redeclare the original function signature just to forward the call to the original function.
If the FileCat class had 100 different functions (cat1, cat2, cat3, ...) that returned the same type of data, and we wanted a post-processed NoEmpty version, we would have to define the same 100 function signatures in FileCatNoEmpty just to forward the calls.
So the question is: Is there a more elegant way of solving this problem?
That is, something like the FileCatNoEmpty class that would automatically make available all methods from FileCat but that still allows us to process the returned value?
Something like
class FileCatNoEmpty(FileCat):
    # Any method, with whatever arguments
    def f(self, args):
        lines = super(FileCatNoEmpty, self).f(args)
        nonempty_lines = [l for l in lines if l != '\n']
        return nonempty_lines
Or maybe even another solution that does not use inheritance.
Thanks!
This answer, using a wrapper class that receives the original one in the constructor (instead of inheriting from it), solves the problem:
https://stackoverflow.com/a/4723921/3444175
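The gist of that answer, sketched here under the assumption that every forwarded method returns a list of lines (NoEmpty is a hypothetical name):

class NoEmpty(object):
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # Look the attribute up on the wrapped object; if it is callable,
        # return a wrapper that post-processes its return value.
        attr = getattr(self._wrapped, name)
        if not callable(attr):
            return attr
        def postprocessed(*args, **kwargs):
            lines = attr(*args, **kwargs)
            return [l for l in lines if l != '\n']
        return postprocessed

cat = NoEmpty(FileCat())
lines = cat.cat('somefile.txt')  # already filtered, for any method of FileCat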

Call a function from a class without declaring an object

We have a tree; each node is an object.
The tree has three functions: add(x), getmin(), and getmax().
The tree works perfectly; for example, if I write
a = Heap()
a.add(5)
a.add(15)
a.add(20)
a.getmin()
a.getmax()
the stack looks like [5, 15, 20]; now if I call getmin() it will print min element = 5 and the stack will look like [15, 20], and so on.
Now comes the problem:
the professor asked us to submit two files that are already created: main.py and minmaxqueue.py.
main.py starts with from minmaxqueue import add, getmin, getmax, and then it already contains a list of function calls of the kind
add(5)
add(15)
add(20)
getmin()
getmax()
In order to make my script work I had to do a = Heap() and then always call a.add(x). Since the TAs are going to run the script from a common file, I can't modify main.py so that it creates an object with a = Heap(). It should run directly with add(5), not with a.add(5).
Is there a way to fix this?
You can modify your module to create a global Heap instance, and define functions that forward everything to that global instance. Like this:
class Heap(object):
    ...  # all of your existing code

_heap = Heap()

def add(n):
    return _heap.add(n)

def getmin():
    return _heap.getmin()

def getmax():
    return _heap.getmax()
Or, slightly more briefly:
_heap = Heap()
add = _heap.add
getmin = _heap.getmin
getmax = _heap.getmax
If you look at the standard library, there are modules that do exactly this, like random. If you want to create multiple Random instances, you can; if you don't care about doing that, you can just call random.choice and it works on the hidden global instance.
Of course for Random it makes sense; for Heap, it's a lot more questionable. But if that's what the professor demands, what can you do?
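For comparison, this is the two-level API the random module exposes; the module-level functions are bound methods of a hidden Random instance:

import random

random.choice([1, 2, 3])   # uses the hidden module-level Random instance
rng = random.Random(42)    # or create and manage your own instance
rng.choice([1, 2, 3])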
You can use this function to do that more quickly:
def make_attrs_global(obj):
    # globals() is the namespace of the module where this function is
    # defined, so this must live in minmaxqueue.py itself.
    for attr in dir(obj):
        if not attr.startswith('__'):
            globals()[attr] = getattr(obj, attr)
It binds every attribute of obj (except dunders) in the module's global scope.
Just put this code at the end of your minmaxqueue.py file:
a = Heap()
make_attrs_global(a)
Now you should be able to call add directly without a. This is ugly but well...

How to make these functions generic

I want to shorten my code, since I have more functions like this. I was wondering if I could use getattr() to do something like this guy asked.
Here's what I've got:
def getAllMarkersFrom(db, asJSON=False):
    '''Gets all markers from given database. Returns list or JSON string'''
    markers = []
    for marker in db.markers.find():
        markers.append(marker)
    if not asJSON:
        return markers
    else:
        return json.dumps(markers, default=json_util.default)

def getAllUsersFrom(db, asJSON=False):
    '''Gets all users from given database. Returns list or JSON string'''
    users = []
    for user in db.users.find():
        users.append(user)
    if not asJSON:
        return users
    else:
        return json.dumps(users, default=json_util.default)
I'm using pymongo and Flask's JSON helpers.
What I want is a single getAllFrom(x, db) function that accepts any type of object. I don't know how to do this, but I want to call db.X.find() where X is passed through the function.
Well, there it is. Hope you can help me. Thank you!
There's hardly any real code in either of those functions. Half of each is a slow recreation of the list() constructor. Once you get rid of that, you're left with a conditional, which can easily be condensed to a single line. So:
def getAllUsersFrom(db, asJSON=False):
    users = list(db.users.find())
    return json.dumps(users, default=json_util.default) if asJSON else users
This seems simple enough to me to not bother refactoring. There are some commonalities between the two functions, but breaking them out wouldn't reduce the number of lines of code any further.
One direction for possible simplification, however, is to not pass in a flag to tell the function what format to return. Let the caller do that. If they want it as a list, there's list(). For JSON, you can provide your own helper function. So, just write your functions to return the desired iterator:
def getAllUsersFrom(db):
    return db.users.find()

def getAllMarkersFrom(db):
    return db.markers.find()
And the helper function to convert the result to JSON:
def to_json(cur):
    return json.dumps(list(cur), default=json_util.default)
So then, putting it all together, you just call:
markers = list(getAllMarkersFrom(mydb))
or:
users = to_json(getAllUsersFrom(mydb))
As you need.
If you really want a generic function for requesting various types of records, that'd be:
def getAllRecordsFrom(db, kind):
    return getattr(db, kind).find()
Then call it:
users = list(getAllRecordsFrom(mydb, "users"))
etc.
I would say that it's better to have separate functions for each task. You can then use decorators for the functionality shared between different functions. For example:
@to_json
def getAllUsersFrom(db):
    return list(db.users.find())
enjoy!
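For completeness, a minimal sketch of the to_json decorator this answer assumes (using json and pymongo's bson.json_util, as in the question):

import json
from functools import wraps
from bson import json_util

def to_json(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        # Serialize whatever the wrapped function returns.
        return json.dumps(func(*args, **kwargs), default=json_util.default)
    return wrapper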

How can I get Python jsonpickle to work recursively?

I'm having trouble getting Python's jsonpickle 0.4.0 to "recurse" into custom objects that contain custom objects. Here's sample code that shows my problem.
import jsonpickle
import jsonpickle.handlers

class Ball(object):
    def __init__(self, color):
        self.color = color

class Box(object):
    def __init__(self, *args):
        self.contents = args

class BallHandler(jsonpickle.handlers.BaseHandler):
    def flatten(self, obj, data):
        data['color'] = obj.color
        return data

class BoxHandler(jsonpickle.handlers.BaseHandler):
    def flatten(self, obj, data):
        data['contents'] = obj.contents
        return data

jsonpickle.handlers.registry.register(Ball, BallHandler)
jsonpickle.handlers.registry.register(Box, BoxHandler)

# works OK -- correctly prints: {"color": "white"}
white_ball = Ball('white')
print(jsonpickle.encode(white_ball, unpicklable=False))

# works OK -- correctly prints: [{"color": "white"}, {"color": "green"}]
green_ball = Ball('green')
balls = [white_ball, green_ball]
print(jsonpickle.encode(balls, unpicklable=False))

# works OK -- correctly prints: {"contents": [1, 2, 3, 4]}
box_1 = Box(1, 2, 3, 4)
print(jsonpickle.encode(box_1, unpicklable=False))

# dies with "Ball object is not JSON serializable"
box_2 = Box(white_ball, green_ball)
print(jsonpickle.encode(box_2, unpicklable=False))
Balls have "color", Boxes have "contents". If I have a [native] array of Balls, then jsonpickle works. If I have a Box of [native] ints, then jsonpickle works.
But if I have a Box of Balls, jsonpickle bombs with "Ball object is not JSON serializable".
From the stacktrace, I have the hunch that the encoder is leaving jsonpickle and going off to some other JSON library... that apparently doesn't know that I've registered the BallHandler.
How can I fix this up?
By the way, my sample is NOT expressly using any part of Django, but I will be needing this to work in a Django app.
THANKS IN ADVANCE FOR ANY INPUT!
I think you can call back to the pickling context to continue the pickling.
class BoxHandler(jsonpickle.handlers.BaseHandler):
    def flatten(self, obj, data):
        return [self.context.flatten(x, reset=False) for x in obj.contents]
This seems to be similar to how the built-in _list_recurse() function handles this case in pickler.py:44, as flatten() just calls self._flatten (after optionally resetting the state variables).
def _list_recurse(self, obj):
    return [self._flatten(v) for v in obj]
I'm testing this now, and _depth seems to be maintained as expected.
First, why are you creating custom handlers in the first place? You're attempting to do exactly the same thing the default handlers already do. Remove those two register lines and call encode with and without unpicklable=False with all of those objects, and you will get the same results—except that it will work exactly the way you want with boxes full of balls, instead of failing.
If you look through the tutorial, API, test cases, and samples, they never create a custom handler to simulate a collection like this. (For example, take a look at the Node/Document/Section classes in the test suites (samples.py and document_test.py).) So, I think you're trying to do something that you weren't expected to do, and isn't intended to be doable.
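To see that, take the question's classes, skip the two register calls, and encode a box of balls directly; a sketch (the exact output format is not guaranteed across versions):

box = Box(Ball('white'), Ball('green'))
print(jsonpickle.encode(box, unpicklable=False))
# expected, roughly: {"contents": [{"color": "white"}, {"color": "green"}]}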
But, let's look at your actual question: Why doesn't it work?
Well, that one's easy. You're doing it wrong. According to the documentation for BaseHandler.flatten, you're supposed to:
Flatten obj into a json-friendly form.
So, given this:
class BoxHandler(jsonpickle.handlers.BaseHandler):
    def flatten(self, obj, data):
        data['contents'] = obj.contents
        return data
You're effectively promising that obj.contents is in JSON-friendly form. But it's not; it's a list of Ball objects.
So, what's the right answer? Well, you could flatten each element in contents the same way you're being flattened. You'd think there must be some easy way to do that, but honestly, I don't see anything in the API, docs, samples, or unit tests, so I guess there isn't, so you'll have to do it manually. Presumably something like this (untested):
class BoxHandler(jsonpickle.handlers.BaseHandler):
    def flatten(self, obj, data):
        p = jsonpickle.Pickler()
        data['contents'] = [p.flatten(elem) for elem in obj.contents]
        return data
But… since you're not getting the same Pickler that's being used to pickle you—and I don't see any way that you can—this is likely going to violate the maxdepth and unpicklable parameters of encode.
So, maybe there is no right way to do this.
Looks like a bug to me, and a fundamental one at that. If jsonpickle is about adding custom-object handling to json, it should integrate into the latter rather than attempting to "preprocess" the content for it. The present state of demanding that users handle this themselves in whatever way, as abarnert described, is laying the blame at someone else's door, IMO.
If I were you, I'd either fix this myself or make my objects JSON-friendly as they are, e.g. by making them look like native Python data structures (of which JSON is an alternative representation). The easier way is to avoid such constructs altogether, which is a kludge, of course.

Overwriting class methods without inheritance (Python)

First, if you guys think the way I'm trying to do things is not Pythonic, feel free to offer alternative suggestions.
I have an object whose functionality needs to change based on outside events. What I originally did was create a new class that inherits from the original (let's call it OrigObject()) and overrides the methods that change (let's call the new class NewObject()). Then I modified both constructors so that each can take in a complete object of the other type and fill in its own values from the passed-in object. Then, whenever I need to change functionality, I just execute myObject = NewObject(myObject).
I'm starting to see several problems with that approach. First of all, other places that reference the object need to be updated to reference the new type as well (the above statement, for example, only updates the local myObject variable). That's not hard to fix, but it is annoying to remember to update the other references each time I swap the object, to prevent weird program behavior.
Second, I'm running into scenarios where I need a single method from NewObject() but the other methods from OrigObject(), and I need to be able to switch the functionality on the fly. Inheritance no longer seems like the best solution, since I'd need M*N different classes (where M is the number of methods that can change and N is the number of variations of each method) inheriting from OrigObject().
I was thinking of using attribute remapping instead, but I seem to be running into issues with it. For example, say I have something like this:
def hybrid_type2(someobj, a):
    # do something else
    ...

class OrigObject(object):
    ...
    def hybrid_fun(self, a):
        # do something
        ...

    def switch(self, type):
        if type == 1:
            self.hybrid_fun = OrigObject.hybrid_fun
        else:
            self.hybrid_fun = hybrid_type2
The problem is, after switching and then calling the new hybrid_fun, I get an error saying that hybrid_type2() takes exactly 2 arguments but is being passed only one. The object no longer passes itself as an argument to the new function the way it does with its own methods; is there anything I can do to remedy that?
I tried putting hybrid_type2 inside the class as well; self.hybrid_fun = self.hybrid_type2 works, but self.hybrid_fun = OrigObject.hybrid_fun causes a similar error (complaining that the first argument should be of type OrigObject). I know I could define the OrigObject.hybrid_fun() logic inside OrigObject.hybrid_type1() so I can revert it back the same way I set it (relative to the instance rather than to the class, so that the object is still passed as the first argument). But I wanted to ask whether there is a cleaner approach I'm not seeing. Thanks
EDIT:
Thanks guys, I've given points for several of the solutions that worked well. I essentially ended up using a Strategy pattern using types.MethodType(), I've accepted the answer that explained how to do the Strategy pattern in python (the Wikipedia article was more general, and the use of interfaces is not needed in Python).
Use the types module to create an instance method for a particular instance.
e.g.:
import types

def strategyA(possible_self):
    pass

instance = OrigObject()
instance.strategy = types.MethodType(strategyA, instance)
instance.strategy()
Note that this only affects this specific instance; no other instances will be affected.
You want the Strategy Pattern.
Read about descriptors in Python. The following code should work:
else:
    self.hybrid_fun = hybrid_type2.__get__(self, OrigObject)
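To see why __get__ helps: functions are descriptors, so calling __get__ on a plain function yields a bound method, and self is then supplied automatically on later calls. A minimal sketch with the question's names:

def hybrid_type2(someobj, a):
    print(someobj, a)

class OrigObject(object):
    def switch(self):
        # bind the free function to this instance via the descriptor protocol
        self.hybrid_fun = hybrid_type2.__get__(self, OrigObject)

obj = OrigObject()
obj.switch()
obj.hybrid_fun(42)  # someobj is obj, a is 42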
What about defining it like so:
def hybrid_type2(someobj, a):
    # do something else
    ...

def hybrid_type1(someobj, a):
    # do something
    ...

class OrigObject(object):
    def __init__(self):
        ...
        self.run_the_fun = hybrid_type1
        ...

    def hybrid_fun(self, a):
        self.run_the_fun(self, a)

    def type_switch(self, type):
        if type == 1:
            self.run_the_fun = hybrid_type1
        else:
            self.run_the_fun = hybrid_type2
You can change the instance's class at runtime:
class OrigObject(object):
    ...
    def hybrid_fun(self, a):
        # do something
        ...

    def switch(self):
        self.__class__ = DerivedObject

class DerivedObject(OrigObject):
    def hybrid_fun(self, a):
        # do the other thing
        ...

    def switch(self):
        self.__class__ = OrigObject
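A usage sketch of the class switch: the instance changes behaviour in place, so every existing reference to it sees the new methods (assuming hybrid_fun has a concrete body in both classes):

obj = OrigObject()
obj.hybrid_fun(1)   # original behaviour
obj.switch()
obj.hybrid_fun(1)   # DerivedObject behaviour, same object identity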
