Elegant way of doing post-processing using Python - python

Considering the following example of post-processing using inheritance in python (from this website):
import os
class FileCat(object):
    def cat(self, filepath):
        f = open(filepath)
        lines = f.readlines()
        f.close()
        return lines
class FileCatNoEmpty(FileCat):
    def cat(self, filepath):
        lines = super(FileCatNoEmpty, self).cat(filepath)
        nonempty_lines = [l for l in lines if l != '\n']
        return nonempty_lines
Basically, when we are post-processing, we don't really care about the original invocation, we just want to work with the data returned by the function.
So ideally, in my opinion, there should be no need to redeclare the original function signature just to forward the call to the original function.
If the FileCat class had 100 different methods (cat1, cat2, cat3, ...) that returned the same type of data and we wanted a post-processed NoEmpty version of each, we would have to define the same 100 method signatures in FileCatNoEmpty just to forward the calls.
So the question is: Is there a more elegant way of solving this problem?
That is, something like the FileCatNoEmpty class that would automatically make available all methods from FileCat but that still allows us to process the returned value?
Something like
class FileCatNoEmpty(FileCat):
    # Any method with whatever arguments
    def f(self,args):
        lines = super(FileCatNoEmpty, self).f(args)
        nonempty_lines = [l for l in lines if l != '\n']
        return nonempty_lines
Or maybe even another solution that does not use inheritance.
Thanks!

This answer, using a wrapper class that receives the original one in the constructor (instead of inheriting from it), solves the problem:
https://stackoverflow.com/a/4723921/3444175
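The linked answer is not reproduced here, so the following is only a minimal sketch of the wrapper idea, not that answer's code. It assumes a stand-in FileCat whose methods take a list of lines instead of a file path (so the example runs without any files); the names PostProcessed, drop_empty and cat_upper are hypothetical.

```python
import functools

class FileCat(object):
    """Stand-in for the original class; both methods are made up for the demo."""
    def cat(self, lines):
        return list(lines)
    def cat_upper(self, lines):
        return [l.upper() for l in lines]

def drop_empty(lines):
    # The post-processing step from the question.
    return [l for l in lines if l != '\n']

class PostProcessed(object):
    """Wraps any object and post-processes the result of every method call."""
    def __init__(self, wrapped, postprocess):
        self._wrapped = wrapped
        self._postprocess = postprocess
    def __getattr__(self, name):
        # Only reached when normal lookup fails, so it forwards to the wrapped object.
        attr = getattr(self._wrapped, name)
        if not callable(attr):
            return attr
        @functools.wraps(attr)
        def wrapper(*args, **kwargs):
            return self._postprocess(attr(*args, **kwargs))
        return wrapper

no_empty = PostProcessed(FileCat(), drop_empty)
print(no_empty.cat(['a\n', '\n', 'b\n']))
print(no_empty.cat_upper(['x\n', '\n']))
```

Because the forwarding happens in __getattr__, no method signature has to be repeated, and the same wrapper works for any number of methods on the wrapped object.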

Related

Best practice for class used as functions collection with shared data

I have code that gets some dirty data as input, then parses, cleans, and munges it, and should finally return a value.
At the moment I have structured it as a class whose __init__ method receives the input and calls the other methods in a given sequence.
My class at the moment looks something like this:
class myProcedure:
    def __init__(self, dirty_data, file_name):
        self.variable1, self.variable2 = self.clean_data(dirty_data)
        self.variable3 = self.get_data_from_file(file_name)
        self.do_something()
    def clean_data(self, dirty_data):
        #clean the data
        return variable1, variable2
    def get_data_from_file(self, file_name):
        #load some data
        return loaded_data
    def do_something(self):
        #the interesting part goes here
        self.result = the_result
Using a class instead of separate functions makes it easier to share data. In my real code a few tens of variables get shared. The alternative would be to put them all in a dict or to have each function take 10-20 inputs. I find both of these solutions a bit cumbersome.
At the moment I must call it as:
useless_class_obj = myProcedure(dirty_data, file_name)
interesting_stuff = useless_class_obj.result
My concern comes from the fact that, once run, useless_class_obj no longer has any purpose and is just a useless piece of junk.
I think it would be more elegant to be able to use the class as:
interesting_stuff = myProcedure(dirty_data, file_name)
however this would require __init__ to return something different than self.
Is there a better way to do this?
Am I doing this in a bad or hard-to-read way?
Well... you could also do...
interesting_stuff = myProcedure(dirty_data, file_name).result
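Another way to hide the throwaway object, sketched here under assumptions, is to keep the class as an implementation detail and expose a small function that returns only the result. The class body below is invented purely so the example runs (it sums the cleaned values and adds an offset); _Procedure, my_procedure and offset are hypothetical names, not part of the question.

```python
class _Procedure(object):
    """Holds the shared state; the processing steps here are placeholders."""
    def __init__(self, dirty_data, offset):
        self.values = self.clean_data(dirty_data)
        self.offset = offset
        self.do_something()
    def clean_data(self, dirty_data):
        # Placeholder cleaning step: drop missing entries.
        return [x for x in dirty_data if x is not None]
    def do_something(self):
        # Placeholder for "the interesting part".
        self.result = sum(self.values) + self.offset

def my_procedure(dirty_data, offset):
    # The instance is created, used and discarded inside the function.
    return _Procedure(dirty_data, offset).result

interesting_stuff = my_procedure([1, None, 2], 10)
print(interesting_stuff)  # 13
```

The caller then gets the one-line form from the question, while the methods can still share data through self.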

Is there a way to append the name of a function to a list automatically?

The idea is that when a new function is written, its variable name is appended to a list automatically.
Just to note, I realise I can just use mylist.append(whatever) but I'm specifically looking for a way to automatically append, rather than manually.
So, if we start with...
def function1(*args):
    print("string")
def function2(*args):
    print("string 2")
mylist = []
...is there a way to append 'function1' and 'function2' to mylist automatically so that it would end up like this...
mylist = [function1, function2]
Specifically, I'd like to have the variable name listed, not a string (e.g. "function1").
I'm learning Python and just experimenting, so this doesn't serve any particular purpose at the moment, I just want to know if it's possible.
Thanks in advance for any suggestions, and I'm happy to answer any questions if I've not been clear.
Just add the function object to the list:
mylist = [function1, function2]
or use .append():
mylist.append(function1)
mylist.append(function2)
Python functions are first-class objects. They are values, just like classes and strings and integers.
If you want to automate this for a whole module, you can use the globals() function to quickly list all functions defined in the module so far, with a little help from the inspect.isfunction() predicate:
import inspect
mylist = [v for v in list(globals().values()) if inspect.isfunction(v) and v.__module__ == __name__]
The v.__module__ == __name__ test ensures we only list functions from the current module, not anything we imported.
However, explicit is still better than implicit. Either add mylist.append(functionname) below each function, or use a decorator:
mylist = []
def listed(func):
    mylist.append(func)
    return func
@listed
def function1():
    pass
@listed
def function2():
    pass
Each function you 'mark' with the @listed decorator is added to the mylist list.
In principle, you could do that with a decorator, which would probably qualify as a semi-automatic solution:
@gather
def function1():
    print("function 1")
@gather
def function2():
    print("function 2")
One implementation of such a decorator is essentially a function which gets a function as a parameter:
function_list = []
def gather(func):
    function_list.append(func) # or .append(func.__name__)
    return func
In this simple incarnation it is probably not useful at all, but popular libraries and frameworks often employ a somewhat enhanced version of this technique. As an example, see Flask's @app.route decorator for specifying functions that handle specific HTTP requests.
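To give a rough idea of what such an enhanced version can look like, here is a sketch of a decorator that takes an argument and stores each function under a key. This is not Flask's implementation; handlers and register are hypothetical names chosen for the illustration.

```python
handlers = {}

def register(name):
    """Return a decorator that files the function under the given name."""
    def decorator(func):
        handlers[name] = func
        return func  # hand the function back unchanged
    return decorator

@register("hello")
def say_hello():
    return "hello"

@register("bye")
def say_bye():
    return "bye"

print(sorted(handlers))     # ['bye', 'hello']
print(handlers["hello"]())  # hello
```

The extra layer (a function returning the decorator) is what lets the decorator carry a parameter, in the same spirit as passing a URL rule to a routing decorator.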

How to turn these functions generic

I want to shorten my code, since I have more functions like this. I was wondering if I could use getattr() to do something like what this guy asked.
Well, here is what I've got:
def getAllMarkersFrom(db, asJSON=False):
    '''Gets all markers from given database. Returns list or Json string'''
    markers = []
    for marker in db.markers.find():
        markers.append(marker)
    if not asJSON:
        return markers
    else:
        return json.dumps(markers, default=json_util.default)
def getAllUsersFrom(db, asJSON=False):
    '''Gets all users from given database. Returns list or Json string'''
    users = []
    for user in db.users.find():
        users.append(user)
    if not asJSON:
        return users
    else:
        return json.dumps(users, default=json_util.default)
I'm using pymongo and flask helpers on JSON.
What I want is to make a single getAllFrom(x, db) function that accepts any type of object. I don't know how to do this, but I want to call db.X.find() where X is passed to the function.
Well, there it is. Hope you can help me. Thank you!
There's hardly any real code in either of those functions. Half of each is a slow recreation of the list() constructor. Once you get rid of that, you're left with a conditional, which can easily be condensed to a single line. So:
def getAllUsersFrom(db, asJSON=False):
    users = list(db.users.find())
    return json.dumps(users, default=json_util.default) if asJSON else users
This seems simple enough to me to not bother refactoring. There are some commonalities between the two functions, but breaking them out wouldn't reduce the number of lines of code any further.
One direction for possible simplification, however, is to not pass in a flag to tell the function what format to return. Let the caller do that. If they want it as a list, there's list(). For JSON, you can provide your own helper function. So, just write your functions to return the desired iterator:
def getAllUsersFrom(db):
    return db.users.find()
def getAllMarkersFrom(db):
    return db.markers.find()
And the helper function to convert the result to JSON:
def to_json(cur):
    return json.dumps(list(cur), default=json_util.default)
So then, putting it all together, you just call:
markers = list(getAllMarkersFrom(mydb))
or:
users = to_json(getAllUsersFrom(mydb))
As you need.
If you really want a generic function for requesting various types of records, that'd be:
def getAllRecordsFrom(db, kind):
    return getattr(db, kind).find()
Then call it:
users = list(getAllRecordsFrom(mydb, "users"))
etc.
I would say that it's better to have separate functions for each task. You can then use decorators for functionality shared between different functions. For example:
@to_json
def getAllUsersFrom(db):
    return list(db.users.find())
enjoy!
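The to_json decorator used in that last answer is not defined there. A minimal sketch of what it could look like follows, with loud assumptions: FakeCollection and FakeDB are hypothetical stand-ins for a pymongo database so the example runs without a server, and default=str replaces json_util.default, which would be needed for real BSON types.

```python
import functools
import json

def to_json(func):
    """Decorator sketch: serialise whatever iterable the function returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # default=str is an assumption standing in for json_util.default.
        return json.dumps(list(func(*args, **kwargs)), default=str)
    return wrapper

class FakeCollection(object):
    """Hypothetical stand-in for a collection: find() yields stored documents."""
    def __init__(self, docs):
        self._docs = docs
    def find(self):
        return iter(self._docs)

class FakeDB(object):
    """Hypothetical stand-in for the database object."""
    users = FakeCollection([{"name": "ann"}, {"name": "bob"}])

@to_json
def getAllUsersFrom(db):
    return db.users.find()

print(getAllUsersFrom(FakeDB()))
```

With this shape the decorated function always returns a JSON string, so a caller who wants the raw list would call an undecorated variant instead.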

Overwriting class methods without inheritance (python)

First, if you guys think the way I'm trying to do things is not Pythonic, feel free to offer alternative suggestions.
I have an object whose functionality needs to change based on outside events. What I've been doing originally is create a new object that inherits from original (let's call it OrigObject()) and overwrites the methods that change (let's call the new object NewObject()). Then I modified both constructors such that they can take in a complete object of the other type to fill in its own values based on the passed in object. Then when I'd need to change functionality, I'd just execute myObject = NewObject(myObject).
I'm starting to see several problems with that approach now. First of all, other places that reference the object need to be updated to reference the new type as well (the above statement, for example, would only update the local myObject variable). But that's not hard to update, only annoying part is remembering to update it in other places each time I change the object in order to prevent weird program behavior.
Second, I'm noticing scenarios where I need a single method from NewObject(), but the other methods from OrigObject(), and I need to be able to switch the functionality on the fly. It doesn't seem like the best solution anymore to be using inheritance, where I'd need to make M*N different classes (where M is the number of methods the class has that can change, and N is the number of variations for each method) that inherit from OrigObject().
I was thinking of using attribute remapping instead, but I seem to be running into issues with it. For example, say I have something like this:
def hybrid_type2(someobj, a):
    #do something else
    ...
class OrigObject(object):
    ...
    def hybrid_fun(self, a):
        #do something
        ...
    def switch(self, type):
        if type == 1:
            self.hybrid_fun = OrigObject.hybrid_fun
        else:
            self.hybrid_fun = hybrid_type2
The problem is that after doing this and calling the new hybrid_fun, I get an error saying that hybrid_type2() takes exactly 2 arguments but I'm passing it one. The object no longer seems to pass itself as an argument to the new function the way it does with its own methods. Is there anything I can do to remedy that?
I tried including hybrid_type2 inside the class as well and then using self.hybrid_fun = self.hybrid_type2 works, but using self.hybrid_fun = OrigObject.hybrid_fun causes a similar error (complaining that the first argument should be of type OrigObject). I know I can instead define OrigObject.hybrid_fun() logic inside OrigObject.hybrid_type1() so I can revert it back the same way I'm setting it (relative to the instance, rather than relative to the class to avoid having object not be the first argument). But I wanted to ask here if there is a cleaner approach I'm not seeing here? Thanks
EDIT:
Thanks guys, I've given points for several of the solutions that worked well. I essentially ended up using a Strategy pattern using types.MethodType(), I've accepted the answer that explained how to do the Strategy pattern in python (the Wikipedia article was more general, and the use of interfaces is not needed in Python).
Use the types module to create an instance method for a particular instance.
For example:
import types
def strategyA(possible_self):
    pass
instance = OrigObject()
instance.strategy = types.MethodType(strategyA, instance)
instance.strategy()
Note that this only affects this specific instance; no other instances will be affected.
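Applied to the question's switch idea, the same technique might look like the sketch below. The method bodies and the name attribute are invented so the effect is visible, and reverting by deleting the instance attribute is one possible design choice, not something stated in the answer.

```python
import types

class OrigObject(object):
    def __init__(self, name):
        self.name = name
    def hybrid_fun(self, a):
        return "%s original %s" % (self.name, a)
    def switch(self, func=None):
        if func is None:
            # Remove the per-instance override; lookup falls back to the class.
            self.__dict__.pop("hybrid_fun", None)
        else:
            # Bind the plain function to this instance only.
            self.hybrid_fun = types.MethodType(func, self)

def hybrid_type2(someobj, a):
    return "%s type2 %s" % (someobj.name, a)

obj = OrigObject("x")
other = OrigObject("y")
obj.switch(hybrid_type2)
print(obj.hybrid_fun(1))    # x type2 1
print(other.hybrid_fun(1))  # y original 1
obj.switch()
print(obj.hybrid_fun(1))    # x original 1
```

Since the override lives in the instance's own __dict__, every other reference to the same object sees the change, which addresses the first problem raised in the question.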
You want the Strategy Pattern.
Read about descriptors in Python. The following code should work:
        else:
            self.hybrid_fun = hybrid_type2.__get__(self, OrigObject)
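Plain functions are descriptors, which is why calling __get__ on one yields a bound method. A small self-contained check of that behaviour, with a made-up value attribute and function body:

```python
class OrigObject(object):
    def __init__(self, value):
        self.value = value

def hybrid_type2(someobj, a):
    return someobj.value + a

obj = OrigObject(10)
# __get__ binds the function to obj, just as attribute lookup on the class would.
bound = hybrid_type2.__get__(obj, OrigObject)
obj.hybrid_fun = bound

print(obj.hybrid_fun(5))      # 15
print(bound.__self__ is obj)  # True
```

The result is equivalent to what types.MethodType produces; the descriptor call just makes the binding mechanism explicit.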
What about defining it like so:
def hybrid_type2(someobj, a):
    #do something else
    ...
def hybrid_type1(someobj, a):
    #do something
    ...
class OrigObject(object):
    def __init__(self):
        ...
        self.run_the_fun = hybrid_type1
        ...
    def hybrid_fun(self, a):
        self.run_the_fun(self, a)
    def type_switch(self, type):
        if type == 1:
            self.run_the_fun = hybrid_type1
        else:
            self.run_the_fun = hybrid_type2
You can change class at runtime:
class OrigObject(object):
    ...
    def hybrid_fun(self, a):
        #do something
        ...
    def switch(self):
        self.__class__ = DerivedObject
class DerivedObject(OrigObject):
    def hybrid_fun(self, a):
        #do the other thing
        ...
    def switch(self):
        self.__class__ = OrigObject

pythonic way to rewrite an assignment in an if statement

Is there a preferred Pythonic way to do what I would write like this in C++?
for s in str:
    if r = regex.match(s):
        print(r.groups())
I really like that syntax, imo it's a lot cleaner than having temporary variables everywhere. The only other way that's not overly complex is
for s in str:
    r = regex.match(s)
    if r:
        print(r.groups())
I guess I'm complaining about a pretty pedantic issue. I just miss the former syntax.
How about
for r in [regex.match(s) for s in str]:
    if r:
        print(r.groups())
or a bit more functional
for r in filter(None, map(regex.match, str)):
    print(r.groups())
Perhaps it's a bit hacky, but using a function object's attributes to store the last result allows you to do something along these lines:
def fn(regex, s):
    fn.match = regex.match(s) # save result
    return fn.match
for s in strings:
    if fn(regex, s):
        print(fn.match.groups())
Or more generically:
def cache(value):
    cache.value = value
    return value
for s in strings:
    if cache(regex.match(s)):
        print(cache.value.groups())
Note that although the "value" saved can be a collection of a number of things, this approach is limited to holding only one such at a time, so more than one function may be required to handle situations where multiple values need to be saved simultaneously, such as in nested function calls, loops or other threads. So, in accordance with the DRY principle, rather than writing each one, a factory function can help:
def Cache():
    def cache(value):
        cache.value = value
        return value
    return cache
cache1 = Cache()
for s in strings:
    if cache1(regex.match(s)):
        # use another at same time
        cache2 = Cache()
        if cache2(somethingelse) != cache1.value:
            process(cache2.value)
        print(cache1.value.groups())
        ...
There's a recipe to make an assignment expression but it's very hacky. Your first option doesn't compile so your second option is the way to go.
## {{{ http://code.activestate.com/recipes/202234/ (r2)
import sys
def set(**kw):
    assert len(kw)==1
    a = sys._getframe(1)
    a.f_locals.update(kw)
    return list(kw.values())[0]
#
# sample
#
A=list(range(10))
while set(x=A.pop()):
    print(x)
## end of http://code.activestate.com/recipes/202234/ }}}
As you can see, production code shouldn't touch this hack with a ten foot, double bagged stick.
This might be an overly simplistic answer, but would you consider this:
for s in str:
    if regex.match(s):
        print(regex.match(s).groups())
There is no Pythonic way to do something that is not Pythonic. It's that way for a reason. First, allowing statements in the conditional part of an if statement would make the grammar pretty ugly. For instance, if you allowed assignment statements in if conditions, why not also allow if statements? How would you actually write that? C-like languages don't have this problem, because they don't have assignment statements. They make do with just assignment expressions and expression statements.
The second reason is that
if foo = bar:
    pass
looks very similar to
if foo == bar:
    pass
Even if you are clever enough to type the correct one, and even if most of the members of your team are sharp enough to notice it, are you sure that the one you are looking at now is exactly what is supposed to be there? It's not unreasonable for a new dev to see this and just "fix" it (one way or the other), and now it's definitely wrong.
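For readers on Python 3.8 or later (an assumption, since the code in this thread predates it), PEP 572 added assignment expressions with a separate := operator, which sidesteps the = versus == confusion described above. A short sketch with a made-up pattern and input list:

```python
import re

# Hypothetical pattern and inputs, chosen only for the demonstration.
regex = re.compile(r"(\w+)=(\d+)")
strings = ["a=1", "nothing here", "b=22"]

found = []
for s in strings:
    # := assigns and yields the value, so the match can be tested and reused.
    if (r := regex.match(s)):
        found.append(r.groups())

print(found)  # [('a', '1'), ('b', '22')]
```

On older interpreters this is a syntax error, so the two-line form from the question remains the portable choice.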
Whenever I find that my loop logic is getting complex I do what I would with any other bit of logic: I extract it to a function. In Python it is a lot easier than some other languages to do this cleanly.
So extract the code that just generates the items of interest:
def matching(strings, regex):
    for s in strings:
        r = regex.match(s)
        if r: yield r
and then when you want to use it, the loop itself is as simple as they get:
for r in matching(strings, regex):
    print(r.groups())
Yet another answer is to use the "Assign and test" recipe for allowing assigning and testing in a single statement published in O'Reilly Media's July 2002 1st edition of the Python Cookbook and also online at Activestate. It's object-oriented, the crux of which is this:
# from http://code.activestate.com/recipes/66061
class DataHolder:
    def __init__(self, value=None):
        self.value = value
    def set(self, value):
        self.value = value
        return value
    def get(self):
        return self.value
This can optionally be modified slightly by adding the custom __call__() method shown below to provide an alternative way to retrieve instances' values -- which, while less explicit, seems like a completely logical thing for a 'DataHolder' object to do when called, I think.
    def __call__(self):
        return self.value
Allowing your example to be re-written:
r = DataHolder()
for s in strings:
    if r.set(regex.match(s)):
        print(r.get().groups())
        # or
        print(r().groups())
As also noted in the original recipe, if you use it a lot, adding the class and/or an instance of it to the __builtin__ module to make it globally available is very tempting despite the potential downsides:
import __builtin__
__builtin__.DataHolder = DataHolder
__builtin__.data = DataHolder()
As I mentioned in my other answer to this question, it must be noted that this approach is limited to holding only one result/value at a time, so more than one instance is required to handle situations where multiple values need to be saved simultaneously, such as in nested function calls, loops or other threads. That doesn't mean you should use it or the other answer, just that more effort will be required.
