I have a series of functions, where the result of one is fed into the next, plus other inputs:
result = function1(
    function2(
        function3(
            function4(x, y, z),
            a, b, c),
        d, e, f),
    g, h, i)
This is ugly and hard to understand. In particular it isn't obvious that function1 is actually the last one to be called.
How can this code be written nicer, in a Pythonic way?
I could e.g. assign intermediate results to variables:
j = function4(x, y, z)
k = function3(j, a, b, c)
l = function2(k, d, e, f)
result = function1(l, g, h, i)
But this also puts additional variables for things I don't need into the namespace, and may keep a large amount of memory from being freed – unless I add a del j, k, l, which is its own kind of ugly. Plus, I have to come up with names.
Or I could use the name of the final result also for the intermediate results:
result = function4(x, y, z)
result = function3(result, a, b, c)
result = function2(result, d, e, f)
result = function1(result, g, h, i)
The disadvantage here is that the same name is used for possibly wildly different things, which may make reading and debugging harder.
Then maybe _ or __?
__ = function4(x, y, z)
__ = function3(__, a, b, c)
__ = function2(__, d, e, f)
result = function1(__, g, h, i)
A bit better, but again not super clear. And I might have to add a del __ at the end.
Is there a better, more Pythonic way?
I think nesting function calls and assigning results to variables is not a bad solution, especially if you capture the whole in a single function with descriptive name. The name of the function is self-documenting, allows for reuse of the complex structure, and the local variables created in the function only exist for the lifetime of the function execution, so your namespace remains clean.
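For example, a minimal sketch of that idea (build_result and the intermediate names are placeholders; choose names that describe your actual data):

def build_result(x, y, z, a, b, c, d, e, f, g, h, i):
    """Give the whole pipeline one descriptive name."""
    base = function4(x, y, z)
    enriched = function3(base, a, b, c)
    combined = function2(enriched, d, e, f)
    return function1(combined, g, h, i)

result = build_result(x, y, z, a, b, c, d, e, f, g, h, i)

The intermediate variables base, enriched and combined disappear as soon as build_result returns, so nothing lingers in the calling namespace.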
Having said that, depending on the type of the values being passed around, you could subclass whatever that type is and add the functions as methods on the class, which would allow you to chain the calls, which I'd consider very Pythonic if the superclass lends itself to it.
An example of that is the way the pandas library works and has changed recently. In the past, many pandas.DataFrame methods would take an optional parameter called inplace to allow a user to change this:
import pandas as pd
df = pd.DataFrame(my_data)
df = df.some_operation(arg1, arg2)
df = df.some_other_operation(arg3, arg4)
...
To this:
import pandas as pd
df = pd.DataFrame(my_data)
df.some_operation(arg1, arg2, inplace=True)
df.some_other_operation(arg3, arg4, inplace=True)
With inplace=True, the original DataFrame would be modified (or at least, the end result would suggest that's what happened), and in some cases it might still be returned, in other cases not at all. The idea being that it avoided the creation of additional data; also, many users would use multiple variables (df1, df2, etc.) to keep all the intermediate results around, often for no good reason.
However, as you may know, that pattern is a bad idea in pandas for several reasons.
Instead, modern pandas methods consistently return a DataFrame, and inplace is being deprecated where it still exists. Because every method now returns the resulting DataFrame, this works:
import pandas as pd
df = (
    pd.DataFrame(my_data)
    .some_operation(arg1, arg2)
    .some_other_operation(arg3, arg4)
)
(Note that this already worked for many methods in pandas; consistency is not the point here, though, it's the pattern that matters.)
Your situation could be similar, depending on what you're passing exactly. From the example you provided, it's unclear what the types might be. But given that you consistently pass the function result as the first parameter, it seems likely that your implementation is something like:
def fun1(my_type_var, a, b, c):
    # perform some operation on my_type_var and return result of same type
    return operations_on(my_type_var, a, b, c)

def fun2(my_type_var, a, b, c):
    # similar
    return other_operations_on(my_type_var, a, b, c)
...
In that case, it would make more sense to:
class MyType:
    def method1(self, a, b, c):
        # perform some operation on self, then return the result, or self
        mt = operations_on(self, a, b, c)
        return mt
...
Because this would allow:
x = MyType().method1(arg1, arg2, arg3).method2(...)
And depending on what your type is exactly, you could choose to modify self and return that, or perform the operation with a new instance as a result like in the example above. What the better idea is depends on what you're doing, how your type works internally, etc.
Fantastic answer by @Grismar
One more option to consider, if the functions are things you are writing yourself rather than transformations of data via, e.g., numpy or pandas, is to structure them so that each function is, on the surface, only one level deep.
This would be in line with how Robert C. Martin ("Uncle Bob") advises writing functions in his book Clean Code.
The idea is that with each function, you descend one level of abstraction.
Lacking the context for what you're doing, I can't demonstrate this convincingly, i.e., with function names that actually make sense. But mechanically, here is how it would look:
result = make_result(a, b, c, i, j, k, x, y, z)

def make_result(a, b, c, i, j, k, x, y, z):
    return function1(make_input_for_1(a, b, c, i, j, k, x, y, z))

def make_input_for_1(...):
    return function2(make_input_for_2(...))
and so on and so forth.
This only makes sense if the nesting of the functions is indeed an ever-more-detailed implementation of the general task.
You can create a class that aids in chaining these functions together. The constructor can apply a higher-order function to each input function to store the output of the function as an instance field and return self, then set each of those wrapped functions as methods of the instance. Something like this:
import functools

class Chainable():
    def __init__(self, funcs):
        self.value = None

        def wrapper(func):
            @functools.wraps(func)
            def run(*args, **kwargs):
                if self.value is None:
                    self.value = func(*args, **kwargs)
                    return self
                else:
                    self.value = func(self.value, *args, **kwargs)
                    return self
            return run

        for func in funcs:
            setattr(self, func.__qualname__, wrapper(func))

def f1(a, b):
    return a + b

def f2(a, b):
    return a * b
res = Chainable([f1, f2]).f1(1, 2).f2(5).value
print(res)
Related
I have a function called preset_parser in mylibrary.py that takes argument filename, i.e. preset_parser(filename), and returns a long list of variables, e.g.
def preset_parser(filename):
    # ... defines variable values based on reading the file
    return presetsdf, preset_name, preset_description, preset_instructions, preset_additional_notes, preset_placeholder, pre_user_input, post_user_input, prompt, engine, finetune_model, temperature, max_tokens, top_p, fp, pp, stop_sequence, echo_on, preset_pagetype, preset_db, user, organization
So then I call this function from many other programs, where I do this:
from mylibrary import preset_parser

presetsdf, preset_name, preset_description, preset_instructions, preset_additional_notes, preset_placeholder, pre_user_input, post_user_input, prompt, engine, finetune_model, temperature, max_tokens, top_p, fp, pp, stop_sequence, echo_on, preset_pagetype, preset_db, user, organization = preset_parser(filename)
This is redundant and fragile (it breaks if the list of variables changes). What is the better way to do it? I know it must be simple.
The "general" solution to your problem is to make a class.
class ParseResult:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
There are several ways in Python to automate this pattern. If your return values have some natural sort of "sequencing", you might use namedtuple or its typed cousin NamedTuple.
from collections import namedtuple

ParseResult = namedtuple("ParseResult", 'a b c d')
or
from typing import NamedTuple

class ParseResult(NamedTuple):
    a: TypeOfA
    b: TypeOfB
    c: TypeOfC
    d: TypeOfD
This creates a richer "tuple-like" construct with named arguments.
If your return values don't make sense as a tuple and don't have any sort of natural notion of sequence, you can use the more general-purpose dataclass decorator.
from dataclasses import dataclass

@dataclass(frozen=True)
class ParseResult:
    a: TypeOfA
    b: TypeOfB
    c: TypeOfC
    d: TypeOfD
Then, in any case, return a value of this new type which has rich names (and possibly types) for its values.
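For instance, a minimal sketch of the call site with the NamedTuple version (a, b, c, d are the same placeholder fields as above; your parser would use the real preset names):

def preset_parser(filename):
    # ... read the file ...
    return ParseResult(a=..., b=..., c=..., d=...)

result = preset_parser(filename)
print(result.a, result.b)  # reference only the fields you actually need

Adding a new field to ParseResult then no longer breaks every caller, since callers only touch the fields they use.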
You could return a dictionary instead:
def myfunction():
    ...
    mydict = {
        'variable1': value1,
        'variable2': value2,
        'anothervariable': anothervalue,
        'somevariable': somevalue,
    }
    return mydict
I can think of a couple of better ways, but I'm not sure what your limitations are. You could create a class, then return the instance you need to initialize in the parser function:
class Preset:
    def __init__(self, **kwargs):
        self.sdf = kwargs.get('sdf')
        self.name = kwargs.get('name')
        self.description = kwargs.get('description')
        # ...and so on
then in your function:
preset = Preset()
preset.sdf = file_data[0]  # however you've parsed your data
preset.name = file_data[1]
or you could make it a dictionary:
preset = {
    'sdf': file_data[0],
    'name': file_data[1],
}
I'm not sure if you have some limitation on what can be returned from your function.
How can I dynamically get the names and values of all arguments to a class method? (For debugging).
The following code works, but it would need to be repeated a few dozen times (one for each method). Is there a simpler, more Pythonic way to do this?
import inspect
import logging

log = logging.getLogger(__name__)

class Foo:
    def foo(self, a, b):
        myself = getattr(self, inspect.stack()[0][3])
        argnames = inspect.getfullargspec(myself).args[1:]
        d = {}
        for argname in argnames:
            d[argname] = locals()[argname]
        log.debug(d)
That's six lines of code for something that should be a lot simpler.
Sure, I can hardcode the debugging code separately for each method, but it seems easier to use copy/paste. Besides, it's way too easy to leave out an argument or two when hardcoding, which could make the debugging more confusing.
I would also prefer to assign local variables instead of accessing the values using a kwargs dict, because the rest of the code (not shown) could get clunky real fast, and is partially copied/pasted.
What is the simplest way to do this?
An alternative:
from collections import OrderedDict

class Foo:
    def foo(self, *args):
        argnames = 'a b'.split()
        kwargs = OrderedDict(zip(argnames, args))
        log.debug(kwargs)
        for argname, argval in kwargs.items():
            locals()[argname] = argval  # note: in CPython this does not create real local variables
This saves one line per method, but at the expense of IDE autocompete/intellisense when calling the method.
As wpercy wrote, you can reduce the last three lines to a single line using a dict comprehension. The caveat is that it only works in some versions of Python.
However, in Python 3 a dict comprehension has its own namespace, so locals() wouldn't work inside it. A workaround is to put the locals() call after the in:
import inspect
from itertools import repeat

class Foo:
    def foo(self, a, b):
        myname = inspect.stack()[0][3]
        argnames = inspect.getfullargspec(getattr(self, myname)).args[1:]
        args = [(x, parent[x]) for x, parent in zip(argnames, repeat(locals()))]
        log.debug('{}: {!s}'.format(myname, args))
This saves two lines per method.
I currently have the following structure:
Inside a class I need to handle several types of functions with two special variables and an arbitrary number of parameters. To wrap these for the methods I apply them to, I scan the function signatures first (that works very reliably) and decide which are parameters and which are my variables.
I then bind them back with a lambda expression in the following way. Let func(x, *args) be my function, then I'll bind
f = lambda x, t: func(x=x, **func_parameter)
In the case that I get func(x, t, *args) I bind
f = lambda x, t: func(x=x, t=t, **func_parameter)
and similarly if I have neither of the variables.
It is essential that I hand a function of the form f(x,t) to my methods inside that class.
I would like to use functools.partial for that - it is the more Pythonic way to do it, and execution performance is better (the function f is potentially called a couple of million times...). The problem I have is that I don't know what to do if I have a basis function which is independent of one of the variables t and x; that's why I went with lambda functions in the first place, since they just map the other variable 'blindly'. It's still two function calls, and while definitions with lambda and partial take the same time, execution is a lot faster with partial.
Does anyone know how to use partial in that case? Performance is kind of an issue here.
EDIT: A little later. I figured out that function evaluation with tuple arguments is faster than with keyword arguments, so that was a plus.
And then, in the end, as a user I would just take some of the guess work from Python, i.e. directly define
def func(x):
    return 2*x
instead of
def func(x, a):
    return a*x
and call it directly. That way I can use the function as-is. The second case would be to implement the case where x and t are both present as a partial mapping.
That might be a compromise.
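For what it's worth, a minimal sketch of that partial-based mapping (func and func_parameter are stand-ins for whatever the signature scan produced):

from functools import partial

def func(x, t, a, b):
    return a * x + b * t

func_parameter = {'a': 2, 'b': 3}

# fix the extra parameters once; the result still exposes the f(x, t) signature
f = partial(func, **func_parameter)

print(f(1.0, 0.5))  # 2*1.0 + 3*0.5 = 3.5

Note that partial can only fix arguments, not drop them, so a basis function that ignores t entirely still needs a small lambda or an adapter like the one in the answer below.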
You could write adapter classes that have an f(x,t) call signature. The result is similar to functools.partial but much more flexible. __call__ gives you a consistent call signature and lets you add, drop, and map parameters. Arguments can be fixed when an instance is made. It seems like it should execute as fast as a normal function, but I have no basis for that.
A toy version:
class Adapt:
    '''return a function with call signature f(x,t)'''
    def __init__(self, func, **kwargs):
        self.func = func
        self.kwargs = kwargs
    def __call__(self, x, t):
        # mapping magic goes here
        return self.func(x, t, **self.kwargs)
        #return self.func(a=x, b=t, **self.kwargs)

def f(a, b, c):
    print(a, b, c)
Usage:
>>> f_xt = Adapt(f, c = 4)
>>> f_xt(3, 4)
3 4 4
>>>
Don't know how you could make that generic for arbitrary parameters and mappings, maybe someone will chime in with an idea or an edit.
So if you end up writing an adapter specific to each function, the function can be embedded in the class instead of an instance parameter.
class AdaptF:
    '''return a function with call signature f(x,t)'''
    def __init__(self, **kwargs):
        self.kwargs = kwargs
    def __call__(self, x, t):
        '''does stuff with x and t'''
        # mapping magic goes here
        return self.func(a=x, b=t, **self.kwargs)
    def func(self, a, b, c):
        print(a, b, c)
>>> f_xt = AdaptF(c = 4)
>>> f_xt(x = 3, t = 4)
3 4 4
>>>
I just kinda made this up from stuff I have read so I don't know if it is viable. I feel like I should give credit to the source I read but for the life of me I can't find it - I probably saw it on a pyvideo.
Is there a Pythonic way to encapsulate a lazy function call, whereby on first use of the function f(), it calls a previously bound function g(Z), and on successive calls f() returns a cached value?
Please note that memoization might not be a perfect fit.
I have:
f = g(Z)
if x:
    return 5
elif y:
    return f
elif z:
    return h(f)
The code works, but I want to restructure it so that g(Z) is only called if the value is used. I don't want to change the definition of g(...), and Z is a bit big to cache.
EDIT: I assumed that f would have to be a function, but that may not be the case.
I'm a bit confused whether you seek caching or lazy evaluation. For the latter, check out the module lazy.py by Alberto Bertogli.
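In the meantime, a minimal sketch of the lazy-call idea (g and Z are the names from the question; lazily is a made-up helper):

def lazily(compute):
    # return a zero-argument callable that runs compute() once and caches the result
    cache = []
    def call():
        if not cache:
            cache.append(compute())
        return cache[0]
    return call

f = lazily(lambda: g(Z))  # g(Z) is not evaluated yet
value = f()               # first call evaluates g(Z)
value = f()               # further calls return the cached result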
Try using this decorator:
class Memoize:
    def __init__(self, f):
        self.f = f
        self.mem = {}

    def __call__(self, *args, **kwargs):
        if (args, str(kwargs)) in self.mem:
            return self.mem[args, str(kwargs)]
        else:
            tmp = self.f(*args, **kwargs)
            self.mem[args, str(kwargs)] = tmp
            return tmp
(extracted from dead link: http://snippets.dzone.com/posts/show/4840 / https://web.archive.org/web/20081026130601/http://snippets.dzone.com/posts/show/4840)
(Found here: Is there a decorator to simply cache function return values? by Alex Martelli)
EDIT: Here's another one in the form of properties (using __get__): http://code.activestate.com/recipes/363602/
You can employ a cache decorator; let's see an example:
from functools import wraps

class FuncCache(object):
    def __init__(self):
        self.cache = {}

    def __call__(self, func):
        @wraps(func)
        def callee(*args, **kwargs):
            key = (args, str(kwargs))
            # see if there is already a result in the cache
            if key in self.cache:
                result = self.cache.get(key)
            else:
                result = func(*args, **kwargs)
                self.cache[key] = result
            return result
        return callee
With the cache decorator, you can write:
my_cache = FuncCache()

@my_cache
def foo(n):
    """Expensive calculation"""
    sum = 0
    for i in range(n):
        sum += i
    print('called foo with result', sum)
    return sum

print(foo(10000))
print(foo(10000))
print(foo(1234))
As you can see from the output
called foo with result 49995000
49995000
49995000
called foo with result 760761
760761
foo is only actually executed once per distinct argument; the second foo(10000) comes straight from the cache. You don't have to change any line of your function foo. That's the power of decorators.
There are quite a few decorators out there for memoization:
http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize
http://code.activestate.com/recipes/498110-memoize-decorator-with-o1-length-limited-lru-cache/
http://code.activestate.com/recipes/496879-memoize-decorator-function-with-cache-size-limit/
Coming up with a completely general solution is harder than you might think. For instance, you need to watch out for non-hashable function arguments and you need to make sure the cache doesn't grow too large.
If you're really looking for a lazy function call (one where the function is only actually evaluated if and when the value is needed), you could probably use generators for that.
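A minimal sketch of that generator idea (g and Z are again the names from the question):

def lazy_call(func, *args, **kwargs):
    # the body does not run until next() is called on the generator
    yield func(*args, **kwargs)

pending = lazy_call(g, Z)  # nothing evaluated yet
value = next(pending)      # g(Z) runs here, on first use

Keep in mind this is single-use: calling next(pending) a second time raises StopIteration, so you would still need to hold on to value yourself.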
EDIT: So I guess what you want really is lazy evaluation after all. Here's a library that's probably what you're looking for:
http://pypi.python.org/pypi/lazypy/0.5
Just for completeness, here is a link for my lazy-evaluator decorator recipe:
https://bitbucket.org/jsbueno/metapython/src/f48d6bd388fd/lazy_decorator.py
Here's a pretty brief lazy decorator, though it lacks @functools.wraps (and actually returns an instance of Lazy, plus some other potential pitfalls):
class Lazy(object):
    def __init__(self, calculate_function):
        self._calculate = calculate_function

    def __get__(self, obj, _=None):
        if obj is None:
            return self
        value = self._calculate(obj)
        setattr(obj, self._calculate.__name__, value)
        return value

# Sample use:
class SomeClass(object):
    @Lazy
    def someprop(self):
        print('Actually calculating value')
        return 13

o = SomeClass()
o.someprop
o.someprop
Curious why you don't just use a lambda in this scenario?
f = lambda: g(Z)

if x:
    return 5
if y:
    return f()
if z:
    return h(f())
Even after your edit, and the series of comments with detly, I still don't really understand. In your first sentence, you say the first call to f() is supposed to call g(), but subsequently return cached values. But then in your comments, you say "g() doesn't get called no matter what" (emphasis mine). I'm not sure what you're negating: Are you saying g() should never be called (doesn't make much sense; why does g() exist?); or that g() might be called, but might not (well, that still contradicts that g() is called on the first call to f()). You then give a snippet that doesn't involve g() at all, and really doesn't relate to either the first sentence of your question, or to the comment thread with detly.
In case you go editing it again, here is the snippet I am responding to:
I have:
a = f(Z)
if x:
    return 5
elif y:
    return a
elif z:
    return h(a)
The code works, but I want to restructure it so that f(Z) is only called if the value is used. I don't want to change the definition of f(...), and Z is a bit big to cache.
If that is really your question, then the answer is simply
if x:
    return 5
elif y:
    return f(Z)
elif z:
    return h(f(Z))
That is how to achieve "f(Z) is only called if the value is used".
I don't fully understand "Z is a bit big to cache". If you mean there will be too many different values of Z over the course of program execution that memoization is useless, then maybe you have to resort to precalculating all the values of f(Z) and just looking them up at run time. If you can't do this (because you can't know the values of Z that your program will encounter) then you are back to memoization. If that's still too slow, then your only real option is to use something faster than Python (try Psyco, Cython, ShedSkin, or hand-coded C module).
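For the memoization route, a minimal sketch using the standard library (f and Z are the question's names; note that lru_cache requires the arguments, here Z, to be hashable):

from functools import lru_cache

# wrap the existing function without changing its definition
cached_f = lru_cache(maxsize=None)(f)

cached_f(Z)  # computes f(Z) the first time
cached_f(Z)  # returns the cached result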
I want to achieve the following:
Have an AxBxC matrix (where A, B, C are integers).
Access that matrix not as matrix[a, b, c] but as matrix[(a, b), c]; that is, I have two variables, var1 = (x, y) and var2 = z, and I want to access my matrix as matrix[var1, var2].
How can this be done? I am using numpy matrix, if it makes any difference.
I know I could use matrix[var1[0], var1[1], var2], but if possible I'd like to know if there is any other more elegant way.
Thanks!
If var1 = (x,y), and var2 = z, you can use
matrix[var1][var2]
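For example, a quick check with a plain NumPy array (the 2x3x4 shape and the index values are arbitrary, purely for illustration):

import numpy as np

matrix = np.arange(24).reshape(2, 3, 4)
var1 = (1, 2)
var2 = 3

print(matrix[var1][var2])              # chained indexing
print(matrix[var1[0], var1[1], var2])  # equivalent explicit form
print(matrix[var1 + (var2,)])          # or build the full index tuple

All three print the same element (23 here).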
I think you can simply subclass the NumPy array type with a new class of your own, and overload the __getitem__() method to accept a key of the form ((i, j), k). Something like this:
import numpy as np

class SpecialMatrix(np.ndarray):
    def __getitem__(self, key):
        try:
            # expect a key of the form ((i, j), k), i.e. matrix[var1, var2]
            (i, j), k = key
            return super(SpecialMatrix, self).__getitem__((i, j, k))
        except (TypeError, ValueError):
            # any other key: fall back to ordinary indexing
            return super(SpecialMatrix, self).__getitem__(key)
And do something similar with __setitem__().
Note that __getitem__() receives a single key argument, which is a tuple when you index with commas. I don't have NumPy available as I write this answer, so the above is untested, sorry.
EDIT: I re-wrote the example to use super() instead of directly calling the base class. It has been a while since I did anything with subclassing in Python.
EDIT: I just looked at the accepted answer. That's totally the way to do it. I'll leave this up in case anyone finds it educational, but the simple way is best.