Python decorator with multiprocessing fails - python

I would like to use a decorator on a function that I will subsequently pass to a multiprocessing pool. However, the code fails with "PicklingError: Can't pickle : attribute lookup __builtin__.function failed". I don't quite see why it fails here. I feel certain that it's something simple, but I can't find it. Below is a minimal "working" example. I thought that using the functools function would be enough to let this work.
If I comment out the function decoration, it works without an issue. What is it about multiprocessing that I'm misunderstanding here? Is there any way to make this work?
Edit: After adding both a callable class decorator and a function decorator, it turns out that the function decorator works as expected. The callable class decorator continues to fail. What is it about the callable class version that keeps it from being pickled?
import random
import multiprocessing
import functools
class my_decorator_class(object):
def __init__(self, target):
self.target = target
try:
functools.update_wrapper(self, target)
except:
pass
def __call__(self, elements):
f = []
for element in elements:
f.append(self.target([element])[0])
return f
def my_decorator_function(target):
#functools.wraps(target)
def inner(elements):
f = []
for element in elements:
f.append(target([element])[0])
return f
return inner
#my_decorator_function
def my_func(elements):
f = []
for element in elements:
f.append(sum(element))
return f
if __name__ == '__main__':
elements = [[random.randint(0, 9) for _ in range(5)] for _ in range(10)]
pool = multiprocessing.Pool(processes=4)
results = [pool.apply_async(my_func, ([e],)) for e in elements]
pool.close()
f = [r.get()[0] for r in results]
print(f)

The problem is that pickle needs to have some way to reassemble everything that you pickle. See here for a list of what can be pickled:
http://docs.python.org/library/pickle.html#what-can-be-pickled-and-unpickled
When pickling my_func, the following components need to be pickled:
An instance of my_decorator_class, called my_func.
This is fine. Pickle will store the name of the class and pickle its __dict__ contents. When unpickling, it uses the name to find the class, then creates an instance and fills in the __dict__ contents. However, the __dict__ contents present a problem...
The instance of the original my_func that's stored in my_func.target.
This isn't so good. It's a function at the top-level, and normally these can be pickled. Pickle will store the name of the function. The problem, however, is that the name "my_func" is no longer bound to the undecorated function, it's bound to the decorated function. This means that pickle won't be able to look up the undecorated function to recreate the object. Sadly, pickle doesn't have any way to know that object it's trying to pickle can always be found under the name __main__.my_func.
You can change it like this and it will work:
import random
import multiprocessing
import functools
class my_decorator(object):
def __init__(self, target):
self.target = target
try:
functools.update_wrapper(self, target)
except:
pass
def __call__(self, candidates, args):
f = []
for candidate in candidates:
f.append(self.target([candidate], args)[0])
return f
def old_my_func(candidates, args):
f = []
for c in candidates:
f.append(sum(c))
return f
my_func = my_decorator(old_my_func)
if __name__ == '__main__':
candidates = [[random.randint(0, 9) for _ in range(5)] for _ in range(10)]
pool = multiprocessing.Pool(processes=4)
results = [pool.apply_async(my_func, ([c], {})) for c in candidates]
pool.close()
f = [r.get()[0] for r in results]
print(f)
You have observed that the decorator function works when the class does not. I believe this is because functools.wraps modifies the decorated function so that it has the name and other properties of the function it wraps. As far as the pickle module can tell, it is indistinguishable from a normal top-level function, so it pickles it by storing its name. Upon unpickling, the name is bound to the decorated function so everything works out.

I also had some problem using decorators in multiprocessing. I'm not sure if it's the same problem as yours:
My code looked like this:
from multiprocessing import Pool
def decorate_func(f):
def _decorate_func(*args, **kwargs):
print "I'm decorating"
return f(*args, **kwargs)
return _decorate_func
#decorate_func
def actual_func(x):
return x ** 2
my_swimming_pool = Pool()
result = my_swimming_pool.apply_async(actual_func,(2,))
print result.get()
and when I run the code I get this:
Traceback (most recent call last):
File "test.py", line 15, in <module>
print result.get()
File "somedirectory_too_lengthy_to_put_here/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I fixed it by defining a new function to wrap the function in the decorator function, instead of using the decorator syntax
from multiprocessing import Pool
def decorate_func(f):
def _decorate_func(*args, **kwargs):
print "I'm decorating"
return f(*args, **kwargs)
return _decorate_func
def actual_func(x):
return x ** 2
def wrapped_func(*args, **kwargs):
return decorate_func(actual_func)(*args, **kwargs)
my_swimming_pool = Pool()
result = my_swimming_pool.apply_async(wrapped_func,(2,))
print result.get()
The code ran perfectly and I got:
I'm decorating
4
I'm not very experienced at Python, but this solution solved my problem for me

If you want the decorators too bad (like me), you can also use the exec() command on the function string, to circumvent the mentioned pickling.
I wanted to be able to pass all the arguments to an original function and then use them successively. The following is my code for it.
At first, I made a make_functext() function to convert the target function object to a string. For that, I used the getsource() function from the inspect module (see doctumentation here and note that it can't retrieve source code from compiled code etc.). Here it is:
from inspect import getsource
def make_functext(func):
ft = '\n'.join(getsource(func).split('\n')[1:]) # Removing the decorator, of course
ft = ft.replace(func.__name__, 'func') # Making function callable with 'func'
ft = ft.replace('#§ ', '').replace('#§', '') # For using commented code starting with '#§'
ft = ft.strip() # In case the function code was indented
return ft
It is used in the following _worker() function that will be the target of the processes:
def _worker(functext, args):
scope = {} # This is needed to keep executed definitions
exec(functext, scope)
scope['func'](args) # Using func from scope
And finally, here's my decorator:
from multiprocessing import Process
def parallel(num_processes, **kwargs):
def parallel_decorator(func, num_processes=num_processes):
functext = make_functext(func)
print('This is the parallelized function:\n', functext)
def function_wrapper(funcargs, num_processes=num_processes):
workers = []
print('Launching processes...')
for k in range(num_processes):
p = Process(target=_worker, args=(functext, funcargs[k])) # use args here
p.start()
workers.append(p)
return function_wrapper
return parallel_decorator
The code can finally be used by defining a function like this:
#parallel(4)
def hello(args):
#§ from time import sleep # use '#§' to avoid unnecessary (re)imports in main program
name, seconds = tuple(args) # unpack args-list here
sleep(seconds)
print('Hi', name)
... which can now be called like this:
hello([['Marty', 0.5],
['Catherine', 0.9],
['Tyler', 0.7],
['Pavel', 0.3]])
... which outputs:
This is the parallelized function:
def func(args):
from time import sleep
name, seconds = tuple(args)
sleep(seconds)
print('Hi', name)
Launching processes...
Hi Pavel
Hi Marty
Hi Tyler
Hi Catherine
Thanks for reading, this is my very first post. If you find any mistakes or bad practices, feel free to leave a comment. I know that these string conversions are quite dirty, though...

If you use this code for your decorator:
import multiprocessing
from types import MethodType
DEFAULT_POOL = []
def run_parallel(_func=None, *, name: str = None, context_pool: list = DEFAULT_POOL):
class RunParallel:
def __init__(self, func):
self.func = func
def __call__(self, *args, **kwargs):
process = multiprocessing.Process(target=self.func, name=name, args=args, kwargs=kwargs)
context_pool.append(process)
process.start()
def __get__(self, instance, owner):
return self if instance is None else MethodType(self, instance)
if _func is None:
return RunParallel
else:
return RunParallel(_func)
def wait_context(context_pool: list = DEFAULT_POOL, kill_others_if_one_fails: bool = False):
finished = []
for process in context_pool:
process.join()
finished.append(process)
if kill_others_if_one_fails and process.exitcode != 0:
break
if kill_others_if_one_fails:
# kill unfinished processes
for process in context_pool:
if process not in finished:
process.kill()
# wait for every process to be dead
for process in context_pool:
process.join()
Then you can use it like this, in these 4 examples:
#run_parallel
def m1(a, b="b"):
print(f"m1 -- {a=} {b=}")
#run_parallel(name="mym2", context_pool=DEFAULT_POOL)
def m2(d, cc="cc"):
print(f"m2 -- {d} {cc=}")
a = 1/0
class M:
#run_parallel
def c3(self, k, n="n"):
print(f"c3 -- {k=} {n=}")
#run_parallel(name="Mc4", context_pool=DEFAULT_POOL)
def c4(self, x, y="y"):
print(f"c4 -- {x=} {y=}")
if __name__ == "__main__":
m1(11)
m2(22)
M().c3(33)
M().c4(44)
wait_context(kill_others_if_one_fails=True)
The output will be:
m1 -- a=11 b='b'
m2 -- 22 cc='cc'
c3 -- k=33 n='n'
(followed by the exception raised in method m2)

Related

Call class in function during multiprocessing

I am trying to modify a existing big code to a multiprocessing way. I simplied the question.
Class A is a big external class which I cannot change. I would like to run the class in different cores so I use multiprocessing.Pool. But I got an error as AttributeError: 'int' object has no attribute 'fun' of the line return input_class.fun(x)
How can I fix the problem?
from multiprocessing import Pool
from functools import partial
class A(object):
def __init__(self,value):
self.value = value
def fun(self,x):
return self.value**x
def B(x,input_class):
return input_class.fun(x)
if __name__ == '__main__':
l = range(10)
p = Pool(4)
input_class = A(3)
input_function = partial(B,input_class)
op = p.map(input_function,l)
print(op)
p.close()
p.join()
Your usage of partial doesn't do what you want:
input_function = partial(B,input_class)
This is equivalent to input_function = lambda input_class__: B(input_class, input_class__). So, the first argument of the function is now equal to input_class, not the second one, and now calls to input_function will feed integers as the second argument to B, so you'll be getting <integer>.fun(<input_class>).
Change this to:
input_function = partial(B, input_class=input_class)

Multiprocessing pool: How to call an arbitrary list of methods on a list of class objects

A cleaned up version of the code including the solution to the problem (thanks #JohanL!) can be found as a Gist on GitHub.
The following code snipped (CPython 3.[4,5,6]) illustrates my intention (as well as my problem):
from functools import partial
import multiprocessing
from pprint import pprint as pp
NUM_CORES = multiprocessing.cpu_count()
class some_class:
some_dict = {'some_key': None, 'some_other_key': None}
def some_routine(self):
self.some_dict.update({'some_key': 'some_value'})
def some_other_routine(self):
self.some_dict.update({'some_other_key': 77})
def run_routines_on_objects_in_parallel_and_return(in_object_list, routine_list):
func_handle = partial(__run_routines_on_object_and_return__, routine_list)
with multiprocessing.Pool(processes = NUM_CORES) as p:
out_object_list = list(p.imap_unordered(
func_handle,
(in_object for in_object in in_object_list)
))
return out_object_list
def __run_routines_on_object_and_return__(routine_list, in_object):
for routine_name in routine_list:
getattr(in_object, routine_name)()
return in_object
object_list = [some_class() for item in range(20)]
pp([item.some_dict for item in object_list])
new_object_list = run_routines_on_objects_in_parallel_and_return(
object_list,
['some_routine', 'some_other_routine']
)
pp([item.some_dict for item in new_object_list])
verification_object_list = [
__run_routines_on_object_and_return__(
['some_routine', 'some_other_routine'],
item
) for item in object_list
]
pp([item.some_dict for item in verification_object_list])
I am working with a list of objects of type some_class. some_class has a property, a dictionary, named some_dict and a few methods, which can modify the dict (some_routine and some_other_routine). Sometimes, I want to call a sequence of methods on all the objects in the list. Because this is computationally intensive, I intend to distribute the objects over multiple CPU cores (using multiprocessing.Pool and imap_unordered - the list order does not matter).
The routine __run_routines_on_object_and_return__ takes care of calling the list of methods on one individual object. From what I can tell, this is working just fine. I am using functools.partial for simplifying the structure of the code a bit - the multiprocessing pool therefore has to handle the list of objects as an input parameter only.
The problem is ... it does not work. The objects contained in the list returned by imap_unordered are identical to the objects I fed into it. The dictionaries within the objects look just like before. I have used similar mechanisms for working on lists of dictionaries directly without a glitch, so I somehow suspect that there is something wrong with modifying an object property which happens to be a dictionary.
In my example, verification_object_list contains the correct result (though it is generated in a single process/thread). new_object_list is identical to object_list, which should not be the case.
What am I doing wrong?
EDIT
I found the following question, which has an actually working and applicable answer. I modified it a bit following my idea of calling a list of methods on every object and it works:
import random
from multiprocessing import Pool, Manager
class Tester(object):
def __init__(self, num=0.0, name='none'):
self.num = num
self.name = name
def modify_me(self):
self.num += random.normalvariate(mu=0, sigma=1)
self.name = 'pla' + str(int(self.num * 100))
def __repr__(self):
return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)
def init(L):
global tests
tests = L
def modify(i_t_nn):
i, t, nn = i_t_nn
for method_name in nn:
getattr(t, method_name)()
tests[i] = t # copy back
return i
def main():
num_processes = num = 10 #note: num_processes and num may differ
manager = Manager()
tests = manager.list([Tester(num=i) for i in range(num)])
print(tests[:2])
args = ((i, t, ['modify_me']) for i, t in enumerate(tests))
pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
for i in pool.imap_unordered(modify, args):
print("done %d" % i)
pool.close()
pool.join()
print(tests[:2])
if __name__ == '__main__':
main()
Now, I went a bit further and introduced my original some_class into the game, which contains a the described dictionary property some_dict. It does NOT work:
import random
from multiprocessing import Pool, Manager
from pprint import pformat as pf
class some_class:
some_dict = {'some_key': None, 'some_other_key': None}
def some_routine(self):
self.some_dict.update({'some_key': 'some_value'})
def some_other_routine(self):
self.some_dict.update({'some_other_key': 77})
def __repr__(self):
return pf(self.some_dict)
def init(L):
global tests
tests = L
def modify(i_t_nn):
i, t, nn = i_t_nn
for method_name in nn:
getattr(t, method_name)()
tests[i] = t # copy back
return i
def main():
num_processes = num = 10 #note: num_processes and num may differ
manager = Manager()
tests = manager.list([some_class() for i in range(num)])
print(tests[:2])
args = ((i, t, ['some_routine', 'some_other_routine']) for i, t in enumerate(tests))
pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
for i in pool.imap_unordered(modify, args):
print("done %d" % i)
pool.close()
pool.join()
print(tests[:2])
if __name__ == '__main__':
main()
The diff between working and not working is really small, but I still do not get it:
diff --git a/test.py b/test.py
index b12eb56..0aa6def 100644
--- a/test.py
+++ b/test.py
## -1,15 +1,15 ##
import random
from multiprocessing import Pool, Manager
+from pprint import pformat as pf
-class Tester(object):
- def __init__(self, num=0.0, name='none'):
- self.num = num
- self.name = name
- def modify_me(self):
- self.num += random.normalvariate(mu=0, sigma=1)
- self.name = 'pla' + str(int(self.num * 100))
+class some_class:
+ some_dict = {'some_key': None, 'some_other_key': None}
+ def some_routine(self):
+ self.some_dict.update({'some_key': 'some_value'})
+ def some_other_routine(self):
+ self.some_dict.update({'some_other_key': 77})
def __repr__(self):
- return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)
+ return pf(self.some_dict)
def init(L):
global tests
## -25,10 +25,10 ## def modify(i_t_nn):
def main():
num_processes = num = 10 #note: num_processes and num may differ
manager = Manager()
- tests = manager.list([Tester(num=i) for i in range(num)])
+ tests = manager.list([some_class() for i in range(num)])
print(tests[:2])
- args = ((i, t, ['modify_me']) for i, t in enumerate(tests))
+ args = ((i, t, ['some_routine', 'some_other_routine']) for i, t in enumerate(tests))
What is happening here?
Your problem is due to two things; namely that you are using a class variable and that you are running your code in different processes.
Since different processes do not share memory, all objects and parameters must be pickled and sent from the original process to the process that executes it. When the parameter is an object, its class is not sent with it. Instead the receiving process uses its own blueprint (i.e. class).
In your current code, you pass the object as a parameter, update it and return it. However, the updates are not made to the object, but rather to the class itself, since you are updating a class variable. However, this update is not sent back to your main process, and therefore you are left with your not updated class.
What you want to do, is to make some_dict a part of your object, rather than of your class. This is easily done by an __init__() method. Thus modify some_class as:
class some_class:
def __init__(self):
self.some_dict = {'some_key': None, 'some_other_key': None}
def some_routine(self):
self.some_dict.update({'some_key': 'some_value'})
def some_other_routine(self):
self.some_dict.update({'some_other_key': 77})
This will make your program work as you intend it to. You almost always want to setup your object in an __init__() call, rather than as class variables, since in the latter case the data will be shared between all instances (and can be updated by all). That is not normally what you want, when you encapsulate data and state in an object of a class.
EDIT: It seems I was mistaken in whether the class is sent with the pickled object. After further inspection of what happens, I think also the class itself, with its class variables are pickled. Since, if the class variable is updated before sending the object to the new process, the updated value is available. However it is still the case that the updates done in the new process are not relayed back to the original class.

Python multiprocessing apply_async never returns result on Windows 7

I am trying to follow a very simple multiprocessing example:
import multiprocessing as mp
def cube(x):
return x**3
pool = mp.Pool(processes=2)
results = [pool.apply_async(cube, args=x) for x in range(1,7)]
However, on my windows machine, I am not able to get the result (on ubuntu 12.04LTS it runs perfectly).
If I inspect results, I see the following:
[<multiprocessing.pool.ApplyResult object at 0x01FF0910>,
<multiprocessing.pool.ApplyResult object at 0x01FF0950>,
<multiprocessing.pool.ApplyResult object at 0x01FF0990>,
<multiprocessing.pool.ApplyResult object at 0x01FF09D0>,
<multiprocessing.pool.ApplyResult object at 0x01FF0A10>,
<multiprocessing.pool.ApplyResult object at 0x01FF0A50>]
If I run results[0].ready() I always get False.
If I run results[0].get() the python interpreter freezes, waiting to get the result that never comes.
The example is as simple as it gets, so I am thinking this is a low level bug relating to the OS (I am on Windows 7). But perhaps someone else has a better idea?
There are a couple of mistakes here. First, you must declare the Pool inside an if __name__ == "__main__": guard when running on Windows. Second, you have to pass the args keyword argument a sequence, even if you're only passing one argument. So putting that together:
import multiprocessing as mp
def cube(x):
return x**3
if __name__ == "__main__":
pool = mp.Pool(processes=2)
results = [pool.apply_async(cube, args=(x,)) for x in range(1,7)]
print([result.get() for result in results])
Output:
[1, 8, 27, 64, 125, 216]
Edit:
Oh, as moarningsun mentions, multiprocessing does not work well in the interactive interpreter:
Note
Functionality within this package requires that the __main__ module be
importable by the children. This is covered in Programming guidelines
however it is worth pointing out here. This means that some examples,
such as the multiprocessing.Pool examples will not work in the
interactive interpreter.
So you'll need to actually execute the code as a script to test it properly.
I was running python 3 and the IDE was spyder in anaconda (windows ) and so this trick doesn't work for me. I tried a lot but couldn't make any difference. I got the reason for my problem and is the same listed by dano in his note. But after a long day of searching I got some solution and it helped me to run the same code my windows machine. This website helped me to get the solution:
http://python.6.x6.nabble.com/Multiprocessing-Pool-woes-td5047050.html
Since I was using the python 3, I changed the program a little like this:
from types import FunctionType
import marshal
def _applicable(*args, **kwargs):
name = kwargs['__pw_name']
code = marshal.loads(kwargs['__pw_code'])
gbls = globals() #gbls = marshal.loads(kwargs['__pw_gbls'])
defs = marshal.loads(kwargs['__pw_defs'])
clsr = marshal.loads(kwargs['__pw_clsr'])
fdct = marshal.loads(kwargs['__pw_fdct'])
func = FunctionType(code, gbls, name, defs, clsr)
func.fdct = fdct
del kwargs['__pw_name']
del kwargs['__pw_code']
del kwargs['__pw_defs']
del kwargs['__pw_clsr']
del kwargs['__pw_fdct']
return func(*args, **kwargs)
def make_applicable(f, *args, **kwargs):
if not isinstance(f, FunctionType): raise ValueError('argument must be a function')
kwargs['__pw_name'] = f.__name__ # edited
kwargs['__pw_code'] = marshal.dumps(f.__code__) # edited
kwargs['__pw_defs'] = marshal.dumps(f.__defaults__) # edited
kwargs['__pw_clsr'] = marshal.dumps(f.__closure__) # edited
kwargs['__pw_fdct'] = marshal.dumps(f.__dict__) # edited
return _applicable, args, kwargs
def _mappable(x):
x,name,code,defs,clsr,fdct = x
code = marshal.loads(code)
gbls = globals() #gbls = marshal.loads(gbls)
defs = marshal.loads(defs)
clsr = marshal.loads(clsr)
fdct = marshal.loads(fdct)
func = FunctionType(code, gbls, name, defs, clsr)
func.fdct = fdct
return func(x)
def make_mappable(f, iterable):
if not isinstance(f, FunctionType): raise ValueError('argument must be a function')
name = f.__name__ # edited
code = marshal.dumps(f.__code__) # edited
defs = marshal.dumps(f.__defaults__) # edited
clsr = marshal.dumps(f.__closure__) # edited
fdct = marshal.dumps(f.__dict__) # edited
return _mappable, ((i,name,code,defs,clsr,fdct) for i in iterable)
After this function , the above problem code is also changed a little like this:
from multiprocessing import Pool
from poolable import make_applicable, make_mappable
def cube(x):
return x**3
if __name__ == "__main__":
pool = Pool(processes=2)
results = [pool.apply_async(*make_applicable(cube,x)) for x in range(1,7)]
print([result.get(timeout=10) for result in results])
And I got the output as :
[1, 8, 27, 64, 125, 216]
I am thinking that this post may be useful for some of the windows users.

How to pass a list of parameters to a function in Python

I warped a class in this way:
import Queue
import threading
class MyThread():
q = Queue.Queue()
content = []
result = {}
t_num = 0
t_func = None
def __init__ (self, t_num, content, t_func):
for item in content:
self.q.put(item)
self.t_num = t_num
self.t_func = t_func
def start(self):
for i in range(self.t_num):
t = threading.Thread(target=self.worker)
t.daemon = True
t.start()
self.q.join()
return self.result
def worker(self):
while True:
item = self.q.get()
value = self.t_func(item)
self.result[item] = value
self.q.task_done()
x = [5, 6, 7, 8, 9]
def func(i):
return i + 1
m = MyThread(4, x, func)
print m.start()
It works well. If I design the function func with 2 or more parameters, and pass these parameters in a list to the class, how can I call the func function in the function worker properly?
eg.
def __init__ (self, t_num, content, t_func, t_func_p):
for item in content:
self.q.put(item)
self.t_num = t_num
self.t_func = t_func
self.t_func_p = t_func_p
def func(i, j, k):
m = MyThread(4, x, func, [j, k])
You need to use *args and **kwargs to pass any number of parameters to a function.
Here is more info: http://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/
Maybe this might help:
def __init__(self, t_num, content, func, *params):
func(*params) # params is a list here [param1, param2, param3....]
def func(param1, param2, param3):
# or
def func(*params): # for arbitrary number of params
m = MyThread(4, x, func, param1, param2, param3....)
As a general rule, if you are going to be passing many parameters to a particular function, you may consider wrapping them into a simple object, the reasons are
If you ever need to add/remove parameters, you just need to modify the object, and the function itself, the method signature (and all its references) will remain untouched
When working with objects, you will always know what your function is receiving (this is specially useful if you are working on a team, where more people will use that function).
Finally, because you control the creation of the object on its constructor, you can ensure that the values associated with the object are correct (for example, in the constructor you can make sure that you have no empty values, or that the types are correct).
If still you want to go with multiple parameters, check the *args and **kwargs, although I personally do not like that, as it may end up forcing people to read the function's source in order to use it.
Good luck :)

Printing names of variables passed to a function

In some circumstances, I want to print debug-style output like this:
# module test.py
def f()
a = 5
b = 8
debug(a, b) # line 18
I want the debug function to print the following:
debug info at test.py: 18
function f
a = 5
b = 8
I am thinking it should be possible by using inspect module to locate the stack frame, then finding the appropriate line, looking up the source code in that line, getting the names of the arguments from there. The function name can be obtained by moving one stack frame up. (The values of the arguments is easy to obtain: they are passed directly to the function debug.)
Am I on the right track? Is there any recipe I can refer to?
You could do something along the following lines:
import inspect
def debug(**kwargs):
st = inspect.stack()[1]
print '%s:%d %s()' % (st[1], st[2], st[3])
for k, v in kwargs.items():
print '%s = %s' % (k, v)
def f():
a = 5
b = 8
debug(a=a, b=b) # line 12
f()
This prints out:
test.py:12 f()
a = 5
b = 8
You're generally doing it right, though it would be easier to use AOP for this kinds of tasks. Basically, instead of calling "debug" every time with every variable, you could just decorate the code with aspects which do certain things upon certain events, like upon entering the function to print passed variables and it's name.
Please refer to this site and old so post for more info.
Yeah, you are in the correct track. You may want to look at inspect.getargspec which would return a named tuple of args, varargs, keywords, defaults passed to the function.
import inspect
def f():
a = 5
b = 8
debug(a, b)
def debug(a, b):
print inspect.getargspec(debug)
f()
This is really tricky. Let me try and give a more complete answer reusing this code, and the hint about getargspec in Senthil's answer which got me triggered somehow. Btw, getargspec is deprecated in Python 3.0 and getfullarcspec should be used instead.
This works for me on a Python 3.1.2 both with explicitly calling the debug function and with using a decorator:
# from: https://stackoverflow.com/a/4493322/923794
def getfunc(func=None, uplevel=0):
"""Return tuple of information about a function
Go's up in the call stack to uplevel+1 and returns information
about the function found.
The tuple contains
name of function, function object, it's frame object,
filename and line number"""
from inspect import currentframe, getouterframes, getframeinfo
#for (level, frame) in enumerate(getouterframes(currentframe())):
# print(str(level) + ' frame: ' + str(frame))
caller = getouterframes(currentframe())[1+uplevel]
# caller is tuple of:
# frame object, filename, line number, function
# name, a list of lines of context, and index within the context
func_name = caller[3]
frame = caller[0]
from pprint import pprint
if func:
func_name = func.__name__
else:
func = frame.f_locals.get(func_name, frame.f_globals.get(func_name))
return (func_name, func, frame, caller[1], caller[2])
def debug_prt_func_args(f=None):
"""Print function name and argument with their values"""
from inspect import getargvalues, getfullargspec
(func_name, func, frame, file, line) = getfunc(func=f, uplevel=1)
argspec = getfullargspec(func)
#print(argspec)
argvals = getargvalues(frame)
print("debug info at " + file + ': ' + str(line))
print(func_name + ':' + str(argvals)) ## reformat to pretty print arg values here
return func_name
def df_dbg_prt_func_args(f):
"""Decorator: dpg_prt_func_args - Prints function name and arguments
"""
def wrapped(*args, **kwargs):
debug_prt_func_args(f)
return f(*args, **kwargs)
return wrapped
Usage:
#df_dbg_prt_func_args
def leaf_decor(*args, **kwargs):
"""Leaf level, simple function"""
print("in leaf")
def leaf_explicit(*args, **kwargs):
"""Leaf level, simple function"""
debug_prt_func_args()
print("in leaf")
def complex():
"""A complex function"""
print("start complex")
leaf_decor(3,4)
print("middle complex")
leaf_explicit(12,45)
print("end complex")
complex()
and prints:
start complex
debug info at debug.py: 54
leaf_decor:ArgInfo(args=[], varargs='args', keywords='kwargs', locals={'args': (3, 4), 'f': <function leaf_decor at 0x2aaaac048d98>, 'kwargs': {}})
in leaf
middle complex
debug info at debug.py: 67
leaf_explicit:ArgInfo(args=[], varargs='args', keywords='kwargs', locals={'args': (12, 45), 'kwargs': {}})
in leaf
end complex
The decorator cheats a bit: Since in wrapped we get the same arguments as the function itself it doesn't matter that we find and report the ArgSpec of wrapped in getfunc and debug_prt_func_args. This code could be beautified a bit, but it works alright now for the simple debug testcases I used.
Another trick you can do: If you uncomment the for-loop in getfunc you can see that inspect can give you the "context" which really is the line of source code where a function got called. This code is obviously not showing the content of any variable given to your function, but sometimes it already helps to know the variable name used one level above your called function.
As you can see, with the decorator you don't have to change the code inside the function.
Probably you'll want to pretty print the args. I've left the raw print (and also a commented out print statement) in the function so it's easier to play around with.

Categories