Why does the Python pickling library complain about a class member that doesn't exist?

I have the following simple class definition:
def apmSimUp(i):
    return APMSim(i)
def simDown(sim):
    sim.close()
class APMSimFixture(TestCase):
    def setUp(self):
        self.pool = multiprocessing.Pool()
        self.sims = self.pool.map(
            apmSimUp,
            range(numCores)
        )
    def tearDown(self):
        self.pool.map(
            simDown,
            self.sims
        )
The APMSim class holds only plain Python primitive types (strings, lists, etc.); the only exception is a static member, which is a multiprocessing manager.list.
However, when I try to run this test, I get the following error:
Error
Traceback (most recent call last):
  File "/home/peng/git/datapassport/spookystuff/mav/pyspookystuff_test/mav/__init__.py", line 77, in setUp
    range(numCores)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
MaybeEncodingError: Error sending result: '[<pyspookystuff.mav.sim.APMSim object at 0x7f643c4ca8d0>]'. Reason: 'TypeError("can't pickle thread.lock objects",)'
This is strange, because thread.lock does not appear anywhere in my code. I strictly avoid any multithreading component (as you can see, only multiprocessing is used), and nothing of the kind exists in my class except as a static member. What should I do to make this class picklable?
BTW, is there a way to exclude a black-sheep member from pickling, like Java's @transient annotation?
Thanks a lot for any help!
UPDATE: The following is my full APMSim class; please tell me if you see anything that violates its picklability:
usedINums = mav.manager.list()
class APMSim(object):
    global usedINums
    @staticmethod
    def nextINum():
        port = mav.nextUnused(usedINums, range(0, 254))
        return port
    def __init__(self, iNum):
        # type: (int) -> None
        self.iNum = iNum
        self.args = sitl_args + ['-I' + str(iNum)]
    @staticmethod
    def create():
        index = APMSim.nextINum()
        try:
            result = APMSim(index)
            return result
        except Exception as ee:
            usedINums.remove(index)
            raise
    @lazy
    def _sitl(self):
        sitl = SITL()
        sitl.download('copter', '3.3')
        sitl.launch(self.args, await_ready=True, restart=True)
        print("launching .... ", sitl.p.pid)
        return sitl
    @lazy
    def sitl(self):
        self.setParamAndRelaunch('SYSID_THISMAV', self.iNum + 1)
        return self._sitl
    def _getConnStr(self):
        return tcp_master(self.iNum)
    @lazy
    def connStr(self):
        self.sitl
        return self._getConnStr()
    def setParamAndRelaunch(self, key, value):
        wd = self._sitl.wd
        print("relaunching .... ", self._sitl.p.pid)
        v = connect(self._getConnStr(), wait_ready=True) # if use connStr will trigger cyclic invocation
        v.parameters.set(key, value, wait_ready=True)
        v.close()
        self._sitl.stop()
        self._sitl.launch(self.args, await_ready=True, restart=True, wd=wd, use_saved_data=True)
        v = connect(self._getConnStr(), wait_ready=True)
        # This fn actually rate limits itself to every 2s.
        # Just retry with persistence to get our first param stream.
        v._master.param_fetch_all()
        v.wait_ready()
        actualValue = v._params_map[key]
        assert actualValue == value
        v.close()
    def close(self):
        self._sitl.stop()
        usedINums.remove(self.iNum)
The lazy decorator is from this library:
https://docs.python.org/2/tutorial/classes.html#generator-expressions

It would help to see how your class looks, but if it holds objects from multiprocessing, pickling it by default may fail. Multiprocessing objects can use locks internally, and locks cannot be pickled.
You can customize pickling with the __getstate__ method, or __reduce__ (documented in the same place).
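As a minimal sketch of the __getstate__ route, assuming the unpicklable member can simply be rebuilt after unpickling: the Sim class and its _lock attribute below are hypothetical stand-ins, not the asker's APMSim. This is also the closest Python equivalent to a "transient" field that I know of.

```python
import pickle
import threading


class Sim(object):
    """Hypothetical stand-in for a class that holds one unpicklable member."""

    def __init__(self, i_num):
        self.i_num = i_num
        self._lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and leave the lock out of the pickle.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then rebuild the excluded member.
        self.__dict__.update(state)
        self._lock = threading.Lock()


restored = pickle.loads(pickle.dumps(Sim(3)))
print(restored.i_num)
```

Without the two methods, pickle.dumps(Sim(3)) raises a TypeError about the lock, much like the error in the question.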

Related

Is it really impossible to unpickle a Python class if the original python file has been deleted?

Suppose you have the following:
file = 'hey.py'
import pickle
class hey:
    def __init__(self):
        self.you = 1
ins = hey()
temp = open("cool_class", "wb")
pickle.dump(ins, temp)
temp.close()
Now suppose you delete the file hey.py and you run the following code:
pkl_file = open("cool_class", 'rb')
obj = pickle.load(pkl_file)
pkl_file.close()
You'll get an error. I understand that without hey.py, and the class defined at its top level, pickle cannot load the object. But surely I can find out what the attributes of the serialized class are, reconstruct the deleted file from them, and then load the pickle. I have pickles that are two years old, and I have deleted the file I used to construct them; I just need to find out what the attributes of those classes are so that I can reopen these pickles.
#####UPDATE
I know from the error messages the name of the module that originally contained the old class; let's call it 'hey.py'. I also know the name of the class; let's call it 'you'. But even after recreating the module and building a class called 'you', I still can't get the pickle to open. So I wrote this code in the hey.py module:
class hey:
    def __init__(self):
        self.hey = 1
    def __setstate__(self):
        self.__dict__ = ''
        self.you = 1
But I get the error message: TypeError: __init__() takes 1 positional argument but 2 were given
#########UPDATE 2:
I changed the code from
class hey:
to
class hey():
I then got an AttributeError, but it doesn't tell me which attribute is missing. I then ran
obj = pickletools.dis(file)
and got an error in pickletools.py, here:
def _genops(data, yield_end_pos=False):
    if isinstance(data, bytes_types):
        data = io.BytesIO(data)
    if hasattr(data, "tell"):
        getpos = data.tell
    else:
        getpos = lambda: None
    while True:
        pos = getpos()
        code = data.read(1)
        opcode = code2op.get(code.decode("latin-1"))
        if opcode is None:
            if code == b"":
                raise ValueError("pickle exhausted before seeing STOP")
            else:
                raise ValueError("at position %s, opcode %r unknown" % (
                                 "<unknown>" if pos is None else pos,
                                 code))
        if opcode.arg is None:
            arg = None
        else:
            arg = opcode.arg.reader(data)
        if yield_end_pos:
            yield opcode, arg, pos, getpos()
        else:
            yield opcode, arg, pos
        if code == b'.':
            assert opcode.name == 'STOP'
            break
At this line:
code = data.read(1)
it says: AttributeError: 'str' object has no attribute 'read'
I will now try the other functions in pickletools.
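The AttributeError above is what I would expect when pickletools is handed a file name (a str) rather than bytes or an open binary file. A small sketch of the intended usage follows; it pickles a standard-library Fraction purely as an illustration (my choice, not from the question) and uses protocol 2, where each class reference is written as a GLOBAL opcode. Newer protocols record classes with STACK_GLOBAL instead, so the filter below would need adjusting for those.

```python
import io
import pickle
import pickletools
from fractions import Fraction

# A small pickle to inspect; protocol 2 records classes with the GLOBAL opcode.
data = pickle.dumps(Fraction(1, 3), protocol=2)

# genops() wants bytes or a binary file object, not a path string.
referenced = [arg for opcode, arg, pos in pickletools.genops(data)
              if opcode.name == "GLOBAL"]
print(referenced)

# dis() writes a readable listing; here it is captured instead of printed.
listing = io.StringIO()
pickletools.dis(data, out=listing)
```

For a pickle on disk, the same calls should work on the result of open(path, 'rb').read(). The module and class names found this way tell you which module file and which class to recreate.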
########### UPDATE 3
I wanted to see what happens when I save an object composed mostly of dictionaries, where some of the values in the dictionaries are class instances. This is the class in question:
class fss(frozenset):
    def __init__(self, *args, **kwargs):
        super(frozenset, self).__init__()
    def __str__(self):
        str1 = lbr + "{}" + rbr
        return str1.format(','.join(str(x) for x in self))
Now keep in mind that the object pickled is mostly a dictionary and that class exists within the dictionary. After performing
obj= pickletools.genops(file)
I get the following output:
[two screenshots of the genops output; images not preserved]
I don't see how I would be able to construct the class referred to with that data if I hadn't known what the class was.
############### UPDATE #4
@AKX
Thanks for helping me out. I can see how your code works, but my pickled file was saved two years ago, and its module and class have long since been deleted; I cannot open it as a bytes-like object, which seems to be required.
So the path of the file is
file ='hey.pkl'
pkl_file = open(file, 'rb')
x = MagicUnpickler(io.BytesIO(pkl_file)).load()
This returns the error:
TypeError: a bytes-like object is required, not '_io.BufferedReader'
But I thought the object was a bytes object since I opened it with open(file, 'rb')
############ UPDATE #5
Actually, I think with AKX's help I've solved the problem.
So using the code:
pkl_file = open(name, 'rb')
x = MagicUnpickler(pkl_file).load()
I then created two blank modules with the names of the modules that once contained the classes found in the saved pickle, but I did not have to put the classes in them. I was getting an error in pickle.py here:
def load_reduce(self):
    stack = self.stack
    args = stack.pop()
    func = stack[-1]
    try:
        stack[-1] = func(*args)
    except TypeError:
        pass
dispatch[REDUCE[0]] = load_reduce
So after catching that error, everything worked. I really want to thank AKX for helping me out. I have actually been trying to solve this problem for about 5 years, because I use pickles far more often than most programmers. I used to not understand that altering a class ruins any pickled files saved with that class, so I ran into this problem again and again. Now that I'm going back over some code that is 2 years old, and it looks like some of the files were deleted, I'm going to need this code a lot in the future. So I really appreciate your help in getting this problem solved.

Well, with a bit of hacking and magic, sure, you can hydrate missing classes, but I'm not guaranteeing this will work for all pickle data you may encounter; for one, this doesn't touch the __setstate__/__reduce__ protocols, so I don't know if they work.
Given a script file (so72863050.py in my case):
import io
import pickle
import types
from logging import Formatter
# Create a couple empty classes. Could've just used `class C1`,
# but we're coming back to this syntax later.
C1 = type('C1', (), {})
C2 = type('C2', (), {})
# Create an instance or two, add some data...
inst = C1()
inst.child1 = C2()
inst.child1.magic = 42
inst.child2 = C2()
inst.child2.mystery = 'spooky'
inst.child2.log_formatter = Formatter('heyyyy %(message)s') # To prove we can unpickle regular classes still
inst.other_data = 'hello'
inst.some_dict = {'a': 1, 'b': 2}
# Pickle the data!
pickle_bytes = pickle.dumps(inst)
# Let's erase our memory of these two classes:
del C1
del C2
try:
    print(pickle.loads(pickle_bytes))
except Exception as exc:
    pass  # Can't get attribute 'C1' on <module '__main__'> – yep, it certainly isn't there!
We now have successfully created some pickle data that we can't load anymore, since we forgot about those two classes. Now, since the unpickling mechanism is customizable, we can derive a magic unpickler that, in the face of certain defeat (or at least an AttributeError), synthesizes a simple class from thin air:
# Could derive from Unpickler, but that may be a C class, so our tracebacks would be less helpful
class MagicUnpickler(pickle._Unpickler):
    def __init__(self, fp):
        super().__init__(fp)
        self._magic_classes = {}
    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except AttributeError:
            return self._create_magic_class(module, name)
    def _create_magic_class(self, module, name):
        cache_key = (module, name)
        if cache_key not in self._magic_classes:
            cls = type(f'<<Emulated Class {module}:{name}>>', (types.SimpleNamespace,), {})
            self._magic_classes[cache_key] = cls
        return self._magic_classes[cache_key]
Now, when we run that magic unpickler against a stream from the aforebuilt pickle_bytes that plain ol' pickle.loads() couldn't load...
x = MagicUnpickler(io.BytesIO(pickle_bytes)).load()
print(x)
print(x.child1.magic)
print(x.child2.mystery)
print(x.child2.log_formatter._style._fmt)
prints out
<<Emulated Class __main__:C1>>(child1=<<Emulated Class __main__:C2>>(magic=42), child2=<<Emulated Class __main__:C2>>(mystery='spooky'), other_data='hello', some_dict={'a': 1, 'b': 2})
42
spooky
heyyyy %(message)s
Hey, magic!

The error in function load_reduce(self) can be re-created by:
class Y(set):
    pass
pickle_bytes = io.BytesIO(pickle.dumps(Y([2, 3, 4, 5])))
del Y
print(MagicUnpickler(pickle_bytes).load())
AKX's answer does not handle cases where the class inherits from built-in base classes such as set, dict, or list.
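One possible way around this without editing pickle.py, offered as a sketch rather than a tested general solution: give the synthesized class an __init__ that tolerates the positional arguments the REDUCE step passes in. EmulatedBase, TolerantUnpickler and the reduce_args attribute are names invented here for illustration; the original contents of the set end up in reduce_args instead of in a real set.

```python
import io
import pickle
import types


class EmulatedBase(types.SimpleNamespace):
    """Hypothetical base for stand-in classes; tolerates positional arguments."""

    def __init__(self, *args, **kwargs):
        super().__init__(**kwargs)
        if args:
            # Keep whatever the REDUCE opcode passed, e.g. the items of a set.
            self.reduce_args = args


class TolerantUnpickler(pickle._Unpickler):
    def __init__(self, fp):
        super().__init__(fp)
        self._magic_classes = {}

    def find_class(self, module, name):
        try:
            return super().find_class(module, name)
        except AttributeError:
            key = (module, name)
            if key not in self._magic_classes:
                self._magic_classes[key] = type(name, (EmulatedBase,), {})
            return self._magic_classes[key]


class Y(set):
    pass


pickle_bytes = pickle.dumps(Y([2, 3, 4, 5]))
del Y  # forget the class, as in the re-creation above
restored = TolerantUnpickler(io.BytesIO(pickle_bytes)).load()
print(sorted(restored.reduce_args[0]))
```

This only covers classes whose constructor arguments carry the data; classes that rely on custom __setstate__ or __reduce__ logic may still need manual handling.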

Understanding class variable behavior

We came across the need to have a dynamic class variable in the following code in python 2.
from datetime import datetime
from retrying import retry
class TestClass(object):
    SOME_VARIABLE = None
    def __init__(self, some_arg=None):
        self.some_arg = some_arg
    @retry(retry_on_exception=lambda e: isinstance(e, EnvironmentError), wait_fixed=3000 if SOME_VARIABLE == "NEEDED" else 1000, stop_max_attempt_number=3)
    def some_func(self):
        print("Running {} at {}".format(self.some_arg, datetime.now()))
        if self.some_arg != "something needed":
            raise EnvironmentError("Unexpected value")
TestClass.SOME_VARIABLE = "NEEDED"
x = TestClass()
x.some_func()
Output:
Running None at 2021-07-26 19:40:22.374736
Running None at 2021-07-26 19:40:23.376027
Running None at 2021-07-26 19:40:24.377523
Traceback (most recent call last):
  File "/home/raj/tmp/test_test.py", line 19, in <module>
    x.some_func()
  File "/home/raj/.local/share/virtualenvs/test-DzpjW1fZ/lib/python2.7/site-packages/retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "/home/raj/.local/share/virtualenvs/test-DzpjW1fZ/lib/python2.7/site-packages/retrying.py", line 212, in call
    raise attempt.get()
  File "/home/raj/.local/share/virtualenvs/test-DzpjW1fZ/lib/python2.7/site-packages/retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "/home/raj/.local/share/virtualenvs/test-DzpjW1fZ/lib/python2.7/site-packages/retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "/home/raj/tmp/test_test.py", line 14, in some_func
    raise EnvironmentError("Unexpected value")
EnvironmentError: Unexpected value
The retries are one second apart, so the updated value of SOME_VARIABLE is not being picked up.
Is there a way to update SOME_VARIABLE dynamically? The use case is to have dynamic timings in the retry function based on the value of SOME_VARIABLE at runtime.

Your class definition is equivalent, based on the definition of decorator syntax, to
class TestClass(object):
    SOME_VARIABLE = None
    def __init__(self, some_arg=None):
        self.some_arg = some_arg
    decorator = retry(retry_on_exception=lambda e: isinstance(e, EnvironmentError),
                      wait_fixed=3000 if SOME_VARIABLE == "NEEDED" else 1000,
                      stop_max_attempt_number=3)
    def some_func(self):
        ...
    some_func = decorator(some_func)
Note that retry is called long before you change the value of TestClass.SOME_VARIABLE (indeed, before the class object that will be bound to TestClass even exists), so the comparison SOME_VARIABLE == "NEEDED" is evaluated when SOME_VARIABLE still equals None.
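The timing can be seen without the retrying library. In the sketch below, configurable is a hypothetical decorator factory standing in for retry; it records the argument it receives so that the moment of evaluation is visible.

```python
calls = []


def configurable(wait):
    # Runs once, while the class body is being executed.
    calls.append(wait)

    def decorator(func):
        def wrapper(*args, **kwargs):
            return wait, func(*args, **kwargs)
        return wrapper
    return decorator


class Demo(object):
    SETTING = None

    # The conditional is evaluated here, when SETTING is still None.
    @configurable(wait=3000 if SETTING == "NEEDED" else 1000)
    def run(self):
        return "ran"


Demo.SETTING = "NEEDED"  # too late: the decorator was already built
result = Demo().run()
print(result, calls)
```

The wait value stays at 1000 even though SETTING is changed before the method is called.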
To have the retry behavior configured at run-time, try something like
class TestClass(object):
    SOME_VARIABLE = None
    def __init__(self, some_arg=None):
        self.some_arg = some_arg
    def _some_func(self):
        print("Running {} at {}".format(self.some_arg, datetime.now()))
        if self.some_arg != "something needed":
            raise EnvironmentError("Unexpected value")
    def some_func(self):
        wait = 3000 if self.SOME_VARIABLE == "NEEDED" else 1000
        impl = retry(retry_on_exception=lambda e: isinstance(e, EnvironmentError),
                     wait_fixed=wait,
                     stop_max_attempt_number=3)(self._some_func)
        return impl()
some_func becomes a function that, at runtime, creates a function (based on the private _some_func) with the appropriate retry behavior, then calls it.
(Not tested; I may have gotten the interaction between the bound method self._some_func and retry wrong.)

Faking a traceback in Python

I'm writing a test runner. I have an object that can catch and store exceptions, which will be formatted as a string later as part of the test failure report. I'm trying to unit-test the procedure that formats the exception.
In my test setup, I don't want to actually throw an exception for my object to catch, mainly because it means that the traceback won't be predictable. (If the file changes length, the line numbers in the traceback will change.)
How can I attach a fake traceback to an exception, so that I can make assertions about the way it's formatted? Is this even possible? I'm using Python 3.3.
Simplified example:
class ExceptionCatcher(object):
    def __init__(self, function_to_try):
        self.f = function_to_try
        self.exception = None
    def try_run(self):
        try:
            self.f()
        except Exception as e:
            self.exception = e
def format_exception_catcher(catcher):
    pass
    # No implementation yet - I'm doing TDD.
    # This'll probably use the 'traceback' module to stringify catcher.exception
class TestFormattingExceptions(unittest.TestCase):
    def test_formatting(self):
        catcher = ExceptionCatcher(None)
        catcher.exception = ValueError("Oh no")
        # do something to catcher.exception so that it has a traceback?
        output_str = format_exception_catcher(catcher)
        self.assertEquals(output_str,
"""Traceback (most recent call last):
  File "nonexistent_file.py", line 100, in nonexistent_function
    raise ValueError("Oh no")
ValueError: Oh no
""")

Reading the source of traceback.py pointed me in the right direction. Here's my hacky solution, which involves faking the frame and code objects which the traceback would normally hold references to.
import traceback
class FakeCode(object):
    def __init__(self, co_filename, co_name):
        self.co_filename = co_filename
        self.co_name = co_name
class FakeFrame(object):
    def __init__(self, f_code, f_globals):
        self.f_code = f_code
        self.f_globals = f_globals
class FakeTraceback(object):
    def __init__(self, frames, line_nums):
        if len(frames) != len(line_nums):
            raise ValueError("Ya messed up!")
        self._frames = frames
        self._line_nums = line_nums
        self.tb_frame = frames[0]
        self.tb_lineno = line_nums[0]
    @property
    def tb_next(self):
        if len(self._frames) > 1:
            return FakeTraceback(self._frames[1:], self._line_nums[1:])
class FakeException(Exception):
    def __init__(self, *args, **kwargs):
        self._tb = None
        super().__init__(*args, **kwargs)
    @property
    def __traceback__(self):
        return self._tb
    @__traceback__.setter
    def __traceback__(self, value):
        self._tb = value
    def with_traceback(self, value):
        self._tb = value
        return self
code1 = FakeCode("made_up_filename.py", "non_existent_function")
code2 = FakeCode("another_non_existent_file.py", "another_non_existent_method")
frame1 = FakeFrame(code1, {})
frame2 = FakeFrame(code2, {})
tb = FakeTraceback([frame1, frame2], [1,3])
exc = FakeException("yo").with_traceback(tb)
print(''.join(traceback.format_exception(FakeException, exc, tb)))
# Traceback (most recent call last):
# File "made_up_filename.py", line 1, in non_existent_function
# File "another_non_existent_file.py", line 3, in another_non_existent_method
# FakeException: yo
Thanks to @User for providing FakeException, which is necessary because real exceptions type-check the argument to with_traceback().
This version does have a few limitations:
- It doesn't print the lines of code for each stack frame, as a real traceback would, because format_exception goes off to look for the real file that the code came from (which doesn't exist in our case). If you want to make this work, you need to insert fake data into linecache's cache (because traceback uses linecache to get hold of the source code), per @User's answer below.
- You also can't actually raise exc and expect the fake traceback to survive.
- More generally, if you have client code that traverses tracebacks in a different manner than traceback does (such as much of the inspect module), these fakes probably won't work. You'd need to add whatever extra attributes the client code expects.
These limitations are fine for my purposes - I'm just using it as a test double for code that calls traceback - but if you want to do more involved traceback manipulation, it looks like you might have to go down to the C level.
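If the goal is only to assert on the formatted text, a different route avoids fake objects entirely: traceback.format_list accepts hand-written stack entries, and traceback.format_exception_only formats the final line. This is a swapped-in technique, not the approach above, and it builds the string without attaching a traceback to the exception.

```python
import traceback

# One hand-written stack entry: (filename, line number, function name, source line).
fake_stack = [
    ("nonexistent_file.py", 100, "nonexistent_function", 'raise ValueError("Oh no")'),
]
exc = ValueError("Oh no")

output_str = "".join(
    ["Traceback (most recent call last):\n"]
    + traceback.format_list(fake_stack)
    + traceback.format_exception_only(type(exc), exc)
)
print(output_str)
```

A formatter under test could be written in terms of these two functions, so the test controls file names and line numbers directly.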

EDIT2:
This is the code of linecache; I will comment on it.
def updatecache(filename, module_globals=None): # module_globals is a dict
    # ...
    if module_globals and '__loader__' in module_globals:
        name = module_globals.get('__name__')
        loader = module_globals['__loader__']
        # module_globals = dict(__name__ = 'somename', __loader__ = loader)
        get_source = getattr(loader, 'get_source', None)
        # loader must have a 'get_source' function that returns the source
        if name and get_source:
            try:
                data = get_source(name)
            except (ImportError, IOError):
                pass
            else:
                if data is None:
                    # No luck, the PEP302 loader cannot find the source
                    # for this module.
                    return []
                cache[filename] = (
                    len(data), None,
                    [line+'\n' for line in data.splitlines()], fullname
                )
                return cache[filename][2]
That means that before your test run you just do:
class Loader:
    def get_source(self, name):  # linecache calls get_source(name)
        return 'source of the module'
import linecache
linecache.updatecache(filename, dict(__name__ = 'modulename without <> around',
                                     __loader__ = Loader()))
where 'source of the module' is the source of the module you test.
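A shorter variant, sketched under the assumption that the cache entry layout shown in updatecache above (size, mtime, lines, full name) still holds in your Python version, is to write the fake source straight into linecache.cache. This relies on an implementation detail rather than a documented interface; the file name and source line are made up for illustration.

```python
import linecache

fake_filename = "made_up_filename.py"  # no such file exists on disk
fake_source = 'raise ValueError("Oh no")\n'

# Same tuple layout as in updatecache: (size, mtime, lines, fullname).
# An mtime of None marks the entry as not backed by a real file.
linecache.cache[fake_filename] = (
    len(fake_source),
    None,
    fake_source.splitlines(True),
    fake_filename,
)

line = linecache.getline(fake_filename, 1)
print(repr(line))
```

Since traceback reads source lines through linecache, frames that name this file should then show the fake line.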
EDIT1:
My solution so far:
class MyExeption(Exception):
    _traceback = None
    @property
    def __traceback__(self):
        return self._traceback
    @__traceback__.setter
    def __traceback__(self, value):
        self._traceback = value
    def with_traceback(self, tb_or_none):
        self.__traceback__ = tb_or_none
        return self
Now you can set the custom tracebacks of the exception:
e = MyExeption().with_traceback(1)
What you usually do if you reraise an exception:
raise e.with_traceback(fake_tb)
All exception prints walk through this function:
import traceback
traceback.print_exception(_type, _error, _traceback)
Hope it helps somehow.

You should be able to simply raise whatever fake exception you want where you want it in your test runs. The python exception docs suggest you create a class and raise that as your exception. It's section 8.5 of the docs.
http://docs.python.org/2/tutorial/errors.html
Should be pretty straightforward once you've got the class created.

Unbound method TypeError

I've just been reading an article that talks about implementing a parser in python:
http://effbot.org/zone/simple-top-down-parsing.htm
The general idea behind the code is described in this paper: http://mauke.hopto.org/stuff/papers/p41-pratt.pdf
I'm fairly new to writing parsers in Python, so I'm trying to write something similar as a learning exercise. However, when I tried to code up something similar to what is in the article, I got a TypeError about an unbound method. This is the first time I've encountered such an error, and I've spent all day trying to figure it out without success. Here is a minimal code example (in its entirety) that has this problem:
import re
class Symbol_base(object):
    """ A base class for all symbols"""
    id = None # node/token type name
    value = None #used by literals
    first = second = third = None #used by tree nodes
    def nud(self):
        """ A default implementation for nud """
        raise SyntaxError("Syntax error (%r)." % self.id)
    def led(self,left):
        """ A default implementation for led """
        raise SyntaxError("Unknown operator (%r)." % self.id)
    def __repr__(self):
        if self.id == "(name)" or self.id == "(literal)":
            return "(%s %s)" % (self.id[1:-1], self.value)
        out = [self.id, self.first, self.second, self.third]
        out = map(str, filter(None,out))
        return "(" + " ".join(out) + ")"
symbol_table = {}
def symbol(id, bindingpower=0):
    """ If a given symbol is found in the symbol_table return it.
    If the symbol cannot be found then create the appropriate class
    and add that to the symbol_table."""
    try:
        s = symbol_table[id]
    except KeyError:
        class s(Symbol_base):
            pass
        s.__name__ = "symbol:" + id #for debugging purposes
        s.id = id
        s.lbp = bindingpower
        symbol_table[id] = s
    else:
        s.lbp = max(bindingpower,s.lbp)
    return s
def infix(id, bp):
    """ Helper function for defining the symbols for infix operations """
    def infix_led(self, left):
        self.first = left
        self.second = expression(bp)
        return self
    symbol(id, bp).led = infix_led
#define all the symbols
infix("+", 10)
symbol("(literal)").nud = lambda self: self #literal values must return the symbol itself
symbol("(end)")
token_pat = re.compile("\s*(?:(\d+)|(.))")
def tokenize(program):
    for number, operator in token_pat.findall(program):
        if number:
            symbol = symbol_table["(literal)"]
            s = symbol()
            s.value = number
            yield s
        else:
            symbol = symbol_table.get(operator)
            if not symbol:
                raise SyntaxError("Unknown operator")
            yield symbol
    symbol = symbol_table["(end)"]
    yield symbol()
def expression(rbp = 0):
    global token
    t = token
    token = next()
    left = t.nud()
    while rbp < token.lbp:
        t = token
        token = next()
        left = t.led(left)
    return left
def parse(program):
    global token, next
    next = tokenize(program).next
    token = next()
    return expression()
def __main__():
    print parse("1 + 2")
if __name__ == "__main__":
    __main__()
When I try to run this with pypy, I get:
Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "parser_code_issue.py", line 93, in <module>
    __main__()
  File "parser_code_issue.py", line 90, in __main__
    print parse("1 + 2")
  File "parser_code_issue.py", line 87, in parse
    return expression()
  File "parser_code_issue.py", line 81, in expression
    left = t.led(left)
TypeError: unbound method infix_led() must be called with symbol:+ instance as first argument (got symbol:(literal) instance instead)
I'm guessing this happens because I don't create an instance for the infix operations, but I don't really want to create an instance at that point. Is there some way I can change those methods without creating instances?
Any help explaining why this is happening and what I can do to fix the code is greatly appreciated!
Also, is this behaviour going to change in Python 3?

You forgot to create an instance of the symbol in your tokenize() function; when not a number, yield symbol(), not symbol:
        else:
            symbol = symbol_table.get(operator)
            if not symbol:
                raise SyntaxError("Unknown operator")
            yield symbol()
With that one change your code prints:
(+ (literal 1) (literal 2))
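On the Python 3 part of the question: as far as I know, Python 3 no longer has unbound methods, so looking a function up on a class returns the plain function. Calling it through the class instead of an instance still fails, only with a different message. A minimal sketch, with hypothetical Plus and Literal classes standing in for the generated symbol classes:

```python
class Literal(object):
    pass


class Plus(object):
    def led(self, left):
        return ("+", left)


left = Literal()

# Through the class: nothing is bound, so `left` lands in the `self` slot
# and the `left` parameter is missing.
try:
    Plus.led(left)
    message = None
except TypeError as exc:
    message = str(exc)

# Through an instance: `self` is bound automatically.
result = Plus().led(left)
print(message)
```

So the fix of yielding symbol() rather than symbol is needed on Python 3 as well.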

You haven't bound the new function to an instance of your object.
import types
obj = symbol(id, bp)
obj.led = types.MethodType(infix_led, obj)
See accepted answer to another SO question

Python decorator with multiprocessing fails

I would like to use a decorator on a function that I will subsequently pass to a multiprocessing pool. However, the code fails with "PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed". I don't quite see why it fails here. I feel certain that it's something simple, but I can't find it. Below is a minimal "working" example. I thought that using functools would be enough to let this work.
If I comment out the function decoration, it works without an issue. What is it about multiprocessing that I'm misunderstanding here? Is there any way to make this work?
Edit: After adding both a callable class decorator and a function decorator, it turns out that the function decorator works as expected. The callable class decorator continues to fail. What is it about the callable class version that keeps it from being pickled?
import random
import multiprocessing
import functools
class my_decorator_class(object):
    def __init__(self, target):
        self.target = target
        try:
            functools.update_wrapper(self, target)
        except:
            pass
    def __call__(self, elements):
        f = []
        for element in elements:
            f.append(self.target([element])[0])
        return f
def my_decorator_function(target):
    @functools.wraps(target)
    def inner(elements):
        f = []
        for element in elements:
            f.append(target([element])[0])
        return f
    return inner
@my_decorator_function
def my_func(elements):
    f = []
    for element in elements:
        f.append(sum(element))
    return f
if __name__ == '__main__':
    elements = [[random.randint(0, 9) for _ in range(5)] for _ in range(10)]
    pool = multiprocessing.Pool(processes=4)
    results = [pool.apply_async(my_func, ([e],)) for e in elements]
    pool.close()
    f = [r.get()[0] for r in results]
    print(f)

The problem is that pickle needs to have some way to reassemble everything that you pickle. See here for a list of what can be pickled:
http://docs.python.org/library/pickle.html#what-can-be-pickled-and-unpickled
When pickling my_func, the following components need to be pickled:
- An instance of my_decorator_class, called my_func. This is fine. Pickle will store the name of the class and pickle its __dict__ contents. When unpickling, it uses the name to find the class, then creates an instance and fills in the __dict__ contents. However, the __dict__ contents present a problem...
- The instance of the original my_func that's stored in my_func.target. This isn't so good. It's a function at the top level, and normally these can be pickled. Pickle will store the name of the function. The problem, however, is that the name "my_func" is no longer bound to the undecorated function; it's bound to the decorated function. This means that pickle won't be able to look up the undecorated function to recreate the object. Sadly, pickle doesn't have any way to know that the object it's trying to pickle can always be found under the name __main__.my_func.
You can change it like this and it will work:
import random
import multiprocessing
import functools
class my_decorator(object):
    def __init__(self, target):
        self.target = target
        try:
            functools.update_wrapper(self, target)
        except:
            pass
    def __call__(self, candidates, args):
        f = []
        for candidate in candidates:
            f.append(self.target([candidate], args)[0])
        return f
def old_my_func(candidates, args):
    f = []
    for c in candidates:
        f.append(sum(c))
    return f
my_func = my_decorator(old_my_func)
if __name__ == '__main__':
    candidates = [[random.randint(0, 9) for _ in range(5)] for _ in range(10)]
    pool = multiprocessing.Pool(processes=4)
    results = [pool.apply_async(my_func, ([c], {})) for c in candidates]
    pool.close()
    f = [r.get()[0] for r in results]
    print(f)
You have observed that the decorator function works when the class does not. I believe this is because functools.wraps modifies the decorated function so that it has the name and other properties of the function it wraps. As far as the pickle module can tell, it is indistinguishable from a normal top-level function, so it pickles it by storing its name. Upon unpickling, the name is bound to the decorated function so everything works out.
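The by-name lookup can be checked directly with pickle, without a process pool. The sketch below is written for Python 3 and assumes it is run as a script, so that the functions live in an importable __main__ module; with_wraps, without_wraps and the two decorated functions are hypothetical names.

```python
import functools
import pickle


def with_wraps(target):
    @functools.wraps(target)  # copies the name and module of `target`
    def inner(x):
        return target(x) * 2
    return inner


def without_wraps(target):
    def inner(x):  # keeps its own local name
        return target(x) * 2
    return inner


@with_wraps
def doubled_ok(x):
    return x


@without_wraps
def doubled_bad(x):
    return x


# Pickled by name: the module-level name resolves back to the wrapper itself.
same_object = pickle.loads(pickle.dumps(doubled_ok)) is doubled_ok

# The bare inner function cannot be found under its own name, so pickling fails.
try:
    pickle.dumps(doubled_bad)
    failed = False
except (pickle.PicklingError, AttributeError):
    failed = True
print(same_object, failed)
```

The exact exception type for the failing case differs between Python versions, which is why both are caught.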

I also had some problem using decorators in multiprocessing. I'm not sure if it's the same problem as yours:
My code looked like this:
from multiprocessing import Pool
def decorate_func(f):
    def _decorate_func(*args, **kwargs):
        print "I'm decorating"
        return f(*args, **kwargs)
    return _decorate_func
@decorate_func
def actual_func(x):
    return x ** 2
my_swimming_pool = Pool()
result = my_swimming_pool.apply_async(actual_func,(2,))
print result.get()
and when I run the code I get this:
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    print result.get()
  File "somedirectory_too_lengthy_to_put_here/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I fixed it by defining a new top-level function that applies the decorator inside its body, instead of using the decorator syntax:
from multiprocessing import Pool
def decorate_func(f):
    def _decorate_func(*args, **kwargs):
        print "I'm decorating"
        return f(*args, **kwargs)
    return _decorate_func
def actual_func(x):
    return x ** 2
def wrapped_func(*args, **kwargs):
    return decorate_func(actual_func)(*args, **kwargs)
my_swimming_pool = Pool()
result = my_swimming_pool.apply_async(wrapped_func,(2,))
print result.get()
The code ran perfectly and I got:
I'm decorating
4
I'm not very experienced with Python, but this solved the problem for me.

If you want decorators badly enough (like me), you can also use exec() on the function's source string to circumvent the pickling problem described above.
I wanted to be able to pass all the arguments to an original function and then use them successively. The following is my code for it.
First, I made a make_functext() function to convert the target function object to a string. For that, I used the getsource() function from the inspect module (see the documentation, and note that it can't retrieve source code from compiled code etc.). Here it is:
from inspect import getsource
def make_functext(func):
    ft = '\n'.join(getsource(func).split('\n')[1:]) # Removing the decorator, of course
    ft = ft.replace(func.__name__, 'func') # Making function callable with 'func'
    ft = ft.replace('#§ ', '').replace('#§', '') # For using commented code starting with '#§'
    ft = ft.strip() # In case the function code was indented
    return ft
It is used in the following _worker() function that will be the target of the processes:
def _worker(functext, args):
    scope = {} # This is needed to keep executed definitions
    exec(functext, scope)
    scope['func'](args) # Using func from scope
And finally, here's my decorator:
from multiprocessing import Process
def parallel(num_processes, **kwargs):
    def parallel_decorator(func, num_processes=num_processes):
        functext = make_functext(func)
        print('This is the parallelized function:\n', functext)
        def function_wrapper(funcargs, num_processes=num_processes):
            workers = []
            print('Launching processes...')
            for k in range(num_processes):
                p = Process(target=_worker, args=(functext, funcargs[k])) # use args here
                p.start()
                workers.append(p)
        return function_wrapper
    return parallel_decorator
The code can finally be used by defining a function like this:
@parallel(4)
def hello(args):
    #§ from time import sleep # use '#§' to avoid unnecessary (re)imports in main program
    name, seconds = tuple(args) # unpack args-list here
    sleep(seconds)
    print('Hi', name)
... which can now be called like this:
hello([['Marty', 0.5],
['Catherine', 0.9],
['Tyler', 0.7],
['Pavel', 0.3]])
... which outputs:
This is the parallelized function:
def func(args):
    from time import sleep
    name, seconds = tuple(args)
    sleep(seconds)
    print('Hi', name)
Launching processes...
Hi Pavel
Hi Marty
Hi Tyler
Hi Catherine
Thanks for reading, this is my very first post. If you find any mistakes or bad practices, feel free to leave a comment. I know that these string conversions are quite dirty, though...

If you use this code for your decorator:
import multiprocessing
from types import MethodType
DEFAULT_POOL = []
def run_parallel(_func=None, *, name: str = None, context_pool: list = DEFAULT_POOL):
    class RunParallel:
        def __init__(self, func):
            self.func = func
        def __call__(self, *args, **kwargs):
            process = multiprocessing.Process(target=self.func, name=name, args=args, kwargs=kwargs)
            context_pool.append(process)
            process.start()
        def __get__(self, instance, owner):
            return self if instance is None else MethodType(self, instance)
    if _func is None:
        return RunParallel
    else:
        return RunParallel(_func)
def wait_context(context_pool: list = DEFAULT_POOL, kill_others_if_one_fails: bool = False):
    finished = []
    for process in context_pool:
        process.join()
        finished.append(process)
        if kill_others_if_one_fails and process.exitcode != 0:
            break
    if kill_others_if_one_fails:
        # kill unfinished processes
        for process in context_pool:
            if process not in finished:
                process.kill()
        # wait for every process to be dead
        for process in context_pool:
            process.join()
Then you can use it like this, in these 4 examples:
@run_parallel
def m1(a, b="b"):
    print(f"m1 -- {a=} {b=}")
@run_parallel(name="mym2", context_pool=DEFAULT_POOL)
def m2(d, cc="cc"):
    print(f"m2 -- {d} {cc=}")
    a = 1/0
class M:
    @run_parallel
    def c3(self, k, n="n"):
        print(f"c3 -- {k=} {n=}")
    @run_parallel(name="Mc4", context_pool=DEFAULT_POOL)
    def c4(self, x, y="y"):
        print(f"c4 -- {x=} {y=}")
if __name__ == "__main__":
    m1(11)
    m2(22)
    M().c3(33)
    M().c4(44)
    wait_context(kill_others_if_one_fails=True)
The output will be:
m1 -- a=11 b='b'
m2 -- 22 cc='cc'
c3 -- k=33 n='n'
(followed by the exception raised in method m2)
