Multiprocessing Manager failing on very simple example with pool.apply_async - python

I'm seeing some unexpected behavior in my code related to python multiprocessing, and the Manager class in particular. I wrote out a super simple example to try and better understand what's going on:
import multiprocessing as mp
from collections import defaultdict

def process(d):
    print('doing the process')
    d['a'] = []
    d['a'].append(1)
    d['a'].append(2)

def main():
    pool = mp.Pool(mp.cpu_count())
    with mp.Manager() as manager:
        d = manager.dict({'c': 2})
        result = pool.apply_async(process, args=(d))
        print(result.get())
        pool.close()
        pool.join()
        print(d)

if __name__ == '__main__':
    main()
This fails, and the stack trace printed from result.get() is as follows:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "<string>", line 2, in __iter__
File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 825, in _callmethod
proxytype = self._manager._registry[token.typeid][-1]
AttributeError: 'NoneType' object has no attribute '_registry'
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "mp_test.py", line 34, in <module>
main()
File "mp_test.py", line 25, in main
print(result.get())
File "/usr/local/Cellar/python/3.7.5/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
AttributeError: 'NoneType' object has no attribute '_registry'
I'm still unclear on what's happening here. This seems to me to be a very straightforward application of the Manager class. It's nearly a copy of the example used in the official Python documentation, the only difference being that I'm using a pool and running the process with apply_async. I'm doing this because that's what I'm using in my actual project.
To clarify, I wouldn't get a stack trace if I didn't have the result = and print(result.get()) in there. I just see {'c': 2} printed when I run the script, which indicated to me that something was going wrong and wasn't being shown.

A couple things to start with: first, this isn't the code you ran. The code you posted has
result = pool.apply_async(process2, args=(d))
but there is no process2() defined. Assuming "process" was intended, the next thing is the
args=(d)
part. That's the same as typing
args=d
but that's not what's needed. You need to pass a sequence of the intended arguments. So you need to change that part to
args=(d,) # build a 1-tuple
or
args=[d] # build a list
Then the output changes, to
{'c': 2, 'a': []}
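A small sketch of why args=(d) misbehaves, using a plain dict instead of a manager proxy. The traceback in the question shows the worker calling func(*args, **kwds); call_like_pool and show are hypothetical helpers that only mimic that call, they are not part of multiprocessing.

```python
def call_like_pool(func, args):
    """Mimic the worker's call shown in the traceback: func(*args)."""
    return func(*args)


def show(*received):
    """Return the positional arguments exactly as the function received them."""
    return received


d = {'c': 2}

# Parentheses alone do not build a tuple, so (d) is just d, and
# unpacking a dict with * iterates over its keys.
unpacked_keys = call_like_pool(show, (d))

# The trailing comma builds a 1-tuple, so the dict arrives as one argument.
whole_dict = call_like_pool(show, (d,))

print(unpacked_keys)
print(whole_dict)
```

With a manager proxy in place of the plain dict, that same unpacking step has to iterate the proxy inside the worker, which appears to be where the `__iter__` frame in the traceback comes from.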
Why aren't 1 and 2 in the 'a' list? Because it's only the dict itself that lives on the manager server.
d['a'].append(1)
first gets the mapping for 'a' from the server, which is an empty list. But that empty list is not shared in any way - it's local to process(). You append 1 to it, and then it's thrown away - the server knows nothing about it. Same thing for 2.
To get what you want, you need to "do something" to tell the manager server about what you changed; e.g.,
d['a'] = L = []
L.append(1)
L.append(2)
d['a'] = L
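The write-back pattern can be checked in a single process, without a pool. This is a minimal sketch; demo_nested_update is a hypothetical helper name, and it assumes a platform where starting a Manager from the main module works as usual.

```python
import multiprocessing as mp


def demo_nested_update():
    """Return (lost, kept): dict snapshots before and after writing the list back."""
    with mp.Manager() as manager:
        d = manager.dict()

        # d['a'] hands back a local copy of the list, so this append
        # never reaches the manager server.
        d['a'] = []
        d['a'].append(1)
        lost = d.copy()

        # Mutate a local list, then assign it back through the proxy so
        # the server sees the new value.
        items = d['a']
        items.append(1)
        items.append(2)
        d['a'] = items
        kept = d.copy()
    return lost, kept


if __name__ == '__main__':
    print(demo_nested_update())
```

The assignment d['a'] = items is the step that matters: it goes through the proxy's `__setitem__`, which is what sends the updated list to the server.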


Python jsonpickle error: 'OrderedDict' object has no attribute '_OrderedDict__root'

I'm hitting this exception with jsonpickle, when trying to pickle a rather complex object that unfortunately I'm not sure how to describe here. I know that makes it tough to say much, but for what it's worth:
>>> frozen = jsonpickle.encode(my_complex_object_instance)
>>> thawed = jsonpickle.decode(frozen)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.7/site-packages/jsonpickle/__init__.py",
line 152, in decode
return unpickler.decode(string, backend=backend, keys=keys)
:
:
File "/Library/Python/2.7/site-packages/jsonpickle/unpickler.py",
line 336, in _restore_from_dict
instance[k] = value
File "/Library/Python/2.7/site-packages/botocore/vendored/requests/packages/urllib3/packages/ordered_dict.py",
line 49, in __setitem__
root = self.__root
AttributeError: 'OrderedDict' object has no attribute '_OrderedDict__root'
I don't find much of assistance when googling the error. I do see what looks like the same issue was resolved at some time past for simpler objects:
https://github.com/jsonpickle/jsonpickle/issues/33
The cited example in that report works for me:
>>> jsonpickle.decode(jsonpickle.encode(collections.OrderedDict()))
OrderedDict()
>>> jsonpickle.decode(jsonpickle.encode(collections.OrderedDict(a=1)))
OrderedDict([(u'a', 1)])
Has anyone ever run into this themselves and found a solution? I ask with the understanding that my case may be "differently idiosyncratic" than another known example.
The requests module for me seems to be running into problems when I .decode(). After looking at the jsonpickle code a bit, I decided to fork it and change the following lines to see what was going on (and I ended up keeping a private copy of jsonpickle with the changes so I can move forward).
In jsonpickle/unpickler.py (in my version it's line 368), find this if statement in the method _restore_from_dict():
if (util.is_noncomplex(instance) or
        util.is_dictionary_subclass(instance)):
    instance[k] = value
else:
    setattr(instance, k, value)
and change it to the following. It logs an error for each assignment that fails; you can then either keep the code in place or change the OrderedDict version that relies on __root:
if (util.is_noncomplex(instance) or
        util.is_dictionary_subclass(instance)):
    # Currently requests.adapters.HTTPAdapter is using a non-standard
    # version of OrderedDict which doesn't have a _OrderedDict__root
    # attribute
    try:
        instance[k] = value
    except AttributeError as e:
        import logging
        import pprint
        warnmsg = 'Unable to unpickle {}[{}]={}'.format(pprint.pformat(instance), pprint.pformat(k), pprint.pformat(value))
        logging.error(warnmsg)
else:
    setattr(instance, k, value)
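The attribute name in the error, _OrderedDict__root, is what Python's name mangling produces for self.__root inside a class called OrderedDict. One plausible reading of the traceback, which the thread does not confirm, is that the instance was created without running `__init__`, so the private attribute was never set. The sketch below reproduces that situation with TinyOrderedDict, a hypothetical stand-in class, not the vendored one.

```python
class TinyOrderedDict(dict):
    """Hypothetical stand-in for a pure-Python ordered dict."""

    def __init__(self):
        super().__init__()
        # Name mangling stores this as _TinyOrderedDict__order.
        self.__order = []

    def __setitem__(self, key, value):
        if key not in self:
            self.__order.append(key)
        super().__setitem__(key, value)


def restore_without_init():
    """Create an instance while skipping __init__, then try to set an item."""
    instance = TinyOrderedDict.__new__(TinyOrderedDict)
    try:
        instance['k'] = 1
    except AttributeError as exc:
        return str(exc)
    return None


print(restore_without_init())
```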

Setting a global variable at the end of a generator, persistence of loop variables

I want to know the number of items that a generator has generated.
I'm trying to do this by using the output of enumerate to set a global variable. It works on simple tests but goes wrong once I try to adapt the technique to my real application case.
The following script tests first a generator based on an iteration over the lines of a file, then a generator based on the parsing of a file using a bioinformatics library I want to use:
#!/usr/bin/env python3

def test1(delete=False):
    # I have to comment the following otherwise I get:
    # $ ./test.py
    # Traceback (most recent call last):
    #   File "./test.py", line 60, in <module>
    #     test1()
    #   File "./test.py", line 31, in test1
    #     print(nb_things)
    # UnboundLocalError: local variable 'nb_things' referenced before assignment
    # if delete:
    #     try:
    #         del nb_things
    #         print("deleted nb_things")
    #     except NameError:
    #         pass
    with open("test.py") as this_file:
        def my_gen():
            for i, thing in enumerate(this_file, start=1):
                yield "just_to_test"
            global nb_things
            nb_things = i
            return
        g = my_gen()
        for _ in g:
            pass
    print(nb_things)
    return 0

import pysam

def test2(delete=False):
    if delete:
        try:
            del nb_things
            print("deleted nb_things")
        except NameError:
            pass
    with pysam.AlignmentFile("/path/to/a/bam/file", "rb") as bamfile:
        def my_gen():
            for i, thing in enumerate(bamfile.fetch(), start=1):
                yield "just_to_test"
            global nb_things
            nb_things = i
            return
        g = my_gen()
        for _ in g:
            pass
    print(nb_things)
    return 0

if __name__ == "__main__":
    test1()
    print("end of test 1")
    test2()
    print("end of test 2")
(As you can see in the comment in the above script, very strange things happen if I include code that mentions my global variable, even when that code is never executed.)
When I execute the above code, the first test succeeds, but not the second, despite a very similar code structure:
$ ./test.py
63
end of test 1
Traceback (most recent call last):
File "./test.py", line 62, in <module>
test2()
File "./test.py", line 53, in test2
for _ in g:
File "./test.py", line 49, in my_gen
nb_things = i
UnboundLocalError: local variable 'i' referenced before assignment
My main question is:
Why does the enumeration counter still exist after the end of the for loop in the first case and not in the second?
I suspect that this has to do with the way the iteration is stopped. In the second case the generator somehow causes the enumerate result to cease to exist after the internal iterator stops.
What could cause such a difference?
A second question that occurred to me while designing the above test script is the following:
Why is the global variable nb_things considered local if I put code referencing it but not even executed? (note the delete=False, and the absence of a message mentioning the deletion)
I'm using python 3.6 and pysam version 0.10.0.
For an earlier version of the real code (but the essential approach is there), and clues regarding why I ended up defining my generator in the main function, see this question. (Essentially, the reason is that the generator actually uses a function that is defined depending on command-line options.)
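Two general Python rules may be relevant to both questions; the sketch below illustrates them with hypothetical helper names. First, a for loop binds its variable only when the body runs at least once, so an iterator that yields nothing leaves `i` unassigned. Whether bamfile.fetch() actually yielded no records here is an assumption, not something the post confirms. Second, `del name` inside a function counts as a binding, so the compiler treats the name as local to that function even if the statement never executes.

```python
def count_items(iterable):
    """Return the number of items, or None when the iterable is empty."""
    for i, _item in enumerate(iterable, start=1):
        pass
    try:
        return i
    except UnboundLocalError:
        # The loop body never ran, so `i` was never assigned.
        return None


nb_things = 63  # module-level name, standing in for the global in the question


def read_with_dead_del(delete=False):
    """Read nb_things in a function that also contains an unexecuted `del`."""
    if delete:
        del nb_things  # never runs, yet it makes nb_things local to this function
    try:
        return nb_things
    except UnboundLocalError:
        return "treated as local"


print(count_items("abc"), count_items(""), read_with_dead_del())
```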

Python - multiprocessing.pool.MaybeEncodingError while downloading images [duplicate]

Why does the code below work only with multiprocessing.dummy, but not with plain multiprocessing?
import urllib.request
#from multiprocessing.dummy import Pool #this works
from multiprocessing import Pool

urls = ['http://www.python.org', 'http://www.yahoo.com','http://www.scala.org', 'http://www.google.com']

if __name__ == '__main__':
    with Pool(5) as p:
        results = p.map(urllib.request.urlopen, urls)
Error :
Traceback (most recent call last):
File "urlthreads.py", line 31, in <module>
results = p.map(urllib.request.urlopen, urls)
File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 268, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "C:\Users\patri\Anaconda3\lib\multiprocessing\pool.py", line 657, in get
raise self._value
multiprocessing.pool.MaybeEncodingError: Error sending result: '[<http.client.HTTPResponse object at 0x0000016AEF204198>]'. Reason: 'TypeError("cannot serialize '_io.BufferedReader' object")'
What's missing so that it works without "dummy" ?
The http.client.HTTPResponse-object you get back from urlopen() has a _io.BufferedReader-object attached, and this object cannot be pickled.
pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
Traceback (most recent call last):
...
pickle.dumps(urllib.request.urlopen('http://www.python.org').fp)
TypeError: cannot serialize '_io.BufferedReader' object
multiprocessing.Pool needs to pickle (serialize) the results to send them back to the parent process, and this fails here. Since dummy uses threads instead of processes, there is no pickling, because threads in the same process share their memory naturally.
A general solution to this TypeError is:
read out the buffer and save the content (if needed)
remove the reference to '_io.BufferedReader' from the object you are trying to pickle
In your case, calling .read() on the http.client.HTTPResponse will empty and remove the buffer, so a function for converting the response into something pickleable could simply do this:
def read_buffer(response):
    response.text = response.read()
    return response
Example:
r = urllib.request.urlopen('http://www.python.org')
r = read_buffer(r)
pickle.dumps(r)
# Out: b'\x80\x03chttp.client\nHTTPResponse\...
Before you consider this approach, make sure you really want to use multiprocessing instead of multithreading. For I/O-bound tasks like the one you have here, multithreading would be sufficient, since most of the time is spent waiting for the response anyway (no CPU time needed). Multiprocessing and the IPC involved also introduce substantial overhead.
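A variation on the same idea, as a sketch rather than the answer's own method: have the worker return only plain data (here the URL and the body bytes), so nothing unpicklable crosses the boundary, and use a thread pool since the work is I/O-bound. fetch and fetch_all are hypothetical helper names.

```python
import urllib.request
from multiprocessing.pool import ThreadPool


def fetch(url):
    """Download url and return (url, body_bytes); both pickle cleanly."""
    with urllib.request.urlopen(url) as response:
        return url, response.read()


def fetch_all(urls, workers=5):
    """Fetch several URLs concurrently with threads."""
    with ThreadPool(workers) as pool:
        return pool.map(fetch, urls)
```

Calling fetch_all(urls) with the list from the question should return one (url, bytes) pair per URL. Because fetch returns only a string and bytes, swapping ThreadPool for multiprocessing.Pool should also avoid the serialization error, at the cost of the extra process overhead mentioned above.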

implementing a deferred exception in Python

I would like to implement a deferred exception in Python that is OK to store somewhere but as soon as it is used in any way, it raises the exception that was deferred. Something like this:
# this doesn't work but it's a start
class DeferredException(object):
    def __init__(self, exc):
        self.exc = exc
    def __getattr__(self, key):
        raise self.exc

# example:
mydict = {'foo': 3}
try:
    myval = obtain_some_number()
except Exception as e:
    myval = DeferredException(e)
mydict['myval'] = myval

def plus_two(x):
    print(x + 2)

# later on...
plus_two(mydict['foo'])                 # prints 5
we_dont_use_this_val = mydict['myval']  # Always ok to store this value if not used
plus_two(mydict['myval'])               # If obtain_some_number() failed earlier,
                                        # re-raises the exception, otherwise prints the value + 2.
The use case is that I want to write code to analyze some values from incoming data; if this code fails but the results are never used, I want it to fail quietly; if it fails but the results are used later, then I'd like the failure to propagate.
Any suggestions on how to do this? If I use my DeferredException class I get this result:
>>> ke = KeyError('something')
>>> de = DeferredException(ke)
>>> de.bang # yay, this works
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 6, in __getattr__
KeyError: 'something'
>>> de+2 # boo, this doesn't
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'DeferredException' and 'int'
Read section 3.4.12 of the docs, "Special method lookup for new-style classes." It explains exactly the problem you have encountered. The interpreter bypasses normal attribute lookup for certain operators, such as addition (as you found out the hard way). Thus the statement de+2 in your code never calls your __getattr__ method.
The only solution, according to that section, is to ensure that "the special method must be set on the class object itself in order to be consistently invoked by the interpreter."
Perhaps you'd be better off storing all your deferred exceptions in a global list, wrapping your entire program in a try:finally: statement, and printing out the whole list in the finally block.
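Following the rule quoted from the docs, one way to get closer to the requested behavior is to define the special methods on the class itself. This is a minimal sketch: the set of operators covered is illustrative rather than exhaustive, and _reraise is a hypothetical helper name.

```python
class DeferredException(object):
    """Stores an exception and re-raises it when the object is used."""

    def __init__(self, exc):
        self.exc = exc

    def _reraise(self, *args, **kwargs):
        raise self.exc

    def __getattr__(self, key):
        # Ordinary attribute access such as de.bang ends up here.
        raise self.exc

    # Operators and conversions are looked up on the class and skip
    # __getattr__, so each one has to be defined explicitly.
    __add__ = __radd__ = __sub__ = __rsub__ = _reraise
    __mul__ = __rmul__ = __int__ = __float__ = __str__ = _reraise


de = DeferredException(KeyError('something'))
mydict = {'myval': de}  # storing the object does not raise
```

With this version, both de + 2 and 2 + de raise the stored KeyError. `__repr__` is deliberately left alone so the object can still be inspected in a debugger or an interactive session.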

PicklingError: Can't pickle <class 'decimal.Decimal'>: it's not the same object as decimal.Decimal

This is the error I got today at filmaster.com (http://filmaster.com):
PicklingError: Can't pickle <class 'decimal.Decimal'>: it's not the same object as decimal.Decimal
What does that exactly mean? It does not seem to be making a lot of sense...
It seems to be connected with django caching. You can see the whole traceback here:
Traceback (most recent call last):
  File "/home/filmaster/django-trunk/django/core/handlers/base.py", line 92, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/home/filmaster/film20/film20/core/film_views.py", line 193, in show_film
    workflow.set_data_for_authenticated_user()
  File "/home/filmaster/film20/film20/core/film_views.py", line 518, in set_data_for_authenticated_user
    object_id = self.the_film.parent.id)
  File "/home/filmaster/film20/film20/core/film_helper.py", line 179, in get_others_ratings
    set_cache(CACHE_OTHERS_RATINGS, str(object_id) + "_" + str(user_id), userratings)
  File "/home/filmaster/film20/film20/utils/cache_helper.py", line 80, in set_cache
    return cache.set(CACHE_MIDDLEWARE_KEY_PREFIX + full_path, result, get_time(cache_string))
  File "/home/filmaster/django-trunk/django/core/cache/backends/memcached.py", line 37, in set
    self._cache.set(smart_str(key), value, timeout or self.default_timeout)
  File "/usr/lib/python2.5/site-packages/cmemcache.py", line 128, in set
    val, flags = self._convert(val)
  File "/usr/lib/python2.5/site-packages/cmemcache.py", line 112, in _convert
    val = pickle.dumps(val, 2)
PicklingError: Can't pickle <class 'decimal.Decimal'>: it's not the same object as decimal.Decimal
And the source code for Filmaster can be downloaded from here: bitbucket.org/filmaster/filmaster-test
Any help will be greatly appreciated.
I got this error when running in a Jupyter notebook. I think the problem was that I was using %load_ext autoreload and %autoreload 2. Restarting my kernel and rerunning solved the problem.
One oddity of pickle is that the way you import a class before you pickle one of its instances can subtly change the pickled object. Pickle requires you to have imported the object identically both before you pickle it and before you unpickle it.
So for example:
from a.b import c
C = c()
pickler.dump(C)
will make a subtly different object (sometimes) to:
from a import b
C = b.c()
pickler.dump(C)
Try fiddling with your imports, it might correct the problem.
I will demonstrate the problem with simple Python classes in Python2.7:
In [13]: class A: pass
In [14]: class B: pass
In [15]: A
Out[15]: <class __main__.A at 0x7f4089235738>
In [16]: B
Out[16]: <class __main__.B at 0x7f408939eb48>
In [17]: A.__name__ = "B"
In [18]: pickle.dumps(A)
---------------------------------------------------------------------------
PicklingError: Can't pickle <class __main__.B at 0x7f4089235738>: it's not the same object as __main__.B
This error is shown because we are trying to dump A, but since we changed its name to refer to another object, "B", pickle cannot tell which object to dump: class A or class B. The pickle authors anticipated this and added a check for it.
Solution:
Check if the object you are trying to dump has conflicting name with another object.
I have demonstrated debugging for the case presented above with ipython and ipdb below:
PicklingError: Can't pickle <class __main__.B at 0x7f4089235738>: it's not the same object as __main__.B
In [19]: debug
> /<path to pickle dir>/pickle.py(789)save_global()
    787             raise PicklingError(
    788                 "Can't pickle %r: it's not the same object as %s.%s" %
--> 789                 (obj, module, name))
    790
    791         if self.proto >= 2:
ipdb> pp (obj, module, name)    # <-- obj is class A, the object passed in the pickle.dumps(A) call
(<class __main__.B at 0x7f4089235738>, '__main__', 'B')
ipdb> getattr(sys.modules[module], name)    # <-- the conflicting definition in the module (__main__ here) with the same name ('B' here)
<class __main__.B at 0x7f408939eb48>
I hope this saves some headaches! Adios!!
I can't explain why this is failing either, but my own solution to fix this was to change all my code from doing
from point import Point
to
import point
this one change and it worked. I'd love to know why... hth
There can be issues starting a process with multiprocessing by calling __init__. Here's a demo:
import multiprocessing as mp

class SubProcClass:
    def __init__(self, pipe, startloop=False):
        self.pipe = pipe
        if startloop:
            self.do_loop()

    def do_loop(self):
        while True:
            req = self.pipe.recv()
            self.pipe.send(req * req)

class ProcessInitTest:
    def __init__(self, spawn=False):
        if spawn:
            mp.set_start_method('spawn')
        (self.msg_pipe_child, self.msg_pipe_parent) = mp.Pipe(duplex=True)

    def start_process(self):
        subproc = SubProcClass(self.msg_pipe_child)
        self.trig_proc = mp.Process(target=subproc.do_loop, args=())
        self.trig_proc.daemon = True
        self.trig_proc.start()

    def start_process_fail(self):
        self.trig_proc = mp.Process(target=SubProcClass.__init__, args=(self.msg_pipe_child,))
        self.trig_proc.daemon = True
        self.trig_proc.start()

    def do_square(self, num):
        # Note: this is a synchronous usage of mp,
        # which doesn't make sense. But this is just for demo
        self.msg_pipe_parent.send(num)
        msg = self.msg_pipe_parent.recv()
        print('{}^2 = {}'.format(num, msg))
Now, with the above code, if we run this:
if __name__ == '__main__':
    t = ProcessInitTest(spawn=True)
    t.start_process_fail()
    for i in range(1000):
        t.do_square(i)
We get this error:
Traceback (most recent call last):
File "start_class_process1.py", line 40, in <module>
t.start_process_fail()
File "start_class_process1.py", line 29, in start_process_fail
self.trig_proc.start()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/context.py", line 212, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/context.py", line 274, in _Popen
return Popen(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/popen_spawn_posix.py", line 33, in __init__
super().__init__(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
self._launch(process_obj)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/popen_spawn_posix.py", line 48, in _launch
reduction.dump(process_obj, fp)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/reduction.py", line 59, in dump
ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function SubProcClass.__init__ at 0x10073e510>: it's not the same object as __main__.__init__
And if we change it to use fork instead of spawn:
if __name__ == '__main__':
    t = ProcessInitTest(spawn=False)
    t.start_process_fail()
    for i in range(1000):
        t.do_square(i)
We get this error:
Process Process-1:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/process.py", line 254, in _bootstrap
self.run()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
TypeError: __init__() missing 1 required positional argument: 'pipe'
But if we call the start_process method, which doesn't call __init__ in the mp.Process target, like this:
if __name__ == '__main__':
    t = ProcessInitTest(spawn=False)
    t.start_process()
    for i in range(1000):
        t.do_square(i)
It works as expected (whether we use spawn or fork).
Did you somehow reload(decimal), or monkeypatch the decimal module to change the Decimal class? These are the two things most likely to produce such a problem.
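The reload scenario can be simulated without touching decimal. In this sketch a throwaway module is built by hand and its class statement is executed twice, which rebinds the name the way a reload would; hypothetical_geometry and Point are made-up names used only for illustration.

```python
import pickle
import sys
import types

CLASS_SOURCE = "class Point:\n    pass\n"


def pickle_after_redefinition(module_name="hypothetical_geometry"):
    """Pickle an instance whose class was replaced under the same name."""
    module = types.ModuleType(module_name)
    sys.modules[module_name] = module
    try:
        exec(CLASS_SOURCE, module.__dict__)
        point = module.Point()
        pickle.dumps(point)  # fine: module.Point is still the instance's class

        # Re-running the class statement rebinds module.Point to a new
        # class object, as reloading the module would.
        exec(CLASS_SOURCE, module.__dict__)
        try:
            pickle.dumps(point)
        except pickle.PicklingError as exc:
            return str(exc)
        return None
    finally:
        del sys.modules[module_name]


print(pickle_after_redefinition())
```

The old instance still points at the first class object, while the lookup by module and name now finds the second one, and that mismatch is what pickle reports.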
Same happened to me
Restarting the kernel worked for me
Due to reputation restrictions I cannot comment, but Salim Fahedy's answer and the debugging path it describes helped me identify a cause for this error, even when using dill instead of pickle:
Under the hood, dill also accesses some functions of pickle, and pickle._Pickler.save_global() performs an import. To me this seems more of a "hack" than a real solution, as the method fails as soon as the class of the instance you are trying to pickle is not imported from the lowest level of the package the class is in. Sorry for the bad explanation; maybe examples are more suitable:
The following example would fail:
from oemof import solph
...
(some code here, giving you the object 'es')
...
model = solph.Model(es)
pickle.dump(model, open('file.pickle', 'wb'))
It fails, because while you can use solph.Model, the class actually is oemof.solph.models.Model for example. The save_global() resolves that (or some function before that which passes it to save_global()), but then imports Model from oemof.solph.models and throws an error, because it's not the same import as from oemof import solph.Model (or something like that, I'm not 100% sure about the workings).
The following example would work:
from oemof.solph.models import Model
...
(some code here, giving you the object 'es')
...
model = Model(es)
pickle.dump(model, open('file.pickle', 'wb'))
It works because the Model object is now imported from the same place that pickle._Pickler.save_global() imports the comparison object (obj2) from.
Long story short: When pickling an object, make sure to import the class from the lowest possible level.
Addition: This also seems to apply to objects stored in the attributes of the class-instance you want to pickle. If for example model had an attribute es that itself is an object of the class oemof.solph.energysystems.EnergySystem, we would need to import it as:
from oemof.solph.energysystems import EnergySystem
es = EnergySystem()
My issue was that I had a function with the same name defined twice in a file. So I guess it was confused about which one it was trying to pickle.
I had the same problem while debugging in Spyder. Everything worked normally when I ran the program, but when I started debugging I got the PicklingError.
But once I chose the option "Execute in dedicated console" in "Run configuration per file" (shortcut: Ctrl+F6), everything worked as expected. I do not know exactly why this helps.
Note: In my script I have many imports like
from PyQt5.QtWidgets import *
from PyQt5.Qt import *
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
import os, sys, re, math
My basic understanding was that the star (*) imports were causing this PicklingError.
I had a problem that no one has mentioned yet. I have a package with a __init__ file that does, among other things:
from .mymodule import cls
Then my top-level code says:
import mypkg
obj = mypkg.cls()
The problem with this is that in my top-level code, the type appears to be mypkg.cls, but it's actually mypkg.mymodule.cls. Using the full path:
obj = mypkg.mymodule.cls()
avoids the error.
I had the same error in Spyder. It turned out to be simple in my case: I had defined a class named "Class" in a file also named "Class". I changed the name of the class in the definition to "Class_obj". pickle.dump(Class_obj, fileh) works, but pickle.dump(Class, fileh) does not when it's saved in a file named "Class".
This function solves the error mentioned above, but for me it led to another error, 'permission denied', which came out of the blue. However, it might help someone find a solution, so I am still posting the function:
import tempfile
import time
from tensorflow.keras.models import save_model, load_model, Model

# Hotfix function
def make_keras_picklable():
    def __getstate__(self):
        model_str = ""
        with tempfile.NamedTemporaryFile(suffix='.hdf5', delete=True) as fd:
            save_model(self, fd.name, overwrite=True)
            model_str = fd.read()
        d = {'model_str': model_str}
        return d

    def __setstate__(self, state):
        with tempfile.NamedTemporaryFile(suffix='.hdf5', delete=True) as fd:
            fd.write(state['model_str'])
            fd.flush()
            # load_model must be imported alongside save_model (see imports above)
            model = load_model(fd.name)
        self.__dict__ = model.__dict__

    cls = Model
    cls.__getstate__ = __getstate__
    cls.__setstate__ = __setstate__

# Run the function
make_keras_picklable()
### create then save your model here ###
