Why can't a method be pickled in Python multiprocessing? - python

I'm new to all the multiprocessing stuff and my current program doesn't work. I've spent the last few hours reading about the problem and tried a lot: the method inside the class, outside the class, and in a different class, and none of it worked.
import multiprocessing as mp

class A:
    @staticmethod
    def multi():
        a = [1, 2, 3]
        b = 4
        prepared = list()
        for x in a:
            prepared.append((x, b))
        pool = mp.Pool(mp.cpu_count() - 1)
        result = pool.starmap(method, prepared)
        pool.close()
        pool.join()
        print(result)

def method(a, x):
    return (a - x, a + x)

if __name__ == "__main__":
    a = A()
    a.multi()
This is just an example of what my class/method structure looks like (and this one does work, even though I changed nothing in the multiprocessing part).
This is the exception I get:
AttributeError: Can't pickle local object 'FeatureExtracter.<locals>.feature_extracter_fwd'
It would be nice if someone knows the solution, or at least why the method can't be pickled.

The traceback gives it away: FeatureExtracter.<locals>.feature_extracter_fwd is a function defined inside another function, and pickle can only serialize functions it can import by name from module level, so local functions (and lambdas) cannot be sent to worker processes. Move the function to module level and keep the pool usage under the __main__ guard:

import multiprocessing as mp

class A:
    @staticmethod
    def multi():
        b = 4
        return [(x, b) for x in [1, 2, 3]]

def method(a, x):
    return (a - x, a + x)

if __name__ == "__main__":
    with mp.Pool(mp.cpu_count() - 1) as p:
        result = p.starmap(method, A().multi())
    print(result)
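To see the difference directly, here is a minimal, illustrative demo (the make_worker helper is hypothetical) comparing a module-level function with a function defined inside another function:

import pickle

def top_level(x):  # defined at module level: pickle stores just its import path
    return x * 2

def make_worker():
    def worker(x):  # defined inside another function: a "local object"
        return x * 2
    return worker

pickle.dumps(top_level)      # works
pickle.dumps(make_worker())  # AttributeError: Can't pickle local object 'make_worker.<locals>.worker'

This is exactly the shape of your FeatureExtracter.<locals>.feature_extracter_fwd error: somewhere in your real code a function is defined inside FeatureExtracter, and that inner function is what the pool is trying to pickle.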

How to set an instance attribute in parallel in a Python class?

I want to set an instance attribute by running an instance method in parallel. Let's say the attribute is initially an empty dictionary called d, and I want to update it in parallel with an instance method called update_d. I am currently using multiprocessing.Pool:
from multiprocessing import Pool
import random

class A():
    def __init__(self, n_jobs):
        self.d = dict()
        self.n_jobs = n_jobs
        pool = Pool(self.n_jobs)
        pool.map(self.update_d, range(100))
        pool.close()

    def update_d(self, key):
        self.d[key] = random.randint(0, 100)

if __name__ == '__main__':
    a = A(n_jobs=4)
    print(a.d)
However, the attribute is not updated after running update_d in parallel. I understand that it's because multiprocessing.Pool always forks (copies) the instance into the individual processes. But I want to know the recommended way to do this in Python. Note that I don't want to return anything from update_d, and we can assume the code is written so that the individual processes won't conflict with each other.
Edit: I'm just using a dictionary as an example. I need a solution that allows the attribute to be any type of variable, e.g. a Pandas DataFrame.
You may need a Manager to create the dict for you. I still don't know how well the updates will work, or whether there will be any race conditions.
from multiprocessing import Pool, Manager
import random

class A():
    def __init__(self, n_jobs, manager):
        self.d = manager.dict()
        self.n_jobs = n_jobs
        pool = Pool(self.n_jobs)
        pool.map(self.update_d, range(100))
        pool.close()

    def update_d(self, key):
        self.d[key] = random.randint(0, 100)

if __name__ == '__main__':
    with Manager() as manager:
        a = A(n_jobs=4, manager=manager)
        print(a.d)
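A Manager dict only covers the dictionary case, though. For the edit's requirement that the attribute can be any type (e.g. a Pandas DataFrame), the more general pattern is to have the workers return their results and assemble the attribute in the parent process; this works for any picklable value. A minimal sketch (the compute helper is illustrative, not from the question):

from multiprocessing import Pool
import random

def compute(key):
    # runs in a worker; returns a (key, value) pair instead of mutating shared state
    return key, random.randint(0, 100)

class A():
    def __init__(self, n_jobs):
        self.n_jobs = n_jobs
        with Pool(self.n_jobs) as pool:
            # build the attribute in the parent from the returned pairs
            self.d = dict(pool.map(compute, range(100)))

if __name__ == '__main__':
    a = A(n_jobs=4)
    print(len(a.d))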

Multiprocessing gets stuck after .get()

I'm trying to understand how multiprocessing works in Python and I'm having some issues.
This is the example:

import multiprocessing

def func():
    return 1

p = multiprocessing.Pool()
result = p.apply_async(func).get()

When the .get() function is called, the code just gets stuck. What am I doing wrong?
You need to put those two lines inside an if __name__ == "__main__": block, so your code should look like this:
import multiprocessing

def func():
    return 1

if __name__ == "__main__":
    p = multiprocessing.Pool()
    result = p.apply_async(func).get()
Without the guard, each worker process re-imports your module when it starts (under the spawn start method), and that import creates yet another pool, causing an endless sequence of new processes. Putting the lines inside the if block works because they won't execute during import.
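A small experiment makes the re-import visible (this snippet is illustrative, assuming the spawn start method):

import multiprocessing as mp
import os

print("module imported in process", os.getpid())  # runs again in every spawned worker

def func():
    return 1

if __name__ == "__main__":
    mp.set_start_method("spawn")  # the default on Windows, and on macOS since Python 3.8
    with mp.Pool(2) as p:
        print(p.apply_async(func).get())

Running it prints the module-level line once for the parent and once per worker, which is why an unguarded Pool() would recurse.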
I don't have enough details to know exactly what the problem is, but my strong guess is that putting those lines:

p = multiprocessing.Pool()
result = p.apply_async(func).get()

inside a function will fix your problem. Try this:

import multiprocessing

def func():
    return 1

def main():
    p = multiprocessing.Pool()
    result = p.apply_async(func).get()
    print(result)

if __name__ == '__main__':
    main()

Tell me if it worked :)

Python thread pool copy parameters

I'm learning about multithreading and I'm trying to implement a few things to understand it.
After reading several (and very technical) topics, I cannot find a solution or a way to understand my issue.
Basically, I have the following structure:
from multiprocessing import Pool
import datetime

class MyObject():
    def __init__(self):
        self.lastupdate = datetime.datetime.now()

    def DoThings(self):
        ...

def MyThreadFunction(OneOfMyObject):
    OneOfMyObject.DoThings()
    OneOfMyObject.lastupdate = datetime.datetime.now()

def main():
    MyObject1 = MyObject()
    MyObject2 = MyObject()
    MyObjects = [MyObject1, MyObject2]
    pool = Pool(2)
    while True:
        pool.map(MyThreadFunction, MyObjects)

if __name__ == '__main__':
    main()
I think the .map function makes a copy of my objects, because it does not update the time. Is that right? If yes, how could I pass in a global version of my objects? If not, would you have any idea why the time is fixed in my objects?
When I check the new time with print(MyObject.lastupdate), the time is right, but not in the next loop.
Thank you very much for any of your ideas.
Yes, multiprocessing will serialize (actually, pickle) your objects and then reconstruct them in the worker process. However, it also sends them back. To recover them, see the commented additions to the code below:
from multiprocessing import Pool
import datetime

class MyObject():
    def __init__(self):
        self.lastupdate = datetime.datetime.now()

    def DoThings(self):
        ...

def MyThreadFunction(OneOfMyObject):
    OneOfMyObject.DoThings()
    OneOfMyObject.lastupdate = datetime.datetime.now()
    # NOW, RETURN THE OBJECT
    return OneOfMyObject

def main():
    MyObject1 = MyObject()
    MyObject2 = MyObject()
    MyObjects = [MyObject1, MyObject2]
    with Pool(2) as pool:  # <- this is just a neater way of doing it than a while loop for various reasons. Check out context managers if interested.
        # Now we recover a list of the updated objects:
        processed_object_list = pool.map(MyThreadFunction, MyObjects)
    # Now inspect
    for my_object in processed_object_list:
        print(my_object.lastupdate)

if __name__ == '__main__':
    main()
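If the original while loop is what's needed, the key is to rebind the list to the returned copies on each iteration so the next pass sees the updated objects. A sketch reusing the MyObject and MyThreadFunction definitions above (the range(3) bound stands in for the original while True):

def main():
    my_objects = [MyObject(), MyObject()]
    with Pool(2) as pool:
        for _ in range(3):  # bounded stand-in for the original while True
            # rebind to the updated copies returned by the workers
            my_objects = pool.map(MyThreadFunction, my_objects)
            print([obj.lastupdate for obj in my_objects])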

Python multiprocessing silent failure with class

The following does not work using Python 2.7.9, but it also does not throw any error or exception. Is there a bug, or can multiprocessing not be used in a class?
from multiprocessing import Pool

def testNonClass(arg):
    print "running %s" % arg
    return arg

def nonClassCallback(result):
    print "Got result %s" % result

class Foo:
    def __init__(self):
        po = Pool()
        for i in xrange(1, 3):
            po.apply_async(self.det, (i,), callback=self.cb)
        po.close()
        po.join()
        print "done with class"

        po = Pool()
        for i in xrange(1, 3):
            po.apply_async(testNonClass, (i,), callback=nonClassCallback)
        po.close()
        po.join()

    def cb(self, r):
        print "callback with %s" % r

    def det(self, M):
        print "method"
        return M+2

if __name__ == "__main__":
    Foo()
Running this prints:
done with class
running 1
running 2
Got result 1
Got result 2
EDIT: This seems related, but it uses .map, while I specifically need to use apply_async, which seems to matter in terms of how multiprocessing works with class instances (e.g. I don't get a pickling error, like many other related questions) - Python how to do multiprocessing inside of a class?
Processes don't share state or memory by default; each process is an independent program. You need to either 1) use threading, 2) use specific types capable of sharing state, or 3) design your program to avoid shared state and rely on return values instead.
Update
You have two issues in your code, and one is masking the other.
1) You don't do anything with the result of apply_async. I see that you're using callbacks, but you still need to collect the results and check them. Because you're not doing this, you're not seeing the error caused by the second problem.
2) Bound methods of an object cannot be pickled in Python 2, so they cannot be passed to other processes. I was really annoyed when I first discovered this, but there is an easy workaround. Try this:
from multiprocessing import Pool

def _remote_det(foo, m):
    return foo.det(m)

class Foo:
    def __init__(self):
        po = Pool()
        results = []
        for i in xrange(1, 3):
            r = po.apply_async(_remote_det, (self, i,), callback=self.cb)
            results.append(r)
        po.close()

        for r in results:
            r.wait()
            if not r.successful():
                # Raises an error when not successful
                r.get()

        po.join()
        print "done with class"

    def cb(self, r):
        print "callback with %s" % r

    def det(self, M):
        print "method"
        return M+2

if __name__ == "__main__":
    Foo()
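As an aside, in Python 3 bound methods pickle by reference (as long as the instance itself is picklable), so the _remote_det shim is no longer needed there; a minimal Python 3 sketch of the same idea:

from multiprocessing import Pool

class Foo:
    def __init__(self):
        with Pool() as po:
            results = [po.apply_async(self.det, (i,), callback=self.cb)
                       for i in range(1, 3)]
            for r in results:
                r.get()  # re-raises any error from the worker here

    def cb(self, r):
        print("callback with %s" % r)

    def det(self, M):
        print("method")
        return M + 2

if __name__ == "__main__":
    Foo()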
I'm pretty sure it can be used in a class, but you need to protect the call to Foo inside of a clause like:

if __name__ == "__main__":

so that it only gets called in the main process. You may also have to alter the __init__ function of the class so that it accepts a pool as an argument instead of creating a pool.
I just tried this:

from multiprocessing import Pool

#global counter
#counter = 0

class Foo:
    def __init__(self, po):
        for i in xrange(1, 300):
            po.apply_async(self.det, (i,), callback=self.cb)
        po.close()
        po.join()
        print( "foo" )
        #print counter

    def cb(self, r):
        #global counter
        #print counter, r
        counter += 1

    def det(self, M):
        return M+2

if __name__ == "__main__":
    po = Pool()
    Foo(po)
and I think I know what the problem is now. Python isn't multi-threaded; the global interpreter lock prevents that. Python uses multiple processes instead, so the sub-processes in the Pool don't have access to the standard output of the main process.
The subprocesses are also unable to modify the variable counter, because it exists in a different process (I tried running with the counter lines commented out and uncommented). Now, I do recall seeing cases where global state variables get altered by processes in the pool, so I don't know all of the minutiae. I do know that it is, in general, a bad idea to have global state variables like that, if for no other reason than that they can lead to race conditions and/or wasted time with locks and waiting for access to the global variable.
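If a shared counter is genuinely needed, multiprocessing provides synchronized types for exactly this; a sketch in Python 3 syntax using Value, passed to the workers through the pool initializer:

from multiprocessing import Pool, Value

counter = None

def init(shared):
    global counter
    counter = shared  # give every worker a handle to the shared value

def work(i):
    with counter.get_lock():  # the lock prevents lost updates from concurrent increments
        counter.value += 1
    return i + 2

if __name__ == "__main__":
    shared = Value("i", 0)
    with Pool(initializer=init, initargs=(shared,)) as po:
        po.map(work, range(1, 300))
    print(shared.value)  # 299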

Process containing object method doesn't recognize edit to object

I have the following situation: process = Process(target=sample_object.run). I then would like to edit a property of the sample_object: sample_object.edit_property(some_other_object).
class sample_object:
    def __init__(self):
        self.storage = []

    def edit_property(self, some_other_object):
        self.storage.append(some_other_object)

    def run(self):
        while True:
            if len(self.storage) != 0:
                print "1"
                # I know it's an infinite loop. It's just an example.

_______________________________________________________

from multiprocessing import Process
from sample import sample_object
from sample2 import some_other_object

class driver:
    if __name__ == "__main__":
        samp = sample_object()
        proc = Process(target=samp.run)
        proc.start()
        while True:
            some = some_other_object()
            samp.edit_property(some)
            # I know it's an infinite loop
The previous code never prints "1". How would I connect the Process to the sample_object, so that an edit made to the object whose method the Process is calling is recognized by the process? In other words, is there a way to get .run to recognize the change in sample_object?
Thank you.
You can use multiprocessing.Manager to share Python data structures between processes.
from multiprocessing import Process, Manager

class A(object):
    def __init__(self, storage):
        self.storage = storage

    def add(self, item):
        self.storage.append(item)

    def run(self):
        while True:
            if self.storage:
                print 1

if __name__ == '__main__':
    manager = Manager()
    storage = manager.list()
    a = A(storage)
    p = Process(target=a.run)
    p.start()
    for i in range(10):
        a.add({'id': i})
    p.join()
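A multiprocessing.Queue is another common way to hand items from the parent to a long-running worker process, avoiding shared state entirely; a minimal sketch in Python 3 syntax:

from multiprocessing import Process, Queue

def run(q):
    while True:
        item = q.get()    # blocks until the parent sends something
        if item is None:  # sentinel value signals a clean shutdown
            break
        print(item)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=run, args=(q,))
    p.start()
    for i in range(10):
        q.put({'id': i})
    q.put(None)  # tell the worker to stop
    p.join()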
