I am trying to use multiprocessing.Pool to implement a multiprocess application. To share some variables I am using a Queue, as hinted here:
from multiprocessing import Pool, Queue

def get_prediction(data):
    # here the real calculation will be performed
    ...

def mainFunction():
    def get_prediction_init(q):
        print("a")
        get_prediction.q = q
    queue = Queue()
    pool = Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])

if __name__ == '__main__':
    mainFunction()
This code is running perfectly on a Debian machine, but is not working at all on another Windows 10 device. It fails with the error
AttributeError: Can't pickle local object 'mainFunction.<locals>.get_prediction_init'
I do not really know what exactly is causing the error. How can I solve the problem so that I can run the code on the Windows device as well?
EDIT: The problem is solved if I create the get_prediction_init function on the same level as the mainFunction. It only failed when I defined it as an inner function. Sorry for the confusion in my post.
The problem is in something you haven't shown us. For example, it's a mystery where "mainFunction" came from in the AttributeError message you showed.
Here's a complete, executable program based on the fragment you posted. Worked fine for me under Windows 10 just now, under Python 3.6.1 (I'm guessing you're using Python 3 from your print syntax), printing "a" 16 times:
import multiprocessing as mp

def get_prediction(data):
    # here the real calculation will be performed
    pass

def get_prediction_init(q):
    print("a")
    get_prediction.q = q

if __name__ == "__main__":
    queue = mp.Queue()
    pool = mp.Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
    pool.close()
    pool.join()
Edit
And, based on your edit, this program also works fine for me:
import multiprocessing as mp

def get_prediction(data):
    # here the real calculation will be performed
    pass

def get_prediction_init(q):
    print("a")
    get_prediction.q = q

def mainFunction():
    queue = mp.Queue()
    pool = mp.Pool(processes=16, initializer=get_prediction_init, initargs=[queue,])
    pool.close()
    pool.join()

if __name__ == "__main__":
    mainFunction()
Edit 2
And now you've moved the definition of get_prediction_init() into the body of mainFunction. Now I can see your error :-)
As shown, define the function at module level instead. Trying to pickle local function objects can be a nightmare. Perhaps someone wants to fight with that, but not me ;-)
Related
I built a simple multiprocessing program in Python. It worked perfectly in Python 3.7, but since upgrading to 3.9 I'm struggling to understand how to make it work now that multiprocessing's behaviour has changed.
The program follows roughly the following pattern:
import multiprocessing

def print_multiprocessing(my_string):
    print(my_prefix, my_string)

if __name__ == "__main__":
    output = []
    for x in range(10):
        output.append(x)
    my_prefix = input()
    pool = multiprocessing.Pool(4)
    pool.map_async(print_multiprocessing, output)
    pool.close()
    pool.join()
The actual program is more complicated obviously, but this will demonstrate my issue.
In Python 3.7, the child processes would automatically inherit the my_prefix variable from the parent, but in 3.9 that variable isn't available to the child processes.
I could declare the variable outside the if __name__ == "__main__": block, which would mean it gets declared by each child process, but then the input() function would also be called for every child process.
My current workaround is to have the print_multiprocessing function accept a list as an argument and pass in every value I need as part of that list, but it feels very messy, especially when dealing with multiple data types.
Is there a simple trick I'm missing here?
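For reference, the initializer pattern from the first answer above can hand my_prefix to each worker exactly once. Here is a minimal sketch; the init_worker name and the module-level global are illustrative assumptions, not part of the original program:

import multiprocessing

def init_worker(prefix):
    # runs once in each child process; stash the value in a module-level global
    global my_prefix
    my_prefix = prefix

def print_multiprocessing(my_string):
    print(my_prefix, my_string)

if __name__ == "__main__":
    my_prefix = input()
    output = list(range(10))
    pool = multiprocessing.Pool(4, initializer=init_worker, initargs=(my_prefix,))
    pool.map(print_multiprocessing, output)
    pool.close()
    pool.join()

This calls input() only in the parent; the value travels to each worker through initargs when the worker starts, so it works regardless of the start method.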
When I run this and enter some input, it goes into the main function but then asks for input again. Why is that even happening?
I am running it from the command prompt on Windows; the Python version is 3.8.
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
import concurrent.futures

input('?')

def pp(id, lock):
    with lock:
        for i in range(5):
            print(f'{id}=>{i}')

def main():
    pool = ProcessPoolExecutor()
    m = multiprocessing.Manager()
    lock = m.Lock()
    futures = [pool.submit(pp, num, lock) for num in range(10)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
        executor.map(main, list(range(10)), [lock]*10)

if __name__ == '__main__':
    main()
Here is the output:
?abc
?abd
????
How to solve this problem so it runs the input just once?
I cannot reproduce this; it only runs once on my local Python.
What is your Python version?
However, I can recommend putting the input call inside the if __name__ == "__main__": block. The problem is that your input runs whenever your module is imported, and on Windows every spawned child process imports the main module.
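A minimal sketch of that fix, assuming the rest of the script stays unchanged:

if __name__ == '__main__':
    input('?')  # the prompt now runs only in the parent process
    main()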
I have a multiprocessing Pool that runs with 1 thread, and it keeps repeating the code before my function. I have tried it with different numbers of threads, and I make things like this quite a bit, so I think I know what is causing the problem, but I don't understand why. Usually I use argparse to parse files from the user, but this time I wanted to use input instead. No errors are thrown, so I honestly have no clue.
from colorama import Fore
import colorama
import os
import ctypes
import multiprocessing
from multiprocessing import Pool
import random

colorama.init(autoreset=False)

print("headerhere")

# as you can see I used input instead of argparse
g = open(input(Fore.RED + " File Path?: " + Fore.RESET))
gg = open(input(Fore.RED + "File Path?: " + Fore.RESET))

# I messed around with this to see if it was the problem, ultimately disabling it until I fixed it; I just use 1 thread
threads = int(input(Fore.RED + "Amount of Threads?: " + Fore.RESET))

arrange = [lines.replace("\n", "") for lines in g]
good = [items.replace("\n", "") for items in gg]

# this is all of the code before the function that Pool calls
def che(line):
    print("f")
    # I would show my code, but as I said this isn't the problem since I've made programs like this before; the only thing I changed is how I take file inputs from the user

def main():
    pool = Pool(1)
    pool.daemon = True
    result = pool.map(che, arrange)

if __name__ == "__main__":
    main()
if __name__ == "__main__":
main()
Here's a minimal, reproducible example of your issue:
from multiprocessing import Pool

print('header')

def func(n):
    print(f'func {n}')

def main():
    pool = Pool(3)
    pool.map(func, [1, 2, 3])

if __name__ == '__main__':
    main()
On OSes where "spawn" (Windows and macOS) or "forkserver" (some Unixes) is the default start method, each sub-process imports your script. Since print('header') is at global scope, it runs the first time the script is imported into a process, so the output is:
header
header
header
header
func 1
func 2
func 3
A multiprocessing script should have everything meant to run once inside function(s), and they should be called once by the main script via if __name__ == '__main__':, so the solution is to move it into your def main():
from multiprocessing import Pool

def func(n):
    print(f'func {n}')

def main():
    print('header')
    pool = Pool(3)
    pool.map(func, [1, 2, 3])

if __name__ == '__main__':
    main()
Output:
header
func 1
func 2
func 3
If you want the top-level code before the definition of che to be executed only in the master process, place it in a function and call that function in main, as shown in the sketch below.
In multiprocessing, top-level statements are interpreted/executed by both the master process and every child process. So, if some code should be executed only by the master and not by the children, it should not be placed at the top level. Instead, it should be placed in functions, and these functions should be invoked in the main scope, i.e., inside the if block guarded by __name__ == '__main__' (or called in the main function of your code snippet).
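A minimal sketch of that restructuring, applied to the question's script; the body of che stays elided, and reading only one file inside main is an illustrative simplification:

from multiprocessing import Pool

def che(line):
    print("f")

def main():
    # everything that must run only once now lives inside main()
    g = open(input("File Path?: "))
    arrange = [line.replace("\n", "") for line in g]
    pool = Pool(1)
    result = pool.map(che, arrange)

if __name__ == "__main__":
    main()

Because the prompts and file reads now run only when main() is called, the child processes can import the module without re-executing them.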
I am trying to use multiprocessing on a different problem but I can't get it to work. To make sure I'm using the Pool class correctly, I made the following simpler problem but even that won't work. What am I doing wrong here?
from multiprocessing import Pool

def square(x):
    sq = x**2
    return sq

def main():
    x1 = [1, 2, 3, 4]
    pool = Pool()
    result = pool.map(square, x1)
    print(result)

if __name__ == '__main__':
    main()
The computer just seems to run forever and I need to close and restart the IPython shell before I can do anything.
I figured out what was wrong. I named the script "multiprocessing.py" which is the name of the module that was being imported. This resulted in the script attempting to import itself instead of the actual module.
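If you suspect this kind of shadowing, a quick check (an illustrative snippet, not part of the original answer) is to print where Python resolved the module:

import multiprocessing
print(multiprocessing.__file__)  # points at your own script if it shadows the stdlib module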
I want to apply a function in parallel using multiprocessing.Pool.
The problem is that if one function call triggers a segmentation fault the Pool hangs forever.
Does anybody have an idea how I can make a Pool that detects when something like this happens and raises an error?
The following example shows how to reproduce it (requires scikit-learn > 0.14)
import numpy as np
from sklearn.ensemble import gradient_boosting
import time
from multiprocessing import Pool

class Bad(object):
    tree_ = None

def fit_one(i):
    if i == 3:
        # this will segfault
        bad = np.array([[Bad()] * 2], dtype=np.object)
        gradient_boosting.predict_stages(bad,
                                         np.random.rand(20, 2).astype(np.float32),
                                         1.0, np.random.rand(20, 2))
    else:
        time.sleep(1)
    return i

pool = Pool(2)
out = pool.imap_unordered(fit_one, range(10))

# we will never see 3
for o in out:
    print o
As described in the comments, this just works in Python 3 if you use concurrent.futures.ProcessPoolExecutor instead of multiprocessing.Pool.
If you're stuck on Python 2, the best option I've found is to use the timeout argument on the result objects returned by Pool.apply_async and Pool.map_async. For example:
pool = Pool(2)
results = [pool.apply_async(fit_one, (i,)) for i in range(10)]
for res in results:
    print res.get(timeout=1000)  # allow 1000 seconds max per task
This works as long as you have an upper bound for how long a child process should take to complete a task.
This is a known bug, issue #22393, in Python. There is no meaningful workaround as long as you're using multiprocessing.Pool, until it's fixed. A patch is available at that link, but it has not yet been integrated into a release, so no stable release of Python fixes the problem.
Instead of using Pool().imap(), maybe you would rather create the child processes yourself with Process(). I bet the returned objects would let you check the liveness status of any child, so you will know if one hangs up.
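A minimal sketch of that idea (the worker and the exit-code check are illustrative assumptions, not from the answer): a Process object exposes is_alive() and exitcode, and a negative exitcode means the child was killed by a signal such as SIGSEGV.

from multiprocessing import Process

def worker(i):
    pass  # the real work would go here

if __name__ == '__main__':
    procs = [Process(target=worker, args=(i,)) for i in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
        # exitcode is negative if the child died from a signal (e.g. -11 for SIGSEGV)
        if p.exitcode != 0:
            raise RuntimeError('child %s died with exit code %s' % (p.name, p.exitcode))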
I haven't run your example to see if it can handle the error, but try concurrent.futures. Simply replace my_function(i) with your fit_one(i). Keep the if __name__ == '__main__': structure; concurrent.futures seems to need it. The code below is tested on my machine, so it will hopefully work straight up on yours.
import concurrent.futures

def my_function(i):
    print('function running')
    return i

def run():
    number_processes = 4
    executor = concurrent.futures.ProcessPoolExecutor(number_processes)
    futures = [executor.submit(my_function, i) for i in range(10)]
    concurrent.futures.wait(futures)
    for f in futures:
        print(f.result())

if __name__ == '__main__':
    run()