Python multiprocessing map function

I ran into a problem while using the multiprocessing map function in Python. A minimal example that reproduces it:
import multiprocessing as mp
if __name__ == '__main__':
    def f(x):
        return x*x
    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))
When I run this code, I get the error message
AttributeError: Can't get attribute 'f' on <module '__mp_main__' from 'main.py'>
However, if I move the function f outside the if __name__ == '__main__': block, i.e.
import multiprocessing as mp
def f(x):
    return x*x
if __name__ == '__main__':
    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))
This time it works. What is the difference between the two versions, and why do I get an error in the first one? Thanks in advance.

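Depending on your operating system and Python version, sub-processes will either be forked or spawned by default. Windows always spawns (fork is not available there) and macOS has spawned by default since Python 3.8, whereas Linux has traditionally forked. A spawned child starts a fresh interpreter and re-imports the main module, so it never sees names that are defined only inside the if __name__ == '__main__': block.
You can enforce forking where it is available (not on Windows), but you need to fully understand the implications of doing so.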
For this specific question a workaround could be implemented thus:
import multiprocessing as mp
from multiprocessing import set_start_method
if __name__ == '__main__':
    def f(x):
        return x*x
    set_start_method('fork')
    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))
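Before forcing a start method, it may help to check what the current platform supports and what it would use by default. The snippet below is a minimal sketch of that check; the variable names are only illustrative.

```python
import multiprocessing as mp

# Start methods this platform supports (Windows offers only 'spawn')
available = mp.get_all_start_methods()

# Method that will be used unless set_start_method() is called first
current = mp.get_start_method()

print("available:", available)
print("current:", current)
print("fork can be forced here:", "fork" in available)
```

As far as I know, calling get_start_method() this way also fixes the default for the rest of the program, so treat it as a diagnostic to run on its own rather than something to put in front of set_start_method().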

This will vary between operating systems, but the basic reason is that this line of code
if __name__ == '__main__':
tells the Python interpreter to run the code in this section only when the file is executed as a script. The section is skipped when the file is imported as a module, and it is also skipped when a spawned sub-process re-imports the file (there the module is named __mp_main__, as the error message shows). So when you do this
import multiprocessing as mp
if __name__ == '__main__':
    def f(x):
        return x*x
    num_workers = 2
    with mp.Pool(num_workers) as p:
        print(p.map(f, [1,2,3]))
any spawned sub-processes created for p.map will not have the definition of function f.
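This effect can be imitated without starting any processes. The sketch below writes a small hypothetical script to a temporary file (the function g and the file name main.py are made up for the illustration) and runs it twice with runpy: once under the name __main__, the way the parent runs it, and once under the name __mp_main__, the module name that appears in the error message. It only approximates what a spawned worker does, but it shows which names survive.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical script: g is defined at module level, f only under the guard
SCRIPT = textwrap.dedent("""
    def g(x):
        return x + 1

    if __name__ == '__main__':
        def f(x):
            return x * x
""")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "main.py")
    with open(path, "w") as fh:
        fh.write(SCRIPT)
    # Parent process: the file runs as a script, so the guard is true
    as_script = runpy.run_path(path, run_name="__main__")
    # Imitated worker: same file, different module name, guard is false
    as_worker = runpy.run_path(path, run_name="__mp_main__")

print("parent has f:", "f" in as_script)
print("worker has f:", "f" in as_worker)
print("worker has g:", "g" in as_worker)
```

Only g exists in the second namespace, which is why a function passed to p.map has to be defined outside the guard.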

Related

Multiprocessing hanging in Spyder

I have been trying pool.map() from multiprocessing with Python 3, and no matter how I simplify my function, it hangs: the code appears to be still running and gives no results.
I am using Windows. Here is my code:
import multiprocessing as mp
def f(x):
    return x + 1
if __name__ == '__main__':
    with mp.Pool() as pool:
        print(pool.map(f, range(10)))
Can anyone tell me how I can solve this problem?
Thanks!

Python Multiprocessing wouldn't Work

I'm playing with Python multiprocessing. But it wouldn't work on my system.
I ran the example code I found on multiprocessing page. But it just hangs there and the CPU usage is 0%. What do I do to make it work? Thanks a lot!
https://docs.python.org/2/library/multiprocessing.html
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
Update: I just tried running the same code from the command line and got the following error message.
Error Message

Python Multiprocessing Pool Class won't work

I am trying to use multiprocessing on a different problem but I can't get it to work. To make sure I'm using the Pool class correctly, I made the following simpler problem but even that won't work. What am I doing wrong here?
from multiprocessing import Pool
def square(x):
    sq = x**2
    return sq
def main():
    x1 = [1,2,3,4]
    pool = Pool()
    result = pool.map( square, x1 )
    print(result)
if __name__ == '__main__': main()
The computer just seems to run forever and I need to close and restart the IPython shell before I can do anything.
I figured out what was wrong. I named the script "multiprocessing.py" which is the name of the module that was being imported. This resulted in the script attempting to import itself instead of the actual module.
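A quick way to spot this kind of name clash is to look at where the imported module was actually loaded from. The helper below is a rough sketch, and looks_shadowed is a made-up name: it only checks whether a module's file sits directly in a given directory, such as the one holding your script.

```python
import multiprocessing
import os

def looks_shadowed(module, script_dir):
    """Return True if module was loaded from a file directly inside script_dir."""
    module_file = os.path.abspath(module.__file__)
    return os.path.dirname(module_file) == os.path.abspath(script_dir)

# The real package lives in the standard library, not next to your script
print(multiprocessing.__file__)
print(looks_shadowed(multiprocessing, os.getcwd()))
```

If the printed path points at your own multiprocessing.py instead of the standard library, renaming the script should resolve the problem.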

Python multiprocessing cannot access variable defined in if __name__ == "main"

I use Windows, and I don't understand why the following code fails.
Since a is defined in the if __name__ == '__main__': block, it should be a global variable, but running the script gives the error "a is not defined". If a is defined outside that block instead, the code works.
from multiprocessing import Pool
import numpy as np
# a = np.array([1,2,3])
def f(x):
    return a*x
if __name__ == '__main__':
    a = np.array([1,2,3])
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
Only the main process has __name__ == '__main__'. On Windows, each child process is spawned and re-imports your script without setting __name__ to '__main__', so the block that defines a never runs there. This is intentional, and required on Windows (where fork() is not available), to provide a mechanism for executing code such as initializing the pool only in the parent. See the discussion here: Workaround for using __name__=='__main__' in Python multiprocessing
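If the data really has to be created in the parent, one common option is to hand it to each worker through the pool's initializer and initargs parameters. The sketch below assumes that approach. The names init_worker and _shared are made up for the illustration, and a plain list stands in for the NumPy array to keep the example dependency-free.

```python
from multiprocessing import Pool

_shared = None  # filled in separately inside every worker process

def init_worker(values):
    """Runs once in each worker and stores that worker's copy of the data."""
    global _shared
    _shared = values

def f(x):
    # Uses the copy installed by init_worker, not a name from the parent
    return [v * x for v in _shared]

if __name__ == '__main__':
    a = [1, 2, 3]
    # initargs is pickled and sent to each worker as it starts up
    with Pool(2, initializer=init_worker, initargs=(a,)) as p:
        print(p.map(f, [1, 2, 3]))
```

Because f and init_worker live at module level, the workers can find them under both fork and spawn, and a no longer has to exist as a global in the child.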

python multiprocessing not working at all

So I took the following code, ran it, and literally nothing happened. Python acted like it had finished everything (maybe it did) but nothing printed. Any help getting this to work would be greatly appreciated!
import multiprocessing
def worker(number):
    print(number)
    return
if __name__ == '__main__':
    test = multiprocessing.Process(target=worker, args=[0,1,2,3,4])
    test.start()
Your code should actually result in an error. The args argument to multiprocessing.Process() does not start a process for each element; it unpacks the sequence into the positional arguments of a single call to the function, made in one child process. To run 5 separate instances like that, you would have to do something like this:
import multiprocessing
def worker(number):
    print(number)
    return
if __name__ == '__main__':
    procs = []
    for i in range(5):
        procs.append(multiprocessing.Process(target=worker, args=[i]))
    # start every process with a plain loop rather than a list comprehension
    for proc in procs:
        proc.start()
Your code tries to run worker(0,1,2,3,4) in a new process. If you want to execute worker() function in parallel in multiple processes:
from multiprocessing import Pool
def worker(number):
    return number*number
if __name__ == '__main__':
    pool = Pool() # use all available CPUs
    for square in pool.imap(worker, [0,1,2,3,4]):
        print(square)
Your code results in an error when I run it. Process unpacks args into separate positional arguments, so to pass the entire list as a single argument you need to wrap it in a one-element tuple.
import multiprocessing
def worker(number):
    print(number)
    return
if __name__ == '__main__':
    test = multiprocessing.Process(target=worker, args=([0,1,2,3,4],))
    test.start()
    test.join()
Also, don't forget to join the process at the end.
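The difference between the three ways of passing args can be checked without creating any processes, since Process ends up calling target(*args). The sketch below imitates that call in the current process. The helper call_like_process is a made-up name, and worker returns its argument here instead of printing it so the results can be compared.

```python
def worker(number):
    return number

def call_like_process(target, args):
    """Imitate how a Process invokes its target: target(*args)."""
    return target(*args)

# Original question: five positional arguments for a one-parameter function
try:
    call_like_process(worker, [0, 1, 2, 3, 4])
    error = None
except TypeError as exc:
    error = exc

# One call per element, as in the loop over range(5)
singles = [call_like_process(worker, [i]) for i in range(5)]

# The whole list as a single argument, wrapped in a one-element tuple
whole = call_like_process(worker, ([0, 1, 2, 3, 4],))

print(type(error).__name__)
print(singles)
print(whole)
```

In the real program the TypeError is raised inside the child process, so it shows up as a traceback from the child rather than as an exception in the parent.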
