python multiprocessing does not run functions - python

I just want to see a simple working example of multiprocessing on Windows, but the code below never enters the target functions, either in a Jupyter notebook or when run as a saved .py file:
import time
import multiprocessing
s=[1,4]
def subu(remo):
    s[remo-1]=remo*9
    print(f'here{remo}')
    return
if __name__=="__main__":
    p1=multiprocessing.Process(target=subu , args=[1])
    p2=multiprocessing.Process(target=subu , args=[2])
    p1.start()
    p2.start()
    p1.join()
    p2.join()
# print("2222here")
print(s)
input()
the output by .py is:
[1, 4]
[1, 4]
and the output by jupyternotebook is:
[1,4]
which I hoped to be:
here1
here2
[9,18]
What's wrong with the code above? And what about this code:
import concurrent
thread_num=2
s=[1,4]
def subu(remo):
    s[remo-1]=remo*9
    print(f'here{remo}')
    return
with concurrent.futures.ProcessPoolExecutor() as executor:
## or if __name__=="__main__":
##... with concurrent.futures.ProcessPoolExecutor() as executor:
    results=[executor.submit(subu,i) for i in range(thread_num)]
    for f in concurrent.futures.as_completed(results):
        print(f.result())
input()
It does not run at all in Jupyter and raises this error:
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I more or less know that I can't expect Jupyter to run multiprocessing code, but the saved .py file doesn't run it either, and it exits without waiting for input().

There are a couple of potential problems. The worker function needs to be importable (at least on Windows) so that the subprocess can find it. And since a subprocess's memory isn't visible to the parent, any change the worker makes to s is lost; the results need to be returned instead. So, put the worker in a separate module:
subumodule.py
def subu(remo):
    remo = remo*9
    print(f'here{remo}')
    return remo
Then use a process pool's existing infrastructure to hand the worker's return value back to the parent. You could write the main script like this:
import multiprocessing
from subumodule import subu  # the worker lives in subumodule.py
if __name__=="__main__":
    with multiprocessing.Pool(2) as pool:
        s = list(pool.map(subu, (1,2))) #here
    print(s)
    input()
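The same idea can be applied to the second snippet from the question. This is only a sketch, assuming the script is saved and run as a .py file: it imports concurrent.futures explicitly (a plain import concurrent does not load the futures submodule), keeps the executor inside the __main__ guard, and returns values instead of writing to the global list. The helper name run_all is made up for illustration.

```python
import concurrent.futures


def subu(remo):
    """Return the new value instead of writing it into a global list."""
    return remo * 9


def run_all(values, max_workers=2):
    """Submit one task per value and return the results in input order."""
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(subu, v) for v in values]
        return [f.result() for f in futures]


if __name__ == "__main__":
    # The guard stops child processes from re-running this block when they
    # import the main module (which is what happens on Windows).
    print(run_all([1, 2]))
```

Run as a saved .py file this prints [9, 18]. In a Jupyter notebook on Windows, subu would most likely still need to live in an importable module, as described above.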

Related

Python3 Process.join() not actually waiting on Linux when the process is created in multi-thread

I need to put a timeout on a process that is created inside a thread, but I ran into strange behavior and I'm not sure how to proceed.
The following code, executed on Linux, produces a weird bug: if the number of threads is greater than 2 (my laptop has 8 cores), or if the code is run in a loop a few times, process.join() doesn't actually wait for the process to finish or for the timeout to expire; it just moves on to the next instruction.
If the same code is executed on Windows with Python 3.9, it raises a circular import error inside the libraries for no apparent reason.
With Python 3.8 it works almost perfectly up to about 256 threads, then shows the same strange behavior on process.join() as on Linux.
Error on Windows with Python 3.9:
ImportError: cannot import name 'Queue' from partially initialized module 'multiprocessing.queues' (most likely due to a circular import)
Furthermore, if I remove the return value from the process (and therefore the Queue), process.join() starts working properly on Linux for arbitrarily large n_threads. However, running the code in a loop still gives the error even for very small n_threads.
import random
from multiprocessing import Process, Queue
from threading import Thread
def dummy_process():
    return random.randint(1, 10)
#function to retrieve process return value
def process_returner(queue, function, args):
    queue.put(function(*args))
#function that creates the process with timeout
def execute_with_timeout(function, args, timeout=3):
    q = Queue()
    p1 = Process(
        target=process_returner,
        args=(q, function, args),
        name="P",
    )
    p1.start()
    p1.join(timeout=timeout) # SOMETIME IT DOES NOT WAIT FOR THE PROCESS TO FINISH
    if p1.exitcode is None:
        print(f"Oops, {p1} timeouts!")# SO IT RAISES THIS ERROR even if nowhere near 3 secods have passed
        raise TimeoutError
    p1.terminate()
    return q.get() if not q.empty() else None
#thread that just call the new process and stores the return value in the given array
def dummy_thread(result_array, index):
    try:
        result_array[index] = execute_with_timeout(dummy_process, args=())
    except TimeoutError:
        pass
def test():
    #in loop because with low n_threads as 4 the error is not so common
    for _ in range(10):
        n_threads =8
        results = [-1] * n_threads
        threads = set()
        for i in range(n_threads):
            t = Thread(target=dummy_thread, args=(results, i))
            threads.add(t)
            t.start()
        for t in threads:
            t.join()
        print(results)
if __name__ == '__main__':
    test()
I ran into a similar problem when using the multiprocessing module on Linux. Process.join() started returning immediately instead of waiting, while exitcode was still None and is_alive() returned True.
It turned out the problem wasn't in the Python code. I was calling my Python program from a Bash script that would sometimes execute trap "" SIGCHLD. Normally, setting a trap only affects the script itself, but trap "" some_signal makes the shell's child processes ignore the signal as well. Ignoring SIGCHLD interferes with the multiprocessing module.
In my case, adding signal.signal(signal.SIGCHLD, signal.SIG_DFL) to the beginning of the Python program fixed the problem.
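As a rough sketch of where that call could go (the helper names restore_sigchld and square are made up for illustration, and SIGCHLD only exists on Unix-like systems, hence the hasattr check):

```python
import multiprocessing
import signal


def restore_sigchld():
    """Reset SIGCHLD to its default disposition and return the old handler."""
    # SIGCHLD is not defined on Windows, so do nothing there.
    if not hasattr(signal, "SIGCHLD"):
        return None
    return signal.signal(signal.SIGCHLD, signal.SIG_DFL)


def square(x):
    return x * x


if __name__ == "__main__":
    # Do this before any process is started, in case the parent shell
    # left SIGCHLD ignored (e.g. via: trap "" SIGCHLD).
    previous = restore_sigchld()
    print("previous SIGCHLD handler:", previous)

    p = multiprocessing.Process(target=square, args=(3,))
    p.start()
    p.join(timeout=3)
    print("exit code:", p.exitcode)
```

Printing the previous handler is just a quick way to check whether the program really was started with SIGCHLD ignored.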

Python multiprocessing loses activity without exiting file

I have a problem where my .py file, which uses maximum CPU through multiprocessing, stops operating without exiting the .py file.
I am running a heavy task that uses all cores in an old MacBook Pro (2012). The task runs fine at first, where I can visually see four python3.7 tasks populate the Activity Monitor window. However, after about 20 minutes, those four python3.7 disappear from the Activity Monitor.
The strangest part is the multiprocessing .py file is still operating, i.e. it never threw an uncaught exception nor exited the file.
Would you guys/gals have any ideas as to what's going on? My guess is 1) it's most likely an error in the script, and 2) the old computer is overheating.
Thanks!
Edit: Below is the multiprocess code, where the multiprocess function to execute is func with a list as its argument. I hope this helps!
import multiprocessing
def main():
    pool = multiprocessing.Pool()
    for i in range(24):
        pool.apply_async(func, args = ([], ))
    pool.close()
    pool.join()
if __name__ == '__main__':
    main()
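Use a context manager to handle closing the processes properly. Note that leaving the with block terminates the pool, so fetch the result with get() inside the block; get() also re-raises any exception that occurred in the worker, which would otherwise pass silently.
from multiprocessing import Pool
def main():
    with Pool() as p:
        result = p.apply_async(func, args = ([], ))
        # get() waits for the task and re-raises any worker exception
        print(result.get())
if __name__ == '__main__':
    main()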
I wasn't sure what you were doing with the for i in range() part.
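If the workers are dying quietly, one way to make that visible is an error_callback plus get(). This is only a sketch: func here is a hypothetical worker that always fails, standing in for the real task, and report_error is a made-up name.

```python
import multiprocessing


def func(items):
    """Hypothetical worker that fails, standing in for the real task."""
    raise ValueError("something went wrong in the worker")


def report_error(exc):
    """Runs in the parent process whenever a task raises."""
    print(f"worker failed: {exc!r}")


def main():
    with multiprocessing.Pool(2) as pool:
        results = [
            pool.apply_async(func, args=([],), error_callback=report_error)
            for _ in range(4)
        ]
        for r in results:
            try:
                r.get(timeout=30)  # re-raises the worker's exception here
            except ValueError as exc:
                print(f"get() raised: {exc}")


if __name__ == "__main__":
    main()
```

Without either the callback or the get() call, an exception inside apply_async work is stored on the result object and never shown, which can look like the script has simply gone idle.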

Python multiprocessing module not calling function

I have a program that needs to create several graphs, with each one often taking hours. Therefore I want to run these simultaneously on different cores, but cannot seem to get these processes to run with the multiprocessing module. Here is my code:
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()
        p.join()
(full_graph() has been defined earlier in the program, and is simply a function that runs a collection of other functions)
The function normally outputs some graphs, and saves the data to a .txt file. All data is saved to the same 2 text files. However, calling the functions using the above code gives no console output, nor any output to the text file. All that happens is a few second long pause, and then the program exits.
I am using the Spyder IDE with WinPython 3.6.3
Without a simple full_graph sample nobody can tell you what's happening. But your code is inherently wrong.
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()
        p.join() # <- This would block until p is done
See the comment after p.join(). If your processes really take hours to complete, you would run one process for hours, then the second, then the third: serially, using a single core at a time.
From the docs: https://docs.python.org/3/library/multiprocessing.html
Process.join: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.join
If the optional argument timeout is None (the default), the method blocks until the process whose join() method is called terminates. If timeout is a positive number, it blocks at most timeout seconds. Note that the method returns None if its process terminates or if the method times out. Check the process’s exitcode to determine if it terminated.
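A minimal sketch of the alternative: start every process first, and only then join them. The full_graph below is a hypothetical stand-in that takes an index (the asker's real function isn't shown), and start_all is a made-up helper name.

```python
import multiprocessing
import os


def full_graph(index):
    """Hypothetical stand-in for the asker's long-running function."""
    print(f"graph {index} running in process {os.getpid()}")


def start_all(count):
    """Start `count` processes without waiting for any of them."""
    jobs = []
    for i in range(count):
        p = multiprocessing.Process(target=full_graph, args=(i,))
        jobs.append(p)
        p.start()
    return jobs


if __name__ == "__main__":
    jobs = start_all(5)
    # Join only after everything has been started, so the work overlaps.
    for p in jobs:
        p.join()
    print("exit codes:", [p.exitcode for p in jobs])
```

Checking the exit codes at the end is a cheap way to notice a child that crashed without producing any output.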
If each process does something different, full_graph should also take some arguments (hint: might that be the missing piece?).
You probably want to use an interface like map from Pool:
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
And do (from the docs again):
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

Spawning multiple processes with Python

Earlier I tried to use the threading module in Python to create multiple threads. Then I learned about the GIL and how it does not allow taking advantage of multiple CPU cores on a single machine. So now I'm trying to do multiprocessing (I don't strictly need separate threads).
Here is some sample code I wrote to see whether distinct processes are being created. But as can be seen in the output below, I'm getting the same process ID every time, so multiple processes are not being created. What am I missing?
import multiprocessing as mp
import os
def pri():
    print(os.getpid())
if __name__=='__main__':
    # Checking number of CPU cores
    print(mp.cpu_count())
    processes=[mp.Process(target=pri()) for x in range(1,4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
Output:
4
12554
12554
12554
The Process class requires a callable as its target.
Instead of running the function in the separate process, you are calling it and passing its result (None in this case) to the Process class.
Just change the following:
mp.Process(target=pri())
with:
mp.Process(target=pri)
Subprocesses run in separate processes, so depending on your environment (IDLE on Windows, for example) you may not see their print output. They also don't share the parent's memory. You pass pri() to target where it needs to be pri: you need to pass a callable object, not call it.
The prints you see come from your main process. Because you pass pri(), the function is executed right there in the parent. You need to change your code so that pri returns its value rather than printing it.
Then you need to implement a queue that all your processes write to, so that your main process can read from it when they're done.
A nice feature of the multiprocessing module is the Pool object. It lets you create a pool of worker processes and then just use it, which is more convenient.
When I tried it, the function finished so quickly that every call reported the same PID. If you add a time.sleep(1) to your pri function, it works as you expect.
I only observed this on Windows, where the example below was run. On Unix-like machines you may not need the sleep.
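For reference, the queue approach might look roughly like this. It is a sketch, not the asker's code: pri is changed to take a queue argument, and collect_pids is a made-up helper name.

```python
import multiprocessing as mp
import os


def pri(queue):
    """Send this process's PID back to the parent instead of printing it."""
    queue.put(os.getpid())


def collect_pids(count):
    """Start `count` processes and return the PID each one reported."""
    queue = mp.Queue()
    # Pass the function itself (pri), not the result of calling it (pri()).
    processes = [mp.Process(target=pri, args=(queue,)) for _ in range(count)]
    for p in processes:
        p.start()
    # Read one item per process before joining, so no child is left
    # waiting to flush its data into the queue.
    pids = [queue.get() for _ in processes]
    for p in processes:
        p.join()
    return pids


if __name__ == "__main__":
    print("parent:", os.getpid())
    print("children:", collect_pids(3))
```

Each child is a separate Process here, so the reported PIDs should differ from the parent's and from each other.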
The more convenient solution looks like this:
from multiprocessing import Pool
from time import sleep
import os
def pri(x):
    sleep(1)
    return os.getpid()
def use_procs():
    p_pool = Pool(4)
    p_results = p_pool.map(pri, range(1, 4))
    p_pool.close()
    p_pool.join()
    return p_results
if __name__ == '__main__':
    res = use_procs()
    for r in res:
        print(r)
Without the sleep:
==================== RESTART: C:/Python27/tests/test2.py ====================
6576
6576
6576
>>>
with the sleep:
==================== RESTART: C:/Python27/tests/test2.py ====================
10396
10944
9000

How does Python multiprocessing work in the backend?

When I tried to run this code:
import multiprocessing
def worker():
    """worker function"""
    print 'Worker'
    return
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
The output is blank: the program simply runs without printing "Worker". How do I get the expected output with multiprocessing?
What is actually happening when multiprocessing is used?
What is the maximum number of cores we can use for multiprocessing?
I've tried your code on Windows 7, Cygwin, and Ubuntu. For me all the processes finish before the loop comes to an end, so all the prints show up, but using join() makes the main program wait until every process has finished.
import multiprocessing
def worker():
    """worker function"""
    print('Worker')
    return
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
    # wait for every process to finish
    for p in jobs:
        p.join()
As far as how multiprocessing works in the backend, I'm going to let someone more experienced than myself answer that one :) I'll probably just make a fool of myself.
I get "Worker" printed 5 times on my side. Are you on Python 3? If so, you must use print("Worker"). From my experiments, I think multithreading doesn't mean using multiple cores; it just runs the different threads alternately to give the appearance of parallelism. Try reading the multiprocessing library documentation for more info.
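On the question about the number of cores: os.cpu_count() reports how many logical CPUs the machine has. You can start more processes than that, but they then have to share the cores. A small sketch, where pool_size is a made-up helper that caps the worker count:

```python
import multiprocessing
import os


def worker(index):
    print(f"Worker {index} in process {os.getpid()}")


def pool_size(requested, available=None):
    """Cap the requested number of workers at the number of logical CPUs."""
    if available is None:
        available = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(requested, available))


if __name__ == "__main__":
    size = pool_size(5)
    print("logical CPUs:", os.cpu_count(), "pool size:", size)
    with multiprocessing.Pool(size) as pool:
        pool.map(worker, range(5))
```

Capping at the CPU count is only a rule of thumb for CPU-bound work; it is not a limit imposed by the multiprocessing module.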
