Why do these two processes behave like this? - python

I'm creating two Process instances here, but when I run this program I only get the output from the main function.
import multiprocessing
import time

def sleepy_man():
    print("Starting to Sleep")
    time.sleep(1)
    print("Done Sleeping")

tic = time.time()
p1 = multiprocessing.Process(target=sleepy_man)
p2 = multiprocessing.Process(target=sleepy_man)
p1.start()
p2.start()
toc = time.time()
print("Done in {:.4f} seconds".format(toc-tic))
Output
Done in 0.0554 seconds
I was following this blog post for practice.
Source: https://www.analyticsvidhya.com/blog/2021/04/a-beginners-guide-to-multi-processing-in-python/

It is worth noting that you would see the same behavior if you had set p1.daemon = p2.daemon = True.
The missing output may also be due to output buffering rather than a logic error.
Two questions:
1. If you add a sys.stdout.flush() or flush=True to your print calls, do you see different behavior? (A minimal sketch of this check follows below.)
2. If you run this with time python foobar.py, does it take 0.02s or 1s to run?
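If buffering is the culprit, forcing a flush should make the children's messages show up right away. Here is a minimal sketch of that check, assuming the same sleepy_man as in the question, with only the print calls changed:

import multiprocessing
import time

def sleepy_man():
    # flush=True pushes each message out of the child's stdout buffer immediately
    print("Starting to Sleep", flush=True)
    time.sleep(1)
    print("Done Sleeping", flush=True)

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=sleepy_man)
    p2 = multiprocessing.Process(target=sleepy_man)
    p1.start()
    p2.start()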
Obviously, continuing your tutorial and correctly adding .join(), as in the code below, will resolve the issue in the way expected for normal usage.

import multiprocessing as mp
import time

def sleepy_man():
    print("Starting to Sleep")
    time.sleep(1)
    print("Done Sleeping")

# On Windows, which uses spawning to create child processes,
# the entry point must be guarded with __name__ == '__main__'
if __name__ == '__main__':
    tic = time.time()
    processes = [
        mp.Process(target=sleepy_man),
        mp.Process(target=sleepy_man)
    ]
    [p.start() for p in processes]
    # If you want to see the results of the processes' work, join them;
    # otherwise, if the main process finishes its work before its children do,
    # you'll get no results.
    # Leaving the processes non-daemonic (daemon=False, the default) is another
    # option that lets them finish without an explicit join().
    # On the other hand, join() makes the parent process wait for its children,
    # and only then does it print the elapsed time in your case.
    [p.join() for p in processes]
    toc = time.time()
    print("Done in {:.4f} seconds".format(toc-tic))

Related

Multiprocessing Queue - child processes get stuck sometimes and do not reap

First of all, I apologize if the title is a bit weird, but I literally could not think of how to put the problem I am facing into a single line.
So I have the following code:
import time
from multiprocessing import Process, current_process, Manager
from multiprocessing import JoinableQueue as Queue
# from threading import Thread, current_thread
# from queue import Queue

def checker(q):
    count = 0
    while True:
        if not q.empty():
            data = q.get()
            # print(f'{data} fetched by {current_process().name}')
            # print(f'{data} fetched by {current_thread().name}')
            q.task_done()
            count += 1
        else:
            print('Queue is empty now')
            print(current_process().name, '-----', count)
            # print(current_thread().name, '-----', count)
            break

if __name__ == '__main__':
    t = time.time()
    # m = Manager()
    q = Queue()
    # with open("/tmp/c.txt") as ifile:
    #     for line in ifile:
    #         q.put((line.strip()))
    for i in range(1000):
        q.put(i)
    time.sleep(0.1)
    procs = []
    for _ in range(2):
        p = Process(target=checker, args=(q,), daemon=True)
        # p = Thread(target=checker, args=(q,))
        p.start()
        procs.append(p)
    q.join()
    for p in procs:
        p.join()
Sample outputs
1: When the process just hangs
Queue is empty now
Process-2 ----- 501
output hangs at this point
2: When everything works just fine.
Queue is empty now
Process-1 ----- 515
Queue is empty now
Process-2 ----- 485
Process finished with exit code 0
The behavior is intermittent: it happens sometimes, but not always.
I have tried using Manager.Queue() in place of multiprocessing.Queue() as well, but with no success; both exhibit the same issue.
I tested this with both multiprocessing and multithreading and I get exactly the same behavior, with one slight difference: with multithreading the hang happens much less often than with multiprocessing.
So I think there is something I am missing conceptually or doing wrong, but I am not able to catch it now; I have spent way too much time on this and my mind is no longer seeing something that may be very basic.
So any help is appreciated.
I believe you have a race condition in the checker method. You check whether the queue is empty and then dequeue the next task in separate steps. It's usually not a good idea to separate these two kinds of operations without mutual exclusion or locking, because the state of the queue may change between the check and the pop. It may be non-empty, but another process may then dequeue the waiting work before the process which passed the check is able to do so.
However I generally prefer communication over locking whenever possible; it's less error prone and makes one's intentions clearer. In this case, I would send a sentinel value to the worker processes (such as None) to indicate that all work is done. Each worker then just dequeues the next object (which is always thread-safe), and, if the object is None, the sub-process exits.
The example code below is a simplified version of your program, and should work without races:
from multiprocessing import Process, Queue, current_process

def checker(q):
    while True:
        data = q.get()
        if data is None:
            print(f'process {current_process().name} ending')
            return
        else:
            pass  # do work

if __name__ == '__main__':
    q = Queue()
    for i in range(1000):
        q.put(i)
    procs = []
    for _ in range(2):
        q.put(None)  # Sentinel value, one per worker
        p = Process(target=checker, args=(q,), daemon=True)
        p.start()
        procs.append(p)
    for proc in procs:
        proc.join()

JoinableQueue join() method blocking main thread even after task_done()

In the code below, if I set daemon = True, the consumer quits before reading all the queue entries. If the consumer is non-daemon, the main thread is always blocked, even after task_done() has been called for all the entries.
from multiprocessing import Process, JoinableQueue
import time

def consumer(queue):
    while True:
        final = queue.get()
        print(final)
        queue.task_done()

def producer1(queue):
    for i in "QWERTYUIOPASDFGHJKLZXCVBNM":
        queue.put(i)

if __name__ == "__main__":
    queue = JoinableQueue(maxsize=100)
    p1 = Process(target=consumer, args=((queue),))
    p2 = Process(target=producer1, args=((queue),))
    #p1.daemon = True
    p1.start()
    p2.start()
    print(p1.is_alive())
    print(p2.is_alive())
    for i in range(1, 10):
        queue.put(i)
        time.sleep(0.01)
    queue.join()
Let's see what—I believe—is happening here:
both processes are being started.
the consumer process starts its loop and blocks until a value is received from the queue.
the producer1 process feeds the queue 26 times with a letter while the main process feeds the queue 9 times with a number. The order in which letters or numbers are being fed is not guaranteed—a number could very well show up before a letter.
when both the producer1 and the main processes are done with feeding their data, the queue is being joined. No problem here, the queue can be joined since all the buffered data has been consumed and task_done() has been called after each read.
the consumer process is still running but is blocked until more data to consume show up.
Looking at your code, I believe that you are confusing the concept of joining processes with the one of joining queues. What you most likely want here is to join processes, you probably don't need a joinable queue at all.
#!/usr/bin/env python3
from multiprocessing import Process, Queue
import time

def consumer(queue):
    for final in iter(queue.get, 'STOP'):
        print(final)

def producer1(queue):
    for i in "QWERTYUIOPASDFGHJKLZXCVBNM":
        queue.put(i)

if __name__ == "__main__":
    queue = Queue(maxsize=100)
    p1 = Process(target=consumer, args=((queue),))
    p2 = Process(target=producer1, args=((queue),))
    p1.start()
    p2.start()
    print(p1.is_alive())
    print(p2.is_alive())
    for i in range(1, 10):
        queue.put(i)
        time.sleep(0.01)
    queue.put('STOP')
    p1.join()
    p2.join()
Also, your producer1 exits on its own after feeding all the letters, but you need a way to tell your consumer process to exit when there will not be any more data for it to process. You can do this by sending a sentinel; here I chose the string 'STOP', but it can be anything.
In fact, this code is not great, since the 'STOP' sentinel could be received before some of the letters, causing those letters not to be processed, and also a possible deadlock because the processes try to join even though the queue still contains some data. But this is a different problem.
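One way to avoid that ordering problem, sketched below under the assumption that only producer1 and the main process feed the queue, is to wait for the producer to finish before sending the sentinel, so that 'STOP' is the last item the consumer sees:

#!/usr/bin/env python3
from multiprocessing import Process, Queue
import time

def consumer(queue):
    for final in iter(queue.get, 'STOP'):
        print(final)

def producer1(queue):
    for i in "QWERTYUIOPASDFGHJKLZXCVBNM":
        queue.put(i)

if __name__ == "__main__":
    queue = Queue(maxsize=100)
    p1 = Process(target=consumer, args=(queue,))
    p2 = Process(target=producer1, args=(queue,))
    p1.start()
    p2.start()
    for i in range(1, 10):
        queue.put(i)
        time.sleep(0.01)
    p2.join()          # wait until producer1 has finished putting all of its letters
    queue.put('STOP')  # only then send the sentinel, so it arrives after the letters
    p1.join()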

Simple and safe way to wait for a Python Process to complete when using a Queue

I'm using the Process class to create and manage subprocesses, which may return non-trivial quantities of data. The documentation states that join() is the correct way to wait for a Process to complete (https://docs.python.org/2/library/multiprocessing.html#the-process-class).
However, when using a multiprocessing.Queue this can cause a hang when joining the process, as described here: https://bugs.python.org/issue8426 and here: https://docs.python.org/2/library/multiprocessing.html#multiprocessing-programming (not a bug).
These docs suggest removing p.join() - but surely this will remove the guarantee that all processes have completed, as Queue.get() only waits for a single item to become available?
How can I wait for completion of all Processes in this case, and ensure I'm collecting output from them all?
A simple example of the hang I'd like to deal with:
from multiprocessing import Process, Queue

class MyClass:
    def __init__(self):
        pass

def example_run(output):
    output.put([MyClass() for i in range(1000)])
    print("Bottom of example_run() - note hangs after this is printed")

if __name__ == '__main__':
    output = Queue()
    processes = [Process(target=example_run, args=(output,)) for x in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("Processes completed")
From https://bugs.python.org/issue8426:
This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate.
In your example I just added output.get() before the calls to join(), and everything worked fine. We put data in a queue for it to be used somewhere, so just make sure of that.
for p in processes:
    p.start()
print(output.get())
for p in processes:
    p.join()
print("Processes completed")
An inelegant solution is to add
output_final = []
for i in range(5):  # we have 5 processes
    output_final.append(output.get())
before attempting to join any of the processes. This simply tries to get the appropriate number of outputs for the number of processes we've started.
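Putting that together with the question's main block, the drain-then-join ordering would look roughly like this (a sketch based on the code above, not an exact quotation of either answer):

from multiprocessing import Process, Queue
# MyClass and example_run are as defined in the question above

if __name__ == '__main__':
    output = Queue()
    processes = [Process(target=example_run, args=(output,)) for x in range(5)]
    for p in processes:
        p.start()
    output_final = []
    for i in range(5):  # drain one result per process before joining
        output_final.append(output.get())
    for p in processes:
        p.join()
    print("Processes completed")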
It turns out a much better, more general solution is not to use Process at all; use Pool instead. That way the hassle of starting worker processes and collecting their results is handled for you:
import multiprocessing

class MyClass:
    def __init__(self):
        pass

def example_run(someArbitraryInput):
    foo = [MyClass() for i in range(10000)]
    return foo

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=5)
    output = pool.map(example_run, range(5))
    pool.close(); pool.join()  # make sure the processes are complete and tidy
    print("Processes completed")

Python multiprocessing apply_async read/write var in main process at starting subprocess

I am using Python 3.5 multiprocessing with apply_async. My code looks like task = pool.apply_async(myFunc, args). I pass an info object (an instance of Info) in the args. It has a data member called startTime. I want info.startTime to be set to time.time() when myFunc starts running. The problem is that the info in the main process and the info in the subprocess are not the same object: info.startTime = time.time() in myFunc does not change the info in the main process. Is there a good way to save the startTime? Thanks.
The processes in a pool cannot write to a common variable. Think of them as existing in parallel universes. You'll need some mechanism to share information between them. Here's a simple example using a Manager to collect the time stamps from all the processes:
from multiprocessing import Pool, Manager, current_process
import time

def do_work(x, ll):
    time.sleep(.2)
    ll.append(current_process().name + ' took task ' + str(x) + ' at ' + str(time.time()))

if __name__ == '__main__':
    with Manager() as manager:
        timestamp = manager.list()
        p = Pool(processes=4)
        for x in range(10):
            p.apply_async(do_work, (x, timestamp))
        p.close()
        p.join()
        print(timestamp)
If you change timestamp = manager.list() to simply timestamp = list(), you'll see it no longer works.
P.S. Queue doesn't seem as easy to handle when you're using Pool.
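If you do want a queue together with a Pool, one workaround is a manager.Queue(): unlike a plain multiprocessing.Queue(), a managed queue can be passed as an argument to pool workers. Here is a rough sketch, with do_work adapted from the example above purely for illustration:

from multiprocessing import Pool, Manager, current_process
import time

def do_work(x, q):
    time.sleep(.2)
    # push the timestamp through the managed queue instead of a managed list
    q.put(current_process().name + ' took task ' + str(x) + ' at ' + str(time.time()))

if __name__ == '__main__':
    with Manager() as manager:
        q = manager.Queue()  # a plain multiprocessing.Queue() cannot be shipped to pool workers this way
        p = Pool(processes=4)
        for x in range(10):
            p.apply_async(do_work, (x, q))
        p.close()
        p.join()
        while not q.empty():
            print(q.get())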

Filling a queue and managing multiprocessing in python

I'm having this problem in python:
I have a queue of URLs that I need to check from time to time
if the queue is filled up, I need to process each item in the queue
Each item in the queue must be processed by a single process (multiprocessing)
So far I managed to achieve this "manually" like this:
while 1:
    self.updateQueue()
    while not self.mainUrlQueue.empty():
        domain = self.mainUrlQueue.get()
        # if we haven't launched any processes yet, we need to do so
        if len(self.jobs) < maxprocess:
            self.startJob(domain)
            #time.sleep(1)
        else:
            # If we already have processes started, we need to clear out the old
            # processes in our pool and start new ones
            jobdone = 0
            # We loop through each of the processes until we find a free one;
            # only then do we leave the loop
            while jobdone == 0:
                for p in self.jobs:
                    #print "entering loop"
                    # if the process finished
                    if not p.is_alive() and jobdone == 0:
                        #print str(p.pid) + " job dead, starting new one"
                        self.jobs.remove(p)
                        self.startJob(domain)
                        jobdone = 1
However, that leads to tons of problems and errors. I wondered if I would be better off using a Pool of processes. What would be the right way to do this?
However, a lot of the time my queue is empty, and it can be filled with 300 items in a second, so I'm not too sure how to do things here.
You could use the blocking capabilities of the queue to spawn multiple processes at startup (using multiprocessing.Pool) and let them sleep until some data is available on the queue to process. If you're not familiar with that, you could try to "play" with this simple program:
import multiprocessing
import os
import time

the_queue = multiprocessing.Queue()

def worker_main(queue):
    print os.getpid(), "working"
    while True:
        item = queue.get(True)
        print os.getpid(), "got", item
        time.sleep(1)  # simulate a "long" operation

the_pool = multiprocessing.Pool(3, worker_main, (the_queue,))
# don't forget the comma after the_queue ^
for i in range(5):
    the_queue.put("hello")
    the_queue.put("world")
time.sleep(10)
Tested with Python 2.7.3 on Linux
This will spawn 3 processes (in addition to the parent process). Each child executes the worker_main function. It is a simple loop that gets a new item from the queue on each iteration. Workers will block if nothing is ready to process.
At startup all 3 processes will sleep until the queue is fed some data. When an item is available, one of the waiting workers gets it and starts to process it. After that, it tries to get another item from the queue, waiting again if nothing is available...
Added some code (submitting "None" to the queue) to nicely shut down the worker processes, plus code to close and join the_queue and the_pool:
import multiprocessing
import os
import time

NUM_PROCESSES = 20
NUM_QUEUE_ITEMS = 20  # so really 40, because hello and world are processed separately

def worker_main(queue):
    print(os.getpid(), "working")
    while True:
        item = queue.get(block=True)  # block=True means make a blocking call to wait for items in queue
        if item is None:
            break
        print(os.getpid(), "got", item)
        time.sleep(1)  # simulate a "long" operation

def main():
    the_queue = multiprocessing.Queue()
    the_pool = multiprocessing.Pool(NUM_PROCESSES, worker_main, (the_queue,))
    for i in range(NUM_QUEUE_ITEMS):
        the_queue.put("hello")
        the_queue.put("world")
    for i in range(NUM_PROCESSES):
        the_queue.put(None)
    # prevent adding anything more to the queue and wait for queue to empty
    the_queue.close()
    the_queue.join_thread()
    # prevent adding anything more to the process pool and wait for all processes to finish
    the_pool.close()
    the_pool.join()

if __name__ == '__main__':
    main()
