Blocking multiprocessing pool.map called in a process - python

I want to call a multiprocessing.pool.map inside a process.
When initialized inside the run() function, it works. When initialized at instantiation, it does not.
I cannot figure the reason for this behavior ? What happens in the process ?
I am on python 3.6
from multiprocessing import Pool, Process, Queue
def DummyPrinter(key):
print(key)
class Consumer(Process):
def __init__(self, task_queue):
Process.__init__(self)
self.task_queue = task_queue
self.p = Pool(1)
def run(self):
p = Pool(8)
while True:
next_task = self.task_queue.get()
if next_task is None:
break
p.map(DummyPrinter, next_task) # Works
#self.p.map(DummyPrinter, next_task) # Does not Work
return
if __name__ == '__main__':
task_queue = Queue()
Consumer(task_queue).start()
task_queue.put(range(5))
task_queue.put(None)

multiprocessing.Pool cannot be shared by multiple processes because it relies on pipes and threads for its functioning.
The __init__ method gets executed in the parent process whereas the run logic belongs to the child process.
I usually recommend against sub-classing the Process object as it's quite counter intuitive.
A logic like the following would better show the actual division of responsibilities.
def function(task_queue):
"""This runs in the child process."""
p = Pool(8)
while True:
next_task = self.task_queue.get()
if next_task is None:
break
p.map(DummyPrinter, next_task) # Works
def main():
"""This runs in the parent process."""
task_queue = Queue()
process = Process(target=function, args=[task_queue])
process.start()

Related

Lock objects should only be shared between processes through inheritance

I am using the multiprocessing.Pool class within an object and attempting the following:
from multiprocessing import Lock, Pool
class A:
def __init__(self):
self.lock = Lock()
self.file = open('test.txt')
def function(self, i):
self.lock.acquire()
line = self.file.readline()
self.lock.release()
return line
def anotherfunction(self):
pool = Pool()
results = pool.map(self.function, range(10000))
pool.close()
pool.join()
return results
However I am getting a runtime error stating that lock objects should only be shared between processes through inheritance. I am fairly new to Python and multiprocessing. How can I get put on the right track?
multiprocessing.Lock instances can be attributes of multiprocessing.Process instances. When a process is created in the main process with a lock attribute, the lock exists in the main process’s address space. When the process’s start method is invoked and runs a subprocess which invokes the process’s run method, the lock has to be serialized/deserialized to the subprocess address space. This works as expected:
from multiprocessing import Lock, Process
class P(Process):
def __init__(self, *args, **kwargs):
Process.__init__(self, *args, **kwargs)
self.lock = Lock()
def run(self):
print(self.lock)
if __name__ == '__main__':
p = P()
p.start()
p.join()
Prints:
<Lock(owner=None)>
Unfortuantely, this does not work when you are dealing with multiprocessing.Pool instances. In your example, self.lock is created in the main process by the __init__ method. But when Pool.map is called to invoke self.function, the lock cannot be serialized/deserialized to the already-running pool process that will be running this method.
The solution is to initialize each pool process with a global variable set to this lock (there is no point in having this lock being an attribute of the class now). The way to do this is to use the initializer and initargs parameters of the pool __init__ method. See the documentation:
from multiprocessing import Lock, Pool
def init_pool_processes(the_lock):
'''Initialize each process with a global variable lock.
'''
global lock
lock = the_lock
class Test:
def function(self, i):
lock.acquire()
with open('test.txt', 'a') as f:
print(i, file=f)
lock.release()
def anotherfunction(self):
lock = Lock()
pool = Pool(initializer=init_pool_processes, initargs=(lock,))
pool.map(self.function, range(10))
pool.close()
pool.join()
if __name__ == '__main__':
t = Test()
t.anotherfunction()

Stopping eval code dinamically on event fired [duplicate]

What's the proper way to tell a looping thread to stop looping?
I have a fairly simple program that pings a specified host in a separate threading.Thread class. In this class it sleeps 60 seconds, the runs again until the application quits.
I'd like to implement a 'Stop' button in my wx.Frame to ask the looping thread to stop. It doesn't need to end the thread right away, it can just stop looping once it wakes up.
Here is my threading class (note: I haven't implemented looping yet, but it would likely fall under the run method in PingAssets)
class PingAssets(threading.Thread):
def __init__(self, threadNum, asset, window):
threading.Thread.__init__(self)
self.threadNum = threadNum
self.window = window
self.asset = asset
def run(self):
config = controller.getConfig()
fmt = config['timefmt']
start_time = datetime.now().strftime(fmt)
try:
if onlinecheck.check_status(self.asset):
status = "online"
else:
status = "offline"
except socket.gaierror:
status = "an invalid asset tag."
msg =("{}: {} is {}. \n".format(start_time, self.asset, status))
wx.CallAfter(self.window.Logger, msg)
And in my wxPyhton Frame I have this function called from a Start button:
def CheckAsset(self, asset):
self.count += 1
thread = PingAssets(self.count, asset, self)
self.threads.append(thread)
thread.start()
Threaded stoppable function
Instead of subclassing threading.Thread, one can modify the function to allow
stopping by a flag.
We need an object, accessible to running function, to which we set the flag to stop running.
We can use threading.currentThread() object.
import threading
import time
def doit(arg):
t = threading.currentThread()
while getattr(t, "do_run", True):
print ("working on %s" % arg)
time.sleep(1)
print("Stopping as you wish.")
def main():
t = threading.Thread(target=doit, args=("task",))
t.start()
time.sleep(5)
t.do_run = False
if __name__ == "__main__":
main()
The trick is, that the running thread can have attached additional properties. The solution builds
on assumptions:
the thread has a property "do_run" with default value True
driving parent process can assign to started thread the property "do_run" to False.
Running the code, we get following output:
$ python stopthread.py
working on task
working on task
working on task
working on task
working on task
Stopping as you wish.
Pill to kill - using Event
Other alternative is to use threading.Event as function argument. It is by
default False, but external process can "set it" (to True) and function can
learn about it using wait(timeout) function.
We can wait with zero timeout, but we can also use it as the sleeping timer (used below).
def doit(stop_event, arg):
while not stop_event.wait(1):
print ("working on %s" % arg)
print("Stopping as you wish.")
def main():
pill2kill = threading.Event()
t = threading.Thread(target=doit, args=(pill2kill, "task"))
t.start()
time.sleep(5)
pill2kill.set()
t.join()
Edit: I tried this in Python 3.6. stop_event.wait() blocks the event (and so the while loop) until release. It does not return a boolean value. Using stop_event.is_set() works instead.
Stopping multiple threads with one pill
Advantage of pill to kill is better seen, if we have to stop multiple threads
at once, as one pill will work for all.
The doit will not change at all, only the main handles the threads a bit differently.
def main():
pill2kill = threading.Event()
tasks = ["task ONE", "task TWO", "task THREE"]
def thread_gen(pill2kill, tasks):
for task in tasks:
t = threading.Thread(target=doit, args=(pill2kill, task))
yield t
threads = list(thread_gen(pill2kill, tasks))
for thread in threads:
thread.start()
time.sleep(5)
pill2kill.set()
for thread in threads:
thread.join()
This has been asked before on Stack. See the following links:
Is there any way to kill a Thread in Python?
Stopping a thread after a certain amount of time
Basically you just need to set up the thread with a stop function that sets a sentinel value that the thread will check. In your case, you'll have the something in your loop check the sentinel value to see if it's changed and if it has, the loop can break and the thread can die.
I read the other questions on Stack but I was still a little confused on communicating across classes. Here is how I approached it:
I use a list to hold all my threads in the __init__ method of my wxFrame class: self.threads = []
As recommended in How to stop a looping thread in Python? I use a signal in my thread class which is set to True when initializing the threading class.
class PingAssets(threading.Thread):
def __init__(self, threadNum, asset, window):
threading.Thread.__init__(self)
self.threadNum = threadNum
self.window = window
self.asset = asset
self.signal = True
def run(self):
while self.signal:
do_stuff()
sleep()
and I can stop these threads by iterating over my threads:
def OnStop(self, e):
for t in self.threads:
t.signal = False
I had a different approach. I've sub-classed a Thread class and in the constructor I've created an Event object. Then I've written custom join() method, which first sets this event and then calls a parent's version of itself.
Here is my class, I'm using for serial port communication in wxPython app:
import wx, threading, serial, Events, Queue
class PumpThread(threading.Thread):
def __init__ (self, port, queue, parent):
super(PumpThread, self).__init__()
self.port = port
self.queue = queue
self.parent = parent
self.serial = serial.Serial()
self.serial.port = self.port
self.serial.timeout = 0.5
self.serial.baudrate = 9600
self.serial.parity = 'N'
self.stopRequest = threading.Event()
def run (self):
try:
self.serial.open()
except Exception, ex:
print ("[ERROR]\tUnable to open port {}".format(self.port))
print ("[ERROR]\t{}\n\n{}".format(ex.message, ex.traceback))
self.stopRequest.set()
else:
print ("[INFO]\tListening port {}".format(self.port))
self.serial.write("FLOW?\r")
while not self.stopRequest.isSet():
msg = ''
if not self.queue.empty():
try:
command = self.queue.get()
self.serial.write(command)
except Queue.Empty:
continue
while self.serial.inWaiting():
char = self.serial.read(1)
if '\r' in char and len(msg) > 1:
char = ''
#~ print('[DATA]\t{}'.format(msg))
event = Events.PumpDataEvent(Events.SERIALRX, wx.ID_ANY, msg)
wx.PostEvent(self.parent, event)
msg = ''
break
msg += char
self.serial.close()
def join (self, timeout=None):
self.stopRequest.set()
super(PumpThread, self).join(timeout)
def SetPort (self, serial):
self.serial = serial
def Write (self, msg):
if self.serial.is_open:
self.queue.put(msg)
else:
print("[ERROR]\tPort {} is not open!".format(self.port))
def Stop(self):
if self.isAlive():
self.join()
The Queue is used for sending messages to the port and main loop takes responses back. I've used no serial.readline() method, because of different end-line char, and I have found the usage of io classes to be too much fuss.
Depends on what you run in that thread.
If that's your code, then you can implement a stop condition (see other answers).
However, if what you want is to run someone else's code, then you should fork and start a process. Like this:
import multiprocessing
proc = multiprocessing.Process(target=your_proc_function, args=())
proc.start()
now, whenever you want to stop that process, send it a SIGTERM like this:
proc.terminate()
proc.join()
And it's not slow: fractions of a second.
Enjoy :)
My solution is:
import threading, time
def a():
t = threading.currentThread()
while getattr(t, "do_run", True):
print('Do something')
time.sleep(1)
def getThreadByName(name):
threads = threading.enumerate() #Threads list
for thread in threads:
if thread.name == name:
return thread
threading.Thread(target=a, name='228').start() #Init thread
t = getThreadByName('228') #Get thread by name
time.sleep(5)
t.do_run = False #Signal to stop thread
t.join()
I find it useful to have a class, derived from threading.Thread, to encapsulate my thread functionality. You simply provide your own main loop in an overridden version of run() in this class. Calling start() arranges for the object’s run() method to be invoked in a separate thread.
Inside the main loop, periodically check whether a threading.Event has been set. Such an event is thread-safe.
Inside this class, you have your own join() method that sets the stop event object before calling the join() method of the base class. It can optionally take a time value to pass to the base class's join() method to ensure your thread is terminated in a short amount of time.
import threading
import time
class MyThread(threading.Thread):
def __init__(self, sleep_time=0.1):
self._stop_event = threading.Event()
self._sleep_time = sleep_time
"""call base class constructor"""
super().__init__()
def run(self):
"""main control loop"""
while not self._stop_event.isSet():
#do work
print("hi")
self._stop_event.wait(self._sleep_time)
def join(self, timeout=None):
"""set stop event and join within a given time period"""
self._stop_event.set()
super().join(timeout)
if __name__ == "__main__":
t = MyThread()
t.start()
time.sleep(5)
t.join(1) #wait 1s max
Having a small sleep inside the main loop before checking the threading.Event is less CPU intensive than looping continuously. You can have a default sleep time (e.g. 0.1s), but you can also pass the value in the constructor.
Sometimes you don't have control over the running target. In those cases you can use signal.pthread_kill to send a stop signal.
from signal import pthread_kill, SIGTSTP
from threading import Thread
from itertools import count
from time import sleep
def target():
for num in count():
print(num)
sleep(1)
thread = Thread(target=target)
thread.start()
sleep(5)
pthread_kill(thread.ident, SIGTSTP)
result
0
1
2
3
4
[14]+ Stopped

Python Multiprocessing: Adding to Queue Within Child Process

I want to implement a file crawler that stores data to a Mongo. I would like to use multiprocessing as a way to 'hand off' blocking tasks such as unzipping files, file crawling and uploading to Mongo. There are certain tasks that are reliant on other tasks (i.e., a file needs to be unzipped before files can be crawled), so I would like the ability to complete the necessary task and add new ones to the same task queue.
Below is what I currently have:
import multiprocessing
class Worker(multiprocessing.Process):
def __init__(self, task_queue: multiprocessing.Queue):
super(Worker, self).__init__()
self.task_queue = task_queue
def run(self):
for (function, *args) in iter(self.task_queue.get, None):
print(f'Running: {function.__name__}({*args,})')
# Run the provided function with its parameters in child process
function(*args)
self.task_queue.task_done()
def foo(task_queue: multiprocessing.Queue) -> None:
print('foo')
# Add new task to queue from this child process
task_queue.put((bar, 1))
def bar(x: int) -> None:
print(f'bar: {x}')
def main():
# Start workers on separate processes
workers = []
manager = multiprocessing.Manager()
task_queue = manager.Queue()
for i in range(multiprocessing.cpu_count()):
worker = Worker(task_queue)
workers.append(worker)
worker.start()
# Run foo on child process using the queue as parameter
task_queue.put((foo, task_queue))
for _ in workers:
task_queue.put(None)
# Block until workers complete and join main process
for worker in workers:
worker.join()
print('Program completed.')
if __name__ == '__main__':
main()
Expected Behaviour:
Running: foo((<AutoProxy[Queue] object, typeid 'Queue' at 0x1b963548908>,))
foo
Running: bar((1,))
bar: 1
Program completed.
Actual Behaviour:
Running: foo((<AutoProxy[Queue] object, typeid 'Queue' at 0x1b963548908>,))
foo
Program completed.
I am quite new to multiprocessing so any help would be greatly appreciated.
As #FrankYellin noted, this is due to the fact that None is being put into task_queue before bar can be added.
Assuming that the queue will either be non-empty or waiting for a task to complete
during the program (which is true in my case), the join method on the queue can be used. According to the docs:
Blocks until all items in the queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls
task_done() to indicate that the item was retrieved and all work on it
is complete. When the count of unfinished tasks drops to zero, join()
unblocks.
Below is the updated code:
import multiprocessing
class Worker(multiprocessing.Process):
def __init__(self, task_queue: multiprocessing.Queue):
super(Worker, self).__init__()
self.task_queue = task_queue
def run(self):
for (function, *args) in iter(self.task_queue.get, None):
print(f'Running: {function.__name__}({*args,})')
# Run the provided function with its parameters in child process
function(*args)
self.task_queue.task_done() # <-- Notify queue that task is complete
def foo(task_queue: multiprocessing.Queue) -> None:
print('foo')
# Add new task to queue from this child process
task_queue.put((bar, 1))
def bar(x: int) -> None:
print(f'bar: {x}')
def main():
# Start workers on separate processes
workers = []
manager = multiprocessing.Manager()
task_queue = manager.Queue()
for i in range(multiprocessing.cpu_count()):
worker = Worker(task_queue)
workers.append(worker)
worker.start()
# Run foo on child process using the queue as parameter
task_queue.put((foo, task_queue))
# Block until all items in queue are popped and completed
task_queue.join() # <---
for _ in workers:
task_queue.put(None)
# Block until workers complete and join main process
for worker in workers:
worker.join()
print('Program completed.')
if __name__ == '__main__':
main()
This seems to work fine. I will update this if I discover anything new. Thank you all.

how to add more items to a multiprocessing queue while script in motion

I am trying to learn multiprocessing with queue.
What I want to do is figure out when/how to "add more items to the queue" when the script is in motion.
The below script is the baseline I am working from:
import multiprocessing
class MyFancyClass:
def __init__(self, name):
self.name = name
def do_something(self):
proc_name = multiprocessing.current_process().name
print('Doing something fancy in {} for {}!'.format(
proc_name, self.name))
def worker(q):
obj = q.get()
obj.do_something()
if __name__ == '__main__':
queue = multiprocessing.Queue()
p = multiprocessing.Process(target=worker, args=(queue,))
p.start()
queue.put(MyFancyClass('Fancy Dan'))
queue.put(MyFancyClass('Frankie'))
print(queue.qsize())
# Wait for the worker to finish
queue.close()
queue.join_thread()
p.join()
on line 26, the Fancy Dan inject works, but the Frankie piece doesn't. I am able to confirm that Frankie does make it into the queue. I need a spot where I can "Check for more items" and insert them into the queue as needed. If no more items exist, then close the queue when the existing items are clear.
How do I do this?
Thanks!
Let's make it clear:
the target function worker(q) will be called just once in the above scheme. At that first call the function will suspend waiting the result from blocking operation q.get(). It gets the instance MyFancyClass('Fancy Dan') from the queue, invokes its do_something method and get finished.
MyFancyClass('Frankie') will be put into the queue but won't go to the Process cause the process' target function is done.
one of the ways is to read from the queue and wait for a signal/marked item which signals that queue usage is stopped. Let's say None value.
import multiprocessing
class MyFancyClass:
def __init__(self, name):
self.name = name
def do_something(self):
proc_name = multiprocessing.current_process().name
print('Doing something fancy in {} for {}!'.format(proc_name, self.name))
def worker(q):
while True:
obj = q.get()
if obj is None:
break
obj.do_something()
if __name__ == '__main__':
queue = multiprocessing.Queue()
p = multiprocessing.Process(target=worker, args=(queue,))
p.start()
queue.put(MyFancyClass('Fancy Dan'))
queue.put(MyFancyClass('Frankie'))
# print(queue.qsize())
queue.put(None)
# Wait for the worker to finish
queue.close()
queue.join_thread()
p.join()
The output:
Doing something fancy in Process-1 for Fancy Dan!
Doing something fancy in Process-1 for Frankie!
One way you could do this is by changing worker to
def worker(q):
while not q.empty():
obj = q.get()
obj.do_something()
The problem with your original code is that worker returns after doing work on one item on the queue. You need some sort of looping logic.
This solution is imperfect because empty() is not reliable. Also will fail if the queue becomes empty before adding more items to it (the process will just return).
I would suggest using a Process Pool Executor.
Submit is pretty close to what you're looking for.

Extending mp.Process in Python 3

import multiprocessing as mp
import time as t
class MyProcess(mp.Process):
def __init__(self, target, args, name):
mp.Process.__init__(self, target=target, args=args)
self.exit = mp.Event()
self.name = name
print("{0} initiated".format(self.name))
def run(self):
while not self.exit.is_set():
pass
print("Process {0} exited.".format(self.name))
def shutdown(self):
print("Shutdown initiated for {0}.".format(self.name))
self.exit.set()
def f(x):
while True:
print(x)
x = x+1
if __name__ == "__main__":
p = MyProcess(target=f, args=[3], name="function")
p.start()
#p.join()
t.wait(2)
p.shutdown()
I'm trying to extend the multiprocessing.Process class to add a shutdown method in order to be able to exit a function which could potentially have to be run for an undefined amount of time. Following instructions from Python Multiprocessing Exit Elegantly How? and adding the argument passing I came up with myself, only gets me this output:
function initiated
Shutdown initiated for function.
Process function exited.
But no actual method f(x) output. It seems that the actual process target doesn't get started. I'm obviously doing something wrong, but just can't figure out what, any ideas?
Thanks!
The sane way to handle this situation is, where possible, to have the background task cooperate in the exit mechanism by periodically checking the exit event. For that, there's no need to subclass Process: you can rewrite your background task to include that check. For example, here's your code rewritten using that approach:
import multiprocessing as mp
import time as t
def f(x, exit_event):
while not exit_event.is_set():
print(x)
x = x+1
print("Exiting")
if __name__ == "__main__":
exit_event = mp.Event()
p = mp.Process(target=f, args=(3, exit_event), name="function")
p.start()
t.sleep(2)
exit_event.set()
p.join()
If that's not an option (for example because you can't modify the code that's being run in the background job), then you can use the Process.terminate method. But you should be aware that using it is dangerous: the child process won't have an opportunity to clean up properly, so for example if it's shutdown while holding a multiprocessing lock, no other process will be able to acquire that lock, giving a risk of deadlock. It's far better to have the child cooperate in the shutdown if possible.
The solution to this problem is to call the super().run() function in your class run method.
Of course, this will cause the permanent execution of your function due to the existence of while True, and the specified event will not cause its end.
You can use Process.terminate() method to end your process.
import multiprocessing as mp
import time as t
class MyProcess(mp.Process):
def __init__(self, target, args, name):
mp.Process.__init__(self, target=target, args=args)
self.name = name
print("{0} initiated".format(self.name))
def run(self):
print("Process {0} started.".format(self.name))
super().run()
def shutdown(self):
print("Shutdown initiated for {0}.".format(self.name))
self.terminate()
def f(x):
while True:
print(x)
t.sleep(1)
x += 1
if __name__ == "__main__":
p = MyProcess(target=f, args=(3,), name="function")
p.start()
# p.join()
t.sleep(5)
p.shutdown()

Categories