I've read that it's considered bad practice to kill a thread. (Is there any way to kill a Thread?) There are a LOT of answers there, and I'm wondering if even using a thread in the first place is the right answer for me.
I have a bunch of multiprocessing.Process objects. Essentially, each Process is doing this:
while some_condition:
    result = self.function_to_execute(i, **kwargs_i)
    # outQ is a multiprocessing.Queue shared between all Processes
    self.outQ.put(Result(i, result))
The problem is that I need a way to interrupt function_to_execute, but I can't modify the function itself. Initially I was thinking of simply calling process.terminate(), but that appears to be unsafe with multiprocessing.Queue.
Most likely (but not guaranteed), if I need to kill a thread, the 'main' program is going to be done soon. Is my safest option to do something like this? Or perhaps there is a more elegant solution than using a thread in the first place?
def thread_task():
    while some_condition:
        result = self.function_to_execute(i, **kwargs_i)
        if (this_thread_is_not_daemonized):
            self.outQ.put(Result(i, result))

t = Thread(target=thread_task)
t.start()
if end_early:
    t.daemon = True  # note: daemon must be set before start(); setting it here raises RuntimeError
I believe the end result of this is that the Process that spawned the thread will continue to waste CPU cycles on a task I no longer care about the output for, but if the main program finishes, it'll clean up all my memory nicely.
The main problem with daemonizing a thread is that the main program could potentially continue for 30+ minutes even when I don't care about the output of that thread anymore.
From the threading docs:
If you want your threads to stop gracefully, make them non-daemonic
and use a suitable signalling mechanism such as an Event
Here is a contrived example of what I was thinking - no idea if it mimics what you are doing or can be adapted for your situation. Another caveat: I've never written any real concurrent code.
Create an Event object in the main process and pass it all the way to the thread.
Design the thread so that it loops until the Event object is set. Once you don't need the processing anymore SET the Event object in the main process. No need to modify the function being run in the thread.
from multiprocessing import Process, Queue, Event
from threading import Thread
import time, random, os

def f_to_run():
    time.sleep(.2)
    return random.randint(1,10)

class T(Thread):
    def __init__(self, evt,q, func, parent):
        self.evt = evt
        self.q = q
        self.func = func
        self.parent = parent
        super().__init__()
    def run(self):
        while not self.evt.is_set():
            n = self.func()
            self.q.put(f'PID {self.parent}-{self.name}: {n}')

def f(T,evt,q,func):
    pid = os.getpid()
    t = T(evt,q,func,pid)
    t.start()
    t.join()
    q.put(f'PID {pid}-{t.name} is alive - {t.is_alive()}')
    q.put(f'PID {pid}:DONE')
    return 'foo done'

if __name__ == '__main__':
    results = []
    q = Queue()
    evt = Event()
    # two processes each with one thread
    p= Process(target=f, args=(T, evt, q, f_to_run))
    p1 = Process(target=f, args=(T, evt, q, f_to_run))
    p.start()
    p1.start()
    while len(results) < 40:
        results.append(q.get())
        print('.',end='')
    print('')
    evt.set()
    p.join()
    p1.join()
    while not q.empty():
        results.append(q.get_nowait())
    for thing in results:
        print(thing)
I initially tried to use threading.Event but the multiprocessing module complained that it couldn't be pickled. I was actually surprised that the multiprocessing.Queue and multiprocessing.Event worked AND could be accessed by the thread.
Not sure why I started with a Thread subclass. I think I thought it would be easier to control/specify what happens in its run method. But it can be done with a function as well.
from multiprocessing import Process, Queue, Event
from threading import Thread
import time, random

def f_to_run():
    time.sleep(.2)
    return random.randint(1,10)

def t1(evt,q, func):
    while not evt.is_set():
        n = func()
        q.put(n)

def g(t1,evt,q,func):
    t = Thread(target=t1,args=(evt,q,func))
    t.start()
    t.join()
    q.put(f'{t.name} is alive - {t.is_alive()}')
    return 'foo'

if __name__ == '__main__':
    q = Queue()
    evt = Event()
    p= Process(target=g, args=(t1, evt, q, f_to_run))
    p.start()
    time.sleep(5)
    evt.set()
    p.join()
Related
I am trying to learn multiprocessing with queue.
What I want to do is figure out when/how to "add more items to the queue" when the script is in motion.
The below script is the baseline I am working from:
import multiprocessing

class MyFancyClass:
    def __init__(self, name):
        self.name = name
    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(
            proc_name, self.name))

def worker(q):
    obj = q.get()
    obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Frankie'))
    print(queue.qsize())
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
On line 26, the Fancy Dan put works, but the Frankie one doesn't. I am able to confirm that Frankie does make it into the queue. I need a spot where I can "check for more items" and insert them into the queue as needed. If no more items exist, the queue should be closed once the existing items have been processed.
How do I do this?
Thanks!
Let's make it clear:
The target function worker(q) will be called just once in the above scheme. On that first call the function blocks on q.get() until an item arrives. It gets the instance MyFancyClass('Fancy Dan') from the queue, invokes its do_something method and finishes.
MyFancyClass('Frankie') will be put into the queue but won't reach the process, because the process's target function is already done.
One way is to keep reading from the queue until a marker item arrives that signals the queue is no longer in use. Let's say a None value.
import multiprocessing

class MyFancyClass:
    def __init__(self, name):
        self.name = name
    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(proc_name, self.name))

def worker(q):
    while True:
        obj = q.get()
        if obj is None:
            break
        obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Frankie'))
    # print(queue.qsize())
    queue.put(None)
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
The output:
Doing something fancy in Process-1 for Fancy Dan!
Doing something fancy in Process-1 for Frankie!
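The same sentinel idea extends to several workers if each worker gets its own None. Below is a minimal sketch of that, not part of the original answer: the worker count, the second result queue (outq) and the upper() call standing in for real work are all assumptions made for illustration.

```python
import multiprocessing

STOP = None  # sentinel value, as in the answer above


def worker(inq, outq):
    """Handle items until the sentinel arrives, then return."""
    while True:
        item = inq.get()          # blocks until something is available
        if item is STOP:
            break
        outq.put(item.upper())    # placeholder for the real work


if __name__ == "__main__":
    n_workers = 3                 # assumption: any small number works
    inq = multiprocessing.Queue()
    outq = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(inq, outq))
             for _ in range(n_workers)]
    for p in procs:
        p.start()

    names = ["Fancy Dan", "Frankie", "Late Arrival"]
    for name in names:
        inq.put(name)             # items can be added any time before the sentinels
    for _ in procs:
        inq.put(STOP)             # one sentinel per worker

    results = sorted(outq.get() for _ in names)  # drain results before joining
    for p in procs:
        p.join()
    print(results)
```

Each worker consumes exactly one sentinel and exits, so the number of sentinels has to match the number of workers.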
One way you could do this is by changing worker to
def worker(q):
    while not q.empty():
        obj = q.get()
        obj.do_something()
The problem with your original code is that worker returns after doing work on one item on the queue. You need some sort of looping logic.
This solution is imperfect because empty() is not reliable. It will also fail if the queue becomes empty before more items are added to it (the process will just return).
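One way around the unreliable empty() check is to let get() block with a timeout and treat the timeout as "no more work". This is only a sketch under assumptions: the helper name drain, the handle callback and the idle_timeout value are made up for illustration. It relies on multiprocessing.Queue.get raising queue.Empty when the timeout expires, the same exception queue.Queue uses.

```python
import queue  # multiprocessing.Queue.get raises queue.Empty on timeout


def drain(q, handle, idle_timeout=1.0):
    """Call handle(item) for each item in q.

    Returns the number of items handled once q has stayed empty
    for idle_timeout seconds.
    """
    handled = 0
    while True:
        try:
            item = q.get(timeout=idle_timeout)
        except queue.Empty:
            return handled
        handle(item)
        handled += 1
```

In the worker above this might be called as drain(q, lambda obj: obj.do_something()). The trade-off is that a producer slower than idle_timeout will make the worker give up early, which is why an explicit sentinel is usually the more robust signal.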
I would suggest using a Process Pool Executor.
Submit is pretty close to what you're looking for.
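A minimal sketch of what that could look like with concurrent.futures.ProcessPoolExecutor. The function do_something and the names passed to it are hypothetical stand-ins for the question's MyFancyClass work, and max_workers=2 is an arbitrary choice.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed


def do_something(name):
    """Stand-in for the real work; runs in a worker process."""
    return f"Doing something fancy for {name}!"


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(do_something, n) for n in ("Fancy Dan", "Frankie")]
        # More work can be submitted at any point while the pool is open.
        futures.append(pool.submit(do_something, "Late Arrival"))
        for fut in as_completed(futures):
            print(fut.result())
```

The executor owns the queue and the worker loop, so there is no sentinel to manage. Leaving the with block waits for outstanding work and shuts the pool down.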
import random
import queue as Queue
import _thread as Thread

a = Queue.Queue()

def af():
    while True:
        a.put(random.randint(0,1000))

def bf():
    while True:
        if (not a.empty()): print (a.get())

def main():
    Thread.start_new_thread(af, ())
    Thread.start_new_thread(bf, ())
    return

if __name__ == "__main__":
    main()
The above code works fine, but with extremely high CPU usage. I tried to use multiprocessing to no avail. I have tried
def main():
    multiprocessing.Process(target=af).run()
    multiprocessing.Process(target=bf).run()
and
def main():
    manager = multiprocessing.Manager()
    a = manager.Queue()
    pool = multiprocessing.Pool()
    pool.apply_async(af)
    pool.apply_async(bf)
Neither works. Can anyone please help me? Thanks a bunch ^_^
def main():
    multiprocessing.Process(target=af).run() # will not return
    multiprocessing.Process(target=bf).run()
The above code does not work because af does not return, so bf never gets a chance to be called. You need to split the run call into start/join so that both can run in parallel (and make them share a manager.Queue).
To make the second code work, you need to pass a (the manager.Queue object) to the functions. Otherwise they will use the global Queue.Queue object, which is not shared between processes. You need to modify af and bf to accept a, and main to pass a.
def af(a):
    while True:
        a.put(random.randint(0, 1000))

def bf(a):
    while True:
        print(a.get())

def main():
    manager = multiprocessing.Manager()
    a = manager.Queue()
    pool = multiprocessing.Pool()
    proc1 = pool.apply_async(af, [a])
    proc2 = pool.apply_async(bf, [a])
    # Wait until process ends. Uncomment following line if there's no waiting code.
    # proc1.get()
    # proc2.get()
In the first alternative main you use Process, but the method you should call to start the activity is not run(), as one would think, but rather start(). You will want to follow that up with appropriate join() statements. Following the information in multiprocessing (available here: https://docs.python.org/2/library/multiprocessing.html), here is a working sample:
import random
from multiprocessing import Process, Queue

def af(q):
    while True:
        q.put(random.randint(0,1000))

def bf(q):
    while True:
        if not q.empty():
            print (q.get())

def main():
    a = Queue()
    p = Process(target=af, args=(a,))
    c = Process(target=bf, args=(a,))
    p.start()
    c.start()
    p.join()
    c.join()

if __name__ == "__main__":
    main()
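The difference between run() and start() mentioned above can be checked directly. This is a small sketch, not taken from the answers; record_pid and the seen list are hypothetical names. run() simply calls the target in the calling process, while start() launches a child that works on its own copy of the arguments.

```python
import os
from multiprocessing import Process


def record_pid(out):
    """Append the PID of whichever process executes this call."""
    out.append(os.getpid())


if __name__ == "__main__":
    seen = []

    # run() is an ordinary method call: the target executes right here.
    Process(target=record_pid, args=(seen,)).run()
    print(seen == [os.getpid()])

    # start() creates a real child process; it appends to its own copy
    # of the list, so the parent's list is left unchanged.
    p = Process(target=record_pid, args=(seen,))
    p.start()
    p.join()
    print(len(seen))
```

This is also why the original main() never reached bf: the first run() call executed af's infinite loop in the main process.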
To add to the accepted answer, in the original code:
while True:
    if not q.empty():
        print (q.get())
q.empty() is being called on every iteration, which is unnecessary: if the queue is empty, q.get() will wait until something is available (see the documentation).
Similar answer here
I assume that this could affect performance, since calling .empty() on every iteration consumes more resources (it should be more noticeable if Thread were used instead of Process, because of Python's Global Interpreter Lock (GIL)).
I know it's an old question but hope it helps!
I'm struggling to get my head around multiprocessing and passing a global True/False variable into my function.
After get_data() finishes I want the analysis() function to start and process the data, while fetch() continues running. How can I make this work? TIA
import multiprocessing

ready = False

def fetch():
    global ready
    get_data()
    ready = True
    return

def analysis():
    analyse_data()

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=fetch)
    p2 = multiprocessing.Process(target=analysis)
    p1.start()
    if ready:
        p2.start()
You should run the two processes and use a shared queue to exchange information between them, such as signaling the completion of an action in one of the processes.
Also, you need to have a join() statement to properly wait for completion of the processes you spawn.
from multiprocessing import Process, Queue
import time

def get_data(q):
    #Do something to get data
    time.sleep(2)
    #Put an event in the queue to signal that get_data has finished
    q.put('message from get_data to analyse_data')

def analyse_data(q):
    #waiting for get_data to finish...
    msg = q.get()
    print(msg) #Will print 'message from get_data to analyse_data'
    #get_data has finished

if __name__ == '__main__':
    #Create queue for exchanging messages between processes
    q = Queue()
    #Create processes, and send the shared queue to them
    processes = [Process(target=get_data,args=(q,)),Process(target=analyse_data,args=(q,))]
    #Start processes
    for p in processes:
        p.start()
    #Wait until all processes complete
    for p in processes:
        p.join()
Your example won't work for a few reasons:
Processes cannot share a piece of memory with each other (you can't change the global in one process and see the change in the other)
Even if you could change the global value, you are checking it too soon, and most likely it won't have changed in time
Read https://docs.python.org/3/library/ipc.html for more possibilities for inter-process-communications
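As one of those possibilities, a multiprocessing.Event can play the role of the global ready flag. The sketch below rests on assumptions: the string payload, the short sleep and the upper() call are placeholders for get_data(), fetch's remaining work and analyse_data(), which the question does not show.

```python
import multiprocessing
import time


def fetch(ready, q):
    """Produce the first batch, flag it as ready, then keep running."""
    q.put("first batch")       # placeholder for get_data()
    ready.set()                # unlike a global, this is visible to other processes
    time.sleep(0.1)            # placeholder for fetch's remaining work


def analysis(ready, q):
    """Block until fetch signals readiness, then process the data."""
    ready.wait()
    result = q.get().upper()   # placeholder for analyse_data()
    print(result)
    return result


if __name__ == "__main__":
    ready = multiprocessing.Event()
    q = multiprocessing.Queue()
    p1 = multiprocessing.Process(target=fetch, args=(ready, q))
    p2 = multiprocessing.Process(target=analysis, args=(ready, q))
    p1.start()
    p2.start()                 # started right away; it waits on the event itself
    p1.join()
    p2.join()
```

Both processes are started immediately. The waiting happens inside analysis rather than in an if check in the parent, which removes the "checking too soon" problem.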
I am new to multiprocessing
I have run example code from two 'highly recommended' multiprocessing examples given in response to other Stack Overflow multiprocessing questions. Here is one of them (which I dare not run again!)
test2.py (running from pydev)
import multiprocessing

class MyFancyClass(object):
    def __init__(self, name):
        self.name = name
    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print(proc_name, self.name)

def worker(q):
    obj = q.get()
    obj.do_something()

queue = multiprocessing.Queue()
p = multiprocessing.Process(target=worker, args=(queue,))
p.start()
queue.put(MyFancyClass('Fancy Dan'))
# Wait for the worker to finish
queue.close()
queue.join_thread()
p.join()
When I run this my computer slows down immediately. It gets incrementally slower. After some time I managed to get into the task manager, only to see MANY MANY python.exe entries under the processes tab. After trying to end process on some of them, my mouse stopped moving. It was the second time I was forced to reboot.
I am too scared to attempt a third example...
Running on an Intel(R) Core(TM) i7 CPU 870 @ 2.93GHz (8 CPUs), ~2.9GHz, on Win7 64
If anyone knows what the issue is and can provide a VERY SIMPLE example of multiprocessing (send a string to a process, alter it and send it back for printing) I would be very grateful.
From the docs:
Make sure that the main module can be safely imported by a new Python
interpreter without causing unintended side effects (such a starting a
new process).
Thus, on Windows, you must wrap your code inside a
if __name__=='__main__':
block.
For example, this sends a string to the worker process, the string is reversed and the result is printed by the main process:
import multiprocessing as mp

def worker(inq,outq):
    obj = inq.get()
    obj = obj[::-1]
    outq.put(obj)

if __name__=='__main__':
    inq = mp.Queue()
    outq = mp.Queue()
    p = mp.Process(target=worker, args=(inq,outq))
    p.start()
    inq.put('Fancy Dan')
    # Wait for the worker to finish
    p.join()
    result = outq.get()
    print(result)
Because of the way multiprocessing works on Windows (child processes import the __main__ module) the __main__ module cannot actually run anything when imported -- any code that should execute when run directly must be protected by the if __name__ == '__main__' idiom. Your corrected code:
import multiprocessing

class MyFancyClass(object):
    def __init__(self, name):
        self.name = name
    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print(proc_name, self.name)

def worker(q):
    obj = q.get()
    obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
Might I suggest this link? It's using threads, instead of multiprocessing, but many of the principles are the same.
I have a thread which extends Thread. The code looks a little like this;
class MyThread(Thread):
    def run(self):
        # Do stuff
        pass

my_threads = []
while has_jobs() and len(my_threads) < 5:
    new_thread = MyThread(next_job_details())
    new_thread.start()  # start() runs run() in a new thread; calling run() directly would block here
    my_threads.append(new_thread)
for my_thread in my_threads:
    my_thread.join()
    # Do stuff
So here in my pseudocode I check whether there are any jobs (e.g. in a db) and, if there are jobs and fewer than 5 threads running, create new threads.
From here I then check over my threads, and this is where I get stuck. I can use .join(), but my understanding is that it waits until the thread is finished, so if the first thread it checks is still in progress, it waits until that one is done, even if the other threads have already finished.
So is there a way to check if a thread is done, and remove it if so?
eg
for my_thread in my_threads:
    if my_thread.done():
        # process results
        del (my_threads[my_thread]) ?? will that work...
As TokenMacGuy says, you should use thread.is_alive() to check if a thread is still running. To remove no longer running threads from your list you can use a list comprehension:
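for t in my_threads:
    if not t.is_alive():
        # get results from thread
        t.handled = True
# getattr avoids an AttributeError for threads that are still running and have no 'handled' attribute yet
my_threads = [t for t in my_threads if not getattr(t, 'handled', False)]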
This avoids the problem of removing items from a list while iterating over it.
mythreads = threading.enumerate()
Enumerate returns a list of all Thread objects still alive.
https://docs.python.org/3.6/library/threading.html
You need to call thread.is_alive() to find out if the thread is still running (the older isAlive() spelling was removed in Python 3.9).
The answer has been covered, but for simplicity...
# To filter out finished threads
threads = [t for t in threads if t.is_alive()]
# Same thing but for QThreads (if you are using PyQt)
threads = [t for t in threads if t.isRunning()]
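If the goal is to handle results as each job finishes rather than to poll is_alive(), the standard library's concurrent.futures can do the bookkeeping. This is a sketch under assumptions: run_job and run_all are hypothetical names, jobs are modelled as (job_id, seconds) pairs, and the limit of 5 workers mirrors the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_job(job_id, seconds):
    """Hypothetical job: sleep for a while, then return its id."""
    time.sleep(seconds)
    return job_id


def run_all(jobs, max_workers=5):
    """Run (job_id, seconds) pairs on at most max_workers threads.

    Returns the job ids in the order the jobs finished.
    """
    finished = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_job, job_id, secs) for job_id, secs in jobs]
        for fut in as_completed(futures):   # yields each future as soon as it is done
            finished.append(fut.result())   # process results here
    return finished


if __name__ == "__main__":
    print(run_all([("slow", 0.3), ("fast", 0.05)]))
```

as_completed does not wait on the first submitted job the way a join() loop does, so a slow first job no longer holds up the handling of faster ones.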
A better way is to use the Queue class:
http://docs.python.org/library/queue.html
Look at the good example code at the bottom of the documentation page:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join() # block until all tasks are done
An easy solution to check whether a thread has finished or not. It is thread-safe.
Install pyrvsignal
pip install pyrvsignal
Example:
import time
from threading import Thread
from pyrvsignal import Signal

class MyThread(Thread):
    started = Signal()
    finished = Signal()

    def __init__(self, target, args):
        self.target = target
        self.args = args
        Thread.__init__(self)

    def run(self) -> None:
        self.started.emit()
        self.target(*self.args)
        self.finished.emit()

def do_my_work(details):
    print(f"Doing work: {details}")
    time.sleep(10)

def started_work():
    print("Started work")

def finished_work():
    print("Work finished")

thread = MyThread(target=do_my_work, args=("testing",))
thread.started.connect(started_work)
thread.finished.connect(finished_work)
thread.start()