I have written some simple code, shown below. It is just a model of another, much more complicated problem. There is a simple function task_submit that adds tasks to a queue; its purpose is to continuously collect tasks delegated by the user, since the user can create new tasks after the program has been launched. There is also a worker function that simulates doing some work. I create a ThreadPoolExecutor, submit task_submit with the queue as an argument, and then start submitting the tasks pulled from the queue. But the code does not terminate even when only the main thread (my program itself) remains in the pool of threads. I can't understand why; even shutdown doesn't work.
from concurrent.futures import ThreadPoolExecutor as Tpe
import time
import random
import queue
import threading


def task_submit(q):
    for i in range(7):
        threading.currentThread().setName('task_submit')
        new_task = random.randint(10, 20)
        q.put_nowait(new_task)
        print(f' {i} new task with argument {new_task} has been added to queue')
        time.sleep(5)


def worker(t):
    threading.currentThread().setName(f'worker {t}')
    print(f'{threading.currentThread().getName()} started')
    time.sleep(t)
    print(f'{threading.currentThread().getName()} FINISHED!')


with Tpe(max_workers=4) as executor:
    q = queue.Queue(maxsize=100)
    q_thread = executor.submit(task_submit, q)
    tasks = []
    while True:
        time.sleep(10)
        print('\n\n------------NEW CYCLE----------------\n\n')
        if not q.empty():
            print(threading.enumerate())
            tasks.append(executor.submit(worker, q.get()))
        else:
            print('is queue empty?', q.empty())
            print(f'active threads: {threading.active_count()}')
            print(threading.enumerate())
            executor.shutdown(wait=True)
I am trying to use a queue to load up a bunch of tasks and then have a process pool go at it, where each process pops a task off the queue and works on it. I am running into problems in that the setup is not working: something is blocking the processes from getting started, and I need help figuring out the bug. For example, the queue is filled up correctly, but when the individual process runs, it never starts processing the task subroutine.
# -*- coding: utf-8 -*-
"""
Created on Tue Aug 30 17:08:42 2022

@author: Rahul
"""
import threading
import queue
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import multiprocessing as mp
import time

q = queue.Queue()


# some worker task
def worker(id, q):
    print(f'{id}:: Worker running', flush=True)
    while q.unfinished_tasks > 0:
        item = q.get()
        print(f'{id}::Working on {item}', flush=True)
        print(f'{id}::Finished {item}', flush=True)
        q.task_done()
        print(f'{id}::Sleeping. Item: {item}', flush=True)
        time.sleep(0.1)
    print(f'We reached the end. Queue size is {q.unfinished_tasks}', flush=True)


def main():
    print('running main')
    # Send thirty task requests to the worker.
    for item in range(30):
        q.put(item)
    # Confirm that queue is filled
    print(f'Size of queue {q.unfinished_tasks}')
    id = 0
    # start process pool
    with ProcessPoolExecutor(max_workers=4) as executor:
        executor.map(worker, [1, 2, 3, 4], [q, q, q, q])
    # Block until all tasks are done.
    q.join()
    print('All work completed')


if __name__ == "__main__":
    main()
This creates the following output and is stuck after that; I have no control of the keyboard, etc., and have to shut down the IDE and restart.
running main
Size of queue 30
For multiprocessing, there are two ways to use a queue.
You have to either
use the queue as a shared global via the initializer parameter, or
use a manager.
See Python multiprocessing.Queue vs multiprocessing.manager().Queue() for examples of how to set each one up.
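For reference, here is a minimal sketch of the first option (sharing the queue through the pool's initializer), reusing the same sentinel idea as the solution further down. The names init_worker, task_queue, and SENTINEL are illustrative, not taken from the question:
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

SENTINEL = 'DONE'
task_queue = None  # module-level handle, set in each worker process by the initializer


def init_worker(q):
    # Runs once in every worker process and makes the queue visible as a global.
    global task_queue
    task_queue = q


def worker(id):
    # Same idea as the manager example below: loop until a sentinel arrives.
    while True:
        item = task_queue.get()
        if item == SENTINEL:
            break
        print(f'{id}::Working on {item}', flush=True)


if __name__ == '__main__':
    q = mp.Queue()
    for item in range(30):
        q.put(item)
    for _ in range(4):
        q.put(SENTINEL)
    # initializer/initargs are pickled while the worker processes are spawned,
    # which is the one situation where a plain mp.Queue may be handed to a child.
    with ProcessPoolExecutor(max_workers=4,
                             initializer=init_worker,
                             initargs=(q,)) as executor:
        executor.map(worker, [1, 2, 3, 4])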
Below is an example of using a manager for the OP's use case.
A few things to highlight:
It uses manager.Queue(), which shares the queue across different processes.
Typically, for a worker process it's better to use a while True loop that terminates when it sees some SENTINEL value. This allows the worker to keep waiting even when all current work is done, in case more work is coming. It is also more robust than checking q.empty() (or q.unfinished_tasks, which doesn't exist in the multiprocessing version).
Using the SENTINEL approach requires adding 4 SENTINEL values, 1 for each process, after all the tasks.
The with ProcessPoolExecutor ... context manager is blocking, meaning it waits until all processes exit before continuing to the next lines. You may consider an explicit, non-blocking shutdown instead, e.g.
executor = ProcessPoolExecutor(max_workers=4)
executor.map(...)
executor.shutdown(wait=False)
Now finally, the example solution:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import multiprocessing as mp
import time

# some worker task
SENTINEL = 'DONE'


def worker(id, q):
    print(f'{id}:: Worker running', flush=True)
    # better to use while True with a SENTINEL;
    # other methods such as checking 'q.empty()' may be unreliable.
    while True:
        item = q.get()
        if item == SENTINEL:
            q.task_done()
            break
        print(f'{id}::Working on {item}', flush=True)
        print(f'{id}::Finished {item}', flush=True)
        q.task_done()
        print(f'{id}::Sleeping. Item: {item}', flush=True)
        time.sleep(0.1)
    print(f'We reached the end.', flush=True)


def main():
    print('running main')
    # Send thirty task requests to the worker.
    with mp.Manager() as manager:
        q = manager.Queue()
        for item in range(30):
            q.put(item)
        # adding 4 sentinel values at the end, 1 for each process.
        for _ in range(4):
            q.put(SENTINEL)
        # Confirm that queue is filled
        print(f'Approx queue size: {q.qsize()}')
        id = 0
        # start process pool
        with ProcessPoolExecutor(max_workers=4) as executor:
            executor.map(worker, [1, 2, 3, 4], [q, q, q, q])
            print('working')
            # Block until all tasks are done.
            q.join()
    print('All work completed')


if __name__ == "__main__":
    main()
I want to run multiple threads in parallel. Each thread picks up a task from a task queue and executes that task.
from threading import Thread
from Queue import Queue
import time


class link(object):
    def __init__(self, i):
        self.name = str(i)


def run_jobs_in_parallel(consumer_func, jobs, results, thread_count,
                         async_run=False):

    def consume_from_queue(jobs, results):
        while not jobs.empty():
            job = jobs.get()
            try:
                results.append(consumer_func(job))
            except Exception as e:
                print str(e)
                results.append(False)
            finally:
                jobs.task_done()

    # start worker threads
    if jobs.qsize() < thread_count:
        thread_count = jobs.qsize()
    for tc in range(1, thread_count + 1):
        worker = Thread(
            target=consume_from_queue,
            name="worker_{0}".format(str(tc)),
            args=(jobs, results,))
        worker.start()
    if not async_run:
        jobs.join()


def create_link(link):
    print str(link.name)
    time.sleep(10)
    return True


def consumer_func(link):
    return create_link(link)


# create_link takes a while to execute
jobs = Queue()
results = list()
for i in range(0, 10):
    jobs.put(link(i))
run_jobs_in_parallel(consumer_func, jobs, results, 25, async_run=False)
Now what is happening is: say we have 10 link objects in the jobs queue; while the threads are running in parallel, multiple threads end up executing the same task. How can I prevent this from happening?
Note: the above sample code does not have the problem described above, but I have exactly the same code, except that the create_link method does some complex stuff.
I think what you need is a lock object (docs, tutorial + examples). If you create an instance of such an object, you can 'lock' some parts of your code, ensuring that only one thread executes that part at a time.
I guess in your case you want to lock the line job = jobs.get().
First you have to create the lock in a scope where all threads have access to it. (You don't want a lock for every thread, but a single lock for all your threads. That means creating the lock within your thread function just before acquiring it won't work.)
import threading
lock = threading.Lock()
then you can use it on your line like:
lock.acquire()
job = jobs.get()
lock.release()
or
with lock:
    job = jobs.get()
The first thread to reach acquire() will lock the lock. Other threads that try to acquire() it will pause until the lock is unlocked again by a call to release().
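For concreteness, here is a minimal sketch of how that might look inside the question's consume_from_queue helper, shown here as a standalone function taking consumer_func as a parameter rather than as the nested closure used in the question; it is one possible placement under the assumption that jobs.get() is the critical section, not a tested fix:
import threading

lock = threading.Lock()  # one lock shared by every worker thread


def consume_from_queue(consumer_func, jobs, results):
    # Same loop as in the question; only the queue access is serialized.
    while not jobs.empty():
        with lock:          # only one thread at a time reaches jobs.get()
            job = jobs.get()
        try:
            results.append(consumer_func(job))
        except Exception:
            results.append(False)
        finally:
            jobs.task_done()
Note that a blocking jobs.get() can still hang if the queue empties between the empty() check and the get(); one common alternative is a non-blocking jobs.get_nowait() wrapped in a try/except for the queue's Empty exception.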
I am using the following code to complete a task using multithreading with Queue and JoinableQueue. Sometimes the script executes perfectly; other times it stalls at the end of the task without ending the worker and will not continue on to the next portion of the script. I am new to working with Queue and JoinableQueue, and I need to find out why this stalling happens.
Before this part in the code I run another Queue/JoinableQueue worker to download some data, and it works perfectly fine every time. Do I need to close() anything from the first Queue/JoinableQueue? Is there a way to check whether it stalls and, if so, continue on?
Here is my code:
import multiprocessing
from multiprocessing import Queue
from multiprocessing import JoinableQueue
from threading import Thread


def run_this_definition(hr):
    # do things here
    return()


def worker():
    while True:
        item = jq.get()
        run_this_definition(item)
        jq.task_done()
    return()


q = Queue()
jq = JoinableQueue()

number_of_threads = 8
for i in range(number_of_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

input_list = [0,1,2,3,4]
for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The script never prints "finished" when it stalls; it seems to finish all the tasks but stalls at the end of run_this_definition on the very last item in the Queue.
My guess is you are using the multiprocessing.JoinableQueue()!? Use the Queue.Queue() instead for threading. It has a .join() and a .task_done() method as well. Furthermore, you should pass your queue as an argument to your threads; see the following example:
import threading
from threading import Thread
from Queue import Queue


def worker(jq):
    while True:
        item = jq.get()
        # Do whatever you have to do.
        print '{}: {}'.format(threading.currentThread().name, item)
        jq.task_done()
    return()


number_of_threads = 4
input_list = [1,2,3,4,5]

jq = Queue()
for i in range(number_of_threads):
    t = Thread(target=worker, args=(jq,))
    t.daemon = True
    t.start()

for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The print output from multiple threads might look messy, but as an example it should be fine.
For the future: please provide a comprehensive example of your code. Neither your imports nor number_of_threads, run_this_definition, or input_list were defined in your example.
When I have an error in my code, I'd like my processes to exit, but I have some strange behavior that I don't know how to work around.
This code errors out and closes the processes as expected:
from multiprocessing import Queue, Pool
def worker(queue):
    raise error


task_queue = Queue(10)
the_pool = Pool(1, worker, (task_queue,))
But this one spins off an infinite number of new processes which all error out (but followed up by yet new processes):
from multiprocessing import Queue, Pool
def worker(queue):
    raise error


task_queue = Queue(10)
the_pool = Pool(1, worker, (task_queue,))

while True:  # <-- added this
    pass
How can I effectively stop the second from spinning off infinite new processes?
I don't understand why "while True:" is needed in the example below.
import os
import sys
import subprocess
import time
from threading import Thread
from Queue import Queue


def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()


def do_work(item):
    time.sleep(item)
    print item


q = Queue()
for i in range(2):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

source = [2,3,1,4,5]
for item in source:
    q.put(item)

q.join()
Because otherwise the worker thread would quit as soon as the first job was processed from the queue. The infinite loop ensures that the worker thread retrieves a new job from the queue when finished.
Update: to summarize the comments to my (admittedly hasty) answer: the worker thread is daemonic (ensured by t.daemon = True), which means that it will automatically terminate when there are only daemonic threads left in the Python interpreter (a more detailed explanation is given here). It is also worth mentioning that the get method of the queue on which the worker operates blocks the thread when the queue is empty to let other threads run while the worker is waiting for more jobs to appear in the queue.
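To make the first point concrete, here is a minimal sketch of the same program without the loop; worker_once is a made-up name, and the assumption is simply that each thread handles exactly one item and then returns, so with 2 threads and 5 queued items three items are never marked done and q.join() blocks forever:
import time
from threading import Thread
from Queue import Queue


def worker_once():
    # No 'while True': this thread handles exactly one item, then returns.
    item = q.get()
    time.sleep(item)
    q.task_done()


q = Queue()
for i in range(2):
    t = Thread(target=worker_once)
    t.daemon = True
    t.start()

for item in [2, 3, 1, 4, 5]:
    q.put(item)

q.join()  # never returns: only 2 of the 5 items are ever marked task_done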