Python threading: Queue workers hang the program

I'm trying to implement a simple Queue with workers that do something.
The program should wait until the workers have emptied the queue, then continue execution.
I took the documentation example and tried to implement it in a class, since that is how it will be used in my project.
Like this:
class Test:
    def __init__(self, n, q):
        self.q = Queue()
        print "Starting workers..."
        for i in range(n):
            t = threading.Thread(target=self.worker)
            t.daemon = True
            t.start()
        print "Workers started"
        for i in range(q):
            self.q.put(i)
        self.q.join()
        print "Exiting"

    def worker(self):
        name = threading.currentThread().getName()
        print "Thread %s started" % name
        while True:
            item = self.q.get()
            print "Processing item %d" % item
            sleep(1)
            self.q.task_done()
When I instantiate the class with t = Test(2, 100), all I see are the "Thread ... started" messages, and the program hangs.
What is wrong with the code?
EDIT:
I just noticed that while this code hangs in IDLE (where I tested it), it performs flawlessly on the command line.
Looks like an environmental problem.

Yes, this has to be an environmental problem. I even tested it in a few different editors and on a few different PCs.
Output
Starting workers...
Thread Thread-1 started
Thread Thread-2 started
Workers started
Processing item 0
Processing item 1
Processing item 2
Processing item 3
Processing item 4
Processing item 5
Processing item 6
Processing item 7
Processing item 8
Processing item 9
Exiting
Code:
from Queue import Queue
import threading
from time import sleep

class Test:
    def __init__(self, n, q):
        self.q = Queue()
        print "Starting workers..."
        for i in range(n):
            t = threading.Thread(target=self.worker)
            t.daemon = True
            t.start()
        print "Workers started"
        for i in range(q):
            self.q.put(i)
        self.q.join()
        print "Exiting"

    def worker(self):
        name = threading.currentThread().getName()
        print "Thread %s started" % name
        while True:
            item = self.q.get()
            print "Processing item %d" % item
            sleep(1)
            self.q.task_done()

t = Test(2, 10)
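As an aside, a variant that avoids daemon threads entirely, so a clean exit does not depend on the environment: give each worker a None sentinel and join the threads. This is a minimal sketch, not the original code; the sentinel and the threads list are my additions:

from Queue import Queue
import threading
from time import sleep

def worker(q):
    # loop until the None sentinel arrives, then exit
    for item in iter(q.get, None):
        print "Processing item %d" % item
        sleep(1)
        q.task_done()
    q.task_done()  # account for the sentinel itself

q = Queue()
threads = [threading.Thread(target=worker, args=(q,)) for _ in range(2)]
for t in threads:
    t.start()
for i in range(10):
    q.put(i)
for _ in threads:
    q.put(None)  # one sentinel per worker
for t in threads:
    t.join()     # no daemon flag needed; workers exit on their own
print "Exiting"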

Related

Threads not running in parallel in Python script

I am new to Python and threading. I am trying to run multiple threads at a time. Here is my basic code:
import threading
import time

threads = []
print "hello"

class myThread(threading.Thread):
    def __init__(self, i):
        threading.Thread.__init__(self)
        print "i = ", i
        for j in range(0, i):
            print "j = ", j
            time.sleep(5)

for i in range(1, 4):
    thread = myThread(i)
    thread.start()
While one thread is waiting in time.sleep(5), I want another thread to start. In short, all the threads should run in parallel.
You have a misunderstanding about how to subclass threading.Thread. The __init__() method is the constructor in Python: it runs every time you create an instance, so when thread = myThread(i) executes, the call blocks until __init__() returns, and all your work happens sequentially before start() is ever called.
You should move your activity into run() instead, so that the work happens in the new thread once start() is called. For example:
import threading
import time

threads = []
print "hello"

class myThread(threading.Thread):
    def __init__(self, i):
        threading.Thread.__init__(self)
        self.i = i

    def run(self):
        print "i = ", self.i
        for j in range(0, self.i):
            print "j = ", j
            time.sleep(5)

for i in range(1, 4):
    thread = myThread(i)
    thread.start()
P.S. Because of the GIL in CPython, you might not be able to take full advantage of all your processors if the task is CPU-bound.
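If the task really is CPU-bound, a process-based variant sidesteps the GIL. This is a minimal sketch under that assumption, reusing the doWork body from the threading example below; it is not part of the original answer:

from multiprocessing import Process
import time

def doWork(i):
    # same body as the threaded version, but running in its own process
    print "i = ", i
    for j in range(0, i):
        print "j = ", j
        time.sleep(5)

if __name__ == '__main__':
    processes = [Process(target=doWork, args=(i,)) for i in range(1, 4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()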
Here is an example of how you could use threading, based on your code:
import threading
import time

threads = []
print "hello"

def doWork(i):
    print "i = ", i
    for j in range(0, i):
        print "j = ", j
        time.sleep(5)

for i in range(1, 4):
    thread = threading.Thread(target=doWork, args=(i,))
    threads.append(thread)
    thread.start()

# you need to wait for the threads to finish
for thread in threads:
    thread.join()
print "Finished"
import threading
import subprocess

def obj_func(simid):
    simid = simid
    workingdir = './' + str(simid)  # the working directory for the simulation
    cmd = './run_delwaq.sh'         # cmd is a bash command to launch the external executable
    subprocess.Popen(cmd, cwd=workingdir).wait()

def example_subprocess_files():
    num_threads = 4
    jobs = []
    # Launch the threads and give them access to the objective function
    for i in range(num_threads):
        workertask = threading.Thread(target=obj_func(i))
        jobs.append(workertask)
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print('All the work finished!')

if __name__ == '__main__':
    example_subprocess_files()
This one does not work for my case, where the task is not printing but a CPU-intensive job; the threads execute serially. (Note that target=obj_func(i) calls obj_func immediately in the main thread and passes its return value, None, as the target, which is why everything runs in series; it should be target=obj_func, args=(i,), as sketched below.)
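A minimal sketch of the corrected launch loop, assuming the rest of the example stays unchanged; passing the function and its arguments separately lets the call happen inside the new thread instead of at Thread() construction time:

for i in range(num_threads):
    # obj_func is passed uncalled; args supplies its argument
    workertask = threading.Thread(target=obj_func, args=(i,))
    jobs.append(workertask)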

Cannot line up items from one queue to another

I'm dealing with two Python queues.
Short description of my issue:
Clients pass through the waiting queue (q1) and are served afterwards. The size of the waiting queue can't be greater than N (10 in my program). If the waiting queue becomes full, clients pass to the outside queue (q2, size 20). If the outside queue becomes full, clients are rejected and not served.
Every client that leaves the waiting queue allows another client from the outside queue to join it.
Work with the queues should be thread-safe.
Below I implemented approximately what I want, but I'm facing a problem: enqueuing a client from the outside queue (q2) into the waiting queue (q1) during execution of the serve function. I guess I missed or forgot something important. I think the statement q1.put(client) blocks permanently, but I don't know why.
import time
import threading
from random import randrange
from Queue import Queue, Full as FullQueue

class Client(object):
    def __repr__(self):
        return '<{0}: {1}>'.format(self.__class__.__name__, id(self))

def serve(q1, q2):
    while True:
        if not q2.empty():
            client = q2.get()
            print '%s left outside queue' % client
            q1.put(client)
            print '%s is in the waiting queue' % client
            q2.task_done()
        client = q1.get()
        print '%s left waiting queue for serving' % client
        time.sleep(2)  # Do something with client
        q1.task_done()

def main():
    waiting_queue = Queue(10)
    outside_queue = Queue(20)
    for _ in range(2):
        worker = threading.Thread(target=serve, args=(waiting_queue, outside_queue))
        worker.setDaemon(True)
        worker.start()
    delays = [randrange(1, 5) for _ in range(100)]
    # Every d seconds 10 clients enter the waiting queue
    for d in delays:
        time.sleep(d)
        for _ in range(10):
            client = Client()
            try:
                waiting_queue.put_nowait(client)
            except FullQueue:
                print 'Waiting queue is full. Please line up in outside queue.'
                try:
                    outside_queue.put_nowait(client)
                except FullQueue:
                    print 'Outside queue is full. Please go out.'
    waiting_queue.join()
    outside_queue.join()
    print 'Done'
Finally I found the solution. I checked the docs more attentively:
"If full() returns True it doesn't guarantee that a subsequent call to get() will not block." (https://docs.python.org/2/library/queue.html#Queue.Queue.full)
That's why q1.full() is not reliable across multiple threads. I added a mutex around checking whether a queue is full and inserting an item:
class Client(object):
    def __init__(self, ident):
        self.ident = ident

    def __repr__(self):
        return '<{0}: {1}>'.format(self.__class__.__name__, self.ident)

def serve(q1, q2, mutex):
    while True:
        client = q1.get()
        print '%s left waiting queue for serving' % client
        time.sleep(2)  # Do something with client
        q1.task_done()
        with mutex:
            if not q2.empty() and not q1.full():
                client = q2.get()
                print '%s left outside queue' % client
                q1.put(client)
                print '%s is in the waiting queue' % client
                q2.task_done()

def main():
    waiting_queue = Queue(10)
    outside_queue = Queue(20)
    lock = threading.RLock()
    for _ in range(2):
        worker = threading.Thread(target=serve, args=(waiting_queue, outside_queue, lock))
        worker.setDaemon(True)
        worker.start()
    # Every 1-5 seconds 10 clients enter the waiting room
    i = 1  # Used for unique <int> client's id
    while True:
        delay = randrange(1, 5)
        time.sleep(delay)
        for _ in range(10):
            client = Client(i)
            try:
                lock.acquire()
                if not waiting_queue.full():
                    waiting_queue.put(client)
                else:
                    outside_queue.put_nowait(client)
            except FullQueue:
                # print 'Outside queue is full. Please go out.'
                pass
            finally:
                lock.release()
            i += 1
    waiting_queue.join()
    outside_queue.join()
    print 'Done'
Now it works well.
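The essence of the fix, expressed as a hypothetical helper (the put_atomic name is mine, not from the original): the full() check and the put() must happen under one lock, so no other thread can fill the queue in between:

def put_atomic(q, item, lock):
    # check-and-put as one atomic step; returns False if q is full
    with lock:
        if not q.full():
            q.put(item)
            return True
        return False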

Strange process clone appears with python multiprocessing

I have faced a very strange behavior in Python. When I start a parallel program which uses multiprocessing and spawns two more processes (producer, consumer) from the main process, I see four processes running. I think there should be only three: the main, the producer, and the consumer. But after some time the fourth process appears.
I have made a minimal example to reproduce the problem. It creates two processes that calculate Fibonacci numbers using recursion:
from multiprocessing import Process, Queue
import os, sys
import time
import signal

def fib(n):
    if n == 1 or n == 2:
        return 1
    result = fib(n-1) + fib(n-2)
    return result

def worker(queue, amount):
    pid = os.getpid()
    def workerProcess(a, b):
        print a, b
        print 'This is Writer(', pid, ')'
    signal.signal(signal.SIGUSR1, workerProcess)
    print 'Worker', os.getpid()
    for i in range(0, amount):
        queue.put(fib(35 - i % 4))
    queue.put('end')
    print 'Worker finished'

def writer(queue):
    pid = os.getpid()
    def writerProcess(a, b):
        print a, b
        print 'This is Writer(', pid, ')'
    signal.signal(signal.SIGUSR1, writerProcess)
    print 'Writer', os.getpid()
    working = True
    while working:
        if not queue.empty():
            value = queue.get()
            if value != 'end':
                fib(32 + value % 4)
            else:
                working = False
        else:
            time.sleep(1)
    print 'Writer finished'

def daemon():
    print 'Daemon', os.getpid()
    while True:
        time.sleep(1)

def useProcesses(amount):
    q = Queue()
    writer_process = Process(target=writer, args=(q,))
    worker_process = Process(target=worker, args=(q, amount))
    writer_process.daemon = True
    worker_process.daemon = True
    worker_process.start()
    writer_process.start()

def run(amount):
    print 'Main', os.getpid()
    pid = os.getpid()
    def killThisProcess(a, b):
        print a, b
        print 'Main killed by signal(', pid, ')'
        sys.exit(0)
    signal.signal(signal.SIGTERM, killThisProcess)
    useProcesses(amount)
    print 'Ready to exit main'
    while True:
        time.sleep(1)

def main():
    run(1000)

if __name__=='__main__':
    main()
What I see in the output is:
$ python python_daemon.py
Main 13257
Ready to exit main
Worker 13258
Writer 13259
but in htop I see a fourth entry (htop screenshot omitted), and it looks like the process with PID 13322 is actually a thread. The question is: what is it? Who spawned it? Why?
If I send SIGUSR1 to this PID, the following appears in the output:
10 <frame object at 0x7f05c14ed5d8>
This is Writer( 13258 )
This question is slightly related to: Python multiprocessing: more processes than requested
The thread belongs to the Queue object, which internally uses a feeder thread to dispatch data over a Pipe.
From the docs:
class multiprocessing.Queue([maxsize])
Returns a process shared queue implemented using a pipe and a few locks/semaphores. When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe.
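A minimal way to observe this feeder thread (my sketch, not from the original answer); in CPython it is created lazily on the first put() and shows up under the name QueueFeederThread:

from multiprocessing import Queue
import threading

q = Queue()
print [t.name for t in threading.enumerate()]  # only MainThread
q.put(1)                                       # the first put starts the feeder thread
print [t.name for t in threading.enumerate()]  # MainThread + QueueFeederThread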

Handling kill events for python multiprocessing processes

For a program that should run on both Linux and Windows (Python 2.7), I'm trying to update values of a given object using multiprocessing.Process (while the main program is running, I call the update class in a separate process).
Sometimes the update takes too long, so I want to be able to kill the update process and continue with the main program. "Too long" is not strictly defined here; it is rather the user's subjective perception.
For a single queue (as in the MyFancyClass example in http://pymotw.com/2/multiprocessing/communication.html) I can kill the update process and the main program continues as I want.
However, when I make a second queue to retrieve the updated object, ending the update process does not allow me to continue in the main program.
What I have so far is:
import multiprocessing
import time, os

class NewParallelProcess(multiprocessing.Process):
    def __init__(self, taskQueue, resultQueue, processName):
        multiprocessing.Process.__init__(self)
        self.taskQueue = taskQueue
        self.resultQueue = resultQueue
        self.processName = processName

    def run(self):
        print "pid %s of process that could be killed" % os.getpid()
        while True:
            next_task = self.taskQueue.get()
            if next_task is None:
                # poison pill for terminate
                print "%s: exiting" % self.processName
                self.taskQueue.task_done()
                break
            print "%s: %s" % (self.processName, next_task)
            answer = next_task()
            self.taskQueue.task_done()
            self.resultQueue.put(answer)
        return

class OldObject(object):
    def __init__(self):
        self.accurate = "OldValue"
        self.otherValue = "SomeOtherValue"

class UpdateObject(dict):
    def __init__(self, objectToUpdate):
        self.objectToUpdate = objectToUpdate

    def __call__(self):
        returnDict = {}
        returnDict["update"] = self.updateValue("NewValue")
        return returnDict

    def __str__(self):
        return "update starting"

    def updateValue(self, updatedValue):
        for i in range(5):
            time.sleep(1)  # updating my object - time consuming with possible pid kill
            print "working... (pid=%s)" % os.getpid()
        self.objectToUpdate.accurate = updatedValue
        return self.objectToUpdate

if __name__ == '__main__':
    taskQueue = multiprocessing.JoinableQueue()
    resultQueue = multiprocessing.Queue()
    newProcess = NewParallelProcess(taskQueue, resultQueue, processName="updateMyObject")
    newProcess.start()
    myObject = OldObject()
    taskQueue.put(UpdateObject(myObject))
    # poison pill for NewParallelProcess loop and wait to finish
    taskQueue.put(None)
    taskQueue.join()
    # get back results
    results = resultQueue.get()
    print "Values have been updated"
    print "---> %s became %s" % (myObject.accurate, results["update"].accurate)
Any suggestions on how to kill the newProcess and to continue in the main program?
Well, made some modifications, and this does what I want. Not sure whether it is the most efficient, so any improvements are always welcome :)
import multiprocessing
import time, os

class NewParallelProcess(multiprocessing.Process):
    def __init__(self, taskQueue, resultQueue, processName):
        multiprocessing.Process.__init__(self)
        self.taskQueue = taskQueue
        self.resultQueue = resultQueue
        self.name = processName

    def run(self):
        print "Process %s (pid = %s) added to the list of running processes" % (self.name, self.pid)
        next_task = self.taskQueue.get()
        self.taskQueue.task_done()
        self.resultQueue.put(next_task())
        return

class OldObject(object):
    def __init__(self):
        self.accurate = "OldValue"
        self.otherValue = "SomeOtherValue"

class UpdateObject(dict):
    def __init__(self, objectToUpdate, valueToUpdate):
        self.objectToUpdate = objectToUpdate
        self.valueToUpdate = valueToUpdate

    def __call__(self):
        returnDict = {}
        returnDict["update"] = self.updateValue(self.valueToUpdate)
        return returnDict

    def updateValue(self, updatedValue):
        for i in range(5):
            time.sleep(1)  # updating my object - time consuming with possible pid kill
            print "working... (pid=%s)" % os.getpid()
        self.objectToUpdate.accurate = updatedValue
        return self.objectToUpdate

if __name__ == '__main__':
    # queue for single process
    taskQueue = multiprocessing.JoinableQueue()
    resultQueue = multiprocessing.Queue()
    newProcess = NewParallelProcess(taskQueue, resultQueue, processName="updateMyObject")
    newProcess.start()
    myObject = OldObject()
    taskQueue.put(UpdateObject(myObject, "NewValue"))
    while True:
        # check if newProcess is still alive
        time.sleep(5)
        if newProcess.is_alive() is False:
            print "Process %s (pid = %s) is not running any more (exit code = %s)" % (newProcess.name, newProcess.pid, newProcess.exitcode)
            break
    if newProcess.exitcode == 0:
        print "ALL OK"
        taskQueue.join()
        # get back results
        print "NOT KILLED"
        results = resultQueue.get()
        print "Values have been updated"
        print "---> %s became %s" % (myObject.accurate, results["update"].accurate)
    elif newProcess.exitcode == 1:
        print "ended with error in function"
        print "KILLED"
        for i in range(5):
            time.sleep(1)
            print "i continue"
    elif newProcess.exitcode == -15 or newProcess.exitcode == -9:
        print "ended with kill signal %s" % newProcess.exitcode
        print "KILLED"
        for i in range(5):
            time.sleep(1)
            print "i continue"
    else:
        print "no idea what happened"
        print "KILLED"
        for i in range(5):
            time.sleep(1)
            print "i continue"

How to handle multiple jobs in a queue with a fixed number of threads in Python

In the program below I have put 5 jobs in the queue, but have created only 3 threads. When I run the program, only 3 jobs are completed. How am I supposed to complete all 5 jobs with only 3 threads? Is there a way to make a thread that has completed its job take the next one?
import time
import Queue
import threading

class worker(threading.Thread):
    def __init__(self, qu):
        threading.Thread.__init__(self)
        self.que = qu

    def run(self):
        print "Going to sleep.."
        time.sleep(self.que.get())
        print "Slept .."
        self.que.task_done()

q = Queue.Queue()
for j in range(3):
    work = worker(q)
    work.setDaemon(True)
    work.start()

for i in range(5):
    q.put(1)
q.join()
print "done!!"
You need to have your worker threads run in a loop. You can use a sentinel value (like None or a custom class) to tell the workers to shut down after you've put all your actual work items in the queue:
import time
import Queue
import threading

class worker(threading.Thread):
    def __init__(self, qu):
        threading.Thread.__init__(self)
        self.que = qu

    def run(self):
        # iter(self.que.get, None) calls self.que.get() until None is
        # returned, at which point the loop breaks.
        for item in iter(self.que.get, None):
            print "Going to sleep.."
            time.sleep(item)
            print "Slept .."
            self.que.task_done()
        self.que.task_done()  # account for the None sentinel

q = Queue.Queue()
for j in range(3):
    work = worker(q)
    work.setDaemon(True)
    work.start()

for i in range(5):
    q.put(1)

for i in range(3):  # Shut down all the workers
    q.put(None)
q.join()
print "done!!"
Another option would be to use a multiprocessing.dummy.Pool, which is a thread pool that Python manages for you:
import time
from multiprocessing.dummy import Pool

def run(i):
    print "Going to sleep..."
    time.sleep(i)
    print "Slept .."

p = Pool(3)           # 3 threads in the pool
p.map(run, range(5))  # Calls run(i) for each element i in range(5)
p.close()
p.join()
print "done!!"
