gevent queue failed with LoopExit - python

I want to use the Python gevent library to implement a server with one producer and multiple consumers. Here is my attempt:
class EmailValidationServer():
    def __init__(self):
        self.queue = Queue()

    def worker(self):
        while True:
            json = self.queue.get()

    def handler(self,socket,address):
        fileobj = socket.makefile()
        content = fileobj.read(max_read)
        contents = json.loads(content)
        for content in contents:
            self.queue.put(content)

    def daemon(self,addr='127.0.0.1',num_thread=5):
        pool = Pool(1000)
        server = StreamServer((addr, 6000),self.handler,spawn=pool) # run
        pool = ThreadPool(num_thread)
        for _ in range(num_thread):
            pool.spawn(self.worker)
        server.serve_forever()

if __name__ == "__main__":
    email_server = EmailValidationServer()
    email_server.daemon()
I used the Queue from gevent.queue. It gives me this error:
LoopExit: This operation would block forever
(<ThreadPool at 0x7f08c80eef50 0/4/5>,
<bound method EmailValidationServer.worker of <__main__.EmailValidationServer instance at 0x7f08c8dcd998>>) failed with LoopExit
Problem: when I switch from gevent's Queue to the one in Python's standard library, it works. I do not know why; I guess the two implementations differ. Why does gevent not allow an infinite wait here? Can anyone explain? Thanks in advance.

I suggest using gevent.queue.JoinableQueue() instead of Python's built-in Queue(). See the official queue guide for API usage (http://www.gevent.org/gevent.queue.html):
def worker():
    while True:
        item = q.get()
        try:
            do_work(item)
        finally:
            q.task_done()

q = JoinableQueue()
for i in range(num_worker_threads):
    gevent.spawn(worker)
for item in source():
    q.put(item)
q.join() # block until all tasks are done
If you hit the exception again, it is worth fully understanding how gevent's coroutine control flow works. Once you get the point, it is not a big deal. :)

Related

multiprocessing.Process doesn't terminate after putting requests response.content to queue

I'm trying to run multiple API requests in parallel with multiprocessing.Process and requests. I put the URLs to parse into a JoinableQueue instance and put the content back into a Queue instance. I've noticed that putting response.content into the Queue somehow prevents the process from terminating.
Here's a simplified example with just 1 process (Python 3.5):
import multiprocessing as mp
import queue
import requests
import time

class ChildProcess(mp.Process):
    def __init__(self, qin, qout):
        super().__init__()
        self.qin = qin
        self.qout = qout
        self.daemon = True

    def run(self):
        while True:
            try:
                url = self.qin.get(block=False)
                r = requests.get(url, verify=False)
                self.qout.put(r.content)
                self.qin.task_done()
            except queue.Empty:
                break
            except requests.exceptions.RequestException as e:
                print(self.name, e)
                self.qin.task_done()
        print("Infinite loop terminates")

if __name__ == '__main__':
    qin = mp.JoinableQueue()
    qout = mp.Queue()
    for _ in range(5):
        qin.put('http://en.wikipedia.org')
    w = ChildProcess(qin, qout)
    w.start()
    qin.join()
    time.sleep(1)
    print(w.name, w.is_alive())
After running the code I get:
Infinite loop terminates
ChildProcess-1 True
Please help me understand why the process doesn't terminate after the run function exits.
Update: added print statement to show the loop terminates
As noted in the Pipes and Queues documentation:
if a child process has put items on a queue (and it has not used
JoinableQueue.cancel_join_thread), then that process will not
terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock
unless you are sure that all items which have been put on the queue
have been consumed.
...
Note that a queue created using a manager does not have this issue.
If you switch over to a manager queue, then the process terminates successfully:
import multiprocessing as mp
import queue
import requests
import time

class ChildProcess(mp.Process):
    def __init__(self, qin, qout):
        super().__init__()
        self.qin = qin
        self.qout = qout
        self.daemon = True

    def run(self):
        while True:
            try:
                url = self.qin.get(block=False)
                r = requests.get(url, verify=False)
                self.qout.put(r.content)
                self.qin.task_done()
            except queue.Empty:
                break
            except requests.exceptions.RequestException as e:
                print(self.name, e)
                self.qin.task_done()
        print("Infinite loop terminates")

if __name__ == '__main__':
    manager = mp.Manager()
    qin = mp.JoinableQueue()
    qout = manager.Queue()
    for _ in range(5):
        qin.put('http://en.wikipedia.org')
    w = ChildProcess(qin, qout)
    w.start()
    qin.join()
    time.sleep(1)
    print(w.name, w.is_alive())
It's a bit hard to figure this out based on the Queue documentation - I struggled with the same problem.
The key concept here is that before a producer process terminates, it joins any queues that it has put data into; that join then blocks until the queue's background thread terminates, which only happens once everything the process put on the queue has been flushed out. So basically, before your ChildProcess can exit, someone has to consume all the stuff it put into the queue!
There is some documentation of the Queue.cancel_join_thread function, which is supposed to circumvent this problem, but I couldn't get it to have any effect - maybe I'm not using it correctly.
Here's an example modification you can make that should fix the issue:
if __name__ == '__main__':
    qin = mp.JoinableQueue()
    qout = mp.Queue()
    for _ in range(5):
        qin.put('http://en.wikipedia.org')
    w = ChildProcess(qin, qout)
    w.start()
    qin.join()
    while True:
        try:
            # Throw away the remaining stuff in qout (or process it or whatever);
            # just get it out of the queue so the queue's background thread
            # can terminate, so your ChildProcess can terminate.
            qout.get(True, 0.1)
        except queue.Empty:
            break
    w.join()  # Wait for your ChildProcess to finish up.
    # time.sleep(1)  # Not necessary since we've joined the ChildProcess
    print(w.name, w.is_alive())
Add a call to w.terminate() above the print message.
Regarding why the process doesn't terminate itself; your function code is an infinite loop, so it doesn't ever return. Calling terminate signals the process to kill itself.

Python interprocess communication with idle processes

I have an idle background process to process data in a queue, which I've implemented in the following way. The data passed in this example is just an integer, but I will be passing lists with up to 1000 integers and putting up to 100 lists on the queue per sec. Is this the correct approach, or should I be looking at more elaborate RPC and server methods?
import multiprocessing
import Queue
import time

class MyProcess(multiprocessing.Process):
    def __init__(self, queue, cmds):
        multiprocessing.Process.__init__(self)
        self.q = queue
        self.cmds = cmds

    def run(self):
        exit_flag = False
        while True:
            try:
                obj = self.q.get(False)
                print obj
            except Queue.Empty:
                if exit_flag:
                    break
                else:
                    pass
            if not exit_flag and self.cmds.poll():
                cmd = self.cmds.recv()
                if cmd == -1:
                    exit_flag = True
            time.sleep(.01)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    proc2main, main2proc = multiprocessing.Pipe(duplex=False)
    p = MyProcess(queue, proc2main)
    p.start()
    for i in range(5):
        queue.put(i)
    main2proc.send(-1)
    proc2main.close()
    main2proc.close()
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
It depends on how long it will take to process the data. I can't tell because I don't have a sample of the data, but in general it is better to move to more elaborate RPC and server methods when you need things like load balancing, guaranteed uptime, or scalability. Just remember that these things will add complexity, which may make your application harder to deploy, debug, and maintain. It will also increase the latency that it takes to process a task (which might or might not be a concern to you).
I would test it with some sample data, and determine if you need the scalability that multiple servers provide.

parallelly execute blocking calls in python

I need to make blocking xmlrpc calls from my Python script to several physical servers simultaneously, and act on the response from each server independently.
To explain in detail, assume the following pseudocode:
while True:
    response=call_to_server1() #blocking and takes very long time
    if response==this:
        do that
I want to do this for all the servers simultaneously and independently, but from the same script.
Use the threading module.
Boilerplate threading code (I can tailor this if you give me a little more detail on what you are trying to accomplish):
import threading

def run_me(func):
    while not stop_event.is_set():
        response= func() #blocking and takes very long time
        if response==this:
            pass  # do that

def call_to_server1():
    #code to call server 1...
    return magic_server1_call()

def call_to_server2():
    #code to call server 2...
    return magic_server2_call()

#used to stop your loop.
stop_event = threading.Event()
#args must be a tuple, hence the trailing comma.
t = threading.Thread(target=run_me, args=(call_to_server1,))
t.start()
t2 = threading.Thread(target=run_me, args=(call_to_server2,))
t2.start()
#wait for threads to return.
t.join()
t2.join()
#we are done....
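On Python 3, the same idea can also be written with concurrent.futures, which manages the threads for you. This is only a minimal sketch, not the answerer's code: call_server, the server names and the delays are made-up stand-ins for the real blocking xmlrpc calls.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def call_server(name, delay):
    """Hypothetical stand-in for a blocking xmlrpc call to one server."""
    time.sleep(delay)  # simulate a slow network round trip
    return f"{name}: ok"


def call_all(servers):
    """Call every server in its own thread and handle each reply as it arrives."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(servers)) as executor:
        # Map each future back to the server it belongs to.
        futures = {
            executor.submit(call_server, name, delay): name
            for name, delay in servers.items()
        }
        # as_completed yields futures in the order they finish,
        # so a slow server does not hold up the others.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


servers = {"server1": 0.2, "server2": 0.1}  # made-up names and delays
responses = call_all(servers)
print(responses)
```

The per-server action would go inside the as_completed loop, where each response is available as soon as its own call returns.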
You can use the multiprocessing module:
import multiprocessing

def call_to_server(ip,port):
    ....
    ....

process = []
for i in xrange(server_count):
    process.append( multiprocessing.Process(target=call_to_server,args=(ip,port)))
    process[i].start()
#waiting process to stop
for p in process:
    p.join()
You can use multiprocessing plus queues. With one single sub-process this is the example:
import multiprocessing
import time

def processWorker(input, result):
    def remoteRequest( params ):
        ## this is my remote request
        return True
    while True:
        work = input.get()
        if 'STOP' in work:
            break
        result.put( remoteRequest(work) )

input = multiprocessing.Queue()
result = multiprocessing.Queue()
p = multiprocessing.Process(target = processWorker, args = (input, result))
p.start()
requestlist = ['1', '2']
for req in requestlist:
    input.put(req)
for i in xrange(len(requestlist)):
    res = result.get(block = True)
    print 'retrieved ', res
input.put('STOP')
time.sleep(1)
print 'done'
To have more than one sub-process, simply use a list object to store all the sub-processes you start.
The multiprocessing queue is safe to share between processes.
You can then keep track of which request is being executed by each sub-process by storing the request together with a workid (the workid can be a counter incremented whenever new work is put on the queue). Using multiprocessing.Queue is robust, since you do not need to rely on parsing stdout/stderr and you avoid the limitations that come with that.
You can also set a timeout on how long a get call should wait at most, e.g.:
import Queue

try:
    res = result.get(block = True, timeout = 10)
except Queue.Empty:
    print 'timed out waiting for a result'
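For Python 3, where the module is named queue, the timeout pattern might look like the sketch below. It uses a thread and queue.Queue purely to stay short and self-contained; multiprocessing.Queue.get takes the same block and timeout arguments and also raises queue.Empty. The names slow_worker and wait_for_result are made up for the illustration.

```python
import queue
import threading
import time


def slow_worker(result_queue, delay):
    """Hypothetical worker that takes `delay` seconds to produce a result."""
    time.sleep(delay)
    result_queue.put("done")


def wait_for_result(result_queue, timeout):
    """Return the next result, or None if nothing arrives within `timeout` seconds."""
    try:
        return result_queue.get(block=True, timeout=timeout)
    except queue.Empty:
        return None


results = queue.Queue()
threading.Thread(target=slow_worker, args=(results, 0.5), daemon=True).start()

first = wait_for_result(results, timeout=0.05)  # gives up before the worker finishes
second = wait_for_result(results, timeout=5)    # long enough for the worker to finish
print(first, second)
```

Returning None on timeout is just one choice; the caller could equally retry, log the failure, or mark the server as unresponsive.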
Use Twisted.
It has a lot of useful tools for working with networks, and it is very good at working asynchronously.

Checking on a thread / remove from list

I have a thread class which extends Thread. The code looks a little like this:
class MyThread(Thread):
    def run(self):
        # Do stuff
        pass

my_threads = []
while has_jobs() and len(my_threads) < 5:
    new_thread = MyThread(next_job_details())
    new_thread.start()  # start(), not run(), so the work happens in the new thread
    my_threads.append(new_thread)
for my_thread in my_threads:
    my_thread.join()
    # Do stuff
So in my pseudocode I check whether there are any jobs (from a db etc.), and if there are jobs and fewer than 5 threads running, I create new threads.
From here I then check on my threads, and this is where I get stuck. I can use .join(), but my understanding is that it waits until the thread has finished, so if the first thread it checks is still in progress, it waits until that one is done, even if the other threads have already finished.
So is there a way to check whether a thread is done and, if so, remove it?
e.g.
for my_thread in my_threads:
    if my_thread.done():
        # process results
        del (my_threads[my_thread]) ?? will that work...
As TokenMacGuy says, you should use thread.is_alive() to check if a thread is still running. To remove no longer running threads from your list you can use a list comprehension:
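for t in my_threads:
    if not t.is_alive():
        # get results from thread
        t.handled = True
# Threads that are still running have no `handled` attribute yet, so default to False.
my_threads = [t for t in my_threads if not getattr(t, "handled", False)]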
This avoids the problem of removing items from a list while iterating over it.
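Put together as a polling loop, the approach could look like the following sketch. The job function, its delays and the results dict are invented for the example; the only threading calls used are Thread.start(), Thread.is_alive() and Thread.join().

```python
import threading
import time


def job(seconds, results, key):
    """Hypothetical job: sleep for a while, then record a result."""
    time.sleep(seconds)
    results[key] = seconds


results = {}
threads = [
    threading.Thread(target=job, args=(delay, results, i), name=f"job-{i}")
    for i, delay in enumerate([0.3, 0.05, 0.1])
]
for t in threads:
    t.start()

finished = []
while threads:
    still_running = []
    for t in threads:
        if t.is_alive():
            still_running.append(t)
        else:
            t.join()  # returns immediately because the thread has already ended
            finished.append(t.name)  # process the thread's results here
    # Rebuild the list instead of deleting from it while iterating.
    threads = still_running
    time.sleep(0.01)  # avoid a busy loop while waiting

print(finished)
```

Because the loop only looks at is_alive(), a slow first thread never delays the handling of threads that finished earlier.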
mythreads = threading.enumerate()
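threading.enumerate() returns a list of all Thread objects that are still alive.
https://docs.python.org/3.6/library/threading.html
You need to call thread.is_alive() (spelled isAlive() in older Python versions, and removed under that name in Python 3.9) to find out whether the thread is still running.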
The answer has been covered, but for simplicity...
# To filter out finished threads
threads = [t for t in threads if t.is_alive()]
# Same thing but for QThreads (if you are using PyQt)
threads = [t for t in threads if t.isRunning()]
A better way is to use the Queue class:
http://docs.python.org/library/queue.html
Look at the example code at the bottom of the documentation page:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()
for item in source():
    q.put(item)
q.join() # block until all tasks are done
An easy, thread-safe way to check whether a thread has finished is to use signals.
Install pyrvsignal:
pip install pyrvsignal
Example:
import time
from threading import Thread
from pyrvsignal import Signal

class MyThread(Thread):
    started = Signal()
    finished = Signal()

    def __init__(self, target, args):
        self.target = target
        self.args = args
        Thread.__init__(self)

    def run(self) -> None:
        self.started.emit()
        self.target(*self.args)
        self.finished.emit()

def do_my_work(details):
    print(f"Doing work: {details}")
    time.sleep(10)

def started_work():
    print("Started work")

def finished_work():
    print("Work finished")

thread = MyThread(target=do_my_work, args=("testing",))
thread.started.connect(started_work)
thread.finished.connect(finished_work)
thread.start()

Python multiprocessing with twisted's reactor

I am working on an xmlrpc server which has to perform certain tasks cyclically. I am using Twisted as the core of the xmlrpc service, but I am running into a little problem:
class cemeteryRPC(xmlrpc.XMLRPC):
    def __init__(self, dic):
        xmlrpc.XMLRPC.__init__(self)

    def xmlrpc_foo(self):
        return 1

    def cycle(self):
        print "Hello"
        time.sleep(3)

class cemeteryM( base ):
    def __init__(self, dic): # dic is for cemetery
        multiprocessing.Process.__init__(self)
        self.cemRPC = cemeteryRPC()

    def run(self):
        # Start reactor on a second process
        reactor.listenTCP( c.PORT_XMLRPC, server.Site( self.cemRPC ) )
        p = multiprocessing.Process( target=reactor.run )
        p.start()
        while not self.exit.is_set():
            self.cemRPC.cycle()
        #p.join()

if __name__ == "__main__":
    import errno
    test = cemeteryM()
    test.start()
    # trying new method
    notintr = False
    while not notintr:
        try:
            test.join()
            notintr = True
        except OSError, ose:
            if ose.errno != errno.EINTR:
                raise ose
        except KeyboardInterrupt:
            notintr = True
How should I go about joining these two processes so that their respective joins don't block?
(I am pretty confused by "join". Why would it block? I have googled but can't find a helpful explanation of how join is used. Can someone explain this to me?)
Regards
Do you really need to run Twisted in a separate process? That looks pretty unusual to me.
Try to think of Twisted's Reactor as your main loop - and hang everything you need off that - rather than trying to run Twisted as a background task.
The more normal way of performing this sort of operation would be to use Twisted's .callLater or to add a LoopingCall object to the Reactor.
e.g.
from twisted.web import xmlrpc, server
from twisted.internet import task
from twisted.internet import reactor

class Example(xmlrpc.XMLRPC):
    def xmlrpc_add(self, a, b):
        return a + b

    def timer_event(self):
        print "one second"

r = Example()
m = task.LoopingCall(r.timer_event)
m.start(1.0)
reactor.listenTCP(7080, server.Site(r))
reactor.run()
Hey asdvawev - .join() in multiprocessing works just like .join() in threading - it's a blocking call the main thread runs to wait for the worker to shut down. If the worker never shuts down, then .join() will never return. For example:
class myproc(Process):
    def run(self):
        while True:
            time.sleep(1)
Starting this process means that join() will never, ever return. Typically, to prevent this, I'll pass an Event() object into the child process so that I can signal the child when to exit:
class myproc(Process):
    def __init__(self, event):
        self.event = event
        Process.__init__(self)

    def run(self):
        while not self.event.is_set():
            time.sleep(1)
Alternatively, if your work is encapsulated in a queue, you can simply have the child process work off of the queue until it encounters a sentinel (typically a None entry in the queue) and then shut down.
Both of these suggestions mean that prior to calling .join() you can set the event or insert the sentinel, and when join() is called, the process will finish its current task and then exit properly.
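The sentinel variant can be sketched as follows. To keep the example short and self-contained it uses a thread and queue.Queue rather than a child process, on the assumption that the shape carries over to multiprocessing.Process with a multiprocessing.Queue; the worker function and the doubling "work" are invented for the illustration.

```python
import queue
import threading

SENTINEL = None  # marker meaning "no more work, shut down"


def worker(tasks, processed):
    """Consume items until the sentinel shows up, then return so join() can finish."""
    while True:
        item = tasks.get()
        if item is SENTINEL:
            break
        processed.append(item * 2)  # stand-in for the real work


tasks = queue.Queue()
processed = []
t = threading.Thread(target=worker, args=(tasks, processed))
t.start()

for n in range(3):
    tasks.put(n)
tasks.put(SENTINEL)  # ask the worker to exit once the queued work is done

t.join(timeout=5)    # returns as soon as the worker has seen the sentinel
print(processed, t.is_alive())
```

With several workers reading from the same queue, one sentinel per worker is needed, since each worker consumes the sentinel that stops it.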
