import gevent
from gevent.event import AsyncResult
import time
class Job(object):
def __init__(self, name):
self.name = name
def setter(job):
print 'starting'
gevent.sleep(3)
job.result.set('%s done' % job.name)
def waiter(job):
print job.result.get()
# event loop
running = []
for i in range(5):
print 'creating'
j = Job(i)
j.result = AsyncResult()
running.append(gevent.spawn(setter, j))
running.append(gevent.spawn(waiter, j))
print 'started greenlets, event loop go do something else'
time.sleep(5)
gevent.joinall(running)
gevent doesnt actually start until joinall is called
Is there something that would start/spawn gevent asynchronously (why does it not start right away as soon as spawn is called)?
Is there a select/epoll on running greenlets to see which one needs to be joined instead of joinall()?
No, it does not start straight away. It will start as soon as your main greenlet yields to the hub (releases control by calling sleep or join for example)
Clearly your intention is that it starts when you call time. It does not, because you have not monkey patched it.
Add these lines to the very top of your file:
from gevent import monkey
monkey.patch_all()
This will then have the behaviour that you want (because under the hood, time will be modified to yield to the hub).
Alternatively, you can call gevent.sleep.
Since you did not monkey patch, time.sleep() is causing your app to pause. Use gevent.sleep(5) instead.
The very first step should be monkey patching
from gevent import monkey;
monkey.patch_all()
This will spawn the greenlets asynchronously.
Related
I have the problem that I need to write values generated by a consumer to disk. I do not want to open a new instance of a file to write every time, so I thought to use a second queue and a other consumer to write to disk from a singe Greenlet. The problem with my code is that the second queue does not get consumed async from the first queue. The first queue finishes first and then the second queue gets consumed.
I want to write values to disk at the same time then other values get generated.
Thanks for help!
#!/usr/bin/python
#- * -coding: utf-8 - * -
import gevent #pip install gevent
from gevent.queue import *
import gevent.monkey
from timeit import default_timer as timer
from time import sleep
import cPickle as pickle
gevent.monkey.patch_all()
def save_lineCount(count):
with open("count.p", "wb") as f:
pickle.dump(count, f)
def loader():
for i in range(0,3):
q.put(i)
def writer():
while True:
task = q_w.get()
print "writing",task
save_lineCount(task)
def worker():
while not q.empty():
task = q.get()
if task%2:
q_w.put(task)
print "put",task
sleep(10)
def asynchronous():
threads = []
threads.append(gevent.spawn(writer))
for i in range(0, 1):
threads.append(gevent.spawn(worker))
start = timer()
gevent.joinall(threads,raise_error=True)
end = timer()
#pbar.close()
print "\n\nTime passed: " + str(end - start)[:6]
q = gevent.queue.Queue()
q_w = gevent.queue.Queue()
gevent.spawn(loader).join()
asynchronous()
In general, that approach should work fine. There are some problems with this specific code, though:
Calling time.sleep will cause all greenlets to block. You either need to call gevent.sleep or monkey-patch the process in order to have just one greenlet block (I see gevent.monkey imported, but patch_all is not called). I suspect that's the major problem here.
Writing to a file is also synchronous and causes all greenlets to block. You can use FileObjectThread if that's a major bottleneck.
I have a primitive producer/consumer script running in gevent. It starts a few producer functions that put things into a gevent.queue.Queue, and one consumer function that fetches them out of the queue again:
from __future__ import print_function
import time
import gevent
import gevent.queue
import gevent.monkey
q = gevent.queue.Queue()
# define and spawn a consumer
def consumer():
while True:
item = q.get(block=True)
print('consumer got {}'.format(item))
consumer_greenlet = gevent.spawn(consumer)
# define and spawn a few producers
def producer(ID):
while True:
print("producer {} about to put".format(ID))
q.put('something from {}'.format(ID))
time.sleep(0.1)
# consumer_greenlet.switch()
producer_greenlets = [gevent.spawn(producer, i) for i in range(5)]
# wait indefinitely
gevent.monkey.patch_all()
print("about to join")
consumer_greenlet.join()
It works fine if I let gevent handle the scheduling implicitly (e.g. by calling time.sleep or some other gevent.monkey.patch()ed function), however when I switch to the consumer explicitly (replace time.sleepwith the commented-out switch call), gevent raises an AssertionError:
Traceback (most recent call last):
File "/my/virtualenvs/venv/local/lib/python2.7/site-packages/gevent/greenlet.py", line 327, in run
result = self._run(*self.args, **self.kwargs)
File "switch_test.py", line 14, in consumer
item = q.get(block=True)
File "/my/virtualenvs/venv/lib/python2.7/site-packages/gevent/queue.py", line 201, in get
assert result is waiter, 'Invalid switch into Queue.get: %r' % (result, )
AssertionError: Invalid switch into Queue.get: ()
<Greenlet at 0x7fde6fa6c870: consumer> failed with AssertionError
I would like to employ explicit switching because in production I have a lot of producers, gevent's scheduling does not allocate nearly enough runtime to the consumer and the queue gets longer and longer (which is bad). Alternatively, any insights into how to configure or modify gevent's scheduler is greatly appreciated.
This is on Python 2.7.2, gevent 1.0.1 and greenlet 0.4.5.
Seems to me explicit switch doesn't really play well with implicit switch.
You already have implicit switch happening either because monkey-patched I/O or because the gevent.queue.Queue().
The gevent documentation discourages usage of the raw greenlet methods:
Being a greenlet subclass, Greenlet also has switch() and throw()
methods. However, these should not be used at the application level as
they can very easily lead to greenlets that are forever unscheduled.
Prefer higher-level safe classes, like Event and Queue, instead.
Iterating gevent.queue.Queue() or accessing the queue's get method does implicit switching, interestingly put does not. So you have to generate an implicit thread switch yourself. Easiest is to call gevent.sleep(0) (you don't have to actually wait a specific time).
In conclusion you don't even have to monkey-pach things, provide that your code does not have blocking IO operations.
I would rewrite your code like this:
import gevent
import gevent.queue
q = gevent.queue.Queue()
# define and spawn a consumer
def consumer():
for item in q:
print('consumer got {}'.format(item))
consumer_greenlet = gevent.spawn(consumer)
# define and spawn a few producers
def producer(ID):
print('producer started', ID)
while True:
print("producer {} about to put".format(ID))
q.put('something from {}'.format(ID))
gevent.sleep(0)
producer_greenlets = [gevent.spawn(producer, i) for i in range(5)]
# wait indefinitely
print("about to join")
consumer_greenlet.join()
I'm new to Twisted and after finally figuring out how the deferreds work I'm struggling with the tasks. What I want to achieve is to have a script that sends a REST request in a loop, however if at some point it fails I want to stop the loop. Since I'm using callbacks I can't easily catch exceptions and because I don't know how to stop the looping from an errback I'm stuck.
This is the simplified version of my code:
def send_request():
agent = Agent(reactor)
req_result = agent.request('GET', some_rest_link)
req_result.addCallbacks(cp_process_request, cb_process_error)
if __name__ == "__main__":
list_call = task.LoopingCall(send_request)
list_call.start(2)
reactor.run()
To end a task.LoopingCall all you need to do is call the stop on the return object (list_call in your case).
Somehow you need to make that var available to your errback (cb_process_error) either by pushing it into a class that cb_process_error is in, via some other class used as a pseudo-global or by literally using a global, then you simply call list_call.stop() inside the errback.
BTW you said:
Since I'm using callbacks I can't easily catch exceptions
Thats not really true. The point of an errback to to deal with exceptions, thats one of the things that literally causes it to be called! Check out my previous deferred answer and see if it makes errbacks any clearer.
The following is a runnable example (... I'm not saying this is the best way to do it, just that it is a way...)
#!/usr/bin/python
from twisted.internet import task
from twisted.internet import reactor
from twisted.internet.defer import Deferred
from twisted.web.client import Agent
from pprint import pprint
class LoopingStuff (object):
def cp_process_request(self, return_obj):
print "In callback"
pprint (return_obj)
def cb_process_error(self, return_obj):
print "In Errorback"
pprint(return_obj)
self.loopstopper()
def send_request(self):
agent = Agent(reactor)
req_result = agent.request('GET', 'http://google.com')
req_result.addCallbacks(self.cp_process_request, self.cb_process_error)
def main():
looping_stuff_holder = LoopingStuff()
list_call = task.LoopingCall(looping_stuff_holder.send_request)
looping_stuff_holder.loopstopper = list_call.stop
list_call.start(2)
reactor.callLater(10, reactor.stop)
reactor.run()
if __name__ == '__main__':
main()
Assuming you can get to google.com this will fetch pages for 10 seconds, if you change the second arg of the agent.request to something like http://127.0.0.1:12999 (assuming that port 12999 will give a connection refused) then you'll see 1 errback printout (which will have also shutdown the loopingcall) and have a 10 second wait until the reactor shuts down.
I'd like to do something like that (1 queue, and multiple consumers):
import gevent
from gevent import queue
q=queue.Queue()
q.put(1)
q.put(2)
q.put(3)
q.put(StopIteration)
def consumer(qq):
for i in qq:
print i
jobs=[gevent.spawn(consumer,i) for i in [q,q]]
gevent.joinall(jobs)
But it's not possible ... the queue is consumed by job1 ... so job2 would block forever.
It gives me the exception gevent.hub.LoopExit: This operation would block forever.
I would that each consumer will be able to consume the full queue from start. (should display 1,2,3,1,2,3 or 1,1,2,2,3,3 ... nevermind)
One idea should be to clone the queue before spawning, but it's not possible using copy (shallow/deep) module ;-(
Is there another way to do that ?
[EDIT]
what do you think of that ?
import gevent
from gevent import queue
class MasterQueueClonable(queue.Queue):
def __init__(self,*a,**k):
queue.Queue.__init__(self,*a,**k)
self.__cloned = []
self.__old=[]
#override
def get(self,*a,**k):
e=queue.Queue.get(self,*a,**k)
for i in self.__cloned: i.put(e) # serve to current clones
self.__old.append(e) # save old element
return e
def clone(self):
q=queue.Queue()
for i in self.__old: q.put(i) # feed a queue with elements which are out
self.__cloned.append(q) # stock the queue, to be able to put newer elements too
return q
q=MasterQueueClonable()
q.put(1)
q.put(2)
q.put(3)
q.put(StopIteration)
def consumer(qq):
for i in qq:
print id(qq),i
jobs=[gevent.spawn(consumer,i) for i in [q.clone(), q ,q.clone(),q.clone()]]
gevent.joinall(jobs)
It's based on the idea of RyanYe. There is a "master queue" without a dispatcher.
My master queue override the GET method, and can dispatch to an ondemand clone.
And more, a "clone" can be created after the start of the masterqueue (with the __old trick).
I suggest you to create a greenlet to dispatch the work to consumers. Example code:
import gevent
from gevent import queue
master_queue=queue.Queue()
master_queue.put(1)
master_queue.put(2)
master_queue.put(3)
master_queue.put(StopIteration)
total_consumers = 10
consumer_queues = [queue.Queue() for i in xrange(total_consumers)]
def dispatcher(master_queue, consumer_queues):
for i in master_queue:
[j.put(i) for j in consumer_queues]
[j.put(StopIteration) for j in consumer_queues]
def consumer(qq):
for i in qq:
print i
jobs=[gevent.spawn(dispatcher, q, consumer_queues)] + [gevent.spawn(consumer, i) for i in consumer_queues]
gevent.joinall(jobs)
UPDATE: Fix missing StopIteration for consumer queues. Thanks arilou for pointing it out.
I've added copy() method to Queue class:
>>> import gevent.queue
>>> q = gevent.queue.Queue()
>>> q.put(5)
>>> q.copy().get()
5
>>> q
<Queue at 0x1062760d0 queue=deque([5])>
Let me know if it helps.
In the answer by Ryan Ye one line is missed in the end of the dispatcher() function:
[j.put(StopIteration) for j in consumer_queues]
Without it we still get 'gevent.hub.LoopExit: This operation would block forever' since 'for i in master_queue' loop doesn't copy StopIteration exception into the consumer_queues.
(Sorry, I can't leave comments yet so I write it as a separete answer.)
I am working on a xmlrpc server which has to perform certain tasks cyclically. I am using twisted as the core of the xmlrpc service but I am running into a little problem:
class cemeteryRPC(xmlrpc.XMLRPC):
def __init__(self, dic):
xmlrpc.XMLRPC.__init__(self)
def xmlrpc_foo(self):
return 1
def cycle(self):
print "Hello"
time.sleep(3)
class cemeteryM( base ):
def __init__(self, dic): # dic is for cemetery
multiprocessing.Process.__init__(self)
self.cemRPC = cemeteryRPC()
def run(self):
# Start reactor on a second process
reactor.listenTCP( c.PORT_XMLRPC, server.Site( self.cemRPC ) )
p = multiprocessing.Process( target=reactor.run )
p.start()
while not self.exit.is_set():
self.cemRPC.cycle()
#p.join()
if __name__ == "__main__":
import errno
test = cemeteryM()
test.start()
# trying new method
notintr = False
while not notintr:
try:
test.join()
notintr = True
except OSError, ose:
if ose.errno != errno.EINTR:
raise ose
except KeyboardInterrupt:
notintr = True
How should i go about joining these two process so that their respective joins doesn't block?
(I am pretty confused by "join". Why would it block and I have googled but can't find much helpful explanation to the usage of join. Can someone explain this to me?)
Regards
Do you really need to run Twisted in a separate process? That looks pretty unusual to me.
Try to think of Twisted's Reactor as your main loop - and hang everything you need off that - rather than trying to run Twisted as a background task.
The more normal way of performing this sort of operation would be to use Twisted's .callLater or to add a LoopingCall object to the Reactor.
e.g.
from twisted.web import xmlrpc, server
from twisted.internet import task
from twisted.internet import reactor
class Example(xmlrpc.XMLRPC):
def xmlrpc_add(self, a, b):
return a + b
def timer_event(self):
print "one second"
r = Example()
m = task.LoopingCall(r.timer_event)
m.start(1.0)
reactor.listenTCP(7080, server.Site(r))
reactor.run()
Hey asdvawev - .join() in multiprocessing works just like .join() in threading - it's a blocking call the main thread runs to wait for the worker to shut down. If the worker never shuts down, then .join() will never return. For example:
class myproc(Process):
def run(self):
while True:
time.sleep(1)
Calling run on this means that join() will never, ever return. Typically to prevent this I'll use an Event() object passed into the child process to allow me to signal the child when to exit:
class myproc(Process):
def __init__(self, event):
self.event = event
Process.__init__(self)
def run(self):
while not self.event.is_set():
time.sleep(1)
Alternatively, if your work is encapsulated in a queue - you can simply have the child process work off of the queue until it encounters a sentinel (typically a None entry in the queue) and then shut down.
Both of these suggestions means that prior to calling .join() you can send set the event, or insert the sentinel and when join() is called, the process will finish it's current task and then exit properly.