Normally, when using green threads, I can write code like this:

def myfunc():
    print("in myfunc")

eventlet.spawn(myfunc)
eventlet.sleep(0)  # then myfunc() is triggered as a green thread

But after using monkey_patch(time=True), I can change the code to:

eventlet.monkey_patch(time=True)
eventlet.spawn(myfunc)  # now myfunc is called immediately

Why don't I need to call eventlet.sleep(0) this time?
And after I write my own sleep function:

def my_sleep(seconds):
    print("oh my god, my own...")

and set the sleep attribute of the built-in time module to my_sleep, I find that my_sleep gets called many times, producing a lot of output.
But I can see only one debug thread in Eclipse, and it does not call my_sleep.
So the conclusion is that the sleep function is called continually by default, and I think the authors of eventlet knew this, which is why they developed the monkey_patch() function. Is that right?
Edit:

According to the answer from @temoto, CPython cannot reproduce the same result. I think I should add some details about how I found this interesting thing. (I did not add them the first time because typing so many words is not easy, and my English is not that good. ^^)
I found this while remote-debugging OpenStack code with Eclipse.
In nova/network/model.py, a class is written like this:
class NetworkInfoAsyncWrapper(NetworkInfo):
    """Wrapper around NetworkInfo that allows retrieving NetworkInfo
    in an async manner.

    This allows one to start querying for network information before
    you know you will need it. If you have a long-running
    operation, this allows the network model retrieval to occur in the
    background. When you need the data, it will ensure the async
    operation has completed.

    As an example:

    def allocate_net_info(arg1, arg2)
        return call_neutron_to_allocate(arg1, arg2)

    network_info = NetworkInfoAsyncWrapper(allocate_net_info, arg1, arg2)
    [do a long running operation -- real network_info will be retrieved
    in the background]
    [do something with network_info]
    """

    def __init__(self, async_method, *args, **kwargs):
        self._gt = eventlet.spawn(async_method, *args, **kwargs)
        methods = ['json', 'fixed_ips', 'floating_ips']
        for method in methods:
            fn = getattr(self, method)
            wrapper = functools.partial(self._sync_wrapper, fn)
            functools.update_wrapper(wrapper, fn)
            setattr(self, method, wrapper)
When I first debugged into this function, after executing

self._gt = eventlet.spawn(async_method, *args, **kwargs)

the callback function async_method was executed at once. But please remember, this is a green thread; it should be triggered by eventlet.sleep(0).
But I didn't find any code calling sleep(0), so if sleep is perhaps called by Eclipse, then in the real world (the non-debug world), who triggers it?
TL;DR: The Eventlet API does not require sleep(0) to start a green thread. spawn(fun) will start the function some time in the future, possibly right now. You only need to call sleep(0) to ensure that it starts right now, and even then it's better to use explicit synchronization, such as an Event or Semaphore.
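For example (a minimal sketch of my own, not from the question), explicit synchronization with an eventlet Event instead of sleep(0):

import eventlet
from eventlet.event import Event

def worker(done):
    print("working in a green thread")
    done.send("result")

done = Event()
gt = eventlet.spawn(worker, done)
print(done.wait())  # blocks until the green thread calls send(); no sleep(0) needed
gt.wait()           # or simply wait for the green thread itself to finish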
I can't reproduce this behavior using CPython 2.7.6 and 3.4.3 with eventlet 0.17.4, in either IPython or a plain Python console. So it's probably Eclipse that is calling time.sleep in the background.
monkey_patch was introduced as a shortcut for going through all your (and third-party) code and replacing time.sleep with eventlet.sleep, and similarly for the os, socket, etc. modules. It's not related to Eclipse (or anything else) repeating time.sleep calls.
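As a small illustration (my own sketch): after monkey_patch(time=True), existing calls to time.sleep go through eventlet.sleep, so they yield to green threads without any code changes:

import eventlet
eventlet.monkey_patch(time=True)

import time

def worker():
    print("worker runs")

eventlet.spawn(worker)
time.sleep(0)  # actually eventlet.sleep(0) now, so the green thread gets a chance to run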
But this is an interesting observation, thank you.
Since I don't like the approach of using loop.run() for various reasons, I wanted to code a contextual loop, since the docs state on several occasions that if you don't go with the canonical .run() you have to prevent memory leaks yourself (i.e.). After a bit of research it seems like the Python devs' answer to this feature request is "We don't need it!", while context managers seem perfectly fine in general if you are using the lower-level API of asyncio; see PEP 343 - The "with" Statement, example 10:
This can be used to deterministically close anything with a close
method, be it file, generator, or something else. It can even be used
when the object isn’t guaranteed to require closing (e.g., a function
that accepts an arbitrary iterable)
So can we do it anyway?
Related links:
https://bugs.python.org/issue24795
https://bugs.python.org/issue32875
https://groups.google.com/g/python-tulip/c/8bRLexUzeU4
https://bugs.python.org/issue19860#msg205062
https://github.com/python/asyncio/issues/261
Yes, we can have a context manager for our event loop, even though there seems to be no good practice for doing it via subclassing, because of the C implementations (i.e.). Basically, the idea sketched out below is the following:
TL;DR
Create an object with __enter__ and __exit__ so the with syntax works.
Instead of returning the object itself, as usual, return the loop served by asyncio.
Wrap asyncio's loop.close() so that calling it only stops the loop; the real close happens in our __exit__ method, after cleanup.
Close everything that could otherwise lead to memory leaks, and then close the loop.
Side-note
The implementation relies on a wrapper object that returns a new loop into an anonymous block statement. Be aware that loop.stop() finalizes the loop, and no further actions should be scheduled after it. Overall, the code below is just a small helper and more of a styling choice in my opinion, especially since it is not a real subclass. But I think if someone wants to use the lower-level API without having to mind finalizing everything themselves, here is one possibility.
import asyncio

class OpenLoop:
    def close(self, *args, **kwargs):
        self._loop.stop()

    def _close_wrapper(self):
        self._close = self._loop.close
        self._loop.close = self.close

    def __enter__(self):
        self._loop = asyncio.new_event_loop()
        self._close_wrapper()
        return self._loop

    def __exit__(self, *exc_info):
        asyncio.run(self._loop.shutdown_asyncgens())
        asyncio.run(self._loop.shutdown_default_executor())
        # close other services
        self._close()

if __name__ == '__main__':
    with OpenLoop() as loop:
        loop.call_later(1, loop.close)
        loop.run_forever()
    assert loop.is_closed()
Edit: I am closing this question.
As it turns out, my goal of having parallel HTTP posts is pointless. After implementing it successfully with aiohttp, I ran into deadlocks elsewhere in the pipeline.
I will reformulate this and post a single question in a few days.
Problem
I want to have a class that, during some other computation, holds generated data and can write it to a DB via HTTP (details below) when convenient. It has to be a class, as it is also used to load/represent/manipulate data.
I have written a naive, nonconcurrent implementation that works:
The class is initialized and then used in a "main loop". In this main loop, data is added to a naive "queue" (a list of HTTP requests). At certain intervals in the main loop, the class calls a function to write those requests and clear the "queue".
As you can expect, this is IO bound. Whenever I need to write the "queue", the main loop halts. Furthermore, since the main computation runs on a GPU, the loop is also not really CPU bound.
Essentially, I want to have a queue, and, say, ten workers running in the background and pushing items to the http connector, waiting for the push to finish and then taking on the next (or just waiting for the next write call, not a big deal). In the meantime, my main loop runs and adds to the queue.
Program example
My naive program looks something like this
class data_storage(...):
    def add(...):
        ...

    def write_queue(self):
        if len(self.queue) > 0:
            res = self.connector.run(self.queue)
            self.queue = []

def main_loop(storage):
    # do many things
    for batch in dataset:  # simplified example
        # Do stuff
        for some_other_loop:
            (...)
            storage.add(results)
        # For example, call each iteration
        storage.write_queue()

if __name__ == "__main__":
    storage = data_storage()
    main_loop(storage)
    ...
In detail: the connector class is from the 'neo4j-connector' package, used to post to my Neo4j database. It essentially does the JSON formatting and uses Python's "requests" library.
This works, even without a real queue, since nothing is concurrent.
Now I have to make it work concurrently.
From my research, I have seen that ideally I would want a "producer-consumer" pattern, where both are set up via asyncio. I have only seen this implemented with functions, not classes, so I don't know how to approach it. With functions, my main loop would be a producer coroutine and my write function would become the consumer. Both are created as tasks on the queue and then gathered; I would initialize only one producer but many consumers.
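For reference, a rough sketch of that producer/consumer shape with asyncio.Queue and a fixed number of consumer tasks; the class and method names here are mine and the HTTP call is only a placeholder:

import asyncio

class DataStorage:  # hypothetical stand-in for data_storage below
    def __init__(self):
        self.queue = asyncio.Queue()

    def add(self, item):
        # producer side: called from the main loop, never blocks
        self.queue.put_nowait(item)

    async def post(self, item):
        # consumer side: one HTTP write; a blocking connector would have to run
        # in an executor, or be replaced by an async client such as aiohttp
        await asyncio.sleep(0.1)  # placeholder for the real request

    async def worker(self):
        while True:
            item = await self.queue.get()
            await self.post(item)
            self.queue.task_done()

async def main():
    storage = DataStorage()
    workers = [asyncio.create_task(storage.worker()) for _ in range(10)]
    for i in range(100):  # stands in for the real main loop
        storage.add(i)
    await storage.queue.join()  # block until every queued item has been written
    for w in workers:
        w.cancel()

asyncio.run(main())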
My issue is that the main loop includes parts that are already parallel (e.g. PyTorch). Asyncio is not thread safe, so I don't think I can just wrap everything in async decorators and make a co-routine. This is also precisely why I want the DB logic in a separate class.
I also don't actually want or need the main loop to run "concurrently" on the same thread as the workers, but it's fine if that's the outcome, as the workers don't do much on the CPU. But technically speaking, I want multi-threading? I have no idea.
My only other option would be to write into the queue until it is "full", halt the loop and then use multiple threads to dump it to the DB. Still, this would be much slower than doing it while the main loop is running. My gain would be minimal, just concurrency while working through the queue. I'd settle for it if need be.
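A rough sketch of that fallback (the flush_queue helper and its parameters are hypothetical, not from my code): when the queue is "full", halt the main loop and dump it with a handful of threads:

from concurrent.futures import ThreadPoolExecutor

def flush_queue(storage, n_threads=10, chunk_size=200):
    # split the list-based queue into chunks and post them from parallel threads
    chunks = [storage.queue[i:i + chunk_size]
              for i in range(0, len(storage.queue), chunk_size)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(storage.connector.run, chunks))
    storage.queue = []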
However, based on a Stack Overflow post, I came up with this small change:
class data_storage(...):
    def add(...):
        ...

    def background(f):
        def wrapped(*args, **kwargs):
            return asyncio.get_event_loop().run_in_executor(None, f, *args, **kwargs)
        return wrapped

    @background
    def write_queue(self):
        if len(self.queue) > 0:
            res = self.connector.run(self.queue)
            self.queue = []
Shockingly, this sort of "works" and is blazingly fast. Of course, since it's not a real queue, things get overwritten. Furthermore, it overwhelms or deadlocks the HTTP API and in general produces a load of errors.
But since this works in principle, I wonder if I could do the following:
class data_storage(...):
    def add(...):
        ...

    def background(f):
        def wrapped(*args, **kwargs):
            return asyncio.get_event_loop().run_in_executor(None, f, *args, **kwargs)
        return wrapped

    @background
    def post(self, items):
        if len(items) > 0:
            self.nr_workers.increase()
            res = self.connector.run(items)
            self.nr_workers.decrease()

    def write_queue(self):
        if self.nr_workers < 10:
            items = self.queue.get(200)  # extract and delete from the queue, non-concurrently
            self.post(items)  # add a "worker"
for some hypothetical queue and nr_workers objects. Then, at the end of the main loop, have a function that blocks until the number of workers is zero and then clears the rest of the queue non-concurrently.
This seems like a monumentally bad idea, but I don't know how else to implement it. If it is terrible, I'd like to know before I put more work into it. Do you think it would work?
Otherwise, could you give me any pointers on how to approach this situation correctly?
Some key words, tools or things to research would of course be enough.
Thank you!
Looking for a simple example demonstrating use of tornado.gen.coroutine’s callback argument. The docs say:
Functions with [the gen.coroutine] decorator return a Future. Additionally, they may be called with a callback keyword argument, which will be invoked with the future’s result when it resolves.
Adapting an example from the docs’ User’s guide, I would think I could do:
from tornado import gen

@gen.coroutine
def divide(x, y):
    return x / y

@gen.coroutine
def good_call():
    yield divide(1, 2)

good_call(callback=print)
I’d expect this to print 0.5, but there’s no output.
I’ve found copious examples demonstrating the deprecated gen.engine decorator, but there doesn’t seem to be as much out there on gen.coroutine. Running on Python 3.5.1 and Tornado 4.3.
You still need to start the IOLoop. If you add tornado.ioloop.IOLoop.current().start() at the end of your script, you'll see the output printed (and then the IOLoop runs forever; if you want it to stop, you'll need to do so from your callback after printing).
Note that in general it is possible (and encouraged) to write Tornado applications using only coroutines and yield, without passing any callbacks directly.
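A minimal sketch of that fix, based on Tornado 4.x behavior (the print_and_stop helper and the explicit return value are my additions, not part of the original question):

from tornado import gen, ioloop

@gen.coroutine
def divide(x, y):
    return x / y

@gen.coroutine
def good_call():
    result = yield divide(1, 2)
    return result  # return the value so the callback receives it instead of None

def print_and_stop(result):
    print(result)                   # prints 0.5
    ioloop.IOLoop.current().stop()  # stop the loop so the script can exit

good_call(callback=print_and_stop)
ioloop.IOLoop.current().start()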
I want to write a library that manages child processes with asyncio. I don't want to force my callers to be asynchronous themselves, so I'd prefer to get a new_event_loop, do a run_until_complete, and then close it. Ideally I'd like to do this without conflicting with any other asyncio stuff the caller might be doing.
My problem is that waiting on subprocesses doesn't work unless you call set_event_loop, which attaches the internal watcher. But of course if I do that, I might conflict with other event loops in the caller. A workaround is to cache the caller's current loop (if any), and then call set_event_loop one more time when I'm done to restore the caller's state. That almost works. But if the caller is not an asyncio user, a side effect of calling get_event_loop is that I've now created a global loop that didn't exist before, and Python will print a scary warning if the program exits without calling close on that loop.
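A rough sketch of that workaround, with all the caveats above (the helper name is mine):

import asyncio

def run_with_private_loop(coro):
    # remember whatever loop the caller had; note that get_event_loop() itself may
    # create a global loop as a side effect, which is exactly the problem described above
    previous = asyncio.get_event_loop()
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)  # needed so the subprocess child watcher attaches to our loop
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()
        asyncio.set_event_loop(previous)  # restore the caller's state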
The only meta-workaround I can think of is to do an atexit.register callback that closes the global loop. That won't conflict with the caller because close is safe to call more than once, unless the caller has done something crazy like trying to start the global loop during exit. So it's still not perfect.
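That meta-workaround would look roughly like this (again, only a sketch):

import atexit
import asyncio

_global_loop = asyncio.get_event_loop()  # may have been created as a side effect above
atexit.register(_global_loop.close)      # close() is safe to call more than once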
Is there a perfect solution to this?
What you're trying to achieve looks very much like ProcessPoolExecutor (in concurrent.futures).
Asynchronous caller:

from asyncio import coroutine, get_event_loop
from concurrent.futures import ProcessPoolExecutor

@coroutine
def in_process(callback, *args, executor=ProcessPoolExecutor()):
    loop = get_event_loop()
    result = yield from loop.run_in_executor(executor, callback, *args)
    return result
Synchronous caller:
with ProcessPoolExecutor() as executor:
    future = executor.submit(callback, *args)
    result = future.result()
What libraries exist that provide a higher level interface to concurrency in Python? I'm not looking to necessarily make use of multiple cores - if the GIL serializes everything, that's fine with me. I just want a smoother/higher level interface to concurrency.
My application is writing tests of our software, and sometimes I'd like to, say, run several simulated users, as well as code that makes changes to the backend, all in parallel. It would be great if, say, I could do something like this:
setup_front_end()
setup_back_end()
parallel:
    while some_condition():
        os.system('wget http://...')
        sleep(1)

    for x in xrange(10):
        os.system('top >> top.log')
        sleep(1)

    for x in xrange(100):
        mess_with_backend()
The idea in the above code is that we start three threads; the first one runs:

while some_condition():
    os.system('wget http://...')
    sleep(1)
The second one runs the second loop, and the third one runs the third.
I know it won't be that simple, but is there something that can take the drudgery out of writing functions, spinning up threads, joining on them, etc?
Besides the threading module, you should also have a look at the concurrent.futures module (for Python 2 there is a backport).
It provides ready-to-use thread/process pool implementations and lets you write efficient concurrent code. Since it offers a uniform API for threading and multiprocessing, it allows you to switch easily between the two, should you ever need to.
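For example, a rough sketch of the three loops from the question running on a small thread pool (the function names are mine; the shell commands are placeholders taken verbatim from the question):

import os
import time
from concurrent.futures import ThreadPoolExecutor

def poll_frontend():
    for _ in range(10):  # stands in for "while some_condition()"
        os.system('wget http://...')
        time.sleep(1)

def log_top():
    for _ in range(10):
        os.system('top >> top.log')
        time.sleep(1)

def hammer_backend():
    for _ in range(100):
        pass  # mess_with_backend()

with ThreadPoolExecutor(max_workers=3) as pool:
    for f in (poll_frontend, log_top, hammer_backend):
        pool.submit(f)
# leaving the with block joins all three threads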
If you just want to hide the complexity of creating a Thread and setting the parameters, create a simple decorator and use it for your functions:
async.py:
from threading import Thread

class async(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        Thread(target=self.func, args=args, kwargs=kwargs).start()
This is just a very simple wrapper around your functions which creates a new Thread and executes the function in the Thread. If you want, you can easily adapt this to use processes instead of threads.
main.py:
import time
from async import async
#async
def long_function(name, seconds_to_wait):
print "%s started..." % name
time.sleep(seconds_to_wait)
print "%s ended after %d seconds" % (name, seconds_to_wait)
if __name__ == '__main__':
long_function("first", 3)
long_function("second", 4)
long_function("third", 5)
Just decorate your functions with @async and you can call them asynchronously.
In your case: just wrap your loops in functions and decorate them with @async.
For a full-featured asynchronous decorator, look at: http://wiki.python.org/moin/PythonDecoratorLibrary#Asynchronous_Call