I am working on a chatbot where, before I reply to the user, I make a DB call to save the chat in a table. This happens every time the user types something, and it increases the response time.
To decrease the response time, I need to make this call asynchronously.
How do I do this in Python 3?
I have read tutorials on the asyncio library, but I did not understand it completely and could not figure out how to make it work.
Another workaround is to use a queuing system, but that sounds like overkill.
Example:
request = get_request_from_chat()
res = call_some_function_to_prepare_response()
save_data()  # this should be called asynchronously
reply()      # this should not wait for save_data() to finish
Any suggestions are welcome.
Use loop.create_task(some_async_function()) to run an async function "in the background". For example, this answer shows how to do that in the case of trivial client-server communication.
In your case the pseudo-code would look like this:
request = await get_request_from_chat()
res = call_some_function_to_prepare_response()
loop = asyncio.get_event_loop()
loop.create_task(save_data()) # runs in the "background"
reply() # doesn't wait for save_data() to finish
For this to work, of course, the program must be written for asyncio and save_data must be a coroutine. For a chat server that is a good approach to follow anyway, so I would recommend giving asyncio a chance.
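For reference, here is a minimal, self-contained sketch of the pattern (the function bodies are stand-ins of mine, and asyncio.create_task is the modern spelling of loop.create_task):

import asyncio

async def save_data():
    await asyncio.sleep(1)      # stands in for the DB write
    print("chat saved")

def reply():
    print("reply sent")         # happens before save_data() finishes

async def handle_message():
    task = asyncio.create_task(save_data())  # scheduled, not awaited here
    reply()
    await task                  # await it before shutdown so it is not cancelled

asyncio.run(handle_message())

Running this prints "reply sent" immediately and "chat saved" about a second later, which is exactly the ordering the question asks for.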
Because you mentioned
    "Another workaround is to use a queuing system, but that sounds like overkill."
I assume you are open to other solutions, so I will propose a multi-threading approach:
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def long_running_function(param1):
    print(param1)
    sleep(10)
    return "Complete"

with ThreadPoolExecutor(max_workers=10) as executor:
    future = executor.submit(long_running_function, "Param1")
    print(future.result(timeout=12))
Steps:
1) You create a ThreadPoolExecutor and define the maximum number of concurrent tasks.
2) You submit the function together with the arguments it needs.
3) You call result() on the value returned by submit() when you need the results.
Note that result() will re-raise any exception that was raised inside the submitted function.
You can also check whether the result of your call is ready with future.done(), which returns True or False (see the short sketch below).
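To illustrate the two notes above, a short sketch (may_fail is a made-up function):

from concurrent.futures import ThreadPoolExecutor

def may_fail(x):
    if x < 0:
        raise ValueError("negative input")
    return x * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    ok = executor.submit(may_fail, 21)
    bad = executor.submit(may_fail, -1)

    print(ok.done())        # may still be False if the task hasn't finished yet
    print(ok.result())      # 42

    try:
        bad.result()
    except ValueError as e:
        print("caught:", e)  # the exception raised inside may_fail is re-raised here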
There are many posts on SO asking specific questions about asyncio, but I cannot grasp which approach to use in a given situation.
Let's say I want to parse and crawl a number of web pages in parallel. I can do this in at least 3 different ways with asyncio:
With pool.submit:
with ThreadPoolExecutor(max_workers=10) as pool:
    result_futures = list(map(lambda x: pool.submit(my_func, x), my_list))
    for future in as_completed(result_futures):
        results.append(future.result())
    return results
With asyncio.gather:
loop = asyncio.get_running_loop()
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [loop.run_in_executor(pool, my_func, x) for x in my_list]
    results = await asyncio.gather(*futures)
With just pool.map:
with ThreadPoolExecutor(max_workers=10) as pool:
    results = [x for x in pool.map(my_func, arg_list)]
my_func is something like:
async def my_func(arg):
    async with aiohttp.ClientSession() as session:
        async with session.post(...):
            ...
Could somebody help me understand the differences between these 3 approaches? I understand that I can, for example, handle exceptions independently in the first one, but are there any other differences?
None of these. ThreadPoolExecutor and run_in_executor both execute your code in another thread, regardless of whether you use the asyncio loop to watch their execution. At that point you might as well not use asyncio at all: the whole idea of async is to run everything on a single thread, saving some CPU cycles and avoiding a lot of the race conditions that emerge in multi-threaded code.
If your my_func uses async correctly all the way down (it looks like it does, but the code is incomplete), you have to create an asyncio task for each call to your async def function. For that, perhaps the shortest path is indeed asyncio.gather:
import asyncio
import aiohttp  # ... and whatever else "my_func" uses

async def my_func(x):
    ...

my_list = ...

async def main():
    return await asyncio.gather(*(my_func(x) for x in my_list))

results = asyncio.run(main())
And that is all there is to it.
Now going back to your code, and checking the differences:
your code works almost by chance, in the sense that you really just passed the async function and its parameters to the thread pool executor: calling an async function this way returns immediately, with no work done. That means nothing (beyond the thin boilerplate that creates the coroutines) is executed in your thread pool. The values returned by the calls that run in the worker threads (i.e. the actual my_func(x) calls) are the coroutine objects: these are the objects that have to be awaited in the main thread and that will actually perform the network I/O. In other words, your my_func is a coroutine function; calling it returns immediately with a coroutine object, and only when that coroutine object is awaited is the code inside my_func actually executed.
Now, with that out of the way: in your first snippet you call future.result() on the concurrent.futures Future, which just gives you the coroutine object, so that code does not work. If you wrote results.append(await future.result()) then, barring exceptions, it would work, but it would make all the calls in sequence: await suspends the current coroutine until the awaited object resolves, and since the awaits for the other results happen in that same coroutine, the calls queue up and execute in order, with zero parallelism.
Your pool.map code does the same, and your asyncio.gather code is wrong in a different way: loop.run_in_executor takes your call, runs it on another thread, and gives you an awaitable object suitable for gather; however, awaiting it returns the coroutine object, not the result of the HTTP call.
Your real options for getting hold of exceptions raised in the parallel code are asyncio.gather, asyncio.wait, or asyncio.as_completed. Check the docs here: https://docs.python.org/3/library/asyncio-task.html
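As a hedged illustration of the gather option, handling per-call exceptions with return_exceptions=True might look like this (my_func is a stand-in for the question's coroutine, with the aiohttp call replaced by a sleep):

import asyncio

async def my_func(x):
    if x == 2:
        raise RuntimeError(f"failed on {x}")
    await asyncio.sleep(0.1)          # stands in for the aiohttp call
    return x * 10

async def main():
    results = await asyncio.gather(
        *(my_func(x) for x in range(4)),
        return_exceptions=True,       # exceptions are returned instead of raised
    )
    for item in results:
        if isinstance(item, Exception):
            print("error:", item)
        else:
            print("ok:", item)

asyncio.run(main())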
I have a non-async function that looks like this:
def do_stuff(on_finished):
    result = ...  # complicated calculations here
    on_finished(result)
The callback I pass in looks more or less like this:
async def on_finished(self, result):
    response = await post_over_http(result)
    self.last_status = response.status
When I call do_stuff, what I want to happen is this:
do_stuff executes and calls on_finished
on_finished executes, posts the result over HTTP, and then returns immediately.
do_stuff now returns immediately.
Later, the HTTP response comes back, and execution returns to the second line of on_finished.
Critically, I don't want do_stuff to be async. For architectural reasons, I want do_stuff isolated from the asynchronous nature of the network code, so I don't want to have to make it async just because some code using it is async.
In JavaScript this would be no problem - with basically the above code directly transcribed to JavaScript, I'll get the desired behavior. onFinished would return a Promise which doStuff doesn't wait for and returns immediately, but when the Promise resolves later the second line of onFinished runs. Is this possible in Python? I'm unsure of how to achieve it. With the above code I think I just create a coroutine in the last line of do_stuff but never call it.
You can design your do_stuff function like this:
def do_stuff(on_finished):
    async def _do_complicated_calculation():
        result = ...  # do the calculation and the post request here
        await on_finished(result)
    asyncio.ensure_future(_do_complicated_calculation())
    return "ok"
When you call do_stuff(...), the complicated calculation is added to the asyncio event loop, so it is executed asynchronously. You should have the event loop running in a different thread if you don't plan to run it in the main thread.
Since _do_complicated_calculation() is async, do_stuff will return "ok" first, and on_finished(...) is called once your calculations have finished.
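To make the "event loop in a different thread" part concrete, here is a sketch of my own (not part of the answer above): the loop runs in a background thread, and the synchronous do_stuff hands the coroutine over with asyncio.run_coroutine_threadsafe, the thread-safe counterpart of ensure_future:

import asyncio
import threading
import time

loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

async def on_finished(result):
    await asyncio.sleep(0.5)    # stands in for the HTTP post
    print("posted:", result)

def do_stuff(on_finished):
    result = 42                 # the complicated calculation
    asyncio.run_coroutine_threadsafe(on_finished(result), loop)
    return "ok"

print(do_stuff(on_finished))    # prints "ok" immediately
time.sleep(1)                   # keep the demo alive long enough to see "posted: 42"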
If I understand the JavaScript analogy, what you want is something like:
def do_stuff(on_finished):
    result = ...
    asyncio.create_task(on_finished(result))
The last line spawns a task that processes the result without actually waiting for it to finish. This is what you'd get in JavaScript by simply creating a promise, whereas in Python you have to be a bit more explicit.
Of course, do_stuff must run inside an event loop, and the calculations must not block (or take too long to complete), but the same would be the case in JavaScript.
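A runnable sketch of this answer under that assumption, with post_over_http simulated by a sleep:

import asyncio

async def post_over_http(result):
    await asyncio.sleep(0.5)                  # pretend network latency
    print("posted:", result)

async def on_finished(result):
    await post_over_http(result)

def do_stuff(on_finished):
    result = 42                               # the complicated calculation
    asyncio.create_task(on_finished(result))  # fire and forget (keep a reference in real code)

async def main():
    do_stuff(on_finished)
    print("do_stuff returned immediately")
    await asyncio.sleep(1)                    # give the background task time to finish

asyncio.run(main())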
Rather than passing a callback, have the synchronous function return its result, and use run_in_executor so the asynchronous code can call the synchronous code, which will run on its own thread.
def do_stuff():
    result = ...  # complicated calculations here
    return result

async def main():
    loop = asyncio.get_event_loop()
    result = await loop.run_in_executor(None, do_stuff)
    response = await post_over_http(result)
    self.last_status = response.status
Untested.
This is why JavaScript behaves differently from Python:
ECMAScript 2015 introduced the concept of the Job Queue, which is used
by Promises (also introduced in ES6/ES2015). It's a way to execute the
result of an async function as soon as possible, rather than being put
at the end of the call stack.
Promises that resolve before the current function ends will be
executed right after the current function.
https://nodejs.dev/the-nodejs-event-loop
I just created a script which triggers a report from a specific API and then loads it into my database.
I have already built something that works, but I would like to know if there is something a bit more "precise" or efficient that doesn't make my script loop over and over again.
My current script is the following:
import time

retry = 1
trigger_report(report_id)

while report_id.status() != 'Complete':
    time.sleep(retry * 1.3)
    retry += 1

load_report(report_id)
EDIT:
The API doesn't provide any wait-for-completion method; the most it has is an endpoint which returns the status of the job.
It is a SOAP API.
While this post doesn't hold much relevance anymore since, as you said, it's a SOAP API, I put the work into it, so I'll post it anyway. :)
To answer your question: I don't see any more efficient method than polling (i.e., looping over and over again).
There are multiple ways to do it.
The first way is implementing some sort of callback that is triggered when the task is completed. It will look something like this:
import time

def expensive_operation(callback):
    time.sleep(20)
    callback(6)

expensive_operation(lambda x: print("Done", x))
As you can see, the message "Done 6" will be printed as soon as the operation has completed.
You can rewrite this with Future objects.
from concurrent.futures import Future
import threading
import time

def expensive_operation_impl():
    time.sleep(20)
    return 6

def expensive_operation():
    fut = Future()
    def _op_wrapper():
        try:
            result = expensive_operation_impl()
        except Exception as e:
            fut.set_exception(e)
        else:
            fut.set_result(result)
    thr = threading.Thread(target=_op_wrapper)
    thr.start()
    return fut

future = expensive_operation()
print(future.result())  # Will block until the operation is done.
Since this looks complicated, there are some high-level functions implementing thread scheduling for you.
from concurrent.futures import ThreadPoolExecutor
import time

def expensive_operation():
    time.sleep(20)
    return 6

executor = ThreadPoolExecutor(1)
future = executor.submit(expensive_operation)
print(future.result())
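Applied to the report-polling code from the question (trigger_report, load_report and the status check are stubbed out here, since the real SOAP calls are not shown), the same idea could look roughly like this: the polling loop runs in a worker thread and load_report is attached with add_done_callback, so the calling code never blocks on result():

from concurrent.futures import ThreadPoolExecutor
import time

# Stubs standing in for the question's SOAP calls.
def trigger_report(report_id):
    print("report", report_id, "triggered")

_status_calls = {"count": 0}
def report_status(report_id):
    # pretend the report becomes ready on the third status check
    _status_calls["count"] += 1
    return 'Complete' if _status_calls["count"] >= 3 else 'Running'

def load_report(report_id):
    print("report", report_id, "loaded")

def wait_for_report(report_id):
    trigger_report(report_id)
    retry = 1
    while report_status(report_id) != 'Complete':
        time.sleep(retry * 1.3)
        retry += 1
    return report_id

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(wait_for_report, "r-123")
future.add_done_callback(lambda fut: load_report(fut.result()))
print("main thread is free to do other work while the report is polled")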
Rather use events, not polling. There are a lot of options for how to implement events in Python; there was already a discussion here on Stack Overflow.
Here is a synthetic example that uses zope.event and an event handler:
import zope.event
import time

def trigger_report(report_id):
    # do the expensive operation, e.g. the SOAP call
    print('start expensive operation')
    time.sleep(5)
    print('5 seconds later...')
    zope.event.notify('Success')  # triggers the 'replied' function

def replied(event):  # this is the event handler
    # event contains the text 'Success'
    print(event)

def calling_function():
    zope.event.subscribers.append(replied)
    trigger_report('1')
But futures, as in the accepted answer, are also neat. It depends on what floats your boat.
In Bash, it is possible to execute a command in the background by appending &. How can I do it in Python?
while True:
    data = raw_input('Enter something: ')
    requests.post(url, data=data)     # Don't wait for it to finish.
    print('Sending POST request...')  # This should appear immediately.
Here's a hacky way to do it:
try:
    requests.get("http://127.0.0.1:8000/test/", timeout=0.0000000001)
except requests.exceptions.ReadTimeout:
    pass
Edit: for those of you who observed that this will not await a response - that is my understanding of the question, "fire and forget... do not wait for it to finish". There are much more thorough and complete ways to do it with threads or async if you need response context, error handling, etc.
I use multiprocessing.dummy.Pool. I create a singleton thread pool at the module level, and then use pool.apply_async(requests.get, [params]) to launch the task.
This command gives me a future, which I can add to a list with other futures indefinitely until I'd like to collect all or some of the results.
multiprocessing.dummy.Pool is, against all logic and reason, a THREAD pool and not a process pool.
Example (works in both Python 2 and 3, as long as requests is installed):
from multiprocessing.dummy import Pool
import requests

pool = Pool(10)  # Creates a pool with ten threads; more threads = more concurrency.
                 # "pool" is a module attribute; you can be sure there will only
                 # be one of them in your application,
                 # as modules are cached after initialization.

if __name__ == '__main__':
    futures = []
    for x in range(10):
        futures.append(pool.apply_async(requests.get, ['http://example.com/']))
    # futures is now a list of 10 futures.
    for future in futures:
        print(future.get())  # For each future, wait until the request is
                             # finished and then print the response object.
The requests will be executed concurrently, so running all ten of these requests should take no longer than the longest one. This strategy will only use one CPU core, but that shouldn't be an issue because almost all of the time will be spent waiting for I/O.
Elegant solution from Andrew Gorcester. In addition, without using futures, it is possible to use the callback and error_callback arguments (see the doc) in order to perform asynchronous processing:
import requests
from requests import Response

# "pool" is the multiprocessing.dummy.Pool created in the answer above.

def on_success(r: Response):
    if r.status_code == 200:
        print(f'Post succeeded: {r}')
    else:
        print(f'Post failed: {r}')

def on_error(ex: Exception):
    print(f'Post request failed: {ex}')

pool.apply_async(requests.post, args=['http://server.host'],
                 kwds={'json': {'key': 'value'}},
                 callback=on_success, error_callback=on_error)
According to the documentation, you should move to another library:
Blocking Or Non-Blocking?
With the default Transport Adapter in place, Requests does not provide
any kind of non-blocking IO. The Response.content property will block
until the entire response has been downloaded. If you require more
granularity, the streaming features of the library (see Streaming
Requests) allow you to retrieve smaller quantities of the response at
a time. However, these calls will still block.
If you are concerned about the use of blocking IO, there are lots of
projects out there that combine Requests with one of Python’s
asynchronicity frameworks.
Two excellent examples are
grequests and
requests-futures.
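As a rough illustration, using requests-futures typically looks something like the following (check the project's README for the exact, current API):

from requests_futures.sessions import FuturesSession

session = FuturesSession()
future = session.get('http://example.com/')  # returns immediately with a Future
# ... do other work here ...
response = future.result()                   # blocks only when the response is actually needed
print(response.status_code)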
Simplest and Most Pythonic Solution using threading
A simple way to send a POST/GET request, or to execute any other function, without waiting for it to finish is to use the built-in Python module threading.
import threading
import requests

def send_req():
    requests.get("http://127.0.0.1:8000/test/")

for x in range(100):
    threading.Thread(target=send_req).start()  # starts a new thread and continues
Other Important Features of threading
You can turn these threads into daemons using thread_obj.daemon = True.
You can wait for one to finish executing and then continue using thread_obj.join().
You can check whether a thread is alive using thread_obj.is_alive(), which returns True or False.
You can even check the active thread count with threading.active_count().
These are demonstrated in the short sketch after the documentation link below.
Official Documentation
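A small sketch exercising the features listed above (the request is replaced by a sleep so the example is self-contained):

import threading
import time

def send_req():
    time.sleep(1)                    # stands in for the HTTP request

t = threading.Thread(target=send_req)
t.daemon = True                      # turn the thread into a daemon
t.start()

print(t.is_alive())                  # True while send_req is still running
print(threading.active_count())      # includes the main thread
t.join()                             # wait for it to finish
print(t.is_alive())                  # False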
If you can write the code to be executed separately in a separate Python program, here is a possible solution based on subprocesses (see the sketch below).
Otherwise, you may find this question and its answer useful: the trick is to use the threading library to start a separate thread that will execute the separated task.
A caveat with both approaches is the number of items (that is to say, the number of threads) you have to manage. If there are too many items in the parent, you may consider halting every batch of items until at least some threads have finished, but I think this kind of management is non-trivial.
For a more sophisticated approach you can use an actor-based solution; I have not used this library myself, but I think it could help in that case.
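For the subprocess route, a minimal sketch (separate_task.py is a hypothetical script name) would be:

import subprocess
import sys

# Launch a separate Python program and keep going without waiting for it.
proc = subprocess.Popen([sys.executable, "separate_task.py"])
print("parent keeps running; child pid:", proc.pid)
# proc.wait() can be called later if the exit status is eventually needed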
from multiprocessing.dummy import Pool
import requests

pool = Pool()

def on_success(r):
    print('Post succeeded')

def on_error(ex):
    print('Post request failed')

def call_api(url, data, headers):
    requests.post(url=url, data=data, headers=headers)

def pool_processing_create(url, data, headers):
    pool.apply_async(call_api, args=[url, data, headers],
                     callback=on_success, error_callback=on_error)
I'm trying to accomplish something without using threading.
I'd like to execute a function within a function, but I don't want the first function's flow to stop. It's just a procedure, I don't expect any return value, and for several reasons I need the first function to keep executing.
Here is a snippet of what I'd like to do:
def foo():
    a = 5
    dosomething()
    # I don't want to wait until dosomething finishes. Just call it and move on.
    return a
Is there any way to do this?
Thanks in advance.
You can use concurrent.futures (https://docs.python.org/3/library/concurrent.futures.html) to achieve fire-and-forget behavior.
from concurrent.futures import ThreadPoolExecutor

def foo():
    a = 5
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(dosomething)
        future.add_done_callback(on_something_done)
        # print(future.result())
        # continue without waiting for dosomething()
        # future.cancel()  # to cancel dosomething
        # future.done()    # returns True if done
    return a

def on_something_done(future):
    print(future.result())
[updates]
concurrent.futures has been built into the standard library since Python 3.2.
For Python 2.x you can download futures 2.1.6 here.
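One caveat worth adding (my note, not the answer's): the with block waits for submitted work when it exits, so foo() above will still block until dosomething finishes. A variation with a module-level executor avoids that wait:

from concurrent.futures import ThreadPoolExecutor
import time

executor = ThreadPoolExecutor(max_workers=1)   # module-level, so foo() does not wait

def dosomething():
    time.sleep(1)
    return "done"

def on_something_done(future):
    print(future.result())

def foo():
    a = 5
    future = executor.submit(dosomething)
    future.add_done_callback(on_something_done)
    return a                                   # returns immediately

print(foo())                                   # prints 5 right away, "done" about a second later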
Python is synchronous; you'll have to use asynchronous processing to accomplish this.
While there are many ways to execute a function asynchronously, one way is to use python-rq. Python-RQ allows you to queue jobs for processing in the background with workers. It is backed by Redis and designed to have a low barrier to entry, so it should integrate easily into your web stack.
For example:
from rq import Queue, use_connection

def foo():
    use_connection()
    q = Queue()
    # do some things
    a = 5
    # now process something else asynchronously
    q.enqueue(do_something)
    # do more here
    return a
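One assumption worth making explicit (not covered in the snippet above): RQ runs jobs in a separate worker process, so do_something must live in an importable module and a worker (started with the rq worker command) must be running against the same Redis instance. For example:

# tasks.py -- hypothetical module holding the job function
def do_something():
    print("processed by the RQ worker, not inside foo()")

# in foo() you would then enqueue it by reference:
# from tasks import do_something
# q.enqueue(do_something)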