I'm trying to convert a low-level library that currently targets asyncio so that it uses anyio instead.
However, I'm having a hard time figuring out the best way to do so, since the library uses
asyncio.Future futures to represent asynchronous interaction with two worker threads.
Since the logic in the threads is much more complicated than what I'm showing here, converting them to async code is not an option for me at this point. It's also not standard network communication, so I cannot just use an existing anyio based library instead.
The only solution I can come up with is a thread-safe result queue (queue.Queue) that gets created for every sent message. SendMsgAsync would create the result queue, store a reference to the queue and the message in pending_msgs, and send the message via send_queue to the send thread. Then it would try to get the result from the result queue, sleeping asynchronously between attempts.
Once a reply is received, the recv_thread would put the reply into the result queue belonging to the original message (fetched from pending_msgs), causing SendMsgAsync to finish.
But polling the queue in SendMsgAsync doesn't seem like the right thing to do.
anyio does have anyio.create_memory_object_stream() that seems to be a form of async queue, but the documentation doesn't state whether these streams are thread safe, so I'm doubtful that I can use them between the event loop and my thread.
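Whatever the answer for memory object streams turns out to be, one pattern that is documented as safe under asyncio is to never touch the async queue from the worker thread at all, and instead ask the event loop to do it with loop.call_soon_threadsafe(). A minimal stdlib sketch, assuming the code runs on an asyncio event loop (which is also what anyio uses on its asyncio backend); producer_thread is a hypothetical stand-in for a worker such as recv_thread_fun, and None is used as an end-of-stream sentinel:

```python
import asyncio
import threading


def producer_thread(loop: asyncio.AbstractEventLoop, q: asyncio.Queue, items):
    # Runs in a plain thread. It never calls q.put_nowait() directly, because
    # asyncio.Queue is not thread-safe; call_soon_threadsafe schedules the call
    # on the event loop's own thread and wakes the loop up.
    for item in items:
        loop.call_soon_threadsafe(q.put_nowait, item)
    loop.call_soon_threadsafe(q.put_nowait, None)  # sentinel: no more items


async def consume(items):
    loop = asyncio.get_running_loop()
    q: asyncio.Queue = asyncio.Queue()
    t = threading.Thread(target=producer_thread, args=(loop, q, items))
    t.start()
    received = []
    while (item := await q.get()) is not None:  # suspends without polling
        received.append(item)
    t.join()
    return received


if __name__ == "__main__":
    print(asyncio.run(consume(["a", "b", "c"])))
```

Callbacks scheduled with call_soon_threadsafe run in the order they were scheduled, so the items arrive in order.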
With futures this would be much more elegant.
I was also wondering whether I could use concurrent.futures, but I could not find any examples where those can be used with anyio after manually creating them. It seems anyio can return and check them, but apparently only when they are bound to a started task. But since I do not need a new task running in the event loop (just a pseudo-task, the result of which is monitored) I don't know how to elegantly solve this. In a nutshell, a way to make anyio async await a concurrent.futures object I created myself would solve my issue, but I have the feeling this is not compatible with the anyio paradigm of doing async.
Any ideas how to interface this code with anyio are highly appreciated.
Here is a simplification of the code I have:
import asyncio
import queue
from functools import partial
import threading
send_queue:queue.Queue = queue.Queue(10) ## used to send messages to send_thread_fun
pending_msgs:dict = dict() ## stored messages waiting for replies
## message classes
class msg_class:
    def __init__(self, uuid) -> None:
        self.uuid:str = uuid
class reply_class(msg_class):
    def __init__(self, uuid, success:bool) -> None:
        super().__init__(uuid)
        self.success = success
## container class for stored messages
class stored_msg_class:
    def __init__(self, a_msg:msg_class, future:asyncio.Future) -> None:
        self.msg = a_msg
        self.future = future
## async send function as interface to outside async world
async def SendMsgAsyncAndGetReply(themsg:msg_class, loop:asyncio.AbstractEventLoop):
    afuture:asyncio.Future = SendMsg(themsg, loop)
    return await afuture
## this send function is only called internally
def SendMsg(themsg:msg_class, loop:asyncio.AbstractEventLoop):
    msg_future = loop.create_future()
    ## add a callback, so that the command is removed from the pending list if the future is cancelled externally.
    ## This is also called when the future completes, so it must not have negative effects then either.
    ## add_done_callback passes the future itself as the first argument; partial binds the uuid.
    msg_future.add_done_callback(partial(RemoveMsg_WhenFutureDone, uuid=themsg.uuid))
    pending_asyncmsg = stored_msg_class(themsg, msg_future)
    pending_msgs[themsg.uuid] = pending_asyncmsg
    return pending_asyncmsg.future
## Message status updates
def CompleteMsg(pendingmsg:stored_msg_class, result:any) -> bool:
    future = pendingmsg.future
    hdl:asyncio.Handle = future.get_loop().call_soon_threadsafe(future.set_result, result)
def FailMsg(pendingmsg:stored_msg_class, exception:Exception):
    future = pendingmsg.future
    hdl:asyncio.Handle = future.get_loop().call_soon_threadsafe(future.set_exception, exception)
def CancelMsg(pendingmsg:stored_msg_class):
    future = pendingmsg.future
    hdl:asyncio.Handle = future.get_loop().call_soon_threadsafe(future.cancel)
def RemoveMsg_WhenFutureDone(future:asyncio.Future, uuid):
    ## called by future callback once a future representing a pending msg is cancelled and if a result or an exception is set
    s_msg:stored_msg_class = pending_msgs.pop(uuid, None)
## the thread functions:
def send_thread_fun():
    while (True):
        a_msg:msg_class = send_queue.get()
        send(a_msg)
        ## ...
def recv_thread_fun():
    while(True):
        a_reply:reply_class = receive()
        pending_msg:stored_msg_class = pending_msgs.pop(a_reply.uuid, None)
        if (pending_msg is not None):
            if a_reply.success:
                CompleteMsg(pending_msg, a_reply)
            else:
                FailMsg(pending_msg, Exception(a_reply))
        ## ...
## low level functions (hardware_send / hardware_recv are the hardware-specific primitives, not shown)
def send(a_msg:msg_class):
    hardware_send(a_msg)
def receive() -> msg_class:
    return hardware_recv()
## using the async message interface:
def main():
    tx_thread = threading.Thread(target=send_thread_fun, name="send_thread", daemon=True)
    rx_thread = threading.Thread(target=recv_thread_fun, name="recv_thread", daemon=True)
    rx_thread.start()
    tx_thread.start()
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError as ex:
        loop = asyncio.new_event_loop()
    msg1 = msg_class("123")
    msg2 = msg_class("456")
    m1 = SendMsgAsyncAndGetReply(msg1, loop)
    m2 = SendMsgAsyncAndGetReply(msg2, loop)
    ## run on the same loop the futures were created on, and call gather() from inside that loop
    async def run_both():
        return await asyncio.gather(m1, m2)
    r12 = loop.run_until_complete(run_both())
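On the narrower question of awaiting a concurrent.futures.Future you created yourself: under asyncio, asyncio.wrap_future() turns it into an awaitable asyncio future, and since anyio's asyncio backend runs on an asyncio event loop it should be usable from anyio code on that backend (it would not help on the trio backend). Note that the concurrent.futures docs say Future instances should not be created directly except for testing, so this leans on behavior the docs reserve for tests; the call_soon_threadsafe approach already used above avoids that. A minimal stdlib sketch; fake_recv_thread and the 0.05 s delay are hypothetical stand-ins for recv_thread_fun and the hardware round trip:

```python
import asyncio
import concurrent.futures
import threading
import time


def fake_recv_thread(fut: concurrent.futures.Future, reply):
    # Hypothetical stand-in for recv_thread_fun: completes the future from a
    # non-event-loop thread. concurrent.futures.Future is thread-safe, so the
    # thread can call set_result() directly.
    time.sleep(0.05)
    fut.set_result(reply)


async def send_and_get_reply(reply):
    fut: concurrent.futures.Future = concurrent.futures.Future()  # created manually, no executor
    threading.Thread(target=fake_recv_thread, args=(fut, reply), daemon=True).start()
    # wrap_future returns an asyncio future that mirrors fut's result or exception
    return await asyncio.wrap_future(fut)


async def main():
    return await asyncio.gather(send_and_get_reply("reply-123"), send_and_get_reply("reply-456"))


if __name__ == "__main__":
    print(asyncio.run(main()))
```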
Related
I am trying to set up a FastAPI server that will take as input some biological data, and run some processing on them. Since the processing takes up all the server's resources, queries should be processed sequentially. However, the server should stay responsive and add further requests in a buffer. I've been trying to use the BackgroundTasks module for this, but after sending the second query, the response gets delayed while the task is running. Any help appreciated, and thanks in advance.
import os
import sys
import time
from dataclasses import dataclass
from fastapi import FastAPI, Request, BackgroundTasks
EXPERIMENTS_BASE_DIR = "/experiments/"
QUERY_BUFFER = {}
app = FastAPI()
@dataclass
class Query():
    query_name: str
    query_sequence: str
    experiment_id: str = None
    status: str = "pending"
    def __post_init__(self):
        self.experiment_id = str(time.time())
        self.experiment_dir = os.path.join(EXPERIMENTS_BASE_DIR, self.experiment_id)
        os.makedirs(self.experiment_dir, exist_ok=False)
    def run(self):
        self.status = "running"
        # perform some long task using the query sequence and get a return code #
        self.status = "finished"
        return 0 # or another code depending on the final output
@app.post("/")
async def root(request: Request, background_tasks: BackgroundTasks):
    query_data = await request.body()
    query_data = query_data.decode("utf-8")
    query_data = dict(str(x).split("=") for x in query_data.split("&"))
    query = Query(**query_data)
    QUERY_BUFFER[query.experiment_id] = query
    background_tasks.add_task(process, query)
    return {"Query created": query, "Query ID": query.experiment_id, "Backlog Length": len(QUERY_BUFFER)}
async def process(query):
    """ Process query and generate data"""
    ret_code = await query.run()
    del QUERY_BUFFER[query.experiment_id]
    print(f'Query {query.experiment_id} processing finished with return code {ret_code}.')
@app.get("/backlog/")
def return_backlog():
    return {f"Currently {len(QUERY_BUFFER)} jobs in the backlog."}
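Since the goal is to process queries strictly one at a time while the server keeps accepting new ones, one common shape for that is a single consumer draining a queue, with the blocking work pushed to a thread so the event loop stays free. A minimal stdlib sketch of that shape, independent of FastAPI; slow_job is a hypothetical stand-in for Query.run(), and in a real app the worker would presumably be started once at application startup (an assumption, not shown here):

```python
import asyncio
import time


def slow_job(name: str) -> str:
    # Hypothetical stand-in for the blocking processing done by Query.run()
    time.sleep(0.1)
    return f"{name} done"


async def worker(jobs: asyncio.Queue, results: list):
    # Single consumer: jobs are processed strictly one after another
    while True:
        name = await jobs.get()
        try:
            # run the blocking function in a thread so the event loop keeps serving requests
            results.append(await asyncio.to_thread(slow_job, name))
        finally:
            jobs.task_done()


async def main():
    jobs: asyncio.Queue = asyncio.Queue()
    results: list = []
    consumer = asyncio.create_task(worker(jobs, results))
    for name in ["q1", "q2", "q3"]:
        jobs.put_nowait(name)  # what a request handler would do: enqueue and return at once
    print("backlog length:", jobs.qsize())
    await jobs.join()  # wait until every queued job has been processed
    consumer.cancel()
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

asyncio.to_thread needs Python 3.9 or later.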
EDIT:
The original answer was influenced by testing with httpx.AsyncClient (as the caveat in the original answer warned might be the case). With that test client, background tasks block the response, whereas they do not block when the app runs under a real server. So there's a simpler solution, provided you don't need to test it with httpx.AsyncClient. The new solution runs the app with uvicorn, and I tested it manually with Postman instead.
This solution uses a function as the background task (process) so that it runs outside the main thread. It then schedules a job to run aprocess which will run in the main thread when the event loop gets a chance. The aprocess coroutine is able to then await the run coroutine of your Query as before.
Additionally, I've added a time.sleep(10) to the process function to illustrate that even long-running non-IO work will not prevent the original HTTP request from getting its response. This only holds if the work releases the GIL (as time.sleep does); if it's CPU-bound, you might want a separate process altogether, using multiprocessing or a separate service. Finally, I've replaced the prints with logging so that the messages appear alongside uvicorn's own log output.
import asyncio
import os
import sys
import time
from dataclasses import dataclass
from fastapi import FastAPI, Request, BackgroundTasks
import logging
logging.basicConfig(level=logging.INFO, format="%(levelname)-9s %(asctime)s - %(name)s - %(message)s")
LOGGER = logging.getLogger(__name__)
EXPERIMENTS_BASE_DIR = "/experiments/"
QUERY_BUFFER = {}
app = FastAPI()
@dataclass
class Query():
    query_name: str
    query_sequence: str
    experiment_id: str = None
    status: str = "pending"
    def __post_init__(self):
        self.experiment_id = str(time.time())
        self.experiment_dir = os.path.join(EXPERIMENTS_BASE_DIR, self.experiment_id)
        # os.makedirs(self.experiment_dir, exist_ok=False) # Commented out for testing
    async def run(self):
        self.status = "running"
        await asyncio.sleep(5) # simulate long running query
        # perform some long task using the query sequence and get a return code #
        self.status = "finished"
        return 0 # or another code depending on the final output
@app.post("/")
async def root(request: Request, background_tasks: BackgroundTasks):
    query_data = await request.body()
    query_data = query_data.decode("utf-8")
    query_data = dict(str(x).split("=") for x in query_data.split("&"))
    query = Query(**query_data)
    QUERY_BUFFER[query.experiment_id] = query
    # capture the server's running event loop here; process() runs in a worker thread and needs it
    loop = asyncio.get_running_loop()
    background_tasks.add_task(process, query, loop)
    LOGGER.info(f'root - added task')
    return {"Query created": query, "Query ID": query.experiment_id, "Backlog Length": len(QUERY_BUFFER)}
def process(query, loop):
    """ Schedule processing of query, and then run some long running non-IO job without blocking the app"""
    # process is a plain function, so Starlette runs it in a worker thread;
    # run_coroutine_threadsafe hands aprocess() back to the event loop
    asyncio.run_coroutine_threadsafe(aprocess(query), loop)
    LOGGER.info(f"process - {query.experiment_id} - Submitted query job. Now run non-IO work for 10 seconds...")
    time.sleep(10) # simulate long running non-IO work, does not block app as this is in another thread - provided it is not cpu bound.
    LOGGER.info(f'process - {query.experiment_id} - wake up!')
async def aprocess(query):
    """ Process query and generate data """
    ret_code = await query.run()
    del QUERY_BUFFER[query.experiment_id]
    LOGGER.info(f'aprocess - Query {query.experiment_id} processing finished with return code {ret_code}.')
@app.get("/backlog/")
def return_backlog():
    return {f"return_backlog - Currently {len(QUERY_BUFFER)} jobs in the backlog."}
if __name__ == "__main__":
    import uvicorn
    uvicorn.run("scratch_26:app", host="127.0.0.1", port=8000) # "scratch_26" is the module name of this file
ORIGINAL ANSWER:
*A caveat on this answer - I've tried testing this with `httpx.AsyncClient`, which might account for different behavior compared to deploying behind uvicorn.*
From what I can tell (and I am very open to correction on this), BackgroundTasks actually need to complete prior to an HTTP response being sent. This is not what the Starlette docs or the FastAPI docs say, but it appears to be the case, at least while using the httpx AsyncClient.
Whether you add a coroutine (which is executed in the main thread) or a function (which gets executed in its own side thread), the HTTP response is blocked from being sent until the background task is complete.
If you want to await a long running (asyncio friendly) task, you can get around this problem by using a wrapper function. The wrapper function adds the real task (a coroutine, since it will be using await) to the event loop and then returns. Since this is very fast, the fact that it "blocks" no longer matters (assuming a few milliseconds doesn't matter).
The real task then gets executed in turn (but after the initial HTTP response has been sent), and although it's on the main thread, the asyncio part of the function will not block.
You could try this:
@app.post("/")
async def root(request: Request, background_tasks: BackgroundTasks):
    ...
    background_tasks.add_task(process_wrapper, query)
    ...
# the event loop only keeps weak references to tasks, so hold on to them until they finish
_running_tasks = set()
async def process_wrapper(query):
    task = asyncio.get_running_loop().create_task(process(query))
    _running_tasks.add(task)
    task.add_done_callback(_running_tasks.discard)
async def process(query):
    """ Process query and generate data"""
    ret_code = await query.run()
    del QUERY_BUFFER[query.experiment_id]
    print(f'Query {query.experiment_id} processing finished with return code {ret_code}.')
Note also that you'll need to make your run() function a coroutine by adding the async keyword, since you're awaiting it from your process() function.
Here's a full working example that uses httpx.AsyncClient to test it. I've added the fmt_duration helper function to show the elapsed time for illustrative purposes. I've also commented out the code that creates directories, and simulated a 2-second query duration in the run() function.
import asyncio
import os
import sys
import time
from dataclasses import dataclass
from fastapi import FastAPI, Request, BackgroundTasks
from httpx import ASGITransport, AsyncClient
EXPERIMENTS_BASE_DIR = "/experiments/"
QUERY_BUFFER = {}
app = FastAPI()
start_ts = time.time()
_running_tasks = set() # strong references to the tasks created in process_wrapper
@dataclass
class Query():
    query_name: str
    query_sequence: str
    experiment_id: str = None
    status: str = "pending"
    def __post_init__(self):
        self.experiment_id = str(time.time())
        self.experiment_dir = os.path.join(EXPERIMENTS_BASE_DIR, self.experiment_id)
        # os.makedirs(self.experiment_dir, exist_ok=False) # Commented out for testing
    async def run(self):
        self.status = "running"
        await asyncio.sleep(2) # simulate long running query
        # perform some long task using the query sequence and get a return code #
        self.status = "finished"
        return 0 # or another code depending on the final output
@app.post("/")
async def root(request: Request, background_tasks: BackgroundTasks):
    query_data = await request.body()
    query_data = query_data.decode("utf-8")
    query_data = dict(str(x).split("=") for x in query_data.split("&"))
    query = Query(**query_data)
    QUERY_BUFFER[query.experiment_id] = query
    background_tasks.add_task(process_wrapper, query)
    print(f'{fmt_duration()} - root - added task')
    return {"Query created": query, "Query ID": query.experiment_id, "Backlog Length": len(QUERY_BUFFER)}
async def process_wrapper(query):
    task = asyncio.get_running_loop().create_task(process(query))
    _running_tasks.add(task)
    task.add_done_callback(_running_tasks.discard)
async def process(query):
    """ Process query and generate data"""
    ret_code = await query.run()
    del QUERY_BUFFER[query.experiment_id]
    print(f'{fmt_duration()} - process - Query {query.experiment_id} processing finished with return code {ret_code}.')
@app.get("/backlog/")
def return_backlog():
    return {f"{fmt_duration()} - return_backlog - Currently {len(QUERY_BUFFER)} jobs in the backlog."}
async def test_me():
    # ASGITransport calls the app in-process (older httpx versions also accepted AsyncClient(app=app, ...))
    async with AsyncClient(transport=ASGITransport(app=app), base_url="http://example") as ac:
        res = await ac.post("/", content="query_name=foo&query_sequence=42")
        print(f"{fmt_duration()} - [{res.status_code}] - {res.content.decode('utf8')}")
        res = await ac.post("/", content="query_name=bar&query_sequence=43")
        print(f"{fmt_duration()} - [{res.status_code}] - {res.content.decode('utf8')}")
        content = ""
        while not content.endswith('0 jobs in the backlog."]'):
            await asyncio.sleep(1)
            # use the exact route path: recent httpx versions do not follow the /backlog -> /backlog/ redirect
            backlog_results = await ac.get("/backlog/")
            content = backlog_results.content.decode("utf8")
            print(f"{fmt_duration()} - test_me - content: {content}")
def fmt_duration():
    return f"Progress time: {time.time() - start_ts:.3f}s"
print(f'starting loop...')
asyncio.run(test_me())
duration = time.time() - start_ts
print(f'Finished. Duration: {duration:.3f} seconds.')
In my local environment, running the above gives this output:
starting loop...
Progress time: 0.005s - root - added task
Progress time: 0.006s - [200] - {"Query created":{"query_name":"foo","query_sequence":"42","experiment_id":"1627489235.9300923","status":"pending","experiment_dir":"/experiments/1627489235.9300923"},"Query ID":"1627489235.9300923","Backlog Length":1}
Progress time: 0.007s - root - added task
Progress time: 0.009s - [200] - {"Query created":{"query_name":"bar","query_sequence":"43","experiment_id":"1627489235.932097","status":"pending","experiment_dir":"/experiments/1627489235.932097"},"Query ID":"1627489235.932097","Backlog Length":2}
Progress time: 1.016s - test_me - content: ["Progress time: 1.015s - return_backlog - Currently 2 jobs in the backlog."]
Progress time: 2.008s - process - Query 1627489235.9300923 processing finished with return code 0.
Progress time: 2.008s - process - Query 1627489235.932097 processing finished with return code 0.
Progress time: 2.041s - test_me - content: ["Progress time: 2.041s - return_backlog - Currently 0 jobs in the backlog."]
Finished. Duration: 2.041 seconds.
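I also tried making process_wrapper a plain function so that Starlette executes it in a worker thread. This works the same way; just use run_coroutine_threadsafe instead of create_task, and pass in the event loop captured in root() (asyncio.get_event_loop() generally cannot find the server's loop from a worker thread), i.e.
def process_wrapper(query, loop):
    # loop comes from root(): background_tasks.add_task(process_wrapper, query, asyncio.get_running_loop())
    asyncio.run_coroutine_threadsafe(process(query), loop)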
If there is some other way to get a background task to run without blocking the HTTP response I'd love to find out how, but absent that this wrapper solution should work.
I think your issue is in the task you want to run, not in the BackgroundTask itself.
FastAPI (and the underlying Starlette, which is responsible for running the background tasks) is built on top of asyncio and handles all requests asynchronously. That means that while one request is being processed, FastAPI can switch to the next request in the queue only when the current one is waiting on an IO operation that supports the asynchronous approach (i.e. something that is awaited).
The same goes for background tasks: while a background task is running, other requests and background tasks are handled only when it is waiting on such an IO operation.
As you can see, this is not ideal when your view or task has no IO operations, or when those operations cannot be run asynchronously. There are two workarounds for that situation:
1. Declare your views or tasks as normal, non-async functions. Starlette will then run them in a separate thread, outside the main event loop, so other requests can be handled at the same time.
2. Manually run the part of your logic that may block the processing of other requests using asgiref.sync_to_async. This also causes that logic to be executed in a separate thread, freeing the main event loop to take care of other requests until the function returns.
If you are not doing any asynchronous IO operations in your long-running task, the first approach will be most suitable for you. Otherwise, you should take any part of your code that is either long-running or performs any non-asynchronous IO operations and wrap it with sync_to_async.
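To illustrate the second approach without extra dependencies, here is a sketch that swaps asgiref's sync_to_async for the standard library's asyncio.to_thread (Python 3.9+), which likewise runs a blocking function in a separate thread. blocking_work and heartbeat are hypothetical names: the heartbeat coroutine stands in for "other requests", and the check confirms it kept running while the blocking call was still sleeping in its thread:

```python
import asyncio
import time


def blocking_work() -> float:
    # Blocking, non-async code; returns the moment it finished
    time.sleep(0.3)
    return time.monotonic()


async def heartbeat() -> list:
    # Stands in for other requests that the event loop should keep serving
    ticks = []
    for _ in range(3):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)
    return ticks


async def main() -> bool:
    # to_thread moves blocking_work off the event loop thread, so heartbeat keeps ticking
    finished_at, ticks = await asyncio.gather(asyncio.to_thread(blocking_work), heartbeat())
    # True if every tick happened while blocking_work was still running
    return all(t < finished_at for t in ticks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling blocking_work() directly inside an async def instead would freeze the loop for the full 0.3 s, and no tick could happen until it returned.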
After years of NodeJS dev, I decided to give Python a shot. So far so good, but I just ran into a wall that I would really like some help with.
I am working on a library that communicates with a remote machine using MQTT. When invoking a function on that library, a message is posted for processing on that remote machine. Once the processing is done, it posts a new message on the bus that my library picks up on and returns the result back to the calling code (the code that invoked the library function).
In Javascript, this is done by returning a Promise, which has a resolve and a reject function that can be stored within the library until the remote message comes back through the broker with the result (intercepted in a different function elsewhere in the library). At that point I can simply invoke the previously stored 'resolve' function to return control to the calling code (the code that invoked the async function of my library). The library function is simply called with the await keyword.
Now in Python, async/await does not use resolve and reject functions that can conveniently be stored away for later, so I suppose the logic must be implemented differently. Using a simple callback function rather than an async/await workflow works, but it makes it inconvenient when invoked multiple times in sequence for similar back-and-forth communications, given that each result-handling callback is a separate function.
Here is a basic example of what this would look like in Javascript (for illustration only):
let TASKS = {};
....
mqttClient.on('message', (topic, message) => {
    if (topic == "RESULT_OK/123") {
        TASKS["123"].resolve(message);
    } else if (topic == "RESULT_KO/123") {
        TASKS["123"].reject(message);
    }
});
...
let myAsyncLibraryFunction = (someCommand) => {
    return new Promise((res, rej) => {
        TASKS["123"] = {
            resolve: res,
            reject: rej
        };
        mqttClient.publish("REQUEST/123", someCommand);
    });
}
To call this, I would simply have to do:
try {
    let response1 = await myAsyncLibraryFunction("do this");
    let response2 = await myAsyncLibraryFunction("now do that");
    ...
} catch(e) {
    ...
}
Node.js is built around an event loop, which is why this fits such use cases so well. But this type of application logic is common when dealing with message-based, disparate backends, so I am sure there are good ways of solving this in Python as well.
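For reference, the closest Python analogue of storing a Promise's resolve/reject pair is storing an asyncio.Future: fut.set_result() plays the role of resolve and fut.set_exception() the role of reject. A minimal sketch in which everything runs on one event loop, as in Node.js; fake_broker_reply is a hypothetical stand-in for the MQTT message callback, scheduled with loop.call_later instead of a real broker, and the ok flag exists only so the simulation can choose between resolving and rejecting. If the callback runs on another thread (as paho's network loop does), it has to hand the call to the loop with loop.call_soon_threadsafe rather than touching the future directly.

```python
import asyncio

TASKS: dict = {}  # request id -> pending asyncio.Future, like TASKS in the JS version


def fake_broker_reply(task_id: str, ok: bool, message: str):
    # Plays the role of mqttClient.on('message', ...): resolve or reject the stored future
    fut = TASKS.pop(task_id, None)
    if fut is None or fut.done():
        return
    if ok:
        fut.set_result(message)                   # resolve(message)
    else:
        fut.set_exception(RuntimeError(message))  # reject(message)


async def my_async_library_function(task_id: str, command: str, ok: bool = True):
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    TASKS[task_id] = fut
    # Stand-in for mqttClient.publish(...): the simulated reply arrives 10 ms later
    loop.call_later(0.01, fake_broker_reply, task_id, ok, f"result of {command!r}")
    return await fut


async def main():
    response1 = await my_async_library_function("123", "do this")
    try:
        response2 = await my_async_library_function("124", "now do that", ok=False)
    except RuntimeError as e:  # the rejected "promise"
        response2 = f"rejected: {e}"
    return response1, response2


if __name__ == "__main__":
    print(asyncio.run(main()))
```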
This is a test Python code snippet that I am working on, that attempts to use a future object to achieve something similar:
import paho.mqtt.client as mqtt
import asyncio
import threading
# Init a new asyncio event loop
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
# define a global future placeholder
_future = None
# Create MQTT client
mqttClient = mqtt.Client()
# MQTT Event - on connect
def on_connect(client, userdata, flags, rc):
    print("Connected")
    client.subscribe("YAVA/#")
    # We start a new thread to test our workflow.
    #
    # If I had done this on the current thread, then the MQTT event loop
    # would get stuck (does not process incoming and outgoing messages anymore) when
    # calling "await" on the future object later on.
    taskThread = threading.Thread(target=_simulateClient, args=())
    taskThread.start()
# MQTT Event - on incoming message
def on_message(client, userdata, msg):
    global _future
    if msg.topic.startswith("YAVA/API/TASK_DONE/") == True:
        payload = str(msg.payload.decode("utf-8", "ignore"))
        # Resolve the future object
        _future.set_result(payload)
mqttClient.on_connect = on_connect
mqttClient.on_message = on_message
# Use asyncio to call an async function and test the workflow
def _simulateClient():
    asyncio.run(performAsyncTask())
# This async function will ask for a task to be performed on a remote machine,
# and wait for the response to be sent back
async def performAsyncTask():
    result = await pubAndWhaitForResponse("YAVA/API/TASK_START", "")
    print(result)
# perform the actual MQTT async logic
async def pubAndWhaitForResponse(topic, message):
    # Create a future object that can be resolved in the MQTT event "on_message"
    global _future
    _future = asyncio.get_running_loop().create_future()
    # Publish message that will start the task execution remotely somewhere
    global mqttClient
    mqttClient.publish(topic, message)
    # Now block the thread until the future gets resolved
    result = await _future
    # Return the result
    return result
# Start the broker and loop_forever in the main thread
mqttClient.connect("192.168.1.70", 1883, 60)
# The MQTT library will start a new thread that will continuously
# process outgoing and incoming messages through that separate thread.
# The main thread will be blocked so that the program does not exit
mqttClient.loop_forever()
It all runs fine, but the _future.set_result(payload) line does not seem to resolve the future. I never see the result printed.
It feels like there is not much missing to get this sorted. Any suggestions would be great.
Thanks
I think we are using the asyncio library the wrong way, mixing it with multi-process/multi-threading parallelism.
Here is an implementation based on the multiprocessing module. When submitting a task to your remote, your library can return a Queue that the caller can use with the get() method: it returns the value if available, otherwise it suspends the thread, waiting for the value. Hence, the Queue acts as a Scala Future or a JS Promise.
import multiprocessing
import time
from concurrent.futures.thread import ThreadPoolExecutor
import paho.mqtt.client as mqtt
import logging
logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s ## %(thread)d ## %(funcName)s ## %(message)s")
def remote_on_connect(client, *_):
    logging.info("Connected")
    client.subscribe("YAVA/API/TASK_START")
def remote_on_message(client, _, _1):
    logging.info("Remotely processing your data")
    time.sleep(1)
    logging.info("Publishing result")
    client.publish("YAVA/API/TASK_DONE", 42)
class Lib:
    def __init__(self):
        self.client = mqtt.Client()
        self.executor = ThreadPoolExecutor(max_workers=1)
        self.client.on_connect = Lib.on_connect
        self.client.connect("test.mosquitto.org")
        self.client.loop_start()
    def stop(self):
        self.client.loop_stop()
    def execute(self):
        cb, queue = self.get_cb()
        self.client.on_message = cb
        self.client.publish("YAVA/API/TASK_START", "foo")
        return queue
    @staticmethod
    def on_connect(client, *_):
        logging.info("Connected")
        client.subscribe("YAVA/API/TASK_DONE")
    def get_cb(self):
        queue = multiprocessing.Queue(maxsize=1)
        def cb(_0, _1, msg):
            self.client.on_message = None
            logging.info("Fetching back the result")
            logging.info(str(msg.payload.decode("utf-8", "ignore")))
            queue.put(42)
            logging.info("Queue filled")
        return cb, queue
def main():
    remote_client = mqtt.Client()
    remote_client.on_connect = remote_on_connect
    remote_client.on_message = remote_on_message
    remote_client.connect("test.mosquitto.org")
    remote_client.loop_start()
    lib = Lib()
    future = lib.execute()
    logging.info("Result is:")
    logging.info(future.get())
    remote_client.loop_stop()
    lib.stop()
    logging.info("Exiting")
if __name__ == '__main__':
    main()
2019-11-19 15:08:34,433 ## 139852611577600 ## remote_on_connect ## Connected
2019-11-19 15:08:34,450 ## 139852603184896 ## on_connect ## Connected
2019-11-19 15:08:34,452 ## 139852632065728 ## main ## Result is:
2019-11-19 15:08:34,467 ## 139852611577600 ## remote_on_message ## Remotely processing your data
2019-11-19 15:08:35,469 ## 139852611577600 ## remote_on_message ## Publishing result
2019-11-19 15:08:35,479 ## 139852603184896 ## cb ## Fetching back the result
2019-11-19 15:08:35,479 ## 139852603184896 ## cb ## 42
2019-11-19 15:08:35,480 ## 139852603184896 ## cb ## Queue filled
2019-11-19 15:08:35,480 ## 139852632065728 ## main ## 42
2019-11-19 15:08:36,481 ## 139852632065728 ## main ## Exiting
As you can see in the output, the main method executes up to the future.get call (as shown by the "Result is:" line early in the log). Then, processing happens in another thread until a value is put into the shared Queue. At that point future.get returns (because the value is available) and the main method proceeds to the end.
Hope this can help you to achieve what you want, but any insights about better ways to achieve this, either with asyncio or with smaller data structure than Queue, are welcome.
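On the question of a smaller structure than multiprocessing.Queue: when the producer and the waiting code are threads of the same process (as they are with paho's loop_start()), the thread-only queue.Queue gives the same blocking get() semantics without pickling or a background feeder thread. A minimal sketch under that assumption; on_reply is a hypothetical stand-in for the paho on_message callback, simulated with a plain thread:

```python
import queue
import threading
import time


def on_reply(result_box: queue.Queue, payload: str):
    # Hypothetical stand-in for the paho on_message callback, running on another thread
    time.sleep(0.05)
    result_box.put(payload)


def execute(payload: str) -> queue.Queue:
    # queue.Queue is the thread-only counterpart of multiprocessing.Queue:
    # same blocking get(), but no pickling and no feeder thread
    result_box: queue.Queue = queue.Queue(maxsize=1)
    threading.Thread(target=on_reply, args=(result_box, payload), daemon=True).start()
    return result_box


if __name__ == "__main__":
    future = execute("42")
    print(future.get(timeout=5))  # blocks until the callback thread has put the value
```

Passing a timeout to get() keeps the caller from blocking forever if the reply never arrives (it raises queue.Empty instead).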
First thing I can see is that you publish in the YAVA/API/TASK_START topic while checking that the topic is YAVA/API/TASK_DONE/ in your on_message callback. Hence, your _future never gets a result and the await _future never returns...
I advise you to add logging. Add these lines at the start of your code:
import logging
logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s ## %(thread)d ## %(funcName)s ## %(message)s")
Then use logging.info(...) to trace your execution order.
I added some to your code (as well as changing the condition in on_message) and here is the output.
2019-11-19 11:37:10,440 ## 140178907485888 ## __init__ ## Using selector: EpollSelector
2019-11-19 11:37:10,478 ## 140178907485888 ## on_connect ## Connected
2019-11-19 11:37:10,478 ## 140178887976704 ## _simulateClient ## Enter simulate client
2019-11-19 11:37:10,479 ## 140178887976704 ## __init__ ## Using selector: EpollSelector
2019-11-19 11:37:10,480 ## 140178887976704 ## performAsyncTask ## Perform async task
2019-11-19 11:37:10,480 ## 140178887976704 ## pubAndWhaitForResponse ## Pub and wait
2019-11-19 11:37:10,481 ## 140178887976704 ## pubAndWhaitForResponse ## Publish
2019-11-19 11:37:10,481 ## 140178887976704 ## pubAndWhaitForResponse ## Await future: <Future pending created at /usr/lib/python3.7/asyncio/base_events.py:391>
2019-11-19 11:37:10,499 ## 140178907485888 ## on_message ## New message
2019-11-19 11:37:10,499 ## 140178907485888 ## on_message ## Topic: YAVA/API/TASK_DONE
2019-11-19 11:37:10,499 ## 140178907485888 ## on_message ## Filling future: <Future pending cb=[<TaskWakeupMethWrapper object at 0x7f7df0f5fd10>()] created at /usr/lib/python3.7/asyncio/base_events.py:391>
I also added a log after the _future.set_result(payload) line, but it never appears. So the set_result seems to hang or something like that...
You probably have to dig inside it to know why/where it hangs.
Edit
By the way, you are mixing many concepts: asyncio, threading, and mqtt (with its own loop).
Moreover, asyncio.Future is not thread-safe, so I think it's dangerous to use it the way you do. While using a debugger to step into the set_result method, I encountered an exception in the mqtt client class:
Non-thread-safe operation invoked on an event loop other than the current one
It is never reported on stdout/stderr, but you can maybe catch it in the on_log callback of your client.
Edit 2
Here is a more Pythonic version of your code. In this one, set_result does not hang (the log line just after it is displayed), but the await in main() never returns.
import asyncio
import time
import paho.mqtt.client as mqtt
import logging
logging.basicConfig(level=logging.DEBUG,
                    format="%(asctime)s ## %(thread)d ## %(funcName)s ## %(message)s")
def remote_on_connect(client, *_):
    logging.info("Connected")
    client.subscribe("YAVA/API/TASK_START")
def remote_on_message(client, _, _1):
    logging.info("Remotely processing your data")
    time.sleep(1)
    logging.info("Publishing result")
    client.publish("YAVA/API/TASK_DONE", 42)
class Lib:
    def __init__(self):
        self.client = mqtt.Client()
        self.client.on_connect = Lib.on_connect
        # paho calls on_log(client, userdata, level, buf)
        self.client.on_log = lambda client, userdata, level, buf: logging.info("Log: %s", buf)
        self.client.connect("test.mosquitto.org")
        self.client.loop_start()
    def stop(self):
        self.client.loop_stop()
    def execute(self):
        self.client.publish("YAVA/API/TASK_START", "foo")
        cb, fut = Lib.get_cb()
        self.client.on_message = cb
        return fut
    @staticmethod
    def on_connect(client, *_):
        logging.info("Connected")
        client.subscribe("YAVA/API/TASK_DONE")
    @staticmethod
    def get_cb():
        fut = asyncio.get_event_loop().create_future()
        def cb(_0, _1, msg):
            logging.info("Fetching back the result")
            logging.info(str(msg.payload.decode("utf-8", "ignore")))
            # cb runs on paho's network thread: set_result() here is not thread-safe and
            # does not wake up the event loop, which is why the await in main() can hang
            fut.set_result(42)
            logging.info("Future updated")
        return cb, fut
async def main():
    remote_client = mqtt.Client()
    remote_client.on_connect = remote_on_connect
    remote_client.on_message = remote_on_message
    remote_client.connect("test.mosquitto.org")
    remote_client.loop_start()
    lib = Lib()
    future = lib.execute()
    logging.info("Result is:")
    await future
    logging.info(future.result())
    remote_client.loop_stop()
    lib.stop()
    logging.info("Exiting")
if __name__ == '__main__':
    asyncio.run(main())
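The usual remedy for the thread-safety problem described above is to never call set_result() from paho's thread at all: capture the event loop when the future is created, and ask that loop to resolve it with loop.call_soon_threadsafe(), which also wakes the loop up. A minimal stdlib sketch of that idea; a plain threading.Thread stands in for paho's network thread (the one started by loop_start()), and fake_on_message and _resolve are hypothetical helpers:

```python
import asyncio
import threading
import time


def _resolve(fut: asyncio.Future, value):
    # Runs on the event loop thread; the awaiting side may have been cancelled meanwhile
    if not fut.done():
        fut.set_result(value)


def fake_on_message(loop: asyncio.AbstractEventLoop, fut: asyncio.Future, payload: str):
    # Runs on a non-event-loop thread, like paho's callbacks
    time.sleep(0.05)
    # fut.set_result(payload) here would not be thread-safe; schedule it on the loop instead
    loop.call_soon_threadsafe(_resolve, fut, payload)


async def execute(payload: str) -> str:
    loop = asyncio.get_running_loop()  # the loop that owns the future
    fut = loop.create_future()
    threading.Thread(target=fake_on_message, args=(loop, fut, payload), daemon=True).start()
    return await fut


if __name__ == "__main__":
    print(asyncio.run(execute("42")))
```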
I have a tornado application which needs to run a blocking function on ProcessPoolExecutor. This blocking function employs a library which emits incremental results via blinker events. I'd like to collect these events and send them back to my tornado app as they occur.
At first, tornado seemed ideal for this use case because it's asynchronous. I thought I could simply pass a tornado.queues.Queue object to the function to be run on the pool and then put() events onto this queue as part of my blinker event callback.
However, reading the docs of tornado.queues.Queue, I learned they are not managed across processes like multiprocessing.Queue and are not thread safe.
Is there a way to retrieve these events from the pool as they occur? Should I wrap multiprocessing.Queue so it produces Futures? That seems unlikely to work as I doubt the internals of multiprocessing are compatible with tornado.
[EDIT]
There are some good clues here: https://gist.github.com/hoffrocket/8050711
To collect anything but the return value of a task passed to a ProcessPoolExecutor, you must use a multiprocessing.Queue (or another object from the multiprocessing library). Then, since multiprocessing.Queue only exposes a synchronous interface, you must use another thread in the parent process to read from the queue. (There's a file descriptor that could be used here instead of a thread, but it's an undocumented implementation detail and subject to change, so we'll ignore it for now.)
Here's a quick untested example:
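import concurrent.futures
import multiprocessing
from tornado.ioloop import IOLoop
queue = multiprocessing.Queue()
proc_pool = concurrent.futures.ProcessPoolExecutor()
thread_pool = concurrent.futures.ThreadPoolExecutor()
async def read_events():
    while True:
        # queue.get() blocks, so run it on a helper thread; a bare
        # concurrent.futures.Future cannot be awaited in a native coroutine
        event = await IOLoop.current().run_in_executor(thread_pool, queue.get)
        print(event)
async def foo():
    IOLoop.current().spawn_callback(read_events)
    # do_something_and_write_to_queue puts events on the module-level `queue`,
    # which the workers inherit under the fork start method
    await IOLoop.current().run_in_executor(proc_pool, do_something_and_write_to_queue)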
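The same bridge can be sketched in plain asyncio, which Tornado 5+ runs on. To keep the sketch self-contained and quick to run, it swaps the process pool for a plain producer thread and uses queue.Queue; with a real ProcessPoolExecutor you would pass a multiprocessing.Manager().Queue() proxy instead, because a bare multiprocessing.Queue raises an error when pickled as an argument to a pool task. The produce_events function and the None sentinel are illustrative assumptions.

```python
import asyncio
import queue
import threading
import time

SENTINEL = None  # hypothetical end-of-stream marker


def produce_events(q, n):
    """Blocking producer standing in for the pool task and its blinker callback."""
    for i in range(n):
        time.sleep(0.05)
        q.put(f"event {i}")
    q.put(SENTINEL)


async def read_events(q):
    loop = asyncio.get_running_loop()
    received = []
    while True:
        # q.get() blocks, so run it in the default thread pool
        # instead of on the event loop thread.
        event = await loop.run_in_executor(None, q.get)
        if event is SENTINEL:
            break
        received.append(event)  # e.g. forward the event to the client here
    return received


async def main():
    q = queue.Queue()
    threading.Thread(target=produce_events, args=(q, 3)).start()
    return await read_events(q)


if __name__ == "__main__":
    print(asyncio.run(main()))  # prints ['event 0', 'event 1', 'event 2']
```

The sentinel lets the reader loop end cleanly instead of blocking a helper thread forever on an empty queue.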
You can do it more simply than that. Here's a coroutine that submits four slow function calls to subprocesses and awaits them:
from concurrent.futures import ProcessPoolExecutor
from time import sleep
from tornado import gen, ioloop
pool = ProcessPoolExecutor()
def calculate_slowly(x):
    sleep(x)
    return x
async def parallel_tasks():
    # Create futures in a randomized order.
    futures = [gen.convert_yielded(pool.submit(calculate_slowly, i))
               for i in [1, 3, 2, 4]]
    wait_iterator = gen.WaitIterator(*futures)
    while not wait_iterator.done():
        try:
            result = await wait_iterator.next()
        except Exception as e:
            print("Error {} from {}".format(e, wait_iterator.current_future))
        else:
            print("Result {} received from future number {}".format(
                result, wait_iterator.current_index))
ioloop.IOLoop.current().run_sync(parallel_tasks)
It outputs:
Result 1 received from future number 0
Result 2 received from future number 2
Result 3 received from future number 1
Result 4 received from future number 3
You can see that the coroutine receives results in the order they complete, not the order they were submitted: future number 1 resolves after future number 2, because future number 1 slept longer. convert_yielded transforms the Futures returned by ProcessPoolExecutor into Tornado-compatible Futures that can be awaited in a coroutine.
Each future resolves to the value returned by calculate_slowly: in this case it's the same number that was passed into calculate_slowly, and the same number of seconds as calculate_slowly sleeps.
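For comparison, a minimal sketch of the same completion-order behaviour in plain asyncio, using loop.run_in_executor and asyncio.as_completed in place of Tornado's WaitIterator. It uses a ThreadPoolExecutor and shortened sleeps so it runs quickly as a standalone script; with a ProcessPoolExecutor the task function must be importable by the worker processes. The indexed helper is hypothetical, added to track which submission each result came from. (On Tornado 5+, IOLoop.current().run_in_executor(pool, calculate_slowly, i) is another way to get a future bound to the event loop.)

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def calculate_slowly(x):
    time.sleep(x / 10)  # shortened: 0.1 s per unit instead of 1 s
    return x


async def indexed(index, awaitable):
    # Pair each result with its submission index, like WaitIterator.current_index.
    return index, await awaitable


async def parallel_tasks():
    loop = asyncio.get_running_loop()
    order = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        # run_in_executor submits the call immediately and returns an asyncio future.
        pending = [
            indexed(i, loop.run_in_executor(pool, calculate_slowly, x))
            for i, x in enumerate([1, 3, 2, 4])
        ]
        # as_completed yields awaitables in the order they finish.
        for next_done in asyncio.as_completed(pending):
            index, result = await next_done
            print("Result {} received from future number {}".format(result, index))
            order.append((index, result))
    return order


if __name__ == "__main__":
    asyncio.run(parallel_tasks())
```

It should report results in the same completion order as the Tornado version's output.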
To include this in a RequestHandler, try something like this:
from tornado import web
class MainHandler(web.RequestHandler):
    async def get(self):
        self.write("Starting....\n")
        self.flush()
        futures = [gen.convert_yielded(pool.submit(calculate_slowly, i))
                   for i in [1, 3, 2, 4]]
        wait_iterator = gen.WaitIterator(*futures)
        while not wait_iterator.done():
            result = await wait_iterator.next()
            self.write("Result {} received from future number {}\n".format(
                result, wait_iterator.current_index))
            # Send each result to the client as soon as it is available
            self.flush()
if __name__ == "__main__":
    application = web.Application([
        (r"/", MainHandler),
    ])
    application.listen(8888)
    ioloop.IOLoop.current().start()
If you curl localhost:8888, you can observe that the server responds to the request incrementally.
I've been trying to make a bot in Slack that remains responsive even if it hasn't finished processing earlier commands, so it could go and do something that takes some time without locking up. It should return whatever is finished first.
I think I'm getting part of the way there: it now doesn't ignore stuff that's typed in before an earlier command is finished running. But it still doesn't allow threads to "overtake" each other - a command called first will return first, even if it takes much longer to complete.
import asyncio
from slackclient import SlackClient
import time, datetime as dt
token = "my token"
sc = SlackClient(token)
@asyncio.coroutine
def sayHello(waitPeriod = 5):
    yield from asyncio.sleep(waitPeriod)
    msg = 'Hello! I waited {} seconds.'.format(waitPeriod)
    return msg
@asyncio.coroutine
def listen():
    yield from asyncio.sleep(1)
    x = sc.rtm_connect()
    info = sc.rtm_read()
    if len(info) == 1:
        if r'/hello' in info[0]['text']:
            print(info)
            try:
                waitPeriod = int(info[0]['text'][6:])
            except ValueError:
                print('Can not read a time period. Using 5 seconds.')
                waitPeriod = 5
            msg = yield from sayHello(waitPeriod = waitPeriod)
            print(msg)
            chan = info[0]['channel']
            sc.rtm_send_message(chan, msg)
    # asyncio.async() is now asyncio.ensure_future(); async is a keyword since Python 3.7
    asyncio.ensure_future(listen())
def main():
    print('here we go')
    loop = asyncio.get_event_loop()
    asyncio.ensure_future(listen())
    loop.run_forever()
if __name__ == '__main__':
    main()
When I type /hello 12 and /hello 2 into the Slack chat window, the bot does respond to both commands now. However it doesn't process the /hello 2 command until it's finished doing the /hello 12 command. My understanding of asyncio is a work in progress, so it's quite possible I'm making a very basic error. I was told in a previous question that things like sc.rtm_read() are blocking functions. Is that the root of my problem?
Thanks a lot,
Alex
What is happening is that your listen() coroutine is held up at the yield from sayHello() statement: only once sayHello() completes can listen() continue on its merry way. The crux is that yield from (or await in Python 3.5+) does not block the event loop, but it does suspend the calling coroutine. It chains the two coroutines together, and the 'parent' coroutine can't proceed until the linked 'child' coroutine completes. (However, 'neighbouring' coroutines that aren't part of the same chain are free to proceed in the meantime.)
The simplest way to run sayHello() without holding up listen() is to use listen() as a dedicated listening coroutine and to offload all subsequent actions into their own Task wrappers, so listen() is free to respond to subsequent incoming messages. Something along these lines:
@asyncio.coroutine
def sayHello(waitPeriod, sc, chan):
    yield from asyncio.sleep(waitPeriod)
    msg = 'Hello! I waited {} seconds.'.format(waitPeriod)
    print(msg)
    sc.rtm_send_message(chan, msg)
@asyncio.coroutine
def listen():
    # connect once only if possible:
    x = sc.rtm_connect()
    # use a while True loop instead of repeatedly scheduling a new Task at the end
    while True:
        yield from asyncio.sleep(0)  # use 0 unless you need to wait a full second?
        #x = sc.rtm_connect()  # probably not necessary to reconnect each loop?
        info = sc.rtm_read()
        if len(info) == 1:
            if r'/hello' in info[0]['text']:
                print(info)
                try:
                    waitPeriod = int(info[0]['text'][6:])
                except ValueError:
                    print('Can not read a time period. Using 5 seconds.')
                    waitPeriod = 5
                chan = info[0]['channel']
                # schedule sayHello as its own Task instead of waiting for it
                asyncio.ensure_future(sayHello(waitPeriod, sc, chan))
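On current Python versions the same pattern is usually written with async def, await and asyncio.create_task (Python 3.7+); @asyncio.coroutine was removed in Python 3.11. A minimal sketch, assuming the Slack client is replaced by an asyncio.Queue of hypothetical command strings so it runs without a token; handle_hello and the "stop" sentinel are illustrative names, and the wait period is parsed as a float only so short test values work.

```python
import asyncio


async def handle_hello(wait_period, replies):
    await asyncio.sleep(wait_period)
    replies.append(f"Hello! I waited {wait_period} seconds.")


async def listen(commands, replies):
    tasks = set()
    while True:
        text = await commands.get()  # stands in for reading from Slack
        if text == "stop":
            break
        if text.startswith("/hello"):
            try:
                wait_period = float(text[6:])
            except ValueError:
                wait_period = 5
            # Spawn the handler instead of awaiting it, so listen() keeps reading.
            task = asyncio.create_task(handle_hello(wait_period, replies))
            tasks.add(task)  # keep a reference so the task is not garbage-collected
            task.add_done_callback(tasks.discard)
    await asyncio.gather(*tasks)  # let outstanding replies finish before returning


async def main():
    commands, replies = asyncio.Queue(), []
    for text in ["/hello 0.2", "/hello 0.05", "stop"]:
        commands.put_nowait(text)
    await listen(commands, replies)
    return replies


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The shorter request replies first even though it arrived second, which is the "overtaking" behaviour the question asks for.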
I'm trying to convert a simple synchronous server to an asynchronous version. The server receives POST requests and retrieves the response from an external web service (Amazon SQS). Here's the synchronous code:
def post(self):
    zoom_level = self.get_argument('zoom_level')
    neLat = self.get_argument('neLat')
    neLon = self.get_argument('neLon')
    swLat = self.get_argument('swLat')
    swLon = self.get_argument('swLon')
    data = self._create_request_message(zoom_level, neLat, neLon, swLat, swLon)
    self._send_parking_spots_request(data)
    #....other stuff
def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    self._sqs_send_queue.write(msg)
After reading the Tornado documentation and some threads here, I ended up with this code using coroutines:
def post(self):
    zoom_level = self.get_argument('zoom_level')
    neLat = self.get_argument('neLat')
    neLon = self.get_argument('neLon')
    swLat = self.get_argument('swLat')
    swLon = self.get_argument('swLon')
    data = self._create_request_message(zoom_level, neLat, neLon, swLat, swLon)
    self._send_parking_spots_request(data)
    self.finish()
@gen.coroutine
def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    yield gen.Task(write_msg, self._sqs_send_queue, msg)
def write_msg(queue, msg, callback=None):
    queue.write(msg)
Comparing performance with siege, I found that the second version is even worse than the original one, so there's probably something about coroutines and Tornado asynchronous programming that I didn't understand at all.
Could you please help me with this?
Edit: self._sqs_send_queue is a queue object retrieved from the boto interface, and queue.write(msg) returns the message that has been written to the queue.
Tornado relies on you converting all your I/O to be non-blocking. Simply wrapping the same code you were using before in a gen.Task will not improve performance at all, because the I/O itself still blocks the event loop. Additionally, you need to make your post method a coroutine and call _send_parking_spots_request using yield for the code to behave properly. So, a "correct" solution would look something like this:
@gen.coroutine
def post(self):
    ...
    yield self._send_parking_spots_request(data)  # wait (without blocking the event loop) until the method is done
    self.finish()
@gen.coroutine
def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    yield gen.Task(write_msg, self._sqs_send_queue, msg)
def write_msg(queue, msg, callback=None):
    # No yield here: gen.Task calls this as a plain function and resumes the
    # coroutine once callback is invoked. A generator body would never run.
    queue.write(msg, callback=callback)  # This has to do non-blocking I/O.
In this example, queue.write would need to be some API that sends your request using non-blocking I/O, and executes callback when a response is received. Without knowing exactly what queue in your original example is, I can't specify exactly how that can be implemented in your case.
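What gen.Task did in older Tornado versions is adapt a function that takes a callback into something a coroutine can wait on. A minimal sketch of the same adaptation in plain asyncio, assuming a hypothetical non-blocking write_async(msg, callback) that returns immediately and invokes its callback on the event loop later; loop.call_later stands in for the real network round trip, so none of this is boto's actual API.

```python
import asyncio


def write_async(msg, callback):
    """Hypothetical non-blocking API: returns at once, calls back later."""
    loop = asyncio.get_running_loop()
    # Simulate the response arriving 50 ms later without blocking the loop.
    loop.call_later(0.05, callback, {"written": msg})


def write_msg(msg):
    """Adapt the callback API into an awaitable future (roughly what gen.Task did)."""
    fut = asyncio.get_running_loop().create_future()
    # The callback fires on the loop thread, so setting the result directly is safe.
    write_async(msg, fut.set_result)
    return fut


async def send_parking_spots_request(data):
    # Other coroutines keep running while this one waits for the callback.
    return await write_msg(data)


if __name__ == "__main__":
    print(asyncio.run(send_parking_spots_request("hello")))  # prints {'written': 'hello'}
```

The key point is that write_async itself never blocks; if it did blocking I/O, wrapping it like this would gain nothing, exactly as described above.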
Edit: Assuming you're using boto, you may want to check out bototornado, which implements the exact same API I described above:
def write(self, message, callback=None):
"""
Add a single message to the queue.
:type message: Message
:param message: The message to be written to the queue
:rtype: :class:`boto.sqs.message.Message`
:return: The :class:`boto.sqs.message.Message` object that was written.