How to really make this operation async in python? - python

I have very recently started checking out asyncio for python. The use case is that, I hit an endpoint /refresh_data and the data is refreshed (from an S3 bucket). But this should be a non blocking operation, other API endpoints should still be able to be serviced.
So in my controller, I have:
def refresh_data():
    myservice.refresh_data()
    return jsonify(dict(ok=True))
and in my service I have:
async def refresh_data():
    try:
        s_result = await self.s3_client.fetch()
    except (FileNotFound, IOError) as e:
        logger.info("problem")
    gather = {i.pop("x"):i async for i in summary_result}
    # ... some other stuff
and in my client:
async def fetch():
    result = pd.read_parquet("bucket", "pyarrow", cols, filts).to_dict(orient="col1")
    return result
And when I run it, I see this error:
TypeError: 'async for' requires an object with __aiter__ method, got coroutine
I don't know how to move past this. Adding async definitely makes it return a coroutine type - but, either I have implemented this messily or I haven't fully understood asyncio package in Python. I have been working off of simple examples but I'm not sure what's wrong with what I've done here.
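For reference, the error says that `async for` only accepts an asynchronous iterator, while awaiting a coroutine that returns a normal collection gives you a normal collection back. A minimal sketch of one way this could be arranged, assuming Python 3.9+ for `asyncio.to_thread`; `blocking_fetch` is a hypothetical stand-in for the blocking `pd.read_parquet(...)` call and the row layout is invented for illustration:

```python
import asyncio
import time

def blocking_fetch():
    # Hypothetical stand-in for the blocking pd.read_parquet(...) call.
    time.sleep(0.05)
    return [{"x": "a", "v": 1}, {"x": "b", "v": 2}]

async def fetch():
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_fetch)

async def refresh_data():
    rows = await fetch()
    # rows is an ordinary list, so a plain (non-async) comprehension is enough.
    return {row.pop("x"): row for row in rows}

if __name__ == "__main__":
    print(asyncio.run(refresh_data()))
```

Marking a function `async` does not by itself make a blocking call non-blocking; the blocking work still has to be moved off the event loop thread, which is what the worker thread does in this sketch.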

Related

How to call aws service to asynchronously in python aws lambda

I need to call to async function inside lambda function. When executing this, it's showing this error
await wasn't used with future
Here is my code
async def main():
    bucketName = 'data-store-test'
    folder = 'Contacts-Aggre'
    lastModifiedFolderPath = await getLastModifiedBucketPath(bucketName,folder)
    print('Received lastModifiedFolderPath:',lastModifiedFolderPath)
async def getLastModifiedBucketPath(bucketName, prefix):
    loop = asyncio.get_running_loop()
    bucket_objects = await asyncio.gather(*loop.run_in_executor(None, functools.partial(s3_client.list_objects_v2, Bucket=bucketName, Prefix=prefix)))
    all = bucket_objects['Contents']
    latest = max(all, key=lambda x: x['LastModified'])
    folderPaths = latest['Key'].split('/')
    lastModifiedFolder = folderPaths[1] if len(folderPaths) >=2 else folderPaths[0]
    return lastModifiedFolder
def lambda_handler(event, context):
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
What is the issue in this code? I also created another question related to this, but no one answered it, which is why I created this one.
The getLastModifiedBucketPath method is used to get the name of the last modified folder at a specific bucket location. If I remove the asyncio.gather part, it works. But I need to return the value after it executes.
I'm just learning asyncio this week, so take this with a grain of salt, but my belief is:
loop.run_in_executor returns a future
This future is not awaited; when you try to splat it, you get the error (try out just that snippet in your REPL and you'll see)
gather needs the splatted argument to be a list, not a future
Try awaiting that run_in_executor before using it as an argument to gather and see what happens.
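A minimal sketch of that suggestion: since `run_in_executor` returns a single future, it can be awaited directly and `gather` is not needed at all. Here `list_objects` is a hypothetical stand-in for `s3_client.list_objects_v2`, and the keys and timestamps are made up so the snippet runs without AWS access:

```python
import asyncio
import functools

def list_objects(Bucket, Prefix):
    # Hypothetical blocking stand-in for s3_client.list_objects_v2.
    return {"Contents": [
        {"Key": f"{Prefix}/2021-01-01/a.csv", "LastModified": 1},
        {"Key": f"{Prefix}/2021-02-01/b.csv", "LastModified": 2},
    ]}

async def get_last_modified_folder(bucket, prefix):
    loop = asyncio.get_running_loop()
    # run_in_executor returns one future: await it directly, no gather or splat.
    response = await loop.run_in_executor(
        None, functools.partial(list_objects, Bucket=bucket, Prefix=prefix)
    )
    latest = max(response["Contents"], key=lambda obj: obj["LastModified"])
    parts = latest["Key"].split("/")
    return parts[1] if len(parts) >= 2 else parts[0]

if __name__ == "__main__":
    print(asyncio.run(get_last_modified_folder("data-store-test", "Contacts-Aggre")))
```

`gather` would only become useful if several such futures had to run concurrently, in which case each future is passed as its own argument.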

Concurrent execution of two python methods

I'm creating a script that posts a message to both discord and twitter, depending on some input. I have two methods (in separate .py files), post_to_twitter and post_to_discord. What I want to achieve is that both of these try to execute even if the other fails (e.g. if there is some exception with login).
Here is the relevant code snippet for posting to discord:
def post_to_discord(message, channel_name):
    client = discord.Client()
    @client.event
    async def on_ready():
        channel = # getting the right channel
        await channel.send(message)
        await client.close()
    client.run(discord_config.token)
and here is the snippet for posting to twitter part (stripped from the try-except blocks):
def post_to_twitter(message):
    auth = tweepy.OAuthHandler(twitter_config.api_key, twitter_config.api_key_secret)
    auth.set_access_token(twitter_config.access_token, twitter_config.access_token_secret)
    api = tweepy.API(auth)
    api.update_status(message)
Now, both of these work perfectly fine on their own and when being called synchronously from the same method:
def main(message):
    post_discord.post_to_discord(message)
    post_tweet.post_to_twitter(message)
However, I just cannot get them to work concurrently (i.e. to try to post to twitter even if discord fails or vice-versa). I've already tried a couple of different approaches with multi-threading and with asyncio.
Among others, I've tried the solution from this question. But got an error No module named 'IPython'. When I omitted the IPython line, changed the methods to async, I got this error: RuntimeError: Cannot enter into task <ClientEventTask state=pending event=on_ready coro=<function post_to_discord.<locals>.on_ready at 0x7f0ee33e9550>> while another task <Task pending name='Task-1' coro=<main() running at post_main.py:31>> is being executed..
To be honest, I'm not even sure if asyncio would be the right approach for my use case, so any insight is much appreciated.
Thank you.
In this case running the two things in completely separate threads (and completely separate event loops) is probably the easiest option at your level of expertise. For example, try this:
import post_discord, post_tweet  # module names as used in the question's main()
import concurrent.futures
def main(message):
    with concurrent.futures.ThreadPoolExecutor() as pool:
        fut1 = pool.submit(post_discord.post_to_discord, message)
        fut2 = pool.submit(post_tweet.post_to_twitter, message)
    # here closing the threadpool will wait for both futures to complete
    # make exceptions visible
    for fut in (fut1, fut2):
        try:
            fut.result()
        except Exception as e:
            print("error: ", e)

Forwarding Events with Python Faust

I am trying to forward messages to internal topics in faust. As suggested by faust in this example:
https://faust.readthedocs.io/en/latest/playbooks/vskafka.html
I have the following code
in_topic = app.topic("in_topic", internal=False, partitions=1)
batching = app.topic("batching", internal=True, partitions=1)
....
@app.agent(in_topic)
async def process(stream):
    async for event in stream:
        event.forward(batching)
        yield
But I always get the following error when running my pytest:
AttributeError: 'str' object has no attribute 'forward'
Was this feature removed, do I need to specify the topic differently to get an event, or is this an issue with pytest?
You are using the syntax backwards, that's why!
in_topic = app.topic("in_topic", internal=False, partitions=1)
batching = app.topic("batching", internal=True, partitions=1)
....
@app.agent(in_topic)
async def process(stream):
    async for event in stream:
        # send() is a coroutine, so it has to be awaited
        await batching.send(value=event)
        yield
Should actually work.
Edit:
This is also the only way that I could make correct use of pytest with faust.
The sink method is only usable if you delete all but the last sink of the mocked agent, which does not seem intuitive.
If you first import your sink and then add this decorator to your test, everything should work fine:
from my_path import my_sink, my_agent
from unittest.mock import patch, AsyncMock
# assumes an async-capable pytest plugin (e.g. pytest-asyncio) runs this coroutine test
@patch(__name__ + '.my_sink.send', new_callable=AsyncMock)
async def test_my_agent(test_app):
    async with my_agent.test_context() as agent:
        payload = 'This_works_fine'
        await agent.put(payload)
        my_sink.send.assert_called_with(value=payload)
Try this one:
@app.agent(in_topic, sink=[batching])
async def process(stream):
    async for event in stream:
        yield event
The answer from @Florian Hall works fine. I also found out that the forward method is only implemented for Events; in my case I received a str. The reason could be the difference between Faust Topics and Channels. Also, pytest combined with the Faust test_context() behaves oddly: for example, it forces you to yield even if you don't have a sink defined.

Mixing Synchronous and A-sync code in Python

I'm trying to convert a synchronous, callback-based flow in Python code to an asynchronous flow using asyncio.
Basically the code interacts a lot with TCP/UNIX sockets. It reads data from the sockets, manipulates it to make decisions and writes stuff back to the other side. This is going on over multiple sockets at once and data is shared between the contexts to make decisions sometimes.
EDIT :: The code currently is mostly based on registering a callback to a central entity for a specific socket, and having that entity run the callback when the relevant socket is readable (something like "call this function when that socket has data to be read"). Once the callback is called - a bunch of stuff happens, and eventually a new callback is registered for when new data is available. The central entity runs a select over all sockets registered to figure out which callbacks should be called.
I'm trying to do this without refactoring my entire code and making this as seamless as possible to the programmer - so I was trying to think about it like so - all code should run the same way as it does today - but whenever the current code does a socket.recv() to get new data - the process would yield execution to other tasks. When the read returns, it should go back to handling the data from the same point using the new data it got.
To do this, I wrote a new class called AsyncSocket - which interacts with the IO streams of asyncIO and placed the Async/await statements almost solely in there - thinking that I would implement the recv method in my class to make it look like a "regular IO socket" to the rest of my code.
So far - this is my understanding of what A-sync programming should allow.
Now to the problem :
My code waits for clients to connect. When one does, each client's context is allowed to read and write from its own connection.
I've simplified the flow to the following to clarify the problem:
class AsyncSocket():
    def __init__(self,reader,writer):
        self.reader = reader
        self.writer = writer
    def recv(self,numBytes):
        print("called recv!")
        data = self.read_mitigator(numBytes)
        return data
    async def read_mitigator(self,numBytes):
        print("Awaiting of AsyncSocket.reader.read")
        data = await self.reader.read(numBytes)
        print("Done Awaiting of AsyncSocket.reader.read data is %s " % data)
        return data
def mit2(aSock):
    return mit3(aSock)
def mit3(aSock):
    return aSock.recv(100)
async def echo_server(reader, writer):
    print ("New Connection!")
    aSock = AsyncSocket(reader,writer) # create a new A-sync socket class and pass it on the to regular code
    while True:
        data = await some_func(aSock) # this would eventually read from the socket
        print ("Data read is %s" % (data))
        if not data:
            break
        writer.write(data) # echo everything back
async def main(host, port):
    server = await asyncio.start_server(echo_server, host, port)
    await server.serve_forever()
asyncio.run(main('127.0.0.1', 5000))
mit2() and mit3() are synchronous functions that do stuff with the data on the way back before returning to the main client's loop - but here I'm just using them as empty functions.
The problem starts when I play with the implementation of some_func().
A pass through implementation (edit: kind-of-works) - but still has issues :
def some_func(aSock):
    try:
        return (mit2(aSock)) # works
    except:
        print("Error!!!!")
While an implementation which reads the data and does something with it - like adding a suffix before returning, throws an error:
def some_func(aSock):
    try:
        return (mit2(aSock) + "something") # doesn't work
    except:
        print("Error!!!!")
The error (as far as I understand it) means it's not really doing what it should:
New Connection!
called recv!
/Users/user/scripts/asyncServer.py:36: RuntimeWarning: coroutine 'AsyncSocket.read_mitigator' was never awaited
return (mit2(aSock) + "something") # doesn't work
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Error!!!!
Data read is None
And the echo server obviously doesn't work.
Obviously my code looks more like option #2 with a lot more stuff in some_func(),mit2() and mit3() - but I can't get this to work. I'm fairly new in using asyncio/async/await - so what (rather basic concept I guess) am I missing?
This code won't work as envisioned:
    def recv(self,numBytes):
        print("called recv!")
        data = self.read_mitigator(numBytes)
        return data
    async def read_mitigator(self,numBytes):
        ...
You cannot call an async function from a sync function and get its result directly: calling it only creates a coroutine object. You must await it, which ensures that control returns to the event loop in case the data is not yet ready. This mismatch between async and sync code is sometimes referred to as the issue of function color.
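A small sketch of that difference, with `read_data` as a hypothetical stand-in for the socket read. It also suggests why the pass-through variant in the question kind of works: it hands the un-awaited coroutine object back to a caller that does await it, whereas adding a suffix tries to concatenate a string onto the coroutine object itself:

```python
import asyncio

async def read_data():
    await asyncio.sleep(0)  # stand-in for an awaitable socket read
    return "data"

def sync_caller():
    # Calling a coroutine function only creates a coroutine object; nothing runs yet.
    coro = read_data()
    is_coro = asyncio.iscoroutine(coro)
    coro.close()  # close it explicitly to avoid the "never awaited" warning
    return is_coro

async def async_caller():
    # Awaiting actually runs the coroutine and produces its return value.
    return (await read_data()) + "something"

if __name__ == "__main__":
    print(sync_caller())
    print(asyncio.run(async_caller()))
```

In practice this means every function between the event loop and the awaited read has to be `async def` and use `await`, including helpers like mit2 and mit3 in the question.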
Since your code is already using non-blocking sockets and an event loop, a good approach to porting it to asyncio might be to first switch to the asyncio event loop. You can use event loop methods like sock_recv to request data:
def start():
    loop = asyncio.get_event_loop()
    sock = make_socket() # make sure it's non-blocking
    # sock_recv is a coroutine on Python 3.7+, so wrap it in a task to get a future
    future_data = loop.create_task(loop.sock_recv(sock, 1024))
    future_data.add_done_callback(continue_read)
    # return to the event loop - when some data is ready
    # continue_read will be invoked
def continue_read(future):
    data = future.result()
    print('got', data)
    # ... do something with data, e.g. process it
    # and call sock_sendall with the response
# pass the function itself; start() would call it immediately and schedule None
asyncio.get_event_loop().call_soon(start)
asyncio.get_event_loop().run_forever()
Once you have the program working in that mode, you can start moving to coroutines, which allow the code to look like sync code, but work in exactly the same way:
async def start():
    loop = asyncio.get_event_loop()
    sock = make_socket() # make sure it's non-blocking
    data = await loop.sock_recv(sock, 1024)
    # data is available "immediately", meaning the coroutine gets
    # automatically suspended when awaiting data that is not yet
    # ready, and automatically re-scheduled when the data is ready
    print('got', data)
asyncio.run(start())
The next step can be eliminating make_socket and switching to asyncio streams.
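As a rough sketch of what the streams version might look like, here is an echo handler plus a tiny client round trip. The names `handle_echo` and `echo_once` are made up for this example, and binding to port 0 (letting the OS pick a free port) is only there to keep the demo self-contained:

```python
import asyncio

async def handle_echo(reader, writer):
    # Each await yields to the event loop, so other connections keep being served.
    while True:
        data = await reader.read(100)
        if not data:
            break
        writer.write(data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()

async def echo_once(message: bytes) -> bytes:
    # Port 0 asks the OS for any free port (an assumption made for this demo).
    server = await asyncio.start_server(handle_echo, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(message)
        await writer.drain()
        reply = await reader.readexactly(len(message))
        writer.close()
        await writer.wait_closed()
    return reply

if __name__ == "__main__":
    print(asyncio.run(echo_once(b"ping")))
```

With streams there is no socket to create or mark non-blocking by hand; the reader and writer objects are supplied by asyncio.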

How to wait for coroutines to complete synchronously within method if event loop is already running?

I'm trying to create a Python-based CLI that communicates with a web service via websockets. One issue that I'm encountering is that requests made by the CLI to the web service intermittently fail to get processed. Looking at the logs from the web service, I can see that the problem is caused by the fact that frequently these requests are being made at the same time (or even after) the socket has closed:
2016-09-13 13:28:10,930 [22 ] INFO DeviceBridge - Device bridge has opened
2016-09-13 13:28:11,936 [21 ] DEBUG DeviceBridge - Device bridge has received message
2016-09-13 13:28:11,937 [21 ] DEBUG DeviceBridge - Device bridge has received valid message
2016-09-13 13:28:11,937 [21 ] WARN DeviceBridge - Unable to process request: {"value": false, "path": "testcube.pwms[0].enabled", "op": "replace"}
2016-09-13 13:28:11,936 [5 ] DEBUG DeviceBridge - Device bridge has closed
In my CLI I define a class CommunicationService that is responsible for handling all direct communication with the web service. Internally, it uses the websockets package to handle communication, which itself is built on top of asyncio.
CommunicationService contains the following method for sending requests:
def send_request(self, request: str) -> None:
    logger.debug('Sending request: {}'.format(request))
    asyncio.ensure_future(self._ws.send(request))
...where ws is a websocket opened earlier in another method:
self._ws = await websockets.connect(websocket_address)
What I want is to be able to await the future returned by asyncio.ensure_future and, if necessary, sleep for a short while after in order to give the web service time to process the request before the websocket is closed.
However, since send_request is a synchronous method, it can't simply await these futures. Making it asynchronous would be pointless as there would be nothing to await the coroutine object it returned. I also can't use loop.run_until_complete as the loop is already running by the time it is invoked.
I found someone describing a problem very similar to the one I have at mail.python.org. The solution that was posted in that thread was to make the function return the coroutine object in the case the loop was already running:
def aio_map(coro, iterable, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()
    coroutines = map(coro, iterable)
    coros = asyncio.gather(*coroutines, return_exceptions=True, loop=loop)
    if loop.is_running():
        return coros
    else:
        return loop.run_until_complete(coros)
This is not possible for me, as I'm working with PyRx (Python implementation of the reactive framework) and send_request is only called as a subscriber of an Rx observable, which means the return value gets discarded and is not available to my code:
class AnonymousObserver(ObserverBase):
    ...
    def _on_next_core(self, value):
        self._next(value)
On a side note, I'm not sure if this is some sort of problem with asyncio that's commonly come across or whether I'm just not getting it, but I'm finding it pretty frustrating to use. In C# (for instance), all I would need to do is probably something like the following:
void SendRequest(string request)
{
    this.ws.Send(request).Wait();
    // Task.Delay(500).Wait(); // Uncomment If necessary
}
Meanwhile, asyncio's version of "wait" unhelpfully just returns another coroutine that I'm forced to discard.
Update
I've found a way around this issue that seems to work. I have an asynchronous callback that gets executed after the command has executed and before the CLI terminates, so I just changed it from this...
async def after_command():
    await comms.stop()
...to this:
async def after_command():
    await asyncio.sleep(0.25) # Allow time for communication
    await comms.stop()
I'd still be happy to receive any answers to this problem for future reference, though. I might not be able to rely on workarounds like this in other situations, and I still think it would be better practice to have the delay executed inside send_request so that clients of CommunicationService do not have to concern themselves with timing issues.
In regards to Vincent's question:
Does your loop run in a different thread, or is send_request called by some callback?
Everything runs in the same thread - it's called by a callback. What happens is that I define all my commands to use asynchronous callbacks, and when executed some of them will try to send a request to the web service. Since they're asynchronous, they don't do this until they're executed via a call to loop.run_until_complete at the top level of the CLI - which means the loop is running by the time they're mid-way through execution and making this request (via an indirect call to send_request).
Update 2
Here's a solution based on Vincent's proposal of adding a "done" callback.
A new boolean field _busy is added to CommunicationService to represent if comms activity is occurring or not.
CommunicationService.send_request is modified to set _busy true before sending the request, and then provides a callback to _ws.send to reset _busy once done:
def send_request(self, request: str) -> None:
    logger.debug('Sending request: {}'.format(request))
    def callback(_):
        self._busy = False
    self._busy = True
    asyncio.ensure_future(self._ws.send(request)).add_done_callback(callback)
CommunicationService.stop is now implemented to wait for this flag to be set false before progressing:
async def stop(self) -> None:
    """
    Terminate communications with TestCube Web Service.
    """
    if self._listen_task is None or self._ws is None:
        return
    # Wait for comms activity to stop.
    while self._busy:
        await asyncio.sleep(0.1)
    # Allow short delay after final request is processed.
    await asyncio.sleep(0.1)
    self._listen_task.cancel()
    await asyncio.wait([self._listen_task, self._ws.close()])
    self._listen_task = None
    self._ws = None
    logger.info('Terminated connection to TestCube Web Service')
This seems to work too, and at least this way all communication timing logic is encapsulated within the CommunicationService class as it should be.
Update 3
Nicer solution based on Vincent's proposal.
Instead of self._busy we have self._send_request_tasks = [].
New send_request implementation:
def send_request(self, request: str) -> None:
    logger.debug('Sending request: {}'.format(request))
    task = asyncio.ensure_future(self._ws.send(request))
    self._send_request_tasks.append(task)
New stop implementation:
async def stop(self) -> None:
    if self._listen_task is None or self._ws is None:
        return
    # Wait for comms activity to stop.
    if self._send_request_tasks:
        await asyncio.wait(self._send_request_tasks)
    ...
You could use a set of tasks:
self._send_request_tasks = set()
Schedule the tasks using ensure_future and clean up using add_done_callback:
def send_request(self, request: str) -> None:
    task = asyncio.ensure_future(self._ws.send(request))
    self._send_request_tasks.add(task)
    task.add_done_callback(self._send_request_tasks.remove)
And wait for the set of tasks to complete:
async def stop(self):
    if self._send_request_tasks:
        await asyncio.wait(self._send_request_tasks)
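Put together, the task-set pattern can be tried in isolation with a sketch like the following. The `Sender` class and its `_send` coroutine are hypothetical stand-ins for CommunicationService and `self._ws.send`, with a short sleep simulating the network call:

```python
import asyncio

class Sender:
    def __init__(self):
        self.sent = []
        self._tasks = set()

    async def _send(self, request):
        await asyncio.sleep(0.01)  # stand-in for self._ws.send(request)
        self.sent.append(request)

    def send_request(self, request):
        # Synchronous method: schedule the coroutine and remember its task.
        task = asyncio.ensure_future(self._send(request))
        self._tasks.add(task)
        # Drop the task from the set once it has finished.
        task.add_done_callback(self._tasks.discard)

    async def stop(self):
        # Wait for every request that is still in flight.
        if self._tasks:
            await asyncio.wait(self._tasks)

async def demo():
    sender = Sender()
    sender.send_request("a")  # called synchronously while the loop is running
    sender.send_request("b")
    await sender.stop()
    return sender.sent

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The synchronous caller never blocks; the waiting is deferred to `stop`, which is already a coroutine and can therefore await the outstanding tasks.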
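Given that you're not inside an asynchronous function, you can use the yield from keyword to effectively implement await yourself. Note that this turns send_request into a generator function, so its body only runs when something drives the generator (for example an enclosing yield from); calling it on its own does not block until the future completes: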
def send_request(self, request: str) -> None:
    logger.debug('Sending request: {}'.format(request))
    future = asyncio.ensure_future(self._ws.send(request))
    yield from future.__await__()
