I am porting a simple python 3 script to AWS Lambda.
The script is simple: it gathers information from a dozen S3 objects and returns the results.
The script used multiprocessing.Pool to gather all the files in parallel, but multiprocessing cannot be used in an AWS Lambda environment, since /dev/shm is missing.
So instead of writing a dirty multiprocessing.Process / multiprocessing.Queue replacement, I thought I would try asyncio.
I am using the latest version of aioboto3 (8.0.5) on Python 3.8.
My problem is that I see no improvement between a naive sequential download of the files and an asyncio event loop multiplexing the downloads.
Here are the two versions of my code.
import sys
import asyncio
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import boto3
import aioboto3
BUCKET = 'some-bucket'
KEYS = [
    'some/key/1',
    [...]
    'some/key/10',
]
async def download_aio():
    """Concurrent download of all objects from S3"""
    async with aioboto3.client('s3') as s3:
        objects = [s3.get_object(Bucket=BUCKET, Key=k) for k in KEYS]
        objects = await asyncio.gather(*objects)
        buffers = await asyncio.gather(*[o['Body'].read() for o in objects])
def download():
    """Sequentially download all objects from S3"""
    s3 = boto3.client('s3')
    for key in KEYS:
        obj = s3.get_object(Bucket=BUCKET, Key=key)
        obj['Body'].read()
def run_sequential():
    download()

def run_concurrent():
    loop = asyncio.get_event_loop()
    # loop.set_default_executor(ProcessPoolExecutor(10))
    # loop.set_default_executor(ThreadPoolExecutor(10))
    loop.run_until_complete(download_aio())
The timings for run_sequential() and run_concurrent() are quite similar (~3 seconds for a dozen 10 MB files).
I am convinced the concurrent version is not actually concurrent, for multiple reasons:
I tried switching to a ProcessPoolExecutor/ThreadPoolExecutor, and I can see the processes/threads spawned for the duration of the function, though they are doing nothing
The sequential and concurrent timings are nearly identical, even though my network interface is definitely not saturated and the CPU is not the bottleneck either
The time taken by the concurrent version increases linearly with the number of files.
I am sure something is missing, but I just can't wrap my head around what.
Any ideas?
After losing some hours trying to understand how to use aioboto3 correctly, I decided to switch to my backup solution.
I ended up rolling my own naive version of multiprocessing.Pool for use within an AWS Lambda environment.
If someone stumbles across this thread in the future, here it is. It is far from perfect, but easy enough to drop in as a replacement for multiprocessing.Pool in my simple cases.
from multiprocessing import Process, Pipe
from multiprocessing.connection import wait


class Pool:
    """Naive implementation of a process pool with mp.Pool API.

    This is useful since multiprocessing.Pool uses a Queue in /dev/shm, which
    is not mounted in an AWS Lambda environment.
    """

    def __init__(self, process_count=1):
        assert process_count >= 1
        self.process_count = process_count

    @staticmethod
    def wrap_pipe(pipe, index, func):
        def wrapper(args):
            try:
                result = func(args)
            except Exception as exc:  # pylint: disable=broad-except
                result = exc
            pipe.send((index, result))
        return wrapper

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        pass

    def map(self, function, arguments):
        pending = list(enumerate(arguments))
        running = []
        finished = [None] * len(pending)
        while pending or running:
            # Fill the running queue with new jobs
            while len(running) < self.process_count:
                if not pending:
                    break
                index, args = pending.pop(0)
                pipe_parent, pipe_child = Pipe(False)
                process = Process(
                    target=Pool.wrap_pipe(pipe_child, index, function),
                    args=(args,))
                process.start()
                running.append((index, process, pipe_parent))
            # Wait for jobs to finish
            for pipe in wait(list(map(lambda t: t[2], running))):
                index, result = pipe.recv()
                # Remove the finished job from the running list
                running = list(filter(lambda x: x[0] != index, running))
                # Add the result to the finished list
                finished[index] = result
        return finished
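For illustration, here is a minimal usage sketch; the fetch_size helper (and its bucket/keys) is hypothetical, not part of the original code:

import boto3

def fetch_size(key):
    # Hypothetical task: download one object and return its size
    s3 = boto3.client('s3')
    body = s3.get_object(Bucket='some-bucket', Key=key)['Body'].read()
    return len(body)

# Behaves like multiprocessing.Pool for the simple map case
with Pool(process_count=4) as pool:
    sizes = pool.map(fetch_size, ['some/key/1', 'some/key/2'])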
It's 1.5 years later and aioboto3 is still not well documented or supported.
The multithreading option is good, but asyncio is an easier and clearer implementation.
I don't actually know what's wrong with your asyncio code; I guess it doesn't even run any more because of library updates. But using aiobotocore, the code below worked. My test was with 100 images: the sequential code takes 8 seconds on average, while the async version takes less than 2.
With 1000 images it took 17 seconds.
import asyncio
from aiobotocore.session import get_session

async def download_aio(s3, bucket, file_name):
    o = await s3.get_object(Bucket=bucket, Key=file_name)
    x = await o['Body'].read()

async def run_concurrent():
    tasks = []
    session = get_session()
    async with session.create_client('s3') as s3:
        for k in KEYS[:100]:
            tasks.append(asyncio.ensure_future(download_aio(s3, BUCKET, k)))
        await asyncio.gather(*tasks)
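Assuming BUCKET and KEYS are defined as in the question, the coroutine can then be driven with:

asyncio.run(run_concurrent())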
I have registered a Python callback with a DLL using the ctypes library. When the callback is triggered, I try to resolve an asyncio future I have set up. Since the callback happens in a separate thread spawned by the DLL, I use the loop.call_soon_threadsafe() function to get back to the event loop that started it all.
Mostly this works fine, but every once in a while the future fails to be unblocked. In the minimal example below this also happens sometimes, but there I can see that in those cases the callback doesn't even arrive (or at least the corresponding print doesn't happen).
I have only tried this with Python 3.8.5 so far. Is there some race condition here that I did not notice?
Here's a minimal example:
import asyncio
import ctypes
import os


class testClass:
    loop = None
    future = None
    exampleDll = None

    def finish(self):
        # now in the right c thread and event loop
        print("callback in eventloop")
        self.future.set_result(999)

    def trampoline(self):
        # still in the other c thread
        self.loop.call_soon_threadsafe(self.finish)

    def example_callback(self):
        # in another c thread, so we need to do threadsafety stuff
        print("callback has arrived")
        self.trampoline()
        return

    async def register_and_wait(self):
        self.loop = asyncio.get_event_loop()
        self.future = self.loop.create_future()
        callback_type = ctypes.CFUNCTYPE(None)
        callback_as_cfunc = callback_type(self.example_callback)
        # now register the callback and wait
        self.exampleDll.fnminimalExample(callback_as_cfunc, ctypes.c_int(1))
        await self.future
        print("future has finished")

    def main(self):
        path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "minimalExample.dll")
        # print(path)
        ctypes.cdll.LoadLibrary(path)
        # for easy access
        self.exampleDll = ctypes.cdll.minimalExample
        asyncio.run(self.register_and_wait())


if __name__ == "__main__":
    for i in range(0, 100000):
        print(i)
        test = testClass()
        test.main()
You can get the compiled example DLL and its source from the repository here to reproduce.
The issue (at least in this minimal example) does not show up any more if I reuse the same event loop instead of spawning a new one for every iteration with asyncio.run.
The problem is thus fixed, but it doesn't feel right.
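For reference, a minimal sketch of that workaround, assuming the DLL is loaded once up front as in main() above:

import asyncio
import ctypes
import os

path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "minimalExample.dll")
ctypes.cdll.LoadLibrary(path)

# Reuse one event loop across all iterations instead of calling asyncio.run()
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
try:
    for i in range(100000):
        print(i)
        test = testClass()
        test.exampleDll = ctypes.cdll.minimalExample
        loop.run_until_complete(test.register_and_wait())
finally:
    loop.close()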
I want to have a queue/worker management tool that allows adding new queues, registering jobs to those queues, and spawning workers to handle those jobs.
I have this code so far:
from redis import Redis
from rq import Queue, Retry, Worker


class WorkerPool:  # TODO: find a better name
    def __init__(self):
        self._queues = {}
        self._workers = []
        self._redis_conn = Redis()

    def _get_queue(self, name):
        try:
            return self._queues[name]
        except KeyError:
            new_queue = Queue(name, connection=self._redis_conn)
            self._queues[name] = new_queue
            new_worker = Worker([new_queue], connection=self._redis_conn, name=name)
            new_worker.work()  # Blocking :(
            return new_queue

    def add_job(self, queue, func, *func_args):
        q = self._get_queue(queue)
        job = q.enqueue(func, *func_args, retry=Retry(max=3))
        return job
As can be seen, the work() function blocks execution, while I want it to run in the background. I guess I could just create another thread here and call work() from that thread while the main thread returns the job; however, this seems a bit awkward to me. Is there a built-in Redis (or other well-known module) solution for this?
PS, better names for my class are welcome :)
This is my take on multiprocessing it (threading won't work, because the rq worker installs signal handlers, which is only allowed in the main thread):
import multiprocessing as mp

from redis import Redis
from rq import Queue, Retry, Worker


class WorkerPool:  # TODO: find a better name
    def __init__(self):
        self._queues = {}
        self._worker_procs = []
        self._redis_conn = Redis()

    def __del__(self):
        for proc in self._worker_procs:
            proc.kill()

    def _get_queue(self, name):
        try:
            return self._queues[name]
        except KeyError:
            new_queue = Queue(name, connection=self._redis_conn)
            self._queues[name] = new_queue
            new_worker = Worker([new_queue], connection=self._redis_conn, name=name)
            # Run the blocking work() loop in a background process
            worker_process = mp.Process(target=new_worker.work)
            worker_process.start()
            self._worker_procs.append(worker_process)
            return new_queue

    def add_job(self, queue, func, *func_args):
        q = self._get_queue(queue)
        job = q.enqueue(func, *func_args, retry=Retry(max=3))
        return job
Not sure how good this is, but it seems to do what I want for now.
If you only need small-scale multiprocessing, tied to one main process, all running on the one machine, take a look at the multiprocessing module and the concurrent.futures module and their Pool and ProcessPoolExecutor objects. Unless you have specific requirements, it's probably better to use the Pool or ProcessPoolExecutor rather than start up Process objects manually. (In that case Redis may or may not be overkill.)
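For the small-scale case, a minimal sketch using concurrent.futures (the handle_job function is hypothetical):

from concurrent.futures import ProcessPoolExecutor

def handle_job(arg):
    # Hypothetical job function; replace with your real work
    return arg * 2

if __name__ == '__main__':
    # map distributes the calls across 4 worker processes
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(handle_job, range(10)))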
If your needs are larger-scale, workers across multiple machines, there's a whole category of software for running these; RabbitMQ is one widely-known one, but it's just one of several, each with its own strengths and weaknesses. Each of the cloud providers (if you're in the cloud) also has its own offering for this functionality. You probably want to read up on the features of several of the off-the-shelf solutions, decide which one is a good match, then set that up.
That said, I have in the past implemented a custom Redis-based queueing system; sometimes you really do need something not provided by any of the existing solutions. In that situation, the design will be heavily influenced by what features you do need. In my case, it was fine-grained priorities...
I'm using multiprocessing.dummy.Pool to issue RESTful API calls in parallel.
For now the code looks like:
from multiprocessing.dummy import Pool


def onecall(args):
    env = args[0]
    option = args[1]
    return env.call(option)  # call() returns a list


def call_all():
    threadpool = Pool(processes=4)
    all_item = []
    for item in threadpool.imap_unordered(onecall, ((create_env(), x) for x in range(100))):
        all_item.extend(item)
    return all_item
In the code above, the env object wraps a requests.Session() object and is thus in charge of maintaining the connection session. The 100 tasks use 100 different env objects, so each task just creates 1 connection, makes 1 API call, and disconnects.
However, to enjoy the benefit of HTTP keep-alive, I want the 100 tasks to share 4 env objects (one object per thread), so that each connection serves multiple API calls one by one. How should I achieve that?
Using threading.local seems to work: each pool thread lazily creates its own env the first time it runs a task and then reuses it, so at most four sessions (and their keep-alive connections) exist.
from multiprocessing.dummy import Pool
import threading

tlocal = threading.local()


def getEnv():
    try:
        return tlocal.env
    except AttributeError:
        # First call on this thread: create and cache the env
        tlocal.env = create_env()
        return tlocal.env


def onecall(args):
    option = args[0]
    return getEnv().call(option)  # call() returns a list


def call_all():
    threadpool = Pool(processes=4)
    all_item = []
    for item in threadpool.imap_unordered(onecall, ((x,) for x in range(100))):
        all_item.extend(item)
    return all_item
Below is my code, and I'm really new to Python. My code creates many threads (above 1000), but at some point, at nearly 800 threads, I get an error message saying "error: cannot start new thread". I did read a little about thread pools, but I couldn't really understand it. In my code, how can I implement a thread pool? Or at least, please explain it to me in a simple way.
#!/usr/bin/python
import threading
import urllib

lock = threading.Lock()


def get_wip_info(query_str):
    try:
        temp = urllib.urlopen(query_str).read()
    except:
        temp = 'ERROR'
    return temp


def makeURLcall(arg1, arg2, arg3, file_output, dowhat, result):
    url1 = "some URL call with args"
    url2 = "some URL call with args"
    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)
    lock.acquire()
    report = open(file_output, "a")
    report.writelines("%s - %s\n" % (serial, result))
    report.close()
    lock.release()
    return


testername = "arg1"
stationcode = "arg2"
dowhat = "OUT"
result = "PASS"
file_source = "sourcefile.txt"
file_output = "resultfile.txt"

readfile = open(file_source, "r")
Data = readfile.readlines()
threads = []

for SNs in Data:
    SNs = SNs.strip()
    print SNs
    thread = threading.Thread(target=makeURLcall, args=(SNs, args1, testername, file_output, dowhat, result))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()
Don't implement your own thread pool, use the one that ships with Python.
On Python 3, you can use concurrent.futures.ThreadPoolExecutor to use threads explicitly; on Python 2.6 and higher, you can import Pool from multiprocessing.dummy, which offers an API similar to multiprocessing but is backed by threads instead of processes.
Of course, if you need to do CPU bound work in CPython (the reference interpreter), you'd want to use multiprocessing proper, not multiprocessing.dummy; Python threads are fine for I/O bound work, but the GIL makes them pretty bad for CPU bound work.
Here's code to replace your explicit use of Threads with multiprocessing.dummy's Pool, using a fixed number of workers that each complete tasks as fast as possible, one after another, rather than having an unbounded number of single-job threads. First off, since the local I/O is likely to be fairly cheap and you want to synchronize the output, we'll make the worker task return the resulting data rather than write it out itself, and have the main thread do the write to local disk (removing the need for locking, as well as the need to open the file over and over). This changes makeURLcall to:
# Accept args as a single sequence to ease use of imap_unordered,
# and unpack on the first line
def makeURLcall(args):
    arg1, arg2, arg3, dowhat, result = args
    url1 = "some URL call with args"
    url2 = "some URL call with args"
    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)
    return "%s - %s\n" % (serial, result)
And now for the code that replaces your explicit thread use:
import multiprocessing.dummy as mp
from contextlib import closing

# Open input and output files and create the pool
# Odds are that 32 is enough workers to saturate the connection,
# but you can play around; somewhere between 16 and 128 is likely to be the
# sweet spot for network I/O
with open(file_source) as inf,\
     open(file_output, 'w') as outf,\
     closing(mp.Pool(32)) as pool:
    # Define a generator that creates tuples of arguments to pass to makeURLcall
    # We also read the file in lazily instead of using readlines, to
    # start producing results faster
    tasks = ((SNs.strip(), args1, testername, dowhat, result) for SNs in inf)
    # Pull and write results from the workers as they become available
    outf.writelines(pool.imap_unordered(makeURLcall, tasks))
# Once we leave the with block, input and output files are closed, and
# pool workers are cleaned up
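On Python 3, the same approach with concurrent.futures looks roughly like this (a sketch with the same names; note that Executor.map yields results in submission order rather than completion order):

from concurrent.futures import ThreadPoolExecutor

with open(file_source) as inf, open(file_output, 'w') as outf:
    with ThreadPoolExecutor(max_workers=32) as pool:
        tasks = ((SNs.strip(), args1, testername, dowhat, result) for SNs in inf)
        # writelines consumes the results lazily, in submission order
        outf.writelines(pool.map(makeURLcall, tasks))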
I need to make a blocking XML-RPC call from my Python script to several physical servers simultaneously, and perform actions based on the response from each server independently.
To explain in detail, let us assume the following pseudo code:

while True:
    response = call_to_server1()  # blocking and takes a very long time
    if response == this:
        do that

I want to do this for all the servers simultaneously and independently, but from the same script.
Use the threading module.
Boilerplate threading code (I can tailor this if you give me a little more detail on what you are trying to accomplish)
import threading


def run_me(func):
    while not stop_event.isSet():
        response = func()  # blocking and takes a very long time
        if response == this:
            do that


def call_to_server1():
    # code to call server 1...
    return magic_server1_call()


def call_to_server2():
    # code to call server 2...
    return magic_server2_call()


# used to stop your loop
stop_event = threading.Event()

t = threading.Thread(target=run_me, args=(call_to_server1,))
t.start()

t2 = threading.Thread(target=run_me, args=(call_to_server2,))
t2.start()

# wait for the threads to return
t.join()
t2.join()
# we are done...
You can use the multiprocessing module:
import multiprocessing


def call_to_server(ip, port):
    ....
    ....


process = []
for i in xrange(server_count):
    process.append(multiprocessing.Process(target=call_to_server, args=(ip, port)))
    process[i].start()

# wait for the processes to stop
for p in process:
    p.join()
You can use multiprocessing plus queues. Here is an example with one single sub-process:
import multiprocessing
import time


def processWorker(input, result):
    def remoteRequest(params):
        ## this is my remote request
        return True
    while True:
        work = input.get()
        if 'STOP' in work:
            break
        result.put(remoteRequest(work))


input = multiprocessing.Queue()
result = multiprocessing.Queue()

p = multiprocessing.Process(target=processWorker, args=(input, result))
p.start()

requestlist = ['1', '2']
for req in requestlist:
    input.put(req)

for i in xrange(len(requestlist)):
    res = result.get(block=True)
    print 'retrieved ', res

input.put('STOP')
time.sleep(1)
print 'done'
To have more than one sub-process, simply use a list object to store all the sub-processes you start.
The multiprocessing queue is a process-safe object.
Then you may keep track of which request is being executed by each sub-process simply by storing the request associated with a workid (the workid can be a counter incremented when the queue gets filled with new work), as sketched below. Usage of multiprocessing.Queue is robust since you do not need to rely on stdout/stderr parsing, and you also avoid the related limitations.
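A minimal sketch of that bookkeeping (assuming processWorker is adapted to echo the id back with result.put((workid, remoteRequest(req)))):

# Tag each piece of work with a counter so results can be matched to requests
pending = {}
for workid, req in enumerate(requestlist):
    pending[workid] = req
    input.put((workid, req))

for i in xrange(len(requestlist)):
    workid, res = result.get(block=True)
    print 'request', pending.pop(workid), 'returned', res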
Then, you can also set a timeout on how long you want a get call to wait at most, e.g.:

import Queue

try:
    res = result.get(block=True, timeout=10)
except Queue.Empty:
    print 'timed out'
Use Twisted.
It has a lot of useful stuff for working with the network, and it is also very good at working asynchronously.
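For instance, a rough sketch (reusing the hypothetical call_to_server1/call_to_server2 helpers from the threading answer above) that runs each blocking call in Twisted's thread pool and reacts to each response independently:

from twisted.internet import reactor
from twisted.internet.threads import deferToThread

def handle_response(response):
    # perform the per-server actions here; replace the print with the
    # real "if response == this: do that" logic from the question
    print(response)

# Each Deferred fires independently when its server's blocking call returns
for call in (call_to_server1, call_to_server2):
    deferToThread(call).addCallback(handle_response)

reactor.run()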