Async coroutine doesn't seem to finish - python

Just poking my toe into asynchronous programming in Python, and ran into an interesting application for it where I need to gather file sizes on about 10 files each on ~100 machines to see which machines aren't purging their log files appropriately.
My synchronous approach was:
import glob
import os
from collections import namedtuple
File_info = namedtuple("File_info", "machineinfo size")
machines = utils.list_machines()  # the computers being queried
# each machine object has attributes like "name", "IP", and "site_id" among others
file_sizes = {}
# {filename: [File_info, ...], ...}
for m in machines:
    print(f"Processing {m}...")  # this is "Processing {m}...".format(m=m)
    # isn't Python 3.6 awesome?!
    for path in glob.glob(f"//{m.IP}/somedir/*.dbf"):
        fname = os.path.split(path)[-1].lower()
        machineinfo = (m.site_id,)
        size = os.stat(path).st_size
        file_sizes.setdefault(fname, []).append(File_info(machineinfo, size))
This works great, but takes a long time with the network operations pulling all those globs and stats. I wanted to use Python 3.5's async/await syntax with asyncio to make those calls asynchronous. Here's what I came up with:
import asyncio
File_info = namedtuple("File_info", "machineinfo size")
machines = utils.list_machines()
file_sizes = {}
# {filename: [File_info, ...], ...}
async def getfilesizes(machine, loop):
    machineinfo = machine.site_id,
    paths = glob.glob(f"//{machine.IP}/somedir/*.dbf")
    coros = [getsize(path) for path in paths]
    results = loop.run_until_complete(asyncio.gather(*coros))
    sizes = {fname: File_info(machineinfo, size) for (fname, size) in results}
    return sizes
async def getsize(path):
    return os.path.split(path)[-1], os.stat(path).st_size
loop = asyncio.get_event_loop()
results = loop.run_until_complete(asyncio.gather(*(getfilesizes(m, loop) for m in machines)))
for result in results:
    # I have a problem here since my dict values are lists that need to extend
    # not overwrite, but that's not relevant for the error I'm getting
    pass
However, the script hangs inside the outer loop.run_until_complete call. What am I doing wrong?

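A coroutine that wants to run another coroutine (as getfilesizes does with getsize) should await it rather than calling loop.run_until_complete on it. run_until_complete is meant for driving the loop from ordinary synchronous code, not from inside a coroutine that is already running on that same loop.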
async def getfilesizes(machine):  # changed func sig
    machineinfo = machine.site_id,
    paths = glob.glob(f"//{machine.IP}/somedir/*.dbf")
    coros = [getsize(path) for path in paths]
    results = await asyncio.gather(*coros)  # await the results instead!
    sizes = {fname: File_info(machineinfo, size) for (fname, size) in results}
    return sizes
Since asyncio.gather creates one Future from any number of coroutines, this await functionally acts on the whole group of coroutines and grabs all their results at once.
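One caveat worth adding: getsize contains no await, and glob.glob and os.stat are ordinary blocking calls, so even after this fix each coroutine runs to completion without yielding to the event loop and the network calls still happen one after another. A minimal sketch, assuming Python 3.9+ for asyncio.to_thread, pushes the blocking calls onto the default thread pool so they can overlap. The reworked getfilesizes signature and the `patterns` mapping passed to the hypothetical `main` (machine info to glob pattern) are stand-ins for utils.list_machines(); the merge step extends the per-file lists instead of overwriting them:

```python
import asyncio
import glob
import os
from collections import namedtuple

File_info = namedtuple("File_info", "machineinfo size")


async def getsize(path):
    # os.stat blocks, so run it in the default thread pool instead of on the event loop
    st = await asyncio.to_thread(os.stat, path)
    return os.path.basename(path).lower(), st.st_size


async def getfilesizes(pattern, machineinfo):
    # glob.glob also blocks (it lists a network share here), so offload it too
    paths = await asyncio.to_thread(glob.glob, pattern)
    results = await asyncio.gather(*(getsize(p) for p in paths))
    return {fname: File_info(machineinfo, size) for fname, size in results}


async def main(patterns):
    # patterns: hypothetical {machineinfo: glob pattern} mapping, one entry per machine
    per_machine = await asyncio.gather(
        *(getfilesizes(pattern, info) for info, pattern in patterns.items())
    )
    file_sizes = {}
    for sizes in per_machine:
        for fname, info in sizes.items():
            # extend the per-file list instead of overwriting it
            file_sizes.setdefault(fname, []).append(info)
    return file_sizes
```

For example, asyncio.run(main({(m.site_id,): f"//{m.IP}/somedir/*.dbf" for m in machines})) would produce the same {filename: [File_info, ...]} shape as the synchronous loop. The default executor has a bounded number of worker threads, so a long machine list is throttled rather than opening every connection at once.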


Python Async what is causing the memory leak?

I am downloading zip files and looking inside them to check their contents for a few million items, but I am constantly accruing memory and I will eventually go OOM, even with small semaphores.
Consider the block:
async def zip_reader(self, blobFileName, blobEndPoint, semaphore):
    # access blob
    async with ClientSecretCredential(TENANT, CLIENTID, CLIENTSECRET) as credential:
        async with BlobServiceClient(account_url="", credential=credential, max_single_get_size=64 * 1024 * 1024, max_chunk_get_size=32 * 1024 * 1024) as blob_service_client:
            async with blob_service_client.get_blob_client(container=blobEndPoint, blob=blobFileName) as blob_client:
                async with semaphore:
                    logging.info(f"Starting: {blobFileName}, {blobEndPoint}")
                    # open bytes
                    writtenbytes = io.BytesIO()
                    # write file to it
                    stream = await blob_client.download_blob(max_concurrency=25)
                    stream = await stream.readinto(writtenbytes)
                    # zipfile
                    f = ZipFile(writtenbytes)
                    # file list
                    file_list = [s for s in f.namelist()]
                    # send to df
                    t_df = pd.DataFrame({'fileList': file_list})
                    # add fileName
                    t_df['blobFileName'] = blobFileName
                    t_df['blobEndPoint'] = blobEndPoint
                    if semaphore.locked():
                        await asyncio.sleep(1)
                    logging.info(f"Completed: {blobFileName}")
                    # clean up here; also tried del on objs here as well
                    return t_df
async def cleanup(self):
    await asyncio.sleep(1)
async def async_file_as_bytes_generator(self, blobFileName, blobEndPoint, semaphore):
    # main caller
    semaphore = asyncio.Semaphore(value=semaphore)
    return await asyncio.gather(*[self.zip_reader(fn, ep, semaphore) for fn, ep in zip(blobFileName, blobEndPoint)])  # also tried attaching here
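asyncio.gather has no strategy at all for limiting the number of tasks executing simultaneously: it wraps every coroutine in a task right away. Your semaphore may limit how many blobs are being fetched and processed at once, but gather waits for all the data frames to be available and returns them all at once, so every data frame stays in memory until the very end. (Note also that zip_reader only acquires the semaphore after opening its credential and clients, so every waiting task holds those open.)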
Instead of a single await asyncio.gather, use something like asyncio.wait (with a timeout, or with return_when=asyncio.FIRST_COMPLETED), keep control of how many tasks are running, and yield each completed dataframe as soon as it is ready.
Also, you didn't show the rest of your program leading up to the call to async_file_as_bytes_generator, but of course it will have to consume the dataframes as they are yielded and dispose of them.
And you should not need explicit calls to gc.collect. CPython frees an object as soon as its last reference goes away, and the cycle collector runs automatically. If your program still keeps references to the objects consuming the memory, there is nothing gc.collect could do anyway.
Your "main caller" can be something along these lines, but as I noted, you have to check the code that calls it so that it consumes each dataframe as it arrives, instead of expecting a list with all the dataframes as your current code does.
async def async_file_as_bytes_generator(self, blobFileName, blobEndPoint, task_limit):
    # main caller
    semaphore = asyncio.Semaphore(value=task_limit)
    # (fileName, endPoint) pairs not yet started; coroutines are only created when a slot frees up
    pending = list(zip(blobFileName, blobEndPoint))
    current_tasks = set()
    while pending or current_tasks:
        # top up the running set until task_limit tasks are in flight
        while pending and len(current_tasks) < task_limit:
            fn, ep = pending.pop()
            current_tasks.add(asyncio.create_task(self.zip_reader(fn, ep, semaphore)))
        done, incomplete = await asyncio.wait(current_tasks, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            # optionally check for task exception
            yield task.result()
        current_tasks = incomplete
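To try the pattern in isolation, here is a standalone sketch of the same bounded producer as a plain async generator, plus a consumer that handles each result and drops it before the next one arrives. bounded_results and fake_zip_reader are names invented for this example; fake_zip_reader is a hypothetical stand-in for zip_reader that returns a small list instead of a DataFrame. It takes an iterable of arguments, so it never creates millions of coroutines up front:

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, TypeVar

A = TypeVar("A")
T = TypeVar("T")


async def bounded_results(
    worker: Callable[[A], Awaitable[T]], args: Iterable[A], task_limit: int
) -> AsyncIterator[T]:
    """Run worker(arg) for each arg, at most task_limit at once, yielding results as they finish."""
    pending = iter(args)  # coroutines are created lazily, only when a slot frees up
    running = set()
    exhausted = False
    while True:
        # top up the running set until task_limit tasks are in flight
        while not exhausted and len(running) < task_limit:
            try:
                arg = next(pending)
            except StopIteration:
                exhausted = True
                break
            running.add(asyncio.create_task(worker(arg)))
        if not running:
            return
        done, running = await asyncio.wait(running, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            yield task.result()  # re-raises the worker's exception, if it raised one


async def fake_zip_reader(name: str) -> list:
    # hypothetical stand-in for zip_reader: pretend to download and list a zip
    await asyncio.sleep(0.01)
    return [f"{name}/a.txt", f"{name}/b.txt"]


async def main() -> int:
    total = 0
    blobs = (f"blob{i}" for i in range(20))
    async for file_list in bounded_results(fake_zip_reader, blobs, task_limit=4):
        total += len(file_list)  # consume the result, then let it go out of scope
    return total
```

asyncio.run(main()) returns 40 (20 fake blobs with two names each). With task_limit=4, at most four fake_zip_reader calls are in flight at any moment, and each list becomes garbage as soon as the loop body is done with it.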

Concurrency issue data corruption asyncio python-can locking/queues - nested dictionaries

I am at the end of a very long journey...
Here is the long story if you are interested.
Sorry for the incredibly long code snippet but I am not sure where I am going wrong so I thought more is more.
The code is as follows. request_inst() requests instrumentation data from an MCU using the dict request_info; the MCU responds, and the response is picked up by the listener. message_obtain() creates a future in which to store all_data for each message the listener yields via msg = await reader.get_message(). I attempt to structure this process with a lock. store_data() is where I store the response data from the MCU, in a dict called all_data. When printed outside of the listener, all_data shows zero values, as shown below. The purpose of the code is to make all_data available outside of the event loop, but even with this implementation I cannot get all_data to appear without zero values showing up in the dict.
import asyncio
import can
from can.notifier import MessageRecipient
from typing import List
freq = 0.0003
# these are the response ids and the parameter ids of the data
# stored data is supposed to fill up the None with a value
all_data = {268439810: {16512: [None], 16513: [None], 16514: [None], 16515: [None]},
            268444162: {16512: [None], 16513: [None], 16514: [None], 16515: [None]}}
request_info = {286326784: {16512, 16513, 16514, 16515},
                287440896: {16512, 16513, 16514, 16515}}
# all the request ids that have been configured
cm4_read_ids = [286326784, 287440896]
# all the response ids that have been configured
mcu_respond_ids = [268439810, 268444162]
# finds arb id and pid and logs the data from the message
# async def store_data(arb_id, msg_data, msg_dlc):
async def store_data(msg: can.Message, lock):
    pid = int.from_bytes(msg.data[0:2], 'little')
    arb_id = msg.arbitration_id
    if arb_id in mcu_respond_ids:
        async with lock:
            if msg.dlc == 5:
                all_data[arb_id][pid][0] = int.from_bytes(msg.data[2:4], 'little', signed=False)
            elif msg.dlc == 7:
                all_data[arb_id][pid][0] = int.from_bytes(msg.data[2:6], 'little', signed=False)
    return all_data
async def request_inst(bus: can.Bus):
    print('Request inst active')
    while True:
        for key in request_info:
            for val in request_info[key]:
                pid = int(val)
                pidbytes = pid.to_bytes(2, 'little')
                msg = can.Message(arbitration_id=key, data=pidbytes)
                bus.send(msg)
                await asyncio.sleep(freq)
                # await store_data(reader)
async def message_obtain(reader: can.AsyncBufferedReader, lock):
    print('Started it the get message process')
    while True:
        await asyncio.sleep(0.01)
        msg = await reader.get_message()
        future = await store_data(msg, lock)
        async with lock:
            print('This is the future')
            print(future)
async def main() -> None:
    with can.Bus(
        interface="socketcan", channel="can0", receive_own_messages=True
    ) as bus:
        # Create Notifier with an explicit loop to use for scheduling of callbacks
        loop = asyncio.get_running_loop()
        reader = can.AsyncBufferedReader()
        lock = asyncio.Lock()
        listeners: List[MessageRecipient] = [reader]
        notifier = can.Notifier(bus, listeners, loop=loop)
        task1 = asyncio.create_task(request_inst(bus))
        task2 = asyncio.create_task(message_obtain(reader, lock))
        try:
            await asyncio.gather(task1, task2)
        except KeyboardInterrupt:
            pass
if __name__ == "__main__":
    asyncio.run(main())
The issue I am seeing is that even on this cut-down, one-page wonder I am seeing what I believe to be concurrency issues.
As you can see, I have tried locking, but I am still seeing these zero values appear in all_data. See below, where pid 16514's value becomes 0 when the messages being returned by the MCU are not zero.
n.b. the output below where the incorrect data is shown is the output from print(future)
The real value should never be 0 as it is a measured value.
b6 = (1016872961).to_bytes(4, 'little')
struct.unpack('<f', b6)
Am I doing anything very stupid? It feels like I am not accessing the data in the listener correctly despite using lock when all_data is being modified.
If I print from the listeners the data is always correct even when all_data is returning 0 values.
If anyone is able to help me it would be much appreciated.
It appears the problem was not software related and the zeros were real. Every day is a learning day!
This is a PCAN image of the highlighted send and a response shown at the top.
EDIT: Confirmed MCU response issue; it looks like the firmware on the MCU. My Rigol wouldn't trigger on the ID, so I had to record video at 1/8 speed and then take a screenshot of it to catch it in the act. You can see the response is all 7 bytes of nothing.
Ok ok ok, it turns out I was doing something really silly. I was using nested dictionaries to store data about different ids but the keys were the same. After some investigation using id(id1[some_pid1]) and id(id2[some_pid1]) I discovered the keys had the same memory address.
data_dict = {id1: {some_pid1: value, some_pid2: value},
id2: {some_pid1: value, some_pid2: value}}
To all appearances this was a race condition, but actually I was just writing zeros (which turned out to be forced by the MCU) to the wrong id because it shared a key with the other id.
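The answer doesn't show the code that actually built the dictionary, so the mechanism here is an assumption: a common way for two ids to end up writing to the same place is building the outer dict with dict.fromkeys (or reusing one inner dict for every id), which stores the same inner object under every key. An identity check (id() or is), like the one described above, reveals it, and building a fresh inner dict per id fixes it. The ids and pids are taken from the question's code:

```python
pids = [16512, 16513, 16514, 16515]
arb_ids = [268439810, 268444162]

# Pitfall: dict.fromkeys evaluates the value once and stores that same
# inner dict under every key, so both ids share one set of lists.
shared = dict.fromkeys(arb_ids, {pid: [None] for pid in pids})
print(shared[268439810] is shared[268444162])  # True
shared[268439810][16514][0] = 0
print(shared[268444162][16514][0])  # 0, although only the first id was written

# Fix: build a fresh inner dict (with fresh lists) for each id.
separate = {arb: {pid: [None] for pid in pids} for arb in arb_ids}
print(separate[268439810] is separate[268444162])  # False
separate[268439810][16514][0] = 0
print(separate[268444162][16514][0])  # None
```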

Socket error (An operation was attempted on something that is not a socket) on aiohttp function

async def simultaneous_chunked_download(urls_paths, label):
    timeout = ClientTimeout(total=60000)
    sem = asyncio.Semaphore(5)
    async with aiohttp.ClientSession(timeout=timeout, connector=aiohttp.TCPConnector(verify_ssl=False)) as cs:
        async def _fetch(r, path):
            async with sem:
                async with aiofiles.open(path, "wb") as f:
                    async for chunk in r.content.iter_any():
                        if not chunk:
                            break
                        size = await f.write(chunk)
                        if not indeterminate:
                            bar._done += size
                if indeterminate:
                    bar._done += 1
        indeterminate = False
        total_length = 0
        tasks = []
        for url, path in urls_paths.items():
            r = await cs.get(url)
            if not indeterminate:
                try:
                    total_length += r.content_length
                except Exception:
                    indeterminate = True
            tasks.append(_fetch(r, path))
            verbose_print(f"url: {url},\npath: {path}\n\n")
        if not indeterminate:
            bar = progress.Bar(
                expected_size=total_length, label=label, width=28, hide=False
            )
        else:
            bar = progress.Bar(
                expected_size=len(tasks), label=label, width=28, hide=False
            )
        logger._pause_file_output = True
        bar._done = 0
        await asyncio.gather(*tasks)
        logger._pause_file_output = False
The function I have above is for downloading a dictionary of urls asynchronously and then printing out a progress bar. An example of its usage:
The code itself runs perfectly fine; however, I keep getting these errors:
Whilst benign, they are an eyesore and could point to gaps in my knowledge of both HTTP and asynchronous code, so I would rather try to get them fixed. However, I'm at a loss as to where they come from or what is causing them, especially as, like I said, the code runs perfectly fine regardless.
If you would like a more practical, hands-on attempt at recreating this, the full code is on my GitHub repo on the dev branch:
Most of the program can be disregarded if you are testing this out: just press the install button and the problematic code will show itself towards the end.
Bear in mind this is a Spotify themer, so if you have Spotify/Spicetify installed you will want to use a VM.
# Create App
globals.app = QtWidgets.QApplication(sys.argv)
# Configure asyncio loop to work with PyQt5
loop = QEventLoop(globals.app)
# Setup GUI
globals.gui = gui.MainWindow()
# Set off loop
with loop:
    ...  # (the call that runs the loop is not shown)
class MainWindow(QuickWidget):
    def __init__(self):
        super().__init__()
        self.exit_request = asyncio.Event()
    def closeEvent(self, *args):
        self.exit_request.set()
Asyncio and aiohttp have some problems when running a lot of tasks concurrently on Windows; I've been having a lot of problems with it lately.
There are some workarounds available, the ones I use most are:
# set this before your event loop initialization or main function
loop = asyncio.ProactorEventLoop()
asyncio.set_event_loop(loop)
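For reference, another workaround often suggested for aiohttp on Windows goes the other way: it switches from the Proactor loop (the Windows default since Python 3.8) to the selector loop through the policy API. This is a hedged sketch, not a confirmed fix for this particular error; with qasync's QEventLoop, which supplies its own loop, the policy may have no effect at all. use_selector_loop_on_windows is a name made up for this example, and the policy functions are deprecated as of Python 3.14:

```python
import asyncio
import sys


def use_selector_loop_on_windows() -> None:
    # Call once, before any event loop is created.
    # WindowsSelectorEventLoopPolicy only exists on Windows builds of Python.
    if sys.platform == "win32":
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())


async def main() -> str:
    # placeholder for the real download coroutine
    return "ok"


if __name__ == "__main__":
    use_selector_loop_on_windows()
    print(asyncio.run(main()))
```

On non-Windows platforms the function does nothing, so it is safe to call unconditionally at startup.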

Using concurrent.futures within a for statement

I store QueryText within a pandas dataframe. Once I've loaded all the queries into fullAnalysis, I want to run an analysis against each query. Currently, I have ~50k to evaluate, so doing them one by one will take a long time.
So, I wanted to implement concurrent.futures. How do I take each individual QueryText stored within fullAnalysis, pass it to concurrent.futures, and return the output as a variable?
Here is my entire code:
import pandas as pd
import time
import gensim
import sys
import warnings
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import as_completed
fullAnalysis = pd.DataFrame()
def fetch_data(jFile = 'ProcessingDetails.json'):
    print("Fetching data...please wait")
    #read JSON file for latest dictionary file name
    baselineDictionaryFileName = 'Dictionary/Dictionary_05-03-2020.json'
    #copy data to pandas dataframe
    labelled_data = pd.read_json(baselineDictionaryFileName)
    #Add two more columns to get the most similar text and score
    labelled_data['SimilarText'] = ''
    labelled_data['SimilarityScore'] = float()
    print("Data fetched from " + baselineDictionaryFileName + " and there are " + str(labelled_data.shape[0]) + " rows to be evaluated")
    return labelled_data
def calculateScore(inputFunc):
    warnings.filterwarnings("ignore", category=DeprecationWarning)
    model = gensim.models.Word2Vec.load('w2v_model_bigdata')
    inp = inputFunc
    out = dict()
    strEvaluation = inp.split("most_similar ",1)[1]
    #while inp != 'quit':
    split_inp = inp.split()
    try:
        if split_inp[0] == 'help':
            pass
        elif split_inp[0] == 'similarity' and len(split_inp) >= 3:
            pass
        elif split_inp[0] == 'most_similar' and len(split_inp) >= 2:
            for pair in model.most_similar(positive=[split_inp[1]]):
                out.update({pair[0]: pair[1]})
    except KeyError as ke:
        #print(str(ke) + "\n")
        inp = input()
    return out
def main():
    with ThreadPoolExecutor(max_workers=5) as executor:
        for i in range(len(fullAnalysis)):
            text = fullAnalysis['QueryText'][i]
            arg = 'most_similar'+ ' ' + text
            #for item in executor.map(calculateScore, arg):
            output = executor.map(calculateScore, arg)
    return output
if __name__ == "__main__":
    fullAnalysis = fetch_data()
    results = main()
    print(f'results: {results}')
The Python Global Interpreter Lock or GIL allows only one thread to hold control of the Python interpreter. Since your function calculateScore might be cpu-bound and requires the interpreter to execute its byte code, you may be gaining little by using threading. If, on the other hand, it were doing mostly I/O operations, it would be giving up the GIL for most of its running time allowing other threads to run. But that does not seem to be the case here. You probably should be using the ProcessPoolExecutor from concurrent.futures (try it both ways and see):
from concurrent.futures import ProcessPoolExecutor, as_completed
def main():
    with ProcessPoolExecutor(max_workers=None) as executor:
        the_futures = {}
        for i in range(len(fullAnalysis)):
            text = fullAnalysis['QueryText'][i]
            arg = 'most_similar'+ ' ' + text
            future = executor.submit(calculateScore, arg)
            the_futures[future] = i # map future to request
        for future in as_completed(the_futures): # results as they become available not necessarily the order of submission
            i = the_futures[future] # the original index
            result = future.result() # the result
If you omit the max_workers parameter (or specify a value of None) from the ProcessPoolExecutor constructor, the default will be the number of processors you have on your machine (not a bad default). There is no point in specifying a value larger than the number of processors you have.
If you do not need to tie the future back to the original request, then the_futures can just be a list to which you append each future. But simplest yet is not to bother with the as_completed method at all:
def main():
    with ProcessPoolExecutor(max_workers=5) as executor:
        the_futures = []
        for i in range(len(fullAnalysis)):
            text = fullAnalysis['QueryText'][i]
            arg = 'most_similar'+ ' ' + text
            future = executor.submit(calculateScore, arg)
            the_futures.append(future)
        # wait for the completion of all the results and return them all:
        results = [f.result() for f in the_futures] # results in creation order
    return results
It should be mentioned that the code that launches the ProcessPoolExecutor should be in a block governed by if __name__ == '__main__':. If it isn't, then on platforms where each worker process re-imports your script (such as Windows), each subprocess will try to launch the ProcessPoolExecutor again and you get into a recursive loop. Your code does already have that guard, though. Perhaps you meant to use the ProcessPoolExecutor all along?
I don't know what the line ...
model = gensim.models.Word2Vec.load('w2v_model_bigdata')
... in function calculateScore does. It may be the one I/O-bound statement. But this appears to be something that does not vary from call to call. If that is the case and model is not being modified in the function, shouldn't this statement be moved out of the function and computed just once? Then this function would run faster (and be clearly CPU-bound).
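With a process pool, "computed just once" really means once per worker process, since each worker has its own memory. One way to do that is the initializer/initargs parameters of ProcessPoolExecutor (Python 3.7+), which run a function once in each worker when it starts. A minimal sketch, assuming the model is only read: load_model is a hypothetical stand-in for gensim.models.Word2Vec.load that returns a tiny dict so the example runs without gensim, and calculate_score, score_all and the dict lookup are illustrative, not the real most_similar logic:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

_model = None  # set once per worker by _init_worker


def load_model(path):
    # Hypothetical stand-in for gensim.models.Word2Vec.load(path): a tiny
    # {word: [(similar_word, score), ...]} dict so the sketch runs without gensim.
    return {"cat": [("dog", 0.9), ("kitten", 0.8)]}


def _init_worker(model_path):
    # Runs once in each worker when it starts, not once per task
    global _model
    _model = load_model(model_path)


def calculate_score(query):
    # Reuses the worker's already-loaded model instead of reloading it per call
    word = query.split()[0]
    return dict(_model.get(word, []))


def score_all(queries, model_path, executor_cls=ProcessPoolExecutor, max_workers=None):
    results = [None] * len(queries)
    with executor_cls(max_workers=max_workers, initializer=_init_worker,
                      initargs=(model_path,)) as executor:
        futures = {executor.submit(calculate_score, q): i for i, q in enumerate(queries)}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # store by original index
    return results
```

Call it from inside the if __name__ == "__main__": guard, e.g. score_all(list(fullAnalysis['QueryText']), 'w2v_model_bigdata'). With the stand-in model, score_all(["cat", "fish"], "w2v_model_bigdata") gives [{'dog': 0.9, 'kitten': 0.8}, {}]. ThreadPoolExecutor accepts the same initializer arguments, which makes it easy to compare the two executors as suggested above.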
The exception block ...
except KeyError as ke:
    #print(str(ke) + "\n")
    inp = input()
... is puzzling. You are inputting a value that will never be used right before returning. If this is to pause execution, there is no error message being output.
With Booboo's assistance, I was able to update the code to use ProcessPoolExecutor. Here is my updated code. Overall, processing has sped up by more than 60%.
I did run into a processing issue and found a topic on BrokenProcessPool that addresses it.
import operator
output = {}
thePool = {}
def main(labelled_data, dictionaryRevised):
    args = sys.argv[1:]
    with ProcessPoolExecutor(max_workers=None) as executor:
        for i in range(len(labelled_data)):
            text = labelled_data['QueryText'][i]
            arg = 'most_similar'+ ' '+ text
            output = winprocess.submit(
                executor, calculateScore, arg
            )
            thePool[output] = i #original index for future to request
        for output in as_completed(thePool): # results as they become available not necessarily the order of submission
            i = thePool[output] # the original index
            text = labelled_data['QueryText'][i]
            result = output.result() # the result
            maximumKey = max(result.items(), key=operator.itemgetter(1))[0]
            maximumValue = result.get(maximumKey)
            # .at writes to the frame itself, avoiding chained assignment
            labelled_data.at[i, 'SimilarText'] = maximumKey
            labelled_data.at[i, 'SimilarityScore'] = maximumValue
    return labelled_data, dictionaryRevised
if __name__ == "__main__":
    start = time.perf_counter()
    print("Starting to evaluate Query Text for labelling...")
    output_Labelled_Data, output_dictionary_revised = preProcessor()
    output,dictionary = main(output_Labelled_Data, output_dictionary_revised)
    finish = time.perf_counter()
    print(f'Finished in {round(finish-start, 2)} second(s)')

Converting graph traversal to multiprocessing in Python

I've been working on a graph traversal algorithm over a simple network and I'd like to run it using multiprocessing, since it is going to require a lot of I/O-bound calls when I scale it over the full network. The simple version runs pretty fast:
already_seen = {}
already_seen_get = already_seen.get
GH_add_node = GH.add_node
GH_add_edge = GH.add_edge
GH_has_node = GH.has_node
GH_has_edge = GH.has_edge
def graph_user(user, depth=0):
    logger.debug("Searching for %s", user)
    logger.debug("At depth %d", depth)
    users_to_read = followers = following = []
    if already_seen_get(user):
        logging.debug("Already seen %s", user)
        return None
    result = [x.value for x in list(view[user])]
    if result:
        result = result[0]
        following = result['following']
        followers = result['followers']
        users_to_read = set().union(following, followers)
    if not GH_has_node(user):
        logger.debug("Adding %s to graph", user)
        GH_add_node(user)
    for follower in users_to_read:
        if not GH_has_node(follower):
            logger.debug("Adding %s to graph", follower)
            GH_add_node(follower)
            if depth < max_depth:
                graph_user(follower, depth + 1)
        if GH_has_edge(follower, user):
            GH[follower][user]['weight'] += 1
        else:
            GH_add_edge(user, follower, {'weight': 1})
It's actually significantly faster than my multiprocessing version:
from multiprocessing import Pool, Queue
from queue import Empty
to_write = Queue()
to_read = Queue()
to_edge = Queue()
already_seen = Queue()
def fetch_user():
    seen = {}
    read_get = to_read.get
    read_put = to_read.put
    write_put = to_write.put
    edge_put = to_edge.put
    seen_get = seen.get
    while True:
        try:
            logging.debug("Begging for a user")
            user = read_get(timeout=1)
            if seen_get(user):
                continue
            logging.debug("Adding %s", user)
            seen[user] = True
            result = [x.value for x in list(view[user])]
            write_put(user, timeout=1)
            if result:
                result = result.pop()
                logging.debug("Got user %s and result %s", user, result)
                following = result['following']
                followers = result['followers']
                users_to_read = list(set().union(following, followers))
                [edge_put((user, x, {'weight': 1})) for x in users_to_read]
                [read_put(y, timeout=1) for y in users_to_read if not seen_get(y)]
        except Empty:
            logging.debug("Fetches complete")
            return
def write_node():
    users = []
    users_app = users.append
    write_get = to_write.get
    while True:
        try:
            user = write_get(timeout=1)
            logging.debug("Writing user %s", user)
            users_app(user)
        except Empty:
            logging.debug("Users complete")
            return users
def write_edge():
    edges = []
    edges_app = edges.append
    edge_get = to_edge.get
    while True:
        try:
            edge = edge_get(timeout=1)
            logging.debug("Writing edge %s", edge)
            edges_app(edge)
        except Empty:
            logging.debug("Edges Complete")
            return edges
if __name__ == '__main__':
    pool = Pool(processes=1)
    users = pool.apply_async(write_node)
    edges = pool.apply_async(write_edge)
What I can't figure out is why the single-process version is so much faster. In theory, the multiprocessing version should be writing and reading simultaneously. I suspect there is lock contention on the queues and that this is the cause of the slowdown, but I don't really have any evidence of that. When I scale the number of fetch_user processes it seems to run faster, but then I have issues with synchronizing the data seen across them. So some thoughts I've had are:
Is this even a good application for multiprocessing? I was originally using it because I wanted to be able to fetch from the db in parallel.
How can I avoid resource contention when reading and writing from the same queue?
Did I miss some obvious caveat for the design?
What can I do to share a lookup table between the readers so I don't keep fetching the same user twice?
When increasing the number of fetching processes, the writers eventually lock. It looks like the write queue is not being written to, but the read queue is full. Is there a better way to handle this situation than with timeouts and exception handling?
Queues in Python are synchronized. This means that only one thread at a time can read or write, which will definitely create a bottleneck in your app.
One better solution is to distribute the processing based on a hash function and assign the work to the threads with a simple modulo operation. So, for instance, if you have 4 threads you could have 4 queues:
from queue import Queue
thread_queues = []
for i in range(4):
    thread_queues.append(Queue())
for user in user_list:
    user_hash = hash(user.user_id)  # hash in here is just shortcut to some standard hash utility
    thread_id = user_hash % 4
    thread_queues[thread_id].put(user)
# From here ... your pool of threads access thread_queues but each thread ONLY accesses
# one queue based on a numeric id given to each of them.
Most hash functions will distribute your data evenly. I normally use UMAC, but you could just try the hash function from Python's string implementation.
Another improvement would be to avoid Queues altogether and use a non-synchronized object, such as a list.
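A minimal sketch of the hash-partitioning idea, using threads as described above. Because every occurrence of a given user id is routed to the same worker, each worker can keep its own private seen set, which also covers the question's "shared lookup table" problem without any shared state. It uses zlib.crc32 instead of hash(), because str hashes are randomized per interpreter process (PYTHONHASHSEED), which would matter if several processes computed the routing independently. owner, worker and partitioned_process are names made up for this example, the worker body is a hypothetical stand-in for the real fetch/write work, and it does not handle workers discovering new users, which a full traversal would need:

```python
import queue
import threading
import zlib

NUM_WORKERS = 4
_DONE = object()  # sentinel that tells a worker to stop


def owner(user_id, n=NUM_WORKERS):
    # crc32 gives the same answer in every process and run, unlike str hash()
    return zlib.crc32(str(user_id).encode()) % n


def worker(inbox, results):
    seen = set()  # private to this worker: a given user id is always routed here
    while True:
        user = inbox.get()
        if user is _DONE:
            return
        if user in seen:
            continue
        seen.add(user)
        results.append(user)  # stand-in for the real fetch/write work


def partitioned_process(users, n=NUM_WORKERS):
    inboxes = [queue.Queue() for _ in range(n)]
    outputs = [[] for _ in range(n)]
    threads = [threading.Thread(target=worker, args=(inboxes[i], outputs[i])) for i in range(n)]
    for t in threads:
        t.start()
    for user in users:
        inboxes[owner(user, n)].put(user)  # route by hash modulo worker count
    for inbox in inboxes:
        inbox.put(_DONE)
    for t in threads:
        t.join()
    return outputs
```

Each worker still reads from a synchronized queue, but no two workers ever contend for the same one, and duplicates are dropped without any cross-worker lookup.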
