The code...
import asyncio
import random
from time import perf_counter
from typing import Iterable
from pprint import pprint

async def coro(n, i, threshold=0.4):
    await asyncio.sleep(i)
    if i > threshold:
        # For illustration's sake - some coroutines may raise,
        # and we want to accommodate that and just test for exception
        # instances in the results of asyncio.gather(return_exceptions=True)
        raise Exception(f"{i} of Task-{n} is too high")
    return i

async def main(it: Iterable, timeout: float) -> tuple:
    tasks = [asyncio.create_task(coro(i+1, d), name=f"Task-{i+1}") for i, d in enumerate(it)]
    await asyncio.wait(tasks, timeout=timeout)
    return tasks  # *not* (done, pending)

timeout = 0.5
random.seed(444)
n = 10
it = [random.random() for _ in range(n)]
start = perf_counter()
tasks = asyncio.run(main(it=it, timeout=timeout))
elapsed = perf_counter() - start
print(f"Done main({n}) in {elapsed:0.2f} seconds\n")
pprint(tasks)
print('----')

# does not work from here on....
res = []
for t in tasks:
    try:
        r = t.result()  # gives an error!!!
    except Exception as e:
        res.append(e)
    else:
        res.append(r)
pprint(res)
...does not work for collecting the task results. It fails with:
Traceback (most recent call last):
  File "c:\Users\user\Documents\user\projects\learn\asyncio\wrap_gather_in_timeout.py", line 8, in coro
    await asyncio.sleep(i)
  File "C:\Users\user\AppData\Local\Programs\Python\Python39\lib\asyncio\tasks.py", line 654, in sleep
    return await future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\user\Documents\user\projects\learn\asyncio\wrap_gather_in_timeout.py", line 35, in <module>
    r = t.result()
asyncio.exceptions.CancelledError

Task exception was never retrieved
future: <Task finished name='Task-7' coro=<coro() done, defined at c:\Users\user\Documents\user\projects\learn\asyncio\wrap_gather_in_timeout.py:7> exception=Exception('i too high')>
Traceback (most recent call last):
  File "c:\Users\user\Documents\user\projects\learn\asyncio\wrap_gather_in_timeout.py", line 13, in coro
    raise Exception("i too high")
Exception: i too high
The code was run on Python 3.9.
Any idea where I am going wrong, and why?
Is it because the tasks need to be cancelled after they throw an exception? I could not implement that successfully.
Inspired by: Solution to wrapping asyncio.gather SO
Your code works; the reason you are not able to build res is that the failing tasks do not raise just the normal Exception class. When the timeout hits, asyncio.wait cancels the still-pending tasks, and calling result() on a cancelled task raises asyncio.exceptions.CancelledError, which, if we take a look at the documentation, inherits from BaseException rather than Exception. That change is new as of Python 3.8, and since you are using Python 3.9 it applies here. Changing your code slightly to the following yields:
res = []
for t in tasks:
    try:
        r = t.result()  # gives an error!!!
    except BaseException as e:
        res.append(e)
        continue
    res.append(r)
print(res)
[0.3088946587429545,
0.01323751590501987,
Exception('0.4844375347808497 of Task-3 is too high'),
asyncio.exceptions.CancelledError(),
asyncio.exceptions.CancelledError(),
asyncio.exceptions.CancelledError(),
Exception('0.4419557492849159 of Task-7 is too high'),
0.3113884366691503,
0.07422124156714727,
asyncio.exceptions.CancelledError()]
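If catching BaseException wholesale feels too broad, a slightly narrower variant (a sketch using only stdlib asyncio APIs) is to check Task.cancelled() first and reserve the except clause for genuine task failures:

res = []
for t in tasks:
    if t.cancelled():                         # cancelled by the asyncio.wait timeout
        res.append(asyncio.CancelledError())  # mirror the output above
        continue
    try:
        res.append(t.result())                # re-raises the task's exception, if any
    except Exception as e:
        res.append(e)
print(res)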
Related
I am trying to impose a TimeoutException on a try statement after n seconds. I have found a standard-library module which handles this, signal, which would be perfect, but I'm running into an error I have a hard time getting around. (This similar SO question is answered with the signal library.)
This is the boiled down code representing the problem:
import multiprocessing
from multiprocessing.dummy import Pool

def main():
    listOfLinks = []
    threadpool = Pool(2)
    info = threadpool.starmap(processRunSeveralTimesInParallel, zip(enumerate(listOfLinks)))
    threadpool.close()

def processRunSeveralTimesInParallel(listOfLinks):
    # The following is pseudo code representing what I would like to do:
    loongSequenceOfInstructions()
    for i in range(0, 10):
        try for n seconds:
            doSomething(i)
        except (after n seconds):
            handleException()
    return something
When implementing the above question's solution with the signal library, I get the following error:
File "file.py", line 388, in main
info = threadpool.starmap(processRunSeveralTimesInParallel,zip(enumerate(listOfLinks)))
File "/Users/user/anaconda3/envs/proj/lib/python3.8/multiprocessing/pool.py", line 372, in starmap
return self._map_async(func, iterable, starmapstar, chunksize).get()
File "/Users/user/anaconda3/envs/proj/lib/python3.8/multiprocessing/pool.py", line 771, in get
raise self._value
File "/Users/user/anaconda3/envs/proj/lib/python3.8/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/user/anaconda3/envs/proj/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
return list(itertools.starmap(args[0], args[1]))
File "file.py", line 193, in processRunSeveralTimesInParallel
signal.signal(signal.SIGALRM, signal_handler)
File "/Users/user/anaconda3/envs/proj/lib/python3.8/signal.py", line 47, in signal
handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread
Any idea how to cap the time just on a try block within a method run as a thread? Thank you!
Important information:
I am using the multiprocessing library to run several processes in parallel at the same time. From the error above, I suspect that the signal and multiprocessing libraries conflict.
The methods in the try statement are Selenium (find_element_by_xpath) methods. However, there are no timeout arguments available for them.
Newly Updated Answer
If you are looking for a way of timing out without using signals, here is one. First, since you are using threading, let's make it explicit and use the concurrent.futures module, which has a lot of flexibility.

When a "job" is submitted to the pool executor, a Future instance is returned immediately without blocking; blocking happens only when a result call is made on that instance. You can specify a timeout value such that if the result is not available within the timeout period, an exception is thrown. The idea is to pass the ThreadPoolExecutor instance to the worker thread and have it run the time-critical piece of code, which must complete within a certain period, in its own worker thread. A Future instance will be created for that timed code, but this time the result call will specify a timeout value:
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def main():
    listOfLinks = ['a', 'b', 'c', 'd', 'e']
    futures = []
    """
    To prevent timeout errors due to lack of threads, you need at least one extra thread
    in addition to the ones being created here so that at least one time_critical thread
    can start. Of course, ideally you would like all the time_critical threads to be able to
    start without waiting. So, whereas the minimum number of max_workers would be 6 in this
    case, the ideal number would be 5 * 2 = 10.
    """
    with ThreadPoolExecutor(max_workers=10) as executor:
        # pass executor to our worker
        futures = [executor.submit(processRunSeveralTimesInParallel, tuple, executor) for tuple in enumerate(listOfLinks)]
        for future in futures:
            result = future.result()
            print('result is', result)

def processRunSeveralTimesInParallel(tuple, executor):
    link_number = tuple[0]
    link = tuple[1]
    # long running sequence of instructions up until this point and then
    # allow 2 seconds for this part:
    for i in range(10):
        future = executor.submit(time_critical, link, i)
        try:
            future.result(timeout=2)  # time_critical does not return a result other than None
        except TimeoutError:
            handle_exception(link, i)
    return link * link_number

def time_critical(link, trial_number):
    if link == 'd' and trial_number == 7:
        time.sleep(3)  # generate a TimeoutError

def handle_exception(link, trial_number):
    print(f'There was a timeout for link {link}, trial number {trial_number}.')

if __name__ == '__main__':
    main()
Prints:
result is
result is b
result is cc
There was a timeout for link d, trial number 7.
result is ddd
result is eeee
Using Threading and Multiprocessing
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, TimeoutError
import os
import time

def main():
    listOfLinks = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
    futures = []
    cpu_count = os.cpu_count()
    with ThreadPoolExecutor(max_workers=cpu_count) as thread_executor, ProcessPoolExecutor(max_workers=cpu_count) as process_executor:
        # pass executor to our worker
        futures = [thread_executor.submit(processRunSeveralTimesInParallel, tuple, process_executor) for tuple in enumerate(listOfLinks)]
        for future in futures:
            result = future.result()
            print('result is', result)

def processRunSeveralTimesInParallel(tuple, executor):
    link_number = tuple[0]
    link = tuple[1]
    # long running sequence of instructions up until this point and then
    # allow 2 seconds for this part:
    for i in range(10):
        future = executor.submit(time_critical, link, i)
        try:
            future.result(timeout=2)  # time_critical does not return a result other than None
        except TimeoutError:
            handle_exception(link, i)
    return link * link_number

def time_critical(link, trial_number):
    if link == 'd' and trial_number == 7:
        time.sleep(3)  # generate a TimeoutError

def handle_exception(link, trial_number):
    print(f'There was a timeout for link {link}, trial number {trial_number}.')

if __name__ == '__main__':
    main()
Prints:
result is
result is b
result is cc
There was a timeout for link d, trial number 7.
result is ddd
result is eeee
result is fffff
result is gggggg
result is hhhhhhh
result is iiiiiiii
result is jjjjjjjjj
Multiprocessing Exclusively
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Process
import os
import time

def main():
    listOfLinks = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
    futures = []
    workers = os.cpu_count() // 2
    with ProcessPoolExecutor(max_workers=workers) as process_executor:
        # pass executor to our worker
        futures = [process_executor.submit(processRunSeveralTimesInParallel, tuple) for tuple in enumerate(listOfLinks)]
        for future in futures:
            result = future.result()
            print('result is', result)

def processRunSeveralTimesInParallel(tuple):
    link_number = tuple[0]
    link = tuple[1]
    # long running sequence of instructions up until this point and then
    # allow 2 seconds for this part:
    for i in range(10):
        p = Process(target=time_critical, args=(link, i))
        p.start()
        p.join(timeout=2)  # don't block for more than 2 seconds
        if p.exitcode is None:  # subprocess did not terminate
            p.terminate()  # we will terminate it
            handle_exception(link, i)
    return link * link_number

def time_critical(link, trial_number):
    if link == 'd' and trial_number == 7:
        time.sleep(3)  # generate a TimeoutError

def handle_exception(link, trial_number):
    print(f'There was a timeout for link {link}, trial number {trial_number}.')

if __name__ == '__main__':
    main()
Prints:
result is
result is b
result is cc
There was a timeout for link d, trial number 7.
result is ddd
result is eeee
result is fffff
result is gggggg
result is hhhhhhh
result is iiiiiiii
result is jjjjjjjjj
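One plausible reason for the switch to a bare multiprocessing.Process in this last variant (my reading, not stated in the original): future.result(timeout=2) on an executor future raises TimeoutError but leaves the submitted call running in its pool worker, whereas Process.join(timeout=2) followed by terminate() actually kills the timed-out work, so it cannot keep a worker occupied.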
Edited my code - NOW it WORKS
I'm trying to obtain some data from my Postgres db through an asyncpg connection pool, asynchronously.
Basically, my db contains about 100 different tables (one per city) and I'm trying to gather all the data into one frame as fast as possible.
import pandas as pd
import asyncpg
import asyncio
from time import time

def make_t():
    lst = []
    # iterator for sql tuple
    for i in ['a',
              'b',
              'c']:
        i1 = i
        sql = """
        SELECT
            '%s' as city,
            MAX(starttime) AS max_ts
        FROM
            "table_%s"
        """
        lst.append(sql % (i, i1))
    return tuple(lst)

async def get_data(pool, sql):
    start = time()
    async with pool.acquire() as conn:
        stmt = await conn.prepare(sql)
        columns = [a.name for a in stmt.get_attributes()]
        data = await stmt.fetch()
        print(f'Exec time: {time() - start}')
        return pd.DataFrame(data, columns=columns)

async def main():
    dsn = 'postgres://user:pass@127.0.0.1:5432/my_base'
    cT = ['city', 'max_ts']
    sqls = make_t()
    pool = await asyncpg.create_pool(dsn=dsn, max_size=50)
    start = time()
    tasks = []
    for sql in sqls:
        tasks.append(loop.create_task(get_data(pool, sql)))
    tasks = await asyncio.gather(*tasks)
    df = pd.DataFrame(columns=cT)
    for task in tasks:
        # form df from coroutine results
        df = df.append(task.result())
    print(f'total exec time: {time() - start} secs')
    print('exiting main')
    return df

loop = asyncio.get_event_loop()
df = loop.run_until_complete(main())
loop.close()
print('exiting program')
Python 3.6.5 :: Anaconda, Inc.
Gets me this error:
Traceback (most recent call last):
  File "", line 319, in
  File "/Users/fixx/anaconda3/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
    return future.result()
  File "", line 308, in main
  File "/Users/fixx/anaconda3/lib/python3.6/asyncio/tasks.py", line 594, in gather
    for arg in set(coros_or_futures):
TypeError: unhashable type: 'list'
I can't figure out why. My sqls are in a tuple!
asyncio.gather accepts coroutines as individual arguments, and you are sending it a list of tasks. You have to use the * operator to call gather correctly:
tasks = await asyncio.gather(*tasks)
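As a minimal, self-contained illustration of the difference (a hypothetical coroutine, Python 3.6-compatible, just to show the call shape):

import asyncio

async def fetch(n):
    # stand-in for get_data(pool, sql)
    await asyncio.sleep(0)
    return n * n

async def main():
    tasks = [asyncio.ensure_future(fetch(n)) for n in range(3)]
    # results = await asyncio.gather(tasks)   # TypeError: unhashable type: 'list'
    results = await asyncio.gather(*tasks)    # unpacked: each task is its own argument
    print(results)                            # [0, 1, 4]

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
loop.close()

Note that awaiting gather already yields the result values themselves, so after tasks = await asyncio.gather(*tasks) the elements are DataFrames rather than Task objects, and the later task.result() calls can use the values directly.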
I can see similar questions have been asked before, but those are running multiple processes and not executors, so I am unsure how to fix this.
The GitHub issue also says it's resolved in 4.1: https://github.com/celery/celery/issues/1709
I am using
celery==4.1.1
django-celery==3.2.1
django-celery-beat==1.0.1
django-celery-results==1.0.1
My script is as follows; I've tried to cut it down to show only the relevant code.
@asyncio.coroutine
def snmp_get(ip, oid, snmp_user, snmp_auth, snmp_priv):
    results = []
    snmpEngine = SnmpEngine()
    errorIndication, errorStatus, errorIndex, varBinds = yield from getCmd(
        ...
    )
    ...
    for varBind in varBinds:
        results.append(' = '.join([x.prettyPrint() for x in varBind]))
    snmpEngine.transportDispatcher.closeDispatcher()
    return results

def create_link_data_record(link_data):
    obj = LinkData.objects.create(
        ...
    )
    return 'data polled for {} record {} created'.format(link_data.hostname, obj.id)

async def retrieve_data(link, loop):
    from concurrent.futures import ProcessPoolExecutor
    executor = ProcessPoolExecutor(2)
    poll_interval = 60
    results = []
    # credentials:
    ...
    print('polling data for {} on {}'.format(hostname, link_mgmt_ip))
    # create link data obj
    link_data = LinkDataObj()
    ...
    # first poll for speeds
    download_speed_data_poll1 = await snmp_get(link_mgmt_ip, down_speed_oid % link_index, snmp_user, snmp_auth, snmp_priv)
    # check we were able to poll
    if 'timeout' in str(get_snmp_value(download_speed_data_poll1)).lower():
        return 'timeout trying to poll {} - {}'.format(hostname, link_mgmt_ip)
    upload_speed_data_poll1 = await snmp_get(link_mgmt_ip, up_speed_oid % link_index, snmp_user, snmp_auth, snmp_priv)
    # wait for poll interval
    await asyncio.sleep(poll_interval)
    # second poll for speeds
    download_speed_data_poll2 = await snmp_get(link_mgmt_ip, down_speed_oid % link_index, snmp_user, snmp_auth, snmp_priv)
    upload_speed_data_poll2 = await snmp_get(link_mgmt_ip, up_speed_oid % link_index, snmp_user, snmp_auth, snmp_priv)
    # create deltas for speed
    down_delta = int(get_snmp_value(download_speed_data_poll2)) - int(get_snmp_value(download_speed_data_poll1))
    up_delta = int(get_snmp_value(upload_speed_data_poll2)) - int(get_snmp_value(upload_speed_data_poll1))
    ...
    results.append(await loop.run_in_executor(executor, create_link_data_record, link_data))
    return results

def get_link_data():
    link_data = LinkTargets.objects.all()
    # create loop
    loop = asyncio.get_event_loop()
    if asyncio.get_event_loop().is_closed():
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(asyncio.new_event_loop())
    # create tasks
    tasks = [asyncio.ensure_future(retrieve_data(link, loop)) for link in link_data]
    if tasks:
        start = time.time()
        done, pending = loop.run_until_complete(asyncio.wait(tasks))
        loop.close()
The error below references the run_in_executor code:
[2018-05-24 14:13:00,840: ERROR/ForkPoolWorker-3] Task exception was never retrieved
future: <Task finished coro=<retrieve_data() done, defined at /itapp/itapp/monitoring/jobs/link_monitoring.py:130> exception=AssertionError('daemonic processes are not allowed to have children',)>
Traceback (most recent call last):
  File "/itapp/itapp/monitoring/jobs/link_monitoring.py", line 209, in retrieve_data
    link_data.last_change = await loop.run_in_executor(executor, timestamp, (link_data.link_target_id, link_data.service_status))
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 639, in run_in_executor
    return futures.wrap_future(executor.submit(func, *args), loop=self)
  File "/usr/local/lib/python3.6/concurrent/futures/process.py", line 466, in submit
    self._start_queue_management_thread()
  File "/usr/local/lib/python3.6/concurrent/futures/process.py", line 427, in _start_queue_management_thread
    self._adjust_process_count()
  File "/usr/local/lib/python3.6/concurrent/futures/process.py", line 446, in _adjust_process_count
    p.start()
  File "/usr/local/lib/python3.6/multiprocessing/process.py", line 103, in start
    'daemonic processes are not allowed to have children'
AssertionError: daemonic processes are not allowed to have children
Try with Celery 5-devel:
pip install git+https://github.com/celery/celery@5.0-devel
As per the issue below:
https://github.com/celery/celery/issues/3884
Celery 5.0 will support asyncio. We currently do not support it.
And there is also the SO thread below on the same topic:
How to combine Celery with asyncio?
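In the meantime, a common Celery 4 workaround (a sketch with hypothetical stand-in functions, not an official Celery API) is to keep the task function synchronous, drive the coroutines on a private event loop, and swap the ProcessPoolExecutor for a ThreadPoolExecutor, since the daemonic ForkPool worker processes may not spawn children but may start threads:

import asyncio
from concurrent.futures import ThreadPoolExecutor

def blocking_db_write(n):
    # stand-in for create_link_data_record
    return 'record {} created'.format(n)

async def retrieve(n, loop, executor):
    await asyncio.sleep(0.1)  # stand-in for the SNMP polling
    # run the blocking ORM call in a thread instead of a child process
    return await loop.run_in_executor(executor, blocking_db_write, n)

def celery_task_body():
    # safe to call from inside a daemonic Celery worker process
    loop = asyncio.new_event_loop()
    try:
        asyncio.set_event_loop(loop)
        with ThreadPoolExecutor(max_workers=2) as executor:
            coros = [retrieve(n, loop, executor) for n in range(3)]
            return loop.run_until_complete(asyncio.gather(*coros))
    finally:
        loop.close()

print(celery_task_body())  # ['record 0 created', 'record 1 created', 'record 2 created']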
I'm experimenting with the multiprocessing module and copying example code from this page. Here is one example:
#!/usr/bin/python
from multiprocessing import Pool
from time import sleep

def f(x):
    return x*x

if __name__ == '__main__':
    # start 4 worker processes
    with Pool(processes=4) as pool:

        # print "[0, 1, 4,..., 81]"
        print(pool.map(f, range(10)))

        # print same numbers in arbitrary order
        for i in pool.imap_unordered(f, range(10)):
            print(i)

        # evaluate "f(10)" asynchronously
        res = pool.apply_async(f, [10])
        print(res.get(timeout=1))  # prints "100"

        # make worker sleep for 10 secs
        res = pool.apply_async(sleep, [10])
        print(res.get(timeout=1))  # raises multiprocessing.TimeoutError

    # exiting the 'with'-block has stopped the pool
After running this code I get:
Traceback (most recent call last):
  File "example01.py", line 11, in <module>
    with Pool(processes=4) as pool:
AttributeError: __exit__
I've found that this is due to the with keyword. However, this code also uses with, and it runs:
#!/usr/bin/python
with open("input.csv", "wb") as filePath:
    pass
filePath.close()
When I want to run the mentioned example I have to modify it in the following way:
#!/usr/bin/python
from multiprocessing import Pool
from time import sleep
import traceback

def f(x):
    return x*x

if __name__ == '__main__':
    # start 4 worker processes
    # with Pool(processes=4) as pool:
    try:
        pool = Pool(processes = 4)

        # print "[0, 1, 4,..., 81]"
        print(pool.map(f, range(10)))

        # print same numbers in arbitrary order
        for i in pool.imap_unordered(f, range(10)):
            print(i)

        # evaluate "f(10)" asynchronously
        res = pool.apply_async(f, [10])
        print(res.get(timeout=1))  # prints "100"

        # make worker sleep for 10 secs
        res = pool.apply_async(sleep, [10])
        print(res.get(timeout=1))  # raises multiprocessing.TimeoutError

        # exiting the 'with'-block has stopped the pool
        # http://stackoverflow.com/questions/4990718/python-about-catching-any-exception
        # http://stackoverflow.com/questions/1483429/how-to-print-an-error-in-python
        # http://stackoverflow.com/questions/1369526/what-is-the-python-keyword-with-used-for
    except Exception as e:
        print "Exception happened:"
        print type(e)
        print str(e)
        print traceback.print_exc()
The output then looks like this:
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
0
1
4
9
16
25
36
49
64
81
100
Exception happened:
<class 'multiprocessing.TimeoutError'>
Traceback (most recent call last):
  File "example01_mod.py", line 29, in <module>
    print(res.get(timeout=1)) # raises multiprocessing.TimeoutError
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 563, in get
    raise TimeoutError
TimeoutError
None
Why do I get an error when using the with keyword? Are those snippets (with and try-except) equivalent? I'm using Python 2.7.10.
This PEP describes the context manager interface, which is needed for using the 'with' keyword. Python 2.7's version of the Pool class does not support this interface, so you cannot use the 'with' keyword in the way you described.
You can rewrite the code to not use with and join/terminate the Pool directly, as in the example here, or you can upgrade to Python 3, which does support 'with' for Pools.
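A sketch of that rewrite, managing the pool by hand with try/finally (Python 2.7-compatible):

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    try:
        print(pool.map(f, range(10)))  # [0, 1, 4, ..., 81]
    finally:
        pool.close()  # no more work will be submitted
        pool.join()   # wait for the workers to exit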
You have wrongly assumed that the multiprocessing Pool class of Python 2.7 returns a context manager; there it is a plain object. Pool only gained context manager support in Python 3.3, and if your version predates that, you cannot use it in a with statement.
To use the with statement, the expression should return an object with __enter__ and __exit__ methods, which is why you get the familiar error:
AttributeError: __exit__
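To make the protocol concrete, here is a minimal sketch of the two methods the with statement looks up (a hypothetical class, purely for illustration):

class ManagedResource(object):
    def __enter__(self):
        print('acquired')
        return self  # bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, tb):
        print('released')
        return False  # do not suppress exceptions

with ManagedResource() as r:
    print('working')  # prints acquired, working, released in order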
How can I determine in which function an exception was raised? For example, there are two functions, 'foo' and 'bar'. In 'foo' the exception is raised randomly.
import random

def foo():
    if random.randint(1, 10) % 2:
        raise Exception
    bar()

def bar():
    raise Exception

try:
    foo()
except Exception as e:
    print "Exception raised in %s" % ???
import inspect

try:
    foo()
except Exception as e:
    print "Exception raised in %s" % inspect.trace()[-1][3]
I use the traceback module, like so:
import traceback

try:
    1 / 0
except Exception:
    print traceback.format_exc()
This gives the following output:
Traceback (most recent call last):
  File "<ipython-input-3-6b05b5b621cb>", line 2, in <module>
    1 / 0
ZeroDivisionError: integer division or modulo by zero
If the code runs from a file, the traceback will tell you the file and line number where the error occurred :)
EDIT:
To accommodate the comment from Habibutsu: yes, it's useful for printing, but when you need more info (for example the function name) it is not suitable on its own.
The doc-pages tell you how to extract the trace programmatically: http://docs.python.org/2/library/traceback.html
From the page linked above:
>>> import traceback
>>> def another_function():
...     lumberstack()
...
>>> def lumberstack():
...     traceback.print_stack()
...     print repr(traceback.extract_stack())
...     print repr(traceback.format_stack())
...
>>> another_function()
  File "<doctest>", line 10, in <module>
    another_function()
  File "<doctest>", line 3, in another_function
    lumberstack()
  File "<doctest>", line 6, in lumberstack
    traceback.print_stack()
[('<doctest>', 10, '<module>', 'another_function()'),
 ('<doctest>', 3, 'another_function', 'lumberstack()'),
 ('<doctest>', 7, 'lumberstack', 'print repr(traceback.extract_stack())')]
[' File "<doctest>", line 10, in <module>\n    another_function()\n',
 ' File "<doctest>", line 3, in another_function\n    lumberstack()\n',
 ' File "<doctest>", line 8, in lumberstack\n    print repr(traceback.format_stack())\n']
The doc-string for traceback.extract_stack is the same as for traceback.extract_tb
traceback.extract_tb(traceback[, limit])
Return a list of up to limit "pre-processed" stack trace entries extracted from the traceback object traceback. It is useful for alternate formatting of stack traces. If limit is omitted or None, all entries are extracted. A "pre-processed" stack trace entry is a quadruple (filename, line number, function name, text) representing the information that is usually printed for a stack trace. The text is a string with leading and trailing whitespace stripped; if the source is not available it is None.
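Applied to the original question, a sketch that pulls the function name out of the active exception's traceback using sys.exc_info and traceback.extract_tb (both stdlib):

import sys
import traceback

def foo():
    raise Exception

try:
    foo()
except Exception:
    tb = sys.exc_info()[2]  # traceback object for the active exception
    filename, lineno, func_name, text = traceback.extract_tb(tb)[-1]
    print "Exception raised in %s" % func_name  # Exception raised in foo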
What is your goal? If you are worried about bar and foo throwing the same exception type and the caller not being able to differentiate between them, just derive a new exception class:
import random

class FooException(Exception):
    """An exception thrown only by foo."""

def foo():
    if random.randint(1, 10) % 2:
        raise FooException
    bar()

def bar():
    raise Exception

try:
    foo()
except FooException:
    print "Exception raised in foo..."
except:
    print "Exception raised in bar (probably)..."