I have a coroutine as follows:
async def download():
downloader = DataManager()
downloader.download()
The DataManager.download() method looks like this:
def download(self):
start_multiple_docker_containers()
while True:
check_containers_statuses()
sleep(N) # synchronous sleep from time module
Is this good practice? If not, how can I use asyncio.sleep() in download()?
Or is such a code structure conceptually wrong?
Here's my solution:
import asyncio
import time
# Mocks of domain-specific functions
# ----------------------------------
def get_container_status(container_id, initial_time):
"""This mocks container status to change to 'exited' in 10 seconds"""
if time.time() - initial_time < 10:
print("%s: container %s still running" % (time.time(), container_id))
return 'running'
else:
print("%s: container %s exited" % (time.time(), container_id))
return 'exited'
def is_download_complete(container_id, initial_time):
"""This mocks download finished in 20 seconds after program's start"""
if time.time() - initial_time < 20:
print("%s: download from %s in progress" % (time.time(), container_id))
return False
else:
print("%s: download from %s done" % (time.time(), container_id))
return True
def get_downloaded_data(container_id):
return "foo"
# Coroutines
# ----------
async def container_exited(container_id, initial_time):
while True:
await asyncio.sleep(1) # == setTimeout(1000), != sleep(1000)
if get_container_status(container_id, initial_time) == 'exited':
return container_id
async def download_data_by_container_id(container_id, initial_time):
container_id = await container_exited(container_id, initial_time)
while True:
await asyncio.sleep(1)
if is_download_complete(container_id, initial_time):
return get_downloaded_data(container_id)
# Main loop
# ---------
if __name__ == "__main__":
initial_time = time.time()
loop = asyncio.get_event_loop()
tasks = [
asyncio.ensure_future(download_data_by_container_id("A", initial_time)),
asyncio.ensure_future(download_data_by_container_id("B", initial_time))
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
Results in:
1487334722.321165: container A still running
1487334722.321412: container B still running
1487334723.325897: container A still running
1487334723.3259578: container B still running
1487334724.3285959: container A still running
1487334724.328662: container B still running
1487334725.3312798: container A still running
1487334725.331337: container B still running
1487334726.3340318: container A still running
1487334726.33409: container B still running
1487334727.336779: container A still running
1487334727.336842: container B still running
1487334728.339425: container A still running
1487334728.339506: container B still running
1487334729.34211: container A still running
1487334729.342168: container B still running
1487334730.3448708: container A still running
1487334730.34493: container B still running
1487334731.34754: container A exited
1487334731.347598: container B exited
1487334732.350253: download from A in progress
1487334732.3503108: download from B in progress
1487334733.354369: download from A in progress
1487334733.354424: download from B in progress
1487334734.354686: download from A in progress
1487334734.3548028: download from B in progress
1487334735.358371: download from A in progress
1487334735.358461: download from B in progress
1487334736.3610592: download from A in progress
1487334736.361115: download from B in progress
1487334737.363115: download from A in progress
1487334737.363211: download from B in progress
1487334738.3664992: download from A in progress
1487334738.36656: download from B in progress
1487334739.369131: download from A in progress
1487334739.36919: download from B in progress
1487334740.371079: download from A in progress
1487334740.37119: download from B in progress
1487334741.374521: download from A done
1487334741.3745651: download from B done
As for the sleep() function - no, you shouldn't use it. It blocks the whole Python interpreter for 1 second, which is not what you want.
Remember, you don't have parallelism (threads etc.), you have concurrency.
I.e. you have a Python interpreter with just one thread of execution, where your main loop and all your coroutines run, cooperatively yielding to each other. You want your interpreter to spend 99.999% of its working time in that main loop, created by asyncio, polling sockets and waiting for timeouts.
All your coroutines should return as fast as possible and definitely shouldn't contain blocking sleep - if you call it, it blocks the whole interpreter and prevents the main loop from getting information from sockets or from running coroutines in response to the data arriving at those sockets.
So instead you should await asyncio.sleep(), which is essentially equivalent to JavaScript's setTimeout(): it just tells the main loop that after a certain time it should wake this coroutine up and continue running it.
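Not part of the original answer, but here is a minimal sketch contrasting the two: a blocking time.sleep() inside one coroutine starves the other coroutine, while await asyncio.sleep() lets it keep running.
import asyncio
import time

async def ticker():
    # prints once per second while the event loop is free to run it
    for _ in range(3):
        print("tick", time.strftime("%X"))
        await asyncio.sleep(1)

async def bad_worker():
    time.sleep(3)           # blocks the whole interpreter: the ticker is starved meanwhile
    print("bad worker done")

async def good_worker():
    await asyncio.sleep(3)  # yields to the event loop: the ticker keeps ticking
    print("good worker done")

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(ticker(), bad_worker()))
loop.run_until_complete(asyncio.gather(ticker(), good_worker()))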
Suggested reading:
https://snarky.ca/how-the-heck-does-async-await-work-in-python-3-5/
https://docs.python.org/3/library/asyncio.html
It's most likely a bad practice, as time.sleep() will block everything, while you only want to block the specific coroutine (I guess).
You are performing a synchronous operation in an asynchronous world.
What about the following pattern?
async def download():
downloader = DataManager()
downloader.start_multiple_docker_containers()
while True:
downloader.check_containers_statuses()
        await asyncio.sleep(N)
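I don't know what DataManager looks like internally, so here is just a minimal runnable sketch of that pattern with the class mocked out:
import asyncio

class DataManager:                      # mock stand-in for the real class
    def __init__(self):
        self._checks = 0

    def start_multiple_docker_containers(self):
        print("containers started")

    def check_containers_statuses(self):
        self._checks += 1
        print("status check", self._checks)
        return self._checks >= 3        # pretend all containers exited after 3 checks

async def download():
    downloader = DataManager()
    downloader.start_multiple_docker_containers()
    while True:
        if downloader.check_containers_statuses():
            break
        await asyncio.sleep(1)          # non-blocking pause; other coroutines can run here

loop = asyncio.get_event_loop()
loop.run_until_complete(download())
loop.close()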
I'm new to asyncio, but it seems that if you run sync code like this:
f = app.loop.run_in_executor(None, your_sync_function, app, param1, param2, ...)
then your_sync_function runs in a separate thread, and you can call time.sleep() in it without disturbing the asyncio loop. It blocks the executor's worker thread, but not the thread running the event loop. At least, this is what it seems to do.
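For instance, a rough self-contained sketch (blocking_poll is just a made-up stand-in for your_sync_function):
import asyncio
import time

def blocking_poll(name):
    # plain synchronous code; time.sleep() here only blocks the executor's worker thread
    for i in range(3):
        time.sleep(1)
        print(name, "poll", i)
    return name + " done"

async def main():
    loop = asyncio.get_event_loop()
    # run the blocking function in the default ThreadPoolExecutor and await its result
    result = await loop.run_in_executor(None, blocking_poll, "worker")
    print(result)

asyncio.get_event_loop().run_until_complete(main())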
If you want to send messages from your_sync_function back to asyncio's loop, look into the janus library.
More tips on this:
https://carlosmaniero.github.io/asyncio-handle-blocking-functions.html
Related question:
I was thinking of using the multiprocess package to run a function in parallel, but I need to pass a different parameter value on each run (every 1 sec).
e.g.
def foo(lst):
    while True:
        <do something with lst>
        sleep(1)

def main():
    process = multiprocess.Process(target=foo, args=(lst,))
    process.start()
    <keep updating lst>
This will cause foo() to run with the same parameter value over and over. How can I work around this?
Armed with the knowledge of what you're actually trying to do, i.e.
The foo function does an HTTP POST call to save some logs (a batch) to storage. The main function collects text logs (saving each log to the batch) while running a given shell script. Basically, I'm trying to do batching for logging.
the answer is to use a thread and a queue for message passing (multiprocessing.Process and multiprocessing.Queue would also work, but aren't really necessary):
import threading
import time
from queue import Queue
def send_batch(batch):
print("Sending", batch)
def save_worker(queue: Queue):
while True:
batch = queue.get()
if batch is None: # stop signal
break
send_batch(batch)
queue.task_done()
def main():
batch_queue = Queue()
save_thread = threading.Thread(target=save_worker, args=(batch_queue,))
save_thread.start()
log_batch = []
for x in range(42): # pretend this is the shell script outputting things
message = f"Message {x}"
print(message)
log_batch.append(message)
time.sleep(0.1)
if len(log_batch) >= 7: # could also look at wallclock
batch_queue.put(log_batch)
log_batch = []
if log_batch:
batch_queue.put(log_batch) # send the last batch
print("Script stopped, waiting for worker to finish")
batch_queue.put(None) # stop signal
save_thread.join()
if __name__ == "__main__":
main()
import threading
import time

def run_every_second(param):
    # your function code here
    print(param)

# create a list of parameters to pass to the function
params = [1, 2, 3, 4]

# create and start a thread for each parameter
threads = []
for param in params:
    t = threading.Thread(target=run_every_second, args=(param,))
    t.start()
    threads.append(t)
    time.sleep(1)

# wait for all threads to complete
for t in threads:
    t.join()
This will create a new thread for each parameter, and each thread will run the run_every_second function with the corresponding parameter. The threads run concurrently, so the function calls overlap in time, with a 1-second gap between the start of each thread.
I want to execute a task after a certain time, so I have tried a countdown timer with a completion condition (when the countdown variable reaches 0, the task is performed). The thing is that I don't want to stop the execution of the main program while the countdown runs. I have tried this:
import time
def countdown(num_of_secs):
while(num_of_secs):
time.sleep(1)
num_of_secs -= 1
return num_of_secs
So I run my code, setting a number of seconds for the countdown, and when the countdown reaches 0 a task must be executed. With this code (it uses a while loop), calling my "countdown" function stops the execution of the main program, so it is the same as one big time.sleep(). I want to run this countdown in the background, without stopping other actions, until the countdown finishes and the task starts.
Thank you
Another alternative is to use threading.
I've got a simple example here with two threads, where the working thread waits for the countdown thread to finish before starting. The main thread keeps working the whole time.
import threading
import time
def do_something():
countdown_thread.join()
print("Starting Task")
time.sleep(3)
print("Finished Task")
def countdown(num_of_secs):
while(num_of_secs):
time.sleep(1)
num_of_secs -= 1
print(num_of_secs)
if __name__ == '__main__':
countdown_thread = threading.Thread(target=countdown, args=(3,))
work_thread = threading.Thread(target=do_something)
countdown_thread.start()
work_thread.start()
while True:
print("Main doing something")
time.sleep(1)
Example picture for multithreading: Sequential vs Threading
Usually Python has only a single program flow, so every instruction needs to complete before the next one can be executed.
For your case you need asynchronicity: for example, await asyncio.sleep(5) in a separate task running in the same event loop.
import asyncio
async def sleeper():
print('Holding...')
await asyncio.sleep(5)
print('Doing Work!')
async def work():
print('Doing work')
print('while')
print('the other guy is sleeping')
async def main():
await asyncio.gather(sleeper(), work())
asyncio.run(main())
The most common and easiest way to implement this would be with a Timer object from the threading library. It would go as follows:
import threading
import time
i = 0
done = False
def show_results():
print("results from GPIO readings")
print("=)")
global done
done = True # signal end of while loop
def read_GPIO():
print("reading GPIO...")
t = threading.Timer(60, show_results) # task will trigger after 60 seconds
t.start()
# your while loop would go here
read_GPIO() # do work
while not done:
print("waiting", i) # doing work while waiting for timer
time.sleep(1)
i += 1
pass
Notice that the time library is used only for illustrative purposes. You could also restart the timer recursively to periodically check GPIOs and print results or trigger an event. For more information on the threading library or the Timer object, check the docs.
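If you want the periodic variant mentioned above, one possible sketch (not from the docs; poll_gpio and the 5-second interval are invented for illustration) is to have the callback re-arm the Timer:
import threading

def poll_gpio(interval, remaining=3):
    print("reading GPIO...")        # placeholder for the real GPIO read
    if remaining > 1:
        # re-arm the timer so the check repeats until the count runs out
        threading.Timer(interval, poll_gpio, args=(interval, remaining - 1)).start()

poll_gpio(5)                        # first call runs immediately and arms the chain of timers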
I am using Azure functions to run a Python script that launches multiple threads (for performance reasons). Everything is working as expected, except for the fact that only the info logs from the main() thread appear on the Azure Functions log.
All the logs that I am using in the "secondary" threads that I start in main() do not appear in the Azure Functions logs.
Is there a way to ensure that the logs from the secondary threads show on the Azure Functions log?
The modules that I am using are "logging" and "threading".
I am using Python 3.6; I have already tried to lower the logging level in the secondary threads, but this did not help unfortunately.
The various secondary thread functions are in different modules.
My function has a structure similar to the following pseudo-code:
def main()->None:
logging.basicConfig(level=logging.INFO)
logging.info("Starting the process...")
thread1 = threading.Thread(target=foo,args=("one arg",))
thread2 = threading.Thread(target=foo,args=("another arg",))
thread3 = threading.Thread(target=foo,args=("yet another arg",))
thread1.start()
thread2.start()
thread3.start()
logging.info("All threads started successfully!")
return
# in another module
def foo(st:str)->None:
logging.basicConfig(level=logging.INFO)
logging.info(f"Starting thread for arg {st}")
The current Azure log output is:
INFO: Starting the process...
INFO: "All threads started successfully!"
I would like it to be something like:
INFO: Starting the process...
INFO: Starting thread for arg one arg
INFO: Starting thread for arg another arg
INFO: Starting thread for arg yet another arg
INFO: All threads started successfully!
(of course the order of the secondary threads could be anything)
The Azure Functions Python worker framework sets AsyncLoggingHandler as a handler on the root logger. From this handler to its destination, it seems logs are filtered along the way by an invocation_id.
An invocation_id is set if the framework starts threads itself, as it does for the main sync function. On the other hand, if we start threads ourselves from the main function, we must set the invocation_id in the started thread for the logs to reach their destination.
The azure_functions_worker.dispatcher.get_current_invocation_id function checks whether the current thread has a running event loop. If no running loop is found, it just checks azure_functions_worker.dispatcher._invocation_id_local, which is thread-local storage, for an attribute named v holding the value of the invocation_id.
Because the threads we start don't have a running event loop, we have to get the invocation_id from the context and set it on azure_functions_worker.dispatcher._invocation_id_local.v in every thread we start.
The invocation_id is made available by the framework in context parameter of main function.
Tested it on Ubuntu 18.04, azure-functions-core-tools-4 and Python 3.8.
import sys
import azure.functions as func
import logging
import threading
# import thread local storage
from azure_functions_worker.dispatcher import (
_invocation_id_local as tls,
)
def main(req: func.HttpRequest, context: func.Context) -> func.HttpResponse:
logging.info("Starting the process...")
thread1 = threading.Thread(
target=foo,
args=(
context,
"one arg",
),
)
thread2 = threading.Thread(
target=foo,
args=(
context,
"another arg",
),
)
thread3 = threading.Thread(
target=foo,
args=(
context,
"yet another arg",
),
)
thread1.start()
thread2.start()
thread3.start()
logging.info("All threads started successfully!")
name = req.params.get("name")
if not name:
try:
req_body = req.get_json()
except ValueError:
pass
else:
name = req_body.get("name")
if name:
return func.HttpResponse(
f"Hello, {name}. This HTTP triggered function executed successfully."
)
else:
return func.HttpResponse(
"This HTTP triggered function executed successfully. Pass a name in the query string or in the request body for a personalized response.",
status_code=200,
)
# in another module
def foo(context, st: str) -> None:
# invocation_id_local = sys.modules[
# "azure_functions_worker.dispatcher"
# ]._invocation_id_local
# invocation_id_local.v = context.invocation_id
tls.v = context.invocation_id
logging.info(f"Starting thread for arg {st}")
https://github.com/Azure/azure-functions-python-worker/blob/81b84102dc14b7d209ad7e00be68f25c37987c1e/azure_functions_worker/dispatcher.py
This must be something in your Azure setup: in a non-Azure setup, it works as expected. You should add join() calls for your threads. And basicConfig() should be called only once, from a main entry point.
Are your threads I/O bound? Due to the GIL, having multiple compute-bound threads doesn't give your code any performance advantages. It might be better to structure your code around concurrent.futures.ProcessPoolExecutor or multiprocessing.
Here is a Repl which shows a slightly modified version of your code working as expected.
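To illustrate that last suggestion about process pools, here is a minimal hedged sketch (crunch is an invented stand-in for your CPU-bound work):
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # stand-in for CPU-bound work that the GIL would otherwise serialize across threads
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        for result in pool.map(crunch, [10_000_000, 20_000_000, 30_000_000]):
            print(result)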
I may be wrong, but I suspect Azure runs your main function in a daemon thread.
Quoting https://docs.python.org/3/library/threading.html: The entire Python program exits when no alive non-daemon threads are left.
When daemon is not set in the Thread constructor, the new thread reuses the value from the parent thread.
You can check whether this is your issue by printing thread1.daemon before starting your child threads.
Anyway, I can reproduce the issue on my PC (without any Azure, just plain Python 3) with:
import logging
import threading

def main():
    logging.basicConfig(level=logging.INFO)
    logging.info("Starting the process...")
    thread1 = threading.Thread(target=foo, args=("one arg",), daemon=True)
    thread2 = threading.Thread(target=foo, args=("another arg",), daemon=True)
    thread3 = threading.Thread(target=foo, args=("yet another arg",), daemon=True)
    thread1.start()
    thread2.start()
    thread3.start()
    logging.info("All threads started successfully!")
    return

def foo(st):
    for i in range(2000):  # giving a bit of time for the race condition to happen
        print("spin", file=open("/dev/null", "w"))
    logging.basicConfig(level=logging.INFO)
    logging.info(f"Starting thread for arg {st}")

main()
If I force daemon to False or leave it undefined, it works. So I guess your issue is that Azure starts your main function in a daemon thread, and since you don't override the daemon flag to False, the whole process exits instantly.
PS: I know nothing about Azure; there is a possibility that you are indeed approaching this the wrong way and that there is another interface that does exactly what you want, but in the way Azure expects you to do it. So this answer is potentially just an explanation of what happens rather than real guidance.
Azure Functions is an async environment.
If you define an async def, it'll be run with asyncio.
Otherwise it'll be run with concurrent.futures.ThreadPoolExecutor.
It's better to define your functions async.
Threading works, but you don't need to start threads manually: the thread pool executes your blocking code, and you just have to make it work for you.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-app-settings#python_threadpool_thread_count
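As a rough sketch of that advice (the HTTP trigger shape and blocking_work are placeholders; adapt them to your own function app), define main as async and push the blocking calls onto the worker's thread pool:
import asyncio
import logging
import azure.functions as func

def blocking_work(arg: str) -> str:
    # ordinary synchronous code; it runs on a thread-pool thread
    logging.info("working on %s", arg)
    return arg.upper()

async def main(req: func.HttpRequest) -> func.HttpResponse:
    loop = asyncio.get_running_loop()
    results = await asyncio.gather(
        *(loop.run_in_executor(None, blocking_work, arg)
          for arg in ("one arg", "another arg", "yet another arg"))
    )
    return func.HttpResponse(", ".join(results))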
I found a similar problem:
(Instance variables not being updated in Python when using multiprocessing),
but I still do not know the solution for my task.
The task is to stop a Scapy sniff function after a test script completes. The running duration of individual test scripts can vary greatly (from a few seconds to hours). My sniff function runs in a separate thread. The test script calls an init function at the beginning, which calls the sniff function from another module.
@classmethod
def SaveFullTrafficPcap(self, TestCase, Termination):
    try:
        Full_Traffic = []
        PktList = []
        FullPcapName = Settings['GeneralSettings']['ResultsPath'] + TestCase.TestCaseName + "Full_Traffic_PCAP.pcap"
        #while Term.Termination < 1:
        Full_Traffic = sniff(lfilter=None, iface=str(Settings['GeneralSettings']['EthInterface']), store=True, prn=lambda x: Full_Traffic.append(x), count=0, timeout=Term.Termination)
        print(Full_Traffic)
        wrpcap(FullPcapName, Full_Traffic)
    except Exception:
        SYS.ABS_print("No full traffic PCAP file written!\n")
At the end of the test script an exit function is called. In the exit function I set the Term.Termination parameter to 1 and wait for 5 seconds, but it doesn't work. The sniff function is stopped by the system and I get no "FullPcapName" file.
If count or timeout is given a value, the code works without problems and I get my FullPcapName file with the complete traffic on my interface.
Does anybody have hints on how I can stop the sniff function cleanly after the test script finishes?
Using the stop_filter argument as specified here worked for me. I've duplicated HenningCash's code below for convenience:
import time, threading
from scapy.all import sniff
e = threading.Event()
def _sniff(e):
a = sniff(filter="tcp port 80", stop_filter=lambda p: e.is_set())
print("Stopped after %i packets" % len(a))
print("Start capturing thread")
t = threading.Thread(target=_sniff, args=(e,))
t.start()
time.sleep(3)
print("Try to shutdown capturing...")
e.set()
# This will run until you send a HTTP request somewhere
# There is no way to exit clean if no package is received
while True:
t.join(2)
if t.is_alive():
print("Thread is still running...")
else:
break
print("Shutdown complete!")
However, you still have to wait for a final packet to be sniffed, which might not be ideal in your scenario.
For now I have solved the problem with global variables. It is not nice, but it works well.
Nevertheless, I am still interested in a better solution for stopping the sniffer at a variable point in time.
stop_var = None  # global handle to the running AsyncSniffer

def stop():
    global stop_var
    stop_var.stop()  # stops the background capture; the packets end up in stop_var.results

def start():
    """
    your code
    """
    global stop_var
    stop_var = AsyncSniffer(**arg)  # arg holds your capture arguments (iface, filter, ...)
    stop_var.start()

start()
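For comparison, a small self-contained sketch of the same idea without globals, assuming a reasonably recent Scapy (the filter and the 5-second sleep are just placeholders for the real test run):
from scapy.all import AsyncSniffer
import time

sniffer = AsyncSniffer(filter="tcp", store=True)   # capture arguments are an example only
sniffer.start()                                    # sniffing runs in a background thread
time.sleep(5)                                      # ... the test script would run here ...
sniffer.stop()                                     # stops the capture without waiting for another packet
print("captured %d packets" % len(sniffer.results))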
I am trying to build an application that will run a bash script every 10 minutes. I am using apscheduler to accomplish this, and when I run my code from the terminal it works like clockwork. However, when I try to run the code from another module it crashes; I suspect that the calling module is waiting for the "schedule" module to finish and then crashes when that never happens.
Error output:
/bin/bash: line 1: 13613 Killed ( python ) < /tmp/vIZsEfp/26
shell returned 137
Function that calls schedule
def shedual_toggled(self,widget):
prosessSchedular.start_background_checker()
Schedule Program
def schedul_check():
"""set up to call prosess checker every 10 mins"""
print "%s check ran" %(counter)
    counter += 1
    app = prosessCheckerv3.call_bash()  # calls the bash file
if app == False:
print "error with bash"
return False
else:
prosessCheckerv3.build_snap_shot(app)
def start_background_checker():
scheduler = BackgroundScheduler()
scheduler.add_job(schedul_check, 'interval', minutes=10)
scheduler.start()
while True:
time.sleep(2)
if __name__ == '__main__':
start_background_checker()
This program simply calls another one every 10 minutes. As a side note, I have been trying to stay as far away from multithreading as possible, but if that is required, so be it.
Well, I managed to figure it out myself. The issue is that GTK+ is not thread-safe, so the timed module either needs to run in another thread, or you can release/enter the GTK thread lock before/after calling the module.
I just did it like this.
def shedual_toggeld(self,widget):
onOffSwitch = widget.get_active()
""" After main GTK has logicly finished all GUI work run thread on toggel button """
thread = threading.Thread(target=self.call_schedual, args=(onOffSwitch,))
thread.daemon = True
thread.start()
def call_schedual(self, onOffSwitch):
if onOffSwitch == True:
self.sch.start_background_checker()
else:
self.sch.stop_background_checker()
This article goes through it in more detail. Hopefully someone else will find this useful.
http://blogs.operationaldynamics.com/andrew/software/gnome-desktop/gtk-thread-awareness