Let's say this is the primary file I would run in the terminal, i.e. locust -f main.py. Is it possible to also have it run code after the Locust instance is terminated or when the time limit is reached? Possibly a cleanup script, or sending the generated CSV reports somewhere.
class setup(HttpUser):
    wait_time = between(3, 5)
    host = 'www.example.com'
    tasks = [a, b, c...]

    # do something after time limit reached
There is a test_stop event (https://docs.locust.io/en/stable/extending-locust.html) as well as a quitting event (https://docs.locust.io/en/stable/api.html#locust.event.Events.quitting) you can use for this purpose.
I am using the Slurm job manager on an HPC cluster. Sometimes a job is cancelled due to the time limit, and I would like to finish my program gracefully.
As far as I understand, the cancellation happens in two stages precisely so that a software developer can finish the program gracefully:
srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** JOB 18522559 ON ncm0317 CANCELLED AT 2020-12-14T19:42:43 DUE TO TIME LIMIT ***
You can see that I am given 62 seconds to finish the job the way I want it to finish (by saving some files, etc.).
Question: how do I do this? I understand that first some Unix signal is sent to my job and I need to respond to it correctly. However, I cannot find any information in the Slurm documentation on what this signal is. Besides, I do not know exactly how to handle it in Python; probably through exception handling.
In Slurm, you can decide which signal is sent at which moment before your job hits the time limit.
From the sbatch man page:
--signal=[[R][B]:]<sig_num>[@<sig_time>]
When a job is within sig_time seconds of its end time, send it the signal sig_num.
So set
#SBATCH --signal=B:TERM@300
to get Slurm to signal the job with SIGTERM 5 minutes (300 seconds, as sig_time is expressed in seconds) before the allocation ends. Note that, depending on how you start your job, you might need to remove the B: part.
In your Python script, use the signal package. You need to define a "signal handler", a function that will be called when the signal is received, and "register" that function for a specific signal. As that function disrupts the normal flow when called, you need to keep it short and simple to avoid unwanted side effects, especially with multithreaded code.
A typical scheme in a Slurm environment is to have a script skeleton like this:
#!/usr/bin/env python
import signal, os, sys

# Global Boolean variable that indicates that a signal has been received
interrupted = False

# Global Boolean variable that indicates the natural end of the computations
converged = False

# Definition of the signal handler. All it does is flip the 'interrupted' variable
def signal_handler(signum, frame):
    global interrupted
    interrupted = True

# Register the signal handler
signal.signal(signal.SIGTERM, signal_handler)

try:
    # Try to recover a state file with the relevant variables stored
    # from a previous stop, if any
    with open('state', 'r') as file:
        vars = file.read()
except FileNotFoundError:
    # Otherwise bootstrap (start from scratch)
    vars = init_computation()

while not interrupted and not converged:
    do_computation_iteration()

# Save the current state
if interrupted:
    with open('state', 'w') as file:
        file.write(vars)
    sys.exit(99)

sys.exit(0)
This first tries to restart computations left by a previous run of the job, and otherwise bootstraps it. If it was interrupted, it lets the current loop iteration finish properly, and then saves the needed variables to disk. It then exits with the 99 return code. This allows, if Slurm is configured for it, to requeue the job automatically for further iteration.
If slurm is not configured for it, you can do it manually in the submission script like this:
python myscript.py || scontrol requeue $SLURM_JOB_ID
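Putting the pieces together, a submission script might look like this (the job name and time limit are made up; `srun` is used so the signal reaches the Python process itself rather than only the batch shell):

```shell
#!/bin/bash
#SBATCH --job-name=resumable       # made-up name
#SBATCH --time=01:00:00            # made-up time limit
#SBATCH --signal=TERM@300          # SIGTERM 300 s before the limit

# Run as a job step so the step (the Python process) receives the signal,
# and requeue when the script reports it was interrupted
srun python myscript.py || scontrol requeue "$SLURM_JOB_ID"
```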
In most programming languages, Unix signals are captured using a callback. Python is no exception. To catch Unix signals using Python, just use the signal package.
For example, to gracefully exit:
import signal, sys

def terminate_signal(signum, frame):
    print('Terminating the process')
    # save results, whatever...
    sys.exit()

# register the callback for SIGTERM
signal.signal(signal.SIGTERM, terminate_signal)

while True:
    pass  # work
List of possible signals. SIGTERM is the one used to "politely ask a program to terminate".
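A self-contained variant of the same idea, using a flag instead of exiting from inside the handler, and signal.raise_signal (Python 3.8+) to stand in for an external `kill -TERM <pid>`:

```python
import signal

stop_requested = False

def handle_sigterm(signum, frame):
    # Only flip a flag; keep signal handlers short and simple
    global stop_requested
    stop_requested = True

signal.signal(signal.SIGTERM, handle_sigterm)

iterations = 0
while not stop_requested:
    iterations += 1  # the "work"
    if iterations == 3:
        # simulate an external `kill -TERM <pid>`
        signal.raise_signal(signal.SIGTERM)

print(f"stopped cleanly after {iterations} iterations")
```

The loop finishes its current iteration and then exits on the next condition check, which is usually what "graceful" means in practice.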
I'm quite new at Python and have been fighting a specific problem for a while. I have a function to listen for and print radio frames. To do that I'm using the NRF24 lib, and the whole function is quite simple. The point is that I run this function and from time to time I need to terminate it and run it again. In code it looks like:
def recv():
    radio.openWritingPipe(pipes[0])
    radio.openReadingPipe(1, pipes[1])
    radio.startListening()
    radio.stopListening()
    radio.printDetails()
    radio.startListening()
    while True:
        pipe = [0]
        while not radio.available(pipe):
            time.sleep(10000/1000000.0)
        recv_buffer = []
        radio.read(recv_buffer)
        print(recv_buffer)
I run this function from the server side, and now I want to stop it and run it again. Is that possible? Why can't I just call recv.kill()? I have read about threading and multiprocessing, but none of it gave me the result I wanted.
How I run it:
from multiprocessing import Process

def request_handler(api: str, arg: dict) -> dict:
    process_radio = Process(target=recv())
    if api == 'start_radio':
        process_radio.start()
        ...
    elif api == 'stop_radio':
        process_radio.terminate()
        ...
    ...
There is no way to stop a Python thread "from the outside." If the thread goes into a wait state (e.g. not running because it's waiting for radio.recv() to complete) there's nothing you can do.
Inside a single process the threads are autonomous, and the best you can do is to set a flag for the thread to act on (by terminating) when it examines it.
As you have already discovered, it appears, you can terminate a subprocess, but then you have the issue of how the processes communicate with each other.
Your code and the text with it don't really give enough information (there appear to be several NRF24 implementations in Python) to debug the issues you report.
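The flag-based approach can be sketched with threading.Event; the radio calls are replaced here by a time.sleep placeholder, since the NRF24 API in the question is not something this sketch can depend on:

```python
import threading
import time

stop_flag = threading.Event()

def recv_loop():
    # Check the flag on every iteration instead of looping forever
    while not stop_flag.is_set():
        time.sleep(0.01)  # placeholder for radio.available()/radio.read()

worker = threading.Thread(target=recv_loop)
worker.start()

time.sleep(0.05)   # let it "listen" for a while
stop_flag.set()    # politely ask the thread to stop
worker.join(timeout=1)
print("worker alive:", worker.is_alive())
```

To "run it again", clear the flag with stop_flag.clear() and start a fresh Thread object; a finished thread cannot be restarted.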
I have a Python script that collects data from a database every minute, by timestamp.
Every minute this script collects data from a given table in the DB that matches the current time, with a delay of 1 minute:
For example at '2016-04-12 14:53' the script will look for data
that matches '2016-04-12 14:52' in the DB, and so on...
Now I want the script to save the last timestamp it collected from the database before exiting, and that for any type of exit (keyboard interrupt, system errors, database points of failure, etc.).
Is there a simple way to do what I want, knowing that I can't modify the database?
Python's atexit module may help you here. You can import atexit and register functions to run when the program exits.
See this post, namely:
import atexit

def exit_handler():
    print('My application is ending!')

atexit.register(exit_handler)
That will work in most exit situations.
Another, more robust answer from that same thread:
def main():
    try:
        execute_app()    # Stuff that happens before program exits
    finally:
        handle_cleanup() # Stuff that happens when program exits

if __name__ == '__main__':
    main()
The above is a bit more reliable...
The MOST reliable way would be to write your output every minute, each time overwriting the previous output. That way no matter whether your exit cleanup fires or not, you still have your data from the minute the program exited.
You could use atexit.register() from the atexit module to register cleanup functions. If you register functions f, g, h in that order, at program exit they will be executed in the reverse order: h, g, f.
One thing to note: these functions are only invoked on normal program exit, meaning exits handled by Python. They will not run if the process is killed by an unhandled fatal signal, calls os._exit(), or if the interpreter itself crashes.
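The reverse-order behaviour is easy to check by running a small child interpreter and inspecting what it prints on exit:

```python
import subprocess
import sys

child_code = """
import atexit

def f(): print('f')
def g(): print('g')
def h(): print('h')

# registered in the order f, g, h -- atexit runs them in reverse
atexit.register(f)
atexit.register(g)
atexit.register(h)
"""

out = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True, text=True,
).stdout.split()
print(out)  # ['h', 'g', 'f']
```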
Hi, I wrote a Python program that should run unattended. What it basically does is fetch some data via HTTP GET requests in a couple of threads, and fetch data via websockets and the Autobahn framework. After running it for 2 days I can see that it has a growing memory demand, and it even stops without any notice.
The documentation says I have to run the reactor as last line of code in the app.
I read that yappi is capable of profiling threaded applications
Here is some pseudo code
from autobahn.twisted.websocket import WebSocketClientFactory, connectWS
from twisted.internet import reactor, ssl

if __name__ == "__main__":
    # setting up a thread
    # start the thread
    Consumer.start()

    xfactory = WebSocketClientFactory("wss://url")
    xfactory.protocol = socket

    ## SSL client context: default
    ##
    if xfactory.isSecure:
        contextFactory = ssl.ClientContextFactory()
    else:
        contextFactory = None

    connectWS(xfactory, contextFactory)
    reactor.run()
The example from the yappi project site is the following:
import yappi

def a():
    for i in range(10000000): pass

yappi.start()
a()
yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()
So I could put yappi.start() at the beginning, and yappi.get_func_stats().print_all() plus yappi.get_thread_stats().print_all() after reactor.run(), but since any code after reactor.run() is never executed, those calls will never run.
So how do I profile a program like that?
Regards
It's possible to use twistd profilers by the following way:
twistd -n --profile=profiling_results.txt --savestats --profiler=hotshot your_app
hotshot is the default profiler; you are also able to use cProfile.
Or you can run twistd from your Python script by means of:
from twisted.scripts.twistd import run
run()
and add the necessary parameters to the script via sys.argv[1:1] = ["--profile=profiling_results.txt", ...]
Finally, you can convert the hotshot format to calltree format by means of:
hotshot2calltree profiling_results.txt > calltree_profiling
And open generated calltree_profiling file:
kcachegrind calltree_profiling
There is also a project for profiling asynchronous execution time: twisted-theseus.
You can also try PyCharm's thread concurrency visualization tool.
There is a related question on Stack Overflow.
You can also run your function by:
reactor.callWhenRunning(your_function, *parameters_list)
Or by reactor.addSystemEventTrigger() with event description and your profiling function call.
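The shutdown-hook idea also works with the stdlib cProfile: start the profiler before the blocking call and dump the stats from a function that runs during shutdown, rather than from code placed after the blocking call. A minimal sketch (plain functions stand in for the Twisted reactor; in a real app you would register dump_stats via reactor.addSystemEventTrigger):

```python
import cProfile
import io
import pstats

profiler = cProfile.Profile()

def busy():
    # stands in for the application's real work
    total = 0
    for i in range(100000):
        total += i
    return total

def dump_stats():
    # in a Twisted app, register this via
    # reactor.addSystemEventTrigger('before', 'shutdown', dump_stats)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(5)
    return stream.getvalue()

profiler.enable()
busy()                # stands in for reactor.run()
profiler.disable()

report = dump_stats()
print('busy' in report)
```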
Using Python 2.7 on Linux/Solaris, I need to spawn several OS commands in parallel; if a thread/command takes longer than a provided timeout, it should kill the command. I've been trying to do this with threads but am having trouble getting the return codes.
The command I am running is a simple dd command against an NFS-mounted filesystem. If this dd command times out, I need to grab the name of the mount point that failed and store it for later. I don't necessarily need the return code from the dd command, but I would like to collect the output. (My next project is a latency tool that will need the output from dd.)
Here is my pseudo code:
<spawn several threads with each one running an OS command>
<after my timeout window, loop thru any remaining active threads and terminate them, or terminate the os command that thread is running>
<gather a list of threads (actually the mountpoint that I provided the thread), that timed out and save for later>
Am I going about this the right way, or is there a more appropriate approach?
Many thanks for any help.
Sure. Something like this:
from threading import Thread

task_inputs = [
    'input1',
    'input2',
]

class Task(object):
    def __init__(self, task_input):
        self.task_input = task_input

    def __call__(self):
        """Do your thing..."""

task_threads = []
for task_input in task_inputs:
    task = Task(task_input)
    thr = Thread(target=task)
    thr.start()
    task_threads.append((thr, task))

# wait for the timeout window (e.g. thr.join(timeout) on each thread)
naughty_tasks = []
for thr, task in task_threads:
    if thr.is_alive():
        # still running!
        """Do your termination thing..."""
        naughty_tasks.append(task)

# now you have a list of Task objects with their original input ( naughty_tasks[i].task_input )
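Since Python threads cannot be forcibly terminated but OS processes can, a subprocess-based variant may fit the dd use case better. This sketch uses made-up mount points and `sleep` in place of the real dd commands, and relies on the subprocess timeout support added in Python 3.3 (on 2.7, a timer thread calling proc.kill() achieves the same effect):

```python
import subprocess

# hypothetical (mount_point, command) pairs; sleep stands in for dd
commands = [
    ('/mnt/a', ['sleep', '0.1']),   # finishes within the timeout
    ('/mnt/b', ['sleep', '30']),    # will exceed the timeout
]

# start everything in parallel
procs = [(mnt, subprocess.Popen(cmd, stdout=subprocess.PIPE))
         for mnt, cmd in commands]

failed_mounts = []
for mnt, proc in procs:
    try:
        out, _ = proc.communicate(timeout=1)  # collect output, wait up to 1 s
    except subprocess.TimeoutExpired:
        proc.kill()          # terminate the slow command
        proc.communicate()   # reap it so no zombie is left behind
        failed_mounts.append(mnt)

print(failed_mounts)  # ['/mnt/b']
```

This gives you both the timed-out mount points and, for the commands that finished, their captured output and proc.returncode.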