Interrupt and restart write operation - python

I am currently running a piece of Python code on a cluster. Part of the rules enforced on me by Slurm is that there is a time limit on the wallclock run time of my code. This isn't really a problem most of the time, as I can simply checkpoint my code using pickle and then restart it.
At the end of the code, however, I need to write out all my data (I can't write until all calculations have finished), which can take some time as very large pieces of data can be gathered.
My problem now is that in some cases the code gets terminated by Slurm because it exceeded its run-time allowance.
Is there some way of interrupting a write operation, stopping the code and then restarting where I left off?

Assuming you put your data in a list or tuple, perhaps a generator function would help?
# Create generator function
def Generator():
    data = ['line1', 'line2', 'line3', 'line4']
    for i in data:
        yield i

output = Generator()  # reference it
.......
......
if [time condition is true]:
    file = open("myfile", "a")
    file.write(str(next(output)))
else:
    [Do something]
You can also use try, capturing the exception, and restart your main function:
try:
    MainFunction()  # main function with generator next() calls
except [your Error]:
    MainFunction()  # restart main function
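Building on that idea, here is a minimal sketch of a resumable write (the write_progress.pkl checkpoint file, the TIME_LIMIT budget and the write_results helper are illustrative assumptions, not from the question): pickle the index of the last item written, and on restart skip everything that was already flushed and keep appending.
import os
import pickle
import time

CHECKPOINT = "write_progress.pkl"   # hypothetical checkpoint file
TIME_LIMIT = 3500                   # hypothetical wallclock budget in seconds
start = time.time()

def load_progress():
    # Resume from the last checkpointed index, or start at 0.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return 0

def save_progress(n):
    with open(CHECKPOINT, "wb") as f:
        pickle.dump(n, f)

def write_results(results):
    written = load_progress()
    with open("myfile", "a") as out:
        for i, item in enumerate(results):
            if i < written:
                continue                 # already written in a previous run
            if time.time() - start > TIME_LIMIT:
                save_progress(i)         # stop cleanly before Slurm kills the job
                return False             # signal "not finished, resubmit"
            out.write(str(item) + "\n")
    save_progress(len(results))
    return True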

Related

thread.join() does not work when target function uses subprocess.run()

I have a function that runs another function in the background as a command line using subprocess.run(). When this function runs, it generates as many processes as I give as input. For instance:
# Importing libraries
import subprocess
import threading

# Function with background function inside
def func(input_value, processes):
    # Running the background function as many times as the number of processes I input
    for i in range(processes):
        subprocess.run("nohup python3 func_background " + input_value + "&\n", shell=True)
So if I specify 50, I will run 50 processes in the background with the same input value. The problem is that now I am trying to run different input_values, each with many processes. Of course, if I run all input values at the same time I will max out the CPU, so I was trying to create one thread per input value and run them one by one, waiting for each thread to finish before starting the next (using thread.join()). Like this:
# Defining the input values
list_input_values = [input_value_1, input_value_2]

# Looping over input values, creating one thread for each of them
for i in list_input_values:
    # Creating thread
    th = threading.Thread(target=func, args=(i, 4))
    # Starting the thread and trying to make the loop wait before starting a new one
    th.daemon = True
    th.start()
    th.join()
This works, but it runs both threads instead of one by one, and I don't know if it's because of the loop or because the target function I specify finishes while the subprocess.run() inside it does not. Is there any way to solve this by waiting for the background function to finish, or am I doing something wrong?
Thank you in advance
This is a comment, not an answer; I can't include code in a comment. I asked: why create threads if you aren't going to let the threads run concurrently?
You said:
I was trying to find a way to not let the loop go to the next iteration until the first one is finished, because the first iteration will use all the possible cores and then, when they finish, pass to the next iteration.
OK, so why don't you just do this?
list_input_values = [input_value_1, input_value_2]

for i in list_input_values:
    func(i, 4)
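For completeness, here is a sketch of the likely root cause and a fix (not something stated in the thread): subprocess.run("nohup ... &", shell=True) returns as soon as the shell has backgrounded the command, so func() and therefore th.join() finish immediately even though the workers are still running. Keeping handles to the child processes and waiting on them makes the join meaningful; the func_background script name is taken from the question.
import subprocess
import threading

def func(input_value, processes):
    # Launch the workers without "nohup ... &" so we keep handles to them,
    # then block until every one of them has exited.
    procs = [
        subprocess.Popen(["python3", "func_background", input_value])
        for _ in range(processes)
    ]
    for p in procs:
        p.wait()

list_input_values = ["input_value_1", "input_value_2"]
for value in list_input_values:
    th = threading.Thread(target=func, args=(value, 4))
    th.start()
    th.join()   # now actually waits, because func() waits for its processes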

Best way to run more than 1 function at the same time, but if one crashes, continue running all the others and be able to fix the function that crashed

I'm creating a script that scrapes data from sites. I have at least 10 sites to scrape. Each site is one .ipynb file (then I convert them to .py to execute). It could happen that one site changes, so its scraping code would need to be changed.
I have the following:
def ex_scrape_site1():
    %run "scrape\\scrape_site1.py"

def ex_scrape_site2():
    %run "scrape\\scrape_site2.py"

def ex_scrape_site3():
    %run "scrape\\scrape_site3.py"
.
.
.
(10 so far)
I'm currently using a list with all the functions and then doing a for loop over the list to generate a thread for each function. Like this:
funcs = [ex_scrape_site1, ex_scrape_site2, ex_scrape_site3]
Then, I'm executing them as follows:
while True:
    threads = []
    for func in funcs:
        threads.append(Thread(target=func))
    [thread.start() for thread in threads]  # start threads
    [thread.join() for thread in threads]   # wait for all to complete
So here it's executing all the functions in parallel, which is OK. However, if one crashes I have to stop everything and fix the error.
Is there a way to:
When something happens in one of the scraping functions, I want to be able to amend the broken function but continue running all the others.
Since I'm using join() I have to wait until all the scrapes finish before it iterates again. How could I handle each function individually, not waiting until all of them finish before starting the process again?
I thought of using Airflow, do you think it could make sense to implement?
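There is no answer for this one in the text above, but one possible direction is the following sketch: give each scraper its own supervising thread that catches exceptions and reruns only that scraper, so a crash in one site never blocks the others. The run_forever helper and the 60-second delay are illustrative assumptions; ex_scrape_site1 etc. are the functions from the question.
import time
import traceback
from threading import Thread

def run_forever(scrape_func, delay=60):
    # Supervise one scraper: rerun it in a loop, and log (rather than
    # propagate) any exception so the other scrapers keep going.
    while True:
        try:
            scrape_func()
        except Exception:
            print(f"{scrape_func.__name__} crashed, will retry:")
            traceback.print_exc()
        time.sleep(delay)   # wait before the next run of this one scraper

funcs = [ex_scrape_site1, ex_scrape_site2, ex_scrape_site3]
supervisors = [Thread(target=run_forever, args=(f,), daemon=True) for f in funcs]
for t in supervisors:
    t.start()
for t in supervisors:
    t.join()   # block the main thread; each scraper now cycles independently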

How to make list of twisted deferred operations truly async

I am new to using the Twisted library and I want to make a list of operations async. Take the following pseudo code as an example:
@defer.inlineCallbacks
def getDataAsync(host):
    data = yield AsyncHttpAPI(host)  # some async API which returns a Deferred
    return data

@defer.inlineCallbacks
def funcPrintData():
    hosts = []  # some list of hosts, say 1000 in number
    for host in hosts:
        data = yield getDataAsync(host)
        # why doesn't the following line get printed as soon as the first result is available?
        # it waits for all getDataAsync calls to be queued before calling the callback and so printing data
        print(data)
Please comment if the question is not clear. Is there a better way of doing this? Should I instead be using DeferredList?
The line:
data = yield getDataAsync(host)
means "stop running this function until the getDataAsync(host) operation has completed. If the function stops running, the for loop can't get to any subsequent iterations so those operations can't even begin until after the first getDataAsync(host) has completed. If you want to run everything concurrently then you need to not stop running the function until all of the operations have started. For example:
ops = []
for host in hosts:
    ops.append(getDataAsync(host))
After this runs, all of the operations will have started regardless of whether or not any have finished.
What you do with ops depends on whether you want results in the same order as hosts or if you want them all at once when they're all ready or if you want them one at a time in the order the operations succeed.
DeferredList is for getting them all at once when they're all ready as a list in the same order as the input list (ops):
datas = yield DeferredList(ops)
If you want to process each result as it becomes available, it's easier to use addCallback:
ops = []
for host in hosts:
    ops.append(getDataAsync(host).addCallback(print))
This still doesn't yield, so the whole group of operations is started. However, the callback on each operation runs as soon as that operation has a result. You're still left with a list of Deferred instances in ops which you can use to wait for all of the results to finish, or to attach overall error handling to (at least one of those is a good idea, otherwise you have dangling operations that you can't easily account for in callers of funcPrintData).
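Putting the answer's pieces together, a sketch of funcPrintData that starts everything, prints each result as it arrives, and attaches overall error handling might look like this (the consumeErrors=True choice and the failure-reporting loop are additions for illustration, not part of the original answer):
from twisted.internet import defer

@defer.inlineCallbacks
def funcPrintData(hosts):
    # Start every request before yielding, so they all run concurrently,
    # and print each result as soon as it is available.
    ops = [getDataAsync(host).addCallback(print) for host in hosts]
    # Wait for the whole batch; consumeErrors=True keeps one failure from
    # leaving an unhandled error on the other Deferreds.
    results = yield defer.DeferredList(ops, consumeErrors=True)
    for success, value in results:
        if not success:
            print("request failed:", value)  # value is a Failure here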

How to run a python script multiple times simultaneously using python and terminate all when one has finished

Maybe it's a very simple question, but I'm new to concurrency. I want to write a Python script that runs foo.py 10 times simultaneously with a time limit of 60 s before they are automatically aborted. The script is a non-deterministic algorithm, hence all executions take different times and one will finish before the others. Once the first one ends, I would like to save its execution time and output, and after that kill the rest of the processes.
I have seen this question, run multiple instances of python script simultaneously, and it looks very similar, but how can I add a time limit and kill the remaining processes as soon as the first one finishes?
Thank you in advance.
I'd suggest using the threading lib, because with it you can mark threads as daemon threads, so that if the main thread exits for whatever reason the other threads are killed. Here's a small example:
# Import the libs...
import threading, time

# Global variables... (List of results.)
results = []

# The subprocess you want to run several times simultaneously...
def run():
    # We declare results as a global variable.
    global results
    # Do stuff...
    results.append("Hello World! These are my results!")

n = int(input("Welcome user, how many times should I execute run()? "))

# We run the thread n times.
for _ in range(n):
    # Define the thread.
    t = threading.Thread(target=run)
    # Set the thread to daemon; this means that if the main process exits the threads will be killed.
    t.setDaemon(True)
    # Start the thread.
    t.start()

# Once the threads have started we can execute the main code.
# We set a timer...
startTime = time.time()
while True:
    # If the timer reaches 60 s we exit from the program.
    if time.time() - startTime >= 60:
        print("[ERROR] The script took too long to run!")
        exit()
    # Do stuff on your main thread; if the stuff is complete you can break from the while loop as well.
    results.append("Main result.")
    break

# When we break from the while loop we print the output.
print("Here are the results: ")
for i in results:
    print(f"-{i}")
This example should solve your problem, but if you wanted to use blocking commands on the main thread the timer would fail, so you'd need to tweak this code a bit. If you want to do that, move the code from the main thread's loop into a new function (for example def main():) and start the rest of the threads from a primary thread inside main. This example may help you:
def run():
    pass

# Secondary "main" thread.
def main():
    # Start the rest of the threads (in this case I just start 1).
    localT = threading.Thread(target=run)
    localT.setDaemon(True)
    localT.start()
    # Do stuff.
    pass

# Actual main thread...
t = threading.Thread(target=main)
t.setDaemon(True)
t.start()
# Set up a timer and fetch the results you need with a global list or any other method...
pass
Now, you should avoid global variables at all costs as they can sometimes be a bit buggy, but for some reason the threading lib doesn't let you return values from threads, at least I don't know of any method that does. I think there are other multiprocessing libs out there that do let you return values, but I don't know enough about them to explain. Anyway, I hope this works for you.
Update: OK, I was busy writing the code and didn't read the comments on the post, sorry. You can still use this method, but instead of writing code inside the threads, execute another script. You could either import it as a module or actually run it as a script; here's a question that may help you with that:
How to run one python file in another file?
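As an alternative sketch that matches the original question more directly (running foo.py itself rather than a run() function), the copies can be launched as child processes, polled until the first one exits, and the rest killed. The 0.1 s polling interval and the use of subprocess.Popen are assumptions for illustration, not part of the answer above.
import subprocess
import sys
import time

TIMEOUT = 60          # seconds before every run is aborted
N_RUNS = 10           # simultaneous copies of foo.py

# Launch all copies; capture stdout so we can save the winner's output.
procs = [
    subprocess.Popen([sys.executable, "foo.py"],
                     stdout=subprocess.PIPE, text=True)
    for _ in range(N_RUNS)
]

start = time.time()
winner = None
while winner is None and time.time() - start < TIMEOUT:
    for p in procs:
        if p.poll() is not None:        # this run has finished
            winner = p
            break
    time.sleep(0.1)                     # avoid busy-waiting

elapsed = time.time() - start
# Kill everything still running (all of them if the timeout was reached).
for p in procs:
    if p.poll() is None:
        p.kill()

if winner is not None:
    output, _ = winner.communicate()
    print(f"First run finished in {elapsed:.2f} s with output:\n{output}")
else:
    print("No run finished within the time limit.")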

How to make a python script do something before exiting

I have a Python script that collects data from a database every minute by timestamp.
Every minute this script collects data from a given table in the DB that matches the current time, with a delay of 1 minute:
For example, at '2016-04-12 14:53' the script will look for data
that matches '2016-04-12 14:52' in the DB, and so on...
Now I want the script to save the last timestamp it collected from the database before exiting, and that for any type of exit (keyboard interrupt, system errors, database points of failure, etc.).
Is there a simple way to do what I want, knowing that I can't modify the database?
Python's atexit module may help you here. You can import atexit and register functions to run when the program exits.
See this post, namely:
import atexit

def exit_handler():
    print('My application is ending!')

atexit.register(exit_handler)
That will work in most exit situations.
Another, more robust answer from that same thread:
def main():
    try:
        execute_app()     # Stuff that happens before program exits
    finally:
        handle_cleanup()  # Stuff that happens when program exits

if __name__ == '__main__':
    main()
The above is a bit more reliable...
The MOST reliable way would be to write your output every minute, each time overwriting the previous output. That way no matter whether your exit cleanup fires or not, you still have your data from the minute the program exited.
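A minimal sketch of that periodic-overwrite idea (the last_timestamp.json file name is an illustrative assumption): write to a temporary file first and atomically replace the previous checkpoint, so a crash mid-write cannot corrupt it.
import json
import os

STATE_FILE = "last_timestamp.json"   # hypothetical checkpoint file

def save_last_timestamp(ts):
    # Write to a temp file first, then atomically replace the old checkpoint,
    # so a crash mid-write never leaves a corrupt file behind.
    tmp = STATE_FILE + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"last_timestamp": ts}, f)
    os.replace(tmp, STATE_FILE)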
You could use atexit.register() from the atexit module to register cleanup functions. If you register functions f, g, h in that order, then at program exit they will be executed in the reverse order: h, g, f.
But one thing to note: these functions are only invoked on normal program exit, i.e. exits handled by Python. They won't run if the process is killed by a signal Python doesn't handle (e.g. SIGKILL) or if os._exit() is called.
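To cover the cases listed in the question (keyboard interrupt, errors, and termination signals), one option is to combine atexit with a SIGTERM handler that converts the signal into a normal exit; this is only a sketch, and save_last_timestamp() and the variable names refer to the illustrative code above.
import atexit
import signal
import sys

last_timestamp = None   # updated by the collection loop

def save_state():
    if last_timestamp is not None:
        save_last_timestamp(last_timestamp)   # from the sketch above

# Runs on normal interpreter shutdown, including after unhandled exceptions
# and KeyboardInterrupt, but not on unhandled signals or os._exit().
atexit.register(save_state)

# Convert SIGTERM into a normal exit so the atexit handler still runs.
signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))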
