How to loop through different config files one at a time? - python

I want to train a ML model with different configurations. I have a few config files in the config folder and hope to keep testing them one after another so that I don't have to wait for each training to finish and manually run train.py each time.
I thought I could just use a for loop like this:
from train import train_on_config

configs = ['config1.cfg', 'config2.cfg', 'config3.cfg']
for config in configs:
    train_on_config(config)
Since each run of train_on_config(config: str) takes hours, my concern is whether the for loop will wait for each run to finish, or whether it will iterate through configs right away and start all 3 runs of train_on_config at the same time, which in my case I would like to avoid.
I am aware of cron on Linux, but that seems only to schedule recurring jobs, not runs one after another with different configs...
Overall, I just want to make sure the for loop will run train_on_config(config: str) one call at a time.

Code is executed sequentially. Each call will execute after the previous one finishes.
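Nothing runs concurrently in Python unless you explicitly ask for it (threads, multiprocessing, async). A quick way to convince yourself is to time each iteration with a stand-in for train_on_config; in this sketch the sleep is a hypothetical placeholder for the hours-long run, and each config prints only after the previous one has returned:
import time

def train_on_config(config):
    # stand-in for the real hours-long training run
    print(f"training with {config}...")
    time.sleep(2)

configs = ['config1.cfg', 'config2.cfg', 'config3.cfg']
for config in configs:
    start = time.time()
    train_on_config(config)  # blocks until this run returns
    print(f"{config} finished after {time.time() - start:.1f}s")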

Related

Queue manager for experiments in neural network training

I am conducting experiments with neural networks using Keras + TensorFlow backend. I do this using GPU on my PC, running Windows 7.
My workflow looks like the following.
I create a small python script that defines a model, then runs model.fit_generator with ~50 epochs and early stopping if validation accuracy does not improve for 10-15 epochs. Then I run it from a terminal with a command like python model_v3_4_5.py
Usually one epoch takes about 1.5 hours. During this period some new ideas (training parameters or new architecture) come into my head.
Then I create a new python script...
During experiments I've found that it is better not to train several models in parallel: I've experienced doubling of epoch times and a strange decrease in validation accuracy.
Therefore, I'd like to wait until the first training finishes and then run the second one. At the same time, I'd like to avoid idling my PC and start a new training immediately after the previous one has finished.
But I don't know exactly when the first training finishes, therefore, running commands like timeout <50 hours> && python model_v3_4_6.py would be a dumb solution.
Then I need some kind of a queue manager.
One solution that has come to my mind is installing a Jenkins slave on my PC and using the queues that Jenkins provides. As far as I remember, though, Jenkins has issues with GPU access.
Another variant is training the models in a Jupyter notebook in separate cells. However, I cannot see a queue of cell execution there, and this is still a topic being discussed.
Update. Next variant: add to the model scripts some code that checks the current GPU state (is it running a NN right now?) and waits if it is busy. This will produce issues when several scripts (more than one bright new idea :) ) are waiting for the GPU to go idle.
Are there any other variants?
Finally, I've come up with this simple cmd script:
set PYTHONPATH=%CD%
:start
rem run each queued model script, then remove it from the queue
for %%m in (train_queue\model*.py) do (
    python %%m
    del %%m
)
rem wait 5 seconds before rescanning the queue
timeout 5
goto start
One creates a subdirectory train_queue and puts scripts with models in it. All scripts log their output to files, whose names contain timestamps.
This script also calls the timeout program to pause for 5 seconds between scans of the queue directory.
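If you'd rather stay in Python (for example, to make the watcher cross-platform), a minimal sketch of the same polling queue could look like this; it assumes the same train_queue subdirectory and runs until interrupted:
import glob
import os
import subprocess
import sys
import time

QUEUE_DIR = "train_queue"  # same queue directory as in the cmd script

while True:
    # run queued model scripts one at a time, in name order
    for script in sorted(glob.glob(os.path.join(QUEUE_DIR, "model*.py"))):
        subprocess.run([sys.executable, script])  # blocks until training finishes
        os.remove(script)  # dequeue the finished script
    time.sleep(5)  # pause before rescanning, like `timeout 5`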

Persist Completed Pipeline in Luigi Visualiser

I'm starting to port a nightly data pipeline from a visual ETL tool to Luigi, and I really enjoy that there is a visualiser to see the status of jobs. However, I've noticed that a few minutes after the last job (named MasterEnd) completes, all of the nodes disappear from the graph except for MasterEnd. This is a little inconvenient, as I'd like to see that everything is complete for the day/past days.
Further, if in the visualiser I go directly to the last job's URL, it can't find any history that it ran: Couldn't find task MasterEnd(date=2015-09-17, base_url=http://aws.east.com/, log_dir=/home/ubuntu/logs/). I have verified that it ran successfully this morning.
One thing to note is that I have a cron that runs this pipeline every 15 minutes to check for a file on S3. If it exists, it runs, otherwise it stops. I'm not sure if that is causing the removal of tasks from the visualiser or not. I've noticed it generates a new PID every run, but I couldn't find a way to persist one PID/day in the docs.
So, my questions: Is it possible to persist the completed graph for the current day in the visualiser? And is there a way to see what has happened in the past?
Appreciate all the help
I'm not 100% positive if this is correct, but this is what I would try first. When you call luigi.run, pass it --scheduler-remove-delay. I'm guessing this is how long the scheduler waits before forgetting a task after all of its dependents have completed. If you look through luigi's source, the default is 600 seconds. For example:
luigi.run(["--workers", "8", "--scheduler-remove-delay", "86400"], main_task_cls=task_name)
If you configure the remove_delay setting in your luigi.cfg then it will keep the tasks around for longer.
[scheduler]
record_task_history = True
state_path = /x/s/hadoop/luigi/var/luigi-state.pickle
remove_delay = 86400
Note, there is a typo in the documentation ("remove-delay" instead of "remove_delay"), which is being fixed under https://github.com/spotify/luigi/issues/2133.

running 2 python scripts without them affecting each other

I have 2 python scripts I'm trying to run side by side. However, each of them has to open, close, and reopen independently of the other. Also, one of the scripts runs inside a shell script.
Flaskserver.py & ./pyinit.sh
Flaskserver.py is just a Flask server that needs to be restarted every now and again to load a new page (I can't define all the pages ahead of time, as the HTML is interchangeable). pyinit.sh runs as xinit ./pyinit.sh (it's Selenium WebDriver Python code).
So when Flaskserver.py changes and restarts, ./pyinit.sh needs to wait about 20 seconds and then restart as well.
Either one of these can produce errors, so I need to be able to check whether Flaskserver.py has an error before restarting ./pyinit.sh; if ./pyinit.sh errors, I need to set Flaskserver.py back to a default value and then relaunch both of them.
I know a little about subprocess, but I'm unsure how it can deal with errors and with stopping and restarting code.
Rather than using subprocess, I would recommend creating a separate thread for each of your processes using multithreading.
Multithreading will not solve the problem if global variables are colliding; running them as separate scripts might avoid that, but then they can still collide on something else, like a log file.
If you keep both processes running from a single parent process that keeps them separated and assigns different global variables where necessary, you get much better control. Tools like join and lock from the threading library will also ensure that they don't collide, and they make it easy to put one process to sleep while the other is running (as per the 20-second wait).
You can keep a thread list as a global variable, as well as your lock. I have done this successfully with CherryPy's server, for example. For more details about multithreading, look into the question I linked above; it's very well explained.
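A minimal sketch of that supervision pattern (the two worker functions are hypothetical stand-ins for the two scripts, and the lock only guards a shared log, per the collision concern above):
import threading
import time

log_lock = threading.Lock()  # serialize access to shared resources, e.g. a log file

def log(msg):
    with log_lock:
        print(msg)

def run_server():
    # hypothetical stand-in for Flaskserver.py's work
    log("server running")
    time.sleep(1)

def run_client():
    # hypothetical stand-in for pyinit.sh's selenium work
    log("client running")
    time.sleep(1)

def supervise(worker, start_delay=0):
    # loop forever, restarting the worker whenever it fails
    time.sleep(start_delay)  # e.g. the client waits ~20s after the server starts
    while True:
        try:
            worker()
        except Exception as exc:
            log(f"{worker.__name__} failed ({exc}); restarting")
            time.sleep(1)

threads = [
    threading.Thread(target=supervise, args=(run_server,), daemon=True),
    threading.Thread(target=supervise, args=(run_client, 20), daemon=True),
]
for t in threads:
    t.start()
for t in threads:
    t.join()  # runs until the process is killed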

Recommendation on how to write a good python wrapper for LSF

I am creating a python wrapper script and was wondering what'd be a good way to create it.
I want to run code serially. For example:
Step 1.
Run the same program in parallel (the parallelization is easy because I work with an LSF system, so I just submit three different jobs).
Each run of the program takes one fin.txt and outputs one fout.txt, i.e., the three input files f1in.txt, f2in.txt, f3in.txt produce the three output files f1out.txt, f2out.txt, f3out.txt.
In the LSF system, when each run of the program completes successfully, it produces a log file: f1log.out, f2log.out, f3log.out.
The log files are of this form; f1log.out would look something like this if the run succeeds:
------------------------------------------------------------
# LSBATCH: User input
------------------------------------------------------------
Successfully completed.
Resource usage summary:
CPU time : 86.20 sec.
Max Memory : 103 MB
Max Swap : 881 MB
Max Processes : 4
Max Threads : 5
The output (if any) is above this job summary.
Thus, I'd like my wrapper to check (every 5 min or so), for each run (1,2,3), whether the log file has been created, and if it has, to check whether the run completed successfully (i.e., whether the string Successfully completed appears in the log file).
Also, if one of the runs finishes and produces a log file that does not report successful completion, I'd like my wrapper to exit and report that run k (k=1,2,3) was not completed.
After that,
Step 2. If all three runs completed successfully, I would run another program that takes those three files as input... else I'd print an error.
Basically in my question I am looking for two things:
Does it sound like a good way to write a wrapper?
How can I check in Python whether a file exists, and search it for a pattern at regular intervals, in a clean way?
Note. I am aware that LSF has job dependencies, but I find this way clearer and easier to work with, though it may not be optimal.
I'm a user of an LSF system, and my major gripes are exit handling and cleanup. I think a neat idea would be to send a batch job array that has, for instance: an initialization task, a legwork task, and a cleanup task. LSF could complete all three and send a return code to the waiting head node. A lot of the time LSF works great for sending one job or command, but it isn't really set up to handle systematic processing.
Other than that I wish you luck :)
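For the second question, checking for a file and a pattern on a timer, a minimal sketch might look like this (the log names and the 5-minute interval come from the question; it assumes LSF only writes each log once the job has ended):
import os
import time

LOG_FILES = ["f1log.out", "f2log.out", "f3log.out"]  # names from the question
SUCCESS_MARKER = "Successfully completed"
POLL_SECONDS = 300  # check every 5 minutes

pending = set(LOG_FILES)
while pending:
    for log in sorted(pending):
        if not os.path.exists(log):
            continue  # this run has not produced its log yet
        with open(log) as f:
            if SUCCESS_MARKER in f.read():
                pending.discard(log)  # run finished successfully
            else:
                raise SystemExit(f"run {log} was not completed successfully")
    if pending:
        time.sleep(POLL_SECONDS)

print("all three runs completed successfully; start step 2 here")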

Automate Python Script

I'm running a Python script manually that fetches data in JSON format. How do I automate this script to run on an hourly basis?
I'm working on Windows 7. Can I use tools like Task Scheduler? If I can use it, what do I need to put in the batch file?
Can I use tools like Task scheduler?
Yes. Any tool that can run arbitrary programs can run your Python script. Pick the one you like best.
If I can use it,what do I need to put in the batch file?
What batch file? Task Scheduler takes anything that can be run, with arguments—a C program, a .NET program, even a document with a default app associated with it. So, there's no reason you need a batch file. Use C:\Python33\python.exe (or whatever the appropriate path is) as your executable, and your script's path (and its arguments, if any) as the arguments. Just as you do when running the script from the command line.
See Using the Task Scheduler in MSDN for some simple examples, and Task Scheduler Schema Elements or Task Scheduler Scripting Objects for reference (depending on whether you want to create the schedule in XML, or via the scripting interface).
You want to create an ExecAction with Path set to "C:\Python33\python.exe" and Arguments set to "C:\MyStuff\myscript.py", and a RepetitionPattern with Interval set to "PT1H". You should be able to figure out the rest from there.
As sr2222 points out in the comments, often you end up scheduling tasks frequently, and needing to programmatically control their scheduling. If you need this, you can control Task Scheduler's scripting interface from Python, or build something on top of Task Scheduler, or use a different tool that's a bit easier to get at from Python and has more helpful examples online, etc.—but when you get to that point, take a step back and look at whether you're over-using OS task scheduling. (If you start adding delays or tweaking times to make sure the daily foo1.py job never runs until 5 minutes after the most recent hourly foo0.py has finished its job, you're over-using OS task scheduling—but it's not always that obvious.)
May I suggest WinAutomation or AutoMate. These two do the exact same thing, except the UI is a little different. I prefer WinAutomation, because the scripts are a little easier to build.
Yes, you can use the Task Scheduler to run the script on an hourly basis.
To execute a python script via a batch file, use the following code:
start path_to_python_exe path_to_python_file
Example:
start C:\Users\harshgoyal\AppData\Local\Continuum\Anaconda3\python.exe %UserProfile%\Documents\test_script.py
If python is on your Windows PATH environment variable, you can reduce this to:
start python %UserProfile%\Documents\test_script.py
What I generally do is run the batch file once via Task Scheduler and within the python script I call a thread/timer every hour.
class threading.Timer(interval, function, args=None, kwargs=None)
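A minimal sketch of that rescheduling pattern (fetch_data is a hypothetical placeholder for the JSON-fetching work):
import threading

INTERVAL_SECONDS = 3600  # one hour

def fetch_data():
    # hypothetical placeholder for the actual JSON fetch
    print("fetching data...")

def run_hourly():
    fetch_data()
    # each Timer fires only once, so schedule the next run here;
    # the non-daemon timer thread keeps the process alive between runs
    threading.Timer(INTERVAL_SECONDS, run_hourly).start()

run_hourly()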
