Parallel Python for loop - python

I work primarily with the ArcGIS and PCI flavours of Python 2.7. I have a number of processes I've created that run outside of these programs but use their libraries. They are run via .bat files through cmd.
Currently, the processes run in a series of for loops, and each loop handles its objects sequentially. I was wondering if there is a way to run the processing for each object in the list at the same time, that is, in parallel. The only way I can think of is opening a cmd window for each object in the list and running the processing separately.
Is what I am asking even possible? Where should I look for solutions?

Look into subprocess. You'd want a new command-line window created in the background where test.bat runs in parallel, and in your case you don't want to wait for the command to complete before you continue your program, so use subprocess.Popen instead (subprocess.call, described below, may be something to look into as well).
subprocess.call
Run the command described by args. Wait for command to complete, then return the returncode attribute.
If you want to start an external program from your Python script, pass the program's filename to subprocess.Popen(). On Ubuntu Linux you would enter something like:
>>> import subprocess
>>> subprocess.Popen('/usr/bin/gnome-...')
<subprocess.Popen object at 0x7f2bcf93b20>
The return value is a Popen object, which has two useful methods: poll() and wait().
poll() is like asking your friend if he has finished running the code you gave him.
wait() is like waiting for your friend to finish working on his code before you keep working on yours (something you might want to look into).
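For the original question, that means one Popen per object and a wait() at the end. A minimal sketch, assuming hypothetical .bat names (they are placeholders, not from the post):

import subprocess

bat_files = ["job_one.bat", "job_two.bat", "job_three.bat"]   # hypothetical batch files, one per object

procs = []
for bat in bat_files:
    # Popen returns immediately, so every job starts without waiting for the previous one
    procs.append(subprocess.Popen(["cmd.exe", "/c", bat]))

# block here until every job has finished
for p in procs:
    p.wait()

print "All batch jobs finished"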

Related

Python restart script in another cmd's window

Is there a way to restart another script in another shell?
I have a script that sometimes gets stuck waiting to read email from Gmail over IMAP. From another script I would like to restart the main one, but without stopping the execution of the second.
I have tried:
os.system("C:\Users\light\Documents\Python\BOTBOL\Gmail\V1\send.py")
process = subprocess.Popen(["python", "C:\Users\light\Documents\Python\BOTBOL\Gmail\V1\send.py"])
but both run the main script in the second script's shell.
EDIT:
Sorry, by shell I mean terminal window.
After your last comment, and as the syntax shows that you are using Windows, I assume that you want to launch a Python script in another console. The magic word here is START if you want the launching script to execute in parallel with the new one, or START /W if you want it to wait for the end of the subprocess.
In your case, you could use:
subprocess.call(["cmd.exe", "/c", "START", "C:\Path\To\PYTHON.EXE",
"C:\Users\light\Documents\Python\BOTBOL\Gmail\V1\send.py"])
subprocess has an option called shell, which is what you want. os calls are blocking, which means that only after the command has completed will the interpreter move to the next line. On the other hand, subprocess.Popen calls are non-blocking; however, both commands will spawn a child process from the process running this code. If you want to run in a shell and get access to shell features when executing this, try shell=True in subprocess.
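To make the blocking versus non-blocking distinction concrete, here is a small sketch reusing the send.py path from the question:

import subprocess

# Blocking: this line does not return until send.py exits
subprocess.call(["python", r"C:\Users\light\Documents\Python\BOTBOL\Gmail\V1\send.py"])

# Non-blocking: Popen returns immediately and send.py keeps running in the background
p = subprocess.Popen(["python", r"C:\Users\light\Documents\Python\BOTBOL\Gmail\V1\send.py"])

# shell=True hands a command string to the shell, so shell features (redirection, &&, ...) are available
subprocess.Popen(r'python "C:\Users\light\Documents\Python\BOTBOL\Gmail\V1\send.py"', shell=True)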
I could try and explain everything you need, but I think this video will do it better: YouTube video about multithreading.
This will allow you to run two things at once, e.g.
have one thread checking email and the other handling inputs, so the script won't stall at those moments, and multiple parallel 'shells' of work become possible.
If you really want to have a different window for this, I am sorry and I cannot help.
Hope this is what you were looking for.
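For illustration, a minimal threading sketch along the lines of that video; the check_email and handle_input functions are hypothetical placeholders, not from the question:

# Two worker threads running side by side so neither blocks the other
import threading
import time

def check_email():
    while True:
        # placeholder for the blocking Gmail/IMAP read
        time.sleep(5)

def handle_input():
    while True:
        # placeholder for whatever reacts to commands/input
        time.sleep(1)

t1 = threading.Thread(target=check_email)
t2 = threading.Thread(target=handle_input)
t1.daemon = True
t2.daemon = True
t1.start()
t2.start()

# keep the main thread alive while the workers run
while True:
    time.sleep(60)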

Python and Scheduling Computation

I wish to schedule a computation to occur after my current computation in Python is finished. Note that my Python interpreter is running through emacs.
For example I am currently running:
>>> for i in range(2, 5):
...     tn.TweetNetwork.create_subnetworks(i)
...
I made a simple mistake and meant to type range(1,5). This has been running for at least 4 hours and should run for another few hours. That being said I do not want to re-execute the loop with the correction and lose all that has been computed.
As I am not by the computer 24/7, how can I schedule Python to execute the function `tn.TweetNetwork.create_subnetworks(1)`?
I use Emacs 24.3 and Ubuntu 12.04 LTS; let me know if you need more information. All help is greatly appreciated!
EDIT: I like the answer posted, however I do not know how to find the PID. I am running a Python interpreter through emacs. So how would I find that out?
This was too much for the comment, but this isn't a complete reply.
To get a process started by Emacs:
M-x list-processes,
identify the process you want to get the id of
M-:(process-id (get-process "name-of-the-process")).
But this will give you the process of the interpreter, not any other process started from it.
If you then need to get all processes spawned through that process, you can do:
$ pstree PID
Where PID is the one you obtained earlier from Emacs.
I think the easiest way is to write another script that waits until your process has finished and then runs tn.TweetNetwork.create_subnetworks(1). This will work only if your create_subnetworks does not access any global variables and writes all of its results into a database/file/etc.
# Write a script similar to this
import os, time

print "Wait until old script completed..."
while os.path.exists("/proc/SCRIPT_PID"):
    time.sleep(1)

print "Execute create_subnetworks..."
tn = ...
tn.TweetNetwork.create_subnetworks(1)
Connect to your computer over SSH, get the process id with ps aux | grep script_name, and run this new script.
If Tyler's comment does not help, you may eval the following piece of code:
(defun foo (ignored)
  (remove-hook 'comint-output-filter-functions 'foo)
  (run-with-timer 1 nil (lambda ()
                          (goto-char (point-max))
                          (insert "tn.TweetNetwork.create_subnetworks(1)")
                          (comint-send-input))))
(add-hook 'comint-output-filter-functions 'foo)
(add-hook 'comint-output-filter-functions 'foo)
It defines a function that will insert the command you need into the Python inferior buffer, one second after the invocation of that function (the delay is to avoid recursive loops).
Then it sets up that function to be invoked whenever the inferior process (python, in your case) writes anything. In your case, that would be the ">>>" prompt that python writes when ready. If your code is generating output, this approach won't work.
If you are using comint in other buffers (shell, sql, ...), you would need to make the variable comint-output-filter-functions local to your Python interactive buffer (with make-variable-buffer-local).

Python Not Waiting for MATLAB to Finish

I am interfacing a small MATLAB script with Python via the subprocess module. As follows:
cmd='(matlab -nosplash -nodesktop -r "optimizer;quit;")'
p = subprocess.Popen(cmd,stdin=None,stdout=None,shell=True)
#subprocess.Popen.wait(p)
#p.wait()
print "DONE?"
But "DONE" is being printed even before MATLAB starts! My entire code past it is breaking because of this.
I have tried:
Using os.system() calls (this is where I started, but I read on SO that it's deprecated)
Using p.wait() and subprocess.Popen.wait(). Neither works.
Using a manual pause of 3 minutes (the maximum time MATLAB takes to finish on average). Super sloppy.
What am I missing?
Works fine for me:
import subprocess
retcode = subprocess.call(["matlab", "-nosplash", "-nodesktop", "-r", "quit;"])
print "DONE", retcode
Split the command into its arguments accordingly, use only the options that you actually require (no need for shell=True, for example), and use the function that directly does what you are after: call, which runs the command and waits for completion.
Depending on your installation (see http://www.mathworks.com/help/matlab/ref/matlabwindows.html), Matlab may be launched in a way such that it immediately quits. To handle that, add "-wait" to your argument list.
Start MATLAB with the "-wait" flag. From the documentation:
"MATLAB is started by a separate starter program which normally launches MATLAB and then immediately quits. Using this option tells the starter program not to quit until MATLAB has terminated. This option is useful when you need to process the results from MATLAB in a script. Calling MATLAB with this option blocks the script from continuing until the results are generated."
Based on your response to my comment, let me answer your question with what I did for my application, that had a similar process to yours (albeit in C#). Instead of trying to force your process to wait for MATLAB to finish up (which is obviously not working right now), just wait for that CSV file to be written to. If you're worried about possibly having duplicates, then just append the current date and time to the end of the file, and that should do the trick.
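A rough sketch of that file-watching approach; the CSV filename and the timeout are hypothetical placeholders:

import os
import time

output_csv = "optimizer_results.csv"   # hypothetical file that MATLAB writes when done
timeout = 10 * 60                      # give up after ten minutes

start = time.time()
while not os.path.exists(output_csv):
    if time.time() - start > timeout:
        raise RuntimeError("MATLAB output never appeared")
    time.sleep(5)

print "MATLAB results are ready"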

Constantly monitor a program/process using Python

I am trying to constantly monitor a process which is basically a Python program. If the program stops, then I have to start the program again. I am using another Python program to do so.
For example, say I have to constantly run a process called run_constantly.py. I initially run this program manually, which writes its process ID to the file "PID" (in the location out/PROCESSID/PID).
Now I run another program which has the following code to monitor the program run_constantly.py from a Linux environment:
def Monitor_Periodic_Process():
    TIMER_RUNIN = 1800
    foo = imp.load_source("Run_Module", "run_constantly.py")
    PROGRAM_TO_MONITOR = ['run_constantly.py', 'out/PROCESSID/PID']
    while(1):
        # call the function checkPID to see if the program is running or not
        res = checkPID(PROGRAM_TO_MONITOR)
        # if res is 0 then program is not running so schedule it
        if (res == 0):
            date_time = datetime.now()
            scheduler.add_cron_job(foo.Run_Module, year=date_time.year, day=date_time.day, month=date_time.month, hour=date_time.hour, minute=date_time.minute+2)
            scheduler.start()
            scheduler.get_jobs()
            time.sleep(TIMER_NOT_RUNIN)
            continue
        else:
            # the process is running, so sleep and then monitor again
            time.sleep(TIMER_RUNIN)
            continue
I have not included the checkPID() function here. checkPID() basically checks if the process ID still exists (i.e. if the program is still running); if it does not exist, it returns 0. In the above program, I check if res == 0, and if so, I use Python's scheduler to schedule the program. However, the major problem I am currently facing is that the process ID of this monitoring program and of run_constantly.py turn out to be the same once I schedule run_constantly.py with scheduler.add_cron_job(). So if run_constantly.py crashes, the monitoring program still thinks that run_constantly.py is running (since both process IDs are the same), and therefore keeps going into the else branch to sleep and monitor again.
Can someone tell me how to solve this issue? Is there a simple way to constantly monitor a program and reschedule it when it has crashed?
There are many programs that can do this.
On Ubuntu there is upstart (installed by default)
Lots of people like http://supervisord.org/
monit as mentioned by #nathan
If you are looking for a python alternative there is a library that has just been released called circus which looks interesting.
And pretty much every linux distro probably has one of these built in.
The choice is really just down to which one you like better, but you would be far better off using one of these than writing it yourself.
Hope that helps
If you are willing to control the monitored program directly from Python instead of using cron, have a look at the subprocess module:
The subprocess module allows you to spawn new processes,
connect to their input/output/error pipes, and obtain their return codes.
Check questions like "track process status with python" on SO for examples and references.
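For instance, a direct way to keep run_constantly.py alive with subprocess, without a PID file at all, might be a loop like this (a sketch, not the poster's original code):

import subprocess
import time

# restart run_constantly.py whenever it exits, checking every 30 seconds
while True:
    p = subprocess.Popen(["python", "run_constantly.py"])
    while p.poll() is None:      # None means the child is still running
        time.sleep(30)
    print "run_constantly.py exited with code %d, restarting..." % p.returncode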
You could just use monit
http://mmonit.com/monit/
It monitors processes and restarts them (and other things.)
I thought I'd add a more versatile solution, which is one that I personally use all the time as well.
Its name is Immortal (the source is at https://github.com/immortal/immortal).
To have it monitor and instantly restart a program if it stops, simply run the following command:
immortal <command>
So in your case I would run run_constantly.py like so:
immortal python run_constantly.py
The command ps aux | grep run_constantly.py should return 2 process IDs: one for the Immortal command, and one for the separate command that Immortal started (just the regular command). As long as the Immortal process is running, run_constantly.py will stay running.

Re-read environment of parent process in python

I've written a little Python (2.7.2+) module (called TWProcessing) that can be described as an improvised batch manager. The way it works is that I pass it a long list of commands that it will then run in parallel, while limiting the total number of simultaneous processes. That way, if I have 500 commands I would like to run, it will loop through all of them, but only run X of them at a time so as to not overwhelm the machine. The value of X can be easily set when declaring an instance of this batch manager (the class is called TWBatchManager):
batch = TWProcessing.TWBatchManager(MaxJobs=X)
I then add a list of jobs to this object in a very straightforward manner:
batch.Queue.append(/CMD goes here/)
Where Queue is a list of commands that the batch manager will run. When the queue has been filled, I then call Run(), which loops through all the commands, only running X at a time:
batch.Run()
So far, everything works fine. Now what I'd like to do is be able to change the value of X (i.e. the maximum number of processes running at once) dynamically, i.e. while the processes are still running. My old way of doing this was rather straightforward. I had a file called MAXJOBS that the class would know to look at, and, if it existed, it would check it regularly to see if the desired value had changed. Now I'd like to try something a bit more elegant. I would like to be able to write something along the lines of export MAXJOBS=newX in the bash shell that launched the script containing the batch manager, and have the batch manager realize that this is now the value of X it should be using. Obviously os.environ['MAXJOBS'] is not what I'm looking for, because this is a dictionary that is loaded on startup. os.getenv('MAXJOBS') doesn't cut it either, because the export will only affect child processes that the shell spawns from then on. So what I need is a way to get back to the environment of the parent process that launched my Python script. I know os.getppid() will give me the parent pid, but I have no idea how to get from there to the parent environment. I've poked around the interwebz to see if there was a way in which the parent shell could modify the child process environment, and I've found that people tend to insist I not try anything like that, lest I be prepared to do some of the ugliest things one can possibly do with a computer.
Any ideas on how to pull this off? Granted my "read from a standard text file" idea is not so ugly, but I'm new to Python and am therefore trying to challenge myself to do things in an elegant and clean manner to learn as much as I can. Thanks in advance for your help.
It looks to me like you are asking for inter-process communication between a bash script and a Python program.
I'm not completely sure about all your requirements, but it might be a candidate for a FIFO (named pipe):
1) make the fifo:
mkfifo batch_control
2) Start the Python server, which reads from the fifo (note: the following is only a minimalistic example; you must adapt things):
while True:
    fd = file("batch_control", "r")
    for cmd in fd:
        print("New command [%s]" % cmd[:-1])
    fd.close()
3) From the bash script you can then 'send' things to the Python server by echo-ing strings into the fifo:
$ echo "newsize 800" >batch_control
$ echo "newjob /bin/ps" >batch_control
The output of the python server is:
New command [newsize 800]
New command [newjob /bin/ps]
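If the goal is specifically to change MaxJobs while the batch manager runs, the reader loop above could be adapted to parse those messages and update the object; a rough sketch, in which the MaxJobs attribute and the "newsize" message format are assumptions rather than part of the original TWProcessing module:

def watch_fifo(batch, path="batch_control"):
    # would typically run in its own thread alongside batch.Run()
    while True:
        fd = open(path, "r")            # blocks until the shell writes to the fifo
        for line in fd:
            parts = line.split()
            # "newsize N" updates the (assumed) MaxJobs attribute on the batch manager
            if len(parts) == 2 and parts[0] == "newsize":
                batch.MaxJobs = int(parts[1])
                print("MaxJobs is now %d" % batch.MaxJobs)
        fd.close()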
Hope this helps.
