How to tell process id within Python - python

I am working with a cluster system on Linux (www.mosix.org) that lets me submit jobs and have the system run them on different computers. Jobs are run like so:
mosrun ls &
This creates the process and runs it in the background, and the shell prints the process ID, like so:
[1] 29199
The job then finishes some time later. I am writing a Python infrastructure that runs jobs and controls them. For that I want to run jobs using the mosrun program as above and save the process ID of the spawned process (29199 in this case). This cannot be done with os.system or commands.getoutput, because the printed ID comes from the shell and is not part of the process's own output. Any clues?
Edit:
Since the Python script is only meant to start the jobs, the jobs need to keep running after the Python process has exited. I guess that means the mosrun process cannot be the script's child process. Any suggestions?
Thanks

Use the subprocess module. Popen instances have a pid attribute.
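A minimal sketch of that suggestion, assuming Python 3 and a job that can be started without a shell. The helper name launch is made up for illustration, and the stand-in command would be replaced by something like ["mosrun", "ls"] on the cluster.

```python
import subprocess
import sys


def launch(argv):
    """Start argv without waiting for it and return the Popen object."""
    # Popen returns as soon as the child has been started.
    return subprocess.Popen(argv)


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; on the cluster this
    # would be something like ["mosrun", "ls"].
    job = launch([sys.executable, "-c", "print('hello from the job')"])
    print("started job with pid", job.pid)
    job.wait()
```

Here job.pid is the PID of the process that was started directly, which is the same number the shell prints for a background job.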

Looks like you want to ensure the child process is daemonized -- PEP 3143, which I'm pointing to, documents and points to a reference implementation for that, and points to others too.
Once your process (still running Python code) is daemonized, be it by the means offered in PEP 3143 or others, you can os.execl (or other os.exec... function) your target code -- this runs said target code in exactly the same process which we just said is daemonized, and so it keeps being daemonized, as desired.
The last step cannot use subprocess because it needs to run in the same (daemonized) process, overlaying its executable code -- exactly what os.execl and friends are for.
The first step, before daemonization, might conceivably be done via subprocess, but that's somewhat inconvenient (you need to put the daemonize-then-os.exec code in a separate .py): most commonly you'd just want to os.fork and immediately daemonize the child process.
subprocess is quite convenient as a mostly-cross-platform way to run other processes, but it can't really replace Unix's good old "fork and exec" approach for advanced uses (such as daemonization, in this case) -- which is why it's a good thing that the Python standard library also lets you do the latter via those functions in module os!-)

Thanks all for the help. Here's what I did in the end, and seems to work ok. The code uses python-daemon. Maybe something smarter should be done about transferring the process id from the child to the father, but that's the easier part.
import os
import tempfile
import time
import daemon  # third-party package: python-daemon
def run_in_background(command, tmp_dir="/tmp"):
    # Decide on a temp file name beforehand.
    # tempfile.mktemp only picks a name (race-prone, like os.tempnam was).
    tmp_filename = tempfile.mktemp(dir=tmp_dir)
    # Duplicate the process
    pid = os.fork()
    # If we're the child, daemonize and run
    if pid == 0:
        with daemon.DaemonContext():
            child_id = os.getpid()
            with open(tmp_filename, 'w') as f:
                f.write(str(child_id))
            sp = command.split(' ')
            # execl replaces this process; sp[0] must be a full path
            os.execl(*([sp[0]] + sp))
    else:
        # If we're the parent, poll for the new file
        n_iter = 0
        while True:
            if os.path.exists(tmp_filename):
                child_id = int(open(tmp_filename, 'r').read().strip())
                break
            if n_iter == 100:
                raise Exception("Cannot read process id from temp file %s" % tmp_filename)
            n_iter += 1
            time.sleep(0.1)
    return child_id
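One possible way to avoid the temp file and the polling loop is to hand the PID back over a pipe. The sketch below is an alternative under stated assumptions, not the code above: it uses a plain double fork with os.setsid instead of python-daemon, it is Unix-only, and the name spawn_detached is hypothetical.

```python
import os
import signal


def spawn_detached(argv):
    """Run argv in a detached grandchild process and return its PID (Unix only)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # First child: leave the parent's session, then fork again.
        os.close(read_fd)
        os.setsid()
        grandchild = os.fork()
        if grandchild == 0:
            # Grandchild: detach stdio and become the target program.
            os.close(write_fd)
            devnull = os.open(os.devnull, os.O_RDWR)
            for fd in (0, 1, 2):
                os.dup2(devnull, fd)
            try:
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # only reached if exec failed
        # First child: report the grandchild's PID and exit right away.
        os.write(write_fd, str(grandchild).encode())
        os._exit(0)
    # Parent: read the PID until end-of-file, then reap the first child.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        data = pipe.read()
    os.waitpid(pid, 0)
    return int(data)


if __name__ == "__main__":
    job_pid = spawn_detached(["sleep", "5"])
    print("detached job pid:", job_pid)
    os.kill(job_pid, signal.SIGTERM)  # clean up the demo job
```

Because exec keeps the PID of the process that calls it, the number written by the first child is also the PID of the program that ends up running.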

Related

Python Multiprocessing - sending inputs to child processes

I am using the multiprocessing module in Python to launch a few processes in parallel. These processes are independent of each other. They generate their own output and write the results to different files. Each process calls an external tool using the subprocess.call method.
It was working fine until I discovered an issue in the external tool: under some error condition it goes into a 'prompt' mode and waits for user input. In my Python script I use the join method to wait until all the processes finish their tasks, so the whole thing ends up waiting for this erroneous subprocess call. I could put a timeout on each process, but I do not know in advance how long each one is going to run, so this option is ruled out.
How do I figure out whether any child process is waiting for user input, and how do I send an 'exit' command to it? Any pointers or suggestions to relevant modules in Python will be really appreciated.
My code here:
import subprocess
import sys
import os
import multiprocessing
def write_script(fname, e):
    f = open(fname, 'w')
    f.write("Some useful command calling external tool")
    f.close()
    subprocess.call(['chmod', '+x', os.path.abspath(fname)])
    return os.path.abspath(fname)
def run_use(mname, script):
    print("ssh " + mname + " " + script)
    subprocess.call(['ssh', mname, script])
if __name__ == '__main__':
    dict1 = {}
    dict1['mod1'] = ['pp1', 'ext2', 'les3', 'pw4']
    dict1['mod2'] = ['aaa', 'bbb', 'ccc', 'ddd']
    machines = ['machine1', 'machine2', 'machine3', 'machine4']
    print(str(dict1.keys()))
    for key in dict1.keys():
        arr = []
        for mod in dict1[key]:
            d = {}
            arr.append(mod)
            # Launch a batch after every 4 modules, or at the last module
            if (mod == dict1[key][-1]) or (len(arr) % 4 == 0):
                for i in range(0, len(arr)):
                    e = arr.pop()
                    script = write_script(e + "_temp.sh", e)
                    d[i] = multiprocessing.Process(target=run_use, args=(machines[i], script,))
                    d[i].daemon = True
                for pp in d:
                    d[pp].start()
                for pp in d:
                    d[pp].join()
Since you're writing a shell script to run your subcommands, can you simply tell them to read input from /dev/null?
#!/bin/bash
# ...
my_other_command -a -b arg1 arg2 < /dev/null
# ...
This may stop them blocking on input and is a really simple solution. If this doesn't work for you, read on for some other options.
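If the tool is launched from Python rather than from a generated shell script, the same idea can be expressed with subprocess.DEVNULL (available from Python 3.3). A minimal sketch; run_without_stdin is an illustrative name and the child command is a stand-in for the real tool.

```python
import subprocess
import sys


def run_without_stdin(argv):
    """Run argv with stdin connected to the null device; return its exit code."""
    # Any read from stdin hits end-of-file immediately, so a tool that
    # reads its answer from stdin does not sit waiting for a user.
    return subprocess.call(argv, stdin=subprocess.DEVNULL)


if __name__ == "__main__":
    # Stand-in child: exits 0 if reading stdin returns end-of-file at once.
    child = "import sys; sys.exit(0 if sys.stdin.readline() == '' else 1)"
    print("exit code:", run_without_stdin([sys.executable, "-c", child]))
```

A tool that opens the terminal device directly instead of reading stdin would not be affected by this redirection.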
The subprocess.call() function is simply shorthand for constructing a subprocess.Popen instance and then calling the wait() method on it. So, your spare processes could instead create their own subprocess.Popen instances and poll them with poll() method on the object instead of wait() (in a loop with a suitable delay). This leaves them free to remain in communication with the main process so you can, for example, allow the main process to tell the child process to terminate the Popen instance with the terminate() or kill() methods and then itself exit.
So, the question is how does the child process tell whether the subprocess is awaiting user input, and that's a trickier question. I would say perhaps the easiest approach is to monitor the output of the subprocess and search for the user input prompt, assuming that it always uses some string that you can look for. Alternatively, if the subprocess is expected to generate output continually then you could simply look for any output and if a configured amount of time goes past without any output then you declare that process dead and terminate it as detailed above.
Since you're reading the output, actually you don't need poll() or wait() - the process closing its output file descriptor is good enough to know that it's terminated in this case.
Here's an example of a modified run_use() method which watches the output of the subprocess:
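def run_use(mname, script):
    print("ssh " + mname + " " + script)
    # universal_newlines=True makes proc.stdout yield text lines, not bytes
    proc = subprocess.Popen(['ssh', mname, script], stdout=subprocess.PIPE,
                            universal_newlines=True)
    # Note: a line is only delivered once the tool emits a newline or exits
    for line in proc.stdout:
        if "UserPrompt>>>" in line:
            proc.terminate()
            break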
In this example we assume that the process either hangs on UserPrompt>>> (replace with the appropriate string) or terminates naturally. If it were to get stuck in an infinite loop, for example, then your script would still not terminate. You can only really address that with an overall timeout, but you didn't seem keen to do that. Hopefully your subprocess won't misbehave in that way, however.
Finally, if you don't know in advance the prompt that will be given by your process then your job is rather harder. Effectively what you're asking to do is monitor an external process and know when it's blocked reading on a file descriptor, and I don't believe there's a particularly clean solution to this. You could consider running a process under strace or similar, but that's quite an awful hack and I really wouldn't recommend it. Things like strace are great for manual diagnostics, but they really shouldn't be part of a production setup.
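The other idea mentioned earlier, declaring the process dead when no output arrives for a configured amount of time, could be sketched like this, assuming Python 3.7+. The function name run_with_idle_timeout and its return shape are hypothetical; a helper thread feeds lines into a queue so the main loop can wait with a timeout.

```python
import queue
import subprocess
import sys
import threading


def run_with_idle_timeout(argv, idle_seconds):
    """Run argv; kill it if it prints nothing for idle_seconds.

    Returns (lines, finished), where finished is False if it was killed.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stdin=subprocess.DEVNULL, text=True)
    lines = queue.Queue()

    def pump():
        # Forward each output line; None marks that stdout was closed.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()
    output = []
    while True:
        try:
            line = lines.get(timeout=idle_seconds)
        except queue.Empty:
            # No output for idle_seconds: treat the process as stuck.
            proc.kill()
            proc.wait()
            return output, False
        if line is None:
            proc.wait()
            return output, True
        output.append(line)


if __name__ == "__main__":
    print(run_with_idle_timeout([sys.executable, "-c", "print('done')"], 10))
```

This only helps when a healthy run produces output regularly; a tool that is legitimately silent for long stretches would be killed as well.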

Need to find which program called the python script

I am using a build system (waf) which is a wrapper around Python. Some programs (Perl scripts, exes, etc.) call the Python build system. When the build scripts are executed from cmd.exe, I need to find out which program called them. My OS is Windows 7. I tried getting the parent PID in a Python module, and it returns "cmd" as the PPID and "python.exe" as the PID, so that approach did not help me find what I am looking for.
I believe I should be looking at some stack traces at the OS level, but I am not able to find out how to do it. Please help me with the approach I should take or a possible code snippet. I just need to know the name of the script or program that called the system, for example caller.perl or callload.exe.
Thank you
I am not sure why it would be needed, but this is a fun problem in itself, so here are a few tips. Once you have the parent PID, loop through the processes and get the name, e.g.
using WMI:
import os
import wmi
ppid = os.getppid()  # parent PID (available on Windows from Python 3.2)
c = wmi.WMI()
for process in c.Win32_Process():
    if process.ProcessId == ppid:
        print(process.ProcessId, process.Name)
I think you can do same thing using win32 API, e.g.
import win32api
import win32con
import win32process
# ppid is the parent PID obtained earlier
processes = win32process.EnumProcesses()
for pid in processes:
    if pid == ppid:
        handle = win32api.OpenProcess(win32con.PROCESS_ALL_ACCESS,
                                      False, pid)
        exe = win32process.GetModuleFileNameEx(handle, 0)
This will work for simple cases where progA directly executes progB, but if there is a long chain of child processes in between, it may not be a good solution. The best way in the generic case would be for the calling program to state its identity by passing it as an argument, e.g.
progB --calledfrom progA
Modify the Python script to accept an argument stating which file called it, then log that value to a log file. All scripts calling it will have to identify themselves to the Python script via the argument vector.
For example:
foo.pl calls yourfile.py as:
yourfile.py /path/to/foo.pl
yourfile.py:
import logging
def main(argv):
    logging.warning("called by %s", argv[1])  # argv[1] is the caller's path
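The --calledfrom variant suggested earlier could be parsed with argparse. This is only a sketch: the helper name parse_caller is made up, and how such an option would be wired into waf's own option handling is not covered here.

```python
import argparse
import logging
import sys


def parse_caller(argv):
    """Return the value of --calledfrom, or None if the caller did not say."""
    parser = argparse.ArgumentParser(description="build entry point (illustrative)")
    parser.add_argument("--calledfrom", default=None,
                        help="name or path of the program invoking this script")
    # parse_known_args leaves all other arguments alone for the real tool.
    args, _unknown = parser.parse_known_args(argv)
    return args.calledfrom


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    logging.info("called from: %s", parse_caller(sys.argv[1:]))
```

A caller would then run something like: yourfile.py --calledfrom /path/to/foo.pl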
I was able to use process explorer to see the chain of processes called and was able to retrieve the name by just traversing the parent. Thanks for all who replied.

How to get memory usage of an external program - python

I am trying to get the memory usage of an external program within my python script. I have tried using the script http://code.activestate.com/recipes/286222/ as follows:
m0 = memory()
subprocess.call('My program')
m1 = memory(m0)
print m1
But this seems to be just giving me the memory usage of the python script rather than 'My program'. Is there a way of outputting the memory usage of the program for use within the python script?
Try using Psutil
import psutil
import subprocess
import time
SLICE_IN_SECONDS = 1
p = subprocess.Popen('calling/your/program')
resultTable = []
while p.poll() is None:
    # Current psutil API: Process(pid).memory_info()
    resultTable.append(psutil.Process(p.pid).memory_info())
    time.sleep(SLICE_IN_SECONDS)
If you look at the recipe you will see the line:
_proc_status = '/proc/%d/status' % os.getpid()
I suggest you replace the os.getpid() with the process ID of your child process. As @Neal said while I was typing this, you need to use Popen and get the pid attribute of the returned object.
However, you have a possible race condition because you don't know what state the child process is in, and the memory usage will vary anyway.
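A rough sketch of that suggestion, reading the VmRSS field from /proc/<pid>/status for a child started with Popen. It assumes Linux with /proc mounted and Python 3; read_rss_kb is an illustrative helper, not part of the recipe.

```python
import subprocess
import sys


def read_rss_kb(pid):
    """Return the resident set size of pid in kB, or None if unavailable."""
    try:
        with open("/proc/%d/status" % pid) as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    # The line looks like "VmRSS:     1234 kB".
                    return int(line.split()[1])
    except OSError:
        pass  # process already gone, or no /proc on this platform
    return None


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])
    print("child RSS (kB):", read_rss_kb(child.pid))
    child.wait()
```

Each call is a single sample, so the race mentioned above still applies: the value depends on when the child is sampled.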
You may want to check out the psutil module: http://code.google.com/p/psutil/. The Process Management section on the homepage gives you examples of getting memory usage for a running process specified by the pid.
Do you want to spawn the process you are monitoring in your script as well? If so, you probably don't want to use subprocess.call as this will wait for the program to exit and you won't be able to monitor it while it's running. If you want to spawn the process then monitor it, you probably want to use Popen http://docs.python.org/library/subprocess.html#subprocess.Popen. This will allow you to spawn the process, get the pid, hand the pid to psutil, then monitor the memory usage.
I know this is an older post, but it's the only one that appears when I google this issue, so, I want to add the updated version of this:
import subprocess
import time
import humanfriendly
import psutil
proc = subprocess.Popen("...Your process...")
SLICE_IN_SECONDS = 1
while proc.poll() is None:
    p = psutil.Process(proc.pid)
    mem = p.memory_info()
    mem_status = "RSS: {}, VMS: {}".format(humanfriendly.format_size(mem.rss),
                                           humanfriendly.format_size(mem.vms))
    time.sleep(SLICE_IN_SECONDS)
    print(mem_status)
I used humanfriendly here, to make the values more readable, but it's not required.
The RSS and VMS values are available on all operating systems, and other values may be available depending on the OS you're using: https://psutil.readthedocs.io/en/latest/#psutil.Process.memory_info

getting ProcessId within Python code

I am on Windows. Suppose I have a main Python program that calls the Python interpreter on the command line to execute another Python script, say test.py.
So test.py is executed as a new process. How can I find the process ID of this process in Python?
Update:
To be more specific, we have os.getpid() in the os module. It returns the current process ID.
If I have a main program that runs the Python interpreter to run another script, how can I get the process ID of that executing script?
If you used subprocess to spawn the shell, you can find the process ID in the pid property:
sp = subprocess.Popen(['python', 'script.py'])
print('PID is ' + str(sp.pid))
If you used multiprocessing, use its pid property:
p = multiprocessing.Process()
p.start()
# Some time later ...
print('PID is ' + str(p.pid))
It all depends on how you're launching the second process.
If you're using os.system or similar, that call won't report back anything useful about the child process's pid. One option is to have your 2nd script communicate the result of os.getpid() back to the original process via stdin/stdout, or write it to a predetermined file location. Another alternative is to use the third-party psutil library to figure out which process it is.
On the other hand, if you're using the subprocess module to launch the script, the resulting "popen" object has an attribute popen.pid which will give you the process id.
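Both ideas can be seen side by side in a small sketch, assuming Python 3: the launcher reads the pid attribute, and the child reports os.getpid() on its stdout. The inline child code stands in for the second script, and launch_and_ask_pid is a made-up name.

```python
import subprocess
import sys

# Stand-in for the second script: it prints its own process ID.
CHILD_CODE = "import os; print(os.getpid())"


def launch_and_ask_pid():
    """Start a child that reports its own PID on stdout.

    Returns (pid_seen_by_launcher, pid_reported_by_child).
    """
    proc = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                            stdout=subprocess.PIPE, universal_newlines=True)
    out, _ = proc.communicate()
    return proc.pid, int(out.strip())


if __name__ == "__main__":
    seen, reported = launch_and_ask_pid()
    print("launcher saw", seen, "and child reported", reported)
```

The two numbers normally agree when the interpreter is started directly; they can differ when a shell or launcher wrapper sits between the two processes, which is when having the child report its own PID is useful.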
You will receive the process ID of the newly created process when you create it. At least, you will if you used fork() (Unix), posix_spawn(), CreateProcess() (Win32) or probably any other reasonable mechanism to create it.
If you invoke the "python" binary, the python PID will be the PID of this binary that you invoke. It's not going to create another subprocess for itself (Unless your python code does that).
Another option is for the process you execute to set a console window title for itself.
The searching process can then enumerate all windows, find the relevant window handle by title, and use the handle to find the PID. This works on Windows using ctypes.

How to kill headless X server started via Python?

I want to get screenshots of a webpage in Python. For this I am using http://github.com/AdamN/python-webkit2png/ .
newArgs = ["xvfb-run", "--server-args=-screen 0, 640x480x24", sys.argv[0]]
for i in range(1, len(sys.argv)):
    if sys.argv[i] not in ["-x", "--xvfb"]:
        newArgs.append(sys.argv[i])
logging.debug("Executing %s" % " ".join(newArgs))
os.execvp(newArgs[0], newArgs)
Basically calls xvfb-run with the correct args. But man xvfb says:
Note that the demo X clients used in the above examples will not exit on their own, so they will have to be killed before xvfb-run will exit.
So that means that this script will <????> if this whole thing is in a loop, (To get multiple screenshots) unless the X server is killed. How can I do that?
The documentation for os.execvp states:
These functions all execute a new program, replacing the current process; they do not return. [..]
So after calling os.execvp no other statement in the program will be executed. You may want to use subprocess.Popen instead:
The subprocess module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several other, older modules and functions, such as:
Using subprocess.Popen, the code to run xlogo in the virtual framebuffer X server becomes:
import subprocess
xvfb_args = ['xvfb-run', '--server-args=-screen 0, 640x480x24', 'xlogo']
process = subprocess.Popen(xvfb_args)
Now the problem is that xvfb-run launches Xvfb in a background process. Calling process.kill() will not kill Xvfb (at least not on my machine...). I have been fiddling around with this a bit, and so far the only thing that works for me is:
import os
import signal
import subprocess
SERVER_NUM = 99 # 99 is the default used by xvfb-run; you can leave this out.
xvfb_args = ['xvfb-run', '--server-num=%d' % SERVER_NUM,
'--server-args=-screen 0, 640x480x24', 'xlogo']
subprocess.Popen(xvfb_args)
# ... do whatever you want to do here...
pid = int(open('/tmp/.X%s-lock' % SERVER_NUM).read().strip())
os.kill(pid, signal.SIGINT)
So this code reads the process ID of Xvfb from /tmp/.X99-lock and sends the process an interrupt. It works, but does yield an error message every now and then (I suppose you can ignore it, though). Hopefully somebody else can provide a more elegant solution. Cheers.
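An alternative that avoids reading the lock file is to start the command in its own process group and signal the whole group. This is a sketch under assumptions, not a tested fix for xvfb-run: it assumes a Unix system, Python 3, and that the background server stays in the same process group as the wrapper that started it. The helper names are hypothetical, and the demo uses a shell with a background sleep as a stand-in.

```python
import os
import signal
import subprocess


def start_in_own_group(argv):
    """Start argv as the leader of a new session and process group (Unix only)."""
    return subprocess.Popen(argv, start_new_session=True)


def stop_group(proc, sig=signal.SIGTERM):
    """Signal every process in proc's group, then reap proc itself."""
    # With start_new_session=True the child called setsid(), so its
    # process group ID equals its PID.
    os.killpg(proc.pid, sig)
    return proc.wait()


if __name__ == "__main__":
    # Stand-in for xvfb-run: a shell that starts a background helper.
    proc = start_in_own_group(["sh", "-c", "sleep 30 & sleep 30"])
    print("exit status:", stop_group(proc))
```

For the real case the argument list would be the xvfb_args list shown above.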
