run external program in a loop with max time limit - python

I wish to have a python script that runs an external program in a loop sequentially. I also want to limit each execution of the program to a max running time. If it is exceeded, then kill the program. What is the best way to accomplish this?
Thanks!

To run an external program from Python you'll normally want to use the subprocess module.
You could "roll your own" subprocess handling using os.fork() and os.execve() (or one of its exec* cousins) ... with any file descriptor plumbing and signal handling magic you like. However, the subprocess.Popen() function has implemented and exposed most of the features for what you'd want to do for you.
To arrange for the program to die after a given period of time you can have your Python script kill it after the timeout. Naturally you'll want to check to see if the process has already completed before then. Here's a dirt stupid example (using the split function from the shlex module for additional readability:
from shlex import split as splitsh
import subprocess
import time
TIMEOUT=10
cmd = splitsh('/usr/bin/sleep 60')
proc = subprocess.Popen(cmd)
time.sleep(TIMEOUT)
pstatus = proc.poll()
if pstatus is None:
proc.kill()
# Could use os.kill() to send a specific signal
# such as HUP or TERM, check status again and
# then resort to proc.kill() or os.kill() for
# SIGKILL only if necessary
As noted there are a few ways to kill your subprocess. Note that I check for "is None" rather than testing pstatus for truth. If your process completed with an exit value of zero (conventionally indicating that no error occurred) then a naïve test of the proc.poll() results would conflate that completion with the still running process status.
There are also a few ways to determine if sufficient time has passed. In this example we sleep, which is somewhat silly if there's anything else we could be doing. That just leaves our Python process (the parent of your external program) laying about idle.
You could capture the start time using time.time() then launch your subprocess, then do other work (launch other subprocesses, for example) and check the time (perhaps in a loop of other activity) until your desired timeout has been exceeded.
If any of your other activity involves file or socket (network) operations then you'd want to consider using the select module as a way to return a list of file descriptors which are readable, writable or ready with "exceptional" events. The select.select() function also takes an optional "timeout" value. A call to select.select([],[],[],x) is essentially the same as time.sleep(x) (in the case where we aren't providing any file descriptors for it to select among).
In lieu of select.select() it's also possible to use the fcntl module to set your file descriptor into a non-blocking mode and then use os.read() (NOT the normal file object .read() methods, but the lower level functionality from the os module). Again it's better to use the higher level interfaces where possible and only to resort to the lower level functions when you must. (If you use non-blocking I/O then all your os.read() or similar operations must be done within exception handling blocks, since Python will represent the "-EWOULDBLOCK" condition as an OSError (exception) like: "OSError: [Errno 11] Resource temporarily unavailable" (Linux). The precise number of the error might vary from one OS to another. However, it should be portable (at least for POSIX systems) to use the -EWOULDBLOCK value from the errno module.
(I realize I'm going down a rathole here, but information on how your program can do something useful while your child processes are running external programs is a natural extension of how to manage the timeouts for them).
Ugly details about non-blocking file I/O (including portability issues with MS Windows) have been discussed here in the past: Stackoverflow: non-blocking read on a stream in Python
As others have commented, it's better to provide more detailed questions and include short, focused snippets of code which show what effort you've already undertaken. Usually you won't find people here inclined to write tutorials rather than answers.

If you are able to use Python 3.3
From docs,
subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False, timeout=None)
subprocess.call(["ls", "-l"])
0
subprocess.call("exit 1", shell=True)
1
Should do the trick.

Related

Interprocess Communication between two python scripts without STDOUT

I am trying to create a Monitor script that monitors all the threads or a huge python script which has several loggers running, several thread running.
From Monitor.py i could run subprocess and forward the STDOUT which might contain my status of the threads.. but since several loggers are running i am seeing other logging in that..
Question: How can run the main script as a separate process and get custom messages, thread status without interfering with logging. ( passing PIPE as argument ? )
Main_Script.py * Runs Several Threads * Each Thread has separate Loggers.
Monitor.py * Spins up the Main_script.py * Monitors the each of the threads in MainScript.py ( may be obtain other messages from Main_script in the future)
So Far, I tried subprocess, process from Multiprocessing.
Subprocess lets me start the Main_script and forward the stdout back to monitor but I see the logging of threads coming in through the same STDOUT. I am using the “import logging “ Library to log the data from each threads to separate files.
I tried “process” from Multiprocessing. I had to call the main function of the main_script.py as a process and send a PIPE argument to it from monitor.py. Now I can’t see the Main_script.py as a separate process when I run top command.
Normally, you want to change the child process to work like a typical Unix userland tool: the logging and other side-band information goes to stderr (or to a file, or syslog, etc.), and only the actual output goes to stdout.
Then, the problem is easy: just capture stdout to a PIPE that you process, and either capture stderr to a different PIPE, or pass it through to real stderr.
If that's not appropriate for some reason, you need to come up with some other mechanism for IPC: Unix or Windows named pipes, anonymous pipes that you pass by leaking the file descriptor across the fork/exec and then pass the fd as an argument, Unix-domain sockets, TCP or UDP localhost sockets, a higher-level protocol like a web service on top of TCP sockets, mmapped files, anonymous mmaps or pipes that you pass between processes via a Unix-domain socket or Windows API calls, …
As you can see, there are a huge number of options. Without knowing anything about your problem other than that you want "custom messages", it's impossible to tell you which one you want.
While we're at it: If you can rewrite your code around multiprocessing rather than subprocess, there are nice high-level abstractions built in to that module. For example, you can use a Queue that automatically manages synchronization and blocking, and also manages pickling/unpickling so you can just pass any (picklable) object rather than having to worry about serializing to text and parsing the text. Or you can create shared memory holding arrays of int32 objects, or NumPy arrays, or arbitrary structures that you define with ctypes. And so on. Of course you could build the same abstractions yourself, without needing to use multiprocessing, but it's a lot easier when they're there out of the box.
Finally, while your question is tagged ipc and pipe, and titled "Interprocess Communication", your description refers to threads, not processes. If you actually are using a bunch of threads in a single process, you don't need any of this.
You can just stick your results on a queue.Queue, or store them in a list or deque with a Lock around it, or pass in a callback to be called with each new result, or use a higher-level abstraction like concurrent.futures.ThreadPoolExecutor and return a Future object or an iterator of Futures, etc.

Terminate a process created with 'subprocess.run'

How can I terminate a process created with subprocess.run in Python 3?
The documentation of subprocess.run is here, but it doesn't specify it.
The documentation of the return-value is here, but there's no hint for it in there either.
With subprocess.Popen it's easy:
p = subprocess.Popen(...)
...
p.terminate()
How can I do the same when using subprocess.run?
you cannot, since the process returns to python interpreter only once it has ended.
You could try to get hold of the PID while running in a thread and kill it, but...
For those cases, Popen is the best solution as you can control input/output & end of your process.
From the documentation:
The underlying process creation and management in this module is handled by the Popen class. It offers a lot of flexibility so that developers are able to handle the less common cases not covered by the convenience functions.
Note that the documentation starts by describing run, then Popen, then the now deprecated check_call, check_output ... calls

Interleaving stdout from Popen with messages from ZMQ recv

Is there a best-practices approach to poll for the stdout/stderr from a subprocess.Popen as well as a zmq socket?
In my case, I have my main program spawning a Popen subprocess. The subprocess publishes messages via zmq which I then want to subscribe to in my main program.
Waiting on multiple zmq sockets is not complicated with the zmq.Poller but when I want to interleave this with the output from my subprocess itself, I am unsure how to do it in the best way without risking waits or having needless loops.
In the end, I would like to use it like so:
process = Popen([prog, '--publish-to', 'tcp://127.0.0.1:89890'],
stdout=subprocess.PIPE, stderr=subprocess.PIPE, ...)
for (origin, data) in interleave(process, 'tcp://127.0.0.1:89890'):
if origin == 'STDOUT': pass
if origin == 'STDERR': pass
if origin == 'ZMQ': pass
prog --publish-to tcp://127.0.0.1:89890 will then open a zmq.PUB socket and publish data, whereas the interleave function will subscribe to this and also poll for stdout and stderr, yielding whatever data reaches it first.
I know how to define interleave with multiple daemon threads and queues but I don’t know if this approach might have some caveats with regards to lazy reading (ie. stdout might not be processed until the end of the program?) or other things that I have not yet thought about (seems also to be quite a bit of overhead for such a task).
I will be thankful for all ideas or insights.
I aim for at least Python 3.3/3.4 but if this turns out to be much easier with the new async/await tools, I could also use Python 3.5 for the code.
Use zmq.Poller: http://pyzmq.readthedocs.io/en/latest/api/zmq.html#polling. You can register zmq sockets and native file descriptors (e.g. process.stdout.fileno() and process.stderr.fileno()) there, and it will wait until input is available on at least one of the registered sources.
I don't know if it works in Windows, you should try.

What is the best way to capture output from a process using python?

I am using python's subprocess module to start a new process. I would like to capture the output of the new process in real time so I can do things with it (display it, parse it, etc.). I have seen many examples of how this can be done, some use custom file-like objects, some use threading and some attempt to read the output until the process has completed.
File Like Objects Example (click me)
I would prefer not to use custom file-like objects because I want to allow users to supply their own values for stdin, stdout and stderr.
Threading Example (click me)
I do not really understand why threading is required so I am reluctant to follow this example. If someone can explain why the threading example makes sense I would be happy listen. However, this example also restricts users from supplying their own stdout and stderr values.
Read Output Example (see below)
The example which makes the most sense to me is to read the stdout, stderr until the process has finished. Here is some example code:
import subprocess
# Start a process which prints the options to the python program.
process = subprocess.Popen(
["python", "-h"],
bufsize=1,
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
# While the process is running, display the output to the user.
while True:
# Read standard output data.
for stdout_line in iter(process.stdout.readline, ""):
# Display standard output data.
sys.stdout.write(stdout_line)
# Read standard error data.
for stderr_line in iter(process.stderr.readline, ""):
# Display standard error data.
sys.stderr.write(stderr_line)
# If the process is complete - exit loop.
if process.poll() != None:
break
My question is,
Q. Is there a recommended approach for capturing the output of a process using python?
First, your design is a bit silly, since you can do the same thing like this:
process = subprocess.Popen(
["python", "-h"],
bufsize=1,
stdout=sys.stdout,
stderr=sys.stderr
)
… or, even better:
process = subprocess.Popen(
["python", "-h"],
bufsize=1
)
However, I'll assume that's just a toy example, and you might want to do something more useful.
The main problem with your design is that it won't read anything from stderr until stdout is done.
Imagine you're driving an MP3 player that prints each track name to stdout, and logging info to stderr, and you want to play 10 songs. Do you really want to wait 30 minutes before displaying any of the logging to your users?
If that is acceptable, then you might as well just use communicate, which takes care of all of the headaches for you.
Plus, even if it's acceptable for your model, are you sure you can queue up that much unread data in the pipe without it blocking the child? On every platform?
Just breaking up the loop to alternate between the two won't help, because you could end up blocking on stdout.readline() for 5 minutes while stderr is piling up.
So that's why you need some way to read from both at once.
How do you read from two pipes at once?
This is the same problem (but smaller) as handling 1000 network clients at once, and it has the same solutions: threading, or multiplexing (and the various hybrids, like doing green threads on top of a multiplexor and event loop, or using a threaded proactor, etc.).
The best sample code for the threaded version is communicate from the 3.2+ source code. It's a little complicated, but if you want to handle all of the edge cases properly on both Windows and Unix there's really no avoiding a bit of complexity.
For multiplexing, you can use the select module, but keep in mind that this only works on Unix (you can't select on pipes on Windows), and it's buggy without 3.2+ (or the subprocess32 backport), and to really get all the edge cases right you need to add a signal handler to your select. Unless you really, really don't want to use threading, this is the harder answer.
But the easy answer is to use someone else's implementation. There are a dozen or more modules on PyPI specifically for async subprocesses. Alternatively, if you already have a good reason to write your app around an event loop, just about every modern event-loop-driven async networking library (including the stdlib's asyncio) includes subprocess support out of the box, that works on both Unix and Windows.
Is there a recommended approach for capturing the output of a process using python?
It depends on who you're asking; a thousand Python developers might have a thousand different answers… or at least half a dozen. If you're asking what the core devs would recommend, I can take a guess:
If you don't need to capture it asynchronously, use communicate (but make sure to upgrade to at least 3.2 for important bug fixes). If you do need to capture it asynchronously, use asyncio (which requires 3.4).

How to write a system agnostic Python daemon/service? [duplicate]

I would like to have my Python program run in the background as a daemon, on either Windows or Unix. I see that the python-daemon package is for Unix only; is there an alternative for cross platform? If possible, I would like to keep the code as simple as I can.
In Windows it's called a "service" and you could implement it pretty easily e.g. with the win32serviceutil module, part of pywin32. Unfortunately the two "mental models" -- service vs daemon -- are very different in detail, even though they serve similar purposes, and I know of no Python facade that tries to unify them into a single framework.
This question is 6 years old, but I had the same problem, and the existing answers weren't cross-platform enough for my use case. Though Windows services are often used in similar ways as Unix daemons, at the end of the day they differ substantially, and "the devil's in the details". Long story short, I set out to try and find something that allows me to run the exact same application code on both Unix and Windows, while fulfilling the expectations for a well-behaved Unix daemon (which is better explained elsewhere) as best as possible on both platforms:
Close open file descriptors (typically all of them, but some applications may need to protect some descriptors from closure)
Change the working directory for the process to a suitable location to prevent "Directory Busy" errors
Change the file access creation mask (os.umask in the Python world)
Move the application into the background and make it dissociate itself from the initiating process
Completely divorce from the terminal, including redirecting STDIN, STDOUT, and STDERR to different streams (often DEVNULL), and prevent reacquisition of a controlling terminal
Handle signals, in particular, SIGTERM.
The fundamental problem with cross-platform daemonization is that Windows, as an operating system, really doesn't support the notion of a daemon: applications that start from a terminal (or in any other interactive context, including launching from Explorer, etc) will continue to run with a visible window, unless the controlling application (in this example, Python) has included a windowless GUI. Furthermore, Windows signal handling is woefully inadequate, and attempts to send signals to an independent Python process (as opposed to a subprocess, which would not survive terminal closure) will almost always result in the immediate exit of that Python process without any cleanup (no finally:, no atexit, no __del__, etc).
Windows services (though a viable alternative in many cases) were basically out of the question for me: they aren't cross-platform, and they're going to require code modification. pythonw.exe (a windowless version of Python that ships with all recent Windows Python binaries) is closer, but it still doesn't quite make the cut: in particular, it fails to improve the situation for signal handling, and you still cannot easily launch a pythonw.exe application from the terminal and interact with it during startup (for example, to deliver dynamic startup arguments to your script, say, perhaps, a password, file path, etc), before "daemonizing".
In the end, I settled on using subprocess.Popen with the creationflags=subprocess.CREATE_NEW_PROCESS_GROUP keyword to create an independent, windowless process:
import subprocess
independent_process = subprocess.Popen(
'/path/to/pythonw.exe /path/to/file.py',
creationflags=subprocess.CREATE_NEW_PROCESS_GROUP
)
However, that still left me with the added challenge of startup communications and signal handling. Without going into a ton of detail, for the former, my strategy was:
pickle the important parts of the launching process' namespace
Store that in a tempfile
Add the path to that file in the daughter process' environment before launching
Extract and return the namespace from the "daemonization" function
For signal handling I had to get a bit more creative. Within the "daemonized" process:
Ignore signals in the daemon process, since, as mentioned, they all terminate the process immediately and without cleanup
Create a new thread to manage signal handling
That thread launches daughter signal-handling processes and waits for them to complete
External applications send signals to the daughter signal-handling process, causing it to terminate and complete
Those processes then use the signal number as their return code
The signal handling thread reads the return code, and then calls either a user-defined signal handler, or uses a cytpes API to raise an appropriate exception within the Python main thread
Rinse and repeat for new signals
That all being said, for anyone encountering this problem in the future, I've rolled a library called daemoniker that wraps both proper Unix daemonization and the above Windows strategy into a unified facade. The cross-platform API looks like this:
from daemoniker import Daemonizer
with Daemonizer() as (is_setup, daemonizer):
if is_setup:
# This code is run before daemonization.
do_things_here()
# We need to explicitly pass resources to the daemon; other variables
# may not be correct
is_parent, my_arg1, my_arg2 = daemonizer(
path_to_pid_file,
my_arg1,
my_arg2
)
if is_parent:
# Run code in the parent after daemonization
parent_only_code()
# We are now daemonized, and the parent just exited.
code_continues_here()
Two options come to mind:
Port your program into a windows service. You can probably share much of your code between the two implementations.
Does your program really use any daemon functionality? If not, you rewrite it as a simple server that runs in the background, manages communications through sockets, and perform its tasks. It will probably consume more system resources than a daemon would, but it would be quote platform independent.
In general the concept of a daemon is Unix specific, in particular expected behaviour with respect to file creation masks, process hierarchy, and signal handling.
You may find PEP 3143 useful wherein a proposed continuation of python-daemon is considered for Python 3.2, and many related daemonizing modules and implementations are discussed.
The reason it's unix only is that daemons are a Unix specific concept i.e a background process initiated by the os and usually running as a child of the root PID .
Windows has no direct equivalent of a unix daemon, the closest I can think of is a Windows Service.
There's a program called pythonservice.exe for windows . Not sure if it's supported on all versions of python though

Categories