I've been trying for a while now to understand the difference between subprocess.call and subprocess.run. I know the latter is new in Python 3.5, and that both are based on subprocess.Popen, but I can't quite grasp the difference yet.
The definition of subprocess.call() clearly mentions:
It is equivalent to:
run(...).returncode
(except that the input and check parameters are not supported)
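A minimal sketch of that equivalence (the command is just an example):

import subprocess

# Both lines run the command, wait for it to finish, and yield its return code.
rc_call = subprocess.call(["ls", "-l"])
rc_run = subprocess.run(["ls", "-l"]).returncode
assert rc_call == rc_run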
As the Python 3.5 subprocess documentation says:
Prior to Python 3.5, these three functions (i.e. .call(), .check_call(), .check_output()) comprised the high level API to subprocess. You can now use run() in many cases, but lots of existing code calls these functions.
It is common practice that when functions are replaced they are not instantly deprecated; instead there is a support window for some versions, which helps prevent breaking older code when the language version is upgraded. I do not know whether .call() is going to be replaced in the future or not, but based on the documentation, as far as I can tell they are pretty much the same.
To make it clear for anyone wanting to know which to use:
subprocess.run() is the recommended approach for all use cases it can handle. The subprocess documentation states:
The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.
subprocess.call() is part of the Older high-level API (Prior to Python 3.5).
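For context, here is a sketch of what run() offers beyond the older call(); capture_output requires Python 3.7+, and the command is just an example:

import subprocess

# run() returns a CompletedProcess holding the output and exit status;
# check=True raises CalledProcessError on a non-zero exit instead of
# leaving you to inspect the return code yourself.
result = subprocess.run(["ls", "-l"], capture_output=True, text=True, check=True)
print(result.stdout)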
I'm not sure I agree with the other answers.
I just had a very frustrating time with a bash script which starts a daemon process (Elasticsearch). The command merely supplies the path to the executable Bash script.
But subprocess.run(...) does not return from this, whereas subprocess.call(...) does.
In my experience, if you then stop the parent (e.g. by closing the Terminal, if running from one) while using subprocess.run(...), this kills off the daemon process started in it. That is not the case with subprocess.call(...): the daemon carries on happily.
In both cases I set the kwarg shell=True.
I also tried subprocess.run with shell=False (i.e. the default if you omit shell): no change.
I can't see any other options in subprocess.run which might overcome this, so as far as I can tell subprocess.call is fundamentally different, despite what the docs appear to say. At the time of writing the docs say "You can now use run() in many cases, but lots of existing code calls these functions." (i.e. the older functions, including call).
What is particularly strange, and frustrating, is that (obviously) when you run a script which starts a daemon, such as:
./bin/elasticsearch -d -p pid
... it just returns and you can close the Terminal quite happily. So there appears something quite odd about subprocess.run, which some super-expert might care to explain.
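For reference, these are the calls I am comparing (with shell=True, as mentioned above):

import subprocess

subprocess.call("./bin/elasticsearch -d -p pid", shell=True)  # returns promptly
subprocess.run("./bin/elasticsearch -d -p pid", shell=True)   # blocks, in my testing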
I am not fully clear on the differences either.
I can say that you use subprocess.call() when you want the program to wait for the process to complete before moving on to the next statement. Note, though, that subprocess.run() also waits for the process to complete by default, so waiting is not what distinguishes them; if you want several processes running at once without blocking, you need the underlying subprocess.Popen instead.
C supplies the standard function system to run a subprocess using the shell, and many languages provide similar functions, like AWK, Perl (with a single argument), and PHP. Sometimes those functions are criticized as being unsuitable for general use, either on security grounds or because the shell is not portable or is not the one used interactively.
Some other languages seem to agree: they provide only a means of running a process without the shell, like Java (which tokenizes any single string argument itself) and Tcl. Python provides both a direct wrapper and a sophisticated replacement that can avoid using the shell and explicitly recommends the latter (as does the user community).
Certainly the shell is unnecessary complexity for many applications; running an external process at all can bring in issues of deadlock, orphan processes, ambiguous exit statuses, and file descriptor sharing and is unnecessary in cases like running mkdir or echo $VAR. However, assuming that system exists for a reason, when is it the right tool to use?
Even assuming a use case for which it's appropriate to run an external process and in particular to run one via the shell (without being able to filter output as with popen), for C and Python (whose os.system uses the actual C system(3)) there are additional caveats. POSIX specifies additional behavior for system: it ignores SIGINT and SIGQUIT and blocks SIGCHLD during its execution. The rationale is that the user (who can send SIGINT and SIGQUIT from the terminal) is interacting with the subprocess, not the parent, during its execution, and that system must handle the SIGCHLD for its child process without the application's interference.
This directly implies the answer to the question: it is appropriate to use system only when
The user has directly asked for a particular shell command to be executed (e.g., with ! in less), and
The application need not react to any other child process exiting during this time (e.g., it should not be multithreaded).
If #1 is not satisfied, the user is likely to send a terminal signal expecting it to kill the whole process, and have it kill only the (unexpected if not invisible) child. The Linux man pages caution particularly against using it in a loop that the user cannot then interrupt. It is possible to notice that a child has exited with a signal and re-raise it, but this is unreliable because some programs (e.g., Python) exit upon receiving certain signals rather than re-raising them to indicate why they exited—and because the shell (mandated by system!) conflates exit statuses with signal-kill statuses.
In Python the error-handling problems are compounded by the fact that os.system follows the C exit-status (read: error code) convention instead of reporting failure as an exception, inviting the user to ignore the exit status of the child.
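A small illustration of the two conventions (POSIX-only status decoding; the commands are made up for the example):

import os
import subprocess

# os.system returns a C-style wait status; it's easy to forget to check it.
status = os.system("exit 3")
print(os.WEXITSTATUS(status))  # 3 on POSIX (the raw status is 3 << 8)

# subprocess.run with check=True reports failure as an exception instead.
try:
    subprocess.run(["false"], check=True)
except subprocess.CalledProcessError as e:
    print("command failed with exit status", e.returncode)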
The answer is simple (in theory), because it's the same answer that applies to many other programming questions: it's appropriate to use system() when it makes the programmer's life easier, and makes the user's life no harder.
Spotting when this is true, however, requires considerable judgement, and probably we won't always get it right. But, again, that's true of many judgement calls in programming.
Since most shells are written in C, there's no reason in principle why anything done using system() can't be done without it. However, sometimes it requires a whole heap of coding to do what can be done in one line by invoking a shell. The same applies to popen() which, I guess, raises exactly the same kinds of questions.
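To illustrate, here is a pipeline that is one line with the shell but several without it (filenames and commands are made up for the example):

import subprocess

# With the shell: one line.
subprocess.call("grep ERROR app.log | sort | uniq -c > summary.txt", shell=True)

# Without the shell: each stage wired up by hand.
with open("summary.txt", "w") as out:
    grep = subprocess.Popen(["grep", "ERROR", "app.log"], stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort"], stdin=grep.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq", "-c"], stdin=sort.stdout, stdout=out)
    grep.stdout.close()  # let grep receive SIGPIPE if a later stage exits early
    sort.stdout.close()
    uniq.wait()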
Using system() raises portability, thread safety, and signal-management concerns.
My experience, unfortunately, is that the situations where system() gives the most benefit (to the programmer) are precisely the ones where it will be least portable.
Sometimes concerns like this will suggest a different approach, and sometimes they won't matter -- it depends on the application.
So, I have recently been tasked with writing a function in Python 2 that can time the execution of another function. This is simple enough, but the catch is that I have to do it WITHOUT importing any modules; this naturally includes time, timeit, etc.
Using only built-in functions and statements (e.g. sum(), or, yield), is this even possible?
I don't want to see a solution, I need to work that out for myself, but I would greatly appreciate knowing if this is even possible. If not, then I'd rather not waste the time bashing my head against the proverbial brick wall.
If you're on a UNIX (or maybe just Linux) system, yes. Read from /proc/uptime.
It's not super efficient, but hey, builtin functions only.
I'm not sure of a way to do this on Windows.
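For the Linux case, a minimal sketch using only builtins (the resolution of /proc/uptime is typically 1/100 of a second):

def elapsed(func, *args, **kwargs):
    # The first field of /proc/uptime is seconds since boot; open() is a builtin.
    def now():
        with open("/proc/uptime") as f:
            return float(f.read().split()[0])
    start = now()
    result = func(*args, **kwargs)
    return result, now() - start

print(elapsed(sum, range(10 ** 6)))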
Simple answer: No, it is not possible.
Here is a link to the Python 2.7 built-in functions docs. None of them lets you measure time; you are forced to use a module.
Python was designed to be used with its modules, and it includes a great variety of them. I would recommend time for this one.
Sorry for breaking your dreams <3
Depending on the OS you're running and how messy a solution you can accept, you can do this without imports.
Ordered by increasing insanity:
Some systems provide virtual files which contain various timers. You can get a sub-second resolution at least on a Linux system by reading a counter from that kind of file before and after execution. Not sure about others.
Can you reuse existing imports? If the file already contains any of threading, multiprocessing, signal, you can construct a timer out of them.
If you have some kind of scheduler running on your system (like cron) you can inject a job into it (by creating a file), which will print out timestamps every time it's run.
You can follow a log file on a busy system and assume the last message was close to the time you read it.
Depending on what accuracy you want, you could measure the amount of time each python bytecode operation takes, then write an interpreter for the code available via function.__code__.co_code. While you run the code, you can sum up all the expected execution times. This is the only pure-python solution which doesn't require a specific OS / environment.
If you're running on a system which allows process memory introspection, you can open it and inject any functionality without technically importing anything.
Two "cheating" methods.
If you're avoiding the import keyword, you can use __import__ to import time, which is actually a module built into the python2 executable.
If you know the location of the Python installation, you can use execfile on os.py and use the times function.
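Sketches of both cheats (the os.py path and the function being timed are illustrative):

# Cheat 1: __import__ is a builtin, so no import statement is needed.
time = __import__("time")
start = time.time()
do_work()                              # hypothetical function being timed
print(time.time() - start)

# Cheat 2: execfile() (a Python 2 builtin) on os.py, then the times() it defines.
execfile("/usr/lib/python2.7/os.py")   # pulls times() into the current namespace
print(times())                         # CPU times plus elapsed real time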
I am trying to write cross-platform code in Python. The code should spawn new shells and run code in them.
This led me to look at Python's subprocess module, and in particular its Popen class. So I read through the documentation for this class (Popen doc) and found too many "if on Unix/if on Windows" statements. Not very cross-platform, unless I have misunderstood the docs.
What is going on? I understand that the two operating systems are different, but really, is there no way to write a common interface? I mean, the same argument "Windows is different from Unix" can be applied to os, sys, etc., and they all seem 100% cross-platform.
The problem is that process management is deeply ingrained in the operating system, and it differs greatly not only in implementation but often even in basic functionality.
It's actually often rather easy to abstract this kind of code, for example in the os module. The C libraries on both *nix and Windows implement reading files as an I/O stream, so you can write rather low-level file operation functions which work the same on Windows and *nix.
But processes differ greatly. In *nix, for example, processes are hierarchical: every process has a parent, and all processes trace back to the init system running as PID 1. A new process is created by a process forking itself, checking whether it is the parent or the child, and then continuing accordingly.
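For example, the *nix fork model looks like this in Python (Unix-only; the command is just an example):

import os

pid = os.fork()                      # clone the current process
if pid == 0:                         # child: replace ourselves with the program
    os.execvp("ls", ["ls", "-l"])
else:                                # parent: wait for the child to finish
    _, status = os.waitpid(pid, 0)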
On Windows, by contrast, processes are strictly non-hierarchical and are created by the CreateProcess() system call, which takes the program to run as an argument rather than forking the caller.
There are a good deal more differences; these were just two examples, but I hope they show that implementing a platform-independent process library is a daunting task.
I've been trying to create a C++ program that embeds multiple Python threads. Due to the nature of the program, the advantage of multitasking comes from asynchronous I/O; but because some variables need to be altered between context switches, I need to control the scheduling. I thought that because of Python's GIL this would be simple enough, but it's turning out not to be: Python wants to use POSIX threads rather than software threads; I can't figure out from the documentation what happens if I store the result of PyEval_SaveThread() and don't call PyEval_RestoreThread() in the same function, so presumably I'm not supposed to be doing that; and so on.
Is it possible to create a custom scheduler for embedded python threads, or was python basically designed so that it can't be done?
It turns out that using PyEval_SaveThread() and PyEval_RestoreThread() is unnecessary; basically I used coroutines to run the scripts and control the scheduling, in this case from libPCL. However, this isn't really much of a solution, because if Python encounters a syntax error inside a coroutine it will segfault; oddly enough, this happens even if only one Python script is running in one coroutine. But at the very least the coroutines don't seem to conflict with each other.
My script accepts arbitrary-length and -content strings of Python code, then runs them inside exec() statements. If the time to run the arbitrary code passes over some predetermined limit, then the exec() statement needs to exit and a boolean flag needs to be set to indicate that a premature exit has occurred.
How can this be accomplished?
Additional information
These pieces of code will be running in parallel in numerous threads (or at least as parallel as you can get with the GIL).
If there is an alternative method in another language, I am willing to try it out.
I plan on cleaning the code to prevent access to anything that might accidentally damage my system (file and system access, import statements, nested calls to exec() or eval(), etc.).
Options I've considered
Since the exec() statements are running in threads, use a poison pill to kill the thread. Unfortunately, I've read that poison pills do not work for all cases.
Running the exec() statements inside processes, then using process.terminate() to kill everything. But I'm running on Windows and I've read that process creation can be expensive. It also complicates communication with the code that's managing all of this.
Allowing only pre-written functions inside the exec() statements and having those functions periodically check for an exit flag then perform clean-up as necessary. This is complicated, time-consuming, and there are too many corner-cases to consider; I am looking for a simpler solution.
I know this is a bit of an oddball question that deserves a "Why would you ever want to allow arbitrary code to run in an exec() statement?" type of response. I'm trying my hand at a bit of self-evolving code. This is my major stumbling block at the moment: if you allow your code to do almost anything, then it can potentially hang forever. How do you regain control and stop it when it does?
This isn't a very detailed answer, but it's more than I wanted to put into a comment.
You may want to consider something like this other question for creating functions with timeouts, using multiprocessing as a start.
The problem with threads is that you probably can't use your poison pill approach, as they are not workers consuming many small bits of work. They would be sitting there blocking on a statement, and would never see the value telling them to exit.
You mentioned that your concern about using processes on Windows is that they are expensive. So what you might do is create your own kind of process pool (a list of processes). They are all pulling from a queue, and you submit new tasks to the queue. If any process exceeds the timeout, you kill it, and replace it in the pool with a new one. That way you limit the overhead of creating new processes only to when they are timing out, instead of creating a new one for every task.
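Here is a stripped-down sketch of the terminate-on-timeout mechanic such a pool would build on (run_with_timeout and the snippets are illustrative, not from the question):

import multiprocessing as mp

def _run(code, done):
    exec(code, {})        # run the snippet in a fresh namespace
    done.set()

def run_with_timeout(code, timeout):
    done = mp.Event()
    p = mp.Process(target=_run, args=(code, done))
    p.start()
    p.join(timeout)
    if p.is_alive():      # still running past the deadline: kill it
        p.terminate()
        p.join()
        return False
    return done.is_set()  # True only if the snippet finished cleanly

if __name__ == "__main__":                                # needed on Windows
    print(run_with_timeout("x = sum(range(10))", 1.0))    # True
    print(run_with_timeout("while True: pass", 1.0))      # False: killed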
There are a few different options here.
First, start with jdi's suggestion of using multiprocessing. It may be that Windows process creation isn't actually expensive enough to break your use case.
If it actually is a problem, what I'd personally do is use Virtual PC, or even User Mode Linux, to just run the same code in another OS, where process creation is cheap. You get a free sandbox out of that, as well.
If you don't want to do that, jdi's suggestion of processes pools is a bit more work, but should work well as long as you don't have to kill processes very often.
If you really do want everything to be threads, you can do so, as long as you can restrict the way the jobs are written. If the jobs can always be cleanly unwound, you can kill them just by raising an exception. Of course they also have to not catch the specific exception you choose to raise. Obviously neither of these conditions is realistic as a general-purpose solution, but for your use case, it may be fine. The key is to make sure your code evolver never inserts any manual resource-management statements (like opening and closing a file); only with statements. (Alternatively, insert the open and close, but inside a try/finally.) And that's probably a good idea even if you're not doing things this way, because spinning off hundreds of processes that, e.g., each leak as many file handles as they can until they either time out or hit the file limit would slow your machine to a crawl.
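For instance, a job written this way can be unwound safely by an injected exception (process is a hypothetical per-line step):

def job():
    # 'with' guarantees the file is closed even if an exception is raised
    # mid-loop to kill the job; a bare open()/close() pair would leak it.
    with open("work.txt") as f:
        for line in f:
            process(line)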
If you can restrict the code generator/evolver even further, you could use some form of cooperative threading (e.g., greenlets), which makes things even nicer.
Finally, you could switch from CPython to a different Python implementation that can run multiple interpreter instances in a single process. I don't know whether Jython or IronPython can do so. PyPy can, and also has a restricted-environment sandbox, but unfortunately I think both of those—and Python 3.x support—are not-ready-for-prime-time features, which means you either have to get a special build of PyPy (probably without the JIT optimizer), or build it yourself. This might be the best long-term solution, but it's probably not what you want today.