On my local machine, I run a python script which contains this line
bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
os.system(bashCommand)
This works fine.
Then I run the same code on a server and I get the following error message
'import site' failed; use -v for traceback
Traceback (most recent call last):
File "/usr/bin/cwm", line 48, in <module>
from swap import diag
ImportError: No module named swap
So what I did then was insert a print bashCommand statement, which prints the command in the terminal before running it with os.system().
Of course, I get the error again (caused by os.system(bashCommand)), but before that error it prints the command in the terminal. Then I just copied that output, pasted it into the terminal, hit Enter, and it works...
Does anyone have a clue what's going on?
Don't use os.system. It has been deprecated in favor of subprocess. From the docs: "This module intends to replace several older modules and functions: os.system, os.spawn*".
Like in your case:
import subprocess
# Splitting the string passes ">" and "test.nt" to cwm as literal
# arguments; there is no shell here, so do the redirection in Python.
bashCommand = "cwm --rdf test.rdf --ntriples"
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
with open('test.nt', 'wb') as outfile:
    outfile.write(output)
To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.
Prefer subprocess.run() over subprocess.check_call() and friends over subprocess.call() over subprocess.Popen() over os.system() over os.popen()
Understand and probably use text=True, aka universal_newlines=True.
Understand the meaning of shell=True or shell=False and how it changes quoting and the availability of shell conveniences.
Understand differences between sh and Bash
Understand how a subprocess is separate from its parent, and generally cannot change the parent.
Avoid running the Python interpreter as a subprocess of Python.
These topics are covered in some more detail below.
Prefer subprocess.run() or subprocess.check_call()
The subprocess.Popen() function is a low-level workhorse but it is tricky to use correctly and you end up copy/pasting multiple lines of code ... which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.
Here's a paragraph from the documentation:
The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.
Unfortunately, the availability of these wrapper functions differs between Python versions.
subprocess.run() was officially introduced in Python 3.5. It is meant to replace all of the following.
subprocess.check_output() was introduced in Python 2.7 / 3.1. It is basically equivalent to subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout
subprocess.check_call() was introduced in Python 2.5. It is basically equivalent to subprocess.run(..., check=True)
subprocess.call() was introduced in Python 2.4 in the original subprocess module (PEP-324). It is basically equivalent to subprocess.run(...).returncode
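To make the correspondences concrete, here is a minimal sketch (assuming a Unix-like system with ls, true, and false available; the real wrappers handle more edge cases than this):

import subprocess

out = subprocess.run(['ls', '-l'], check=True,
                     stdout=subprocess.PIPE).stdout  # ~ check_output()
subprocess.run(['true'], check=True)                 # ~ check_call()
rc = subprocess.run(['false']).returncode            # ~ call()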
High-level API vs subprocess.Popen()
The refactored and extended subprocess.run() is more logical and more versatile than the older legacy functions it replaces. It returns a CompletedProcess object which has various methods which allow you to retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.
subprocess.run() is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use subprocess.Popen() and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler Popen object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.
It should perhaps be emphasized that just subprocess.Popen() merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside with Python, so a "background" process. If it doesn't need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.
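A minimal sketch of that distinction, using sleep as a stand-in for useful work:

import subprocess

proc = subprocess.Popen(['sleep', '2'])  # returns immediately
print('Python continues while sleep runs in the background')
proc.wait()  # collect the subprocess and its exit status when done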
Avoid os.system() and os.popen()
Since time eternal (well, since Python 2.5) the os module documentation has contained the recommendation to prefer subprocess over os.system():
The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.
The problems with system() are that it's obviously system-dependent and doesn't offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python's reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).
PEP-324 (which was already mentioned above) contains a more detailed rationale for why os.system is problematic and how subprocess attempts to solve those issues.
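As a small illustration of how little information comes back (a sketch, assuming a Unix-like system):

import os

# The command's output goes straight to the terminal; all Python
# receives back is an encoded wait status.
status = os.system('echo hello; exit 3')
print(status)  # 768 on Unix, i.e. 3 << 8 in wait-status encoding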
os.popen() used to be even more strongly discouraged:
Deprecated since version 2.6: This function is obsolete. Use the subprocess module.
However, since sometime in Python 3, it has been reimplemented to simply use subprocess, and redirects to the subprocess.Popen() documentation for details.
Understand and usually use check=True
You'll also notice that subprocess.call() has many of the same limitations as os.system(). In regular use, you should generally check whether the process finished successfully, which subprocess.check_call() and subprocess.check_output() do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use check=True with subprocess.run() unless you specifically need to allow the subprocess to return an error status.
In practice, with check=True or subprocess.check_*, Python will throw a CalledProcessError exception if the subprocess returns a nonzero exit status.
A common error with subprocess.run() is to omit check=True and be surprised when downstream code fails if the subprocess failed.
On the other hand, a common problem with check_call() and check_output() was that users who blindly used these functions were surprised when the exception was raised e.g. when grep did not find a match. (You should probably replace grep with native Python code anyway, as outlined below.)
All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.
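For instance, here is a minimal sketch of the grep surprise described above (the search string and file name are just examples):

import subprocess

try:
    # grep exits with status 1 when it finds no match;
    # with check=True, that raises CalledProcessError
    subprocess.run(['grep', 'frobnicate', '/etc/hostname'],
                   check=True, stdout=subprocess.PIPE, text=True)
except subprocess.CalledProcessError as exc:
    print('grep failed with exit status', exc.returncode)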
Understand and probably use text=True aka universal_newlines=True
Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.
(If the differences are not immediately obvious, Ned Batchelder's Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)
Deep down, Python has to fetch a bytes buffer and interpret it somehow. If it contains a blob of binary data, it shouldn't be decoded into a Unicode string, because that's error-prone and bug-inducing behavior - precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.
With text=True, you tell Python that you, in fact, expect back textual data in the system's default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python's ability (usually UTF-8 on any moderately up to date system, except perhaps Windows?)
If that's not what you request back, Python will just give you bytes strings in the stdout and stderr strings. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.
normal = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True,
    text=True)
print(normal.stdout)

convoluted = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True)
# You have to know (or guess) the encoding
print(convoluted.stdout.decode('utf-8'))
Python 3.7 introduced the shorter and more descriptive and understandable alias text for the keyword argument which was previously somewhat misleadingly called universal_newlines.
Understand shell=True vs shell=False
With shell=True you pass a single string to your shell, and the shell takes it from there.
With shell=False you pass a list of arguments to the OS, bypassing the shell.
When you don't have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.
On the other hand, when you don't have a shell, you don't have redirection, wildcard expansion, job control, and a large number of other shell features.
A common mistake is to use shell=True and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways.
# XXX AVOID THIS BUG
buggy = subprocess.run('dig +short stackoverflow.com')

# XXX AVOID THIS BUG TOO
broken = subprocess.run(['dig', '+short', 'stackoverflow.com'],
    shell=True)

# XXX DEFINITELY AVOID THIS
pathological = subprocess.run(['dig +short stackoverflow.com'],
    shell=True)

correct = subprocess.run(['dig', '+short', 'stackoverflow.com'],
    # Probably don't forget these, too
    check=True, text=True)

# XXX Probably better avoid shell=True
# but this is nominally correct
fixed_but_fugly = subprocess.run('dig +short stackoverflow.com',
    shell=True,
    # Probably don't forget these, too
    check=True, text=True)
The common retort "but it works for me" is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.
To briefly recap, correct usage looks like
subprocess.run("string for 'the shell' to parse", shell=True)
# or
subprocess.run(["list", "of", "tokenized strings"]) # shell=False
If you want to avoid the shell but are too lazy or unsure of how to parse a string into a list of tokens, notice that shlex.split() can do this for you.
subprocess.run(shlex.split("no string for 'the shell' to parse")) # shell=False
# equivalent to
# subprocess.run(["no", "string", "for", "the shell", "to", "parse"])
The regular split() will not work here, because it doesn't preserve quoting. In the example above, notice how "the shell" is a single string.
Refactoring Example
Very often, the features of the shell can be replaced with native Python code. Simple Awk or sed scripts should probably just be translated to Python instead.
To partially illustrate this, here is a typical but slightly silly example which involves many shell features.
cmd = '''while read -r x;
   do ping -c 3 "$x" | grep 'min/avg/max'
   done <hosts.txt'''

# Trivial but horrible; capture stdout so we can print it afterwards
results = subprocess.run(
    cmd, shell=True, universal_newlines=True, check=True,
    stdout=subprocess.PIPE)
print(results.stdout)

# Reimplement with shell=False
with open('hosts.txt') as hosts:
    for host in hosts:
        host = host.rstrip('\n')  # drop newline
        ping = subprocess.run(
            ['ping', '-c', '3', host],
            text=True,
            stdout=subprocess.PIPE,
            check=True)
        for line in ping.stdout.split('\n'):
            if 'min/avg/max' in line:
                print('{}: {}'.format(host, line))
Some things to note here:
With shell=False you don't need the quoting that the shell requires around strings. Putting quotes anyway is probably an error.
It often makes sense to run as little code as possible in a subprocess. This gives you more control over execution from within your Python code.
Having said that, complex shell pipelines are tedious and sometimes challenging to reimplement in Python.
The refactored code also illustrates just how much the shell really does for you with a very terse syntax -- for better or for worse. Python says explicit is better than implicit, but the Python code is rather verbose and arguably looks more complex than it really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)
Common Shell Constructs
For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.
Globbing aka wildcard expansion can be replaced with glob.glob() or very often with simple Python string comparisons like for file in os.listdir('.'): if not file.endswith('.png'): continue. Bash has various other expansion facilities like .{png,jpg} brace expansion and {1..100}, as well as tilde expansion (~ expands to your home directory, and more generally ~account to the home directory of another user).
Shell variables like $SHELL or $my_exported_var can sometimes simply be replaced with Python variables. Exported shell variables are available as e.g. os.environ['SHELL'] (the meaning of export is to make the variable available to subprocesses -- a variable which is not available to subprocesses will obviously not be available to Python running as a subprocess of the shell, or vice versa. The env= keyword argument to subprocess methods allows you to define the environment of the subprocess as a dictionary, so that's one way to make a Python variable visible to a subprocess). With shell=False you will need to understand how to remove any quotes; for example, cd "$HOME" is equivalent to os.chdir(os.environ['HOME']) without quotes around the directory name. (Very often cd is not useful or necessary anyway, and many beginners omit the double quotes around the variable and get away with it until one day ...)
Redirection allows you to read from a file as your standard input, and write your standard output to a file. grep 'foo' <inputfile >outputfile opens outputfile for writing and inputfile for reading, and passes its contents as standard input to grep, whose standard output then lands in outputfile. This is not generally hard to replace with native Python code (see the sketch after this list).
Pipelines are a form of redirection. echo foo | nl runs two subprocesses, where the standard output of echo is the standard input of nl (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the pipes module in the Python standard library or a number of more modern and versatile third-party competitors).
Job control lets you interrupt jobs, run them in the background, return them to the foreground, etc. The basic Unix signals to stop and continue a process are of course available from Python, too. But jobs are a higher-level abstraction in the shell which involve process groups etc which you have to understand if you want to do something like this from Python.
Quoting in the shell is potentially confusing until you understand that everything is basically a string. So ls -l / is equivalent to 'ls' '-l' '/' but the quoting around literals is completely optional. Unquoted strings which contain shell metacharacters undergo parameter expansion, whitespace tokenization and wildcard expansion; double quotes prevent whitespace tokenization and wildcard expansion but allow parameter expansions (variable substitution, command substitution, and backslash processing). This is simple in theory but can get bewildering, especially when there are several layers of interpretation (a remote shell command, for example).
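As a minimal sketch of the redirection and pipeline items above (the file names and commands are just examples):

import subprocess

# Native replacement for `grep 'foo' <inputfile >outputfile`
with open('inputfile') as infile, open('outputfile', 'w') as outfile:
    for line in infile:
        if 'foo' in line:
            outfile.write(line)

# Two processes connected like `echo foo | nl`
echo = subprocess.Popen(['echo', 'foo'], stdout=subprocess.PIPE)
nl = subprocess.Popen(['nl'], stdin=echo.stdout,
                      stdout=subprocess.PIPE, text=True)
echo.stdout.close()  # let echo receive SIGPIPE if nl exits early
output, _ = nl.communicate()
print(output)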
Understand differences between sh and Bash
subprocess runs your shell commands with /bin/sh unless you specifically request otherwise (except of course on Windows, where it uses the value of the COMSPEC variable). This means that various Bash-only features like arrays, [[ etc are not available.
If you need to use Bash-only syntax, you can
pass in the path to the shell as executable='/bin/bash' (where of course if your Bash is installed somewhere else, you need to adjust the path).
subprocess.run('''
    # This for loop syntax is Bash only
    for((i=1;i<=$#;i++)); do
        # Arrays are Bash-only
        array[i]+=123
    done''',
    shell=True, check=True,
    executable='/bin/bash')
A subprocess is separate from its parent, and cannot change it
A somewhat common mistake is doing something like
subprocess.run('cd /tmp', shell=True)
subprocess.run('pwd', shell=True) # Oops, doesn't print /tmp
The same thing will happen if the first subprocess tries to set an environment variable, which of course will have disappeared when you run another subprocess, etc.
A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent's environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.
The immediate fix in this particular case is to run both commands in a single subprocess:
subprocess.run('cd /tmp; pwd', shell=True)
though obviously this particular use case isn't very useful; instead, use the cwd keyword argument, or simply os.chdir() before running the subprocess. Similarly, for setting a variable, you can manipulate the environment of the current process (and thus also its children) via
os.environ['foo'] = 'bar'
or pass an environment setting to a child process with
subprocess.run('echo "$foo"', shell=True, env={'foo': 'bar'})
(not to mention the obvious refactoring subprocess.run(['echo', 'bar']); but echo is a poor example of something to run in a subprocess in the first place, of course).
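For completeness, a minimal sketch of the cwd= alternative:

import subprocess

# Run the subprocess in /tmp without changing Python's own working directory
result = subprocess.run(['pwd'], cwd='/tmp',
    check=True, stdout=subprocess.PIPE, text=True)
print(result.stdout)  # /tmp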
Don't run Python from Python
This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to import the other Python module into your calling script and call its functions directly.
If the other Python script is under your control, and it isn't a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)
If you need parallelism, you can run Python functions in subprocesses with the multiprocessing module. There is also threading which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL.)
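A minimal multiprocessing sketch, for contrast with spawning a separate interpreter as a subprocess:

import multiprocessing

def square(n):
    return n * n

if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]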
Call it with subprocess
import subprocess
subprocess.Popen("cwm --rdf test.rdf --ntriples > test.nt")
The error you are getting seems to be because there is no swap module on the server; you should install swap on the server and then run the script again.
You can also invoke the bash program explicitly, with its -c parameter to execute the command:
import subprocess
bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
output = subprocess.check_output(['bash', '-c', bashCommand])
You can use subprocess, but I always felt that it was not a 'Pythonic' way of doing it. So I created Sultan (shameless plug) that makes it easy to run command line functions.
https://github.com/aeroxis/sultan
You can also use os.popen.
Example:
import os
command = os.popen('ls -al')
print(command.read())
print(command.close())
Output:
total 16
drwxr-xr-x 2 root root 4096 ago 13 21:53 .
drwxr-xr-x 4 root root 4096 ago 13 01:50 ..
-rw-r--r-- 1 root root 1278 ago 13 21:12 bot.py
-rw-r--r-- 1 root root 77 ago 13 21:53 test.py
None
According to the error, you are missing a package named swap on the server; /usr/bin/cwm requires it. If you're on Ubuntu/Debian, install python-swap using aptitude.
To run the command without a shell, pass the command as a list and implement the redirection in Python using subprocess:
#!/usr/bin/env python
import subprocess
with open('test.nt', 'wb', 0) as file:
    subprocess.check_call("cwm --rdf test.rdf --ntriples".split(),
                          stdout=file)
Note: no > test.nt at the end. stdout=file implements the redirection.
To run the command using the shell in Python, pass the command as a string and enable shell=True:
#!/usr/bin/env python
import subprocess
subprocess.check_call("cwm --rdf test.rdf --ntriples > test.nt",
shell=True)
Here, the shell is responsible for the output redirection (> test.nt is in the command).
To run a bash command that uses bashisms, specify the bash executable explicitly e.g., to emulate bash process substitution:
#!/usr/bin/env python
import subprocess
subprocess.check_call('program <(command) <(another-command)',
shell=True, executable='/bin/bash')
Copy-paste this:
def run_bash_command(cmd: str) -> bytes:
    import subprocess
    # Capture stderr too, so the failure check below can report it
    process = subprocess.Popen(cmd.split(),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    output, error = process.communicate()
    if process.returncode != 0:
        raise Exception(error)
    return output
subprocess.Popen() is preferred over os.system() as it offers more control and visibility. However, if you find subprocess.Popen() too verbose or complex, peasyshell is a small wrapper I wrote on top of it, which makes it easy to interact with bash from Python.
https://github.com/davidohana/peasyshell
The Pythonic way of doing this is to use subprocess.Popen.
subprocess.Popen takes a list where the first element is the command to be run followed by any command line arguments.
As an example:
import subprocess
args = ['echo', 'Hello!']
subprocess.Popen(args)  # same as running `echo Hello!` on the command line
args2 = ['echo', '-v', '"Hello Again"']
subprocess.Popen(args2)  # same as running `echo -v "Hello Again"` on the command line
Related
I am calling different processes with the subprocess module. However, I have a question.
In the following code:
callProcess = subprocess.Popen(['ls', '-l'], shell=True)
and
callProcess = subprocess.Popen(['ls', '-l']) # without shell
Both work. After reading the docs, I came to know that shell=True means executing the code through the shell. So that means that in its absence, the process is started directly.
So what should I prefer for my case - I need to run a process and get its output. What benefit do I have from calling it from within the shell or outside of it?
The benefit of not calling via the shell is that you are not invoking a 'mystery program.' On POSIX, the environment variable SHELL controls which binary is invoked as the "shell." On Windows, there is no bourne shell descendent, only cmd.exe.
So invoking the shell invokes a program of the user's choosing and is platform-dependent. Generally speaking, avoid invocations via the shell.
Invoking via the shell does allow you to expand environment variables and file globs according to the shell's usual mechanism. On POSIX systems, the shell expands file globs to a list of files. On Windows, a file glob (e.g., "*.*") is not expanded by the shell, anyway (but environment variables on a command line are expanded by cmd.exe).
If you think you want environment variable expansions and file globs, research the ILS attacks of 1992-ish on network services which performed subprogram invocations via the shell. Examples include the various sendmail backdoors involving ILS.
In summary, use shell=False.
>>> import subprocess
>>> subprocess.call('echo $HOME')
Traceback (most recent call last):
...
OSError: [Errno 2] No such file or directory
>>>
>>> subprocess.call('echo $HOME', shell=True)
/user/khong
0
Setting the shell argument to a true value causes subprocess to spawn an intermediate shell process and tell it to run the command. In other words, using an intermediate shell means that variables, glob patterns, and other special shell features in the command string are processed before the command is run. Here, in the example, $HOME was processed before the echo command. This is an example of a command requiring shell expansion, whereas a command like ls -l is considered a simple command.
source: Subprocess Module
An example of where things could go wrong with shell=True is shown here:
>>> from subprocess import call
>>> filename = input("What file would you like to display?\n")
What file would you like to display?
non_existent; rm -rf / # THIS WILL DELETE EVERYTHING IN ROOT PARTITION!!!
>>> call("cat " + filename, shell=True) # Uh-oh. This will end badly...
Check the doc here: subprocess.call()
Executing programs through the shell means that all user input passed to the program is interpreted according to the syntax and semantic rules of the invoked shell. At best, this only causes inconvenience to the user, because the user has to obey these rules. For instance, paths containing special shell characters like quotation marks or blanks must be escaped. At worst, it causes security leaks, because the user can execute arbitrary programs.
shell=True is sometimes convenient to make use of specific shell features like word splitting or parameter expansion. However, if such a feature is required, make use of other modules that are given to you (e.g. os.path.expandvars() for parameter expansion or shlex for word splitting). This means more work, but avoids other problems.
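For example, a minimal sketch of those two shell-free replacements:

import os.path
import shlex

# Parameter expansion without a shell
print(os.path.expandvars('$HOME/projects'))
# Word splitting that respects quoting, without a shell
print(shlex.split("grep -e 'two words' filename"))
# ['grep', '-e', 'two words', 'filename']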
In short: Avoid shell=True by all means.
The other answers here adequately explain the security caveats which are also mentioned in the subprocess documentation. But in addition to that, the overhead of starting a shell to start the program you want to run is often unnecessary and definitely silly for situations where you don't actually use any of the shell's functionality. Moreover, the additional hidden complexity should scare you, especially if you are not very familiar with the shell or the services it provides.
Where the interactions with the shell are nontrivial, you now require the reader and maintainer of the Python script (which may or may not be your future self) to understand both Python and shell script. Remember the Python motto "explicit is better than implicit"; even when the Python code is going to be somewhat more complex than the equivalent (and often very terse) shell script, you might be better off removing the shell and replacing the functionality with native Python constructs. Minimizing the work done in an external process and keeping control within your own code as far as possible is often a good idea simply because it improves visibility and reduces the risks of -- wanted or unwanted -- side effects.
Wildcard expansion, variable interpolation, and redirection are all simple to replace with native Python constructs. A complex shell pipeline where parts or all cannot be reasonably rewritten in Python would be the one situation where perhaps you could consider using the shell. You should still make sure you understand the performance and security implications.
In the trivial case, to avoid shell=True, simply replace
subprocess.Popen("command -with -options 'like this' and\\ an\\ argument", shell=True)
with
subprocess.Popen(['command', '-with','-options', 'like this', 'and an argument'])
Notice how the first argument is a list of strings to pass to execvp(), and how quoting strings and backslash-escaping shell metacharacters is generally not necessary (or useful, or correct).
Maybe see also When to wrap quotes around a shell variable?
If you don't want to figure this out yourself, the shlex.split() function can do this for you. It's part of the Python standard library, but of course, if your shell command string is static, you can just run it once, during development, and paste the result into your script.
As an aside, you very often want to avoid Popen if one of the simpler wrappers in the subprocess package does what you want. If you have a recent enough Python, you should probably use subprocess.run.
With check=True it will fail if the command you ran failed.
With stdout=subprocess.PIPE it will capture the command's output.
With text=True (or somewhat obscurely, with the synonym universal_newlines=True) it will decode output into a proper Unicode string (it's just bytes in the system encoding otherwise, on Python 3).
If not, for many tasks, you want check_output to obtain the output from a command, whilst checking that it succeeded, or check_call if there is no output to collect.
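Putting those options together, a minimal sketch (uname is just an example command):

import subprocess

result = subprocess.run(['uname', '-a'],
    check=True,              # raise CalledProcessError if the command fails
    stdout=subprocess.PIPE,  # capture the command's output
    text=True)               # decode the output bytes into a str
print(result.stdout)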
I'll close with a quote from David Korn: "It's easier to write a portable shell than a portable shell script." Even subprocess.run('echo "$HOME"', shell=True) is not portable to Windows.
The answer above explains it correctly, but not directly enough.
Let's use the ps command to see what happens.
import time
import subprocess
s = subprocess.Popen(["sleep 100"], shell=True)
print("start")
print(s.pid)
time.sleep(5)
s.kill()
print("finish")
Run it, and it shows:
start
832758
finish
You can then run ps -auxf > 1 before it finishes, and ps -auxf > 2 after it finishes. Here is the output:
1
cy 71209 0.0 0.0 9184 4580 pts/6 Ss Oct20 0:00 | \_ /bin/bash
cy 832757 0.2 0.0 13324 9600 pts/6 S+ 19:31 0:00 | | \_ python /home/cy/Desktop/test.py
cy 832758 0.0 0.0 2616 612 pts/6 S+ 19:31 0:00 | | \_ /bin/sh -c sleep 100
cy 832759 0.0 0.0 5448 532 pts/6 S+ 19:31 0:00 | | \_ sleep 100
See? Instead of directly running sleep 100, it actually runs /bin/sh, and the PID it prints out is actually the PID of /bin/sh. Then, if you call s.kill(), it kills /bin/sh, but sleep is still there.
2
cy 69369 0.0 0.0 533764 8160 ? Ssl Oct20 0:12 \_ /usr/libexec/xdg-desktop-portal
cy 69411 0.0 0.0 491652 14856 ? Ssl Oct20 0:04 \_ /usr/libexec/xdg-desktop-portal-gtk
cy 832646 0.0 0.0 5448 596 pts/6 S 19:30 0:00 \_ sleep 100
So the next question is: what can /bin/sh do? Every Linux user knows it, has heard of it, and uses it, but I bet there are many people who don't really understand what a shell actually is. You may also have heard of /bin/bash; they're similar.
One obvious function of a shell is user convenience when running Linux applications. Because of a shell program like sh or bash, you can directly use a command like ls rather than /usr/bin/ls; the shell searches for where ls is and runs it for you.
Another function is that the shell interprets a string after $ as an environment variable. You can compare these two Python calls to find out for yourself:
subprocess.call(["echo $PATH"], shell=True)
subprocess.call(["echo", "$PATH"])
Most importantly, the shell makes it possible to run Linux commands as scripts. Constructs such as if/else are introduced by the shell; they are not native Linux commands.
Let's assume you are using shell=False and providing the command as a list, and some malicious user tries injecting an 'rm' command. You will see that 'rm' is interpreted as an argument, and effectively 'ls' will try to find a file called 'rm':
>>> subprocess.run(['ls','-ld','/home','rm','/etc/passwd'])
ls: rm: No such file or directory
-rw-r--r-- 1 root root 1172 May 28 2020 /etc/passwd
drwxr-xr-x 2 root root 4096 May 29 2020 /home
CompletedProcess(args=['ls', '-ld', '/home', 'rm', '/etc/passwd'], returncode=1)
shell=False is not secure by default; if you don't control the input properly, you can still execute dangerous commands:
>>> subprocess.run(['rm','-rf','/home'])
CompletedProcess(args=['rm', '-rf', '/home'], returncode=0)
>>> subprocess.run(['ls','-ld','/home'])
ls: /home: No such file or directory
CompletedProcess(args=['ls', '-ld', '/home'], returncode=1)
>>>
I write most of my applications in container environments; I know which shell is being invoked, and I am not taking any user input.
So in my use case, I see no security risk, and it is much easier to create a long string of commands. Hope I am not wrong.
It seems whenever I try to use Python's subprocess module, I find I still don't understand some things. Currently, I was trying to join 3 mp4 files from within a Python module.
When I tried
z ='MP4Box -cat test_0.mp4 -cat test_1.mp4 -cat test_2.mp4 -new test_012d.mp4'
subprocess.Popen(z,shell=True)
Everything worked.
When I tried
z = ['MP4Box', '-cat test_0.mp4', '-cat test_1.mp4', '-cat test_2.mp4', '-new test_012d.mp4']
subprocess.Popen(z,shell=False)
I got the following error:
Option -cat test_0.mp4 unknown. Please check usage
I thought that for shell=False I just needed to supply a list where the first element was the executable I wanted to run and each succeeding element was an argument to that executable. Am I mistaken in this belief, or is there a correct way to create the command I wanted to use?
Also, are there any rules for using shell=True in subprocess.Popen? So far, all I really know(?) is "don't do it - you can expose your code to shell injection attacks". Why does shell=False avoid this problem? Is there ever an actual advantage to using shell=True?
If shell is True, the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user’s home directory.
When shell=True is dangerous?
If we execute shell commands that might include unsanitized input from an untrusted source, it will make a program vulnerable to shell injection, a serious security flaw which can result in arbitrary command execution. For this reason, the use of shell=True is strongly discouraged in cases where the command string is constructed from external input
E.g. (taken from the docs):
>>> from subprocess import call
>>> filename = input("What file would you like to display?\n")
What file would you like to display?
non_existent; rm -rf / #
>>> call("cat " + filename, shell=True) # Uh-oh. This will end badly..
You have to give every single argument as one element of a list:
z = ['MP4Box', '-cat', 'test_0.mp4', '-cat', 'test_1.mp4', '-cat', 'test_2.mp4', '-new', 'test_012d.mp4']
subprocess.Popen(z, shell=False)
This is normally what you want to do, because then you don't need to escape shell special characters in filenames.
I am using Python script to invoke a Java virtual machine. The following command works:
subprocess.call(["./rvm"], shell=False) # works
subprocess.call(["./rvm xyz"], shell=True) # works
But,
subprocess.call(["./rvm xyz"], shell=False) # not working
does not work. The Python documentation advises avoiding shell=True.
You need to split the commands into separate strings:
subprocess.call(["./rvm", "xyz"], shell=False)
A string will work when shell=True, but you need a list of args when shell=False.
The shlex module is more useful for more complicated commands and for dealing with input, but it's good to learn about:
import shlex
cmd = "python foo.py"
subprocess.call(shlex.split(cmd), shell=False)
shlex tutorial
If you want to use shell=True, this is legit, otherwise it would have been removed from the standard library. The documentation doesn't say to avoid it, it says:
Executing shell commands that incorporate unsanitized input from an untrusted source makes a program vulnerable to shell injection, a serious security flaw which can result in arbitrary command execution. For this reason, the use of shell=True is strongly discouraged in cases where the command string is constructed from external input.
But in your case you are not constructing the command from user input, your command is constant, so your code doesn't present the shell injection issue. You are in control of what the shell will execute, and if your code is not malicious per se, you are safe.
Example of shell injection
To explain why the shell injection is so bad, this is the example used in the documentation:
>>> from subprocess import call
>>> filename = input("What file would you like to display?\n")
What file would you like to display?
non_existent; rm -rf / #
>>> call("cat " + filename, shell=True) # Uh-oh. This will end badly...
Edit
With the additional information you have provided by editing the question, stick to Padraic's answer. You should use shell=True only when necessary.
In addition to Enrico.bacis' answer, there are two ways to call programs. With shell=True, give it a full command string. With shell=False, give it a list.
If you do shell tricks like *.jpg or 2> /dev/null, use shell=True; but in general I suggest shell=False -- it's more durable as Enrico said.
source
import subprocess
subprocess.check_call(['/bin/echo', 'beer'], shell=False)
subprocess.check_call('/bin/echo beer', shell=True)
output
beer
beer
Instead of passing just the script's file path, add the word python in front of it, provided that you've added the Python path to your environment variables. If you're not sure, you can always rerun the Python installer, provided that you have a recent version of Python.
Here's what I mean:
import subprocess
subprocess.Popen('python "C:/Path/To/File/Here.py"')
This question may be related to Use python subprocess module like a command line simulator.
I have written some infrastructure code called my_shell, to which you can pass shell commands of my application. It looks like this:
class ApplicationTestShell(object):
    def __init__(self):
        '''
        Constructor
        '''
        self.play_ground_dir = "/var/tmp/MyAppDir"
        ensure_dir_exists_and_empty(self.play_ground_dir)

    def execute_command(self, command, on_success = None, on_failure = None):
        p = create_shell_process(self, self.play_ground_dir)
        sout, serr = p.communicate(input = command)
        if p.returncode == 0:
            on_success(sout)
        else:
            on_failure(serr)

    def create_shell_process(self, cwd):
        return Popen("/bin/bash", env= {WHAT DO I DO HERE?}, cwd = test_dir, stdout=PIPE, stderr=PIPE, stdin=PIPE)
The interesting bit to me here is the env parameter. Python expects a 'map'-like data structure of all the environment variables. My application requires several variables to be exported and set. The script for setting and exporting them is generated by running, say, '/bin/appload myapp' (assume appload is always available on the path). What I do currently is, when I call p.communicate, the following:
p.communicate(input = "eval `/bin/appload myapp`;" + command)
So basically before running the command I call the infrastructure setup.
Is there any way to do this in a better fashion in Python. I somehow want to push the eval /bin/appload part to the env parameter on the Popen class OR as part of the shell creation process.
What are the problems with my current implementation? (I feel it is hacky but I may be wrong)
It depends on how /bin/appload myapp works. If it only guarantees that it will output bash syntax, then parsing that output in Python in order to construct the environment object there is almost certainly more trouble than it's worth (you might need to support parameter and variable expansion, subshells, process substitution, etc, etc). On the other hand, if you are sure that /bin/appload myapp will only ever output lines of the form "VARIABLENAME=someword", then that's pretty trivial to parse in Python and you could move it into your Python code if you like.
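For that trivial case, the parsing might look something like this (a sketch; load_env is a hypothetical helper name, and it assumes appload really only prints NAME=value lines):

import subprocess

def load_env(args):
    # Hypothetical helper: parse NAME=value lines into a dict
    env = {}
    for line in subprocess.check_output(args, text=True).splitlines():
        name, _, value = line.partition('=')
        env[name] = value
    return env

# env = load_env(['/bin/appload', 'myapp'])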
There are an awful lot of different directions you could go with these requirements. You could capture the output of appload myapp into a tempfile and set the subprocess's $BASH_ENV to that filename; that would cause the shell to source your environment setup before running your command, in a way that some might consider cleaner. You could give your command (with the eval-ing prefix) as the first argument to Popen and pass shell=True, and let Popen do the bash invocation on its own (setting $SHELL explicitly to bash if necessary). You could use bash's -c option to specify the code to run on the command line rather than via stdin.
You could have a multi-tiered approach by invoking a shell from Python which eval's the appload myapp environment and then exec's another shell underneath it, so that the first doesn't show up in ps listings and the command given to create_shell_process has the shell all to itself (although that shouldn't really matter). You could do a lot of things, depending on what your concerns are with respect to how the shell is invoked, how it looks in ps listings, whether you want your command to still be run if the appload myapp output produces an error when eval'd, and so on. But for a general solution, I think what you have is perfectly fine.
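The $BASH_ENV variant might look something like the following sketch ('command-to-run' is a placeholder; error handling and cleanup are simplified):

import os
import subprocess
import tempfile

# Capture appload's setup script into a file which bash will source
# (via $BASH_ENV) before running the actual command.
setup = subprocess.check_output(['/bin/appload', 'myapp'], text=True)
with tempfile.NamedTemporaryFile('w', suffix='.sh', delete=False) as f:
    f.write(setup)
env = dict(os.environ, BASH_ENV=f.name)
subprocess.run(['/bin/bash', '-c', 'command-to-run'], env=env)
os.unlink(f.name)  # remove the temporary setup file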
I don't see any real problems with the implementation, besides cosmetic things or minor things that probably only came from copying and pasting the code: create_shell_process doesn't use its cwd parameter, and the on_success and on_failure parameters look like they're optional but the defaults will break things (you can't call None).
I'm apprenticing into system administration without schooling, so sometimes I'm missing what is elementary information to many others.
I'm attempting to give my stdout line another argument before printing, but I'm not sure which process I should use, and I'm a bit fuzzy on the commands for subprocess if that's what I should be using.
My current code is:
f = open('filelist', 'r')
searchterm = f.readline()
f.close()
#takes line from a separate file and gives it definition so that it may be callable.
import commands
commands.getoutput('print man searchterm')
This is running, but not giving me any output to the shell. My more important question, though, is: am I using the right command to get my preferred process? Should I be using one of the subprocess commands instead? I tried playing around with popen, but I don't understand it fully enough to use it correctly.
Ie, I was running
subprocess.Popen('print man searchterm')
but I know without a doubt that's not how you're supposed to run it. Popen requires more arguments than I have given it, like file location and where to run it (stdout or stderr). But I was having trouble making these commands work. Would it be something like:
subprocess.Popen(pipe=stdout 'man' 'searchterm')
#am unsure how to give the program my arguments here.
I've been researching everywhere, but it is such a widely used process I seem to be suffering from a surplus of information rather than not enough. Any help would be appreciated, I'm quite new.
Preemptive thanks for any help.
The canonical way to get data from a separate process is to use subprocess (commands is deprecated):
import subprocess
p = subprocess.Popen(['print','man','searchitem'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
stdoutdata, stderrdata = p.communicate()
Note that some convenience functions exist for splitting strings into lists of arguments. Most notable is shlex.split, which will take a string and split it into a list the same way a shell does. (If nothing is quoted in the string, str.split() works just as well.)
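For example:

import shlex
print(shlex.split('man "some topic"'))  # ['man', 'some topic']
print('man searchterm'.split())         # ['man', 'searchterm']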
commands is deprecated in Python 2.6 and later, and has been removed in Python 3. There's probably no situation where it's preferable in new code, even if you are stuck with Python 2.5 or earlier.
From the docs:
Deprecated since version 2.6: The commands module has been removed in
Python 3. Use the subprocess module instead.
To run man searchterm in a separate process and display the result in the terminal, you could do this:
import subprocess
proc = subprocess.Popen('man searchterm'.split())
proc.communicate()