I'm trying to change the environment of my Python execution process. It seems that the correct way to do that should be to interact with os.environ. However, the following assertion fails:
import os, subprocess
os.environ['ATESTVARIABLE'] = 'value'
value = subprocess.check_output(['echo', '$ATESTVARIABLE'], shell=True)
assert 'value' in value
Is there something else that I should be doing to change the current environment? What flaw in my understanding of Python is revealed by the above code :)?
(Note that within the current Python interpreter, os.environ['ATESTVARIABLE'] contains the expected value. I am setting up to run some code which requires a specific environment variable, and which may launch external processes. Obviously, if I wanted to control the environment of a specific subprocess, I'd use the env keyword.)
Looking through the source code for the subprocess module, it's because using a list of arguments with shell=True will do the equivalent of...
/bin/sh -c 'echo' '$ATESTVARIABLE'
...when what you want is...
/bin/sh -c 'echo $ATESTVARIABLE'
The following works for me...
import os, subprocess
os.environ['ATESTVARIABLE'] = 'value'
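# universal_newlines=True makes check_output return text instead of bytes on Python 3
value = subprocess.check_output('echo $ATESTVARIABLE', shell=True, universal_newlines=True)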
assert 'value' in value
Update
FWIW, the difference between the two is that the first form...
/bin/sh -c 'echo' '$ATESTVARIABLE'
...will just call the shell's built-in echo with no parameters, and set $0 to the literal string '$ATESTVARIABLE', for example...
$ /bin/sh -c 'echo $0'
/bin/sh
$ /bin/sh -c 'echo $0' '$ATESTVARIABLE'
$ATESTVARIABLE
...whereas the second form...
/bin/sh -c 'echo $ATESTVARIABLE'
...will call the shell's built-in echo with a single parameter equal to the value of the environment variable ATESTVARIABLE.
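The two forms can be compared side by side from Python. This is only a minimal sketch, assuming a POSIX system with /bin/sh; the variable name DEMO_VARIABLE and the helper run_with_shell are made up for the demonstration.

```python
import os
import subprocess

# Hypothetical variable name, used only for this demonstration.
os.environ["DEMO_VARIABLE"] = "value"


def run_with_shell(args):
    """Run args with shell=True and return the decoded output."""
    return subprocess.check_output(args, shell=True).decode()


# List form: only 'echo' is the command string, so echo gets no arguments.
list_form = run_with_shell(["echo", "$DEMO_VARIABLE"])

# String form: the shell sees the whole line and expands the variable.
string_form = run_with_shell("echo $DEMO_VARIABLE")

# The extra list item is visible to the command string as $0.
dollar_zero = run_with_shell(["echo $0", "$DEMO_VARIABLE"])

print(repr(list_form), repr(string_form), repr(dollar_zero))
```

The list form prints only an empty line, which is why the assertion in the question fails even though the variable is set.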
Actually, what's wrong in the following code:
import os, subprocess
os.environ['ATESTVARIABLE'] = 'value'
value = subprocess.check_output(['echo', '$ATESTVARIABLE'], shell=True)
assert 'value' in value
is covered by this passage in the subprocess documentation:
On Unix with shell=True, the shell defaults to /bin/sh. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself. That is to say, Popen does the equivalent of:
Popen(['/bin/sh', '-c', args[0], args[1], ...])
This means that if you call subprocess.check_output() with a list as the first parameter and shell=True, you won't get the expected result. Try again with the following code:
import os, subprocess
os.environ['ATESTVARIABLE'] = 'value'
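# universal_newlines=True makes check_output return text instead of bytes on Python 3
value = subprocess.check_output('echo $ATESTVARIABLE', shell=True, universal_newlines=True)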
assert 'value' in value
and it should work as you expect!
Otherwise, your understanding of environment variables is correct. When you modify the environment, that environment is given to every forked child of your current process.
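To check the inheritance claim without involving a shell at all, something like the following should do. It is a sketch only: DEMO_INHERITED is a made-up variable name, and the child is simply another Python interpreter started through sys.executable.

```python
import os
import subprocess
import sys

# Hypothetical variable name, set in the parent process.
os.environ["DEMO_INHERITED"] = "from-parent"

# The child prints whatever value it sees for the variable.
child_code = "import os; print(os.environ.get('DEMO_INHERITED', 'missing'))"

# No env= given: the child inherits the parent's (modified) environment.
inherited = subprocess.check_output(
    [sys.executable, "-c", child_code]
).decode().strip()

# env= given: that one child gets exactly the mapping passed in.
custom_env = dict(os.environ, DEMO_INHERITED="override")
overridden = subprocess.check_output(
    [sys.executable, "-c", child_code], env=custom_env
).decode().strip()

print(inherited, overridden)
```

The first call shows the os.environ change reaching the child; the second shows the env keyword mentioned in the question overriding it for a single subprocess.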
Related
I'm trying to get the time of a process, and when I use the keyword time in the shell, I get a nicer output as:
real 0m0,430s
user 0m0,147s
sys 0m0,076s
Instead of the /usr/bin/time which gives a different output. When I try to run it through python's subprocess library with subprocess.call('time command args',shell=True) it gives me the /usr/bin/time instead of the keyword. How can I use the keyword function as opposed to the current one?
shell=True causes subprocess to use /bin/sh, not bash. You need the executable argument as well:
subprocess.call('time command args', shell=True, executable='/bin/bash')
Adjust the path to bash as necessary.
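One way to see which time a given shell would use is to ask it with the shell's own type builtin. A rough sketch, assuming bash is installed somewhere on PATH (the helper name describe_time is made up, and the exact wording printed depends on the shell):

```python
import os
import shutil
import subprocess


def describe_time(shell_path):
    """Ask the given shell what `time` resolves to and return its answer."""
    # LC_ALL=C keeps the shell's message in English.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        "type time",
        shell=True,
        executable=shell_path,
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
    )
    return result.stdout.decode().strip()


bash_path = shutil.which("bash")
if bash_path:
    # bash reports time as a shell keyword.
    print("bash:", describe_time(bash_path))
# What /bin/sh reports depends on which shell /bin/sh actually is.
print("sh:  ", describe_time("/bin/sh"))
```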
I have written some Python code to check whether a file exists in the Hadoop file system. The Python function receives a location passed from another function, and the bash code within it checks whether the location exists.
def check_file_exists_in_hadoop(loc):
    yourdir = "/somedirectory/inhadoop/"+loc
    cmd = '''
    hadoop fs -test -d ${yourdir};
    if [ $? -eq 0 ]
    then
        echo "Directory exists!"
    else
        echo "Directory does not exists!"
    fi
    '''
    res = subprocess.check_output(cmd, shell=True)
    output = (str(res, "utf-8").strip())
    print(output)
    if output == "Directory exists!":
        print("Yay!!!!")
    else:
        print("Oh no!!!!")
How do I pass the 'yourdir' variable into the bash portion of the code?
All that playing around in shells looks awkward, why not just do:
def check_file_exists_in_hadoop(loc):
    path = "/somedirectory/inhadoop/" + loc
    res = subprocess.run(["hadoop", "fs", "-test", "-d", path])
    return res.returncode == 0
You can execute as:
if check_file_exists_in_hadoop('foo.txt'):
    print("Yay!!!!")
else:
    print("Oh noes!!!!")
When you run a program on a Unix-like system, it receives an array of arguments (exposed as e.g. sys.argv in Python). You can construct these in various ways, but passing them to run as a list gives you the most direct control. You can of course use a shell for this, but starting up a shell just to pass arguments seems unnecessary. Given that this argument list is just a list of strings in Python, you can use normal list/string manipulations to construct whatever you need.
Using a shell can be useful, but as Gilles says you need to be careful to sanitise/escape your input — not everybody loves little bobby tables!
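The same pattern can be tried without a Hadoop installation. In this sketch the ordinary test -d command stands in for hadoop fs -test -d (that substitution, and the helper name directory_exists, are assumptions made so the example runs anywhere with a POSIX test binary):

```python
import os
import subprocess
import tempfile


def directory_exists(path):
    """Return True if the external `test -d` command succeeds for path."""
    # Each list item reaches the program as one argument, spaces and all.
    return subprocess.run(["test", "-d", path]).returncode == 0


with tempfile.TemporaryDirectory() as base:
    spaced = os.path.join(base, "dir with spaces")
    os.mkdir(spaced)
    found = directory_exists(spaced)
    missing = directory_exists(os.path.join(base, "no such dir"))

print(found, missing)
```

Note that the path containing spaces needs no quoting at all, because no shell ever parses it.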
Pass the string as an argument to the shell. Instead of using shell=True, which runs ['sh', '-c', cmd] under the hood, invoke a shell explicitly. After the shell code, the first argument is the shell or script name (which is unused here), then the next argument is available as "$1" in the shell snippet, the next argument as "$2", etc.
cmd = '''
hadoop fs -test -d "$1";
…
'''
res = subprocess.check_output(['sh', '-c', cmd, 'sh', yourdir])
Alternatively, pass the string as an environment variable.
cmd = '''
hadoop fs -test -d "$yourdir";
…
'''
env = os.environ.copy()
env['yourdir'] = yourdir
res = subprocess.check_output(cmd, shell=True, env=env)
In the shell snippet, note the double quotes around $1 or $yourdir.
Do not interpolate the string into the shell command directly, i.e. don't use things like 'test -d {}'.format(yourdir). That doesn't work if the string contains shell special characters: it's a gaping security hole. For example if yourdir is a; rm -rf ~ then you've just kissed your data goodbye.
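Both safe approaches can be checked with a deliberately hostile string. A small sketch, assuming a POSIX sh on PATH; printf is used in place of the hadoop command so that the result is easy to inspect:

```python
import os
import subprocess

# A string that would run a second command if it were pasted into shell code.
hostile = "a; echo INJECTED"

# Positional parameter: the string arrives as "$1" and is never parsed as code.
via_argument = subprocess.check_output(
    ["sh", "-c", 'printf "%s" "$1"', "sh", hostile]
).decode()

# Environment variable: the shell expands "$yourdir" to the raw string.
env = dict(os.environ, yourdir=hostile)
via_env = subprocess.check_output(
    'printf "%s" "$yourdir"', shell=True, env=env
).decode()

print(repr(via_argument), repr(via_env))
```

In both cases the semicolon comes back as plain data, so the second command never runs.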
When using subprocess.Popen(args, shell=True) to run "gcc --version" (just as an example), on Windows we get this:
>>> from subprocess import Popen
>>> Popen(['gcc', '--version'], shell=True)
gcc (GCC) 3.4.5 (mingw-vista special r3) ...
So it's nicely printing out the version as I expect. But on Linux we get this:
>>> from subprocess import Popen
>>> Popen(['gcc', '--version'], shell=True)
gcc: no input files
Because gcc hasn't received the --version option.
The docs don't specify exactly what should happen to the args under Windows, but it does say, on Unix, "If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional shell arguments." IMHO the Windows way is better, because it allows you to treat Popen(arglist) calls the same as Popen(arglist, shell=True) ones.
Why the difference between Windows and Linux here?
Actually on Windows, it does use cmd.exe when shell=True - it prepends cmd.exe /c (it actually looks up the COMSPEC environment variable but defaults to cmd.exe if not present) to the shell arguments. (On Windows 95/98 it uses the intermediate w9xpopen program to actually launch the command).
So the strange implementation is actually the UNIX one, which does the following (where each space separates a different argument):
/bin/sh -c gcc --version
It looks like the correct implementation (at least on Linux) would be:
/bin/sh -c "gcc --version" gcc --version
Since this would set the command string from the quoted parameters, and pass the other parameters successfully.
From the sh man page section for -c:
Read commands from the command_string operand instead of from the standard input. Special parameter 0 will be set from the command_name operand and the positional parameters ($1, $2, etc.) set from the remaining argument operands.
This patch seems to fairly simply do the trick:
--- subprocess.py.orig 2009-04-19 04:43:42.000000000 +0200
+++ subprocess.py 2009-08-10 13:08:48.000000000 +0200
@@ -990,7 +990,7 @@
                 args = list(args)
             if shell:
-                args = ["/bin/sh", "-c"] + args
+                args = ["/bin/sh", "-c"] + [" ".join(args)] + args
             if executable is None:
                 executable = args[0]
From the subprocess.py source:
On UNIX, with shell=True: If args is a string, it specifies the
command string to execute through the shell. If args is a sequence,
the first item specifies the command string, and any additional items
will be treated as additional shell arguments.
On Windows: the Popen class uses CreateProcess() to execute the child
program, which operates on strings. If args is a sequence, it will be
converted to a string using the list2cmdline method. Please note that
not all MS Windows applications interpret the command line the same
way: The list2cmdline is designed for applications using the same
rules as the MS C runtime.
That doesn't answer why, just clarifies that you are seeing the expected behavior.
The "why" is probably that on UNIX-like systems, command arguments are actually passed through to applications (using the exec* family of calls) as an array of strings. In other words, the calling process decides what goes into EACH command line argument. Whereas when you tell it to use a shell, the calling process actually only gets the chance to pass a single command line argument to the shell to execute: The entire command line that you want executed, executable name and arguments, as a single string.
But on Windows, the entire command line (according to the above documentation) is passed as a single string to the child process. If you look at the CreateProcess API documentation, you will notice that it expects all of the command line arguments to be concatenated together into a big string (hence the call to list2cmdline).
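The list2cmdline helper mentioned in the docstring can be called directly on any platform, which makes it easy to see the single string that a Windows child process would receive. A minimal illustration:

```python
import subprocess

# Join an argument list into one command-line string using the
# quoting rules of the MS C runtime.
command_line = subprocess.list2cmdline(["cp", "My File", "New Location"])
print(command_line)
```

Arguments containing spaces come back wrapped in double quotes, so the receiving program can split the string into the original items again.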
Plus there is the fact that on UNIX-like systems there actually is a shell that can do useful things, so I suspect that the other reason for the difference is that on Windows, shell=True does nothing, which is why it is working the way you are seeing. The only way to make the two systems act identically would be for it to simply drop all of the command line arguments when shell=True on Windows.
The reason for the UNIX behaviour of shell=True is to do with quoting. When we write a shell command, it will be split at spaces, so we have to quote some arguments:
cp "My File" "New Location"
This leads to problems when our arguments contain quotes, which requires escaping:
grep -r "\"hello\"" .
Sometimes we can get awful situations where \ must be escaped too!
Of course, the real problem is that we're trying to use one string to specify multiple strings. When calling system commands, most programming languages avoid this by allowing us to send multiple strings in the first place, hence:
Popen(['cp', 'My File', 'New Location'])
Popen(['grep', '-r', '"hello"'])
Sometimes it can be nice to run "raw" shell commands; for example, if we're copy-pasting something from a shell script or a Web site, and we don't want to convert all of the horrible escaping manually. That's why the shell=True option exists:
Popen('cp "My File" "New Location"', shell=True)
Popen(r'grep -r "\"hello\"" .', shell=True)
I'm not familiar with Windows so I don't know how or why it behaves differently.
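Going the other way, from a copy-pasted shell line to an argument list, can be done with the standard shlex module, which applies POSIX shell quoting rules. A short sketch using the two commands from above:

```python
import shlex

# Split shell-style command lines into the argument lists the shell would build.
copy_args = shlex.split('cp "My File" "New Location"')

# A raw string keeps the backslashes intact for shlex to interpret.
grep_args = shlex.split(r'grep -r "\"hello\"" .')

print(copy_args)
print(grep_args)
```

The resulting lists can be passed to Popen without shell=True.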
In my terminal, if I run echo $(pwd), I get /home/abr/workspace, but when I try to run the same command from Python like this:
>>> import subprocess
>>> cmd = ['echo', '$(pwd)']
>>> subprocess.check_output(cmd, shell=True)
I get '\n'. How to fix this?
Use os package:
import os
print(os.environ.get('PWD', ''))
From the documentation on the subprocess module:
If args is a sequence, the first item specifies the command string,
and any additional items will be treated as additional arguments to
the shell itself.
You want:
subprocess.check_output("echo $(pwd)", shell=True)
Try this:
cmd = 'echo $(pwd)'
subprocess.check_output(cmd, shell=True)
The subprocess documentation recommends that cmd be a string when shell=True.
From the documentation:
The shell argument (which defaults to False) specifies whether to use
the shell as the program to execute. If shell is True, it is
recommended to pass args as a string rather than as a sequence.
A better way to achieve this is probably to use the os module from the python standard library, like this:
import os
print(os.getcwd())
>> "/home/abr/workspace"
The getcwd() function returns a string representing the current working directory.
The command subprocess.check_output returns the output of the command you are calling.
Example:
# echo 2
2
From Python:
>>> subprocess.check_output(['echo', '2'])
'2\n'
The '\n' is included because that is what the command does: it prints the output string and then moves the cursor to a new line.
Now back to your problem. Assuming you want the output of pwd, first of all you have to get rid of the shell. With shell=True and a list, only the first item is run as the command; the remaining items are passed to the shell itself, so the command never sees them.
subprocess.check_output(['pwd'])
Will return the current directory + '\n'
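The two ways of getting the working directory can be compared directly. A quick sketch, assuming a pwd executable is available on PATH:

```python
import os
import subprocess

# Ask Python itself.
from_python = os.getcwd()

# Ask the external pwd command, without a shell, and drop the trailing newline.
from_pwd = subprocess.check_output(["pwd"]).decode().rstrip("\n")

print(from_python)
print(from_pwd)
```

Both should name the same directory, though the spelling can differ when symbolic links are involved, so os.getcwd() is the simpler choice when no other shell features are needed.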
On a personal note, I have a hard time understanding what you are trying to do, but I hope this helps solve it.
I can not get the subprocess.call() to work properly:
>>> from subprocess import call
>>> call(['adduser', '--home=/var/www/myusername/', '--gecos', 'GECOS', '--disabled-login', 'myusername'], shell=True)
adduser: Only one or two names allowed.
1
But without shell=True:
>>> call(['adduser', '--home=/var/www/myusername/', '--gecos', 'GECOS', '--disabled-login', 'myusername'])
Adding user `myusername' ...
Adding new group `myusername' (1001) ...
Adding new user `myusername' (1001) with group `myusername' ...
Creating home directory `/var/www/myusername/' ...
Copying files from `/etc/skel' ...
0
Or the same directly in shell:
root@www1:~# adduser --home=/var/www/myusername/ --gecos GECOS --disabled-login myusername
Adding user `myusername' ...
Adding new group `myusername' (1001) ...
Adding new user `myusername' (1001) with group `myusername' ...
Creating home directory `/var/www/myusername/' ...
Copying files from `/etc/skel' ...
I am missing the logic behind the shell=True behavior. Can somebody explain why it happens? What is wrong with the first example? From the adduser error message it seems that the arguments are somehow mangled.
Thanks!
When you specify shell=True you switch to quite different behaviour. From the docs:
On Unix with shell=True, the shell defaults to /bin/sh. If args is a
string, the string specifies the command to execute through the shell.
This means that the string must be formatted exactly as it would be
when typed at the shell prompt. This includes, for example, quoting or
backslash escaping filenames with spaces in them. If args is a
sequence, the first item specifies the command string, and any
additional items will be treated as additional arguments to the shell
itself. That is to say, Popen does the equivalent of:
Popen(['/bin/sh', '-c', args[0], args[1], ...])
So you are running the equivalent of
/bin/sh -c "adduser" --home=/var/www/myusername/ --gecos GECOS --disabled-login myusername
The error message you are getting is what happens when you try and run adduser without any arguments as all the arguments are being passed to sh.
If you want to set shell=True then you would need to call it like this:
call('adduser --home=/var/www/myusername/ --gecos GECOS --disabled-login myusername', shell=True)
OR like this:
call(['adduser --home=/var/www/myusername/ --gecos GECOS --disabled-login myusername'], shell=True)
But mostly you just want to use call without the shell=True and use a list of arguments. As per your second, working, example.
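If the arguments already exist as a list and a shell really is needed, one option is to let the standard library build the command string. This is a sketch only: it assumes Python 3.8 or later for shlex.join, and uses echo rather than adduser so that it is harmless to run.

```python
import shlex
import subprocess

args = ["echo", "foo", "bar baz", "$HOME"]

# shlex.join quotes each item so the shell splits the string back into
# exactly these arguments, with nothing expanded.
command = shlex.join(args)

output = subprocess.check_output(command, shell=True).decode()

print(command)
print(repr(output))
```

The quoting keeps "bar baz" as one argument and stops $HOME from being expanded, which matches what the list form without shell=True would do.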
I am not 100% sure about this, but I think that if you specify shell=True, you should be passing the command line as a single string which the shell itself will interpret:
>>> call('adduser --home=/var/www/myusername/ --gecos GECOS --disabled-login myusername', shell=True)
It seems that with shell=True you need to pass a string as args rather than a list of arguments.
A simple test:
In [4]: subprocess.call(['echo', 'foo', 'bar'], shell=True)
Out[4]: 0
In [5]: subprocess.call('echo foo bar', shell=True)
foo bar
Out[5]: 0
I.e. echo got the right arguments only when I used string, not list.
Python version 2.7.3
If shell is True, the specified command will be executed through the shell; that is, the shell takes care of filename wildcards, environment variable expansion and so on. When you use shell=True and cmd is a single string, it must be formatted exactly as it would be typed in the shell. If shell=True and cmd is a sequence, the first item specifies the command and the additional items are treated as arguments to the shell itself (they follow the command string given to the -c switch).
If shell=False and a sequence of arguments is provided, each item is passed to the program as a separate argument, so no shell quoting or escaping is needed and, for example, ~ won't be expanded to the home directory.
Read more about it in the subprocess documentation, and mind the security hazard related to shell=True.
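The difference in expansion is easy to observe. A minimal sketch, assuming a POSIX system with an echo executable on PATH; DEMO_VAR is a made-up variable name:

```python
import os
import subprocess

# Hypothetical variable, passed to both children through env=.
env = dict(os.environ, DEMO_VAR="expanded")

# shell=False (the default): echo receives the literal text $DEMO_VAR.
without_shell = subprocess.check_output(["echo", "$DEMO_VAR"], env=env).decode()

# shell=True with a string: the shell expands the variable before echo runs.
with_shell = subprocess.check_output(
    "echo $DEMO_VAR", shell=True, env=env
).decode()

print(repr(without_shell), repr(with_shell))
```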