PYTHONPATH setting inside python script using subprocess.Popen() fails [duplicate]

The following code lets me dynamically locate and load a custom module that is not in any of the directories listed in sys.path:
import sys
sys.path.append("/lib")
But this gives me an OSError:
import subprocess
x = subprocess.Popen(["export", "PYTHONPATH=/lib"], stdout=subprocess.PIPE)
Not just this: even a simple Linux/Unix variable assignment fails in subprocess.Popen():
import subprocess
x = subprocess.Popen("x=y", stdout=subprocess.PIPE)
I wanted to check subprocess because I had tried setting PYTHONPATH via os.system(), os.popen() etc., and the variable was not set (maybe it is set only in the child process's shell).

Try this:
>>> subprocess.call(["export foo=bar && echo foo=$foo"], shell=True)
foo=bar
0
>>>

Several things are going on here that are probably confusing you a little. One is that whatever instructions you give to Popen are executed in the child process and do not affect your main process. You can only pipe data to the child or retrieve results from it.
Now, to comment on your second use case, where you pass a string as the argument. From the docs:
class subprocess.Popen(args, bufsize=-1, executable=None, stdin=None,
stdout=None, stderr=None, preexec_fn=None, close_fds=True,
shell=False, cwd=None, env=None, universal_newlines=False,
startupinfo=None, creationflags=0, restore_signals=True,
start_new_session=False, pass_fds=())
...
args should be a sequence of program arguments or else a single
string. By default, the program to execute is the first item in args
if args is a sequence. If args is a string, the interpretation is
platform-dependent and described below. See the shell and executable
arguments for additional differences from the default behavior. Unless
otherwise stated, it is recommended to pass args as a sequence.
On POSIX, if args is a string, the string is interpreted as the name
or path of the program to execute. However, this can only be done if
not passing arguments to the program.
So in your second case, you are trying to execute a file or program named x=y, which doesn't exist.
Even if you use a list, as in your first use case, be aware that this isn't equivalent to passing code to the bash shell: export is a shell builtin, not a program, so there is nothing for Popen to execute. If you want shell behavior, you can pass shell=True as a keyword argument, but this has other issues, as indicated by the docs. Your code will execute with shell=True, though.
If your sole purpose is to set an environment variable, consider using os.environ, which maps environment variable names to their values (as @cdarke pointed out first).
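As a minimal sketch of that approach, assuming the goal is for a child Python process to see PYTHONPATH=/lib: copy os.environ, add the variable, and hand the copy to the child through the env parameter. The name child_env and the inline child script are illustrative only and do not come from the question.

```python
import os
import subprocess
import sys

# Copy the parent's environment and add the variable for the child only.
child_env = dict(os.environ, PYTHONPATH="/lib")

# The child is another Python interpreter that just reports what it sees.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['PYTHONPATH'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()
print(child_value)
```

Assigning to os.environ["PYTHONPATH"] directly would also work, but that changes the environment of the current process and of every child started afterwards; passing env= keeps the change local to one child.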

Related

What's the difference between subprocess.Popen("echo $HOME"... and subprocess.Popen(["echo", "$HOME"]

I can't tell whether this is bash-related or due to Python's subprocess, but the results are different:
>>> subprocess.Popen("echo $HOME", shell=True, stdout=subprocess.PIPE).communicate()
(b'/Users/mac\n', None)
>>> subprocess.Popen(["echo", "$HOME"], shell=True, stdout=subprocess.PIPE).communicate()
(b'\n', None)
Why is the output just a newline the second time? Where does the argument go?

The first argument to subprocess.Popen() tells the system what to run.
When it is a list, you need to use shell=False. It coincidentally happens to work as you hope in Windows; but on Unix-like platforms, you are simply passing in a number of arguments which will typically get ignored. Effectively,
/bin/sh -c 'echo' '$HOME'
which simply causes the second argument to not be used for anything (where I use single quotes to emphasize that these are just static strings).
In my humble opinion, Python should throw an error in this case. On Windows, too. This is an error which should be caught and reported.
(In the opposite case, where shell=False is specified but the string you pass in is not the name of a valid command, you will get an error eventually anyway, and it makes sense if you have even a vague idea of what's going on.)
If you really know what you are doing, you could cause the first argument to access subsequent arguments; for example
/bin/sh -c 'printf "%s\n" "$@"' 'ick' 'foo' 'bar' 'baz'
would print foo, bar, and baz on separate lines. (The "zeroth" argument - here, 'ick' - is used to populate $0.) But this is just an obscure corollary; don't try to use this for anything.
As a further aside, you should not use subprocess.Popen() if you just want a command to run. The subprocess.run() documentation tells you this in some more detail. With text=True you get a string instead of bytes.
result = subprocess.run('echo "$HOME"', shell=True,
    text=True, capture_output=True, check=True)
print(result.stdout, result.stderr)
And of course, os.environ['HOME'] lets you access the value of $HOME from within Python. This also allows you to avoid shell=True which you usually should if you can.

In the documentation found on https://docs.python.org/2/library/subprocess.html#popen-constructor, if you look at the shell argument you will find
The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.
Which means that when you execute the second command it runs as echo and hence you get just a new line.

When you have shell=True, the process that actually runs is the shell, i.e., think of it as running /bin/sh -c on Unix. The arguments you pass to Popen are passed as arguments to this shell process. So /bin/sh -c 'echo' '$HOME' prints a newline and the second argument is ignored. Usually you should only use string arguments with shell=True.
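The three combinations can be compared side by side. This is only a sketch and assumes a POSIX system where /bin/sh and an echo program are available. GREETING is a made-up variable used in place of $HOME so the result does not depend on the local account.

```python
import os
import subprocess

# GREETING is a hypothetical variable used only for this demonstration.
env = dict(os.environ, GREETING="hello")

# String + shell=True: /bin/sh parses the line and expands $GREETING.
expanded = subprocess.run("echo $GREETING", shell=True, env=env,
                          capture_output=True, text=True).stdout

# List + shell=False: echo receives the literal text; no shell is involved.
literal = subprocess.run(["echo", "$GREETING"], env=env,
                         capture_output=True, text=True).stdout

# List + shell=True: only "echo" is the script; "$GREETING" becomes $0.
ignored = subprocess.run(["echo", "$GREETING"], shell=True, env=env,
                         capture_output=True, text=True).stdout

print(repr(expanded), repr(literal), repr(ignored))
```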

Why does the command and its arguments have to be in a list for subprocess.Popen?

I tried doing
import subprocess
p = subprocess.Popen("ls -la /etc", stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p.stdout.read().decode()
Which gives me
FileNotFoundError: [Errno 2] No such file or directory: 'ls -la /etc': 'ls -la /etc'
Following "Python subprocess.Popen with var/args", I did
import subprocess
p = subprocess.Popen(["ls", "-la", "/etc"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p.stdout.read().decode()
Which did work.
Why is that? Why do I have to split my command and its arguments? What's the rationale behind this design?
Python version:
3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]

That's how all process invocations work on UNIX.
Under the hood, running a program on UNIX is traditionally done with the following steps:
1. fork() off a child process.
2. In that child process, open new copies of stdin, stdout, stderr, etc. if redirections are requested, using the dup2() call to assign the newly-opened files over the file descriptors that are redirection targets.
3. In that child process, use the execve() syscall to replace the current process with the desired child process. This syscall takes an array of arguments, not a single string.
4. wait() for the child to exit, if the call is meant to be blocking.
So, subprocess.Popen exposes the array interface, because the array interface is what the operating system actually does under the hood.
When you run ls /tmp at a shell, that shell transforms the string into an array and then does the above steps itself -- but it gives you more control (and avoids serious bugs -- if someone creates a file named /tmp/$(rm -rf ~), you don't want trying to cat /tmp/$(rm -rf ~) to delete your home directory) when you do the transformations yourself.
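The steps above can be sketched with Python's os module. This is a simplified illustration for POSIX systems only, not how subprocess is actually implemented: the redirection step with dup2() is left out, and spawn_and_wait is a hypothetical helper name.

```python
import os
import sys

def spawn_and_wait(argv):
    """Run argv in a child process and return its exit code (POSIX only)."""
    pid = os.fork()                    # duplicate the current process
    if pid == 0:
        # Child: replace this process image. argv is an array, not a string.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            os._exit(127)              # exec failed, e.g. program not found
    _, status = os.waitpid(pid, 0)     # parent: block until the child exits
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)        # child was killed by a signal

exit_code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"])
print(exit_code)
```

Nothing in this path ever parses a command line, which is why the list form maps onto it so directly.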

According to the docs, how a string behaves compared with a list depends on the shell= keyword argument (the POSIX paragraph is likely what causes the behavior you are seeing):
args should be a sequence of program arguments or else a single string or path-like object. By default, the program to execute is the first item in args if args is a sequence. If args is a string, the interpretation is platform-dependent and described below. See the shell and executable arguments for additional differences from the default behavior. Unless otherwise stated, it is recommended to pass args as a sequence.
On POSIX, if args is a string, the string is interpreted as the name or path of the program to execute. However, this can only be done if not passing arguments to the program.
Further down...
On POSIX with shell=True, the shell defaults to /bin/sh. If args is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. If args is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself. That is to say, Popen does the equivalent of:
Popen(['/bin/sh', '-c', args[0], args[1], ...])
On Windows with shell=True, the COMSPEC environment variable specifies the default shell. The only time you need to specify shell=True on Windows is when the command you wish to execute is built into the shell (e.g. dir or copy). You do not need shell=True to run a batch file or console-based executable.
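A small sketch of the POSIX behavior quoted above, assuming a Unix-like system with ls on the PATH. The helper name runs_without_shell is made up for this example.

```python
import subprocess

def runs_without_shell(args):
    """Return True if args can be started with shell=False."""
    try:
        subprocess.run(args, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        # The whole string was treated as the name of one program.
        return False
    return True

print(runs_without_shell("ls -la /etc"))           # one program name with spaces
print(runs_without_shell(["ls", "-la", "/etc"]))   # program plus two arguments
```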

Why does subprocess use a list instead of a string with spaces by default?

Why does Python's subprocess module expect the arguments as a list by default? Why isn't a string with spaces (similar to what you type into a terminal when running the command normally) the default input? There are plenty of sources explaining how to pass in the space delimited string of the command into subprocess, but it's less clear as to why the default isn't the other way around.

TL;DR Using the list bypasses the shell so that you don't need to worry about the shell interpreting a dynamically constructed command line in ways you did not intend.
Suppose you have a really simple command: echo foo. Here it is, using both a string and a list:
Popen("echo foo", shell=True)
Popen(["echo", "foo"])
Not much difference yet. Now suppose the argument contains quotes to protect whitespace and/or a shell pattern, echo "foo * bar":
Popen("echo \"foo * bar\"", shell=True)
Popen(["echo", "foo * bar"])
Yes, I could have used single quotes to avoid needing to escape the double quotes, but you can see the list form is starting to have an advantage. Now imagine I don't have a literal argument for the command, but that it is stored in a variable. Now which do you want to use...
This?
Popen('echo "%s"' % (x,), shell=True)
or this?
Popen(["echo", x])
If you answered "the first one", here's the value of x:
x = "\";rm -rf \""
The command you just executed was echo "";rm -rf "". You needed to make sure any special characters in the value of x were first escaped before incorporating it into the string you are building to pass to the shell.
Or you just use a list and avoid the shell altogether.
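Both options can be shown with a harmless stand-in for the hostile value. This is a sketch under a few assumptions: the injected command is echo rather than rm, the child in the list case is a Python one-liner chosen for illustration, and the shell case assumes a POSIX /bin/sh. shlex.quote is the standard-library helper for escaping a single shell argument.

```python
import shlex
import subprocess
import sys

x = '";echo INJECTED "'   # hostile-looking value, harmless stand-in

# List form: x travels as a single argv entry and is never parsed by a shell.
echoed = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", x],
    capture_output=True, text=True, check=True,
).stdout

# If a shell is unavoidable, quote each interpolated value first.
command = "printf '%s\\n' " + shlex.quote(x)
quoted = subprocess.run(command, shell=True,
                        capture_output=True, text=True, check=True).stdout

print(repr(echoed), repr(quoted))
```

In both cases the child sees the value of x verbatim and no second command runs.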

Forget all that I wrote - just read the relevant PEP yourself
https://www.python.org/dev/peps/pep-0324/
===============
My short guess - the no-shell list version is closer to the format that is eventually passed to the POSIX forking commands. It requires less manipulation. The shell string approach is something of a Windows legacy.
=====================
So you are asking why the shell=False case is the default?
On POSIX, with shell=False (default): In this case, the Popen class
uses os.execvp() to execute the child program. args should normally
be a sequence. A string will be treated as a sequence with the string
as the only item (the program to execute).
On POSIX, with shell=True: If args is a string, it specifies the
command string to execute through the shell. If args is a sequence,
the first item specifies the command string, and any additional items
will be treated as additional shell arguments.
'why' questions tend to be closed because they rarely have definitive answers, or they involve opinions, or history.
I'd suggest studying the subprocess.py code. I see for example a lot of calls to:
Popen(*popenargs, **kwargs)
Its __init__ is:
def __init__(self, args, bufsize=-1, executable=None,
             stdin=None, stdout=None, stderr=None,
             preexec_fn=None, close_fds=_PLATFORM_DEFAULT_CLOSE_FDS,
             shell=False, cwd=None, env=None, universal_newlines=False,
             startupinfo=None, creationflags=0,
             restore_signals=True, start_new_session=False,
             pass_fds=()):
As a keyword arg, shell has to have some default value; why not False?
I suspect that in the shell case it passes a whole string to some code that calls the shell. In the no-shell case it must pass a list. But we have to find that code.
There are two implementations of the method that launches the child, one for POSIX and the other for Windows. In the POSIX case it appears to convert a string to a list, regardless of whether shell is True or not. It may be more nuanced than that, but this is the relevant code:
"""Execute program (POSIX version)"""
if isinstance(args, (str, bytes)):
    args = [args]
else:
    args = list(args)
if shell:
    args = ["/bin/sh", "-c"] + args
    if executable:
        args[0] = executable
....
self.pid = _posixsubprocess.fork_exec(
        args, executable_list,...
In the windows shell case the args string is combined with cmd info:
if shell:
    ....
    comspec = os.environ.get("COMSPEC", "cmd.exe")
    args = '{} /c "{}"'.format (comspec, args)
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
                         # no special security
                         ....

Checking Subprocesses in python

I'm trying to run one python program from another using subprocess. Here's the function I've got so far:
import subprocess

def runProcess(exe):
    p = subprocess.Popen(exe, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    while True:
        retcode = p.poll()  # returns None while subprocess is running
        line = p.stdout.readline()
        yield line
        if retcode is not None:
            break
Then I run:
for line in runProcess('python myotherprogram.py'): print line
but I get an OSError saying "no such file", and it doesn't tell me which file doesn't exist. It's baffling. Any suggestions? I can use the runProcess function for normal terminal commands, such as ls.

What doesn't exist is a single executable named python myotherprogram.py. To specify arguments, you need to provide a list consisting of the command and its argument, such as with runProcess(["python", "myotherprogram.py"]), or specify shell=True to the Popen constructor.
The relevant quote from the documentation:
args should be a sequence of program arguments or else a single
string. By default, the program to execute is the first item in args
if args is a sequence. If args is a string, the interpretation is
platform-dependent and described below. See the shell and executable
arguments for additional differences from the default behavior. Unless
otherwise stated, it is recommended to pass args as a sequence.
On Unix, if args is a string, the string is interpreted as the name or
path of the program to execute. However, this can only be done if not
passing arguments to the program.
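If the command already exists as one string, the standard-library shlex.split function can turn it into the list form using shell-like quoting rules. A minimal sketch follows; the second command string, with its quoted directory name, is a made-up example.

```python
import shlex

# Split a command line the way a POSIX shell would split it into words.
argv = shlex.split("python myotherprogram.py")
print(argv)

# Quoted arguments stay together as one list item.
quoted_argv = shlex.split('ls -la "/tmp/my dir"')
print(quoted_argv)
```

The resulting list can be passed straight to runProcess or Popen without shell=True. This is meant for fixed, trusted command strings; values that come from user input are better added to the list as separate items.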

Launch gnu screen from python?

I tried launching a server daemon under GNU screen with subprocess.call, but it didn't even start:
subprocess.call(["screen", "-dmS test ./server"])
I was told that running screen requires a terminal, which is why I can't simply execute it with call. Can you show me some code that does this?

Try
subprocess.call( ["screen", "-d", "-m", "-S", "test", "./server"] )
You need to break the argument string into separate arguments, one per string.
Here's the relevant quote from the subprocess docs:
On UNIX, with shell=False (default): In this case, the Popen class
uses os.execvp() to execute the child program. args should normally
be a sequence. A string will be treated as a sequence with the string
as the only item (the program to execute).
On UNIX, with shell=True: If args is a string, it specifies the
command string to execute through the shell. If args is a sequence,
the first item specifies the command string, and any additional items
will be treated as additional shell arguments.
So by default, the arguments are used exactly as you give them; it doesn't try to parse a string into multiple arguments. If you set shell to true, you could try the following:
subprocess.call("screen -dmS test ./server", shell=True)
and the string would be parsed exactly like a command line.
