I need to run a lot of bash commands from Python. For the moment I'm doing this with
subprocess.Popen(cmd, shell=True)
Is there any solution to run all these commands in the same shell? subprocess.Popen opens a new shell at every execution, and I need to set up all the necessary variables at every call in order for the cmd command to work properly.
subprocess.Popen lets you supply a dictionary of environment variables, which will become the environment for the process being run. If the only reason you need shell=True is to set an environment variable, then I suggest you use an explicit environment dictionary instead; it's safer and not particularly difficult. Also, it's usually much easier to construct command invocations when you don't have to worry about quoting and shell metacharacters.
It may not even be necessary to construct the environment dictionary, if you don't mind having the environment variables set in the running process. (Most of the time, this won't be a problem, but sometimes it is. I don't know enough about your application to tell.)
If you can modify your own environment with the settings, just do that:
os.environ['theEnvVar'] = '/the/value'
Then you can just use a plain subprocess call (such as subprocess.check_output or similar) to run the command:
output = subprocess.check_output(["ls", "-lR", "/tmp"])
If for whatever reason you cannot change your own environment, you need to make a copy of the current environment, modify it as desired, and pass it to each subprocess.call:
env = os.environ.copy()
env['theEnvVar'] = '/the/value'
output = subprocess.check_output(["ls", "-lR", "/tmp"], env=env)
If you don't want to have to specify env=env every time, just write a little wrapper class.
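For instance, a minimal sketch of such a wrapper (the class and variable names here are illustrative, not part of the original answer):
import os
import subprocess

class EnvRunner:
    """Small helper that always passes the same modified environment."""
    def __init__(self, **overrides):
        self.env = os.environ.copy()
        self.env.update(overrides)

    def check_output(self, cmd, **kwargs):
        # Forward to subprocess, injecting env= so callers don't repeat it.
        return subprocess.check_output(cmd, env=self.env, **kwargs)

runner = EnvRunner(theEnvVar='/the/value')
output = runner.check_output(["ls", "-lR", "/tmp"])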
Why not just create a shell script with all the commands you need to run, then just use a single subprocess.Popen() call to run it? If the contents of the commands you need to run depend on results calculated in your Python script, you can just create the shell script dynamically, then run it.
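A hedged sketch of that idea (the script contents and file handling are placeholders):
import subprocess
import tempfile

# Build the script text from values computed in Python.
script = """#!/bin/sh
export theEnvVar='/the/value'
ls -lR /tmp
"""

# Write it to a temporary file and run it with a single call.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(script)
    path = f.name

subprocess.check_call(["sh", path])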
Use multiprocessing instead, it's more lightweight and efficient.
Unlike subprocess.Popen with shell=True, it does not spawn a new shell at every execution.
You didn't say you need to run subprocess.Popen and you may well not need to; you just said that's what you're currently doing. More justification please.
See set env var in Python multiprocessing.Process for how to set your env vars once and for all in the parent process.
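A rough sketch of what that looks like, assuming the work can be expressed as a Python function rather than an external command (the variable name is just an example):
import os
import multiprocessing

def worker():
    # The child process inherits the parent's environment at start-up.
    print(os.environ.get("theEnvVar"))

if __name__ == "__main__":
    os.environ["theEnvVar"] = "/the/value"   # set once, in the parent
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()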
When trying to write scripts with Python, I have a fundamental hole in my knowledge.
Update: thanks to the answers, I corrected the word shell to process/subprocess.
Nomenclature
Starting with a Bash prompt, let's call this BASH_PROCESS
Then within BASH_PROCESS I run python3 foo.py; the Python script runs in, say, PYTHON_SUBPROCESS
Within foo.py is a call to subprocess.run(...); this subprocess command runs in, say, SUBPROCESS_SUBPROCESS
Within foo.py is subprocess.run(..., shell=True); this subprocess command runs in, say, SUBPROCESS_SUBPROCESS=True
Test for whether a process/subprocess is equal
Say SUBPROCESS_A starts SUBPROCESS_B. In the questions below, when I say is SUBPROCESS_A == SUBPROCESS_B, what I mean is: if SUBPROCESS_B sets an env variable, when it runs to completion, will that env variable be set in SUBPROCESS_A? If one runs eval "$(ssh-agent -s)" in SUBPROCESS_B, will SUBPROCESS_A now have an ssh agent too?
Question
Using the above nomenclature and equality tests
Is BASH_PROCESS == PYTHON_SUBPROCESS?
Is PYTHON_SUBPROCESS == SUBPROCESS_SUBPROCESS?
Is PYTHON_SUBPROCESS == SUBPROCESS_SUBPROCESS=True?
If SUBPROCESS_SUBPROCESS=True is not equal to BASH_PROCESS, then how does one alter the executing environment (e.g. eval "$(ssh-agent -s)") so that a Python script can set up the env for the caller?
You seem to be confusing several concepts here.
TLDR No, there is no way for a subprocess to change its parent's environment. See also Global environment variables in a shell script
You really don't seem to be asking about "shells".
Instead, these are subprocesses; if you run python foo.py in a shell, the Python process is a subprocess of the shell process. (Many shells let you exec python foo.py which replaces the shell process with a Python process; this process is now a subprocess of whichever process started the shell. On Unix-like systems, ultimately all processes are descendants of process 1, the init process.)
The subprocess module simply runs a subprocess. If shell=True, then the immediate subprocess of Python is the shell, and the command(s) you run are subprocesses of that shell. The shell will be the default shell (cmd on Windows, /bin/sh on Unix-like systems), though you can explicitly override this with e.g. executable="/bin/bash"
Examples:
subprocess.Popen(['printf', '%s\n', 'foo', 'bar'])
Python is the parent process, printf is a subprocess whose parent is the Python process.
subprocess.Popen(r"printf '%s\n' foo bar", shell=True)
Python is the parent process of /bin/sh, which in turn is the parent process of printf. When printf terminates, so does sh, as it has reached the end of its script.
Perhaps notice that the shell takes care of parsing the command line and splitting it up into the four tokens we ended up explicitly passing directly to Popen in the previous example.
The commands you run have access to shell features like wildcard expansion, pipes, redirection, quoting, variable expansion, background processing, etc.
In this isolated example, none of those are used, so you are basically adding an unnecessary process. (Maybe use shlex.split() if you want to avoid the minor burden of splitting up the command into tokens.) See also Actual meaning of 'shell=True' in subprocess
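For example, shlex.split() produces a token list equivalent to what the shell would (a small sketch reusing the printf command from above):
import shlex
import subprocess

# shlex.split() tokenizes like a POSIX shell, so quoting is handled for you.
cmd = shlex.split(r"printf '%s\n' foo bar")
print(cmd)                      # ['printf', '%s\\n', 'foo', 'bar']
subprocess.Popen(cmd).wait()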
subprocess.Popen(r"printf '%s\n' foo bar", shell=True, executable="/bin/bash")
Python is the parent process of Bash, which in turn is the parent process of printf. Except for the name of the shell, this is identical to the previous example.
There are situations where you really need the slower and more memory-hungry Bash shell, when the commands you want to execute require features which are available in Bash, but not in the Bourne shell. In general, a better solution is nearly always to run as little code as possible in a subprocess, and instead replace those Bash commands with native Python constructs; but if you know what you are doing (or really don't know what you are doing, but need to get the job done rather than solve the problem properly), the facility can be useful.
(Separately, you should probably avoid bare Popen when you can, as explained in the subprocess documentation.)
Subprocesses inherit the environment of their parent when they are started. On Unix-like systems, there is no way for a process to change its parent's environment (though the parent may participate in making that possible, as in your eval example).
To perhaps accomplish what you may ultimately be asking about, you can set up an environment within Python and then start your other command as a subprocess, perhaps then with an explicit env= keyword argument to point to the environment you want it to use:
import os
import subprocess
...
env = os.environ.copy()
env["PATH"] = "/opt/foo:" + env["PATH"]
env.pop("PAGER", None)  # remove a variable without failing if it is unset
env["secret_cookie"] = "xyzzy"
subprocess.Popen(["otherprogram"], env=env)
or have Python print out values in a form which can safely be passed to eval in the Bourne shell. (Caution: this requires you to understand the perils of eval in general and the target shell's quoting conventions in particular; also, you will perhaps need to support the syntax of more than one shell, unless you are only targeting a very limited audience.)
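A cautious sketch of that pattern using shlex.quote (the variable names and the eval invocation in the comment are only illustrative):
import shlex

settings = {"PATH": "/opt/foo:/usr/bin", "secret_cookie": "xyzzy"}
for name, value in settings.items():
    # Meant to be consumed by the shell as: eval "$(python3 thisscript.py)"
    print("export {}={}".format(name, shlex.quote(value)))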
... Though in many situations, the simplest solution by far is to set up the environment in the shell, then run Python as a subprocess of that shell instance (or exec python if you want to get rid of the shell instance after it has performed its part; see also What are the uses of the exec command in shell scripts?)
Python without an argument starts the Python REPL, which could be regarded as a "shell", though we would commonly not use that term (perhaps instead call it "interactive interpreter" - see also below); but python foo.py simply runs the script foo.py and exits, so there is no shell there.
The definition of "shell" is slightly context-dependent, but you don't really seem to be asking about shells here. (Some GUIs have a concept of "graphical shell" etc but we are already out of the scope of what you were trying to ask about.) Some programs are command interpreters (the Python executable interprets and executes commands in the Python language; the Bourne shell interprets and executes shell scripts) but generally only those whose primary purposes include running other programs are called "shells".
None of those equalities are true, and half of those "shells" aren't actually shells.
Your bash shell is a shell. When you launch your Python script from that shell, the Python process that runs the script is a child process of the bash shell process. When you launch a subprocess from the Python script, that subprocess is a child process of the Python process. If you launch the subprocess with shell=True, Python invokes a shell to parse and run the command, but otherwise, no shell is involved in running the subprocess.
Child processes inherit environment variables from their parent on startup (unless you take specific steps to avoid that), but they cannot set environment variables for their parent. You cannot run a Python script to set environment variables in your shell, or run a subprocess from Python to set your Python script's environment variables.
While there are many questions regarding accessing/setting env variables in Python, I could not find the answer for my particular scenario.
I have a shell script that, when called, exports a bunch of env variables that are used later on.
When you need those variables to be available in your current session, you would do
. ./script_exp_var.sh
that does, say
export MYVAR=MYVAL
then if you run python, you could access it with os.environ.get('MYVAR').
My question is how to invoke the script from Python and then access the env variables that the called script just exported. Is it possible at all, and if yes, how?
Note: I know I could set the env var from Python using os.environ["MYVAR"] = MYVAL, but I would like to use the existing logic in my ./script_exp_var.sh because it exports many variables.
Making sure to execute the script first and then running the Python code is also not an option for my scenario.
You can't. Environment variables are copied from parent to child, never back to the parent.
If you execute a shell script from python then the environment variables will be set in that shell process (the child) and python will be unaware of them.
You could write a parser to read the shell commands from python, but that's a lot of work.
Better to write a shell script that contains the settings and then call the Python program as a child of that script.
Alternatively, write a shell script that echoes the values back to Python, which can be picked up via a pipe.
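A hedged sketch of that pipe-based approach, assuming script_exp_var.sh is sh-compatible and none of the exported values contain newlines:
import os
import subprocess

# Source the script in a throwaway shell, then dump the resulting environment.
out = subprocess.check_output(
    ["sh", "-c", ". ./script_exp_var.sh >/dev/null 2>&1; env"],
    text=True,
)

for line in out.splitlines():
    name, _, value = line.partition("=")
    if name and os.environ.get(name) != value:
        os.environ[name] = value   # adopt new or changed variables here

print(os.environ.get("MYVAR"))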
So I know how subprocess works and use it a lot, but I've run into a strange issue. I need to execute an export of some environment variables. The reason is that some program (black-box) executes a program that seems like it runs in a subshell, so it doesn't have access to the environment variables but it has access to all my files.
I can't hard-code the environment variables, so I want to source or . the file that has the export commands in it. However, if I source or . that file in a subprocess, it won't make any difference to its parent process. In that case I would need some function besides subprocess that can execute shell commands without creating a subprocess, if such a thing exists. Another issue is that a subprocess doesn't have the proper permissions to read the file.
And copying the environment variables via os isn't really possible either.
Does anything besides subprocess exist? Or is there some other kind of workaround?
IMHO the simplest solution consists in creating a new shell script (let's call it run_black_box.sh) which sources the setup script (let's assume it is named setup.sh) to initialize the environment and then calls the black_box program.
Here is a possible content of run_black_box.sh:
#!/bin/bash
source setup.sh
black_box
Then you can pass run_black_box.sh to subprocess for execution.
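For instance (a minimal sketch; it assumes run_black_box.sh is in the current working directory):
import subprocess

# Running it through bash avoids needing the executable bit on the script.
subprocess.run(["bash", "run_black_box.sh"], check=True)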
I'm trying to generate an encryption key for a file and then save it for use next time the script runs. I know that's not very secure, but it's just an interim solution for keeping a password out of a git repo.
subprocess.call('export KEY="password"', shell=True) returns 0 and does nothing.
Running export KEY="password" manually in my bash prompt works fine on Ubuntu.
subprocess.call('export KEY="password"', shell=True)
creates a shell, sets your KEY and exits: accomplishes nothing.
Environment variables do not propagate to the parent process, only to child processes. When you set the variable in your bash prompt, it is effective for all of that shell's subprocesses (but not outside the bash prompt, to draw a quick parallel).
The only way to make this work using Python would be to set the password in a master Python script (using os.putenv("KEY", "password") or os.environ["KEY"] = "password") which then calls sub-modules or subprocesses.
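A small sketch of that master-script idea (the child command is only an illustration):
import os
import subprocess

os.environ["KEY"] = "password"   # visible to every child started afterwards

# The child inherits KEY; echoing it here merely demonstrates that.
subprocess.run(["sh", "-c", 'echo "KEY is $KEY"'], check=True)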
Using Python:
import os

# Set:
os.environ["EnvVar"] = "1"
# Get:
print(os.environ["EnvVar"])
This isn't a thing you can do. Your subprocess call creates a subshell and sets the env var there, but doesn't affect the current process, let alone the calling shell.
When I call an executable in Python using os.system("./mydemo") in Ubuntu, it can't find the .so file (libmsc.so) needed by mydemo. I used os.system("export LD_LIBRARY_PATH=pwd:$LD_LIBRARY_PATH;"), but it still can't find libmsc.so.
The libmsc.so is in the current directory and shouldn't be global.
When you do os.system("export LD_LIBRARY_PATH=pwd:$LD_LIBRARY_PATH;"), you run a new instance of the shell, alter LD_LIBRARY_PATH there, then immediately exit from it. Also, pwd means nothing in the context of Python.
Try to set env variables like this:
os.system("LD_LIBRARY_PATH={} ./mydemo".format(os.getcwd()))
Or maybe it is better to use subprocess module?
import os
import subprocess

env = os.environ.copy()
env['LD_LIBRARY_PATH'] = os.getcwd()
proc = subprocess.Popen("./mydemo", shell=True, env=env)
proc.wait()
If I remember correctly, executing export ... via os.system will only set that shell variable within that shell's own scope, so it is not available to subsequent os.system calls. You should set LD_LIBRARY_PATH in the shell before executing the Python script.
Btw. also avoid setting relative paths…
The problem is that export only exports its variables to children of the current shell. As in the shell that you created by calling os.system, which then immediately exits.
If you want the simplest fix to make this work, you can do both the export and the target program inside a single shell:
os.system("export LD_LIBRARY_PATH=pwd:$LD_LIBRARY_PATH; ./mydemo")
There are some other problems with this. For example, exporting and assigning a variable in the same command is a bash-ism that may not be available in all shells. With subprocess you can specify a specific shell, but with system you just get whatever the OS considers the default shell—on Linux, the manpage says this means /bin/sh -c.
Really, the best way to solve this is to just not use a shell in the first place, and set the environment variables the way you want. That's exactly why the os.system docs say: "The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function." For example:
env = dict(os.environ)
env['LD_LIBRARY_PATH'] = '"{}":{}'.format(
    os.getcwd(), env.get('LD_LIBRARY_PATH', ''))
subprocess.check_call(['./mydemo'], env=env)
Or, if you want to be really safe (unlike your shell code):
import shlex

LD_LIBRARY_PATH = env.get('LD_LIBRARY_PATH', '')
if LD_LIBRARY_PATH:
    LD_LIBRARY_PATH = ':' + LD_LIBRARY_PATH
LD_LIBRARY_PATH = shlex.quote(os.getcwd()) + LD_LIBRARY_PATH
env['LD_LIBRARY_PATH'] = LD_LIBRARY_PATH
subprocess.check_call(['./mydemo'], env=env)
I've written this out more explicitly and verbosely than you'd normally write it, to make the steps obvious: don't include a trailing : before an empty path, and use shlex.quote in case someone does something tricky with the current working directory.