Python: runtime shebang problems

Python: runtime shebang problems - python

Here is the problem I am trying to solve. I don't have a specific question in the title because I don't even know what I need.
We have an ancient Hadoop computing cluster with a very old version of Python installed. What we have done is installed a new version (2.7.9) to a local directory (that we have perms on) visible to the entire cluster, and have a virtualenv with the packages we need. Let's call this path /n/2.7.9/venv/
We are using Hadoopy to distribute Python jobs on the cluster. Hadoopy distributes the python code (the mappers and reducers) to the cluster, which are assumed to be executable and come with a shebang, but it doesn't do anything like activate a virtualenv.
If I hardcode the shebang in the .py files to /n/2.7.9/venv/, everything works. But I want to put the .py files in a library; these files should have some generic shebang like #!/usr/bin/env python. But I tried this and it does not work, because at runtime the virtualenv is not "activated" by the script and therefore it bombs with import errors.
So if anyone has any ideas on how to solve this problem I would be grateful. Essentially I want #!/usr/bin/env python to resolve to /n/2.7.9/venv/ without /n/2.7.9/venv/ being active, or some other solution where I cannot hardcode the shebang.
Currently I am solving this problem by having a run function in the library, and putting a wrapper around this function in the main code (that calls the library) with the hardcoded shebang in it. This is less offensive because the hardcoded shebang makes sense in the main code, but it is still messy because I have to have an executable wrapper file around every function I want to run from the library.

I would change the environment variable PYTHONPATH and also the environment variable PATH. Point PYTHONPATH to your virtual environment and PATH to the directory that contains your new python executable, and make sure the path to your python executable comes first.

I accepted John Schmitt's answer because it led me to the solution. However, I am posting what I actually did, because it might be useful for other Hadoopy users.
What I actually did was :
args['cmdenvs'] = ['export VIRTUAL_ENV=/n/2.7.9/ourvenv','export PYTHONPATH=/n/2.7.9/ourvenv', 'export PATH=/n/2.7.9/ourvenv/bin:$PATH']
and passed args into Hadoopy's launch function. In the executable .py files, I put the generic #!/usr/bin/env python shebang.

Related

Prevent a Python-embedded to look in my default path C:\Python38 for modules

I'm using Cython in --embed mode to produce a .exe. I'm evaluating the Minimal set of files required to distribute an embed-Cython-compiled code and make it work on any machine. To do this, I only copy a minimal number of files from the Python Windows embeddable package.
In order to check this, I need to be sure that the current process I'm testing doesn't in fact use my system default Python install, i.e. C:\Python38.
To do this, I open a new cmd.exe and do set PATH= which temporarily removes everything from the PATH. Then I can test any self-compiled app.exe and make sure it doesn't reuse C:\Python38's files under the hood.
It works, except for the modules. Even after doing set PATH=, my code app.py
import json
print(json.dumps({"a":"b"}))
when Cython---embed-compiled into a .exe works, but it still uses C:\Python38\Lib\json\__init__.py! I know this for sure, because if I temporarily remove this file, my .exe now fails, because it cannot find the json module.
How to completely remove any link to C:\Python38 when debugging a Python program which shouldn't use these files?
Why isn't set PATH= enough? Which other environment variable does it use for modules? I checked all my system variables and I think I don't find any which seems related to Python.

Python has a quite complicated heuristic for finding its "installation" (see for example this SO-question or this description), so probably it doesn't find the installation you are providing but the "default" installation.
Probably the most simple way is to set the environment variable PYTHONPATH pointing to the desired installation prior to start of the embedded interpreter.
By examination of sys.path one can check whether the correct installation was found.

Thanks to #ead's answer and his link getpath.c finally redirecting to getpathp.c in the case of Windows, we can learn that the rule for building the path for module etc. is:
current directory first
PYTHONPATH env. variable
registry key HKEY_LOCAL_MACHINE\SOFTWARE\Python or the same in HKCU
PYTHONHOME env. variable
finally:
Iff - we can not locate the Python Home, have not had a PYTHONPATH
specified, and can't locate any Registry entries (ie, we have nothing
we can assume is a good path), a default path with relative entries is
used (eg. .\Lib;.\DLLs, etc)
Conclusion: in order to debug an embedded version of Python, without interfering with the default system install (C:\Python38 in my case), I finally solved it by temporarily renaming the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Python to HKEY_LOCAL_MACHINE\SOFTWARE\PythonOld.
Side note: I'm not sure I will ever revert this registry key back to normal: my normal Python install shouldn't need it anyway to find its path, since when I run python.exe from anywhere (it is in the PATH for everyday use), it will automatically look in .\Lib\ and .\DLL\ which is correct. I don't see a single use case in which my normal install python.exe wouldn't find its subdir .\Lib\ or .\DLL\ and requiring the registry for this. In which use case would the registry be necessary? if python.exe is started then its path has been found, and it can take its .\Lib subfolder, without help from registry. I think 99,99% of the time this registry feature is doing more harm than good, preventing a Python install to be really "portable" (i.e. that we can move from one folder to another).
Notes:
To be 100% sure, I also did this in command line, but I don't think it's necessary:
set PATH=
set PYTHONPATH=
set PYTHONHOME=
Might be helpful to do debugging of an embedded Python: import ctypes. If you haven't _ctypes.pyd and libffi-7.dll in your embedded install folder, it should fail. If it doesn't, this means it looks somewhere else (probably in your default system-wide Python install).

What is the advantage of "#!/usr/bin/env python" in the shebang rather than just calling the "#!python" interpreter?

I understand the difference between starting a python script like:
#!/usr/bin/env python
or
#!/usr/bin/python
From what I understood, just executes python as we would do in our shell, so it looks in $PATH. The second one is not a fixed path, which has the inconvenient that in a different system the python interpreter might be located in another path.
My question is, why do we need env? Why can we just do:
#!python
This works perfectly fine in my computer? Is there a reason to prefer calling env?

Short answer:
This depends on the shell. in bash #!python will just be ignored and you have to use #!/usr/bin/env python. zsh on the other hand seems to be able to handle #!python.
The long answer (for bash):
let's imagine your python file is named 'tst.py'
and has following contents
#!python
import sys
print(sys.executable, sys.version)
then you can always type
python tst.py
The first line is completely irrelevant and is not even looked at.
However if you do following (for example on linux)
chmod +x tst.py
./tst.py
Then the first line is looked at to determine which interpreter shall be used (bash, perl, python, something else?) and here at least for my OS (ubuntu) and my shell (bash) an absolute path is required for the executable name (e.g. /bin/bash, /bin/python, /usr/bin/env)
If I call ./tst.py on my ubuntu machine I get
bash: ./tst.py: python: bad interpreter: No such file or directory
Special case Windows, when typing tst.py or clicking on a python script.
If you're on windows, the line is looked at, but even if it is wrong a default python interpreter will be used. On Windows, this line could be used to select explicitly python2 or python3 for example.
Windows uses file type associations to determine which executable to call for which suffix. for .py files (python 3.x is installed) this is normally py.exe which is an executable located in the system path, that will just call a python interpreter. depending on installed versions, the shebang line and environment vars, that might indicate a virtualenv
Addendum:
The first line is interpreted by the shell or in the windows case by py.exe.
It seems bash requires absolute paths for command, whereas zsh accepts relative paths as well.
So for zsh only
#!python
works perfectly well whereas bash requiers absolute paths and thus the trick with the env command
#!/usr/bin/env python
For scripts that shall be executed by a cronjob it's best to hardcode the path of the python executable as PATH is rather minimalistic for cronjobs, so there it's best to have something like
#!/usr/bin/python3.5
or
#!/home/username/myvirtualenv/bin/python

Each Python script is written with a particular version of Python in mind. If the shebang is #!/usr/bin/env python, that puts the version used under the control of the individual caller, whose PATH may not provide the right version.
#!/usr/bin/python is a little better, as it puts the decision in the hands of the script itself. However, the script author doesn't necessarily know where the correction version of Python is on your system, so it still may not work.
The solution is to specify the correct location on your system when you install the module or script. The Python installer (pip, etc) will, at that time, rewrite any shebang containing the word python (#!python being the minimal such shebang) to the path specified by the installer.
Now when you run the script, it will point to the correct path that you have already provided, without being subject to runtime PATH lookup that could choose the wrong version.
Note that #!python is not meant to be used as-is, though for various reasons it might work for you. It's is a means towards making sure the script gets a fixed, absolute path to the correct interpreter. #!/usr/bin/python is also subject to install-time replacement, so may serve as a usable default.

Can't access local programs from Python script

I´m trying to execute locally installed programs from a Python script (OSX), but they are not found, since /usr/local/bin is not in the PATH. Running os.environ gives only /usr/bin:/bin:/usr/sbin:/sbin.
It is probably a common/simple problem, but I´ve exhausted Google, and start feeling a little stupid :-)

How do you try to execute your programs (please provide us a minimal code sample)?
If you are using the subprocess package, you can try to provide the full path of your executable:
subprocess.run(["/usr/local/bin/my_program"], ...)
Else, you can try to append /usr/local/bin to the os.environ list.

Run python script from another directory

I feel a little foolish that I don't know this, but I tried to do it today and was surprised when it didn't work....
I have a directory C:\test with a demo script, lets call it demo.py
If i am in C:\test then I can just do python demo.py. Easy
I could also use a relative path, so from C:\, it's python test\demo.py
What if C:\test is on the path?
I was expecting to be able to now do python demo.py from anywhere however...
python: can't open file 'demo.py': [Errno 2] No such file or directory
I feel foolish because I thought this was straightforward, but I have searched around and have not found a solution. Am I fundamentally misunderstanding something here about how the Python interpreter finds scripts to run? I don't think this is anything to do with PYTHONPATH, as I understood that to relate to loading of modules inside scripts.
This is on Windows 7, by the way.

The PATH is only used to search for commands. A first way is that a Python script can be used directly as a command and in that case the PATH will be used: just use demo.py instead of python demo.py.
It will rely on OS specific ways. On Windows, file type (given by the extension - here .py) can be given default application to process them, while on Unix-like, the first line of a script can declare the program that will process it.
Alternatively, python allows to launch a module that will be searched in the PYTHONPATH (not PATH) by using python -m module or for Windows py -m module.

Add a directory to Python sys.path so that it's included each time I use Python

Currently, when trying to reference some library code, I'm doing this at the top of my python file:
import sys
sys.path.append('''C:\code\my-library''')
from my-library import my-library
Then, my-library will be part of sys.path for as long as the session is active. If I start a new file, I have to remember to include sys.path.append again.
I feel like there must be a much better way of doing this. How can I make my-library available to every python script on my windows machine without having to use sys.path.append each time?

Simply add this path to your PYTHONPATH environment variable. To do this, go to Control Panel / System / Advanced / Environment variable, and in the "User variables" sections, check if you already have PYTHONPATH. If yes, select it and click "Edit", if not, click "New" to add it.
Paths in PYTHONPATH should be separated with ";".

You should use
os.path.join
to make your code more reliable.
You have already used __my-library__ in the path. So don't use it the second time in import.
If you have a directory structure like this
C:\code\my-library\lib.py and a function in there, e.g.:
def main():
print("Hello, world")
then your resulting code should be
import sys
sys.path.append(os.path.join('C:/', 'code', 'my-library'))
from lib import main

If this is a library that you use throughout your code, you should install it as such. Package it up properly, and either install it in your site-packages directory - or, if it's specific to certain projects, use virtualenv and install it just within the relevant virtualenvs.

To do such a thing, you'll have to use a sitecustomize.py (or usercustomize.py) file where you'll do your sys.path modifications (source python docs).
Create the sitecustomize.py file into the \Lib\site-packages directory of your python installation, and it will be imported each time a python interpreter is launched.

If you are doing this interactively, the best thing to do would be to install ipython and configure your startup settings to include that code. If you intend to have it be part of a script you run from the interpreter, the same thing applies, since it will have access to your namespace.
On the other hand, a stand alone script should not include that automatically. In the future, you or some other maintainer will come along, and all the code should be obvious, and not dependent upon a specific machine setup. The best thing to do would be to set up a skeleton file for new projects that includes all of the basic functionality you need. That, along with oft-used snippets will handle the problem.
All of your code to run the script, will be in the script, and you won't have to think about adding that code every time.

Using jupyter with multiple environments, adding the path to .bashrc didn't work. I had to edit the kernel.json file for that particular kernel and append it to the PYTHONPATH in env section.
This only worked in that kernel but maybe this can help someone else.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.