Is adding project root directory to sys.path a good practice? - python

I have a question about adding the project path to Python to make imports easier.
Situation
When I write code in Python, I usually add the necessary paths to sys.path with
import sys
sys.path.append("/path/to/dir/") # almost every `.py` need this
Sometimes, when my project gets bigger with many levels of directories, this approach seems bulky and error-prone (especially when I re-organize my files).
Recently, I started using a bash script (located at the project root directory) that adds a sys.path.append line, with the project root as its argument, to every .py file in the project. With this approach, I hardly have to care manually about importing a module.
Question
My question is: is this a good practice? I find it convenient compared to my old method, but since the bash script is a separate file, I need two commands to run any script in my project (one for the bash script and one for the .py). I could include the command that calls the .py in the bash script, but that is far less flexible than calling it directly from the terminal.
I'd really like to hear some advice! Thanks in advance; any suggestion will be gratefully appreciated!

It is generally not good practice to manipulate sys.path within a Python library or program. You should instead add the relevant paths to PYTHONPATH in the calling environment of your Python program:
PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH" python ...
or
export PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH"
python ...
This allows you to easily manipulate the paths that your program or library will search for dependencies, without modifying your code.
It is also easy to manage this in your personal development environment by modifying your bashrc, or in your production environments via your init script (or other wrapper script), and it gives you one place to update each time you add or modify your project paths.
Given that you mention having almost one directory per .py file, you should also consider how your code might be organized into packages to further simplify your setup.
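For instance, a rough sketch of what that could look like; the package and module names here are made up for illustration, not taken from your project:
# Hypothetical layout (names are placeholders):
#
#   myproject/
#       mypackage/
#           __init__.py
#           io_helpers.py
#           analysis.py
#       scripts/
#           run_analysis.py
#
# With the project root on PYTHONPATH (or the package installed), every
# script can use plain absolute imports and no per-file sys.path edits:
from mypackage import io_helpers              # hypothetical module
from mypackage.analysis import summarize      # hypothetical function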

It's not a particularly good practice, though you can get away with it. Better to look into virtualenv (or pipenv) for a smoother workflow.

Related

Change the priority of PYTHONPATH

When I was loading certain modules [namely pygments.lexers BashLexer and pygments.formatters LatexFormatter] I was getting an error that Python couldn't find the modules. I then realised that this problem was being caused by my PYTHONPATH, which is set up for using ParaView with Python. It brings its own version of pygments, which doesn't work with nbconvert from the Jupyter notebook for some reason [note it is not totally dysfunctional, as PythonLexer and a few others were loaded without a problem; it was only the ones I've mentioned above that couldn't be found].
I have a similar problem with mayavi, which wouldn't work with paraview's version of vtk.
Both of these problems can be resolved simply enough by commenting out the python path in the bashrc, but obviously then paraview won't work.
Is there any way to, for example, reduce the priority of the PYTHONPATH so that the system codes in /etc... are called preferentially, but paraview can still find the ones that it needs in the PYTHONPATH?
I am using Python 2.7.6 on Linux Mint 17.3; ParaView is version 4.4.0, installed from source code as per here
Ordering the entries in PYTHONPATH is partly right, but the system paths don't seem to get included until you run python, and then they get put at the end. So to put a system path in front, explicitly add it:
export PYTHONPATH="[path/to/system/files]:$PYTHONPATH"
It is kind of a hack, because the system path you add will be duplicated in sys.path. But it works.
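As a quick sanity check (not part of the original answer), you can ask Python which copy actually wins under the current ordering:
# Print where the conflicting module is actually loaded from, so you can
# confirm whether the system copy or ParaView's copy is being picked up.
import pygments
print(pygments.__file__)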
Yes, you can import sys and manipulate sys.path as an ordinary list at runtime. You can rearrange what's there, or just insert at the beginning (sys.path.insert(0, 'path')). Do this before your import statement. If this will cause problems elsewhere, put it back after your import statement.
Note, this is fairly hacky. But it sounds like you might have a case for it, although I have not looked at these specific tools together.
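A minimal sketch of that idea, assuming the system copy of pygments lives in the usual Debian/Mint location (that path is an assumption; adjust it for your machine):
import sys

# Temporarily make the system site-packages win for this one import.
sys.path.insert(0, "/usr/lib/python2.7/dist-packages")   # assumed system location
from pygments.formatters import LatexFormatter            # now resolves to the system copy
sys.path.pop(0)                                            # restore the original search order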
Edit: this is more relevant if you want to control the python path at the level of individual imports within the course of one execution of python. If you want to control the path at the level of one full execution of Python, you can also set the python path on the command line for just that execution like this:
PYTHONPATH=/replacement/path/here python your_script.py
This is more verbose (unless you wrap it in a shell script or alias) than just calling python, but it lets you control the path one script at a time, whereas putting it into .bashrc/.bash_profile or similar changes it for your whole shell session.
export PYTHONPATH=$PYTHONPATH:<your_path> appends your path, so the entries already on PYTHONPATH take priority and <your_path> is only searched for whatever is not found there.
export PYTHONPATH=<your_path>:$PYTHONPATH prepends it, so <your_path> is searched first and $PYTHONPATH only for whatever is not found there.
If something exists in both, but you want to use one version for one program and the other version for another, you might want to look into different bashrc profiles.

distutils setup script under linux - permission issue

So I created a setup.py script for my Python program with distutils, and I think it behaves a bit strangely. First off, it installs all data_files into /usr/local/my_directory by default, which is a bit weird since this isn't a really common place to store data, is it?
I changed the path to /usr/share/my_directory/. But now I'm not able to write to the database inside that directory, and I can't set the required permissions from within setup.py either, since the actual database file has not been created when I run it.
Is my approach wrong? Should I use another tool for distributing?
Because at least for Linux, writing a simple setup sh script seems easier to me at the moment.
The immediate solution is to invoke setup.py with --prefix=/the/path/you/want.
A better approach would be to include the data as package_data. This way it will be installed alongside your Python package, and you'll find it much easier to manage (finding paths, etc.).
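A minimal sketch of what that could look like; the project and file names are placeholders, not from your project:
# Hypothetical setup.py using package_data, so the data files are installed
# next to the package instead of into /usr/local or /usr/share.
from distutils.core import setup   # setuptools.setup accepts the same arguments

setup(
    name="my_program",                            # placeholder name
    version="0.1",
    packages=["my_package"],
    package_data={"my_package": ["data/*.db"]},   # shipped alongside the code
)
Inside the package you can then locate the data relative to __file__ (e.g. os.path.join(os.path.dirname(__file__), 'data', ...)) rather than hard-coding an install prefix.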

Local collection of Python packages: best way to import them?

I need to ship a collection of Python programs that use multiple packages stored in a local Library directory: the goal is to avoid having users install packages before using my programs (the packages are shipped in the Library directory). What is the best way of importing the packages contained in Library?
I tried three methods, but none of them appears perfect: is there a simpler and robust method? or is one of these methods the best one can do?
In the first method, the Library folder is simply added to the library path:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
import package_from_Library
The Library folder is put at the beginning so that the packages shipped with my programs have priority over the same modules installed by the user (this way I am sure that they have the correct version to work with my programs). This method also works when the Library folder is not in the current directory, which is good. However, this approach has drawbacks. Each and every one of my programs adds a copy of the same path to sys.path, which is a waste. In addition, all programs must contain the same three path-modifying lines, which goes against the Don't Repeat Yourself principle.
An improvement over the above problems consists in trying to add the Library path only once, by doing it in an imported module:
# In module add_Library_path:
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
and then to use, in each of my programs:
import add_Library_path
import package_from_Library
This way, thanks to the caching mechanism of CPython, the module add_Library_path is only run once, and the Library path is added only once to sys.path. However, a drawback of this approach is that import add_Library_path has an invisible side effect, and that the order of the imports matters: this makes the code less legible, and more fragile. Also, this forces my distribution of programs to include an add_Library_path.py program that users will not use.
Python modules from Library can also be imported by making it a package (empty __init__.py file stored inside), which allows one to do:
from Library import module_from_Library
However, this breaks for packages in Library, as they might do something like from xlutils.filter import …, which fails because xlutils is not found in sys.path. So, this method works, but only for modules in Library, not packages.
All these methods have some drawback.
Is there a better way of shipping programs with a collection of packages (that they use) stored in a local Library directory? or is one of the methods above (method 1?) the best one can do?
PS: In my case, all the packages from Library are pure Python packages, but a more general solution that works for any operating system is best.
PPS: The goal is that the user be able to use my programs without having to install anything (beyond copying the directory I ship them regularly), like in the examples above.
PPPS: More precisely, the goal is to have the flexibility of easily updating both my collection of programs and their associated third-party packages from Library by having my users do a simple copy of a directory containing my programs and the Library folder of "hidden" third-party packages. (I do frequent updates, so I prefer not forcing the users to update their Python distribution too.)
Messing around with sys.path leads to pain... The modern-package-template and Distribute contain a vast array of information and were in part set up to solve your problem.
What I would do is set up setup.py to install all your packages to a specific site-packages location or, if you can, to the system's site-packages. In the former case, the local site-packages would then be added to the PYTHONPATH of the system/user. In the latter case, nothing needs to change.
You could use a batch file to set the Python path as well, or change the Python executable to point to a shell script that sets a modified PYTHONPATH and then executes the Python interpreter. The latter, of course, means that you have to have access to the user's machine, which you do not. However, if your users only run scripts and do not import your own libraries, you could use your own wrapper for scripts:
#!/path/to/my/python
And the /path/to/my/python script would be something like:
#!/bin/sh
PYTHONPATH=/whatever/lib/path:$PYTHONPATH /usr/bin/python "$@"   # "$@" preserves argument quoting
I think you should have a look at path import hooks, which allow you to modify the behaviour of Python when searching for modules.
For example, you could try to do something like KDE's script engine does for Python plugins [1].
It adds a special token to sys.path (like "<plasmaXXXXXX>", with XXXXXX being a random number just to avoid name collisions), and then, when Python tries to import modules and can't find them in the other paths, it will call your importer, which can deal with it.
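A rough, modern-Python sketch of that token-plus-hook idea; the token, directory, and hook name are all invented for illustration, and the real plasma code differs:
import importlib.machinery
import sys

LIBRARY_TOKEN = "<my-library>"        # sentinel entry, like plasma's "<plasmaXXXXXX>"
LIBRARY_DIR = "/path/to/Library"      # assumed location of the bundled packages

def library_hook(path_entry):
    # Path hooks are called with each sys.path entry; decline everything
    # except our sentinel token by raising ImportError.
    if path_entry != LIBRARY_TOKEN:
        raise ImportError
    # Delegate the actual lookup to the standard file-based finder,
    # pointed at the bundled Library directory.
    return importlib.machinery.FileFinder(
        LIBRARY_DIR,
        (importlib.machinery.SourceFileLoader, importlib.machinery.SOURCE_SUFFIXES),
    )

sys.path_hooks.append(library_hook)
sys.path.append(LIBRARY_TOKEN)        # imports now also search Library via the hook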
A simpler alternative is to have a main script used as a launcher, which simply adds the path to sys.path and executes the target file (so that you can safely avoid putting the sys.path.append(...) line in every file).
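Such a launcher could be as small as this sketch (the directory layout is assumed):
# launcher.py -- run as:  python launcher.py some_script.py [args...]
# Prepends the bundled Library directory once, then executes the target script.
import os
import runpy
import sys

here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(here, "Library"))   # assumed Library location

target = sys.argv[1]
sys.argv = sys.argv[1:]                              # make argv look right for the target
runpy.run_path(target, run_name="__main__")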
Yet another alternative, which works on Python 2.6+, would be to install the library under the per-user site-packages directory.
[1] You can find the source code under /usr/share/kde4/apps/plasma_scriptengine_python in a Linux installation with KDE.

Why use sys.path.append(path) instead of sys.path.insert(1, path)?

Edit: based on Ulf Rompe's comment, it is important that you use "1" instead of "0"; otherwise you will break sys.path.
I have been doing python for quite a while now (over a year), and I am always confused as to why people recommend you use sys.path.append() instead of sys.path.insert(). Let me demonstrate.
Let's say I am working on a module named PyWorkbooks (that is installed on my computer), but I am simultaneously working on a different module (let's say PyJob) that incorporates PyWorkbooks. As I'm working on PyJob I find errors in PyWorkbooks that I am correcting, so I would like to import a development version.
There are multiple ways to work on both (I could put my PyWorkbooks project inside of PyJob, for instance), but sometimes I will still need to play with the path. However, I cannot simply do a sys.path.append() to the folder where PyWorkbooks is at. Why? Because python will find my installed PyWorkbooks first!
This is why you have to do a sys.path.insert(1, path_to_dev_pyworkbooks)
In summary:
sys.path.append(path_to_dev_pyworkbooks)
import PyWorkbooks # does NOT import dev pyworkbooks, imports installed one
or:
sys.path.insert(1, path_to_dev_pyworkbooks) # based on comments you should use **1 not 0**
import PyWorkbooks # imports correct file
This has caused a few hangups for me in the past, and I would really like it if we (as a community) started recommending sys.path.insert(1, path), because if you are manually inserting a path, I think it is safe to say that it is the path you want to use!
Or do I have something wrong? It's a question that sometimes bothers me and I wanted it in the open!
If you really need to use sys.path.insert, consider leaving sys.path[0] as it is:
sys.path.insert(1, path_to_dev_pyworkbooks)
This could be important since 3rd party code may rely on sys.path documentation conformance:
As initialized upon program startup, the first item of this list,
path[0], is the directory containing the script that was used to
invoke the Python interpreter.
If you have multiple versions of a package / module, you need to be using virtualenv (emphasis mine):
virtualenv is a tool to create isolated Python environments.
The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.
Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.
Also, what if you can’t install packages into the global site-packages directory? For instance, on a shared host.
In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).
That's why people consider insert(0, ...) to be wrong -- it's an incomplete, stopgap solution to the problem of managing multiple environments.
You are confusing the concepts of appending and prepending. The following code is prepending:
sys.path.insert(1,'/thePathToYourFolder/')
It places the new information at the beginning (well, second, to be precise) of the search sequence that your interpreter will go through. sys.path.append() puts things at the very end of the search sequence.
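A tiny illustration of the difference (the paths are placeholders):
import sys

sys.path.insert(1, "/path/to/dev/PyWorkbooks")   # searched right after the script's own directory
sys.path.append("/path/to/other/stuff")          # searched only after everything else
print(sys.path[1], sys.path[-1])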
It is advisable that you use something like virtualenv instead of manually coding your package directories into the PYTHONPATH every time. For setting up various ecosystems that separate your site-packages and possible versions of Python, read these two blogs:
python ecosystems introduction
bootstrapping python virtual environments
If you do decide to move down the path of environment isolation, you would certainly benefit from looking into virtualenvwrapper: http://www.doughellmann.com/docs/virtualenvwrapper/

What is the pythonic way to share common files in multiple projects?

Let's say I have projects x and y in sibling directories: projects/x and projects/y.
There are some utility funcs common to both projects in myutils.py and some db stuff in mydbstuff.py, etc.
Those are minor common goodies, so I don't want to create a single package for them.
Questions arise about the whereabouts of such files, possible changes to PYTHONPATH, proper way to import, etc.
What is the 'pythonic way' to use such files?
The pythonic way is to create a single extra package for them.
Why don't you want to create a package? You can distribute this package with both projects, and the effect would be the same.
You'll never get it right for all installation scenarios and platforms if you do it by messing with PYTHONPATH and custom imports.
Just create another package and be done in no time.
You can add the path to the shared files to sys.path either directly with sys.path.append(pathToShared), or by defining .pth files and registering them with site.addsitedir. Path files (.pth) are simple text files with one path per line.
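For example, a small sketch (the directory name is a placeholder):
# Register a shared directory at interpreter startup (e.g. from sitecustomize
# or the top of your scripts). addsitedir also processes any .pth files it
# finds in that directory, each line of which names another path to add.
import site
site.addsitedir("/home/me/projects/shared")   # placeholder path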
You can also create a .pth file, which will store the directory(ies) that you want added to your PYTHONPATH. .pth files are copied to the Python/lib/site-packages directory, and any directory in that file will be added to your PYTHONPATH at runtime.
http://docs.python.org/library/site.html
Stack Overflow question (see accepted solution)
I agree with 'create a package'.
If you cannot do that, how about using symbolic links/junctions (ln -s on Linux, linkd on Windows)?
I'd advise using setuptools for this. It allows you to set dependencies so you can make sure all of these packages/individual modules are on the sys.path before installing a package. If you want to install something that's just a single source file, it has support for automagically generating a simple setup.py for it. This may be useful if you decide not to go the package route.
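As a rough sketch of that approach (the distribution name and the dependency are invented for illustration):
# Hypothetical setup.py for the shared modules from the question.
from setuptools import setup

setup(
    name="mycommon",                       # placeholder distribution name
    version="0.1",
    py_modules=["myutils", "mydbstuff"],   # the single-file modules shared by x and y
    install_requires=["somedependency"],   # whatever third-party packages they need
)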
If you plan on deploying this on multiple computers, I would usually set up a webserver with all the dependencies I plan on using, so it can install them for you automatically.
I've also heard good things about paver, but haven't used it myself.
