What is the Pythonic way to share common files in multiple projects?

Let's say I have projects x and y in sibling directories: projects/x and projects/y.
There are some utility funcs common to both projects in myutils.py and some db stuff in mydbstuff.py, etc.
Those are minor common goodies, so I don't want to create a single package for them.
Questions arise about the whereabouts of such files, possible changes to PYTHONPATH, proper way to import, etc.
What is the 'pythonic way' to use such files?

The pythonic way is to create a single extra package for them.
Why don't you want to create a package? You can distribute this package with both projects, and the effect would be the same.
You'll never get it right for all installation scenarios and platforms if you do it by fiddling with PYTHONPATH and custom imports.
Just create another package and be done in no time.
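For illustration, the shared package could be as small as a single setup.py next to the two modules mentioned above (names here are just placeholders):

from setuptools import setup

setup(
    name="mycommon",                      # made-up package name
    version="0.1",
    description="Small utilities shared by projects x and y",
    py_modules=["myutils", "mydbstuff"],  # the shared modules
)

Once it is installed (for development, e.g. with pip install -e), both projects can simply import myutils with no path tricks.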

You can add the path to the shared files to sys.path, either directly with sys.path.append(pathToShared) or by defining .pth files and adding them with site.addsitedir. Path files (.pth) are simple text files with one path per line.
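A minimal sketch of both options, assuming the shared modules live in a placeholder directory /path/to/shared:

import site
import sys

# Option 1: append the shared directory to the module search path directly.
sys.path.append("/path/to/shared")

# Option 2: register it as a site directory; the directory is added to
# sys.path and any .pth files inside it are processed as well.
site.addsitedir("/path/to/shared")

import myutils  # works if myutils.py lives in /path/to/shared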

You can also create a .pth file, which will store the directory (or directories) that you want added to your module search path. .pth files are copied to the Python/lib/site-packages directory, and any directory listed in that file will be added to sys.path at interpreter startup.
http://docs.python.org/library/site.html
StackOverflow question (see accepted solution)
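A rough sketch of the .pth approach described above (the file name and paths are placeholders); dropping the file into the per-user site-packages directory works the same way at startup:

import os
import site

user_site = site.getusersitepackages()      # per-user site-packages directory
os.makedirs(user_site, exist_ok=True)       # Python 3; guard with try/except on 2.x

with open(os.path.join(user_site, "shared_projects.pth"), "w") as f:
    f.write("/path/to/projects/common\n")   # one directory per line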

I agree with 'create a package'.
If you cannot do that, how about using symbolic links/junctions (ln -s on Linux, linkd on Windows)?
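The same idea from Python itself, as a rough sketch (paths are placeholders; on Windows, creating symlinks may require elevated privileges or developer mode):

import os

# Link the shared module into project x instead of copying it.
os.symlink("/path/to/projects/common/myutils.py",
           "/path/to/projects/x/myutils.py")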

I'd advise using setuptools for this. It allows you to declare dependencies, so you can make sure all of these packages/individual modules are on sys.path before installing a package. If you want to install something that's just a single source file, it also supports automagically generating a simple setup.py for it, which may be useful if you decide not to go the package route.
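A rough sketch of what declaring such a dependency looks like in a setup.py (the project and dependency names are invented):

from setuptools import setup, find_packages

setup(
    name="projectx",                # placeholder project name
    version="0.1",
    packages=find_packages(),
    install_requires=[
        "mycommon>=0.1",            # the shared utilities distribution
    ],
)

Pointing the installer at a local index or simple web server (as mentioned below) then lets those dependencies be fetched and installed automatically.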
If you plan on deploying this to multiple computers, I usually set up a web server with all the dependencies I plan on using, so they can be installed automatically.
I've also heard good things about paver, but haven't used it myself.

Related

Is adding project root directory to sys.path a good practice?

I have a question about adding the project path to Python to make imports easier.
Situation
When I write code in Python, I usually add the necessary path to sys.path by using
import sys
sys.path.append("/path/to/dir/") # almost every .py file needs this
Sometimes, when my project gets bigger with many levels of directories, this approach seems bulky and error-prone (especially when I reorganize my files).
Recently, I started using a bash script (located in the project root directory) that adds the sys.path.append call, with the project root as its argument, to every .py file in the project. With this approach, I hardly ever have to manually care about importing a module.
Question
My question is: is that a good practice? I find it convenient compared to my old method, but since the bash script is a separate file, I need two commands to run any script in my project (one for the bash script and one for the .py). I can include the command that calls the .py in the bash script, but that is far less flexible than calling it directly from the terminal.
I'd really like to hear some advice! Thanks in advance; any suggestion will be gratefully appreciated!
It is generally not good practice to manipulate sys.path within a Python library or program. Instead, you should add the relevant paths to PYTHONPATH in the calling environment for your Python program:
PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH" python ...
or
export PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH"
python ...
This allows you to easily manipulate the paths that your program or library will search for dependencies, without modifying your code.
It is also very easy to manage this in your personal development environment by modifying your bashrc, or in your production environments via your init script (or other wrapper script), and it gives you one place to update each time you add or modify your project paths.
Given that you mention that you have almost one directory per .py file, you should also consider how your code might be organized into packages to further simplify your setup.
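As a sketch of that suggestion (all names invented), a package layout lets the scripts use ordinary absolute imports instead of editing sys.path in every file:

# myproject/
#     __init__.py
#     utils/
#         __init__.py
#         csv_helpers.py
#     scripts/
#         run_job.py
#
# With the package installed (e.g. pip install -e .) or run as
# python -m myproject.scripts.run_job, run_job.py can simply do:
#
#     from myproject.utils import csv_helpers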
It's not a particularly good practice, though you could get away with it. Better to look into virtualenv though (or pipenv) for a smoother workflow.

Correct way to handle configuration files using setuptools

I've got a Python (2.x) project which contains 2 Python fragment configuration files (import config; config.FOO and so on). Everything is installed using setuptools, causing these files to end up in the site-packages directory. From a UNIX perspective it would be nice to have the configuration for a software suite located in /etc so people could just edit it without having to crawl into /usr/lib/python*/site-packages. On the other hand, it would be nice to retain the hassle-free importing.
I've got 2 "fixes" in mind that would resolve this issue:
Create a softlink from /etc/stuff.cfg to the file in site-packages (non-portable and ugly)
Write a configuration management tool (somewhat like a registry) that edits site-packages directly (way more work than I am willing to do).
I am probably just incapable of finding the appropriate documentation as I can't imagine that there is no mechanism to do this.
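For context, the data_files mechanism in play looks roughly like this; whether an absolute target such as /etc is honored depends on the installer and packaging format, so treat it as a sketch with made-up names:

from setuptools import setup

setup(
    name="mysuite",                        # placeholder name
    version="1.0",
    py_modules=["config"],                 # the importable config fragment
    data_files=[
        # (target directory, [files]); absolute targets are installer-dependent
        ("/etc/mysuite", ["stuff.cfg"]),
    ],
)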

Why use sys.path.append(path) instead of sys.path.insert(1, path)?

Edit: based on Ulf Rompe's comment, it is important that you use "1" instead of "0"; otherwise you will break sys.path.
I have been doing Python for quite a while now (over a year), and I am always confused as to why people recommend you use sys.path.append() instead of sys.path.insert(). Let me demonstrate.
Let's say I am working on a module named PyWorkbooks (that is installed on my computer), but I am simultaneously working on a different module (let's say PyJob) that incorporates PyWorkbooks. As I'm working on PyJob I find errors in PyWorkbooks that I am correcting, so I would like to import a development version.
There are multiple ways to work on both (I could put my PyWorkbooks project inside of PyJob, for instance), but sometimes I will still need to play with the path. However, I cannot simply do a sys.path.append() with the folder where PyWorkbooks is. Why? Because Python will find my installed PyWorkbooks first!
This is why you have to do a sys.path.insert(1, path_to_dev_pyworkbooks)
In summary:
sys.path.append(path_to_dev_pyworkbooks)
import PyWorkbooks # does NOT import dev pyworkbooks, imports installed one
or:
sys.path.insert(1, path_to_dev_pyworkbooks) # based on comments you should use **1 not 0**
import PyWorkbooks # imports correct file
This has caused a few hangups for me in the past, and I would really like it if we (as a community) started recommending sys.path.insert(1, path), because if you are manually inserting a path, I think it is safe to say that it is the path you want to use!
Or do I have something wrong? It's a question that sometimes bothers me and I wanted it in the open!
If you really need to use sys.path.insert, consider leaving sys.path[0] as it is:
sys.path.insert(1, path_to_dev_pyworkbooks)
This could be important, since third-party code may rely on sys.path conforming to its documentation:
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter.
If you have multiple versions of a package / module, you need to be using virtualenv (emphasis mine):
virtualenv is a tool to create isolated Python environments.
The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.
Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.
Also, what if you can’t install packages into the global site-packages directory? For instance, on a shared host.
In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).
That's why people consider insert(0, ...) to be wrong: it's an incomplete, stopgap solution to the problem of managing multiple environments.
You are confusing the concepts of appending and prepending. The following code is prepending:
sys.path.insert(1,'/thePathToYourFolder/')
It places the new information at the beginning (well, second, to be precise) of the search sequence that your interpreter will go through. sys.path.append() puts things at the very end of the search sequence.
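A quick way to see the difference (the directory is a placeholder):

import sys

sys.path.append("/path/to/dev/PyWorkbooks")     # lands at the very end
sys.path.insert(1, "/path/to/dev/PyWorkbooks")  # lands right after sys.path[0]

print(sys.path[1])    # the inserted dev path, typically searched before site-packages
print(sys.path[-1])   # the appended copy, searched last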
It is advisable to use something like virtualenv instead of manually coding your package directories into PYTHONPATH every time. For setting up various ecosystems that separate your site-packages and possible versions of Python, read these two blog posts:
python ecosystems introduction
bootstrapping python virtual environments
If you do decide to move down the path of environment isolation, you would certainly benefit from looking into virtualenvwrapper: http://www.doughellmann.com/docs/virtualenvwrapper/

How to distribute / access data files in Python egg?

I'm writing a Django application that is using pip & virtualenv to manage its development environment.
One of the dependencies, pkgme, comes with many data files which are its "backends" and are configured in its setup.py with data_files=$FOO (rather than package_data).
When pkgme looks for its backends, it looks in os.path.join(sys.prefix, "share", "pkgme", "backends"). This works great when pkgme has been installed normally and seems to match the documentation, but it does not work when pkgme is installed as an egg.
There, the data files are installed under $VIRTUAL_ENV/lib/python2.7/site-packages/pkgme-0.1-py2.7.egg/share rather than the expected $VIRTUAL_ENV/share.
Which leaves me with two questions:
Should I be using something other than the os.path.join above to find the data files regardless of whether we are using an egg installation or a traditional system installation? If so, what?
Should I be distributing my data files differently so as to make them more readily available in an egg?
Note that I know about pkgutil.get_data, but would rather not use it. I'm not interested in the contents of these data files; I want to know their location instead, so I can execute them.
My current plan is to do this:
Use package_data instead of data_files
Change pkgme to look for backends relative to pkgme.__file__ rather than sys.prefix
Your current plan is essentially correct, or is at any rate a workable option.
When setuptools creates an egg, it checks whether code in the egg makes use of __file__, and if so, it marks the egg as not being installable in compressed form. In this way, when the egg is installed by easy_install, it'll get extracted to an .egg/ directory instead of being left in an .egg file.
If you want to support compressed/drop-in installation (i.e., just dumping the egg in a directory without "installing" it), then you should use the pkg_resources.resource_filename() (docs here) API instead of __file__, but then your package will be dependent on setuptools or distribute in order to have that API available.
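A minimal sketch of that API; the resource name "backends" is just an example of how pkgme might lay out its data:

import pkg_resources

# Resolve a data directory inside the installed package; if the egg is
# zipped, the resource is extracted to a cache and that path is returned.
backends_dir = pkg_resources.resource_filename("pkgme", "backends")
print(backends_dir)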
I ended up doing the following:
Changing pkgme to use pkg_resources.resource_filename() to find its own included backends
Added an entry point that any backend written in Python can use to publish the location of its own backend scripts
Kept the sys.prefix-based check for any backend that doesn't want to use Python
The diff can be found here: http://bazaar.launchpad.net/~pkgme-committers/pkgme/trunk/revision/86
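Roughly, the entry-point part looks like this; the group name "pkgme.backends" and the function name are invented here for illustration, not pkgme's actual interface:

# In a backend's setup.py:
from setuptools import setup

setup(
    name="my-pkgme-backend",                # placeholder backend package
    version="0.1",
    py_modules=["my_backend"],
    entry_points={
        "pkgme.backends": [                 # hypothetical entry-point group
            "mybackend = my_backend:get_script_dir",
        ],
    },
)

# On the pkgme side, backends advertised this way could be discovered with:
#   import pkg_resources
#   for ep in pkg_resources.iter_entry_points("pkgme.backends"):
#       backend_location = ep.load()()      # call the published function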

Distributing Python packages which depend on in-house common convenience libraries

I have a few Python packages that I would like to tidy up and publish on PyPI. These packages import a couple of Python modules I've written to augment or simplify certain operations (e.g., reading/writing from CSV files with headers by wrapping csv functions), provide handy data structures, etc. Currently these modules are housed in the top-level directory that holds the code for my projects, and I rely on reaching them by having added that directory to my PYTHONPATH environment variable. (Less than tidy, I know.)
By creating a separate package for these modules and uploading them on PyPI, I could mark such a package as a dependency for the packages I actually want to distribute. These convenience modules are, however, small and of limited use and interest, such that I don't think they warrant distributing as a separate package on PyPI. On the other hand, I am hesitant to copy these convenience modules (i.e., use cp convenience_module.py projectX/.) into each project directory, as this creates multiple copies of the same file both in the VCS repository housing my Python code and in the different source distribution tarballs I would post to PyPI. Is there an elegant solution to this problem?
You don't say why you're hesitant to 'provide copies'. In general, I think a reasonable approach is to think about how you've set things up for yourself to use the convenience modules. Did you install them in site-packages (or equivalent), or did you just depend on them being in the directory you ran the code from? However you use the modules, is that situation ideal, or is there a way that would be nicer for you?
Start with that, and figure out how to automate it through setup.py, which lets you put things wherever you want on the system (though I strongly discourage abusing this capability).
Whether you distribute them as a tarball or with the package that needs them, you still have to maintain all of the files, so the only real question is whether you intend for those convenience modules to develop their own user communities with their own support requests, etc., or whether they're decidedly intended only for use in support of this other module.
If you intend those modules to be used only for the one module, include them in the package, perhaps in a 'utils' package inside the distribution. Otherwise you're just cluttering the index with things people might think are useful, but are really joined at the hip with something else that drives the changes and maintenance of them.
If you intend those modules to be generic, and intend to maintain them as such, and think they have use outside of supporting this module, distribute them separately.
As far as I know, distributing these small packages via PyPI is the only viable option. Yes, it clutters the index with near-useless packages, but it's something that should be solved by PyPI maintainers, not package developers. Another alternative is to use the stdlib's or other util packages' data and functions rather than reinventing the wheel.
Just make sure you describe that utils package as such, or extend it into something more useful for others.
