My project has a global asset directory (/usr/share/openage) that contains various files (graphics, texts, ...) and a user-specific asset directory (~/.openage) that allows the user to override some of the global assets or add their own.
It is my understanding that when building, you pass the install prefix to the build system (e.g. ./configure --install-prefix=/usr), which will in turn generate a file (e.g. configure.h) that makes the install prefix available to the code (e.g. #define INSTALL_PREFIX "/usr"). The code will then look for its assets in INSTALL_PREFIX "/share/openage". So far, so good.
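For illustration, a hedged Python analogue of that pattern (the constant stands in for the generated configure.h; the names are assumptions of the sketch, not actual build output):

import os

# Imagined stand-in for the generated configure.h: the build system
# would write this value at configure time.
INSTALL_PREFIX = "/usr"

# The code then derives the global asset directory from the prefix.
GLOBAL_ASSET_DIR = os.path.join(INSTALL_PREFIX, "share", "openage")
print(GLOBAL_ASSET_DIR)  # -> /usr/share/openage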
However, when the project hasn't been installed yet (which is true in 99.9% of cases for me as the developer), the directory /usr/share/openage obviously doesn't exist yet; instead, I would want to use ./assets in the current directory. Even worse, if the installed directory does exist (e.g. from an independent, earlier install), it might contain data incompatible with the current dev version.
Similarly, if running the installed project, I'd want it to use the user's home directory (~/.openage) as the user asset directory, while in "devmode" it should use a directory like "./userassets".
It gets even worse when thinking about non-POSIX platforms. On Windows, INSTALL_PREFIX is useless since programs can be installed basically anywhere (do programs simply use the current working directory as the asset directory?), and I don't have the slightest idea how Mac handles this.
So my question is: is there a generally accepted "best way to do this"? Surely, hundreds of projects (basically every single project that has an asset directory) have dealt with this problem one way or another.
Unfortunately, I don't even know what to google for. I don't even know how to tag this question.
Current ideas (and associated issues) include:
Looking in the cwd for a file openage_version, which only exists in the source directory. If it exists, assume that the project is currently uninstalled.
Issue: Even in "development mode", cwd might not always be the project root directory.
Checking whether readlink("/proc/self/exe") starts with INSTALL_PREFIX
Issue: platform-specific
Issue: theoretically, the project root directory could be in /usr/myweirdhomedirectory/git/openage
Forcing developers to specify an argument --not-installed, or to set an environment variable, OPENAGE_INSTALLED=0
Issue: Inconvenient
Issue: Forgetting to specify the argument will lead to confusion when the wrong asset directory is used
During development, call ./configure with a different INSTALL_PREFIX
Issue: When the project is built for installing, the recommended make test will run tests while the project is not installed
A combination of the first two options: Checking for dirname(readlink("/proc/self/exe")) + "/openage_version"
Issue: Even more platform-specific
This seems like the most robust option so far; a sketch follows below
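A minimal Python sketch of that combined check (the openage_version marker comes from the first idea above; using os.path.realpath(sys.argv[0]) as a portable stand-in for readlink("/proc/self/exe") is an assumption of the sketch):

import os
import sys

def is_in_devmode():
    # Resolve the running program's real location, following symlinks.
    exe_dir = os.path.dirname(os.path.realpath(sys.argv[0]))
    # The marker file only exists in the source tree, never in an install.
    return os.path.isfile(os.path.join(exe_dir, "openage_version"))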
The solution I finally chose is to make the decision in the Python part of the application.
There is a Python module, buildsystem.setup, which does not get installed during make install.
Using this fact, I can simply
def is_in_devmode():
    try:
        # relative imports require the "from ... import ..." form
        from ..buildsystem import setup  # only present in the source tree
        return True
    except ImportError:
        return False
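For context, a hedged sketch of how such a check might then select the asset directories (the helper name and the devmode paths are illustrative, not openage's actual API):

import os

def get_asset_dirs():
    # Illustrative only: returns (global_assets, user_assets) per mode.
    if is_in_devmode():
        return ("./assets", "./userassets")
    return ("/usr/share/openage", os.path.expanduser("~/.openage"))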
Related
Is it possible to create "uninstall hooks" in setup.py files using setuptools.setup()?
I have an issue in that my package needs to store some configuration files on the computer. The problem is that when the user uninstalls the package, the configuration files stay behind. How can I detect when the user uninstalls my package?
My guesses as to how to tackle this problem are:
a) Use some functionality in setuptools.setup() to create such a hook. I couldn't find any information about this existing, but even if it did, manual removal of files from the site-packages directory probably wouldn't be detected.
b) Create a daemon that starts when the machine boots and periodically checks whether the package still exists, removing the config files once it is gone. This approach could work, but it is complicated, system-dependent, and error-prone, while I want a simple solution.
I am writing a new Python application that I intend to distribute to several colleagues. Instead of my normal carefree attitude of just having everything self-contained and run inside a folder in my home directory, this time I would like to broaden my horizons and actually try to utilize the Linux directory structure as it was intended (at least somewhat). Can you please read my breakdown below and comment and/or make recommendations if this is not correct.
Let's call the application "narf"
/usr/narf - Install location for the actual python file(s).
/usr/bin/narf - Either a softlink to the main python file above or use this location instead.
/etc/narf - Any configuration files for app narf.
/var/log/narf - Any log files for app narf.
/usr/lib - Any required libraries for app narf.
/run/narf - Runtime state files (PID files, sockets) for app narf; note that /run is cleared at boot, so state that must survive reboots belongs in /var/lib/narf instead.
/tmp/narf - Strictly temporary files for app narf that go away on reboot
I assume I should stick to using /usr/X (for example /usr/bin instead of just /bin) since my application is not system-critical and a mere add-on.
I currently use Ubuntu 16 LTS, however part of this is intended as a way to try to standardize my app for any popular Linux distro.
Thanks for the help.
* UPDATE *
I think I see the answer to at least part of my question. Looking in /usr, I now see that it is a pretty barebones directory, almost akin to a user-level root directory (i.e. it has bin, lib, local, sbin, etc., but that's pretty much all). This leads me to believe my application should absolutely NOT live in /usr, and ONLY in /usr/bin.
You'd be better off putting your entire application into /opt. See here: http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/Linux-Filesystem-Hierarchy.html#opt
Then put a soft link to the executable into /usr/local/bin. see here: https://unix.stackexchange.com/a/8658/219043
I wouldn't worry about the rest.
Your application should not live in the /usr/ directory. If you want to package your application into a distribution, please refer to these guides:
Packaging and Distributing Projects
How To Package And Distribute Python Applications
You can for sure write to Unix directories within your application when appropriate, but keep in mind there are mechanisms built into setup.py that help with the installation side of this (for example).
If this is something private, I'd suggest making this a private repository on GitHub and have your colleagues install it through pip.
As the title asks, what is the technically proper location to store Python virtual environments on Linux operating systems according to the Linux FHS?
Stated another way that allows for a clear answer: Is it "technically correct" to separate the location of a Python virtual environment from the data files you are serving?
Note: This question differs from the closest, already-asked question I could find, as virtual-environments contain libraries, binaries, header files, and scripts.
As an added complication, I tend to write code that supports internet-accessible services. However, I don't see this as substantially differentiating my needs from scenarios in which the consumers of the service are other processes on the same server. I'm mentioning this detail in case my responses to comments include "web dev"-esque content.
For reference, I am using the following documentation as my definition of the Linux FHS: http://www.pathname.com/fhs/pub/fhs-2.3.html
I do not believe the popular virtualenvwrapper script suggests the correct action, as it defaults to storing virtual environments in a user's home directory. This violates the implicit concept that the directory is for user-specific files, as well as the statement that "no program should rely on this location."
From the root level of the file system, I lean towards /usr (shareable, read-only data) or /srv (Data for services provided by this system), but this is where I have a hard time deciding further.
If I were to go along with the decision of my go-to reverse proxy, that means /usr. Nginx is commonly packaged to go into /usr/share/nginx or /usr/local/nginx; however, /usr is supposed to be mounted read-only according to the FHS. I find this strange because I've never worked on a single project in which development happened so slowly that a "remount writable, update, remount read-only" cycle was considered worth the effort.
/srv is another possible location, but is described as the "location of the data files for particular service," whereas a Python virtual environment is more focused on the libraries and binaries of what provides a service (without this differentiation, .so files would also be in /srv). Also, multiple services with the same requirements could share a virtual environment, which violates the "particular" detail of the description.
I believe that part of the difficulty in choosing a correct location is because the virtual environment is an "environment," which consists of both binaries and libraries (almost like its own little hierarchy), which pushes my impression that somewhere under /usr is more conventional:
virtual-env/
├── bin ~= /usr/local : "for use by the system administrator when installing software locally"
├── include ~= /usr/include : "Header files included by C programs"
├── lib ~= /usr/lib : "Libraries for programming and packages"
└── share ~= /usr/share : "Architecture-independent data"
With my assumptions and thoughts stated: consider the common scenario of Nginx acting as a reverse proxy to a Python application. Is it correct to place a virtual environment and source code (e.g. application.py) under /usr/local/service_name/ while using /srv for files that are changed more often (e.g. 'static' assets, images, css)?
edit: To be clear: I know why and how to use virtualenvs. I am by no means confused about project layouts or working in development environments.
As the title asks, what is the technically proper location to store Python virtual environments on Linux operating systems according to the Linux FHS?
Keep in mind that the Linux FHS is not really a standard; it is a set of guidelines. It is only referred to as a standard by the LSB, which is just a bunch of rules that make supporting Linux easier.
/run, /sys, /proc, and /usr/local are all not part of the LFS, but you see them in most Linux distributions.
For me the clear choice to put virtual environments is /opt, because this location is reserved for the installation of add-on software packages.
However, on most Linux distributions only root can write to /opt, which makes this a poor choice because one of the main goals of virtual environments is to avoid being root.
So, I would recommend /usr/local (if it's writable by your normal user account) -- but there is nothing wrong with installing it in your home directory.
Stated another way that allows for a clear answer: Is it "technically correct" to separate the location of a Python virtual environment from the data files you are serving?
I'm not sure what you mean by "data files you are serving", but here are the rules for virtual environments:
Don't put them in source control.
Maintain a list of installed packages, and put this in version control. Remember that virtual environments are not exactly portable.
Keep your virtual environment separate from your source code.
Given the above, you should keep your virtual environment separate from your source code.
consider the common scenario of Nginx acting as a reverse proxy to a Python application. Is it correct to place a virtual environment and source code (e.g. application.py) under /usr/local/service_name/ while using /srv for more dynamic files (e.g. 'static' assets, images)?
Static assets are not dynamic files; I think you are confusing terms.
Either way, you should do the following:
Create a user account to run that application.
Put the application files under a directory that is controlled by that user and that user alone. Typically this is the /home/username directory, but you can make it /services/servicename. Place the virtual environment in a subdirectory of this directory, with a standard name; for example, I use env.
Put your static assets, such as all media files, css files, etc. in a directory that is readable by your front end server. So, typically you would make a www directory or a public_html directory.
Make sure that the user account you create for this application has write access to this asset directory, so that you are able to update files. The proxy server should not have execute permissions on this directory. You can accomplish this by changing the group of the directory to the same as that of the proxy server user. Given this, I would put this directory under /home/username/ or /services/servicename.
Launch the application using a process manager, and make sure your process manager switches the user to the one created in step 1 when running your application code.
Finally, I cannot stress this enough: DOCUMENT YOUR PROCESS and AUTOMATE IT.
I need to ship a collection of Python programs that use multiple packages stored in a local Library directory: the goal is to avoid having users install packages before using my programs (the packages are shipped in the Library directory). What is the best way of importing the packages contained in Library?
I tried three methods, but none of them appears perfect: is there a simpler and more robust method, or is one of these methods the best one can do?
In the first method, the Library folder is simply added to the library path:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
import package_from_Library
The Library folder is put at the beginning so that the packages shipped with my programs have priority over the same modules installed by the user (this way I am sure that they have the correct version to work with my programs). This method also works when the Library folder is not in the current directory, which is good. However, this approach has drawbacks. Each and every one of my programs adds a copy of the same path to sys.path, which is a waste. In addition, all programs must contain the same three path-modifying lines, which goes against the Don't Repeat Yourself principle.
An improvement over the above problems consists in trying to add the Library path only once, by doing it in an imported module:
# In module add_Library_path:
import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
and then to use, in each of my programs:
import add_Library_path
import package_from_Library
This way, thanks to the caching mechanism of CPython, the module add_Library_path is only run once, and the Library path is added only once to sys.path. However, a drawback of this approach is that import add_Library_path has an invisible side effect, and the order of the imports matters: this makes the code less legible and more fragile. Also, this forces my distribution of programs to include an add_Library_path.py module that users will not use.
Python modules from Library can also be imported by making it a package (empty __init__.py file stored inside), which allows one to do:
from Library import module_from_Library
However, this breaks for packages in Library: a package might internally do something like from xlutils.filter import …, which fails because xlutils itself is not found on sys.path. So this method works for modules shipped in Library, but not for packages.
All these methods have some drawback.
Is there a better way of shipping programs with a collection of packages (that they use) stored in a local Library directory? Or is one of the methods above (method 1?) the best one can do?
PS: In my case, all the packages from Library are pure Python packages, but a more general solution that works for any operating system is best.
PPS: The goal is that the user be able to use my programs without having to install anything (beyond copying the directory I ship them regularly), like in the examples above.
PPPS: More precisely, the goal is to have the flexibility of easily updating both my collection of programs and their associated third-party packages from Library by having my users do a simple copy of a directory containing my programs and the Library folder of "hidden" third-party packages. (I do frequent updates, so I prefer not forcing the users to update their Python distribution too.)
Messing around with sys.path leads to pain... The modern package template and Distribute contain a vast array of information and were in part set up to solve your problem.
What I would do is set up setup.py to install all your packages to a specific site-packages location or, if you can, to the system's site-packages. In the former case, the local site-packages directory would then be added to the system's/user's PYTHONPATH. In the latter case, nothing needs to change.
You could use a batch file to set PYTHONPATH as well, or change the Python executable to point to a shell script that sets a modified PYTHONPATH and then executes the Python interpreter. The latter, of course, means that you have to have access to the user's machine, which you do not. However, if your users only run scripts and do not import your libraries directly, you could use your own wrapper for scripts:
#!/path/to/my/python
And the /path/to/my/python script would be something like:
#!/bin/sh
PYTHONPATH=/whatever/lib/path:$PYTHONPATH /usr/bin/python "$@"
I think you should have a look at path import hooks, which allow you to modify Python's behaviour when searching for modules.
For example, you could do something like KDE's script engine does for Python plugins [1].
It adds a special token to sys.path (like "<plasmaXXXXXX>", with XXXXXX being a random number, just to avoid name collisions); then, when Python tries to import a module and can't find it on the other paths, it calls your importer, which can deal with it.
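A minimal sketch of such a hook using the modern importlib machinery (the "<mylib>" token and the Library path are illustrative assumptions; Plasma's actual engine predates this API):

import importlib.machinery
import sys

LIBRARY_DIR = "/path/to/Library"  # illustrative

def _library_hook(path_entry):
    # Only claim our magic token; raising ImportError tells Python
    # to try the next entry in sys.path_hooks.
    if path_entry != "<mylib>":
        raise ImportError
    loader_details = [
        (importlib.machinery.SourceFileLoader,
         importlib.machinery.SOURCE_SUFFIXES),
    ]
    # Delegate lookups for this token to a standard file finder
    # rooted at the Library directory.
    return importlib.machinery.FileFinder(LIBRARY_DIR, *loader_details)

sys.path_hooks.append(_library_hook)
sys.path.insert(1, "<mylib>")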
A simpler alternative is to have a main script that is used as a launcher, which simply adds the path to sys.path and executes the target file (so that you can safely avoid putting the sys.path.append(...) line in every file).
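A hedged sketch of such a launcher (the file layout is an assumption: launcher.py sits next to the Library folder):

#!/usr/bin/env python
"""launcher.py -- illustrative: run a target script with Library on sys.path."""
import os
import runpy
import sys

here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(1, os.path.join(here, 'Library'))

# Usage: ./launcher.py some_program.py [args...]
sys.argv = sys.argv[1:]  # make argv look normal to the target script
runpy.run_path(sys.argv[0], run_name='__main__')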
Yet another alternative, which works on Python 2.6+, is to install the library under the per-user site-packages directory.
[1] You can find the source code under /usr/share/kde4/apps/plasma_scriptengine_python in a Linux installation with KDE.
Edit: based on Ulf Rompe's comment, it is important that you use "1" instead of "0", otherwise you will break sys.path.
I have been doing Python for quite a while now (over a year), and I am always confused as to why people recommend sys.path.append() instead of sys.path.insert(). Let me demonstrate.
Let's say I am working on a module named PyWorkbooks (that is installed on my computer), but I am simultaneously working on a different module (let's say PyJob) that incorporates PyWorkbooks. As I'm working on PyJob I find errors in PyWorkbooks that I am correcting, so I would like to import a development version.
There are multiple ways to work on both (I could put my PyWorkbooks project inside of PyJob, for instance), but sometimes I will still need to play with the path. However, I cannot simply do a sys.path.append() of the folder where PyWorkbooks is. Why? Because Python will find my installed PyWorkbooks first!
This is why you have to do a sys.path.insert(1, path_to_dev_pyworkbooks)
In summary:
sys.path.append(path_to_dev_pyworkbooks)
import PyWorkbooks # does NOT import dev pyworkbooks, imports installed one
or:
sys.path.insert(1, path_to_dev_pyworkbooks) # based on comments you should use **1 not 0**
import PyWorkbooks # imports correct file
This has caused a few hangups for me in the past, and I would really like it if we (as a community) started recommending sys.path.insert(1, path), since if you are manually inserting a path, I think it is safe to say that it is the path you want to use!
Or do I have something wrong? It's a question that sometimes bothers me and I wanted it in the open!
If you really need to use sys.path.insert, consider leaving sys.path[0] as it is:
sys.path.insert(1, path_to_dev_pyworkbooks)
This could be important, since third-party code may rely on conformance with the sys.path documentation:
As initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter.
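A hedged illustration of the kind of conforming code that insert(0, ...) can break (the config file name is hypothetical):

import os
import sys

# Hypothetical helper that relies on the documented meaning of
# sys.path[0]: the directory of the script that invoked the interpreter.
def find_config():
    return os.path.join(sys.path[0], "app.cfg")

# After sys.path.insert(0, some_other_dir), sys.path[0] no longer points
# at the script directory, and find_config() silently looks elsewhere.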
If you have multiple versions of a package / module, you need to be using virtualenv (emphasis mine):
virtualenv is a tool to create isolated Python environments.
The basic problem being addressed is one of dependencies and versions, and indirectly permissions. Imagine you have an application that needs version 1 of LibFoo, but another application requires version 2. How can you use both these applications? If you install everything into /usr/lib/python2.7/site-packages (or whatever your platform’s standard location is), it’s easy to end up in a situation where you unintentionally upgrade an application that shouldn’t be upgraded.
Or more generally, what if you want to install an application and leave it be? If an application works, any change in its libraries or the versions of those libraries can break the application.
Also, what if you can’t install packages into the global site-packages directory? For instance, on a shared host.
In all these cases, virtualenv can help you. It creates an environment that has its own installation directories, that doesn’t share libraries with other virtualenv environments (and optionally doesn’t access the globally installed libraries either).
That's why people consider insert(0, ...) to be wrong -- it's an incomplete, stopgap solution to the problem of managing multiple environments.
You are confusing the concepts of appending and prepending. The following code is prepending:
sys.path.insert(1, '/thePathToYourFolder/')
It places the new entry at the beginning (well, second, to be precise) of the search sequence that your interpreter will go through. sys.path.append() puts things at the very end of the search sequence.
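A quick illustration of the difference (the directory is hypothetical):

import sys

# Appending consults the directory last:
sys.path.append("/devel/PyWorkbooks")
# Inserting at 1 consults it right after sys.path[0] (the script's dir):
sys.path.insert(1, "/devel/PyWorkbooks")
print(sys.path[:3])  # the inserted entry is now in second position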
It is advisable to use something like virtualenv instead of manually coding your package directories into PYTHONPATH every time. For setting up various ecosystems that separate your site-packages and possible versions of Python, read these two blogs:
python ecosystems introduction
bootstrapping python virtual environments
If you do decide to move down the path of environment isolation, you would certainly benefit from looking into virtualenvwrapper: http://www.doughellmann.com/docs/virtualenvwrapper/