Where to store data files generated by Linux software? - python

I have a Python script with a make install option available (installing by default to /usr/local/lib/python2.7/dist-packages/). But the script also generates files with user-specific mutable data during normal usage. It seems to me that I should not keep the installed script files together with that data. What is the conventional default place to store software data in such cases?

Summarizing from the Filesystem Hierarchy Standard:
Immutable architecture-independent data should go in /usr/share or /usr/local/share. Mutable data should go in the user's home directory if it's user-specific (XDG provides more guidance here), or in /var if it's system-wide (this usually requires a group-owned directory and files, and a setgid application, to allow writing to a shared file).
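For the user-specific case, here is a minimal sketch of the XDG lookup in Python (the application name "myscript" is just an illustration, not something from the question):

    # Minimal sketch of the XDG convention: per-user mutable data lives under
    # XDG_DATA_HOME, which defaults to ~/.local/share when unset.
    import os

    def user_data_dir(app_name='myscript'):
        base = os.environ.get('XDG_DATA_HOME') or os.path.expanduser('~/.local/share')
        return os.path.join(base, app_name)

    print(user_data_dir())  # e.g. /home/alice/.local/share/myscript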
/usr/share and /usr/local/share usually have a structure that somewhat mirrors /usr/lib and /usr/local/lib; I don't know about Python, but Perl has a module File::ShareDir that helps a module with installing and accessing data in a share directory corresponding to the directory where the module is installed.
And don't forget the other option: just ask the user where the data should go.

Related

Correct way to handle configuration files using setuptools

I've got a Python (2.x) project which contains 2 Python fragment configuration files (import config; config.FOO and so on). Everything is installed using setuptools, causing these files to end up in the site-packages directory. From a UNIX perspective it would be nice to have the configuration for a software suite located in /etc so people could just edit it without having to crawl into /usr/lib/python*/site-packages. On the other hand it would be nice to retain the hassle-free importing.
I've got 2 "fixes" in mind that would resolve this issue:
Create a softlink from /etc/stuff.cfg to the file in site-packages (non-portable and ugly)
Write a configuration management tool (somewhat like a registry) that edits site-packages directly (way more work than I am willing to do).
I am probably just incapable of finding the appropriate documentation as I can't imagine that there is no mechanism to do this.
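For what it's worth, a hedged sketch of the "/etc first, packaged copy second" idea implied by the question, keeping the import-style access (stuff.cfg and FOO are the names used above; the fallback layout is an assumption):

    # config.py -- sketch only: prefer /etc/stuff.cfg, fall back to the copy
    # shipped alongside this module, and expose the values as attributes.
    import os

    _default = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'stuff.cfg')
    _path = '/etc/stuff.cfg' if os.path.exists('/etc/stuff.cfg') else _default

    _ns = {}
    execfile(_path, _ns)      # Python 2; on Python 3: exec(open(_path).read(), _ns)
    FOO = _ns.get('FOO')      # callers keep writing: import config; config.FOO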

Compiling Python statically

I can use
python -m py_compile mytest.py
And it will byte-compile the file. From reading some other documentation, I was under the impression that it byte-compiled any imported modules as well. But if I change any of the files it imports, I see the changed functionality. Is there some way to completely compile a Python script and the modules it imports, so that any changes to the originals aren't reflected? I want to do this for security purposes, essentially creating a "trusted" version which can't be subverted by changing the functionality of any modules that it calls.
If you compile to bytecode, then delete the source files, then the bytecode can't change. But if someone has the ability to change source files on your machine, they can also change bytecode files on your machine. I don't think this will give you any actual security.
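As an aside on the byte-compiling itself (separate from the security question): the standard library's compileall module can compile a whole directory tree, not just one file; the directory name below is illustrative:

    # Sketch: byte-compile every .py file under a project directory so the
    # script and the modules it imports all get .pyc files.
    import compileall
    compileall.compile_dir('myproject', force=True)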
If you want a single-file Python program, you can run from a zip file.
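A hedged sketch of that zip approach, assuming a script mytest.py plus one helper module (both names are illustrative); the resulting archive runs with python myapp.zip:

    # Build a runnable zip: the entry point must be stored as __main__.py.
    import zipfile

    with zipfile.ZipFile('myapp.zip', 'w') as zf:
        zf.write('mytest.py', arcname='__main__.py')  # entry point
        zf.write('helper.py')                         # any modules it imports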
Another option is to use cx_freeze or a similar program to compile the program into a native executable.
We solved this problem by developing our own custom loader using signet http://jamercee.github.io/signet/. Signet will scan your Python source and its dependencies and calculate SHA1 hashes, which it embeds in the loader. You deliver the loader AND your script to your users with instructions to run the loader. On invocation, it re-calculates the hashes and, if they match, control is then transferred to your script. Signet also supports code signing and PE verification.
Freeze/Bundle:
bbfreeze cross-platform
cx_freeze cross-platform
py2exe Windows
py2app OS X
pyinstaller cross-platform
Cryptography:
Assuming your "process" for running "scripts" is secure and cannot be tampered with:
Create secure hashes of the scripts and record them (e.g. SHA1)
When executing "scripts", ensure their cryptographic hashes match (ensuring they haven't been tampered with); a minimal sketch of this check appears below.
A common approach to this for secure protocols and APIs is to use HMAC.
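A minimal sketch of that hash check; the file name and the recorded digest are placeholders, and a real deployment would protect the recorded hashes (or use HMAC with a secret key) rather than a plain constant:

    # Sketch: refuse to run a script whose SHA1 no longer matches the recorded value.
    import hashlib

    TRUSTED_SHA1 = {'mytest.py': '<recorded hex digest>'}

    def is_untampered(path):
        with open(path, 'rb') as f:
            return hashlib.sha1(f.read()).hexdigest() == TRUSTED_SHA1.get(path)

    if not is_untampered('mytest.py'):
        raise SystemExit('mytest.py does not match its recorded hash; refusing to run')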

Where is the proper place to put Python virtual environments according to the Linux Filesystem Hierarchy Standard?

As the title asks, what is the technically proper location to store Python virtual environments on Linux operating systems according to the Linux FHS?
Stated another way that allows for a clear answer: Is it "technically correct" to separate the location of a Python virtual environment from the data files you are serving?
Note: This question differs from the closest, already-asked question I could find, as virtual environments contain libraries, binaries, header files, and scripts.
As an added complication, I tend to write code that supports internet-accessible services. However, I don't see this as substantially differentiating my needs from scenarios in which the consumers of the service are other processes on the same server. I'm mentioning this detail in case my responses to comments include "web dev"-esque content.
For reference, I am using the following documentation as my definition of the Linux FHS: http://www.pathname.com/fhs/pub/fhs-2.3.html
I do not believe the popular virtualenvwrapper script suggests the correct action, as it defaults to storing virtual environments in a user's home directory. This violates the implicit concept that the directory is for user-specific files, as well as the statement that "no program should rely on this location."
From the root level of the file system, I lean towards /usr (shareable, read-only data) or /srv (Data for services provided by this system), but this is where I have a hard time deciding further.
If I were to follow the lead of my go-to reverse proxy, that would mean /usr. Nginx is commonly packaged to go into /usr/share/nginx or /usr/local/nginx; however, /usr is supposed to be mounted read-only according to the FHS. I find this strange, because I've never worked on a single project in which development happened so slowly that "remount /usr writable, update, remount read-only" was considered worth the effort.
/srv is another possible location, but it is described as the "location of the data files for a particular service," whereas a Python virtual environment is more focused on the libraries and binaries for what provides a service (without this differentiation, .so files would also be in /srv). Also, multiple services with the same requirements could share a virtual environment, which violates the "particular" detail of the description.
I believe that part of the difficulty in choosing a correct location is because the virtual environment is an "environment," which consists of both binaries and libraries (almost like its own little hierarchy), which pushes my impression that somewhere under /usr is more conventional:
virtual-env/
├── bin ~= /usr/local : "for use by the system administrator when installing software locally"
├── include ~= /usr/include : "Header files included by C programs"
├── lib ~= /usr/lib : "Libraries for programming and packages"
└── share ~= /usr/share
With my assumptions and thoughts stated: consider the common scenario of Nginx acting as a reverse proxy to a Python application. Is it correct to place a virtual environment and source code (e.g. application.py) under /usr/local/service_name/ while using /srv for files that are changed more often (e.g. 'static' assets, images, css)?
edit: To be clear: I know why and how to use virtualenvs. I am by no means confused about project layouts or working in development environments.
As the title asks, what is the technically proper location to store Python virtual environments on Linux operating systems according to the Linux FHS?
Keep in mind that the Linux FHS is not really a standard; it is a set of guidelines. It is only referred to as a standard by the LSB - which is just a bunch of rules that make supporting Linux easier.
/run, /sys, /proc and /usr/local are all not part of the LFS, but you see them in most Linux distributions.
For me the clear choice to put virtual environments is /opt, because this location is reserved for the installation of add-on software packages.
However, on most Linux distributions only root can write to /opt, which makes this a poor choice because one of the main goals of virtual environments is to avoid being root.
So, I would recommend /usr/local (if it's writable by your normal user account) - but there is nothing wrong with installing it in your home directory.
Stated another way that allows for a clear answer: Is it "technically correct" to separate the location of a Python virtual environment from the data files you are serving?
I'm not sure what you mean by "data files you are serving", but here are the rules for virtual environments:
Don't put them in source control.
Maintain a list of installed packages, and put this list in version control (a minimal sketch of one way to generate it appears below). Remember that virtual environments are not exactly portable.
Keep your virtual environment separate from your source code.
Given the above, you should keep your virtual environment separate from your source code.
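One way to generate that list of installed packages, sketched with pkg_resources (the usual command-line equivalent is pip freeze > requirements.txt):

    # Sketch: record the environment's installed packages so the virtualenv
    # can be recreated elsewhere; requirements.txt is the conventional name.
    import pkg_resources

    with open('requirements.txt', 'w') as f:
        for dist in sorted(pkg_resources.working_set, key=lambda d: d.project_name.lower()):
            f.write('%s==%s\n' % (dist.project_name, dist.version))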
consider the common scenario of Nginx acting as a reverse proxy to a Python application. Is it correct to place a virtual environment and source code (e.g. application.py) under /usr/local/service_name/ while using /srv for more dynamic files (e.g. 'static' assets, images)?
Static assets are not dynamic files; I think you are confusing terms.
Either way, you should do the following:
Create a user account to run that application.
Put the application files under a directory that is controlled by that user and that user alone. Typically this is the /home/username directory, but you can make this /services/servicename. Place the virtual environment as a subdirectory of this directory, with a standard name. For example, I use env.
Put your static assets, such as all media files, css files, etc. in a directory that is readable by your front end server. So, typically you would make a www directory or a public_html directory.
Make sure that the user account you create for this application has write access to this asset directory, so that you are able to update files. The proxy server should not have execute permissions on this directory. You can accomplish this by changing the group of the directory to the same as that of the proxy server user. Given this, I would put this directory under /home/username/ or /services/servicename.
Launch the application using a process manager, and make sure your process manager switches the user to the one created in step 1 when running your application code.
Finally, I cannot stress this enough: DOCUMENT YOUR PROCESS and AUTOMATE IT.

Python: Writing to files within packages?

Using this general structure:
setup.py
/package
    __init__.py
    project.py
    /data
        client.log
I have a script that saves a list of names to client.log, so I don't have to reinitialize that list each time I need access to it or run the module. Before I set up this structure with pkg_resources, I used open('.../data/client.log', 'w') to update the log with explicit paths, but this doesn't work anymore.
Is there any way to edit data files within modules? Or is there a better way to save this list?
No, pkg_resources is for reading resources within a package. You can't use it to write log files, because that's the wrong place for log files. Your package directory should typically not be writable by the user that loads the library. Also, your package may in fact be inside a ZIP file.
You should instead store the logs in a log directory. Where to put that depends on a lot of things, the biggest issue is your operating system but also if it's system software or user software.
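As a hedged sketch of the user-software case: keep the mutable client list in a per-user directory rather than inside the installed package (the directory and file names below are illustrative, not part of the original layout):

    # Sketch: write client.log under the user's home directory instead of
    # inside site-packages; ~/.project is an arbitrary example location.
    import os

    DATA_DIR = os.path.join(os.path.expanduser('~'), '.project')
    LOG_PATH = os.path.join(DATA_DIR, 'client.log')

    def save_clients(names):
        if not os.path.isdir(DATA_DIR):
            os.makedirs(DATA_DIR)
        with open(LOG_PATH, 'w') as f:
            f.write('\n'.join(names))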

Python import module (xlwt) from archive

Let's say Tight Ars & Co. is a company with incredibly tight security policies, and lets assume I work for this company. Assume they've one task that requires a python script to write to excel files, and I find this incredibly wonderful library called xlwt. Now my script is able to write to excel files, everything is wonderful and the sun is shining, I release the code, and suddenly I'm asked what is this thingamajig setup.py, why should we run it? wait, we'll not even run it, we want the environment to be clean from third party code etc etc, since I'm unaware of any wizardry or voo doo is there any way I can package the dependent libraries and import them in my script?
All setup.py typically does with any pure-Python package is copy files into a standard place and compile the .py files to .pyc. I can't imagine why your employer would regard that as (nasty) third-party software while the source of the package is OK, your IDE is OK, Python itself is OK, etc.
Options:
(1) Copy the xlwt directory from a source distribution to somewhere that's listed in sys.path
(2) Make a ZIP file xlwt.zip containing the contents of the xlwt directory and copy it to ditto (a small import sketch follows this list).
(3) As (2) but compile the .py files to .pyc first.
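A hedged sketch of options (1)/(2); note that for a plain import xlwt to work, the copied directory or the archive must contain the xlwt package directory itself (with its __init__.py), and the paths below are assumptions:

    # Sketch: make a bundled copy of xlwt importable without running setup.py.
    import os
    import sys

    HERE = os.path.dirname(os.path.abspath(__file__))
    sys.path.insert(0, os.path.join(HERE, 'xlwt.zip'))  # for option (1) the script's
                                                        # own directory already suffices
    import xlwt  # now resolved from the bundled copy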
If somebody points out that the above involves error-prone manual steps, you can:
(a) write a script to do that
or
(b) copy setup.py, change its name, pretend that you wrote it yourself, use it, ...
Unless I am misunderstanding the question, you should be able to obtain the source archive and simply copy the "xlwt" directory to the same directory as your script, and it should be importable from the local directory.
