Python Packaging: user-specified file path

I am writing a python package which is dependent on a number of large data files. These files are not included with the package. Instead, users are required to have these files on disk already (with an arbitrary path). What is the best way to make my package aware of the location of these files?
I have been reading about setup.py and setup.cfg but I am still not sure how to do this. It seems to me that a user-editable option in setup.cfg would be a good way to go, but I don't know whether it is, whether it can be done at all, or how I would do it if so...
I did see this almost identical question, Python Packaging: Ask user for variable values when (pip) installing, which focuses on user input during pip (which is discouraged in the comments). If that actually would be a good solution, I'm interested in how to do that too.
In my private development of the package, I have used module constants, as in
DEFAULT_PATH_FILE1 = "my/path/to/file1.csv"
DEFAULT_PATH_FILE2 = "my/path/to/file2.csv"
etc. and properties initialized with these constants. This doesn't seem viable at all for distribution.

What you want is not a one-time setup during install (which is also impossible with modern .whl installs), but a way for clients to configure your library at any point during runtime. Given that you don't provide a CLI, you can either use environment variables for that, or look for a user-defined config file.
Here is a simple recipe using appdirs to find out where the config file should be located. It loads on import of your package, and lets you decide how serious a missing config file is. Usually, that'd be:
write a log message
use default settings
throw some kind of exception
a combination of the above
from logging import getLogger
from pathlib import Path
from configparser import ConfigParser  # loads .ini format files easily, just to have an example to go with

import appdirs  # needs to be pip-installed

log = getLogger(__name__)
config = ConfigParser(interpolation=None)

# load config, substitute "my_package" with the actual name of your package
config_path = Path(appdirs.user_config_dir("my_package")) / "user.ini"
try:
    with open(config_path) as f:
        config.read_file(f, source="user")
except FileNotFoundError:
    # only do whatever makes sense for your package
    log.info(f"User config expected at '{config_path}', but not found.")
    config.read_string("[pathes]\nfile_foo=foo\nfile_bar=bar")  # dubious default values
    raise ImportError(f"Can't use this module; create a config at '{config_path}'.")


class Foo:
    def __init__(self):
        with open(config["pathes"]["file_foo"]) as f:
            self.data = f.read()
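For completeness, the user.ini this recipe expects could then look something like this (the section and key names follow the snippet above; the paths are placeholders):
[pathes]
file_foo = /home/alice/data/file1.csv
file_bar = /home/alice/data/file2.csv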

This sounds like runtime configuration. It's none of the business of setup.py, which is concerned with installing your package.
For app configuration, it would be common to specify this resource location by command-line argument, environment variable, or configuration file. You will usually want to either hard-code some sensible default path in case the user does not specify any configuration, or raise an exception in case of resources not existing / not found.
Example for environment var:
import os
DEFAULT_PATH_FILE1 = "/default/path/to/file1.csv"
PATH_FILE1 = os.environ.get("PATH_FILE1", DEFAULT_PATH_FILE1)
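For the command-line option mentioned above, here is a minimal sketch using argparse (the --file1 flag and the default path are just illustrations, not part of the question):
import argparse
import os

parser = argparse.ArgumentParser(description="Run with user-supplied data files.")
parser.add_argument(
    "--file1",
    default=os.environ.get("PATH_FILE1", "/default/path/to/file1.csv"),
    help="Path to the first data file",
)
args = parser.parse_args()
PATH_FILE1 = args.file1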

Related

Packaging a python app with a config file?

I'm writing a Python CLI app and want to package it. I need the app to have a config file that applies to the entire "installation" (i.e. no matter where the user calls my CLI app from, it needs to read this config file).
I want this to be in the package install directory, not just some arbitrary place on the filesystem if I can avoid it. What's the un-messy way to do this?
Sorry to give a "that's not what you want to do"-answer, but I would strongly advise against bundling an editable config file into your package. The reasons being:
Any serious OS has well-defined standards for where user- or system-level configs should go. Putting your app's config there as well is not messy at all.
Python packages (read: .whl files) don't need to be unzipped in order to be runnable, so an OS that supports Python may choose not to unzip them on install. If you want to edit a config that is in there, you need to unzip the package first, which is a bit inconvenient.
For the same reason, the config can't be found with search tools. So good luck to your users if they ever forget where it was.
Last and least importantly, there is a general assumption that an installed program is static. In compiled languages there is no way for it not to be, and for interpreted languages I'd deem it good style to follow suit.
But the best argument for following standards is usually that you get to use well-written tools that support said standard, in this case appdirs. It can (among other things) find the common config-directory for you, so using it is as simple as this:
from pathlib import Path
from configparser import ConfigParser

from appdirs import site_config_dir  # needs to be pip-installed


def load_config():
    # .ini is easiest to work with, but .toml is probably better in the long run
    cfg_loc = Path(site_config_dir(appname="my_cli_app", appauthor="K4KFH")) / "config.ini"
    # that's it, that was the magic. the rest of the code here is just for illustration
    if not cfg_loc.exists():
        cfg_loc.parent.mkdir(parents=True, exist_ok=True)
        with open(cfg_loc, "w") as f:
            f.write(
                "[basic]\n"
                "foo = 1\n"
            )
        print(f"Initialized new default config at {cfg_loc}.")
    cfg = ConfigParser()
    cfg.read(cfg_loc)
    return cfg
Which, on Windows, will get you this:
>>> cfg = load_config()
Initialized new default config at C:\ProgramData\K4KFH\my_cli_app\config.ini.
>>> cfg["basic"]["foo"]
'1'
And on debian buster this:
>>> cfg = load_config()
Initialized new default config at /etc/xdg/my_cli_app/config.ini.
>>> cfg["basic"]["foo"]
'1'

python configuration at compile time

As of now we have a file conf.py which stores most of our configuration variables for the service. We have code similar to this:
environment = 'dev'  # could be dev, local, staging, production
configa = 'something'
configb = 'something else'

if environment == 'dev':
    configa = 'something dev'
elif environment == 'local':
    configa = 'something local'
Is this the right way to manage a configuration file in a Python project? Are these configuration values loaded into variables at compile time (while creating the .pyc files), or are the if conditions checked every time the conf is imported in a Python script, or every time a configuration variable is accessed?
All code runs at import time. But since you are unlikely to import your application again and again while it's running you can ignore the (minimal) overhead.
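To illustrate (the module names are made up): the module body executes once on first import and is then cached in sys.modules, so the if-checks do not run again:
# conf.py
print("evaluating conf.py")  # top-level code, runs once per process
environment = 'dev'

# main.py
import conf   # prints "evaluating conf.py"
import conf   # already cached in sys.modules, nothing is re-executed
print(conf.environment)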
This is subjective but there is a good discussion in this post:
What's the best practice using a settings file in Python?
With your method, the file is treated the same way as any other Python script, i.e. it is evaluated on import. If you want it updated on access, or without restarting the service, it is best to use an external, non-Python config file (e.g. JSON or .ini) and set up functionality to re-read the file, as sketched below.
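A minimal sketch of that idea, assuming a JSON file at a hypothetical path config.json; because the file is re-read on every call, edits take effect without restarting the service:
import json

CONFIG_PATH = "config.json"  # hypothetical location

def get_config():
    # re-read the file on every call so edits are picked up immediately
    with open(CONFIG_PATH) as f:
        return json.load(f)

# usage: look the value up when it is needed instead of caching it at import time
configa = get_config()["configa"]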
You can create a file, for example settings.py, and add its directory to the module search path.
Example:
import os
import sys

sys.path.append(os.path.dirname(__file__))
After that you can import the file anywhere and read any setting from it:
import settings

env = settings.environment
Many frameworks work in a similar way.
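For illustration, such a settings.py can be nothing more than plain assignments (the names are just examples):
# settings.py
environment = "dev"  # could be dev, local, staging, production
root_dir = "/data/project"
data_dir = root_dir + "/input"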

Configuration variables for a collection of scripts in Python

I have a collection of scripts written in Python. Each of them can be executed independently. However, most of the time they should be executed one after the other, so there is a MainScript.py which calls them in the appropriate order. Each script has some configurable variables (let's call them Root_Dir, Data_Dir and LinWinFlag). If this collection of scripts is moved to a different computer, or different data needs to be processed, these variable values need to be changed. As there are many scripts this duplication is annoying and error-prone. I would like to group all configuration variables into a single file.
I tried making Config.py which would contain them as per this thread, but import Config produces ImportError: No module named Config because they are not part of a package.
Then I tried relying on variable inheritance: define them once in MainScript.py which calls all the others. This works, but I realized that each script would not be able to run on its own. To solve this, I tried adding useGlobal=True in MainScript.py and in other files:
if useGlobal is None or useGlobal == False:
    # define all variables
But this fails when scripts are run standalone: NameError: name 'useGlobal' is not defined. The workaround is to define useGlobal and set it to False when running the scripts independently of MainScript.py. Is there a more elegant solution?
The idea is that Python wants to access files - including your Config.py - primarily as part of a package.
The nice thing is that Python makes building packages really easy: you initialize one by creating an
__init__.py
file in each directory you want to act as a package, sub-package, sub-sub-package, and so on.
So your import should go through once you have created this file.
If you have further questions, look at the excellent Python documentation.
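A sketch of the layout this answer has in mind (directory and file names are just examples); with the __init__.py in place, the scripts can share a single Config module:
# my_scripts/
#     __init__.py      <- marks the directory as a package
#     Config.py        <- Root_Dir, Data_Dir, LinWinFlag defined here
#     MainScript.py
#     step_one.py
#
# step_one.py, run as:  python -m my_scripts.step_one
from my_scripts import Config

print(Config.Root_Dir)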
The best way to do this is to use a configuration file placed in your home directory (~/.config/yourscript/config.json).
You can then load the file on start and provide default values if the file does not exist:
Example (config.py):
import json
import os

default_config = {
    "name": "volnt",
    "mail": "oh#hi.com"
}

def load_settings():
    settings = dict(default_config)  # copy so the defaults are not mutated
    try:
        # expanduser is needed because open() does not expand "~" on its own
        path = os.path.expanduser("~/.config/yourscript/config.json")
        with open(path, "r") as config_file:
            loaded_config = json.loads(config_file.read())
        for key in loaded_config:
            settings[key] = loaded_config[key]
    except IOError:  # file does not exist
        pass
    return settings
For a configuration file it's a good idea to use JSON and not Python, because it makes it easy to edit for the people using your scripts.
As suggested by cleros, the ConfigParser module seems to be the closest thing to what I wanted (a one-line statement in each file which would set up multiple variables).
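For illustration, a minimal Python 3 sketch of that approach, assuming a shared config.ini next to the scripts (the file name, section, and keys are just examples):
# config.ini:
#     [paths]
#     root_dir = /data/project
#     data_dir = /data/project/input
from configparser import ConfigParser

cfg = ConfigParser()
cfg.read("config.ini")  # read the shared config near the top of each script

ROOT_DIR = cfg["paths"]["root_dir"]
DATA_DIR = cfg["paths"]["data_dir"]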

PyYAML path at serialization time vs deserialization time

I am working on a game engine which includes a simple GUI development tool. The GUI tool allows a user to define various entities and components, which can then be saved in a configuration file. When the game engine runtime loads the configuration file, it can determine how to create the various entities and components for use in the game.
For a configuration file saving mechanism, I am using PyYAML. The issue that I am having stems from the fact that the serialization process occurs in a module which is in a different directory than the module which loads and parses the file through PyYAML.
Simplified Serializer
import yaml

def save_config(context, file_name):
    config_file = file(file_name, 'w')
    # do some various processing on the context dict object
    yaml.dump(context, config_file)
    config_file.close()
This takes the context object, which is a dict that represents various game objects, and writes it to a config file. This works without issue.
Simplified Deserializer in engine
import yaml

def load(file_name):
    config_file = open(file_name, 'r')
    context = yaml.load(config_file)
    return context
This is where the problem occurs. On yaml.load(config_file), I will receive an error, because it fails to find a certain name in a module. I understand why this is happening. For example, when I serialize the config file, it will list an AssetComponent (a component type in the engine) as being at engine.common.AssetComponent. However, from the deserializer's perspective, the AssetComponent should just be at common.AssetComponent (because the deserialization code itself exists within the engine package), so it fails to find it under engine.
Is there a way to manually handle paths when serializing or deserializing with PyYAML? I would like to make sure they both happen from the same "perspective."
Edit:
The following shows what a problematic config file might look like, followed by what the manually corrected config would look like
Problematic
!!python/object/apply:collections.defaultdict
args: [!!python/name:__builtin__.dict '']
dictitems:
  assets:
  - !!python/object:common.Component
    component: !!python/object:engine.common.AssetComponent {file_name: ../content/sticksheet.png,
      surface: null}
    text: ../content/sticksheet.png
    type_name: AssetComponent
Corrected
!!python/object/apply:collections.defaultdict
args: [!!python/name:__builtin__.dict '']
dictitems:
  assets:
  - !!python/object:tools.common.Component
    component: !!python/object:common.AssetComponent {file_name: ../content/sticksheet.png,
      surface: null}
    text: ../content/sticksheet.png
    type_name: AssetComponent
Your problem lies in a mismatch between your package structure and your __main__ routines. The module containing your __main__ will be inside a package, but it has no way of knowing that. Therefore you will use imports relative to the location of the file containing __main__, and not relative to the top-level structure of your package.
See Relative imports for the billionth time for a longer (and probably better) explanation.
So, how can you fix it?
Inside the file containing __main__ you do:
from tools.common import Component
# instead of from common import Component
c = Component()
print yaml.dump(c)
Another thing you must ensure is that python will know how to load your modules. If you have installed your package this will be done automatically, but during development this is usually not the case. So during development you will also want to make your development modules findable.
The easiest way (but not very clean) is to use sys.path.append('the directory containing tools and engine'). Another way (cleaner) is to set the PYTHONPATH environment variable to include your top level directory containing tools and engine.
You can explicitly declare the Python type of an object in a YAML document:
!!python/object:module_foo.ClassFoo {
  attr_foo: "spam",
  …,
}

Plugin design question

My program is broken down into two parts: the engine, which deals with user interface and other "main program" stuff, and a set of plugins, which provide methods to deal with specific input.
Each plugin is written in its own module, and provides a function that will allow me to send and retrieve data to and from the plugin.
The name of this function is the same across all plugins, so all I need is to determine which one to call and then the plugin will handle the rest.
I've placed all of the plugins in a sub-folder, wrote an __init__.py that imports each plugin, and then I import the folder (I think it's called a package?)
Anyway, currently I explicitly tell it what to import (which is basically "import this", "import that"). Is there a way for me to write it so that it will import everything in that folder that is a plugin, so that I can add additional plugins without having to edit the init file?
Here is the code I use to do this:
import pkgutil

def _loadPackagePlugins(package):
    "Load plugins from a specified package."
    ppath = package.__path__
    pname = package.__name__ + "."
    for importer, modname, ispkg in pkgutil.iter_modules(ppath, pname):
        module = __import__(modname, fromlist="dummy")
The main difference from Jakob's answer is that it uses pkgutil.iter_modules instead of os.listdir. I used to use os.listdir and changed to doing it this way, but I don't remember why. It might have been that os.listdir failed when I packaged my app with py2exe and py2app.
You could always have a dict called plugins, use __import__ to import the modules and store them that way.
e.g.
import os

plugins = {}
for plugin in os.listdir('plugins'):
    plugin = os.path.splitext(plugin)[0]  # strip the .py extension
    plugins[plugin] = __import__(plugin)
This is assuming that every plugin is a single file. Personally I would go with something that looks in each folder for a __run__.py file; like an __init__.py in a package, it would indicate a plugin. That code would look more like this:
for root, dirs, files in os.walk('.'):
    for dir in dirs:
        if "__run__.py" in os.listdir(os.path.join(root, dir)):
            plugins[dir] = __import__(dir)
Code written without testing. YMMV
