PyYAML path at serialization time vs deserialization time - python

I am working on a game engine which includes a simple GUI development tool. The GUI tool allows a user to define various entities and components, which can then be saved in a configuration file. When the game engine runtime loads the configuration file, it can determine how to create the various entities and components for use in the game.
For a configuration file saving mechanism, I am using PyYAML. The issue that I am having stems from the fact that the serialization process occurs in a module which is in a different directory than the module which loads and parses the file through PyYAML.
Simplified Serializer
import yaml

def save_config(context, file_name):
    config_file = open(file_name, 'w')
    # do some various processing on the context dict object
    yaml.dump(context, config_file)
    config_file.close()
This takes the context object, which is a dict that represents various game objects, and writes it to a config file. This works without issue.
Simplified Deserializer in engine
import yaml

def load(file_name):
    config_file = open(file_name, 'r')
    context = yaml.load(config_file)
    return context
This is where the problem occurs. On yaml.load(config_file), I receive an error because it fails to find a certain name in a certain module. I understand why this is happening. For example, when I serialize the config file, it will list an AssetComponent (a component type in the engine) as being at engine.common.AssetComponent. However, from the deserializer's perspective, the AssetComponent should just be at common.AssetComponent (because the deserialization code itself exists within the engine package), so it fails to find it under engine.
Is there a way to manually handle paths when serializing or deserializing with PyYAML? I would like to make sure they both happen from the same "perspective."
Edit:
The following shows what a problematic config file might look like, followed by what the manually corrected config would look like
Problematic
!!python/object/apply:collections.defaultdict
args: [!!python/name:__builtin__.dict '']
dictitems:
  assets:
  - !!python/object:common.Component
    component: !!python/object:engine.common.AssetComponent {file_name: ../content/sticksheet.png,
      surface: null}
    text: ../content/sticksheet.png
    type_name: AssetComponent
Corrected
!!python/object/apply:collections.defaultdict
args: [!!python/name:__builtin__.dict '']
dictitems:
  assets:
  - !!python/object:tools.common.Component
    component: !!python/object:common.AssetComponent {file_name: ../content/sticksheet.png,
      surface: null}
    text: ../content/sticksheet.png
    type_name: AssetComponent

Your problem lies in a mismatch between your package structure and your __main__ routines. The module containing your __main__ will be inside a package, but it has no way of knowing that. Therefore, you will use imports relative to the location of the file containing __main__ and not relative to the top-level structure of your package.
See Relative imports for the billionth time for a longer (and probably better) explanation.
So, how can you fix it?
Inside the file containing __main__ you do:
from tools.common import Component
# instead of from common import Component
c = Component()
print yaml.dump(c)
Another thing you must ensure is that Python knows how to load your modules. If you have installed your package this will be done automatically, but during development this is usually not the case. So during development you will also want to make your development modules findable.
The easiest way (but not very clean) is to use sys.path.append('the directory containing tools and engine'). Another way (cleaner) is to set the PYTHONPATH environment variable to include your top-level directory containing tools and engine.
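A minimal sketch of both options; the /home/me/project path is an assumption, substitute the real parent directory of tools and engine:
import sys

# Option 1: extend sys.path at runtime (quick, but not very clean).
# /home/me/project is assumed to be the directory that contains tools/ and engine/.
sys.path.append('/home/me/project')

# Option 2 (cleaner): set PYTHONPATH before starting the program, e.g.
#   export PYTHONPATH=/home/me/project
# and leave sys.path untouched.

from tools.common import Component
from engine.common import AssetComponent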

You can explicitly declare the Python type of an object in a YAML document:
!!python/object:module_foo.ClassFoo {
attr_foo: "spam",
…,
}
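If you also want to control which tag gets written at dump time, so that dumping and loading agree on the path, PyYAML lets you register a representer and constructor for the class. A hedged sketch; the AssetComponent import path is taken from the question and may differ in your layout:
import yaml
from engine.common import AssetComponent  # adjust to wherever the class really lives

def asset_representer(dumper, obj):
    # dump the object under a stable, hand-chosen tag instead of the
    # module path PyYAML would otherwise record
    return dumper.represent_mapping('!AssetComponent', obj.__dict__)

def asset_constructor(loader, node):
    obj = AssetComponent.__new__(AssetComponent)
    obj.__dict__.update(loader.construct_mapping(node))
    return obj

yaml.add_representer(AssetComponent, asset_representer)
yaml.add_constructor('!AssetComponent', asset_constructor)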

Related

Sphinx Documentation for Python project: Excluding module names from the functions in the index

I am trying to build Sphinx documentation for a set of utility functions that are run out of an IPython notebook as a pip package.
The main thing I want out of this is a searchable, alphabetized index of the function names that someone can click on to look at their docstrings. I am able to generate this with the standard Sphinx HTML setup, but when I go to the index page of my documentation the functions look like this
function1() (in module support_utils.users.utils)
function2() (in module support_utils.documents.utils)
function3() (in module support_utils.documents.utils)
function4() (in module support_utils.reports.utils)
function5() (in module support_utils.formatting.utils)
The module that they are part of is really not relevant here as this is being imported into an IPython notebook, and in some of the themes this extra info practically breaks the CSS and makes it unreadable. I'd like to exclude everything except the function names from the index. I've been scouring the Sphinx documentation site for some sort of option that I can put in the configuration to do this.
Right now I have this awful hack that I put in the conf.py file
def replace_module(app, exception):
    import re
    with open(os.path.join(DOC_DIRECTORY, 'genindex.html'), 'r') as f:
        data = f.read()
    new_content = re.sub('\(in module.*\)', '', data)
    with open(os.path.join(DOC_DIRECTORY, 'genindex.html'), 'w') as f:
        f.write(new_content)

def setup(app):
    app.connect('build-finished', replace_module)
While this technically works, it's a terrible practice, as it's running a script in the conf file and relies on a hardcoded doc directory. If I make this its own extension it no longer has access to the DOC_DIRECTORY variable, so I'd have a hidden hardcoded directory, which is even worse. I know that Sphinx supports custom extension development and there is probably a way to run this regex on individual nodes of the documents before the build finishes, so that I do not need to supply the output directory, but I'm wondering if there is some prebuilt configuration option in Sphinx to not show the module names in the index, so I do not have to go through the trouble of developing this extension. This is an internal document.

Python Packaging: user specified file path

I am writing a python package which is dependent on a number of large data files. These files are not included with the package. Instead, users are required to have these files on disk already (with an arbitrary path). What is the best way to make my package aware of the location of these files?
I have been reading about setup.py and setup.cfg but I am still not sure how to do this. It seems to me that a user-editable option in setup.cfg would be a good way to go, but I don't know if it is, whether it can be done at all, or how I would do it if so...
I did see this almost identical question, Python Packaging: Ask user for variable values when (pip) installing, which focuses on user input during pip (which is discouraged in the comments). If that actually would be a good solution, I'm interested in how to do that too.
In my private development of the package, I have used module constants, as in
DEFAULT_PATH_FILE1 = "my/path/to/file1.csv"
DEFAULT_PATH_FILE2 = "my/path/to/file2.csv"
etc. and properties initialized with these constants. This doesn't seem viable at all for distribution.
What you want is not a one-time setup during install (which is also impossible with modern .whl installs), but a way for clients to configure your library at any point during runtime. Given that you don't provide a CLI, you can either use environment variables to provide that, or look for a user-defined config file.
Here is a simple recipe using appdirs to find out where the config file should live. It runs on import of your package and decides how bad it is if the config file isn't there. Usually, that'd be one of:
write a log message
use default settings
throw some kind of exception
a combination of the above
from logging import getLogger
from pathlib import Path
from configparser import ConfigParser  # loads .ini format files easily, just to have an example to go with

import appdirs  # needs to be pip-installed

log = getLogger(__name__)
config = ConfigParser(interpolation=None)

# load config, substitute "my_package" with the actual name of your package
config_path = Path(appdirs.user_config_dir("my_package")) / "user.ini"
try:
    with open(config_path) as f:
        config.read_file(f, source="user")
except FileNotFoundError:
    # only do whatever makes sense
    log.info(f"User config expected at '{config_path}', but not found.")
    config.read_string("[pathes]\nfile_foo=foo\nfile_bar=bar")  # dubious
    raise ImportError(f"Can't use this module; create a config at '{config_path}'.")

class Foo:
    def __init__(self):
        with open(config["pathes"]["file_foo"]) as f:
            self.data = f.read()
This sounds like runtime configuration. It's none of the business of setup.py, which is concerned with installing your package.
For app configuration, it would be common to specify this resource location by command-line argument, environment variable, or configuration file. You will usually want to either hard-code some sensible default path in case the user does not specify any configuration, or raise an exception if the resources do not exist or cannot be found.
Example for environment var:
import os
DEFAULT_PATH_FILE1 = "/default/path/to/file1.csv"
PATH_FILE1 = os.environ.get("PATH_FILE1", DEFAULT_PATH_FILE1)
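And a comparable sketch for the command-line option mentioned above; the flag name is an assumption:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--file1", default="/default/path/to/file1.csv",
                    help="location of the large data file")
args = parser.parse_args()
PATH_FILE1 = args.file1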

Load module to invoke its decorators

I have a program consisting of several modules specifying the respective web application handlers, and one specifying the respective router.
The library I use can be found here.
Excerpt from webapp.service (there are more such modules):
from webapp.router import ROUTER

@ROUTER.route('/service/[id:int]')
class ServicePermissions(AuthenticatedService):
    """Handles service permissions."""

    NODE = 'services'
    NAME = 'services manager'
    DESCRIPTION = 'Manages services permissions'
    PROMOTE = False
webapp.router:
ROUTER = Router()
When I import the webapp.router module, the webapp.service module obviously does not run. Hence, the @ROUTER.route('/service/[id:int]') decorator is not executed and my web application will fail with the message that the respective route is not available.
What is the best practice in that case to run the code in webapp.service to "run" the decorators? I do not really need to import the module itself or any of its members.
As stated in the comments to the question, you simply have to import the modules. As for linter complaints, those are the lesser of your problems. Linters are there to help; if they get in the way, just don't listen to them.
So, the simple way just to get your things working is, at the end of your __main__.py or __init__.py, depending on your app structure, to import explicitly all the modules that make use of the view decorator.
If you have a linter, check how to silence it on the import lines - that is usually accomplished with a special comment on the import line.
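For example, with flake8-style linters (the module name is taken from the question):
# imported only so its route decorators run
import webapp.service  # noqa: F401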
Python's introspection is fantastic, but it can't find instances of a class, or subclasses, if those are defined in modules that are not imported: such a module is just a text file sitting on the disk, like any data file.
What some frameworks offer as an approach is to have a "discovery" utility that will silently import all "py" files in the project folders. That way your views can "come into existence" without explicit imports.
You could use a function like:
import os

def discover(caller_file):
    caller_folder = os.path.dirname(os.path.abspath(caller_file))
    for current, folders, files in os.walk(caller_folder):
        if os.path.basename(current) == "__pycache__":
            continue
        for file in files:
            if file.endswith(".py") and file != "__init__.py":
                # turn the file path into a dotted module name relative to the caller
                module_name = os.path.relpath(os.path.join(current, file[:-3]), caller_folder)
                __import__(module_name.replace(os.sep, "."))
And call it on your main module with discover(__file__).

Configuration variables for a collection of scripts in Python

I have a collection of scripts written in Python. Each of them can be executed independently. However, most of the time they should be executed one after the other, so there is a MainScript.py which calls them in the appropriate order. Each script has some configurable variables (let's call them Root_Dir, Data_Dir and LinWinFlag). If this collection of scripts is moved to a different computer, or different data needs to be processed, these variable values need to be changed. As there are many scripts this duplication is annoying and error-prone. I would like to group all configuration variables into a single file.
I tried making Config.py which would contain them as per this thread, but import Config produces ImportError: No module named Config because they are not part of a package.
Then I tried relying on variable inheritance: define them once in MainScript.py which calls all the others. This works, but I realized that each script would not be able to run on its own. To solve this, I tried adding useGlobal=True in MainScript.py and in other files:
if (useGlobal is None or useGlobal == False):
    # define all variables
But this fails when scripts are run standalone: NameError: name 'useGlobal' is not defined. The workaround is to define useGlobal and set it to False when running the scripts independently of MainScript.py. Is there a more elegant solution?
The idea is that Python wants to access files - including Config.py - primarily as part of a module.
The nice thing is that Python makes building modules (i.e. Python packages) really easy: initializing one can be done by creating an __init__.py file in each directory you want as a module, a submodule, a sub-submodule, and so on.
So your import should go through if you have created this file.
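A minimal sketch of that layout, reusing the variable names from the question (the directory and package names are assumptions):
# assumed layout:
#
#   project/
#       MainScript.py
#       OtherScript.py
#       sharedconfig/
#           __init__.py   # empty file; makes "sharedconfig" an importable package
#           Config.py     # Root_Dir, Data_Dir, LinWinFlag defined once here
#
# Each script sits next to the package, so this works whether a script is
# run on its own or called from MainScript.py:
from sharedconfig import Config

print(Config.Root_Dir)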
If you have further questions, look at the excellent Python documentation.
The best way to do this is to use a configuration file placed in your home directory (~/.config/yourscript/config.json).
You can then load the file on start and provide default values if the file does not exist:
Example (config.py):
import json
import os

default_config = {
    "name": "volnt",
    "mail": "oh#hi.com"
}

def load_settings():
    settings = default_config.copy()
    try:
        with open(os.path.expanduser("~/.config/yourscript/config.json"), "r") as config_file:
            loaded_config = json.loads(config_file.read())
            for key in loaded_config:
                settings[key] = loaded_config[key]
    except IOError:  # file does not exist
        pass
    return settings
For a configuration file it's a good idea to use JSON and not Python, because it makes it easy to edit for people using your scripts.
As suggested by cleros, the ConfigParser module seems to be the closest thing to what I wanted (a one-line statement in each file which sets up multiple variables).
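A minimal sketch of the ConfigParser approach, reusing the variable names from the question; the file name, section names and option names are assumptions:
# settings.ini, shared by MainScript.py and the standalone scripts:
#
#   [paths]
#   root_dir = /data/project
#   data_dir = /data/project/input
#
#   [platform]
#   lin_win_flag = true

from configparser import ConfigParser  # "import ConfigParser" on Python 2

config = ConfigParser()
config.read('settings.ini')

Root_Dir = config.get('paths', 'root_dir')
Data_Dir = config.get('paths', 'data_dir')
LinWinFlag = config.getboolean('platform', 'lin_win_flag')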

Plugin design question

My program is broken down into two parts: the engine, which deals with user interface and other "main program" stuff, and a set of plugins, which provide methods to deal with specific input.
Each plugin is written in its own module, and provides a function that will allow me to send and retrieve data to and from the plugin.
The name of this function is the same across all plugins, so all I need is to determine which one to call and then the plugin will handle the rest.
I've placed all of the plugins in a sub-folder, written an __init__.py that imports each plugin, and then I import the folder (I think it's called a package?).
Anyway, currently I explicitly tell it what to import (which is basically "import this", "import that"). Is there a way for me to write it so that it will import everything in that folder that is a plug-in, so that I can add additional plugins without having to edit the init file?
Here is the code I use to do this:
import pkgutil

def _loadPackagePlugins(package):
    "Load plugins from a specified package."
    ppath = package.__path__
    pname = package.__name__ + "."
    for importer, modname, ispkg in pkgutil.iter_modules(ppath, pname):
        module = __import__(modname, fromlist="dummy")
The main difference from Jakob's answer is that it uses pkgutil.iter_modules instead of os.listdir. I used to use os.listdir and changed to doing it this way, but I don't remember why. It might have been that os.listdir failed when I packaged my app with py2exe and py2app.
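For example, assuming the plugins live in a sub-package called plugins:
import plugins  # the sub-folder package that holds the plugin modules

_loadPackagePlugins(plugins)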
You could always have a dict called plugins, use __import__ to import the modules and store them that way.
e.g.
import os

plugins = {}
for plugin in os.listdir('plugins'):
    plugin = os.path.splitext(plugin)[0]  # strip the ".py" extension
    plugins[plugin] = __import__(plugin)
This is assuming that every plugin is a single file. Personally I would go with something that looks in each folder for a __run__.py file; like an __init__.py in a package, it would indicate a plugin. That code would look something like this:
for root, dirs, files in os.walk('.'):
    for dir in dirs:
        if "__run__.py" in os.listdir(os.path.join(root, dir)):
            plugins[dir] = __import__(dir)
Code written without testing. YMMV
