Declare python module in yaml - python

I have a YAML file with some fields whose values are meaningful in Python, but they get parsed as plain strings rather than the Python objects I meant. This is my sample:
verbose:
  level: logging.DEBUG
and obviously when I load it, the value is a string:
config = yaml.load(args.config.read(), Loader=yaml.SafeLoader)
I have no idea how to get the actual logging.DEBUG object rather than its string.
Note that I'm not looking to configure logging and get a logger; logging is just an example of a Python module here.

There's no out-of-the-box way to do that. The simplest and safest approach seems to be processing the values manually, e.g.:
import logging

class KnownModules:
    logging = logging
    ...

def parse_value(s):
    v = KnownModules
    for p in s.split('.'):
        v = getattr(v, p)  # remember to handle AttributeError
    return v
However, if you're ok with slightly changing your YAML structure, PyYAML supports some custom YAML tags. For example:
verbose:
  level: !!python/name:logging.DEBUG
will make config['verbose']['level'] equal to logging.DEBUG (i.e. 10).
Considering that you're (correctly) using SafeLoader, you may need to combine those methods by defining your own tag.
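For example, a minimal sketch of that combination (the !py tag name and the KnownModules whitelist are my own choices for illustration, not anything PyYAML prescribes):
import logging
import yaml

class KnownModules:
    logging = logging  # only modules listed here can be referenced from YAML

def known_name_constructor(loader, node):
    # resolve a dotted name like "logging.DEBUG" against the whitelist
    value = KnownModules
    for part in loader.construct_scalar(node).split('.'):
        value = getattr(value, part)  # AttributeError -> name is not whitelisted
    return value

yaml.SafeLoader.add_constructor('!py', known_name_constructor)

config = yaml.load("verbose:\n  level: !py logging.DEBUG", Loader=yaml.SafeLoader)
assert config['verbose']['level'] == logging.DEBUG  # 10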

The YAML loader has no knowledge of what logging.DEBUG might mean except a string "logging.DEBUG" (unless it's tagged with a YAML tag).
For string values that need to be interpreted as e.g. references to module attributes, you will need to parse them after-the-fact, e.g.
def parse_logging_level(level_string: str):
    module, _, value = level_string.partition(".")
    assert module == "logging"
    return logging._nameToLevel[value]

# ...
yaml_data["verbose"]["level"] = parse_logging_level(yaml_data["verbose"]["level"])

Edit: Please see AKX's answer. I was not aware of logging._nameToLevel, which does not require defining your own enum and is definitely better than using eval. But I decided not to delete this answer, as I think the currently preferred design (as of Python 3.4), which uses enums, is worth mentioning (it would probably be used in the logging module if it had been available back then).
If you are absolutely sure that the values provided in the config are legitimate ones, you can use eval like this:
import logging
levelStr = 'logging.DEBUG'
level = eval(levelStr)
But as said in the comments, if you are not sure about the values present in the config file, using eval could be disastrous (see the example provided by AKX in the comments).
A better design is to define an enum for this purpose. Unfortunately the logging module does not provide the levels as enum (they are just constants defined in the module), thus you should define your own.
from enum import Enum

class LogLevel(Enum):
    CRITICAL = 50
    FATAL = 50
    ERROR = 40
    WARNING = 30
    WARN = 30
    INFO = 20
    DEBUG = 10
    NOTSET = 0
and then you can use it like this:
levelStr = 'DEBUG'
levelInt = LogLevel[levelStr].value # Comparable with logging.DEBUG which is also an integer
But to use this you have to change your yml file a bit and replace logging.DEBUG with DEBUG.
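For illustration, a sketch of how the enum would plug into the YAML from the question once the value is just DEBUG (args and LogLevel as defined above):
import logging
import yaml

config = yaml.load(args.config.read(), Loader=yaml.SafeLoader)  # args as in the question
level = LogLevel[config['verbose']['level']].value  # e.g. 10 for 'DEBUG', comparable with logging.DEBUG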

Related

Python disabled logging slowing script

I am using the built in Python "logging" module for my script. When I turn verbosity to "info" it seems like my "debug" messages are significantly slowing down my script.
Some of my "debug" messages print large dictionaries and I'm guessing Python is expanding the text before realizing "debug" messages are disabled. Example:
import pprint
pp = pprint.PrettyPrinter(indent=4)
logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
How can I improve my performance? I'd prefer to still use Python's built in logging module. But need to figure out a "clean" way to solve this issue.
There is already a feature in logging for the check mentioned by dankal444, which is slightly neater:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
Another possible approach is to use %-formatting, which only does the formatting when actually needed (the logging event has to be processed by a handler as well as a logger to get to that point). I know f-strings are the new(ish) hotness and are performant, but it all depends on the exact circumstances as to which will offer the best result.
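As a small illustrative sketch of that difference (logger, pp and obj as in the question): passing the object as a %-style argument defers its str() conversion, but it does not defer the pp.pformat() call itself, which is what the helper below addresses.
# str(obj) only happens if a handler actually emits the DEBUG record
logger.debug("Large Dict Object: %s", obj)

# pp.pformat(obj) still runs eagerly here, even if DEBUG is disabled
logger.debug("Large Dict Object: %s", pp.pformat(obj))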
An example of taking advantage of lazy %-formatting:
class DeferredFormatHelper:
    def __init__(self, func, *args, **kwargs):
        self.func = func  # assumed to return a string
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        # This is called because the format string contains
        # a %s for this logging argument, lazily if and when
        # the formatting needs to happen
        return self.func(*self.args, **self.kwargs)

if logger.isEnabledFor(logging.DEBUG):
    arg = DeferredFormatHelper(pp.pformat, obj)
    logger.debug('Large Dict Object: %s', arg)
Check if the current level is good enough:
if logger.getEffectiveLevel() <= logging.DEBUG:
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
This is not super clean, but it's the best I can think of. You only need to wrap the calls that are performance bottlenecks.
I can't verify where your bottleneck is, but if it's because of the pprint library, your logger will never have a chance to do anything about it. Rewriting to clarify.
from pprint import PrettyPrinter
import logging
logger = logging.getLogger()
large_object = {"very": "large container"}
pp = PrettyPrinter(indent=4)
# This is done first.
formatted_msg = pp.pformat(large_object)
# It's already formatted when it's sent to your logger.
logger.debug(f"Large dict object: {formatted_msg}")

Is it bad practice to modify attributes of one module from another module?

I want to define a bunch of config variables that can be imported in all the modules in my project. The values of those variables will be constant during runtime but are not known before runtime; they depend on the input. Usually I'd define a dict in my top module which would be passed to all functions and classes from other modules; however, I was thinking it may be cleaner to simply create a blank config.py module which would be dynamically filled with config variables by the top module:
# top.py
import config
config.x = x
# config.py
x = None
# other.py
import config
print(config.x)
I like this approach because I don't have to save the parameters as attributes of classes in my other modules; which makes sense to me because parameters do not describe classes themselves.
This works but is it considered bad practice?
The question as such may be disputed, but I would generally say yes, it's "bad practice", because the scope and impact of a change get really blurred. Note that the use case you're describing is not really about sharing configuration, but about different parts of the program (functions, objects, modules) exchanging data, and as such it's a bit of a variation on (meta) global variables.
Reading common configuration values could be fine, but changing them along the way... you may lose track of what happened where, and in which order, as modules get imported / values get modified. For instance, assume the config.py above and two modules, m1.py:
import config
print(config.x)
config.x = 1
and m2.py:
import config
print(config.x)
config.x = 2
and a main.py that just does:
import m1
import m2
import config
print(config.x)
or:
import m2
import m1
import config
print(config.x)
The state in which you find config in each module, and really in any other (incl. main.py here), depends on the order in which the imports have occurred and on who assigned what value when. Even for a program entirely under your control, this may get confusing (and a source of mistakes) rather quickly.
For runtime data and passing information between objects and modules (and your example really is that, not configuration that is predefined and shared between modules), I would suggest you look into describing the information in a custom state (config) object and passing it around through an appropriate interface. But really, just a function / method argument may be all that is needed. The exact form depends on what exactly you're trying to achieve and what your overall design is.
In your example, other.py behaves differently when called or imported before top.py, which may still seem obvious and manageable in a minimal example, but really is not a very sound design. Anyone reading the code (incl. future you) should be able to follow its logic, and this IMO breaks its flow.
The most trivial (and procedural) example of what you've described (and of which I now hopefully have a better grasp) would be other.py recreating your current behaviour:
def do_stuff(value):
    print(value)  # We did something useful here

if __name__ == "__main__":
    do_stuff(None)  # Could also use config with defaults
And your top.py presumably being the entry point and orchestrating importing and execution doing:
import other
x = get_the_value()
other.do_stuff(x)
You can of course introduce an interface to configure do_stuff, perhaps a dict or even a custom class, with a default implementation in config.py:
class Params:
    def __init__(self, x=None):
        self.x = x
and your other.py:
import config

def do_stuff(params=config.Params()):
    print(params.x)  # We did something useful here
And on your top.py you can use:
params = config.Params(get_the_value())
other.do_stuff(params)
But you could also have any use case specific source of value(s):
class TopParams:
    def __init__(self, url):
        self.x = get_value_from_url(url)

params = TopParams("https://example.com/value-source")
other.do_stuff(params)
x could even be a property which you retrieve every time you access it... or lazily when needed and then cached... Again, it really then is a matter of what you need to do.
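For instance, a minimal sketch of that lazy-and-cached variant, using functools.cached_property (Python 3.8+), with get_value_from_url assumed to exist as in the example above:
from functools import cached_property

class TopParams:
    def __init__(self, url):
        self.url = url

    @cached_property
    def x(self):
        # fetched on first access, then cached on the instance
        return get_value_from_url(self.url)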
"Is it bad practice to modify attributes of one module from another module?"
Yes, it is considered bad practice: it is a violation of the Law of Demeter, which in essence means "talk to friends, not to strangers".
Objects should expose behaviour and functions, but should HIDE the data.
DataStructures should EXPOSE data, but should not have any methods (which are exposed). The law of demeter does not apply to such DataStructures. OOP Purists might cover such DataStructures with setters and getters, but it really adds no value in Python.
There is a lot of literature about that, e.g.: https://en.wikipedia.org/wiki/Law_of_Demeter
and of course, a must-read: "Clean Code" by Robert C. Martin (Uncle Bob); check it out on YouTube as well.
For procedural programming it is perfectly normal to keep data in a DataStructure which does not have any (exposed) methods.
The procedures in the program work with that data. Consider using the attrs module, see: https://www.attrs.org/en/stable/, for easy creation of such classes.
My preferred method for keeping config is (here without using attrs):
# conf_xy.py
"""
config is code - so why use damned parsers, textfiles, xml, yaml, toml and all that
if You just can use testable code as config that can deliver the correct types, etc.
as well as hinting in Your favorite IDE ?
Here, for demonstration without using attrs package - usually I use attrs (read the docs)
"""

class ConfXY(object):
    def __init__(self) -> None:
        self.x: int = 1
        self.z: float = get_z_from_input()
        ...

conf_xy = ConfXY()

# other.py
from conf_xy import conf_xy
...
y = conf_xy.x * 2
...
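For completeness, a sketch of the same config class written with the attrs package mentioned above (attrs is assumed to be installed; get_z_from_input as before):
import attr

@attr.s(auto_attribs=True)
class ConfXY:
    x: int = 1
    z: float = attr.Factory(get_z_from_input)  # computed when the instance is created

conf_xy = ConfXY()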

How to declare names of variables in module?

Motivation
I have the following motivational problem: I want to use slightly boosted logging in my project. For that purpose, I am creating a module my_logging with similar usage to logging. Most importantly, my_logging needs to have methods debug, info, etc.
Question
Suppose you have a bunch of methods method_1, method_2, ... that you want to have in a module module (e.g. debug, info, ... in my_logging), and you know that the implementation of these methods will be fairly similar.
What is the cleanest way to implement this?
Possible solutions
Solution 1
Define each method separately, using a parametrized method. You are able to do this since you know all the methods in advance.
def _parametrized_general_method(params):
    ...

def method_1():
    _parametrized_general_method(method_1_params)

def method_2():
    _parametrized_general_method(method_2_params)
...
Discussion
Obviously, this is too lengthy and there is too much repeated code. It is tedious if there are many of these methods.
On the other hand, methods are declared at 'compile time' and it works well with typing and so on.
The tedium can largely be avoided by generating the code.
METHODS = [
    ('method_1', method_1_params),
    ('method_2', method_2_params),
    ...
]

method_template = '''
def {0}():
    _parametrized_general_method({1})
'''

with open('module.py', 'w') as f:
    # Probably some module header here
    for name, params in METHODS:
        print(
            method_template.format(name, params),
            file=f
        )
    # Maybe some footer
But this forces you to care about a Python file which does not directly belong to your project. You need to keep the generator file around in case you want to make some changes to module, and then run it again. This does not belong (in my opinion) to the standard development cycle, and therefore it is not very nice, although it is very effective.
Also, for the sake of completeness, instead of _parametrized_general_method you can have something like method_factory in the following snippet.
def method_factory(method_params):
    def _parametrized_general_method():
        # method_params is available here from the enclosing scope
        ...
    return _parametrized_general_method

method_1 = method_factory(method_1_params)
method_2 = method_factory(method_2_params)
Solution 2
A cleaner way, from my point of view, is to create these methods at runtime.
METHODS = [
    ('method_1', method_1_params),
    ('method_2', method_2_params),
    ...
]

for name, params in METHODS:
    globals()[name] = method_factory(params)
Discussion
I consider this to be a very elegant way, and from the purely Pythonic 'dynamic' point of view it is OK.
The problem arrives with IDEs and their help in the form of reference resolution and typing at 'compile time'.
import module
module.method_1()
If you use module from another module, the methods are of course not found, and a warning appears (at 'compile time'; it is not an actual error). In PyCharm:
Cannot find reference 'method' in 'module.py'
Obviously, you don't want to globally suppress these warnings, as they are usually very helpful. Moreover, such warnings are one of the reasons to use an IDE.
Also, you can suppress the warning for a particular line (as with any warning in PyCharm), but that is way too much pollution in the code.
Or you can ignore the warnings, which is a very, very bad habit in general.
Solution 3 - maybe?
In a Python module, you are able to specify which names the particular module exports/provides via the __all__ attribute.
In an ideal world, something like this works.
METHODS = (
    ('method_1', method_1_params),
    ('method_2', method_2_params),
    ...
)

__all__ = [name for name, params in METHODS]

for name, params in METHODS:
    globals()[name] = method_factory(params)
Discussion
Note that __all__ can be evaluated at 'compile time', as METHODS is not used anywhere before the assignment of __all__.
My idea is that this will properly notify other modules about the names of the not-yet-created methods, so no warning will appear, while the nice dynamic creation of functions is preserved.
The problem is that it does not work as planned, because apparently the IDE cannot recognize such a dynamically built __all__, and reference warnings in importing modules are still present.
More specific question
Is there a way to make this approach with __all__ work? Or is there something in a similar fashion?
Thank you.
For an IDE-level solution, refer to the PyCharm documentation.
Someone extracted a list of available suppressions; I found one there:
import module
# noinspection PyUnresolvedReferences
module.a()
module.b()
This will disable the inspection for one line only.
Alternatively, if all you want is to avoid writing multiple similar functions yourself, you could just make Python do it for you:
from os import path

function_template = '''
def {0}():
    print({1})
'''

with open(path.abspath(__file__), 'a') as fp:
    for name, arg in zip('abcde', range(5)):
        fp.write(function_template.format(name, arg))
This will extend the current script with the generated functions.
Or, if you just want to wrap logging functions with the least typing effort, try a closure.
import logging

def make_function(name: str = None):
    logger = logging.getLogger(name)
    logging.basicConfig(format="%(asctime)s | %(name)s | %(levelname)s - %(message)s")

    def wrapper(log_level):
        level_func = getattr(logger, log_level)

        def alternative_logging(msg, *args):
            nonlocal level_func
            level_func(msg)
            # add some actions here, maybe with args

        return alternative_logging

    return map(wrapper, ('debug', 'info', 'warning', 'error', 'critical'))

debug, info, warning, error, critical = make_function('Nice name')
debug2, info2, warning2, error2, critical2 = make_function('Bad name')

warning('oh no')
warning2('what is it')
error('hold on')
error2('are ya ok')
critical('ded')
critical2('not big surprise')
Output:
2020-09-06 12:06:59,742 | Nice name | WARNING - oh no
2020-09-06 12:06:59,742 | Bad name | WARNING - what is it
2020-09-06 12:06:59,742 | Nice name | ERROR - hold on
2020-09-06 12:06:59,742 | Bad name | ERROR - are ya ok
2020-09-06 12:06:59,742 | Nice name | CRITICAL - ded
2020-09-06 12:06:59,742 | Bad name | CRITICAL - not big surprise

Use Python for Creating JSON

I want to use Python for creating JSON.
Since I found no library which can help me, I want to know: is it possible to inspect the order of the classes in a Python file?
Example
# example.py
class Foo:
    pass

class Bar:
    pass
If I import example, I want to know the order of the classes. In this case it is [Foo, Bar] and not [Bar, Foo].
Is this possible? If "yes", how?
Background
I am not happy with YAML/JSON. I have the vague idea of creating config via Python classes (only classes, no instantiation to objects).
Answers which help me to get to my goal (Create JSON with a tool which is easy and fun to use) are welcome.
The inspect module can tell you the line numbers of the class declarations:
import inspect

def get_classes(module):
    for name, value in inspect.getmembers(module):
        if inspect.isclass(value):
            _, line = inspect.getsourcelines(value)
            yield line, name
So the following code:
import example

for line, name in sorted(get_classes(example)):
    print(line, name)
Prints:
2 Foo
5 Bar
First up, as I see it, there are 2 things you can do...
Continue pursuing to use Python source files as configuration files. (I won't recommend this. It's analogous to using a bulldozer to strike a nail or converting a shotgun to a wheel)
Switch to something like TOML, JSON or YAML for configuration files, which are designed for the job.
Nothing in JSON or YAML prevents them from holding "ordered" key-value pairs. Python's dict data type is unordered by default (at least till 3.5) and list data type is ordered. These map directly to object and array in JSON respectively, when using the default loaders. Just use something like Python's OrderedDict when deserializing them and voila, you preserve order!
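A quick sketch of that OrderedDict round-trip (plain standard-library json, nothing project-specific):
import json
from collections import OrderedDict

text = '{"b": 1, "a": 2}'
data = json.loads(text, object_pairs_hook=OrderedDict)  # keeps the key order from the file
assert list(data) == ["b", "a"]
assert json.dumps(data) == '{"b": 1, "a": 2}'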
With that out of the way, if you really want to use Python source files for the configuration, I suggest trying to process the file using the ast module. Abstract Syntax Trees are a powerful tool for syntax level analysis.
I whipped up a quick script for extracting class line numbers and names from a file.
You (or anyone, really) can use it or extend it to be more extensive and have more checks, if you want.
import sys
import ast
import json


class ClassNodeVisitor(ast.NodeVisitor):
    def __init__(self):
        super(ClassNodeVisitor, self).__init__()
        self.class_defs = []

    def visit(self, node):
        super(ClassNodeVisitor, self).visit(node)
        return self.class_defs

    def visit_ClassDef(self, node):
        self.class_defs.append(node)


def read_file(fpath):
    with open(fpath) as f:
        return f.read()


def get_classes_from_text(text):
    try:
        tree = ast.parse(text)
    except Exception as e:
        raise e
    class_extractor = ClassNodeVisitor()
    li = []
    for definition in class_extractor.visit(tree):
        li.append([definition.lineno, definition.name])
    return li


def main():
    fpath = "/tmp/input_file.py"
    try:
        text = read_file(fpath)
    except Exception as e:
        print("Could not load file due to " + repr(e))
        return 1
    print(json.dumps(get_classes_from_text(text), indent=4))


if __name__ == '__main__':
    sys.exit(main())
Here's a sample run on the following file:
input_file.py:
class Foo:
    pass

class Bar:
    pass
Output:
$ py_to_json.py input_file.py
[
    [
        1,
        "Foo"
    ],
    [
        5,
        "Bar"
    ]
]
If I import example,
If you're going to import the module, the example module has to be on the import path. Importing means executing any Python code in the example module. This is a pretty big security hole - you're loading a user-editable file in the same context as the rest of the application.
I'm assuming that since you care about preserving class-definition order, you also care about preserving the order of definitions within each class.
It is worth pointing out that this is now the default behavior in Python, since Python 3.6.
Also see PEP 520: Preserving Class Attribute Definition Order.
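A small sketch of what that buys you (ordinary Python 3.6+, no extra machinery): the class body's definition order can be read straight from the class namespace.
class Config:
    foo = 1
    bar = 2
    baz = 3

names = [n for n in vars(Config) if not n.startswith('__')]
assert names == ['foo', 'bar', 'baz']  # definition order is preserved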
(Moving my comments to an answer)
That's a great vague idea. You should give Figura a shot! It does exactly that.
(Full disclosure: I'm the author of Figura.)
I should point out the order of declarations is not preserved in Figura, and also not in json.
I'm not sure about order-preservation in YAML, but I did find this on wikipedia:
... according to the specification, mapping keys do not have an order
It might be the case that specific YAML parsers maintain the order, though they aren't required to.
You can use a metaclass to record each class's creation time, and later, sort the classes by it.
This works in Python 2:
class CreationTimeMetaClass(type):
    creation_index = 0

    def __new__(cls, clsname, bases, dct):
        dct['__creation_index__'] = cls.creation_index
        cls.creation_index += 1
        return type.__new__(cls, clsname, bases, dct)

__metaclass__ = CreationTimeMetaClass

class Foo: pass
class Bar: pass

classes = [cls for cls in globals().values() if hasattr(cls, '__creation_index__')]
print(sorted(classes, key=lambda cls: cls.__creation_index__))
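For reference, a sketch of the Python 3 spelling of the same idea (module-level __metaclass__ is ignored in Python 3, so the metaclass has to be given per class):
class Foo(metaclass=CreationTimeMetaClass): pass
class Bar(metaclass=CreationTimeMetaClass): pass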
The standard json module is easy to use and works well for reading and writing JSON config files.
Objects are not ordered within JSON structures but lists/arrays are, so put order dependent information into a list.
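A tiny sketch of that idea, keeping the order-sensitive part in an array (the keys here are made up for illustration):
import json

config = {"pipeline": [{"step": "load"}, {"step": "transform"}, {"step": "save"}]}
print(json.dumps(config, indent=4))  # the array order survives the round trip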
I have used classes as a configuration tool; what I did was derive them from a base class which was customised by the particular class variables. By using the class like this I did not need a factory class. For example:
from .artifact import Application
class TempLogger(Application): partno='03459'; path='c:/apps/templog.exe'; flag=True
class GUIDisplay(Application): partno='03821'; path='c:/apps/displayer.exe'; flag=False
in the installation script
from .install import Installer
import app_configs
installer = Installer(apps=(TempLogger(), GUIDisplay()))
installer.baseline('1.4.3.3475')
print(installer.versions())
print(installer.bill_of_materials())
One should use the right tools for the job, so perhaps python classes are not the right tool if you need ordering.
Another python tool I have used to create JSON files is Mako templating system. This is very powerful. We used it to populate variables like IP addresses etc into static JSON files that were then read by C++ programs.
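As a rough sketch of that Mako approach (the mako package is assumed to be installed; the template text and values are made up for illustration):
from mako.template import Template

template = Template('{"host": "${ip}", "port": ${port}}')
print(template.render(ip="10.0.0.1", port=8080))
# -> {"host": "10.0.0.1", "port": 8080}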
I'm not sure if this is answers your question, but it might be relevant. Take a look at the excellent attrs module. It's great for creating classes to use as data types.
Here's an example from glyph's blog (creator of Twisted Python):
import attr

@attr.s
class Point3D(object):
    x = attr.ib()
    y = attr.ib()
    z = attr.ib()
It saves you writing a lot of boilerplate code - you get things like str representation and comparison for free, and the module has a convenient asdict function which you can pass to the json library:
>>> p = Point3D(1, 2, 3)
>>> str(p)
'Point3D(x=1, y=2, z=3)'
>>> p == Point3D(1, 2, 3)
True
>>> json.dumps(attr.asdict(p))
'{"y": 2, "x": 1, "z": 3}'
The module uses a strange naming convention, but read attr.s as "attrs" and attr.ib as "attrib" and you'll be okay.
Just touching on the point about creating JSON from Python: there is an excellent library called jsonpickle which lets you dump Python objects to JSON. (Using this alone, or together with the other methods mentioned here, you can probably get what you want.)
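A minimal sketch of jsonpickle (assumed installed via pip install jsonpickle; the Point class is just an example):
import jsonpickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

as_json = jsonpickle.encode(Point(1, 2))   # JSON string, including type information
restored = jsonpickle.decode(as_json)      # back to a Point instance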

Avoid `logger=logging.getLogger(__name__)` without losing the way to filter logs

I am lazy and want to avoid this line in every python file which uses logging:
logger = logging.getLogger(__name__)
In january I asked how this could be done, and found an answer: Avoid `logger=logging.getLogger(__name__)`
Unfortunately the answer there has the drawback that you lose the ability to filter.
I really want to avoid this useless and redundant line.
Example:
import logging

def my_method(foo):
    logging.info()
Unfortunately, I think it is impossible to do logger = logging.getLogger(__name__) implicitly when logging.info() gets called for the first time in this file.
Is there anybody out there who knows how to do impossible stuff?
Update
I like Don't Repeat Yourself. If most files contain the same line at the top, I think this is a repetition. It looks like WET. The python interpreter in my head needs to skip this line every time I look there. My subjective feeling: this line is useless bloat. The line should be the implicit default.
Think carefully about whether you really want to do this.
Create a module, e.g. magiclog.py, like this:
import logging
import inspect

def L():
    # FIXME: catch indexing errors
    callerframe = inspect.stack()[1][0]
    name = callerframe.f_globals["__name__"]
    # avoid cyclic ref, see https://docs.python.org/2/library/inspect.html#the-interpreter-stack
    del callerframe
    return logging.getLogger(name)
Then you can do:
from magiclog import L
L().info("it works!")
I am lazy and want to avoid this line in every python file which uses
logging:
logger = logging.getLogger(__name__)
Well, it's the recommended way:
A good convention to use when naming loggers is to use a module-level
logger, in each module which uses logging, named as follows:
logger = logging.getLogger(__name__)
This means that logger names track the package/module hierarchy, and
it’s intuitively obvious where events are logged just from the logger
name.
That's a quote from the official howto.
I like Don't Repeat Yourself. If most files contain the same line at
the top, I think this is a repetition. It looks like WET. The python
interpreter in my head needs to skip this line every time I look
there. My subjective feeling: this line is useless bloat. The line
should be the implicit default.
It follows "Explicit is better than implicit". Anyway you can easily change a python template in many IDEs to always include this line or make a new template file.
