While reading Python code, I usually see one of these two conventions:
def something(logger):
    logger.info('doing something')
or:
LOGGER = logging.getLogger(__name__)

def something():
    LOGGER.info('doing something')
Does the former have any advantages, e.g. being thread-safe while the other isn't? Or is it purely a stylistic difference?
Use a global logger if you want a fixed logger:
LOGGER = logging.getLogger('stuff.do')

# logger depends on what we are
def do_stuff(operation: Callable):
    LOGGER.info('will do stuff')
    operation()
    LOGGER.info('just did stuff')

do_stuff(add_things)
do_stuff(query_things)
This is commonly used when logging shared operations for diagnostic purposes. For example, a web server would log creating and destroying threads.
Use a logger parameter if you want to change the logger:
# logger depends on what we do
def do_stuff(operation: Callable, logger: Logger):
    logger.info('will do stuff')
    operation()
    logger.info('just did stuff')

do_stuff(add_things, logging.getLogger('add'))
do_stuff(query_things, logging.getLogger('query'))
This is commonly used when logging configurable operations for accounting purposes. For example, a web server would log different kinds of requests and their results.
Which one to use depends solely on whether the choice of logger depends on global or local data.
If the logger choice can be decided globally, doing so avoids polluting function signatures with logger passing. This improves modularity, as you can add/remove logging calls without changing other code. When using logging to find bugs, you likely want to add logging to dubious code sections and remove it from proven ones.
If the logger choice depends on local state, passing loggers or their names around is often the only option. When using logging to document what is going on, you sometimes want to add new kinds of operation subjects later on.
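For example, passing a logger name works just as well as passing a Logger object, because getLogger returns the same instance for a given name (a minimal sketch of my own, reusing do_stuff from above):

# Passing the logger *name* instead of the Logger object:
# getLogger('add') always returns the same logger instance.
def do_stuff(operation: Callable, logger_name: str):
    logger = logging.getLogger(logger_name)
    logger.info('will do stuff')
    operation()
    logger.info('just did stuff')

do_stuff(add_things, 'add')
do_stuff(query_things, 'query')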
There are no runtime or safety advantages to using either approach, other than avoiding the overhead of passing things around. The logging module is designed to be thread-safe:
Thread Safety
The logging module is intended to be thread-safe without
any special work needing to be done by its clients. It achieves this
through using threading locks; there is one lock to serialize access to
the module’s shared data, and each handler also creates a lock to
serialize access to its underlying I/O.
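To illustrate, here is a minimal sketch of my own (not from the quoted docs): several threads can share one module-level logger without any extra locking in user code.

import logging
import threading

logging.basicConfig(level=logging.INFO, format='%(threadName)s %(message)s')
LOGGER = logging.getLogger(__name__)

def worker(n):
    # All threads share the same logger and the same handler;
    # the handler's internal lock serializes the actual I/O.
    LOGGER.info('worker %d doing something', n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()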
It is entirely equivalent to create a new "instance" of the same logger, or to create an alias for the same logger:
>>> a = logging.getLogger('demo')
>>> b = a
>>> c = logging.getLogger('demo')
>>> a is b is c
True
I am using the built in Python "logging" module for my script. When I turn verbosity to "info" it seems like my "debug" messages are significantly slowing down my script.
Some of my "debug" messages print large dictionaries and I'm guessing Python is expanding the text before realizing "debug" messages are disabled. Example:
import pprint
pp = pprint.PrettyPrinter(indent=4)
logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
How can I improve my performance? I'd prefer to still use Python's built in logging module. But need to figure out a "clean" way to solve this issue.
The logging module already has a built-in way to do what dankal444 suggests, which is slightly neater:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
Another possible approach is to use %-formatting, which only does the formatting when actually needed (the logging event has to be processed by a handler as well as a logger to get to that point). I know f-strings are the new(ish) hotness and are performant, but it all depends on the exact circumstances as to which will offer the best result.
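In its simplest form that looks like this (a minimal sketch; here the deferred work is just the str() conversion of the object, not the pprint call):

# The %s substitution only happens if a handler actually processes
# the DEBUG record, so the string conversion is skipped entirely
# when DEBUG is disabled.
logger.debug('Large Dict Object: %s', obj)

If the expensive part is the pformat call itself rather than the final interpolation, that call has to be deferred too, which is what the helper below does.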
An example of taking advantage of lazy %-formatting:
class DeferredFormatHelper:
    def __init__(self, func, *args, **kwargs):
        self.func = func  # assumed to return a string
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        # This is called because the format string contains
        # a %s for this logging argument, lazily if and when
        # the formatting needs to happen
        return self.func(*self.args, **self.kwargs)

if logger.isEnabledFor(logging.DEBUG):
    arg = DeferredFormatHelper(pp.pformat, obj)
    logger.debug('Large Dict Object: %s', arg)
Check if the current level is good enough:
if logger.getEffectiveLevel() <= logging.DEBUG:
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
This is not super clean, but it's the best I can think of. You just need to wrap the logging call in this `if` wherever there is a performance bottleneck.
I can't verify where your bottleneck is, but if it comes from the pprint call, your logger never has a chance to do anything about it: the formatting happens before the logger is even involved. Rewriting your example to clarify:
from pprint import PrettyPrinter
import logging
logger = logging.getLogger()
large_object = {"very": "large container"}
pp = PrettyPrinter(indent=4)
# This is done first.
formatted_msg = pp.pformat(large_object)
# It's already formatted when it's sent to your logger.
logger.debug(f"Large dict object: {formatted_msg}")
I am lazy and want to avoid this line in every python file which uses logging:
logger = logging.getLogger(__name__)
In January I asked how this could be done, and found an answer: Avoid `logger=logging.getLogger(__name__)`
Unfortunately the answer there has the drawback that you lose the ability to filter.
I really want to avoid this useless and redundant line.
Example:
import logging

def my_method(foo):
    logging.info()
Unfortunately I think it is impossible to do logger = logging.getLogger(__name__) implicitly if logging.info() gets called for the first time in this file.
Is there anybody out there who knows how to do impossible stuff?
Update
I like Don't Repeat Yourself. If most files contain the same line at the top, I think this is a repetition. It looks like WET. The Python interpreter in my head needs to skip this line every time I look there. My subjective feeling: this line is useless bloat. The line should be the implicit default.
Think carefully about whether you really want to do this.
Create a module, e.g. magiclog.py, like this:
import logging
import inspect

def L():
    # FIXME: catch indexing errors
    callerframe = inspect.stack()[1][0]
    name = callerframe.f_globals["__name__"]
    # avoid cyclic ref, see https://docs.python.org/2/library/inspect.html#the-interpreter-stack
    del callerframe
    return logging.getLogger(name)
Then you can do:
from magiclog import L
L().info("it works!")
I am lazy and want to avoid this line in every python file which uses
logging:
logger = logging.getLogger(__name__)
Well, it's the recommended way:
A good convention to use when naming loggers is to use a module-level
logger, in each module which uses logging, named as follows:
logger = logging.getLogger(__name__)
This means that logger names track the package/module hierarchy, and
it’s intuitively obvious where events are logged just from the logger
name.
That's a quote from the official howto.
I like Don't Repeat Yourself. If most files contain the same line at
the top, I think this is a repetition. It looks like WET. The python
interpreter in my head needs to skip this line every time I look
there. My subjective feeling: this line is useless bloat. The line
should be the implicit default.
It follows "Explicit is better than implicit". Anyway, you can easily change the Python file template in many IDEs to always include this line, or create a new template file.
I tried to put getLogger at the module level. However, it has some disadvantages. Here is my example:
import logging
from logging.handlers import TimedRotatingFileHandler

log_monitor = logging.getLogger('monitorlog')
log_monitor.setLevel(logging.DEBUG)
log_monitor.propagate = False

handler = TimedRotatingFileHandler('somedirectory/monitor.log',
                                   when='h',
                                   interval=1,
                                   backupCount=30)
monitor_format = logging.Formatter('%(asctime)s: %(message)s')
handler.setFormatter(monitor_format)
log_monitor.addHandler(handler)
The problem is that when some other module imports this one, the above code is executed. It is possible that, at that time, somedirectory does not even exist, and the build will fail.
Actually, this logger will be used in a class, so I am thinking of putting getLogger into the class. However, if people create multiple objects of that class, that piece of code will be called multiple times. I guess this code is supposed to be called only once.
I am pretty new to Python. Where do people usually put their getLogger calls, and in this case, where should I put this code?
Short answer: you just need to make sure you do your logger setup after the directory is created.
If you want to import the above and only then create the file, one way to do it is to put your code in a function.
def monitor_log_setup():
    log_monitor = logging.getLogger('monitorlog')
    log_monitor.setLevel(logging.DEBUG)
    log_monitor.propagate = False

    handler = TimedRotatingFileHandler('somedirectory/monitor.log',
                                       when='h',
                                       interval=1,
                                       backupCount=30)
    monitor_format = logging.Formatter('%(asctime)s: %(message)s')
    handler.setFormatter(monitor_format)
    log_monitor.addHandler(handler)

    return log_monitor
It is now safe to import this module, you just have to make sure the function is called before you want to start logging (after creating the directory).
You can then use logging.getLogger('monitorlog') to return the same logger as defined in the function whenever you need it throughout your code.
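For example, a minimal sketch of how the pieces might fit together (the os.makedirs call just stands in for however the directory actually gets created):

import os
import logging

# At application start-up: make sure the directory exists,
# then configure the logger once.
os.makedirs('somedirectory', exist_ok=True)
monitor_log_setup()

# Anywhere else in the code base: this returns the already
# configured logger, because loggers are looked up by name.
log_monitor = logging.getLogger('monitorlog')
log_monitor.info('monitoring started')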
I think the problem is that you are mixing up "import" with "init": you expect that, after importing the library or module, the log object is already available. That behaviour leads to confusion.
I think the best practice is to provide an "init" method that the caller invokes to make the object available.
However, you could also initialize implicitly in the module, or just create the log file (and its directory) if it doesn't exist.
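A minimal sketch of that idea (the function and parameter names are illustrative, not from the question):

import logging
import os
from logging.handlers import TimedRotatingFileHandler

def init(log_dir='somedirectory'):
    """Explicit initialization: create the directory, then configure the logger."""
    os.makedirs(log_dir, exist_ok=True)
    log_monitor = logging.getLogger('monitorlog')
    log_monitor.setLevel(logging.DEBUG)
    log_monitor.propagate = False
    handler = TimedRotatingFileHandler(os.path.join(log_dir, 'monitor.log'),
                                       when='h', interval=1, backupCount=30)
    handler.setFormatter(logging.Formatter('%(asctime)s: %(message)s'))
    log_monitor.addHandler(handler)
    return log_monitor

The caller then calls init() once at start-up and uses logging.getLogger('monitorlog') everywhere else.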
Should a logging instance whose runtime configuration will never be altered be created (via getLogger) inside of each function that uses it, or can I create it once and only once outside of the functions?
Example:
import logging

def homepage_view(...):
    log = logging.getLogger(...)
    log.debug('Loaded the homepage')
or
import logging

log = logging.getLogger(...)

def homepage_view(...):
    log.debug('Loaded the homepage')
The second of these is the recommended best practice, using
log = logging.getLogger(__name__)
at the module level.
Update: It's the best practice because it's simpler. Nothing is gained by invoking getLogger in each function that uses it, and loggers are singletons anyway.
I am writing a Python wrapper for a C library using the cffi.
The C library has to be initialized and shut down. Also, the cffi needs some place to save the state returned from ffi.dlopen().
I can see two paths here:
Either I wrap this whole stateful business in a class like this
class wrapper(object):
    def __init__(self):
        self.c = ffi.dlopen("mylibrary")
        self.c.initialize()

    def __del__(self):
        self.c.terminate()
Or I provide two global functions that hide the state in a global variable
def initialize():
    global __library
    __library = ffi.dlopen("mylibrary")
    __library.initialize()

def terminate():
    global __library
    __library.terminate()
    del __library
The first path is somewhat cumbersome in that it requires the user to always create an object that serves no purpose other than managing the library state. On the other hand, it makes sure that terminate() is actually called every time.
The second path seems to result in a somewhat easier API. However, it exposes some hidden global state, which might be a bad thing. Also, if the user forgets to call terminate(), the C library is not unloaded correctly (which is not a big problem on the C side).
Which one of these paths would be more pythonic?
Exposing a wrapper object only makes sense in Python if the library actually supports something like multiple instances in one application. If it doesn't support that, or it's not really relevant, go with kindall's suggestion: just initialize the library when the module is imported and add an atexit handler for cleanup.
Adding wrappers around a stateless API, or even an API without support for keeping different sets of state, is not really Pythonic and would raise expectations that different instances have some kind of isolation.
Example code:
import atexit

# Normal library initialization
__library = ffi.dlopen("mylibrary")
__library.initialize()

# Private library cleanup function
def __terminate():
    __library.terminate()

# register function to be called on clean interpreter termination
atexit.register(__terminate)
For more details about atexit, this question has some more information, as does the Python documentation, of course.