Should a logging instance whose runtime configuration will never be altered be created (via getLogger) inside of each function that uses it, or can I create it once and only once outside of the functions?
Example:
import logging
def homepage_view(...):
    log = logging.getLogger(...)
    log.debug('Loaded the homepage')
or
import logging
log = logging.getLogger(...)
def homepage_view(...):
    log.debug('Loaded the homepage')
The second of these is the recommended best practice, using
log = logging.getLogger(__name__)
at the module level.
Update: It's the best practice because it's simpler. Nothing is gained by invoking getLogger in each function that uses it, and loggers are singletons anyway.
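For illustration, a minimal sketch of the recommended pattern (the view body is made up; the assert just demonstrates the singleton behaviour):
import logging

log = logging.getLogger(__name__)  # e.g. 'myapp.views' if this file is myapp/views.py

def homepage_view():
    log.debug('Loaded the homepage')

# getLogger with the same name always returns the very same object,
# so calling it inside each function would gain nothing:
assert logging.getLogger(__name__) is log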
Main Question
I am using a module that relies on logging instead of raising error messages. How can I catch logged errors from within Python to react to them (without dissecting the log file)?
Minimal Example
Suppose logging_module.py looks like this:
import logging
import random
def foo():
    logger = logging.getLogger("quz")
    if random.choice([True, False]):
        logger.error("Doooom")
If this module used exceptions, I could do something like this:
from logging_module import foo, Doooom
try:
    foo()
except Doooom:
    bar()
Assuming that logging_module is written the way it is and I cannot change it, this is impossible. What can I do instead?
What I considered so far
I went through the logging documentation (though I did not read every word), but the only way to access what is logged seems to be dissecting the actual log, which seems overly tedious to me (but I may misunderstand this).
You can add a filter to the logger that the module uses and inspect every log record it processes. The documentation has this to say about using filters for something like that:
Although filters are used primarily to filter records based on more
sophisticated criteria than levels, they get to see every record which
is processed by the handler or logger they’re attached to: this can be
useful if you want to do things like counting how many records were
processed by a particular logger or handler
The code below assumes that you are using the logging_module that you showed in the question, and it tries to emulate what the try/except does: that is, when an error happens inside a call to foo, the function bar is called.
import logging
from logging_module import foo
def bar():
    print('error was logged')

def filt(r):
    if r.levelno == logging.ERROR:
        bar()
    return True

logger = logging.getLogger('quz')
logger.addFilter(filt)
foo() # bar will be called if this logs an error
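A variation on the same idea, as a sketch reusing the question's logging_module and the bar() from above: instead of calling bar() from inside the filter, the filter object can simply remember that an error was logged, and the caller reacts after foo() returns.
import logging
from logging_module import foo

def bar():
    print('error was logged')

class ErrorFlag:
    """Filter that remembers whether an ERROR record passed through."""
    def __init__(self):
        self.seen_error = False

    def filter(self, record):
        if record.levelno >= logging.ERROR:
            self.seen_error = True
        return True  # never suppress the record itself

flag = ErrorFlag()
logging.getLogger('quz').addFilter(flag)

foo()
if flag.seen_error:
    bar()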
While reading Python code, I usually see one of those two conventions:
def something(logger):
    logger.info('doing something')
or:
LOGGER = logging.getLogger(__name__)

def something():
    LOGGER.info('doing something')
Does the former have any advantages (e.g. being thread-safe while the other isn't), or is it purely a stylistic difference?
Use a global logger if you want a fixed logger:
LOGGER = logging.getLogger('stuff.do')

# logger depends on what we are
def do_stuff(operation: Callable):
    LOGGER.info('will do stuff')
    operation()
    LOGGER.info('just did stuff')

do_stuff(add_things)
do_stuff(query_things)
This is commonly used when logging shared operations for diagnostic purposes. For example, a web server would log creating and destroying threads.
Use a logger parameter if you want to change the logger:
# logger depends on what we do
def do_stuff(operation: Callable, logger: Logger):
    logger.info('will do stuff')
    operation()
    logger.info('just did stuff')

do_stuff(add_things, logging.getLogger('add'))
do_stuff(query_things, logging.getLogger('query'))
This is commonly used when logging configurable operations for accounting purposes. For example, a web server would log different kinds of requests and their results.
Which one to use depends solely on whether the choice of logger depends on global or local data.
If the logger choice can be decided globally, doing so avoids polluting function signatures with logger passing. This improves modularity, as you can add/remove logging calls without changing other code. When using logging to find bugs, you likely want to add logging to dubious code sections and remove it from proven ones.
If the logger choice depends on local state, passing loggers or their names around is often the only option. When using logging to document what is going on, you sometimes want to add new kinds of operation subjects later on.
There are no runtime or safety advantages to either approach, other than avoiding the overhead of passing things around. The logging module is designed to be thread-safe:
Thread Safety
The logging module is intended to be thread-safe without
any special work needing to be done by its clients. It achieves this
through using threading locks; there is one lock to serialize access to
the module’s shared data, and each handler also creates a lock to
serialize access to its underlying I/O.
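For illustration, a small sketch (logger name and messages are made up) of several threads sharing one module-level logger without any extra locking in the calling code:
import logging
import threading

logging.basicConfig(level=logging.INFO, format='%(threadName)s %(message)s')
LOGGER = logging.getLogger('demo.threads')

def worker(n):
    LOGGER.info('worker %d starting', n)
    LOGGER.info('worker %d done', n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()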
It is entirely equivalent to create a new "instance" of the same logger, or to create an alias for the same logger:
>>> a = logging.getLogger('demo')
>>> b = a
>>> c = logging.getLogger('demo')
>>> a is b is c
True
Is there any way in Python to initialize a variable only once and then just import it into the remaining modules?
I have the following Python project structure:
api
    v1
        init.py
    v2
        init.py
    init.py
    logging.py
logging.py:
from raven import Client
sentry = None
def init_sentry():
    global sentry
    sentry = 'some_dsn'
api/init.py
from app import logging
logging.init_sentry()
#run flask server (v1,v2)
api/{v1,v2}/init.py
from logging import sentry
try:
    1 / 0
except ZeroDivisionError:
    sentry.captureException()
In the files api/v1/init.py and api/v2/init.py I get a NoneType error on the sentry variable. I know I can call init_sentry in every file where I use it, but I'm looking for a better way.
Thanks
First, I think you misspelled init.py; it should be __init__.py.
It is bad programming style to pass data between modules through a bare variable. You should use a class or a function to handle shared data. That way you have an API, and it is clear how the variable can be modified by other modules.
But to answer your question: I would (though I really don't recommend it) create a module data.py with a shared = {} dictionary. Other modules can then share the data simply by importing it. By checking for a variable, or just a flag such as moduleA_initialized, you can decide whether the module still needs to be initialized.
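A minimal sketch of that data.py idea (file layout and flag name are made up; the DSN string comes from the question):
# data.py
shared = {}

# logging.py (the question's module, not the stdlib one)
from data import shared

def init_sentry():
    # initialize only once, no matter how many modules call this
    if not shared.get('sentry_initialized'):
        shared['sentry'] = 'some_dsn'  # in practice presumably a raven Client instance
        shared['sentry_initialized'] = True

# api/v1/init.py
from data import shared
sentry = shared['sentry']  # available once init_sentry() has run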
As an alternative, you can write directly to the globals() dictionary. Note: this is even worse programming practice, and you should check the names carefully so that there are no conflicts with any library you may use. gettext writes to it, but that is a pretty special case.
Here is one way to encapsulate the sentry variable and make sure that it is always calling into something, instead of accessing None:
logging.py:
class Sentry(object):
    _dsn = None

    @classmethod
    def _set_dsn(cls, dsn):
        cls._dsn = dsn

    def __getattr__(self, item):
        # delegate attribute access lazily to whatever was set via _set_dsn
        return getattr(self._dsn, item)

sentry = Sentry()

def init_sentry():
    Sentry._set_dsn('some_dsn')
Note:
This answer was also correct about the fact that you likely want __init__.py not init.py.
I'm writing some code for an ESP8266 microcontroller using MicroPython, which has some different classes as well as some additional methods in the standard built-in classes. To allow me to debug on my desktop I've built some helper classes so that the code will run. However, I've run into a snag with MicroPython's time module, which has a time.sleep_ms method, since the standard time.sleep method on MicroPython does not accept floats. I tried using the following code to extend the built-in time class, but it fails to import properly. Any thoughts?
class time(time):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def sleep_ms(self, ms):
        super().sleep(ms/1000)
This code exists in a file time.py. Secondly, I know I'll have issues with having to import time.time that I would like to fix. I also realize I could call this something else and put traps for it in my microcontroller code; however, I would like to avoid any special functions in what's loaded onto the controller, to save space and cycles.
You're not trying to override a class, you're trying to monkey-patch a module.
First off, if your module is named time.py, it will never be loaded in preference to the built-in time module. Truly built-in modules (as in compiled into the interpreter core, not just C extension modules that ship with CPython) are special: they are always loaded without checking sys.path, so you can't even attempt to shadow the time module, even if you wanted to (you generally don't, and doing so is incredibly ugly). In this case, the built-in time module shadows you; you can't import your module under the plain name time at all, because the built-in will be found without even looking at sys.path.
Secondly, assuming you use a different name and import it for the sole purpose of monkey-patching time (or do something terrible like adding the monkey-patch to a custom sitecustomize module), it's not trivial to make the function truly native to the monkey-patched module (defining it in any normal way gives it the scope of the module where it was defined, not the same scope as other functions from the time module). If you don't need it to be "truly" defined as part of time, the simplest approach is just:
import time

def sleep_ms(ms):
    return time.sleep(ms / 1000)

time.sleep_ms = sleep_ms
Of course, as mentioned, sleep_ms is still part of your module, and carries your module's scope around with it (that's why you do time.sleep, not just sleep; you could do from time import sleep to avoid qualifying it, but it's still a local alias that might not match time.sleep if someone else monkey-patches time.sleep later).
If you want to make it behave like it's part of the time module, so you can reference arbitrary things in time's namespace without qualification and always see the current function in time, you need to use eval to compile your code in time's scope:
import time

# Compile a string of the function's source to a code object that's not
# attached to any scope at all.
# The filename argument is garbage; it's just for exception traceback
# reporting and the like.
code = compile('def sleep_ms(ms): sleep(ms / 1000)', 'time.py', 'exec')

# eval the compiled code with a scope of the globals of the time module,
# which both binds it to the time module's scope and inserts the newly
# defined function directly into the time module's globals without
# defining it in your own module at all
eval(code, vars(time))

del code, time  # May as well leave your monkey-patch module completely empty
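A quick usage sketch, assuming the snippet above lives in a module with a made-up name such as patch_time.py:
import patch_time  # hypothetical module containing the monkey-patch above
import time

time.sleep_ms(250)  # the function now lives on the real time module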
I tried putting getLogger at the module level. However, it has some disadvantages. Here is my example:
import logging
from logging.handlers import TimedRotatingFileHandler

log_monitor = logging.getLogger('monitorlog')
log_monitor.setLevel(logging.DEBUG)
log_monitor.propagate = False

handler = TimedRotatingFileHandler('somedirectory/monitor.log',
                                   when='h',
                                   interval=1,
                                   backupCount=30)
monitor_format = logging.Formatter('%(asctime)s: %(message)s')
handler.setFormatter(monitor_format)
log_monitor.addHandler(handler)
The problem is that when some other module imports this one, the above code will be executed. It is possible that, at that time, somedirectory does not even exist, and creating the handler will fail.
Actually, this logger will be used in a class, so I am thinking of putting getLogger into the class. However, I feel that if people create multiple objects of that class, that piece of code will be called multiple times. I guess this part of the code is supposed to be called only once.
I am pretty new to Python. Where do people usually put their getLogger calls, and in this case, where should I put this piece of code?
Short answer: you just need to make sure you do your logger setup after the directory is created.
If you want to import the above and only then create the file, one way to do it is to put your code in a function.
import logging
from logging.handlers import TimedRotatingFileHandler

def monitor_log_setup():
    log_monitor = logging.getLogger('monitorlog')
    log_monitor.setLevel(logging.DEBUG)
    log_monitor.propagate = False

    handler = TimedRotatingFileHandler('somedirectory/monitor.log',
                                       when='h',
                                       interval=1,
                                       backupCount=30)
    monitor_format = logging.Formatter('%(asctime)s: %(message)s')
    handler.setFormatter(monitor_format)
    log_monitor.addHandler(handler)

    return log_monitor
It is now safe to import this module, you just have to make sure the function is called before you want to start logging (after creating the directory).
You can then use logging.getLogger('monitorlog') to return the same logger as defined in the function whenever you need it throughout your code.
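For example, a sketch reusing monitor_log_setup() and the imports from the snippet above, with the directory name from the question:
import os

os.makedirs('somedirectory', exist_ok=True)  # create the log directory first
log_monitor = monitor_log_setup()
log_monitor.debug('logging is now configured')

# anywhere else in the code base this returns the very same logger object:
other_ref = logging.getLogger('monitorlog')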
I think the problem is that you are mixing up "import" with "init": you expect that after the caller imports the library or module, the log object is already available, and that behaviour leads to confusion.
I think the best practice is to provide an "init" method; the caller calls it to make the object available.
However, you could also provide an implicit init in the file, or just create the log file (and its directory) if it doesn't exist.
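A minimal sketch of that implicit approach (directory and logger names taken from the question): create the directory right before the handler opens the file, so merely importing the module can no longer fail.
import logging
import os
from logging.handlers import TimedRotatingFileHandler

os.makedirs('somedirectory', exist_ok=True)  # ensure the directory exists before the handler opens the file

log_monitor = logging.getLogger('monitorlog')
log_monitor.setLevel(logging.DEBUG)
log_monitor.propagate = False

handler = TimedRotatingFileHandler('somedirectory/monitor.log',
                                   when='h', interval=1, backupCount=30)
handler.setFormatter(logging.Formatter('%(asctime)s: %(message)s'))
log_monitor.addHandler(handler)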