Formatting the message in Python logging

I would like to understand whether, and how, it is possible to modify the message part of a log record using the Python logging module.
So basically, you can format a complete log as:
format = '{"timestamp": "%(asctime)s", "logger_level": "%(levelname)s", "log_message": %(message)s}'
However, I would like to make sure the message part is always valid JSON. Is there any way to modify the format of only the message part, maybe with a custom logging.Formatter?
Thank you.

The format specification %(message)s tells Python you want the message formatted as a string. Try %r formatting instead and it should do the job:
>>> logging.error('{"log_message": %r}', {"a": 55})
ERROR:root:{"log_message": {'a': 55}}
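Note that %r renders the argument with repr(), so the output above uses single quotes and is not strictly valid JSON. If the message part must be real JSON, one option (a minimal sketch, not the only way) is a custom logging.Formatter that builds the whole line with json.dumps:
import json
import logging

class JsonFormatter(logging.Formatter):
    """Minimal sketch: render each record as one JSON object."""
    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "logger_level": record.levelname,
            "log_message": record.getMessage(),  # msg merged with its args
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
logging.error("something %s happened", "odd")
# -> {"timestamp": "...", "logger_level": "ERROR", "log_message": "something odd happened"}
An advantage over %r is that json.dumps also escapes any quotes embedded in the message itself.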

There's an example in the Logging Cookbook which shows one way of doing this. Basically:
import json
import logging

class StructuredMessage:
    def __init__(self, message, /, **kwargs):
        self.message = message
        self.kwargs = kwargs

    def __str__(self):
        return '%s >>> %s' % (self.message, json.dumps(self.kwargs))

_ = StructuredMessage  # optional, to improve readability

logging.basicConfig(level=logging.INFO, format='%(message)s')
logging.info(_('message 1', foo='bar', bar='baz', num=123, fnum=123.456))
Of course, you can adapt this basic idea to do something closer to what you want/need.
Update: The formatting only happens if the message is actually output. Also, it won't apply to logging from third-party libraries. You would need to subclass Logger before importing any other modules which import logging to achieve that, but it's a documented approach.
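For reference, a minimal sketch of that documented hook (logging.setLoggerClass is real; the JsonArgsLogger class and its JSON-encoding behavior are illustrative assumptions):
import json
import logging

class JsonArgsLogger(logging.Logger):
    """Illustrative: render dict arguments as JSON instead of their repr."""
    def makeRecord(self, *args, **kwargs):
        record = super().makeRecord(*args, **kwargs)
        if isinstance(record.args, dict):
            # a single dict argument is stored as the mapping itself
            record.args = (json.dumps(record.args),)
        elif isinstance(record.args, tuple):
            record.args = tuple(
                json.dumps(a) if isinstance(a, dict) else a
                for a in record.args
            )
        return record

logging.setLoggerClass(JsonArgsLogger)  # must run before other modules call getLogger()
logging.getLogger("demo").error('{"log_message": %s}', {"a": 55})
# -> {"log_message": {"a": 55}}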

Related

Python disabled logging slowing script

I am using the built in Python "logging" module for my script. When I turn verbosity to "info" it seems like my "debug" messages are significantly slowing down my script.
Some of my "debug" messages print large dictionaries and I'm guessing Python is expanding the text before realizing "debug" messages are disabled. Example:
import pprint
pp = pprint.PrettyPrinter(indent=4)
logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
How can I improve my performance? I'd prefer to still use Python's built in logging module. But need to figure out a "clean" way to solve this issue.
There is already a feature of logging for what dankal444 suggests, which is slightly neater:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
Another possible approach is to use %-formatting, which only does the formatting when actually needed (the logging event has to be processed by a handler as well as a logger to get to that point). I know f-strings are the new(ish) hotness and are performant, but it all depends on the exact circumstances as to which will offer the best result.
An example of taking advantage of lazy %-formatting:
class DeferredFormatHelper:
    def __init__(self, func, *args, **kwargs):
        self.func = func  # assumed to return a string
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        # This is called because the format string contains
        # a %s for this logging argument, lazily if and when
        # the formatting needs to happen
        return self.func(*self.args, **self.kwargs)

if logger.isEnabledFor(logging.DEBUG):
    arg = DeferredFormatHelper(pp.pformat, obj)
    logger.debug('Large Dict Object: %s', arg)
Check if the current level is good enough:
if logger.getEffectiveLevel() <= logging.DEBUG:
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
This is not super clean, but it's the best I can think of. You just need to wrap the expensive calls in this check wherever performance bottlenecks appear.
I can't verify where your bottleneck is, but if it's because of the pprint library, your logger will never have a chance to do anything about it. Rewriting your example to clarify:
from pprint import PrettyPrinter
import logging
logger = logging.getLogger()
large_object = {"very": "large container"}
pp = PrettyPrinter(indent=4)
# This is done first.
formatted_msg = pp.pformat(large_object)
# It's already formatted when it's sent to your logger.
logger.debug(f"Large dict object: {formatted_msg}")

Declare python module in yaml

I have a YAML file with some fields whose values are meaningful in Python, but they get parsed as strings, not as the Python objects I meant. This is my sample:
verbose:
  level: logging.DEBUG
and obviously when I load it, the value is a string:
config = yaml.load(args.config.read(), Loader=yaml.SafeLoader)
I have no idea how to get the actual logging.DEBUG object rather than its string.
Note that I'm not looking to configure logging to get a logger; logging is just an example of a Python module here.
There's no out-of-the-box way to do that. The simplest and safest approach seems to be processing the values manually, e.g.:
import logging

class KnownModules:
    logging = logging
    ...

def parse_value(s):
    v = KnownModules
    for p in s.split('.'):
        v = getattr(v, p)  # remember to handle AttributeError
    return v
However, if you're ok with slightly changing your YAML structure, PyYAML supports some custom YAML tags. For example:
verbose:
  level: !!python/name:logging.DEBUG
will make config['verbose']['level'] equal to logging.DEBUG (i.e. 10).
Considering that you're (correctly) using SafeLoader, you may need to combine those methods by defining your own tag.
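For example, a minimal sketch of such a custom tag (the !level tag name and the allowlist are my own choices, not anything PyYAML prescribes):
import logging
import yaml

ALLOWED = {"logging": logging}  # modules we trust the config to reference

def level_constructor(loader, node):
    # Resolve e.g. "!level logging.DEBUG" to the module attribute
    module_name, _, attr = loader.construct_scalar(node).partition(".")
    return getattr(ALLOWED[module_name], attr)

yaml.SafeLoader.add_constructor("!level", level_constructor)

config = yaml.load("verbose:\n  level: !level logging.DEBUG",
                   Loader=yaml.SafeLoader)
assert config["verbose"]["level"] == logging.DEBUG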
To the YAML loader, logging.DEBUG is nothing but the string "logging.DEBUG" (unless it's tagged with a YAML tag).
For string values that need to be interpreted as e.g. references to module attributes, you will need to parse them after-the-fact, e.g.
def parse_logging_level(level_string: str):
    module, _, value = level_string.partition(".")
    assert module == "logging"
    return logging._nameToLevel[value]

# ...
yaml_data["verbose"]["level"] = parse_logging_level(yaml_data["verbose"]["level"])
Edit: Please see AKX's answer. I was not aware of logging._nameToLevel, which does not require defining your own enum and is definitely better than using eval. But I decided not to delete this answer, as I think the currently preferred design (as of Python 3.4), which uses enums, is worth mentioning (it would probably be used in the logging module had it been available back then).
If you are absolutely sure that the values provided in the config are legitimate ones, you can use eval like this:
import logging
levelStr = 'logging.DEBUG'
level = eval(levelStr)
But as said in the comments, if you are not sure about the values present in the config file, using eval could be disastrous (see the example provided by AKX in the comments).
A better design is to define an enum for this purpose. Unfortunately the logging module does not provide the levels as enum (they are just constants defined in the module), thus you should define your own.
from enum import Enum

class LogLevel(Enum):
    CRITICAL = 50
    FATAL = 50
    ERROR = 40
    WARNING = 30
    WARN = 30
    INFO = 20
    DEBUG = 10
    NOTSET = 0
and then you can use it like this:
levelStr = 'DEBUG'
levelInt = LogLevel[levelStr].value # Comparable with logging.DEBUG which is also an integer
But to use this you have to change your yml file a bit and replace logging.DEBUG with DEBUG.
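Putting the pieces together, a short sketch that assumes the YAML now holds the bare name DEBUG:
import yaml

config = yaml.load("verbose:\n  level: DEBUG", Loader=yaml.SafeLoader)
level = LogLevel[config["verbose"]["level"]].value  # 10, same as logging.DEBUG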

How to catch errors that were only logged?

Main Question
I am using a module that relies on logging instead of raising error messages. How can I catch logged errors from within Python to react to them (without dissecting the log file)?
Minimal Example
Suppose logging_module.py looks like this:
import logging
import random

def foo():
    logger = logging.getLogger("quz")
    if random.choice([True, False]):
        logger.error("Doooom")
If this module used exceptions, I could do something like this:
from logging_module import foo, Doooom

try:
    foo()
except Doooom:
    bar()
Assuming that logging_module is written the way it is and I cannot change it, this is impossible. What can I do instead?
What I considered so far
I went through the logging documentation (though I did not read every word), but the only way to access what is logged seems to be dissecting the actual log, which seems overly tedious to me (but I may misunderstand this).
You can add a filter to the logger that the module uses, and inspect every log record. The documentation has this to say on using filters for something like that:
Although filters are used primarily to filter records based on more
sophisticated criteria than levels, they get to see every record which
is processed by the handler or logger they’re attached to: this can be
useful if you want to do things like counting how many records were
processed by a particular logger or handler
The code below assumes that you are using the logging_module that you showed in the question and tries to emulate what the try-except does: that is, when an error happens inside a call of foo the function bar is called.
import logging
from logging_module import foo

def bar():
    print('error was logged')

def filt(r):
    if r.levelno == logging.ERROR:
        bar()
    return True

logger = logging.getLogger('quz')
logger.addFilter(filt)
foo()  # bar will be called if this logs an error
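If you want an actual exception to catch, a variation on the same idea is a custom Handler whose emit() raises; an exception raised there propagates out of the logging call. This is a sketch, and the DoomError name is made up for illustration:
import logging
from logging_module import foo

class DoomError(Exception):
    """Made-up exception type standing in for the missing Doooom."""

class RaisingHandler(logging.Handler):
    def emit(self, record):
        if record.levelno >= logging.ERROR:
            raise DoomError(record.getMessage())

logging.getLogger("quz").addHandler(RaisingHandler())

try:
    foo()
except DoomError:
    print("error was logged")  # stands in for bar()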

Using `str.format` to template log messages

I'm trying to use str.format style templating in my logging. Can't seem to get it working properly.
>>> import logging
>>> logging.basicConfig(filename='/tmp/example', format='{asctime} - {levelname} - {message}', style='{', level=logging.INFO)
>>> logger = logging.getLogger(__name__)
>>> logger.warning('blah')
>>> logger.warning('{foo:03d}', {'foo': 42})
Actual output:
2017-02-23 16:11:45,695 - WARNING - blah
2017-02-23 16:12:11,432 - WARNING - {foo:03d}
Expected output:
2017-02-23 16:11:45,695 - WARNING - blah
2017-02-23 16:12:11,432 - WARNING - 042
What am I missing in this setup?
I'm not interested to see workarounds that format the string before it's logged, or Python 2 solutions which use old %-style templating.
Apparently the style argument only applies to the surrounding format string (timestamp, severity, etc.), not to the message itself.
From the docstring of logger.warning:
warning(msg, *args, **kwargs) method of logging.Logger instance
Log 'msg % args' with severity 'WARNING'.
It seems that the msg is always formatted using old-style formatting, so the style argument of the logger is not even considered.
The logging HOWTO contains a bit more information:
... you cannot directly make logging calls using str.format() or
string.Template syntax, because internally the logging package uses
%-formatting to merge the format string and the variable arguments.
There would be no changing this while preserving backward compatibility,
since all logging calls which are out there in existing code will be
using %-format strings.
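That said, the Logging Cookbook documents a lazy workaround: a small wrapper whose __str__ applies str.format only when the record is actually rendered, conventionally aliased to _. A sketch of that pattern against the setup in the question:
import logging

logging.basicConfig(format='{asctime} - {levelname} - {message}',
                    style='{', level=logging.INFO)
logger = logging.getLogger(__name__)

class BraceMessage:
    """Defers str.format until the record is rendered."""
    def __init__(self, fmt, /, *args, **kwargs):
        self.fmt = fmt
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        return self.fmt.format(*self.args, **self.kwargs)

_ = BraceMessage  # cookbook convention, purely cosmetic

logger.warning(_('{foo:03d}', foo=42))
# ... - WARNING - 042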
"I'm not interested to see workarounds that format the string before it's logged" Why not? The old style formats it before it's logged also... You can put the line to do the formatting inside the logger.warning call and it changes nothing functionally. If {foo:42} is just one member of a much larger dictionary you can do something like this:
for key,val in warningsDictionary.iteritems():
logger.warning('{'+key+':'+format(val,'03')+'}')
Whether or not this is sufficient is dependent on what you actually are trying to do.

PyLint message: logging-format-interpolation

For the following code:
logger.debug('message: {}'.format('test'))
pylint produces the following warning:
logging-format-interpolation (W1202):
Use % formatting in logging functions and pass the % parameters as
arguments. Used when a logging statement has a call form of
"logging.<logging method>(format_string.format(format_args...))". Such
calls should use % formatting instead, but leave interpolation to the
logging function by passing the parameters as arguments.
I know I can turn off this warning, but I'd like to understand it. I assumed using format() is the preferred way to print out statements in Python 3. Why is this not true for logger statements?
It is not true for logging statements because logging relies on the old "%"-style format strings to provide lazy interpolation, using the extra arguments given to the logging call. For instance, instead of doing:
logger.error('oops caused by %s' % exc)
you should do
logger.error('oops caused by %s', exc)
so the string will only be interpolated if the message is actually emitted.
You can't benefit from this functionality when using .format().
Per the Optimization section of the logging docs:
Formatting of message arguments is deferred until it cannot be avoided. However, computing the arguments passed to the logging method can also be expensive, and you may want to avoid doing it if the logger will just throw away your event.
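To make that distinction concrete, a small sketch (expensive() is a stand-in for any costly computation):
import logging

logger = logging.getLogger(__name__)

def expensive():
    return sum(range(1_000_000))  # stand-in for heavy work

# Lazy interpolation defers the % merge, but expensive() still runs:
logger.debug('result: %s', expensive())

# To skip computing the argument as well when DEBUG is disabled:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug('result: %s', expensive())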
Maybe these differences can help you. The following description is not the answer to your question, but it may help people.
If you want to use f-strings (Literal String Interpolation) for logging, then you can disable the warning from the .pylintrc file with disable=logging-fstring-interpolation; see the related issue and comment.
Also you can disable logging-format-interpolation.
For pylint 2.4:
There are 3 options for logging style in the .pylintrc file: old, new, fstr
fstr option added in 2.4 and removed in 2.5
Description from .pylintrc file (v2.4):
[LOGGING]
# Format style used to check logging format string. `old` means using %
# formatting, `new` is for `{}` formatting, and `fstr` is for f-strings.
logging-format-style=old
for old (logging-format-style=old):
foo = "bar"
self.logger.info("foo: %s", foo)
for new (logging-format-style=new):
foo = "bar"
self.logger.info("foo: {}", foo)
# OR
self.logger.info("foo: {foo}", foo=foo)
Note: you cannot use .format() even if you select the new option.
pylint still gives the same warning for this code:
self.logger.info("foo: {}".format(foo)) # W1202
# OR
self.logger.info("foo: {foo}".format(foo=foo)) # W1202
for fstr (logging-format-style=fstr):
foo = "bar"
self.logger.info(f"foo: {foo}")
Personally, I prefer the fstr option because of PEP-0498.
In my experience a more compelling reason than optimization (for most use cases) for the lazy interpolation is that it plays nicely with log aggregators like Sentry.
Consider a 'user logged in' log message. If you interpolate the user into the format string, you have as many distinct log messages as there are users. If you use lazy interpolation like this, the log aggregator can more reasonably interpret this as the same log message with a bunch of different instances.
Here is an example of why it's better to use %s instead of f-strings in logging.
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> logger = logging.getLogger('MyLogger')
>>>
>>> class MyClass:
... def __init__(self, name: str) -> None:
... self._name = name
... def __str__(self) -> str:
... print('GENERATING STRING')
... return self._name
...
>>> c = MyClass('foo')
>>> logger.debug('Created: %s', c)
>>> logger.debug(f'Created: {c}')
GENERATING STRING
Inspired by Python 3.7 logging: f-strings vs %.
Might be several years late, but having to deal with this the other day, I kept it simple: just format the string before logging.
message = 'message: {}'.format('test')
logger.debug(message)
That way there was no need to change any logging settings, and if you later decide to switch to a normal print, there is no need to change the formatting or the code.
"logging-format-interpolation (W1202)" is another one wrong recommendation from pylint (like many from pep8).
F-string are described as slow vs %, but have you checked ?
With 500_000 rotation of logging with f-string vs % -> f-string:23.01 sec. , %:25.43 sec.
So logging with f-string is faster than %.
When you look at the logging source code : log.error() -> self.logger._log() -> self.makeRecord() -> self._logRecordFactory() -> class LogRecord() -> home made equivalent to format()
Code:
import logging
import random
import time

loops = 500_000
r_fstr = 0.0
r_format = 0.0

def test_fstr():
    global loops, r_fstr
    for i in range(0, loops):
        r1 = time.time()
        logging.error(f'test {random.randint(0, 1000)}')
        r2 = time.time()
        r_fstr += r2 - r1

def test_format():
    global loops, r_format
    for i in range(0, loops):
        r1 = time.time()
        logging.error('test %d', random.randint(0, 1000))
        r2 = time.time()
        r_format += r2 - r1

test_fstr()
test_format()
print(f'Logging f-string:{round(r_fstr,2)} sec. , %:{round(r_format,2)} sec.')
