PyLint message: logging-format-interpolation - python

For the following code:
logger.debug('message: {}'.format('test'))
pylint produces the following warning:
logging-format-interpolation (W1202):
Use % formatting in logging functions and pass the % parameters as
arguments. Used when a logging statement has a call form of
"logging.<logging method>(format_string.format(format_args...))". Such
calls should use % formatting instead, but leave interpolation to the
logging function by passing the parameters as arguments.
I know I can turn off this warning, but I'd like to understand it. I assumed using format() is the preferred way to print out statements in Python 3. Why is this not true for logger statements?

It is not true for logger statements because logging relies on the older "%"-style format string, together with the extra arguments given to the logger call, to provide lazy interpolation of that string. For instance, instead of doing:
logger.error('oops caused by %s' % exc)
you should do
logger.error('oops caused by %s', exc)
so the string will only be interpolated if the message is actually emitted.
You can't benefit from this functionality when using .format().
Per the Optimization section of the logging docs:
Formatting of message arguments is deferred until it cannot be avoided. However, computing the arguments passed to the logging method can also be expensive, and you may want to avoid doing it if the logger will just throw away your event.
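As a hedged sketch of the second point in that quote (the expensive_summary helper below is hypothetical), you can guard such a call with isEnabledFor() so the expensive argument is only computed when it will actually be used:
import logging

logger = logging.getLogger(__name__)

def expensive_summary():
    # hypothetical stand-in for a costly computation
    return ",".join(str(i) for i in range(10_000))

# The argument is only built when DEBUG records would actually be handled.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("summary: %s", expensive_summary())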

Maybe these time differences can help you.
The following description is not the answer to your question, but it can help people.
If you want to use f-strings (literal string interpolation) for logging, you can disable the warning in your .pylintrc file with disable=logging-fstring-interpolation; see the related issue and comment.
You can also disable logging-format-interpolation in the same way.
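A minimal sketch of what that looks like in a .pylintrc, assuming you want to silence both checks:
[MESSAGES CONTROL]
disable=logging-fstring-interpolation,
        logging-format-interpolation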
For pylint 2.4:
There are 3 options for logging style in the .pylintrc file: old, new, fstr.
The fstr option was added in 2.4 and removed in 2.5.
Description from .pylintrc file (v2.4):
[LOGGING]
# Format style used to check logging format string. `old` means using %
# formatting, `new` is for `{}` formatting, and `fstr` is for f-strings.
logging-format-style=old
for old (logging-format-style=old):
foo = "bar"
self.logger.info("foo: %s", foo)
for new (logging-format-style=new):
foo = "bar"
self.logger.info("foo: {}", foo)
# OR
self.logger.info("foo: {foo}", foo=foo)
Note: you cannot use .format() even if you select the new option.
pylint still gives the same warning for this code:
self.logger.info("foo: {}".format(foo)) # W1202
# OR
self.logger.info("foo: {foo}".format(foo=foo)) # W1202
for fstr (logging-format-style=fstr):
foo = "bar"
self.logger.info(f"foo: {foo}")
Personally, I prefer fstr option because of PEP-0498.

In my experience a more compelling reason than optimization (for most use cases) for the lazy interpolation is that it plays nicely with log aggregators like Sentry.
Consider a 'user logged in' log message. If you interpolate the user into the format string, you have as many distinct log messages as there are users. If you use lazy interpolation like this, the log aggregator can more reasonably interpret this as the same log message with a bunch of different instances.
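A minimal sketch of the difference (user_id is a hypothetical variable):
logger.info('user logged in: %s', user_id)   # one constant template, grouped as one event
logger.info(f'user logged in: {user_id}')    # a distinct message string per user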

Here is an example of why it's better to use %s instead of f-strings in logging.
>>> import logging
>>> logging.basicConfig(level=logging.INFO)
>>> logger = logging.getLogger('MyLogger')
>>>
>>> class MyClass:
...     def __init__(self, name: str) -> None:
...         self._name = name
...     def __str__(self) -> str:
...         print('GENERATING STRING')
...         return self._name
...
>>> c = MyClass('foo')
>>> logger.debug('Created: %s', c)
>>> logger.debug(f'Created: {c}')
GENERATING STRING
Inspired by Python 3.7 logging: f-strings vs %.

It might be several years later, but having had to deal with this the other day, I kept it simple: just format the string before passing it to the logger.
message = 'message: {}'.format('test')
logger.debug(message)
That way there was no need to change any of the logging settings, and if you later decide to switch to a normal print, there is no need to change the formatting or the code.

"logging-format-interpolation (W1202)" is another one wrong recommendation from pylint (like many from pep8).
F-string are described as slow vs %, but have you checked ?
With 500_000 rotation of logging with f-string vs % -> f-string:23.01 sec. , %:25.43 sec.
So logging with f-string is faster than %.
When you look at the logging source code : log.error() -> self.logger._log() -> self.makeRecord() -> self._logRecordFactory() -> class LogRecord() -> home made equivalent to format()
code:
import logging
import random
import time

loops = 500_000
r_fstr = 0.0
r_format = 0.0

def test_fstr():
    global r_fstr
    for i in range(0, loops):
        r1 = time.time()
        logging.error(f'test {random.randint(0, 1000)}')
        r2 = time.time()
        r_fstr += r2 - r1

def test_format():
    global r_format
    for i in range(0, loops):
        r1 = time.time()
        logging.error('test %d', random.randint(0, 1000))
        r2 = time.time()
        r_format += r2 - r1

test_fstr()
test_format()
print(f'Logging f-string:{round(r_fstr,2)} sec. , %:{round(r_format,2)} sec.')
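A variant sketch of the same comparison, assuming you want to keep terminal I/O out of the measurement: a NullHandler on the root logger discards the records, and time.perf_counter() times each whole loop.
import logging
import random
import time

# Discard records instead of printing them to stderr.
# Note: with a NullHandler the %-style message is never actually formatted,
# which is exactly the laziness being discussed.
logging.getLogger().addHandler(logging.NullHandler())

loops = 500_000

start = time.perf_counter()
for _ in range(loops):
    logging.error(f'test {random.randint(0, 1000)}')
t_fstr = time.perf_counter() - start

start = time.perf_counter()
for _ in range(loops):
    logging.error('test %d', random.randint(0, 1000))
t_lazy = time.perf_counter() - start

print(f'f-string: {t_fstr:.2f} sec. , %: {t_lazy:.2f} sec.')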

Related

Pylint doesn't like string.format() and wants me to use f-strings. Is this fixable?

I've upgraded to pylint 2.15.2, and suddenly I'm getting lots of consider-using-f-string warnings whenever I run pylint, where I've used % formatting for strings. I understand why Pylint doesn't want me to use the old % formatting, but I also get this error when I try to use string.format() instead. Take the following code as an example:
"""Example module"""
def some_long_complicated_function(a, b):
"""Do something"""
return a + b
def main():
"""Main function"""
a = 2
b = 3
percent_string = "The result of %s + %s is %s" % (
a, b, some_long_complicated_function(a, b)
)
format_string = "The result of {} + {} is {}".format(
a, b, some_long_complicated_function(a, b)
)
f_string = f"The result of {a} + {b} is {some_long_complicated_function(a, b)}"
print(percent_string)
print(format_string)
print(f_string)
if __name__ == "__main__":
main()
When I run pylint on this code, I get the following output:
************* Module pyexample
./pyexample.py:11:21: C0209: Formatting a regular string which could be a f-string (consider-using-f-string)
./pyexample.py:15:20: C0209: Formatting a regular string which could be a f-string (consider-using-f-string)
------------------------------------------------------------------
Your code has been rated at 8.46/10 (previous run: 6.15/10, +2.31)
There are instances like this where I don't want to use an f-string, because I think it actually hampers - not helps - readability, especially in cases like these where I may be writing long function calls inline within the string. In these places I'd rather use string.format(), because you can nicely separate out the format specifiers {} from the functions to generate the strings I want by putting them on a separate line. With f-strings, my lines may end up being too long and I have to resort to using line continuation characters, which again harms the readability IMO.
The problem is, Pylint doesn't like string.format() - it only wants me to use f-strings. I know that this is a 'Convention' not 'Error', but my code has to pass Pylint 100%. I could waive this message, but that's not good practice and there are places in my code where I do want to swap out the %-string formats.
My question:
Is there a way to configure Pylint so that when I run it, it will not flag a consider-using-f-string warning when I use string.format() (only when I use % strings)? I've had a look in the rc-file but I can't see any obvious setting like this.
Or is the only way to fix this to waive the warning entirely?
If you just want to avoid long lines or line continuation characters, I usually choose to use parentheses:
f_string = (f"The result of {a} + {b} is "
            f"{some_long_complicated_function(a, b)}")

Python disabled logging slowing script

I am using the built in Python "logging" module for my script. When I turn verbosity to "info" it seems like my "debug" messages are significantly slowing down my script.
Some of my "debug" messages print large dictionaries and I'm guessing Python is expanding the text before realizing "debug" messages are disabled. Example:
import pprint
pp = pprint.PrettyPrinter(indent=4)
logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
How can I improve my performance? I'd prefer to still use Python's built in logging module. But need to figure out a "clean" way to solve this issue.
There is already a feature of logging for what dankal444 mentions, which is slightly neater:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
Another possible approach is to use %-formatting, which only does the formatting when actually needed (the logging event has to be processed by a handler as well as a logger to get to that point). I know f-strings are the new(ish) hotness and are performant, but it all depends on the exact circumstances as to which will offer the best result.
An example of taking advantage of lazy %-formatting:
class DeferredFormatHelper:
    def __init__(self, func, *args, **kwargs):
        self.func = func  # assumed to return a string
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        # This is called because the format string contains
        # a %s for this logging argument, lazily if and when
        # the formatting needs to happen
        return self.func(*self.args, **self.kwargs)

if logger.isEnabledFor(logging.DEBUG):
    arg = DeferredFormatHelper(pp.pformat, obj)
    logger.debug('Large Dict Object: %s', arg)
Check if the current level is good enough:
if logger.getEffectiveLevel() <= logging.DEBUG:
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
This is not super clean, but it's the best I can think of. You just need to wrap the calls in this if wherever performance bottlenecks appear.
I can't verify where your bottleneck is, but if it's because of the pprint library, your logger will never have a chance to do anything about it. Rewriting to clarify.
from pprint import PrettyPrinter
import logging
logger = logging.getLogger()
large_object = {"very": "large container"}
pp = PrettyPrinter(indent=4)
# This is done first.
formatted_msg = pp.pformat(large_object)
# It's already formatted when it's sent to your logger.
logger.debug(f"Large dict object: {formatted_msg}")

Declare python module in yaml

I have a YAML file which has some fields whose values are meaningful in Python, but they get parsed as plain strings, not the Python objects I meant. This is my sample:
verbose:
  level: logging.DEBUG
and obviously when I load it, the value is string type
config = yaml.load(args.config.read(), Loader=yaml.SafeLoader)
I have no idea how to get the actual logging.DEBUG object rather than its string.
Note that I'm not looking for a way to configure logging to get a logger; logging is just an example of a Python module here.
There's no out of the box way for that. The simplest and safest way seems to be processing the values manually, e.g:
import logging

class KnownModules:
    logging = logging
    ...

def parse_value(s):
    v = KnownModules
    for p in s.split('.'):
        v = getattr(v, p)  # remember to handle AttributeError
    return v
However, if you're ok with slightly changing your YAML structure, PyYAML supports some custom YAML tags. For example:
verbose:
  level: !!python/name:logging.DEBUG
will make config['verbose']['level'] equal to logging.DEBUG (i.e. 10).
Considering that you're (correctly) using SafeLoader, you may need to combine those methods by defining your own tag.
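A minimal sketch of such a tag, assuming PyYAML and a made-up !level tag that maps a level name onto the corresponding logging constant:
import logging
import yaml

def _level_constructor(loader, node):
    # resolve e.g. 'DEBUG' to logging.DEBUG (10)
    return getattr(logging, loader.construct_scalar(node))

yaml.SafeLoader.add_constructor('!level', _level_constructor)

config = yaml.load('verbose:\n  level: !level DEBUG', Loader=yaml.SafeLoader)
assert config['verbose']['level'] == logging.DEBUG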
The YAML loader has no knowledge of what logging.DEBUG might mean except a string "logging.DEBUG" (unless it's tagged with a YAML tag).
For string values that need to be interpreted as e.g. references to module attributes, you will need to parse them after-the-fact, e.g.
def parse_logging_level(level_string: str):
    module, _, value = level_string.partition(".")
    assert module == "logging"
    return logging._nameToLevel[value]

# ...
yaml_data["verbose"]["level"] = parse_logging_level(yaml_data["verbose"]["level"])
Edit: Please see AKX's answer. I was not aware of logging._nameToLevel, which does not require defining your own enum and is definitely better than using eval. But I decided not to delete this answer, as I think the currently preferred design (as of Python 3.4), which uses enums, is worth mentioning (it would probably be used in the logging module if it had been available back then).
If you are absolutely sure that the values provided in the config are legitimate ones, you can use eval like this:
import logging
levelStr = 'logging.DEBUG'
level = eval(levelStr)
But as said in the comments, if you are not sure about the values present in the config file, using eval could be disastrous (see the example provided by AKX in the comments).
A better design is to define an enum for this purpose. Unfortunately the logging module does not provide the levels as an enum (they are just constants defined in the module), so you should define your own.
from enum import Enum

class LogLevel(Enum):
    CRITICAL = 50
    FATAL = 50
    ERROR = 40
    WARNING = 30
    WARN = 30
    INFO = 20
    DEBUG = 10
    NOTSET = 0
and then you can use it like this:
levelStr = 'DEBUG'
levelInt = LogLevel[levelStr].value # Comparable with logging.DEBUG which is also an integer
But to use this you have to change your yml file a bit and replace logging.DEBUG with DEBUG.
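A short sketch of how that would tie back to the YAML from the question (assuming PyYAML and the LogLevel enum above):
import yaml

config = yaml.safe_load('verbose:\n  level: DEBUG')
level = LogLevel[config['verbose']['level']].value  # 10, same as logging.DEBUG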

Where does the argparse and ConfigParser string replacement syntax come from?

When specifying help in argparse, I often use strings like %(default)s or %(const)s in the help= argument to display default arguments. The syntax is a bit weird, though: I assume it's left over from the days where python strings were formatted with %, but since python 2.6 the standard way to format strings has been using the format() function.
So are these libraries just using the 'old' replacement syntax, or does this come from somewhere else? It's been stated that the % replacement operator will disappear at some point; will these libraries change to the '{}'.format() syntax then?
Yes, the argparse and ConfigParser libraries use the old-style % string formatting syntax internally. These libraries were developed before str.format() and format() were available, or in the case of argparse the library authors aimed at compatibility with earlier Python versions.
If the % formatting ever is removed, then those libraries will indeed have to move to using string formatting using {} placeholders.
However, for various reasons, the % old-style string formatting style is here to stay for the foreseeable future; it has been 'un-deprecated'; str.format() is to be preferred but % is kept around for backwards compatibility.
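For reference, this is the interpolation the question is about; a minimal sketch with hypothetical argument names:
import argparse

parser = argparse.ArgumentParser(prog='demo')
# argparse applies old-style % interpolation to the help text itself,
# substituting attributes of the action such as %(default)s and %(prog)s.
parser.add_argument('--retries', type=int, default=3,
                    help='number of retries (default: %(default)s)')
parser.print_help()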
The approved way of customizing the help formatting is to subclass HelpFormatter. A user can do this without waiting for a future Python release.
This formatter implements {}.format in two places.
class NewHelpFormatter(argparse.HelpFormatter):
    # _format_usage - formats usage, but only uses dict(prog=self._prog)

    def _format_text(self, text):
        # for description, epilog, version
        if '{prog}' in text:
            text = text.format(prog=self._prog)  # change from %
        text_width = self._width - self._current_indent
        indent = ' ' * self._current_indent
        return self._fill_text(text, text_width, indent) + '\n\n'

    def _expand_help(self, action):
        params = dict(vars(action), prog=self._prog)
        for name in list(params):
            if params[name] is argparse.SUPPRESS:
                del params[name]
        for name in list(params):
            if hasattr(params[name], '__name__'):
                params[name] = params[name].__name__
        if params.get('choices') is not None:
            choices_str = ', '.join([str(c) for c in params['choices']])
            params['choices'] = choices_str
        return self._get_help_string(action).format(**params)  # change from %
For example:
parser = argparse.ArgumentParser(prog='NewFormatter',
                                 formatter_class=NewHelpFormatter,
                                 description='{prog} description')
parser.add_argument('foo', nargs=3, default=[1, 2, 3],
                    help='nargs:{nargs} prog:{prog!r} defaults:{default} last:{default[2]}')
parser.add_argument('--bar', choices=['yes', 'no'],
                    help='choices: {choices!r}')
parser.print_help()
produces:
usage: NewFormatter [-h] [--bar {yes,no}] foo foo foo

NewFormatter description

positional arguments:
  foo             nargs:3 prog:'NewFormatter' defaults:[1, 2, 3] last:3

optional arguments:
  -h, --help      show this help message and exit
  --bar {yes,no}  choices: 'yes, no'

string.format() with optional placeholders

I have the following Python code (I'm using Python 2.7.X):
my_csv = '{first},{middle},{last}'
print( my_csv.format( first='John', last='Doe' ) )
I get a KeyError exception because 'middle' is not specified (this is expected). However, I want all of those placeholders to be optional. If those named parameters are not specified, I expect the placeholders to be removed. So the string printed above should be:
John,,Doe
Is there built in functionality to make those placeholders optional, or is some more in depth work required? If the latter, if someone could show me the most simple solution I'd appreciate it!
Here is one option:
from collections import defaultdict
my_csv = '{d[first]},{d[middle]},{d[last]}'
print( my_csv.format( d=defaultdict(str, first='John', last='Doe') ) )
"It does{cond} contain the the thing.".format(cond="" if condition else " not")
Thought I'd add this because it's been a feature since the question was asked, the question still pops up early in google results, and this method is built directly into the python syntax (no imports or custom classes required). It's a simple shortcut conditional statement. They're intuitive to read (when kept simple) and it's often helpful that they short-circuit.
Here's another option that uses the string interpolation operator %:
class DataDict(dict):
    def __missing__(self, key):
        return ''

my_csv = '%(first)s,%(middle)s,%(last)s'
print my_csv % DataDict(first='John', last='Doe')  # John,,Doe
Alternatively, if you prefer using the more modern str.format() method, the following would also work, but it is less automatic in the sense that you'll have to explicitly define every possible placeholder in advance (although you could modify DataDict.placeholders on the fly if desired):
class DataDict(dict):
    placeholders = 'first', 'middle', 'last'
    default_value = ''

    def __init__(self, *args, **kwargs):
        self.update(dict.fromkeys(self.placeholders, self.default_value))
        dict.__init__(self, *args, **kwargs)

my_csv = '{first},{middle},{last}'
print(my_csv.format(**DataDict(first='John', last='Doe')))  # John,,Doe
I faced the same problem as yours and decided to create a library to solve this problem: pyformatting.
Here is the solution to your problem with pyformatting:
>>> from pyformatting import defaultformatter
>>> default_format = defaultformatter(str)
>>> my_csv = '{first},{middle},{last}'
>>> default_format(my_csv, first='John', last='Doe')
'John,,Doe'
The only problem is that pyformatting doesn't support Python 2; it supports Python 3.1+.
If I see any feedback on the need for 2.7 support, I think I will add it.
