Python logging perforamnce when level is higher - python

One advantage of using logging in Python instead of print is that you can set the level of the logging. When debugging you could set the level to DEBUG and everything below will get printed. If you set the level to ERROR then only error messages will get printed.
In a high-performance application this property is desirable. You want to be able to print some logging information during development/testing/debugging but not when you run it in production.
I want to ask if logging will be an efficient way to suppress debug and info logging when you set the level to ERROR. In other words, would doing the following:
logging.basicConfig(level=logging.ERROR)
logging.debug('something')
will be as efficient as
if not in_debug:
print('...')
Obviously the second code will not cost because checking a boolean is fast and when not in debug mode the code will be faster because it will not print unnecessary stuff. It comes to the cost of having all those if statement though. If logging delivers same performance without the if statements is of course much more desirable.

There's no need for you to check is_debug in your source code, because the Logging module already does that.
Here's an excerpt, with some comments and whitespace removed:
class Logger(Filterer):
def debug(self, msg, *args, **kwargs):
if self.isEnabledFor(DEBUG):
self._log(DEBUG, msg, args, **kwargs)
Just make sure you follow pylint guidelines on how to pass parameters to the logging function, so they don't execute stuff before calling the logging code. See PyLint message: logging-format-interpolation
I first learned about it through pylint warnings, but here's the official documentation that says to use % formatting and pass arguments to be evaluated later: https://docs.python.org/3/howto/logging.html#optimization

Related

Python context management and verbosity

To facilitate testing code as I write it, I include verbosity in almost every module I write, as follows:
class MyObj(object):
def __init__(arg0, kwarg0="default", verbosity=0):
self.a0 = arg0
self.k0 = kwarg0
self.vb = verbosity
def my_method(self):
if verbosity > 2:
print(f"{self} is doing a thing now...")
or
def my_func(arg0, arg1, verbosity=0):
if verbosity > 2:
print(f"doing something to {arg0} and {arg1}...")
if verbosity > 5: # Added on later edit
import ipdb;ipdb.set_trace() # to clarify requirement
do_somthing()
The executable scripts that import these will have collected (from the command line or elsewhere) a verbosity argument which gets passed all the way down the stack.
It's occurred to me to use a context manager so that I wouldn't have to initialize this variable at every level of the stack, something like having this in the driver script:
with args.verbosity as vb:
my_func("x", "y")
Can I do that and then use vb in my_func without having to include it in the signature? Is there a better way to achieve this kind of control?
SUBSEQUENT EDIT: it's clear from the first answers—-thank you for those--that I need to check out the logging module, but in some cases I want to stop execution in the middle to inspect things at a particular stack level (see the ipdb code I am adding with this edit). Would you still recommend that I use logging? (I'm assuming there's a way to get the logging level if I felt compelled to occasionally litter my code with if statements like that one.)
Finally, I'm still interested in whether the context management solution would be expected to work (even if it's not the optimal solution).
To facilitate testing code as I write it, I include verbosity in almost every module I write ...
Don't litter your code with if-statements and prints for this kind of purpose. It makes the code messy, repetitive and less readable.
The use-case is exactly what stdlib logging is for: you can unconditionally log events which describe what the program is doing, at various verbosity levels, and the messages will be displayed - or not - depending on the logging system's configuration.
import logging
log = logging.getLogger(__name__)
def my_func(arg0, arg1):
log.info("doing something to %s and %s...", arg0, arg1)
do_something()
if __name__ == "__main__":
logging.basicConfig(level=logging.DEBUG, format="%(message)s")
my_func(123, 456)
In the example above, the message will print because it is at level INFO which is above the verbosity level that I've configured logging with (DEBUG). If you configure logging at level WARNING, then it won't display.
Generally the user will control the logging configuration settings (levels, formats, streams, files) via a config file, environment variables, or command-line arguments. It is up to the end-user to choose the specific logging configuration that meets their needs, as the developer you can just log events anytime. No need to worry about where the log events end up going to, or if they end up going anywhere at all.
Another way to do this is levels of logging. For example, Python's builtin logging module has error, warning, info, and debug levels:
import logging
logger = logging.getLogger()
logger.info('Normal message')
logger.debug('Message that only gets printed with high verbosity`)
Simply configure the logging level to debug, warn, etc., and you're basically done! Plus you get lots of native logging goodies.

Capture all stdout/stderr within structlog to generate JSON logs

I am currently trying to get away from print()'s and start with centralized log collection using the ELK stack and the structlog module to generate structured json log lines. This is working perfectly fine for modules that I wrote myself using a loggingHelper module that I can import and use with
logger = Logger()
in other modules and scripts. This is the loggingHelper module class:
class Logger:
"""
Wrapper Class to import within other modules and scripts
All the config and log binding (script
"""
def __init__(self):
self.__log = None
logging.basicConfig(level=logging.DEBUG, format='%(message)s')
structlog.configure(logger_factory=LoggerFactory(),
processors=[structlog.stdlib.add_log_level,
structlog.processors.TimeStamper(fmt="iso"),
structlog.processors.JSONRenderer()])
logger = structlog.get_logger()
main_script = os.path.basename(sys.argv[0]) if sys.argv[0] else None
frame = inspect.stack()[1]
log_invocation = os.path.basename(frame[0].f_code.co_filename)
user = getpass.getuser()
"""
Who executed the __main__, what was the executed __main__ file,
where did the log event happen?
"""
self.__log = logger.bind(executedScript = main_script,
logBirth = log_invocation,
executingUser = user)
def info(self, msg, **kwargs):
self.__log.info(msg, **kwargs)
def debug(self, msg, **kwargs):
self.__log.debug(msg, **kwargs)
def error(self, msg, **kwargs):
self.__log.error(msg, **kwargs)
def warn(self, msg, **kwargs):
self.__log.warning(msg, **kwargs)
This produces nicely formatted output (one JSON per line) that filebeat is able to read and forward to Elasticsearch.
However, third-party librariers completely crush the well-formatted logs.
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "context": "SELECT displayname FROM point_of_sale WHERE name = '123'", "level": "debug", "timestamp": "2019-03-15T12:52:42.792398Z", "message": "querying local"}
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "level": "debug", "timestamp": "2019-03-15T12:52:42.807922Z", "message": "query successful: got 0 rows"}
building service object
auth version used is: v4
Traceback (most recent call last):
File "logAlot.py", line 26, in <module>
ef.EfoodDataControllerMerchantCenter().get_displayname(123)
File "/home/xyz/src/toolkit/commons/connectors/efood.py", line 1126, in get_displayname
return efc.select_from_local(q)['displayname'].values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
As you can see both info level and error level messages from the third party librara (googleapiclient) are printed without going through the logging processors.
What would be the best way (and most pythonic) of capturing and formatting everything that happens within execution of one script using the loggingHelper module I wrote? Is this even best practice?
Edit: Currently the logger indeed writes to stdout itself, which is then redirected to a file in crontab using >> and 2>&1. This looks like bad practice to me if I want to redirect everything that is written to stdout/stderr by third-party library logging, because this would lead to a loop, correct? Thus my goal is not redirecting, but rather capturing everything in my logging processor. Changed the title accordingly.
Also, here is a rough overview of what I am trying to achieve. I am very open to general criticism and suggestions that diviate from this.
Configuring the logging module
As you already figured out, structlog requires configuration of the
logging functionality already existing in python.
http://www.structlog.org/en/stable/standard-library.html
logging.basicConfig supports options for stream and filename here
https://docs.python.org/3/library/logging.html#logging.basicConfig.
Either you specify a filename which the logger will create a handle to and direct all its output. Depending on how you are set up maybe this would be the file you normally redirect to
import logging
logging.basicConfig(level=logging.DEBUG, format='%(message)s', filename='output.txt')
Or you can pass a StringIO object to the builder, which you can later read from and then redirect to your wished output destination
import logging
import io
stream = io.StringIO()
logging.basicConfig(level=logging.DEBUG, format='%(message)s', stream=stream)
More about StringIO can be read here
https://docs.python.org/3/library/io.html#io.TextIOBase
As #bruno pointed out in his answer, do not do this in an __init__ as you may end up calling this piece of code several times in the same process.
First thing first: you should NOT do any logger config (logging.basicConfig, logging.dictConfig etc) in your class initializer - the logging configuration should be done once and only once at process startup. The whole point of the logging module is to completely decouple logging calls
Second point: I'm no structlog expert (and that's an understatement - it's actually the very first time I hear about this package) but the result you get is what was to be expected from your code snippet: only your own code uses structlog, all other libs (stdlib or 3rd part) will still use the stdlib logger and emit plain text logs.
From what I've seen in structlog doc, it seems to provide some way to wrap the stdlib's loggers using the structlog.stdlib.LoggerFactory and add specific formatters to have a more consistant output. I have not tested this (yet) and the official doc is a bit sparse and lacking usable practical example (at least I couldn't find any) but this article seems to have a more explicit example (to be adapted to your own context and use case of course).
CAVEAT : as I said I never used structlog (first time I hear of this lib) so I might have misunderstood some things, and you will of course have to experiment to find out how to properly configure the whole thing to get it to work as expected.
As a side note: in unix-like systems stdout is supposed to be for program's outputs (I mean "expected output" => the program's actual results), while all error / reporting / debugging messages belong to stderr. Unless you have compelling reasons to do otherwise you should try and stick to this convention (at least for command-line tools so you can chain / pipeline them the unix way).

how to replace print debug message with the logging module

Up to now, I've been peppering my code with 'print debug message' and even 'if condition: print debug message'. But a number of people have told me that's not the best way to do it, and I really should learn how to use the logging module. After a quick read, it looks as though it does everything I could possibly want, and then some. It looks like a learning project in its own right, and I want to work on other projects now and simply use the minimum functionality to help me. If it makes any difference, I am on python 2.6 and will be for the forseeable future, due to library and legacy compatibilities.
All I want to do at the moment is pepper my code with messages that I can turn on and off section by section, as I manage to debug specific regions. As a 'hello_log_world', I tried this, and it doesn't do what I expected
import logging
# logging.basicConfig(level=logging.DEBUG)
logging.error('first error')
logging.debug('first debug')
logging.basicConfig(level=logging.DEBUG)
logging.error('second error')
logging.debug('second debug')
You'll notice I'm using the really basic config, using as many defaults as possible, to keep things simple. But appears that it's too simple, or that I don't understand the programming model behind logging.
I had expected that sys.stderr would end up with
ERROR:root:first error
ERROR:root:second error
DEBUG:root:second debug
... but only the two error messages appear. Setting level=DEBUG doesn't make the second one appear. If I uncomment the basicConfig call at the start of the program, all four get output.
Am I trying to run it at too simple a level?
What's the simplest thing I can add to what I've written there to get my expected behaviour?
Logging actually follows a particular hierarchy (DEBUG -> INFO -> WARNING -> ERROR -> CRITICAL), and the default level is WARNING. Therefore the reason you see the two ERROR messages is because it is ahead of WARNING on the hierarchy chain.
As for the odd commenting behavior, the explanation is found in the logging docs (which as you say are a task unto themselves :) ):
The call to basicConfig() should come before any calls to debug(),
info() etc. As it’s intended as a one-off simple configuration
facility, only the first call will actually do anything: subsequent
calls are effectively no-ops.
However you can use the setLevel parameter to get what you desire:
import logging
logging.getLogger().setLevel(logging.ERROR)
logging.error('first error')
logging.debug('first debug')
logging.getLogger().setLevel(logging.DEBUG)
logging.error('second error')
logging.debug('second debug')
The lack of an argument to getLogger() means that the root logger is modified. This is essentially one step before #del's (good) answer, where you start getting into multiple loggers, each with their own specific properties/output levels/etc.
Rather than modifying the logging levels in your code to control the output, you should consider creating multiple loggers, and setting the logging level for each one individually. For example:
import logging
first_logger = logging.getLogger('first')
second_logger = logging.getLogger('second')
logging.basicConfig()
first_logger.setLevel(logging.ERROR)
second_logger.setLevel(logging.DEBUG)
first_logger.error('first error')
first_logger.debug('first debug')
second_logger.error('second error')
second_logger.debug('second debug')
This outputs:
ERROR:first:first error
ERROR:second:second error
DEBUG:second:second debug

Switching off debug prints

Sometimes I have a lot of prints scattered around function to print debug output.
To switch this debug outputs I came up with this:
def f(debug=False):
print = __builtins__.print if debug else lambda *p: None
Or if I need to print something apart from debug message, I create dprint function for debug messages.
The problem is, when debug=False, this print statements slow down the code considerably, because lambda *p: None is still called, and function invocation are known to be slow.
So, my question is: Is there any better way to efficiently disable all these debug prints for them not to affect code performance?
All the answers are regarding my not using logging module. This is a good to notice, but this doesn't answer the question how to avoid function invocations that slow down the code considerably - in my case 25 times (if it's possible (for example by tinkering with function code object to through away all the lines with print statements or somehow else)). What these answers suggest is replacing print with logging.debug, which should be even slower. And this question is about getting rid of those function calls completely.
I tried using logging instead of lambda *p: None, and no surprise, code became even slower.
Maybe someone would like to see the code where those prints caused 25 slowdown: http://ideone.com/n5PGu
And I don't have anything against logging module. I think it's a good practice to always stick to robust solutions without some hacks. But I thinks there is nothing criminal if I used those hacks in 20-line one-time code snippet.
Not as a restriction, but as a suggestion, maybe it's possible to delete some lines (e.g. starting with print) from function source code and recompile it? I laid out this approach in the answer below. Though I would like to see some comments on that solution, I welcome other approaches to solving this problem.
You should use the logging module instead. See http://docs.python.org/library/logging.html
Then you can set the log level depending on your needs, and create multiple logger objects, that log about different subjects.
import logging
#set your log level
logging.basicConfig(level=logging.DEBUG)
logging.debug('This is a log message')
In your case: you could simply replace your print statement with a log statement, e.g.:
import logging
print = __builtins__.print if debug else logging.debug
now the function will only be print anything if you set the logging level to debug
logging.basicConfig(level=logging.DEBUG)
But as a plus, you can use all other logging features on top! logging.error('error!')
Ned Batchelder wrote in the comment:
I suspect the slow down is in the calculation of the arguments to
your debug function. You should be looking for ways to avoid those
calculations. Preprocessing Python is just a distraction.
And he is right as slowdown is actually caused by formatting string with format method which happens regardless if the resulting string will be logged or not.
So, string formatting should be deferred and dismissed if no logging will occur. This may be achieved by refactoring dprint function or using log.debug in the following way:
log.debug('formatted message: %s', interpolated_value)
If message won't be logged, it won't be formatted, unlike print, where it's always formatted regardless of if it'll be logged or discarded.
The solution on log.debug's postponed formatting gave Martijn Pieters here.
Another solution could be to dynamically edit code of f and delete all drpint calls. But this solution is highly unrecommended to be used:
You are correct, you should never resort to this, there are so many
ways it can go wrong. First, Python is not a language designed for
source-level transformations, and it's hard to write it a transformer
such as comment_1 without gratuitously breaking valid code. Second,
this hack would break in all kinds of circumstances - for example,
when defining methods, when defining nested functions, when used in
Cython, when inspect.getsource fails for whatever reason. Python is
dynamic enough that you really don't need this kind of hack to
customize its behavior.
Here is the code of this approach, for those who like to get acquainted with it:
from __future__ import print_function
DEBUG = False
def dprint(*args,**kwargs):
'''Debug print'''
print(*args,**kwargs)
_blocked = False
def nodebug(name='dprint'):
'''Decorator to remove all functions with name 'name' being a separate expressions'''
def helper(f):
global _blocked
if _blocked:
return f
import inspect, ast, sys
source = inspect.getsource(f)
a = ast.parse(source) #get ast tree of f
class Transformer(ast.NodeTransformer):
'''Will delete all expressions containing 'name' functions at the top level'''
def visit_Expr(self, node): #visit all expressions
try:
if node.value.func.id == name: #if expression consists of function with name a
return None #delete it
except(ValueError):
pass
return node #return node unchanged
transformer = Transformer()
a_new = transformer.visit(a)
f_new_compiled = compile(a_new,'<string>','exec')
env = sys.modules[f.__module__].__dict__
_blocked = True
try:
exec(f_new_compiled,env)
finally:
_blocked = False
return env[f.__name__]
return helper
#nodebug('dprint')
def f():
dprint('f() started')
print('Important output')
dprint('f() ended')
print('Important output2')
f()
More information: Replacing parts of the function code on-the-fly
As a hack, yes, that works. (And there is no chance in hell those lambda no-ops are your app's bottleneck.)
However, you really should be doing logging properly by using the logging module.
See http://docs.python.org/howto/logging.html#logging-basic-tutorial for a basic example of how this should be done.
You definitely need to use the logging module of Python, it's very practical and you can change the log level of your application. Example:
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> logging.debug('Test.')
DEBUG:root:Test.

Using print statements only to debug

I have been coding a lot in Python of late. And I have been working with data that I haven't worked with before, using formulae never seen before and dealing with huge files. All this made me write a lot of print statements to verify if it's all going right and identify the points of failure. But, generally, outputting so much information is not a good practice. How do I use the print statements only when I want to debug and let them be skipped when I don't want them to be printed?
The logging module has everything you could want. It may seem excessive at first, but only use the parts you need. I'd recommend using logging.basicConfig to toggle the logging level to stderr and the simple log methods, debug, info, warning, error and critical.
import logging, sys
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
logging.debug('A debug message!')
logging.info('We processed %d records', len(processed_records))
A simple way to do this is to call a logging function:
DEBUG = True
def log(s):
if DEBUG:
print s
log("hello world")
Then you can change the value of DEBUG and run your code with or without logging.
The standard logging module has a more elaborate mechanism for this.
Use the logging built-in library module instead of printing.
You create a Logger object (say logger), and then after that, whenever you insert a debug print, you just put:
logger.debug("Some string")
You can use logger.setLevel at the start of the program to set the output level. If you set it to DEBUG, it will print all the debugs. Set it to INFO or higher and immediately all of the debugs will disappear.
You can also use it to log more serious things, at different levels (INFO, WARNING and ERROR).
First off, I will second the nomination of python's logging framework. Be a little careful about how you use it, however. Specifically: let the logging framework expand your variables, don't do it yourself. For instance, instead of:
logging.debug("datastructure: %r" % complex_dict_structure)
make sure you do:
logging.debug("datastructure: %r", complex_dict_structure)
because while they look similar, the first version incurs the repr() cost even if it's disabled. The second version avoid this. Similarly, if you roll your own, I'd suggest something like:
def debug_stdout(sfunc):
print(sfunc())
debug = debug_stdout
called via:
debug(lambda: "datastructure: %r" % complex_dict_structure)
which will, again, avoid the overhead if you disable it by doing:
def debug_noop(*args, **kwargs):
pass
debug = debug_noop
The overhead of computing those strings probably doesn't matter unless they're either 1) expensive to compute or 2) the debug statement is in the middle of, say, an n^3 loop or something. Not that I would know anything about that.
I don't know about others, but I was used to define a "global constant" (DEBUG) and then a global function (debug(msg)) that would print msg only if DEBUG == True.
Then I write my debug statements like:
debug('My value: %d' % value)
...then I pick up unit testing and never did this again! :)
A better way to debug the code is, by using module clrprint
It prints a color full output only when pass parameter debug=True
from clrprint import *
clrprint('ERROR:', information,clr=['r','y'], debug=True)

Categories