Using print statements only to debug - python

I have been coding a lot in Python of late, working with data I haven't worked with before, using formulae I'd never seen before, and dealing with huge files. All this made me write a lot of print statements to verify that everything is going right and to identify the points of failure. But, generally, outputting so much information is not a good practice. How do I use the print statements only when I want to debug, and let them be skipped when I don't want them printed?

The logging module has everything you could want. It may seem excessive at first, but only use the parts you need. I'd recommend using logging.basicConfig to set the logging level and send output to stderr, together with the simple log methods: debug, info, warning, error and critical.
import logging, sys
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
logging.debug('A debug message!')
logging.info('We processed %d records', len(processed_records))

A simple way to do this is to call a logging function:
DEBUG = True
def log(s):
    if DEBUG:
        print(s)
log("hello world")
Then you can change the value of DEBUG and run your code with or without logging.
The standard logging module has a more elaborate mechanism for this.

Use the built-in logging module instead of printing.
You create a Logger object (say, logger), and then wherever you would insert a debug print, you just put:
logger.debug("Some string")
You can use logger.setLevel at the start of the program to set the output level. If you set it to DEBUG, it will print all the debugs. Set it to INFO or higher and immediately all of the debugs will disappear.
You can also use it to log more serious things, at different levels (INFO, WARNING and ERROR).
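Put together, a minimal sketch of that flow (the logger name and messages here are illustrative):
import logging

logging.basicConfig()                  # one-off setup: log to stderr
logger = logging.getLogger(__name__)   # create (or fetch) a Logger object

logger.setLevel(logging.DEBUG)
logger.debug("Some string")            # printed

logger.setLevel(logging.INFO)
logger.debug("now suppressed")         # all the debugs disappear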

First off, I will second the nomination of python's logging framework. Be a little careful about how you use it, however. Specifically: let the logging framework expand your variables, don't do it yourself. For instance, instead of:
logging.debug("datastructure: %r" % complex_dict_structure)
make sure you do:
logging.debug("datastructure: %r", complex_dict_structure)
because while they look similar, the first version incurs the repr() cost even if debug logging is disabled; the second version avoids it. Similarly, if you roll your own, I'd suggest something like:
def debug_stdout(sfunc):
    print(sfunc())
debug = debug_stdout
called via:
debug(lambda: "datastructure: %r" % complex_dict_structure)
which will, again, avoid the overhead if you disable it by doing:
def debug_noop(*args, **kwargs):
    pass
debug = debug_noop
The overhead of computing those strings probably doesn't matter unless 1) they're expensive to compute or 2) the debug statement sits in the middle of, say, an n^3 loop. Not that I would know anything about that.

I don't know about others, but I used to define a "global constant" (DEBUG) and then a global function (debug(msg)) that would print msg only if DEBUG == True.
Then I write my debug statements like:
debug('My value: %d' % value)
...then I picked up unit testing and never did this again! :)

A better way to debug the code is by using the clrprint module.
It prints colorful output, but only when you pass debug=True:
from clrprint import *
information = 'something went wrong'  # whatever you want to show
clrprint('ERROR:', information, clr=['r', 'y'], debug=True)

Related

Python logging performance when level is higher

One advantage of using logging in Python instead of print is that you can set the level of the logging. When debugging you can set the level to DEBUG and every message will get printed. If you set the level to ERROR, then only error and critical messages will get printed.
In a high-performance application this property is desirable. You want to be able to print some logging information during development/testing/debugging but not when you run it in production.
I want to ask if logging will be an efficient way to suppress debug and info logging when you set the level to ERROR. In other words, would doing the following:
logging.basicConfig(level=logging.ERROR)
logging.debug('something')
be as efficient as
if in_debug:
    print('...')
Obviously the second version costs almost nothing: checking a boolean is fast, and when not in debug mode the code is faster because it doesn't print unnecessary stuff. It comes at the cost of scattering all those if statements, though. If logging delivers the same performance without them, that is of course much more desirable.
There's no need for you to check in_debug in your source code, because the logging module already does that check for you.
Here's an excerpt, with some comments and whitespace removed:
class Logger(Filterer):
    def debug(self, msg, *args, **kwargs):
        if self.isEnabledFor(DEBUG):
            self._log(DEBUG, msg, args, **kwargs)
Just make sure you follow the pylint guidelines on how to pass parameters to the logging functions, so the arguments aren't evaluated before the logging code decides whether they're needed. See the PyLint message: logging-format-interpolation
I first learned about it through pylint warnings, but here's the official documentation that says to use % formatting and pass arguments to be evaluated later: https://docs.python.org/3/howto/logging.html#optimization
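To make the difference concrete, here is a small sketch (the variable names are illustrative); only the second call defers the formatting until after the level check:
import logging

logging.basicConfig(level=logging.ERROR)
big = list(range(100000))

# Eager: the % interpolation and repr() run even though DEBUG is disabled.
logging.debug("data: %r" % big)

# Lazy: the arguments are stored on the log record and only interpolated
# if the message is actually emitted, so the repr() cost is skipped here.
logging.debug("data: %r", big)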

Python context management and verbosity

To facilitate testing code as I write it, I include verbosity in almost every module I write, as follows:
class MyObj(object):
    def __init__(self, arg0, kwarg0="default", verbosity=0):
        self.a0 = arg0
        self.k0 = kwarg0
        self.vb = verbosity

    def my_method(self):
        if self.vb > 2:
            print(f"{self} is doing a thing now...")
or
def my_func(arg0, arg1, verbosity=0):
    if verbosity > 2:
        print(f"doing something to {arg0} and {arg1}...")
    if verbosity > 5:  # Added in a later edit
        import ipdb; ipdb.set_trace()  # to clarify the requirement
    do_something()
The executable scripts that import these will have collected (from the command line or elsewhere) a verbosity argument which gets passed all the way down the stack.
It's occurred to me to use a context manager so that I wouldn't have to initialize this variable at every level of the stack, something like having this in the driver script:
with args.verbosity as vb:
    my_func("x", "y")
Can I do that and then use vb in my_func without having to include it in the signature? Is there a better way to achieve this kind of control?
SUBSEQUENT EDIT: it's clear from the first answers (thank you for those) that I need to check out the logging module, but in some cases I want to stop execution in the middle to inspect things at a particular stack level (see the ipdb code I am adding with this edit). Would you still recommend that I use logging? (I'm assuming there's a way to get the logging level if I felt compelled to occasionally litter my code with if statements like that one.)
Finally, I'm still interested in whether the context management solution would be expected to work (even if it's not the optimal solution).
To facilitate testing code as I write it, I include verbosity in almost every module I write ...
Don't litter your code with if-statements and prints for this kind of purpose. It makes the code messy, repetitive and less readable.
The use-case is exactly what stdlib logging is for: you can unconditionally log events which describe what the program is doing, at various verbosity levels, and the messages will be displayed - or not - depending on the logging system's configuration.
import logging

log = logging.getLogger(__name__)

def my_func(arg0, arg1):
    log.info("doing something to %s and %s...", arg0, arg1)
    do_something()

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    my_func(123, 456)
In the example above, the message will print because INFO is above the verbosity level that I've configured logging with (DEBUG). If you configure logging at level WARNING, then it won't be displayed.
Generally the user will control the logging configuration settings (levels, formats, streams, files) via a config file, environment variables, or command-line arguments. It is up to the end user to choose the specific logging configuration that meets their needs; as the developer, you can just log events anytime. There is no need to worry about where the log events end up going, or whether they go anywhere at all.
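As for the ipdb part of the question's edit: you can query the configuration at run time with Logger.isEnabledFor, so an occasional breakpoint guard is still possible. A small sketch (the ipdb usage is from the question; do_something is the same stand-in as above):
import logging

log = logging.getLogger(__name__)

def my_func(arg0, arg1):
    log.info("doing something to %s and %s...", arg0, arg1)
    if log.isEnabledFor(logging.DEBUG):  # only stop when DEBUG is enabled
        import ipdb; ipdb.set_trace()
    do_something()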
Another way to do this is with levels of logging. For example, Python's built-in logging module has error, warning, info, and debug levels:
import logging
logging.basicConfig(level=logging.DEBUG)  # configure the level once, e.g. at startup
logger = logging.getLogger()
logger.info('Normal message')
logger.debug('Message that only gets printed with high verbosity')
Simply configure the logging level to debug, warn, etc., and you're basically done! Plus you get lots of native logging goodies.
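Regarding the remaining question: with args.verbosity as vb: won't work as written, because a plain integer is not a context manager. The underlying idea can be made to work, though, for example with a module-level contextvars.ContextVar (Python 3.7+); here is a sketch, with all names illustrative rather than taken from the original code:
from contextlib import contextmanager
from contextvars import ContextVar

verbosity = ContextVar("verbosity", default=0)

@contextmanager
def verbose(level):
    token = verbosity.set(level)   # raise the verbosity inside the block
    try:
        yield level
    finally:
        verbosity.reset(token)     # restore the previous level on exit

def my_func(arg0, arg1):           # no verbosity in the signature
    if verbosity.get() > 2:
        print(f"doing something to {arg0} and {arg1}...")

with verbose(3) as vb:
    my_func("x", "y")              # prints, because verbosity is 3 here
That said, as the answers above point out, logging levels give you the same effect with far less machinery.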

How do you deal with print() once you're done with debugging/coding

To python experts:
I put lots of print() to check the value of my variables.
Once I'm done, I need to delete the print() calls. That's quite time-consuming and prone to human error.
I'd like to learn how you deal with print(). Do you delete it while coding, or delete it at the end? Or is there a method to delete it automatically? Or do you not use print() to check variable values at all?
Use logging instead and here is why.
The logging module comes with many simple and handy log methods, like debug, info, warning, error and critical.
For example:
logging.debug('<debug stuff>')
EDIT
I am not sure I correctly understood your need to disable logging once you are done, but if you want to do away with console logging you can try:
#logger = logging.getLogger() # this gets the root logger
logger = logging.getLogger('my-logger')
logger.propagate = False
# now if you use logger it will not log to console.
EDIT
I think you want to remove all your print()/logging statements from your code once you've made sure everything is working fine.
Try using a regex to comment out all print statements:
find . -name '<filename>.py' -exec sed -ri "s/(^\s*)(print.*$)/#\1\2/g" {} \;
You can use logging at the DEBUG level and, once the debugging is completed, change the level to INFO. Any statements using logger.debug() will then no longer be printed.
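For example (a sketch; the names are illustrative):
import logging

logging.basicConfig(level=logging.DEBUG)   # while developing
logger = logging.getLogger(__name__)
logger.debug("x = %r", 42)                 # visible during debugging

# When you're done, change that one line instead of deleting every call:
# logging.basicConfig(level=logging.INFO)  # debug() calls go silent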
What I do is put print statements in with a special text marker in the string. I usually use print("XXX", thething). Then I just search for and delete the lines with that string. It's also easier to spot in the output.

how to replace print debug messages with the logging module

Up to now, I've been peppering my code with 'print debug message' and even 'if condition: print debug message'. But a number of people have told me that's not the best way to do it, and I really should learn how to use the logging module. After a quick read, it looks as though it does everything I could possibly want, and then some. It looks like a learning project in its own right, and I want to work on other projects now and simply use the minimum functionality to help me. If it makes any difference, I am on python 2.6 and will be for the foreseeable future, due to library and legacy compatibilities.
All I want to do at the moment is pepper my code with messages that I can turn on and off section by section, as I manage to debug specific regions. As a 'hello_log_world', I tried this, and it doesn't do what I expected
import logging
# logging.basicConfig(level=logging.DEBUG)
logging.error('first error')
logging.debug('first debug')
logging.basicConfig(level=logging.DEBUG)
logging.error('second error')
logging.debug('second debug')
You'll notice I'm using the really basic config, with as many defaults as possible, to keep things simple. But it appears that it's too simple, or that I don't understand the programming model behind logging.
I had expected that sys.stderr would end up with
ERROR:root:first error
ERROR:root:second error
DEBUG:root:second debug
... but only the two error messages appear. Setting level=DEBUG doesn't make the second one appear. If I uncomment the basicConfig call at the start of the program, all four get output.
Am I trying to run it at too simple a level?
What's the simplest thing I can add to what I've written there to get my expected behaviour?
Logging actually follows a particular hierarchy (DEBUG -> INFO -> WARNING -> ERROR -> CRITICAL), and the default level is WARNING. Therefore you see the two ERROR messages: ERROR is above WARNING in the hierarchy.
As for the odd commenting behavior, the explanation is found in the logging docs (which as you say are a task unto themselves :) ):
The call to basicConfig() should come before any calls to debug(),
info() etc. As it’s intended as a one-off simple configuration
facility, only the first call will actually do anything: subsequent
calls are effectively no-ops.
However, you can use the setLevel method to get what you desire:
import logging
logging.getLogger().setLevel(logging.ERROR)
logging.error('first error')
logging.debug('first debug')
logging.getLogger().setLevel(logging.DEBUG)
logging.error('second error')
logging.debug('second debug')
The lack of an argument to getLogger() means that the root logger is modified. This is essentially one step before #del's (good) answer, where you start getting into multiple loggers, each with their own specific properties/output levels/etc.
Rather than modifying the logging levels in your code to control the output, you should consider creating multiple loggers, and setting the logging level for each one individually. For example:
import logging
first_logger = logging.getLogger('first')
second_logger = logging.getLogger('second')
logging.basicConfig()
first_logger.setLevel(logging.ERROR)
second_logger.setLevel(logging.DEBUG)
first_logger.error('first error')
first_logger.debug('first debug')
second_logger.error('second error')
second_logger.debug('second debug')
This outputs:
ERROR:first:first error
ERROR:second:second error
DEBUG:second:second debug

Switching off debug prints

Sometimes I have a lot of prints scattered around a function to produce debug output.
To switch these debug outputs on and off I came up with this:
def f(debug=False):
    print = __builtins__.print if debug else lambda *p: None
Or, if I need to print something apart from debug messages, I create a dprint function for the debug messages.
The problem is that when debug=False, these print statements slow down the code considerably, because lambda *p: None is still called, and function invocations are known to be slow.
So, my question is: is there any better way to efficiently disable all these debug prints, so they don't affect the code's performance?
All the answers are about my not using the logging module. That's good to note, but it doesn't answer the question of how to avoid the function invocations that slow down the code considerably (in my case, 25 times), if that's even possible, for example by tinkering with the function's code object to throw away all the lines with print statements, or some other way. What these answers suggest is replacing print with logging.debug, which should be even slower. This question is about getting rid of those function calls completely.
I tried using logging instead of lambda *p: None, and no surprise, the code became even slower.
Maybe someone would like to see the code where those prints caused the 25x slowdown: http://ideone.com/n5PGu
And I don't have anything against the logging module. I think it's good practice to stick to robust solutions without hacks. But I think there is nothing criminal about using such hacks in a 20-line one-time code snippet.
Not as a restriction, but as a suggestion: maybe it's possible to delete some lines (e.g. those starting with print) from the function's source code and recompile it? I laid out this approach in the answer below. Though I would like to see some comments on that solution, I welcome other approaches to solving this problem.
You should use the logging module instead. See http://docs.python.org/library/logging.html
Then you can set the log level depending on your needs, and create multiple logger objects that log about different subjects.
import logging
#set your log level
logging.basicConfig(level=logging.DEBUG)
logging.debug('This is a log message')
In your case: you could simply replace your print statement with a log statement, e.g.:
import logging
print = __builtins__.print if debug else logging.debug
Now the function will only print anything if you set the logging level to DEBUG:
logging.basicConfig(level=logging.DEBUG)
But as a plus, you can use all the other logging features on top, e.g. logging.error('error!').
Ned Batchelder wrote in a comment:
I suspect the slow down is in the calculation of the arguments to
your debug function. You should be looking for ways to avoid those
calculations. Preprocessing Python is just a distraction.
And he is right: the slowdown is actually caused by formatting the string with the format method, which happens regardless of whether the resulting string is logged or not.
So, string formatting should be deferred, and skipped entirely if no logging will occur. This can be achieved by refactoring the dprint function, or by using log.debug in the following way:
log.debug('formatted message: %s', interpolated_value)
If the message won't be logged, it won't be formatted, unlike with print, where the string is always formatted regardless of whether it'll be logged or discarded.
The solution using log.debug's postponed formatting was given by Martijn Pieters here.
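The refactored dprint mentioned above could look like this (a sketch; the point is that the % interpolation only runs when the message will actually be shown):
DEBUG = False

def dprint(fmt, *args):
    '''Debug print with deferred formatting, mirroring logging's lazy style.'''
    if DEBUG:
        print(fmt % args)

dprint('formatted message: %s', 'interpolated_value')  # no formatting cost while DEBUG is off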
Another solution could be to dynamically edit the code of f and delete all dprint calls. But this solution is strongly discouraged:
You are correct, you should never resort to this, there are so many
ways it can go wrong. First, Python is not a language designed for
source-level transformations, and it's hard to write a transformer
such as comment_1 without gratuitously breaking valid code. Second,
this hack would break in all kinds of circumstances - for example,
when defining methods, when defining nested functions, when used in
Cython, when inspect.getsource fails for whatever reason. Python is
dynamic enough that you really don't need this kind of hack to
customize its behavior.
Here is the code for this approach, for those who would like to get acquainted with it:
from __future__ import print_function

DEBUG = False

def dprint(*args, **kwargs):
    '''Debug print'''
    print(*args, **kwargs)

_blocked = False
def nodebug(name='dprint'):
    '''Decorator to remove all statements that are bare calls to `name`'''
    def helper(f):
        global _blocked
        if _blocked:
            return f
        import inspect, ast, sys
        source = inspect.getsource(f)
        a = ast.parse(source)  # get the AST of f
        class Transformer(ast.NodeTransformer):
            '''Deletes all top-level expression statements that call `name`'''
            def visit_Expr(self, node):  # visit all expression statements
                try:
                    if node.value.func.id == name:  # a bare call to `name`...
                        return None                 # ...delete it
                except AttributeError:              # not a simple call; leave it
                    pass
                return node  # return the node unchanged
        transformer = Transformer()
        a_new = transformer.visit(a)
        f_new_compiled = compile(a_new, '<string>', 'exec')
        env = sys.modules[f.__module__].__dict__
        _blocked = True  # keep the decorator from re-running during exec
        try:
            exec(f_new_compiled, env)
        finally:
            _blocked = False
        return env[f.__name__]
    return helper

@nodebug('dprint')
def f():
    dprint('f() started')
    print('Important output')
    dprint('f() ended')
    print('Important output2')

f()
More information: Replacing parts of the function code on-the-fly
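As an aside, CPython offers a built-in, compile-time version of this idea: statements guarded by the constant __debug__ are stripped from the bytecode entirely when the interpreter runs with the -O flag, so neither the call nor the evaluation of its arguments survives. A small sketch (the function names are illustrative):
def expensive():
    return sum(range(10**6))

def f():
    if __debug__:                      # this whole block is removed from the
        print('debug:', expensive())   # compiled bytecode under "python -O"
    return 42

f()  # prints when run normally; silent and cheap under python -O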
As a hack, yes, that works. (And there is no chance in hell those lambda no-ops are your app's bottleneck.)
However, you really should be doing logging properly by using the logging module.
See http://docs.python.org/howto/logging.html#logging-basic-tutorial for a basic example of how this should be done.
You definitely should use Python's logging module; it's very practical, and you can change the log level of your application. Example:
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> logging.debug('Test.')
DEBUG:root:Test.
