Python's pprint function, which I use for logging dictionaries, is quite slow for larger objects. I cannot find a way to completely turn off the processing of that function using the standard logging library. Run the code below as an example:
import logging

logging.disable(50)
logging.log(msg=sum(range(20000000)), level=0)
Even though the result did not show up, the sum was still definitely computed (crank up the number inside range to see what I mean). Is there a standard way in the logging module to disable the computation completely? If not, other suggestions are also welcome.
The problem is that anything at the call site (this includes arguments passed into a function, just not the interior of the function) is evaluated as soon as Python sees it, i.e. it is computed before Python ever calls the log function. To solve this, put your logic inside a callable and only execute that logic if it is appropriate:
import logging

def lazy_log(callable_fn, log_name, log_level):
    logger = logging.getLogger(log_name)
    # Bail out before doing any work if this level would not be logged anyway
    if logger.getEffectiveLevel() > log_level:
        return
    logger.log(msg=str(callable_fn()), level=log_level)
I guess this might work for you:
lazy_log(lambda: sum(range(1000000)), None, 51)
Sometimes I have a lot of prints scattered around a function to produce debug output.
To switch these debug outputs on and off, I came up with this:
from __future__ import print_function  # so print is a function and can be shadowed

def f(debug=False):
    print = __builtins__.print if debug else lambda *p: None
Or, if I need to print something besides debug messages, I create a dprint function for the debug messages.
The problem is that, when debug=False, these print statements slow down the code considerably, because lambda *p: None is still called, and function invocations are known to be slow.
So, my question is: is there any better way to efficiently disable all these debug prints so that they do not affect the code's performance?
All the answers are about my not using the logging module. That is good to note, but it doesn't answer the question of how to avoid the function invocations that slow down the code considerably (25 times in my case), if that is possible at all (for example, by tinkering with the function's code object to throw away all the lines with print statements, or some other way). What these answers suggest is replacing print with logging.debug, which should be even slower. This question is about getting rid of those function calls completely.
I tried using logging instead of lambda *p: None, and, no surprise, the code became even slower.
Maybe someone would like to see the code where those prints caused a 25x slowdown: http://ideone.com/n5PGu
And I don't have anything against the logging module. I think it's good practice to always stick to robust solutions without hacks. But I think there is nothing criminal about using such hacks in a 20-line, one-off code snippet.
Not as a restriction, but as a suggestion: maybe it's possible to delete some lines (e.g. those starting with print) from the function's source code and recompile it? I laid out this approach in the answer below. Although I would like to see some comments on that solution, I welcome other approaches to solving this problem.
You should use the logging module instead. See http://docs.python.org/library/logging.html
Then you can set the log level depending on your needs, and create multiple logger objects that log about different subjects.
import logging
#set your log level
logging.basicConfig(level=logging.DEBUG)
logging.debug('This is a log message')
In your case: you could simply replace your print statement with a log statement, e.g.:
import logging
print = __builtins__.print if debug else logging.debug
Now the function will only print anything if you set the logging level to DEBUG:
logging.basicConfig(level=logging.DEBUG)
But as a plus, you can use all other logging features on top! logging.error('error!')
Ned Batchelder wrote in a comment:
I suspect the slow down is in the calculation of the arguments to your debug function. You should be looking for ways to avoid those calculations. Preprocessing Python is just a distraction.
And he is right: the slowdown is actually caused by formatting the string with the format method, which happens regardless of whether the resulting string will be logged or not.
So, string formatting should be deferred and skipped when no logging will occur. This can be achieved by refactoring the dprint function or by using log.debug in the following way:
log.debug('formatted message: %s', interpolated_value)
If the message won't be logged, it won't be formatted, unlike with print, where the string is always formatted regardless of whether it will be logged or discarded.
The solution with log.debug's deferred formatting was given by Martijn Pieters here.
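For completeness, here is a hedged sketch of what refactoring dprint for deferred formatting might look like; the printf-style signature is my assumption, not the question's original code:
DEBUG = False

def dprint(fmt, *args):
    # Only build the final string when debugging is on,
    # so the formatting cost is skipped entirely otherwise.
    if DEBUG:
        print(fmt % args if args else fmt)

dprint('formatted message: %s', 'some interpolated value')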
Another solution could be to dynamically edit the code of f and delete all dprint calls. But this solution is strongly discouraged:
You are correct, you should never resort to this, there are so many ways it can go wrong. First, Python is not a language designed for source-level transformations, and it's hard to write a transformer such as this one without gratuitously breaking valid code. Second, this hack would break in all kinds of circumstances - for example, when defining methods, when defining nested functions, when used in Cython, or when inspect.getsource fails for whatever reason. Python is dynamic enough that you really don't need this kind of hack to customize its behavior.
Here is the code for this approach, for those who would like to get acquainted with it:
from __future__ import print_function

DEBUG = False

def dprint(*args, **kwargs):
    '''Debug print'''
    print(*args, **kwargs)

_blocked = False
def nodebug(name='dprint'):
    '''Decorator that removes every standalone expression calling the function `name`.'''
    def helper(f):
        global _blocked
        if _blocked:
            return f
        import inspect, ast, sys
        source = inspect.getsource(f)
        a = ast.parse(source)  # get the AST of f

        class Transformer(ast.NodeTransformer):
            '''Deletes all top-level expression statements that call `name`.'''
            def visit_Expr(self, node):  # visit every expression statement
                try:
                    if node.value.func.id == name:  # the expression is a call to `name`
                        return None  # delete it
                except AttributeError:  # not a plain function call
                    pass
                return node  # keep the node unchanged

        transformer = Transformer()
        a_new = transformer.visit(a)
        f_new_compiled = compile(a_new, '<string>', 'exec')

        env = sys.modules[f.__module__].__dict__
        _blocked = True  # the re-executed source applies this decorator again; skip it then
        try:
            exec(f_new_compiled, env)
        finally:
            _blocked = False
        return env[f.__name__]
    return helper

@nodebug('dprint')
def f():
    dprint('f() started')
    print('Important output')
    dprint('f() ended')
    print('Important output2')

f()
More information: Replacing parts of the function code on-the-fly
As a hack, yes, that works. (And there is no chance in hell those lambda no-ops are your app's bottleneck.)
However, you really should be doing logging properly by using the logging module.
See http://docs.python.org/howto/logging.html#logging-basic-tutorial for a basic example of how this should be done.
You definitely need to use Python's logging module; it's very practical, and you can change the log level of your application. Example:
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> logging.debug('Test.')
DEBUG:root:Test.
Let's say I have two Python scripts, A.py and B.py. I'm looking for a way to run B from within A in such a way that:
B believes it is __main__ (so that code in an if __name__=="__main__" block in B will run)
B is not actually __main__ (so that it does not, e.g., overwrite the "__main__" entry in sys.modules)
Exceptions raised within B propagate to A (i.e., could be caught with an except clause in A).
Those exceptions, if not caught, generate a correct traceback referencing line numbers within B.
I've tried various techniques, but none seem to satisfy all my requirements.
using tools from the subprocess module means exceptions in B do not propagate to A.
execfile("B.py", {}) runs B, but it doesn't think it's main.
execfile("B.py", {'__name__': '__main__'}) makes B.py think it's main, but it also seems to screw up the exception traceback printing, so that the tracebacks refer to lines within A (i.e., the real __main__).
using imp.load_source with __main__ as the name almost works, except that it actually modifies sys.modules, thus stomping on the existing value of __main__
Is there any way to get what I want?
(The reason I'm doing this is because I'm doing some cleanup on an existing library. This library has no real test suite, just a set of "example" scripts that produce certain output. I'm trying to leverage these as tests to ensure that my cleanup doesn't affect the library's ability to execute these examples, so I want to run each example script from within my test suite. I'd like to be able to see exceptions from these scripts within the test script so the test script can report the type of failure, instead of just reporting a generic SubprocessError whenever an example script raises some exception.)
Your use case makes sense, but I still think you'd be better off refactoring the tests such that they can be run externally.
Do your test scripts have something like this?
def test():
    pass

if __name__ == '__main__':
    test()
If not, perhaps you should convert your tests to calling a function such as test. Then, from your main test script, you can just:
import test1
test1.test()
import test2
test2.test()
Provide a common interface for running tests that the tests themselves use. Having a big block of code in a __main__ check is Not A Good Thing.
Sorry that I didn't answer the question you asked, but I feel this is the correct solution without deviating too far from the original test code.
Answering my own question because the result is kind of interesting and might be useful to others:
It turns out I was wrong: execfile("B.py", {'__name__': '__main__'}) is the way to go after all. It does correctly produce the tracebacks. What I was seeing with incorrect line numbers weren't exceptions but warnings. These warnings were produced using warnings.warn("blah", stacklevel=2). The stacklevel=2 argument is supposed to allow things like deprecation warnings to be raised where the deprecated thing is used, rather than at the warning call itself (see the documentation).
However, it seems that the execfile-d file doesn't count as a "stack level" for this purpose, and is invisible for stacklevel purposes. So if code at the top level of an execfile-d module causes a warning with stacklevel 2, the warning is not raised at the right line number in the execfile-d source file; instead it is raised at the corresponding line number of the file which is running the execfile.
This is unfortunate, but I can live with it, since these are only warnings, so they won't impact the actual performance of the tests. (I didn't notice at first that it was only the warnings that were affected by the line-number mismatches, because there were lots of warnings and exceptions intermixed in the test output.)
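For reference, a minimal sketch of this approach; the file name B.py is purely illustrative, and it is Python 2 code since execfile was removed in Python 3 (runpy.run_path("B.py", run_name="__main__") is the closest modern equivalent):
# A.py
try:
    # B.py runs believing it is __main__, without touching sys.modules
    execfile("B.py", {'__name__': '__main__'})
except Exception:
    # Exceptions raised inside B.py propagate here, and their tracebacks
    # reference line numbers within B.py
    import traceback
    traceback.print_exc()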
Is there a way to view a list of all of my variables in Python while the program is running, without setting breakpoints? Printing is too messy because I have a lot of variables that are constantly changing.
Thanks
Maybe the inspect module helps you, but you have to filter the information yourself.
Usage like:
>>> import inspect
>>> a = 5
>>> f = inspect.currentframe()
>>> print f.f_locals
{..., 'a': 5, ...}
Maybe it's worth mentioning that you cannot simply iterate over the resulting dictionary in a for loop, because the loop's assignment to a variable would change that dictionary. You have to iterate over the keys only (at least that's what I just found out).
Example:
for v in f.f_locals.keys():
    if not v.startswith("_"):
        print v
Look at the first line: simply writing for v in f.f_locals would not succeed.
If you are running on PyDev (the Python extension for Eclipse), you can easily watch your variables; however, you'll need to set a breakpoint initially and then step into/over your code.
Grab a debugger: http://winpdb.org/ - then you can watch your variables and see them in there. However, if you're just looking for a function to print all the variables defined in the local scope, just call locals().
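For example (a tiny sketch; the function and variable names are made up):
def compute(x):
    y = x * 2
    print locals()   # e.g. {'y': 6, 'x': 3}

compute(3)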
If your question is referring to Python as a language (as opposed to "Python as an ecosystem of applications and utilities"), AFAIK the answer is "no" (not out of the box, anyhow). Of course, there are a number of IDEs that allow you to use a debugger to see the values of variables [see Sumit's answer, for example].
However, if during development you wanted a "live monitor" of a number of variables, you could use the logging module and define your own Handler class to redirect the information "live" somewhere, for instance as in the sketch below. Setting the messages to the debug level would make it trivial to activate/deactivate this feature by simply changing the minimum log level of your application.
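A minimal sketch of what such a handler could look like; the LiveMonitorHandler class, the 'livemonitor' logger name, and the extra={'value': ...} convention are my own assumptions, not part of the logging API:
import logging

class LiveMonitorHandler(logging.Handler):
    """Keeps the most recently reported value for each variable name."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.snapshot = {}

    def emit(self, record):
        # Expect records emitted as: logger.debug('varname', extra={'value': ...})
        self.snapshot[record.getMessage()] = record.__dict__.get('value')

monitor = LiveMonitorHandler()
logger = logging.getLogger('livemonitor')
logger.addHandler(monitor)
logger.setLevel(logging.DEBUG)    # raise to logging.INFO to deactivate the monitor

x = 42
logger.debug('x', extra={'value': x})
print(monitor.snapshot)           # {'x': 42}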
HTH!
I have been coding a lot in Python of late. And I have been working with data that I haven't worked with before, using formulae never seen before and dealing with huge files. All this made me write a lot of print statements to verify if it's all going right and identify the points of failure. But, generally, outputting so much information is not a good practice. How do I use the print statements only when I want to debug and let them be skipped when I don't want them to be printed?
The logging module has everything you could want. It may seem excessive at first, but only use the parts you need. I'd recommend using logging.basicConfig to toggle the logging level to stderr and the simple log methods, debug, info, warning, error and critical.
import logging, sys
logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
logging.debug('A debug message!')
logging.info('We processed %d records', len(processed_records))
A simple way to do this is to call a logging function:
DEBUG = True

def log(s):
    if DEBUG:
        print s

log("hello world")
Then you can change the value of DEBUG and run your code with or without logging.
The standard logging module has a more elaborate mechanism for this.
Use the logging built-in library module instead of printing.
You create a Logger object (say logger), and then after that, whenever you insert a debug print, you just put:
logger.debug("Some string")
You can use logger.setLevel at the start of the program to set the output level. If you set it to DEBUG, it will print all the debug messages. Set it to INFO or higher and immediately all of the debug messages will disappear.
You can also use it to log more serious things, at different levels (INFO, WARNING and ERROR).
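For instance, a minimal sketch of that setup (the logger name 'myapp' is arbitrary):
import logging

logging.basicConfig()                  # attach a default stderr handler to the root logger
logger = logging.getLogger('myapp')
logger.setLevel(logging.DEBUG)         # show debug output; switch to logging.INFO to hide it

logger.debug("Some string")            # printed only while the level is DEBUG
logger.warning("Something more serious")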
First off, I will second the nomination of python's logging framework. Be a little careful about how you use it, however. Specifically: let the logging framework expand your variables, don't do it yourself. For instance, instead of:
logging.debug("datastructure: %r" % complex_dict_structure)
make sure you do:
logging.debug("datastructure: %r", complex_dict_structure)
because while they look similar, the first version incurs the repr() cost even if logging is disabled, whereas the second version avoids it. Similarly, if you roll your own, I'd suggest something like:
def debug_stdout(sfunc):
    print(sfunc())

debug = debug_stdout
called via:
debug(lambda: "datastructure: %r" % complex_dict_structure)
which will, again, avoid the overhead if you disable it by doing:
def debug_noop(*args, **kwargs):
    pass

debug = debug_noop
The overhead of computing those strings probably doesn't matter unless they're either 1) expensive to compute or 2) the debug statement is in the middle of, say, an n^3 loop or something. Not that I would know anything about that.
I don't know about others, but I used to define a "global constant" (DEBUG) and then a global function (debug(msg)) that would print msg only if DEBUG == True.
Then I write my debug statements like:
debug('My value: %d' % value)
...then I picked up unit testing and never did this again! :)
A better way to debug the code is by using the clrprint module.
It prints colorful output only when you pass the parameter debug=True:
from clrprint import *
clrprint('ERROR:', information, clr=['r', 'y'], debug=True)
I love being able to modify the arguments that get sent to a function using settrace, like:
import sys

def trace_func(frame, event, arg):
    value = frame.f_locals["a"]
    if value % 2 == 0:
        value += 1
    frame.f_locals["a"] = value

def f(a):
    print a

if __name__ == "__main__":
    sys.settrace(trace_func)
    for i in range(0, 5):
        f(i)
And this will print:
1
1
3
3
5
What other cool stuff can you do using settrace?
I would strongly recommend against abusing settrace. I'm assuming you understand this stuff, but others coming along later may not. There are a few reasons:
Settrace is a very blunt tool. The OP's example is a simple one, but there's practically no way to extend it for use in a real system.
It's mysterious. Anyone coming to look at your code would be completely stumped why it was doing what it was doing.
It's slow. Invoking a Python function for every line of Python executed is going to slow down your program by many multiples.
It's usually unnecessary. The original example here could have been accomplished in a few other ways (modify the function, wrap the function in a decorator, call it via another function, etc.), any of which would have been better than settrace; see the decorator sketch after this list.
It's hard to get right. In the original example, if you had not called f directly, but instead called g which called f, your trace function wouldn't have done its job, because you returned None from the trace function, so it's only invoked once and then forgotten.
It will keep other tools from working. This program will not be debuggable (because debuggers use settrace), it will not be traceable, it will not be possible to measure its code coverage, etc. Part of this is due to lack of foresight on the part of the Python implementors: they gave us settrace but no gettrace, so it's difficult to have two trace functions that work together.
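For illustration, a hedged sketch of the decorator alternative mentioned in the list above - it reproduces the "bump even arguments to odd" behaviour of the question's example without any trace function:
def make_arg_odd(func):
    # Wrap f so that even arguments are incremented before the call
    def wrapper(a):
        if a % 2 == 0:
            a += 1
        return func(a)
    return wrapper

@make_arg_odd
def f(a):
    print a

for i in range(0, 5):
    f(i)    # prints 1 1 3 3 5, just like the settrace version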
Trace functions make for cool hacks. It's fun to be able to abuse it, but please don't use it for real stuff. If I sound hectoring, I apologize, but this has been done in real code, and it's a pain. For example, DecoratorTools uses a trace function to perform the magic feat of making this syntax work in Python 2.3:
# Method decorator example
from peak.util.decorators import decorate

class Demo1(object):
    decorate(classmethod)   # equivalent to @classmethod
    def example(cls):
        print "hello from", cls
A neat hack, but unfortunately, it meant that any code that used DecoratorTools wouldn't work with coverage.py (or debuggers, I guess). Not a good tradeoff if you ask me. I changed coverage.py to provide a mode that lets it work with DecoratorTools, but I wish I hadn't had to.
Even code in the standard library sometimes gets this stuff wrong. Pyexpat decided to be different from every other extension module and invoke the trace function as if it were Python code. Too bad they did a bad job of it.
</rant>
I made a module called pycallgraph which generates call graphs using sys.settrace().
Of course, code coverage is accomplished with the trace function. One cool thing we haven't had before is branch coverage measurement, and that's coming along nicely, about to be released in an alpha version of coverage.py.
So for example, consider this function:
def foo(x):
    if x:
        y = 10
    return y
if you test it with this call:
assert foo(1) == 10
then statement coverage will tell you that all the lines of the function were executed. But of course, there's a simple problem in that function: calling it with 0 raises an UnboundLocalError.
Branch measurement would tell you that there's a branch in the code that isn't fully exercised, because only one leg of the branch is ever taken.
For example, get the memory consumption of Python code line-by-line: http://pypi.python.org/pypi/memory_profiler
One recent project that uses settrace heavily is PySnooper.
It helps new programmers to trace/log/monitor their program's output. Cheers!
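A rough usage sketch (the decorator-based API shown here follows PySnooper's documentation; number_to_bits is just an illustrative function):
import pysnooper

@pysnooper.snoop()      # logs every executed line and variable change to stderr
def number_to_bits(number):
    bits = []
    while number:
        number, remainder = divmod(number, 2)
        bits.insert(0, remainder)
    return bits

number_to_bits(6)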
I don't have an exhaustively comprehensive answer, but one thing I did with it, with the help of another user on SO, was create a program that generates the trace tables of other Python programs.
The Python debugger pdb uses sys.settrace to analyse the lines being debugged.
Here's a C optimization/extension for pdb that also uses sys.settrace:
https://bitbucket.org/jagguli/cpdb