Python disabled logging slowing script

I am using the built-in Python "logging" module for my script. When I set verbosity to "info", it seems like my "debug" messages are significantly slowing down my script.
Some of my "debug" messages print large dictionaries and I'm guessing Python is expanding the text before realizing "debug" messages are disabled. Example:
import pprint
pp = pprint.PrettyPrinter(indent=4)
logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
How can I improve performance? I'd prefer to keep using Python's built-in logging module, but I need to figure out a "clean" way to solve this issue.

logging already has a built-in feature for the check mentioned by dankal444, which is slightly neater:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
Another possible approach is to use %-formatting, which only does the formatting when actually needed (the logging event has to be processed by a handler as well as a logger to get to that point). I know f-strings are the new(ish) hotness and are performant, but it all depends on the exact circumstances as to which will offer the best result.
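In the simplest case that just means passing the object as a lazy argument (a small sketch; note this uses str(obj) rather than the pretty-printed form, so the output differs from the pformat version):

# Interpolation is deferred: the dict is only turned into a string if a handler
# actually emits the DEBUG record.
logger.debug("Large Dict Object: %s", obj)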
An example of taking advantage of lazy %-formatting:
class DeferredFormatHelper:
    def __init__(self, func, *args, **kwargs):
        self.func = func  # assumed to return a string
        self.args = args
        self.kwargs = kwargs

    def __str__(self):
        # This is called because the format string contains
        # a %s for this logging argument, lazily if and when
        # the formatting needs to happen
        return self.func(*self.args, **self.kwargs)

if logger.isEnabledFor(logging.DEBUG):
    arg = DeferredFormatHelper(pp.pformat, obj)
    logger.debug('Large Dict Object: %s', arg)

Check if the current level is good enough:
if logger.getEffectiveLevel() <= logging.DEBUG:
    logger.debug(f"Large Dict Object: {pp.pformat(obj)}")
This is not super clean, but it's the best I can think of. You only need to wrap the performance bottlenecks in this check.

I can't verify where your bottleneck is, but if it's the pprint call, your logger never gets a chance to do anything about it: the formatting happens before the logger is even invoked. Rewriting your example to clarify:
from pprint import PrettyPrinter
import logging
logger = logging.getLogger()
large_object = {"very": "large container"}
pp = PrettyPrinter(indent=4)
# This is done first.
formatted_msg = pp.pformat(large_object)
# It's already formatted when it's sent to your logger.
logger.debug(f"Large dict object: {formatted_msg}")

Related

Declare python module in yaml

I have a YAML file with some fields whose values are meaningful in Python, but they get parsed as strings, not as the Python objects I meant. This is my sample:
verbose:
  level: logging.DEBUG
and obviously, when I load it, the value is of string type:
config = yaml.load(args.config.read(), Loader=yaml.SafeLoader)
I have no idea how to get the actual logging.DEBUG object rather than its string representation.
Note that I'm not looking for a way to configure logging and get a logger; logging here is just an example of a Python module.
There's no out-of-the-box way to do that. The simplest and safest approach seems to be processing the values manually, e.g.:
import logging
class KnownModules:
    logging = logging
    ...

def parse_value(s):
    v = KnownModules
    for p in s.split('.'):
        v = getattr(v, p)  # remember to handle AttributeError
    return v
However, if you're ok with slightly changing your YAML structure, PyYAML supports some custom YAML tags. For example:
verbose:
  level: !!python/name:logging.DEBUG
will make config['verbose']['level'] equal to logging.DEBUG (i.e. 10).
Considering that you're (correctly) using SafeLoader, you may need to combine those methods by defining your own tag.
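A rough sketch of what that combination could look like (the !py_name tag and the KNOWN_MODULES whitelist are my own invention, not part of PyYAML):

import logging
import yaml

KNOWN_MODULES = {"logging": logging}  # whitelist of modules the config may reference

def py_name_constructor(loader, node):
    value = loader.construct_scalar(node)      # e.g. "logging.DEBUG"
    module_name, _, attr_path = value.partition(".")
    obj = KNOWN_MODULES[module_name]           # KeyError for non-whitelisted modules
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj

yaml.SafeLoader.add_constructor("!py_name", py_name_constructor)

config = yaml.load("verbose:\n  level: !py_name logging.DEBUG",
                   Loader=yaml.SafeLoader)
assert config["verbose"]["level"] == logging.DEBUG  # i.e. 10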
The YAML loader has no knowledge of what logging.DEBUG might mean; it only sees the string "logging.DEBUG" (unless the value is tagged with a YAML tag).
For string values that need to be interpreted as e.g. references to module attributes, you will need to parse them after-the-fact, e.g.
def parse_logging_level(level_string: str):
    module, _, value = level_string.partition(".")
    assert module == "logging"
    return logging._nameToLevel[value]

# ...
yaml_data["verbose"]["level"] = parse_logging_level(yaml_data["verbose"]["level"])
Edit: Please see AKX's answer. I was not aware of logging._nameToLevel, which does not require defining your own enum and is definitely better than using eval. But I decided not to delete this answer, as I think the currently preferred design (available since Python 3.4), which uses enums, is worth mentioning (it would probably have been used in the logging module if it had been available back then).
If you are absolutely sure that the values provided in the config are legitimate ones, you can use eval like this:
import logging
levelStr = 'logging.DEBUG'
level = eval(levelStr)
But as said in the comments, if you are not sure about the values present in the config file, using eval could be disastrous (see the example provided by AKX in the comments).
A better design is to define an enum for this purpose. Unfortunately the logging module does not provide the levels as enum (they are just constants defined in the module), thus you should define your own.
from enum import Enum

class LogLevel(Enum):
    CRITICAL = 50
    FATAL = 50
    ERROR = 40
    WARNING = 30
    WARN = 30
    INFO = 20
    DEBUG = 10
    NOTSET = 0
and then you can use it like this:
levelStr = 'DEBUG'
levelInt = LogLevel[levelStr].value # Comparable with logging.DEBUG which is also an integer
But to use this you have to change your yml file a bit and replace logging.DEBUG with DEBUG.
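For completeness, a small sketch of loading the simplified file and mapping the value through the enum (assumes the LogLevel enum above):

import yaml

config = yaml.safe_load("verbose:\n  level: DEBUG")    # stands in for the real file
levelInt = LogLevel[config["verbose"]["level"]].value  # 10, same as logging.DEBUG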

Formatting the message in Python logging

I would like to understand whether, and how, it is possible to modify the message part of a log record using the Python logging module.
So basically, you can format a complete log as:
format = '{"timestamp": "%(asctime)s", "logger_level": "%(levelname)s", "log_message": %(message)s}'
However, I would like to make sure the message part is always in json format. Is there any way I can modify the format of only the message part, maybe with a custom logging.Formatter?
Thank you.
The format specification %(message)s tells Python you want to format a string. Try it with %(message)r and it should do the job:
>>> logging.error('{"log_message": %r}', {"a": 55})
ERROR:root:{"log_message": {'a': 55}}
There's an example in the Logging Cookbook which shows one way of doing this. Basically:
import json
import logging

class StructuredMessage:
    def __init__(self, message, /, **kwargs):
        self.message = message
        self.kwargs = kwargs

    def __str__(self):
        return '%s >>> %s' % (self.message, json.dumps(self.kwargs))

_ = StructuredMessage  # optional, to improve readability

logging.basicConfig(level=logging.INFO, format='%(message)s')
logging.info(_('message 1', foo='bar', bar='baz', num=123, fnum=123.456))
Of course, you can adapt this basic idea to do something closer to what you want/need.
Update: The formatting only happens if the message is actually output. Also, it won't apply to logging from third-party libraries. You would need to subclass Logger before importing any other modules which import logging to achieve that, but it's a documented approach.
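A minimal sketch of that subclassing idea (my own illustration, not the cookbook's code): install the subclass before any other module calls getLogger, and every message body gets serialized as JSON.

import json
import logging

class JSONMessageLogger(logging.Logger):
    def makeRecord(self, name, level, fn, lno, msg, args, exc_info,
                   func=None, extra=None, sinfo=None):
        record = super().makeRecord(name, level, fn, lno, msg, args,
                                    exc_info, func, extra, sinfo)
        # Replace the already-merged message with a JSON document.
        record.msg = json.dumps({"log_message": record.getMessage()})
        record.args = None
        return record

logging.setLoggerClass(JSONMessageLogger)  # must run before other modules call getLogger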

Python: Where to put logging.getLogger

I tried putting getLogger at module level. However, it has some disadvantages. Here is my example:
import logging
from logging.handlers import TimedRotatingFileHandler

log_monitor = logging.getLogger('monitorlog')
log_monitor.setLevel(logging.DEBUG)
log_monitor.propagate = False

handler = TimedRotatingFileHandler('somedirectory/monitor.log',
                                   when='h',
                                   interval=1,
                                   backupCount=30)
monitor_format = logging.Formatter('%(asctime)s: %(message)s')
handler.setFormatter(monitor_format)
log_monitor.addHandler(handler)
The problem is that when some other module imports this one, the above code is executed. It is possible that, at that time, somedirectory does not even exist yet, and the setup will fail.
Actually, this logger will be used in a class, so I am thinking of putting getLogger inside the class. However, if people create multiple objects of that class, that piece of code will be called multiple times, and I guess it is only supposed to be called once.
I am pretty new to Python. Where do people usually put their getLogger calls, and in this case, where should I put this piece of code?
Short answer: you just need to make sure you do your logger setup after the directory is created.
If you want to import the above and only then create the file, one way to do it is to put your code in a function.
def monitor_log_setup():
    log_monitor = logging.getLogger('monitorlog')
    log_monitor.setLevel(logging.DEBUG)
    log_monitor.propagate = False

    handler = TimedRotatingFileHandler('somedirectory/monitor.log',
                                       when='h',
                                       interval=1,
                                       backupCount=30)
    monitor_format = logging.Formatter('%(asctime)s: %(message)s')
    handler.setFormatter(monitor_format)
    log_monitor.addHandler(handler)

    return log_monitor
It is now safe to import this module; you just have to make sure the function is called before you want to start logging (after creating the directory).
You can then use logging.getLogger('monitorlog') to return the same logger as defined in the function whenever you need it throughout your code.
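For example, from any other module (a tiny sketch):

import logging

# Same name, same logger object: the handler attached in monitor_log_setup() is reused.
log_monitor = logging.getLogger('monitorlog')
log_monitor.info('using the already-configured monitor logger')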
I think the problem is that you are mixing up "import" with "init": you expect that after the caller imports the library or module, the log object is already available, and this behaviour leads to confusion.
I think the best practice is to provide an "init" method; the caller calls "init" to make the object available.
However, you could also provide an implicit init in the module, or just create the log file if it doesn't exist. A sketch of that explicit-init idea follows (the function name and the directory handling are illustrative):
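import logging
import os
from logging.handlers import TimedRotatingFileHandler

log_monitor = logging.getLogger('monitorlog')  # safe at import time, no handler yet

def init(log_dir='somedirectory'):
    # Called once by the application, after it knows where logs should live.
    if not os.path.isdir(log_dir):
        os.makedirs(log_dir)
    handler = TimedRotatingFileHandler(os.path.join(log_dir, 'monitor.log'),
                                       when='h', interval=1, backupCount=30)
    handler.setFormatter(logging.Formatter('%(asctime)s: %(message)s'))
    log_monitor.addHandler(handler)
    log_monitor.setLevel(logging.DEBUG)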

Most Pythonic way to provide function metadata at compile time?

I am building a very basic platform in the form of a Python 2.7 module. This module has a read-eval-print loop where entered user commands are mapped to function calls. Since I am trying to make it easy to build plugin modules for my platform, the function calls will be from my Main module to an arbitrary plugin module. I'd like a plugin builder to be able to specify the command that he wants to trigger his function, so I've been looking for a Pythonic way to remotely enter a mapping in the command->function dict in the Main module from the plugin module.
I've looked at several things:
Method name parsing: the Main module would import the plugin module and scan it for method names that match a certain format. For example, it might add the download_file_command(file) method to its dict as "download file" -> download_file_command. However, getting a concise, easy-to-type command name (say, "dl") requires that the function's name also be short, which isn't good for code readability. It also requires the plugin developer to conform to a precise naming format.

Cross-module decorators: decorators would let the plugin developer name his function whatever he wants and simply add something like @Main.register("dl"), but they would necessarily require that I both modify another module's namespace and keep global state in the Main module. I understand this is very bad.

Same-module decorators: using the same logic as above, I could add a decorator that adds the function's name to some command name->function mapping local to the plugin module and retrieve the mapping to the Main module with an API call. This requires that certain methods always be present or inherited though, and - if my understanding of decorators is correct - the function will only register itself the first time it is run and will unnecessarily re-register itself every subsequent time thereafter.
Thus, what I really need is a Pythonic way to annotate a function with the command name that should trigger it, and that way can't be the function's name. I need to be able to extract the command name->function mapping when I import the module, and any less work on the plugin developer's side is a big plus.
Thanks for the help, and my apologies if there are any flaws in my Python understanding; I'm relatively new to the language.
Building on the first part of @ericstalbot's answer, you might find it convenient to use a decorator like the following.
################################################################################
import functools

def register(command_name):
    def wrapped(fn):
        @functools.wraps(fn)
        def wrapped_f(*args, **kwargs):
            return fn(*args, **kwargs)
        wrapped_f.__doc__ += "(command=%s)" % command_name
        wrapped_f.command_name = command_name
        return wrapped_f
    return wrapped

################################################################################
@register('cp')
def copy_all_the_files(*args, **kwargs):
    """Copy many files."""
    print "copy_all_the_files:", args, kwargs

################################################################################
print "Command Name: ", copy_all_the_files.command_name
print "Docstring : ", copy_all_the_files.__doc__
copy_all_the_files("a", "b", keep=True)
Output when run:
Command Name: cp
Docstring : Copy many files.(command=cp)
copy_all_the_files: ('a', 'b') {'keep': True}
User-defined functions can have arbitrary attributes. So you could specify that plug-in functions have an attribute with a certain name. For example:
def a():
    return 1

a.command_name = 'get_one'
Then, in your module you could build a mapping like this:
import inspect  # from standard library
import plugin

mapping = {}
for v in plugin.__dict__.itervalues():
    if inspect.isfunction(v) and hasattr(v, 'command_name'):
        mapping[v.command_name] = v
To read about arbitrary attributes for user-defined functions see the docs
There are two parts in a plugin system:
Discover plugins
Trigger some code execution in a plugin
The proposed solutions in your question address only the second part.
There are many ways to implement both, depending on your requirements. E.g., to enable plugins, they could be specified in a configuration file for your application:
plugins = some_package.plugin_for_your_app
          another_plugin_module
          # ...
To implement loading of the plugin modules:
plugins = [importlib.import_module(name) for name in config.get("plugins")]
To get a dictionary: command name -> function:
commands = {name: func
for plugin in plugins
for name, func in plugin.get_commands().items()}
The plugin author can use any method to implement get_commands(), e.g., using prefixes or decorators — your main application shouldn't care, as long as get_commands() returns the command dictionary for each plugin.
For example, some_plugin.py (full source):
def f(a, b):
    return a + b

def get_commands():
    return {"add": f, "multiply": lambda x, y: x * y}
It defines two commands: add and multiply.
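A small sketch of how the main application's read-eval-print loop might dispatch on the resulting mapping (the command-line parsing here is illustrative):

# "add 1 2" -> commands["add"](1, 2) -> 3
parts = "add 1 2".split()
result = commands[parts[0]](*[int(x) for x in parts[1:]])
assert result == 3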

Easy Python ASync. Precompiler?

Imagine you have an IO-heavy function like this:
def getMd5Sum(path):
    with open(path) as f:
        return md5(f.read()).hexdigest()
Do you think Python is flexible enough to allow code like this (notice the $):
def someGuiCallback(filebutton):
    ...
    path = filebutton.getPath()
    md5sum = $getMd5Sum()
    showNotification("Md5Sum of file: %s" % md5sum)
    ...
To be executed something like this:
def someGuiCallback_1(filebutton):
    ...
    path = filebutton.getPath()
    Thread(target=someGuiCallback_2, args=(path,)).start()

def someGuiCallback_2(path):
    md5sum = getMd5Sum(path)
    glib.idle_add(someGuiCallback_3, md5sum)

def someGuiCallback_3(md5sum):
    showNotification("Md5Sum of file: %s" % md5sum)
    ...
(glib.idle_add just pushes a function onto the queue of the main thread)
I've thought about using decorators, but they don't allow me to access the 'content' of the function after the call (the showNotification part).
I guess I could write a 'compiler' to change the code before execution, but it doesn't seem like the optimal solution.
Do you have any ideas, on how to do something like the above?
You can use import hooks to achieve this goal...
PEP 302 - New Import Hooks
PEP 369 - Post Import Hooks
... but I'd personally view it as a little bit nasty.
If you want to go down that route though, essentially what you'd be doing is this:
You add an import hook for an extension (eg ".thpy")
That import hook is then responsible for (essentially) passing some valid code as a result of the import.
That valid code is given arguments that effectively relate to the file you're importing.
That means your precompiler can perform whatever transformations you like to the source on the way in.
On the downside:
Whilst using import hooks in this way will work, it will surprise the life out of any maintainer of your code. (Bad idea IMO)
The way you do this relies upon imputil, which has been removed in Python 3.0, which means code written this way has a limited lifetime.
Personally I wouldn't go there, but if you do, there's an issue of the Python Magazine where doing this sort of thing is covered in some detail, and I'd advise getting a back issue of that to read up on it. (Written by Paul McGuire, April 2009 issue, probably available as PDF).
Specifically, that article uses imputil and pyparsing as its example, but the principle is the same.
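For what it's worth, on Python 3 the same idea goes through importlib rather than imputil. A rough, heavily simplified sketch (the ".thpy" extension and the trivial source transformation are placeholders for a real precompiler):

import importlib.abc
import importlib.machinery
import importlib.util
import os
import sys

class ThpyLoader(importlib.machinery.SourceFileLoader):
    def source_to_code(self, data, path, *args, **kwargs):
        # The "precompiler" step: rewrite the source before compiling it.
        source = data.decode("utf-8").replace("$", "")  # placeholder transform
        return compile(source, path, "exec")

class ThpyFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        for entry in (path or sys.path):
            candidate = os.path.join(entry or ".", fullname.rpartition(".")[2] + ".thpy")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(
                    fullname, candidate, loader=ThpyLoader(fullname, candidate))
        return None

sys.meta_path.insert(0, ThpyFinder())  # "import mymodule" will now also find mymodule.thpy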
How about something like this:
def performAsync(asyncFunc, notifyFunc):
    def threadProc():
        retValue = asyncFunc()
        glib.idle_add(notifyFunc, retValue)
    Thread(target=threadProc).start()

def someGuiCallback(filebutton):
    path = filebutton.getPath()
    performAsync(
        lambda: getMd5Sum(path),
        lambda md5sum: showNotification("Md5Sum of file: %s" % md5sum)
    )
A bit ugly with the lambdas, but it's simple and probably more readable than using precompiler tricks.
Sure, you can access a function's (already compiled) code from a decorator, disassemble it and hack it. You can even access the source of the module it's defined in and recompile it. But I think this is not necessary. Below is an example using a decorated generator, where the yield statement serves as a delimiter between the synchronous and asynchronous parts:
from threading import Thread
import hashlib

def async(gen):
    def func(*args, **kwargs):
        it = gen(*args, **kwargs)
        result = it.next()
        Thread(target=lambda: list(it)).start()
        return result
    return func

@async
def test(text):
    # synchronous part (empty in this example)
    yield  # Use "yield value" if you need to return a meaningful value
    # asynchronous part[s]
    digest = hashlib.md5(text).hexdigest()
    print digest
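Usage would then look like this: the call returns as soon as the synchronous part has run, and the md5 work plus the print happen on a background thread.

test("some text")  # returns immediately; the digest is printed later from the thread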
