When using the logging library, when should I log using DEBUG, and when should I use INFO instead? All I know is that they are used to show what a program is doing during normal operation.
You can set up logging to only show messages of a certain level or above.
DEBUG and INFO are two such levels: INFO is the more neutral one, used for routine, non-essential messages, while DEBUG is the one you might use for output that helps you debug something.
It is up to you what you use each level for, and what levels you might want to see in your logs. If you disable a level, it will simply not be shown in logs.
The logging module has five standard levels, and you can set the level you need via the setLevel() method. See here: https://docs.python.org/3/library/logging.html
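For example, a minimal sketch of level filtering (using basicConfig here; setLevel() works the same way on an individual logger):

import logging

logging.basicConfig(level=logging.INFO)   # show INFO and above

logging.debug("not shown: DEBUG is below the configured level")
logging.info("shown: INFO meets the configured level")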
There are no predetermined roles other than DEBUG being a higher verbosity level than INFO.
Their names imply that INFO is supposed to report on a program's progress while DEBUG is to report info for diagnosing problems.
The key thing to watch for when choosing which level to use for a specific message is to make each level give a full picture of what is going on, with the corresponding level of detail. See How to debug a Python program running as a service? for details.
E.g., in one of my programs, which ran user-provided scripts to do tasks, I used:
INFO -- progress on tasks
VERBOSE (custom level with ID 15; see the sketch below) -- info for diagnosing problems in the user script
DEBUG -- info for diagnosing problems in the program itself
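A custom level like the VERBOSE one above can be registered with addLevelName(); here is a minimal sketch (the logger name and messages are just illustrations):

import logging

VERBOSE = 15                               # between DEBUG (10) and INFO (20)
logging.addLevelName(VERBOSE, "VERBOSE")
logging.basicConfig(level=VERBOSE)         # show VERBOSE and above, but not DEBUG

logger = logging.getLogger("tasks")        # hypothetical logger name
logger.info("task 3 of 10 complete")
logger.log(VERBOSE, "diagnosing a problem in the user script")
logger.debug("suppressed: internal detail")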
If you view your log messages as part of your application's user interface, INFO messages are for consumption by its administrators or users, whereas DEBUG messages are for consumption by its programmers. Messages should be designed and emitted with this in mind.
I'm using a task queue with Python (RQ). Since workers run concurrently, without any configuration, messages from all workers get mixed up.
I want to organize logging such that at any time I can get the exact full log for a given task run by a given worker. Workers run on different machines, so preferably logs would be sent over the network to some central collector, but to get started, I'd also be happy with local logging to file, as long as the messages of each task end up in a separate log file.
My question has two parts:
How to implement this in Python code. I suppose that, for the "log to file" case, I could do something like this at the beginning of each task function:
import logging

logging.basicConfig(filename="some_unique_id_for_this_task_and_worker.log",
                    level=logging.DEBUG, format="whatever")
logging.debug("My message...")
# etc.
but when it comes to logging over the network, I'm struggling to understand how the logger should be configured so that all log messages from the same task are recognizable at the collector. This is purposely vague because I haven't chosen a given technology or protocol to do this collection yet, and I'm looking for suggestions.
Assuming that the previous requirement can be accomplished, when logging over the network, what's a centralized solution that can give me the full log for a given task? I mean really showing me the full text log, not a search interface returning events or lines (as, e.g., IIRC, in Splunk or Elasticsearch).
Since you're running multiple processes (the RQ workers), you could probably use one of the recipes in the logging cookbook. If you want to use a SocketHandler and a socket server to receive messages and send them to a file, you should also look at this recipe in the cookbook; it has a part about running a socket listener in production.
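For the network case, a minimal sketch along the lines of the cookbook's SocketHandler recipe (the host name and the task_id field are my assumptions, not part of the recipe):

import logging
import logging.handlers

# Each worker sends records over TCP to a central listener; the cookbook
# recipe shows the matching socket server that receives and files them.
handler = logging.handlers.SocketHandler(
    "logserver.example.com",                    # assumed collector host
    logging.handlers.DEFAULT_TCP_LOGGING_PORT,  # 9020
)

logger = logging.getLogger("rq.task")           # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)

# SocketHandler pickles the record's attribute dict, so an extra field
# like this made-up task_id survives the trip and can be used at the
# collector to group all messages from one task.
logger.debug("My message...", extra={"task_id": "some_unique_id"})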
I am using an API to get some service from my project. The API call is taking too long, so I thought one of the reasons could be the lots and lots of logs I have put across the project, with the IO reads/writes taking time.
I am using logging. My guess was that, since a LOG_LEVEL discards logs of lower priority, the API call should complete in less time at higher priorities. But the time is almost the same (the difference being in the range of tenths of a second).
The only reference regarding LOG_LEVEL and performance I got is from here:
The beauty of this is that if you set the log level to WARN, info and debug messages have next to no performance impact.
Some points I should note here
I have not configured my logs to stream to any log service, like Kibana.
I have checked for this kind of situation; I am not doing any preprocessing in the log messages.
I have done basic logger initialization, i.e.,
import logging
logger = logging.getLogger(__name__)
and have not used any file to write logs into, such as the following. LOG_LEVEL is given as an environment variable.
logging.basicConfig(filename="file_name.log")
Considering everything else is optimal (and even if everything is not optimal, higher-priority logs should still take less time), am I wrong in my guess that the extra time is due to log reads/writes? If not, then why does using a higher-priority LOG_LEVEL flag not decrease the time?
In which default location does the logging module store the logs?
What's the difference between log level performances?
Setting the log level can affect performance, but the effect may not be very noticeable until you are at scale.
When you set the level, you're creating a way to stop the logging process from continuing, and very little happens before this is checked for any individual log call. For example, here is what CRITICAL logs look like in the code:
if self.isEnabledFor(CRITICAL):
    self._log(CRITICAL, msg, args, **kwargs)
The logger itself has much more to do as part of _log than just this check, so there are time gains to be had by setting a log level. But it is fairly optimized, so once you have initialized a logger at all, you probably won't notice much difference unless the difference in the number of calls is quite large.
If you removed all references to logging instead of just setting a level, you would see a bigger performance gain, because that check would not happen at all (and it obviously takes some amount of time).
Where are logs stored by default?
By default, without setting a file, a StreamHandler [source] is enabled, and without a specific stream being specified, it will stream to sys.stderr. When you set a file, it creates a FileHandler, which inherits from StreamHandler [source].
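A quick sketch to see this for yourself:

import logging

logging.basicConfig()                  # no filename: default StreamHandler
logging.warning("this goes to sys.stderr, not sys.stdout")
# Running `python script.py 2>captured.log` would capture the line above.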
How do I optimize?
For the question you didn't ask, which is How do I speed up logging?, I would suggest looking at this, which gives some advice. Part of that advice is what I pointed out above, but it also tells you to explicitly check your log level; you can even cache that result and check the cache instead, which should reduce time even further.
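A minimal sketch of that advice (caching the check is only safe if you don't change the level at runtime):

import logging

logger = logging.getLogger(__name__)

# Check once and cache the result, rather than paying for the
# isEnabledFor() lookup on every call.
DEBUG_ENABLED = logger.isEnabledFor(logging.DEBUG)

def expensive_summary():
    return "stand-in for work you only want to do when DEBUG is on"

if DEBUG_ENABLED:
    logger.debug("state: %s", expensive_summary())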
Check out this answer for even more on optimizing logging.
And finally, if you want to determine the speed issues in your code, whether they come from logging or not, you need to use a profiler. There are built-in profiling functions in Python; check here.
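For instance, a minimal cProfile sketch, where workload() is a placeholder for the code under test:

import cProfile
import pstats

def workload():
    pass  # replace with the code path you want to measure

cProfile.run("workload()", "profile.out")
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)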
One log level isn't more performant than another; however, whether a level is enabled for logging, whether loggers are nested (in your example, this would happen if __name__ had dots in it, like mypackage.core.logs), and the version of Python you are running can all affect performance. This is because three things happen when you make a logging call:
The logger determines if the logging level is enabled.
This will happen for every call. In versions of Python before 3.7, this call was not cached and nested loggers took longer to determine if they were enabled or not. How much longer? In some benchmarks it was twice as much time. That said, this is heavily dependent on log nesting and even when logging millions of messages, this may only save a few seconds of system time.
The logger processes the record.
This is where the optimizations outlined in the documentation come into play. They allow the record creation to skip some steps.
The logger sends the record to the handler.
This may be the default StreamHandler, a FileHandler, a SysLogHandler, or any number of built-in or custom handlers. In your example, you are using a FileHandler to write to file_name.log in the current directory. This may be fine for smaller applications, but larger applications would benefit from using an external logger like syslog or the systemd journal. The main reason is that these run in a separate process and are optimized for processing a large number of logs.
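As an illustration, handing records to syslog takes one handler (a sketch; /dev/log is the usual Unix socket, so this is platform-dependent):

import logging
import logging.handlers

logger = logging.getLogger(__name__)
logger.addHandler(logging.handlers.SysLogHandler(address="/dev/log"))
logger.warning("offloaded to the system logger")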
The Python logging module has a common pattern (ex1, ex2) where you get a new logger object in each Python module.
I'm not a fan of blindly following patterns and so I would like to understand a little bit more.
Why get a new logger object in each new module?
Why not have everyone just use the same root logger and configure the formatter with %(module)s?
Are there examples where this pattern is NECESSARY/NEEDED (i.e., because of some sort of performance reason[1])?
[1]
In a multi-threaded Python program, is there some sort of hidden synchronization issue that is fixed by using multiple logger objects?
Each logger can be configured separately. Generally, a module logger is not configured at all in the module itself. You create a distinct logger and use it to log messages of varying levels of detail. Whoever uses the logger decides what level of messages to see, where to send those messages, and even how to display them. They may want everything (DEBUG and up) from one module logged to a file, while another module they may only care if a serious error occurs (in which case they want it e-mailed directly to them). If every module used the same (root) logger, you wouldn't have that kind of flexibility.
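A sketch of that flexibility, with made-up module and server names:

import logging
import logging.handlers

# Everything from one module goes to a file...
parser_log = logging.getLogger("myapp.parser")
parser_log.setLevel(logging.DEBUG)
parser_log.addHandler(logging.FileHandler("parser.log"))

# ...while another module only reports serious errors, by e-mail.
billing_log = logging.getLogger("myapp.billing")
billing_log.setLevel(logging.ERROR)
billing_log.addHandler(logging.handlers.SMTPHandler(
    mailhost="smtp.example.com",
    fromaddr="app@example.com",
    toaddrs=["ops@example.com"],
    subject="billing error",
))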
The logger name defines where (logically) in your application events occur. Hence, the recommended pattern
logger = logging.getLogger(__name__)
uses logger names which track the Python package hierarchy. This in turn allows whoever is configuring logging to turn verbosity up or down for specific loggers. If everything just used the root logger, one couldn't get fine grained control of verbosity, which is important when systems reach a certain size / complexity.
The logger names don't need to exactly track the package names - you could have multiple loggers in certain packages, for example. The main deciding factor is how much flexibility is needed (if you're writing an application) and perhaps also how much flexibility your users need (if you're writing a library).
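For example, a minimal sketch of per-logger verbosity control (the package names are hypothetical):

import logging

logging.basicConfig(level=logging.WARNING)              # quiet by default

# Turn verbosity up for one branch of the hierarchy only.
logging.getLogger("myapp.network").setLevel(logging.DEBUG)

logging.getLogger("myapp.network.http").debug("shown: inherits DEBUG")
logging.getLogger("myapp.ui").debug("suppressed: falls back to WARNING")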
Currently I have a couple of Python modules that constantly log something to files. My problem is that when something happens (consider it a disaster for all the modules: loss of connection to some services, or something else), all of them start to log the same or very similar (within a module) messages, flooding my log gatherer (which leads to other issues).
Is there any Python logger that includes something like reducing the frequency of identical or similar logs (it would be nice if it backed off more and more, emitting fewer and fewer such messages)?
Or is it possible to write such a logger myself? I've searched a lot but unfortunately did not find any suitable implementation/solution/idea.
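Nothing like this ships with the standard library, but the logging.Filter hook is enough to sketch the idea; a minimal, illustrative version (a real one would also want a time-based reset of the counts):

import logging

class BackoffFilter(logging.Filter):
    """Let a repeated message through progressively less often:
    on its 1st, 2nd, 4th, 8th, ... occurrence."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def filter(self, record):
        key = (record.name, record.msg)    # "same" = same message template
        count = self.counts.get(key, 0) + 1
        self.counts[key] = count
        return (count & (count - 1)) == 0  # True for powers of two

logger = logging.getLogger("flaky.service")  # hypothetical name
logger.addFilter(BackoffFilter())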
In Django we have settings.py, which defines DEBUG for the whole project.
Now,
My debug level is independently configured in settings.py.
How should I use logging.DEBUG?
Way1:
if settings.DEBUG:
    logging.debug("Debug message")
Way2:
# Without checking settings.DEBUG
logging.debug("Debug message")
What is a good practice?
I think we should use Way2, since the logging level already decides whether the message will be logged or not.
But some say that Way1 is standard practice.
I think it's not a good thing to rely too much on a global setting such as DEBUG, which changes the whole behavior of your app.
What if you want to audit code and log stuff in production? You're not going to set DEBUG to True to do this, are you? You'd rather tone down your log filter.
On a more stylistic note, it makes little sense, and is not very Pythonic, to have two settings (DEBUG and log level) affect a single behavior.
Long story short: my opinion is that method 2 is superior, technically and stylistically speaking.
The second method is fine, and in fact I use it all the time.
The only reason I am putting an answer here is that we in fact did something like (1) in a work project a few years back. It turned out that although we were not logging anything at debug level to a file in production, the cost of creating the debug messages was itself quite expensive and impacted performance.
i.e.
(1) In production, the debug message is not created at all; there is just a boolean check instead.
(2) In production, the debug messages are created and propagated but just not logged to a file (well, if that is in fact how you have set up your logging).
The project was a pretty big calculation farm where every ounce of performance mattered. That hasn't been the case for me since, and it might not be the case for you, but hey... I just thought I would mention it.
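To make the cost difference between (1) and (2) concrete, a small sketch:

import logging

logging.basicConfig(level=logging.WARNING)   # DEBUG disabled, as in production
logger = logging.getLogger(__name__)

def expensive_summary():
    return "imagine a costly computation here"

# Costly even when suppressed: the f-string (and the call inside it)
# runs before logging ever checks the level.
logger.debug(f"state: {expensive_summary()}")

# Cheap when suppressed: the guard skips the expensive call, and
# %-style arguments are only formatted if the record is emitted.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("state: %s", expensive_summary())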