Why get a new logger object in each new module? - python

The Python logging module has a common pattern (ex1, ex2) where each module gets its own logger object.
I'm not a fan of blindly following patterns, so I would like to understand this one a little better.
Why get a new logger object in each new module?
Why not have every module just use the same root logger and configure the formatter with %(module)s?
Are there examples where this pattern is NECESSARY/NEEDED (i.e. because of some sort of performance reason[1])?
[1]
In a multi-threaded Python program, is there some hidden synchronization issue that is fixed by using multiple logger objects?

Each logger can be configured separately. Generally, a module logger is not configured in the module itself at all. You create a distinct logger and use it to log messages of varying levels of detail. Whoever uses the logger decides what level of messages to see, where to send those messages, and even how to display them. They may want everything (DEBUG and up) from one module logged to a file, while for another module they may only care if a serious error occurs (in which case they want it e-mailed directly to them). If every module used the same (root) logger, you wouldn't have that kind of flexibility.

The logger name defines where (logically) in your application events occur. Hence, the recommended pattern
logger = logging.getLogger(__name__)
uses logger names which track the Python package hierarchy. This in turn allows whoever is configuring logging to turn verbosity up or down for specific loggers. If everything just used the root logger, one couldn't get fine-grained control over verbosity, which is important when systems reach a certain size / complexity.
The logger names don't need to track the package names exactly - you could have multiple loggers in certain packages, for example. The main deciding factor is how much flexibility is needed (if you're writing an application) and perhaps also how much flexibility your users need (if you're writing a library).
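For example, a minimal sketch of that per-logger control (the package names myapp.db and myapp.api are made up for illustration): the application configures verbosity per logger name, while each module just calls logging.getLogger(__name__).
import logging

logging.basicConfig(level=logging.WARNING)                # default for everything
logging.getLogger("myapp.db").setLevel(logging.DEBUG)     # chatty module: show everything
logging.getLogger("myapp.api").setLevel(logging.ERROR)    # quiet module: errors only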

Related

Doesn't log_level parameter in python logging module affect performance?

I am using an API to get some service from my project. The API call is taking too long, so I thought one of the reasons could be the lots and lots of logging I have put across the project, with the IO reads/writes taking time.
I am using logging. My guess was that since a LOG_LEVEL discards logs of lower priority, the API call should complete in less time with a higher priority level set. But the time is almost the same (the difference being in the range of a tenth of a second).
The only reference regarding LOG_LEVEL and performance I got from here is
The beauty of this is that if you set the log level to WARN, info and debug messages have next to no performance impact.
Some points I should note here:
I have not configured my logs to stream to any log service, like Kibana.
I have checked for this kind of situation; I am not doing any preprocessing in the log messages.
I have done the basic Logger initialization, i.e.,
import logging
logger = logging.getLogger(__name__)
and have not used any file to write logs into, as follows. LOG_LEVEL is given as one of the environment variables.
logging.basicConfig(filename="file_name.log")
Considering everything else is optimal (and even if everything is not optimal, higher-priority logs should still take less time), am I wrong in my guess that the extra time is due to log reads/writes? If not, then why doesn't using a higher-priority LOG_LEVEL flag decrease the time?
In which default location does the logging module store the logs?
What's the performance difference between log levels?
Setting the log level can affect performance, but the effect may not be very noticeable until you are at scale.
When you set the level, you're creating a way to stop the logging process from continuing, and very little happens before this check for any individual log call. For example, here is what CRITICAL logs look like in the code:
if self.isEnabledFor(CRITICAL):
    self._log(CRITICAL, msg, args, **kwargs)
The logger itself has much more to do as part of _log than just this check, so there are time gains to be had by setting a log level. But it is fairly optimized, so once you have initialized a logger at all, you probably won't notice much difference unless the number of calls varies greatly.
If you removed any reference to logging instead of just setting the level, you would gain more performance because that check would not happen at all (and it obviously takes some amount of time).
Where are logs stored by default?
By default, without setting a file, a StreamHandler [source] is enabled, and without a specific stream being given it streams to sys.stderr. When you set a file, a FileHandler is created, which inherits from StreamHandler [source].
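A quick sketch of both behaviours (the file name here is just an example):
import logging

# Scenario 1: no configuration at all - logging.warning("hello") would end up
# on sys.stderr via a StreamHandler.

# Scenario 2: basicConfig(filename=...) attaches a FileHandler to the root
# logger, so records are appended to file_name.log in the current working
# directory instead.
logging.basicConfig(filename="file_name.log", level=logging.INFO)
logging.getLogger(__name__).info("this line goes to file_name.log")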
How do I optimize?
For the question you didn't ask, which is how do I speed up logging?, I would suggest looking at this, which gives some advice. Part of that advice is what I pointed out above, but it also tells you to explicitly check your log level, and you can even cache that result and check the cache instead, which should reduce the time even further.
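A minimal sketch of that explicit check and caching (the function and variable names are just illustrative):
import logging

logger = logging.getLogger(__name__)

# Cache the check once if the effective level will not change at runtime.
DEBUG_ENABLED = logger.isEnabledFor(logging.DEBUG)

def handle(item):
    # The guard skips argument formatting and the logging machinery entirely
    # when DEBUG is disabled.
    if DEBUG_ENABLED:
        logger.debug("handling %r", item)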
Check out this answer for even more on optimizing logging.
And finally, if you want to determine where the speed issues in your code come from, whether from logging or not, you need to use a profiler. There are built-in profiling functions in Python; check here.
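For instance, a quick sketch using the standard-library cProfile module (my_function is a placeholder for whatever code path you want to measure):
import cProfile
import pstats

def my_function():
    # stand-in for the code path you want to measure
    return sum(range(1_000_000))

profiler = cProfile.Profile()
profiler.enable()
my_function()
profiler.disable()

# Show the 10 most expensive calls, sorted by cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)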
One log level isn't inherently more performant than another. However, whether a level is enabled for logging, whether loggers are nested (in your example, this would happen if __name__ had dots in it, like mypackage.core.logs), and the version of Python you are running can all affect performance. This is because three things happen when you make a logging call:
The logger determines if the logging level is enabled.
This will happen for every call. In versions of Python before 3.7, this call was not cached and nested loggers took longer to determine if they were enabled or not. How much longer? In some benchmarks it was twice as much time. That said, this is heavily dependent on log nesting and even when logging millions of messages, this may only save a few seconds of system time.
The logger processes the record.
This is where the optimizations outlined in the documentation come into play. They allow the record creation to skip some steps.
The logger sends the record to the handler.
This may be the default StreamHandler, a FileHandler, a SysLogHandler, or any number of built-in or custom handlers. In your example, you are using a FileHandler to write to file_name.log in the current directory. This may be fine for smaller applications, but larger applications would benefit from using an external logger like syslog or the systemd journal. The main reason is that these operate in a separate process and are optimized for processing a large number of logs.
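As a rough sketch of handing records off to syslog (the /dev/log address is the usual Unix socket; adjust for your platform):
import logging
import logging.handlers

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

# Hand records to the local syslog daemon instead of writing files directly;
# /dev/log is the conventional Unix domain socket (use a (host, port) tuple
# for a remote syslog server).
syslog_handler = logging.handlers.SysLogHandler(address="/dev/log")
syslog_handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(syslog_handler)

logger.info("this record is handled by syslogd, outside this process")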

Best practices for logging in python

I have written a simple python package which has a set of functions that perform simple operations (data manipulation). I am trying to enhance the package and add more functionality for logging, which leads me to this question.
Should I expect the user of the package to pass a file descriptor or a handler from the Python logging module into the methods of the package, or should the package itself have its own logging setup which the methods within the package use?
I can see benefits (the user controls logging and can maintain a flow of function calls based on the same handler) and cons (the user's logger may not be good enough) in both, but what is/are the best practice(s) in this case?
In your module, create a logger object:
import logging
LOGGER = logging.getLogger(__name__)
And then call the appropriate functions on that object:
LOGGER.debug('you probably dont care about this')
LOGGER.info('some info message')
LOGGER.error('whoops, something went wrong')
If the user has configured the logging subsystem correctly, then the messages will automatically go where the user wants them to go (a file, stderr, syslog, etc.)
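For instance, a minimal sketch of what that configuration might look like in the consuming application (the handler choices and file name are just examples):
import logging

# The application, not the library, decides where messages go.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    handlers=[
        logging.StreamHandler(),          # stderr
        logging.FileHandler("app.log"),   # plus a file
    ],
)

# Messages emitted by the library's LOGGER now flow through these handlers.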

Logging from Multiple Modules to the Same Text File

I've inherited a heap of Python code, that runs a bunch of different processes, but doesn't log anything. I want to set up a good logging process for some of the more important tasks. (I'll set it up for everything eventually.)
The way the code base is set up, there are a bunch of modules that are reused by multiple scripts. What I'd like to do is set the logging up so that messages are logged to stdout, as well as to a text file associated with the script that called it.
From what I've gathered this should be possible, e.g. logging.basicConfig() appears to do almost what I want.
How do I configure my logging so that all the modules log to the same text file, and to stdout at the same time?
Edit: The difference between this, and What is the most pythonic way of logging for multiple modules and multiple handlers with specified encoding? is that I also want to be able to call the modules from different scripts. Possibly at the same time.
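One common approach, sketched minimally below, is to configure logging once in each entry-point script while the shared modules only ever call logging.getLogger(__name__); the per-script log file name here is derived from the script purely as an example:
# entry_point_script.py
import logging
import sys
from pathlib import Path

# Configure the root logger once, in the script itself (not in the shared modules).
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    handlers=[
        logging.StreamHandler(sys.stdout),                        # stdout
        logging.FileHandler(Path(__file__).with_suffix(".log")),  # per-script file
    ],
)

# shared_module.py (imported by every script)
# import logging
# logger = logging.getLogger(__name__)
# logger.info("reaches both stdout and the calling script's log file")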

How to use python logging setLoggerClass?

I want to add a custom logger to a Python application I am working on. I think the logging module is meant to support this kind of customization, but I don't really see how it fits with the typical way of using the logging module. Normally, you would create a logger object in a module like this:
import logging
logger = logging.getLogger(__name__)
However, somewhere in the application, probably the entry-point, I will have to tell the logging module to use my logger class:
logging.setLoggerClass(MyLogger)
However, this is often going to be called after modules have been imported and the logger objects have already been allocated. I can think of a couple of ways to work around this problem (using a manager class to register logger allocations, or calling getLogger() for each log record -- yuk), but neither feels right, and I wanted to know what the right way of doing this is.
Any logging initialisation / settings / customisation should be done before the application code runs. You can do this by putting it in the __init__.py file of your main application directory (this means it'll run before any of your modules are imported / read).
You can also put it in a settings.py module and importing that module as the first thing in your application. As long as the logging setup code runs before any getLogger calls are made then you're good.
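A minimal sketch of that ordering (MyLogger and the module layout are hypothetical):
# myapp/__init__.py  (or a settings module imported before anything else)
import logging

class MyLogger(logging.Logger):
    # hypothetical custom logger adding a convenience method
    def success(self, msg, *args, **kwargs):
        self.info("SUCCESS: " + msg, *args, **kwargs)

# Must run before any logging.getLogger(__name__) calls in your modules;
# loggers created earlier would still be plain Logger instances.
logging.setLoggerClass(MyLogger)

# myapp/worker.py  (imported afterwards)
# import logging
# logger = logging.getLogger(__name__)   # now an instance of MyLogger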

Python multi-threaded application with memory leak from the thread-specific logger instances

I have a server subclass spawning threaded response handlers; the handlers in turn start application threads. Everything is going smoothly, except that when I use ObjGraph I see the correct number of application threads running (I am load testing and have it throttled to keep 35 application instances running).
Invoking objgraph.typestats() provides a breakdown of how many instances of each object are currently live in the interpreter (according to the GC). Looking at that output for memory leaks, I find 700 logger instances - which would be the total number of response handlers spawned by the server.
I have called logger.removeHandler(memoryhandler) and logger.removeHandler(filehandler) when the application thread exits the run() method to ensure that there are no lingering references to the logger instances; also, the logger instance is completely isolated within the application thread (there are no external references to it). As a final stab at eliminating these logger instances, the last statement in run() is del self.logger.
To get the logger in init() I give it a suitably large random number as its name so it will be distinct for file access - I use the same large number as part of the log file name to avoid application log collisions.
The long and the short of it is that I have 700 logger instances tracked by the GC but only 35 active threads - how do I go about killing off these loggers? A more cumbersome engineered solution would be to create a pool of loggers and just acquire one for the life of the application thread, but that means more code to maintain when the GC should simply handle this automatically.
Don't create potentially unbounded numbers of loggers, that's not good practice - there are other ways of getting context-sensitive information into your logs, as documented here.
You also don't need to have a logger as an instance attribute: loggers are singletons so you can just get a particular one by name from anywhere. The recommended practice is to name loggers at module level using
logger = logging.getLogger(__name__)
which suffices for most scenarios.
From your question I can't tell whether you appreciate that handlers and loggers aren't the same thing - for example you talk about removeHandler calls (which might serve to free the handler instances because their reference counts go to zero, but you won't free any logger instances by doing so).
Generally, loggers are named after parts of your application which generate events of interest.
If you want each thread to e.g. write to a different file, you can create a new filename each time, and then close the handler when you're done and the thread is about to terminate (that closing is important to free handler resources). Or, you can log everything to one file with thread ids or other discriminators included in the log output, and use post-processing on the log file.
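For example, a minimal sketch of the single-file approach, where the format string carries the per-thread context instead of creating one logger per thread:
import logging
import threading

# One shared logger for all threads; the format string records which thread
# produced each message, so no per-thread logger objects are needed.
logging.basicConfig(
    filename="app.log",
    level=logging.DEBUG,
    format="%(asctime)s [%(threadName)s] %(name)s %(levelname)s: %(message)s",
)
logger = logging.getLogger(__name__)

def worker():
    logger.info("handling request")   # tagged with this thread's name

threads = [threading.Thread(target=worker, name=f"handler-{i}") for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()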
I ran into the same memory leak when using logging.Logger(); you can try manually closing the handler file descriptors once the logger is no longer needed, like:
for handler in logger.handlers:
    handler.close()
