What is celery.utils.log.ProcessAwareLogger object doing in logging.Logger.manager.loggerDict - python

I am inspecting the logging.Logger.manager.loggerDict by doing:
import logging
logging.Logger.manager.loggerDict
and the dict is as follows:
{
    'nose.case': <celery.utils.log.ProcessAwareLogger object at 0x112c8dcd0>,
    'apps.friends': <logging.PlaceHolder object at 0x1147720d0>,
    'oauthlib.oauth2.rfc6749.grant_types.client_credentials': <celery.utils.log.ProcessAwareLogger object at 0x115c48710>,
    'apps.adapter.views': <celery.utils.log.ProcessAwareLogger object at 0x116a847d0>,
    'apps.accounts.views': <celery.utils.log.ProcessAwareLogger object at 0x116976990>,
}
There are more entries, but I truncated the output.
My questions are:
How come celery is involved in the logging of various other non-celery apps? Is it because logging is done in an async way and the logging framework somehow detects the presence of celery and uses it?
For my own files that log using logger = logging.getLogger(__name__), I see that one is a logging.PlaceHolder object while the other two are celery.utils.log.ProcessAwareLogger objects - even though the latter two are used in views and not in celery processes. How did it become this way?
Thanks

Celery itself replaces the (global) logger class, using the logging.setLoggerClass method, with a ProcessAwareLogger class that does a couple of things: it avoids trying to log while in a signal handler, and it adds the process name to log records. This happens as soon as Celery's logging system is set up. You're seeing this class even on your own loggers because of the global nature of setLoggerClass.
As for why, exactly, Celery is designed like that, you'd have to ask a Celery developer, but effectively it allows Celery to ensure that signal-handler safety and the process name are taken care of even if you use your own loggers in your app.
The python logging docs note:
If you are implementing asynchronous signal handlers using the signal module, you may not be able to use logging from within such handlers. This is because lock implementations in the threading module are not always re-entrant, and so cannot be invoked from such signal handlers.
Celery uses the signal module, so this may be a reason for wanting to globally enforce its logger class.
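To see the global effect for yourself, here is a minimal sketch; CustomLogger is just an illustrative stand-in, not Celery's actual class:

import logging

class CustomLogger(logging.Logger):
    def info(self, msg, *args, **kwargs):
        # hypothetical extra behaviour: tag every info message
        super().info('[custom] %s' % msg, *args, **kwargs)

logging.setLoggerClass(CustomLogger)

# Any logger created *after* this call uses the new class,
# even in modules that know nothing about CustomLogger.
logger = logging.getLogger('myapp.views')
print(type(logger))   # <class '__main__.CustomLogger'>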

Related

Best practices for logging in python

I have written a simple python package which has a set of functions that perform simple operations (data manipulation). I am trying to enhance the package and add more functionality for logging, which leads me to this question.
Should I expect the user of the package to pass a file descriptor or a handler from the python logging module into the package's methods, or should the package have its own logger that its methods use?
I can see benefits (the user controls logging and can maintain a flow of function calls based on the same handler) and drawbacks (the user's logger may not be good enough) to both, but what is the best practice in this case?
In your module, create a logger object:
import logging
LOGGER = logging.getLogger(__name__)
And then call the appropriate functions on that object:
LOGGER.debug('you probably don\'t care about this')
LOGGER.info('some info message')
LOGGER.error('whoops, something went wrong')
If the user has configured the logging subsystem correctly, then the messages will automatically go where the user wants them to go (a file, stderr, syslog, etc.)
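On the application side, the user only has to configure logging once, for example with basicConfig; after that the package's LOGGER output lands wherever that configuration says. The filename and package name below are just examples:

import logging

# In the application entry point, not in the library:
logging.basicConfig(
    filename='app.log',   # omit to log to stderr instead
    level=logging.INFO,
    format='%(asctime)s %(name)s %(levelname)s %(message)s',
)

# Messages from any library logger created with getLogger(__name__)
# now propagate to this root configuration.
logging.getLogger('mypackage.utils').info('hello from the library')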

How to use python logging setLoggerClass?

I want to add a custom logger to a python application I am working on. I think that the logging module intends to support this kind of customization but I don't really see how it works with the typical way of using the logging module. Normally, you would create a logger object in a module like,
import logging
logger = logging.getLogger(__name__)
However, somewhere in the application, probably the entry-point, I will have to tell the logging module to use my logger class,
logging.setLoggerClass(MyLogger)
However, this is often going to be called after modules have been imported and the logger objects are already allocated. I can think of a couple of ways to work around this problem (using a manager class to register logger allocations or calling getLogger() for each log record -- yuk), but this does not feel right and I wanted to know what the right way of doing this is.
Any logging initialisation / settings / customisation should be done before the application code runs. You can do this by putting it in the __init__.py file of your main application directory (this means it'll run before any of your modules are imported / read).
You can also put it in a settings.py module and importing that module as the first thing in your application. As long as the logging setup code runs before any getLogger calls are made then you're good.
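A minimal sketch of that ordering; MyLogger and the package layout are assumptions for illustration:

# myapp/__init__.py -- runs before any submodule is imported
import logging

class MyLogger(logging.Logger):
    """Hypothetical custom logger class."""

logging.setLoggerClass(MyLogger)

# myapp/views.py -- imported later, so its module-level logger picks up MyLogger
import logging
logger = logging.getLogger(__name__)   # isinstance(logger, MyLogger) is True

Loggers that were already created before the setLoggerClass call keep the old class, which is why the call has to happen first.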

Each thread creates its own logger instance, logging its own events

I have tried logging in Python. It looks like once a logger instance is created by a thread, it won't be deleted. However, my program should produce more than 100 threads per minute, and each will create its own logger, which may result in a kind of memory leak (logging.Logger instances will not be collected by the garbage collector).
Can anyone help me on this, is there a way to use logger for multi-threaded applications?
In the python logging module, loggers are managed by a logging.Manager instance. Usually there is only one manager, available as logging.Logger.manager. Loggers are identified by their name. Each time you use logging.getLogger('name'), the call is actually forwarded to logging.Logger.manager.getLogger, which holds a dict of loggers and returns the same logger for each 'name' every time.
So if you don't use a different name when getting the logger from a thread, you're actually using the same logger instance each time and don't have to worry about a memory leak.
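You can verify this behaviour directly; the logger name 'myapp.worker' below is just an example:

import logging
import threading

logging.basicConfig(level=logging.INFO)

def worker():
    # Same name -> same Logger object, no matter which thread asks.
    log = logging.getLogger('myapp.worker')
    log.info('running in %s', threading.current_thread().name)

a = logging.getLogger('myapp.worker')
b = logging.getLogger('myapp.worker')
assert a is b   # only one instance is ever created for this name

t = threading.Thread(target=worker)
t.start()
t.join()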

Temporarily changing python logging handlers

I'm working on an app that uses the standard logging module to do logging. We have a setup where we log to a bunch of files based on levels etc. We also use celery to run some jobs out of the main app (maintenance stuff usually that's time consuming).
The celery task does nothing other than call functions (let's say spam) which do the actual work. These functions use the logging module to output status messages. Now, I want to write a decorator that hijacks all the logging calls made by spam and puts them into a StringIO so that I can put them somewhere.
One of the solutions I had was to insert a handler for the root logger while the function is executing that grabs everything. However, this is messing with global shared objects which might be problematic later.
I came across this answer but it's not exactly what I'm looking for.
The thing about the StringIO is, there could be multiple processes running (Celery tasks), hence multiple StringIOs, right?
You can do something like this:
In the processes run under Celery, add to the root logger a handler which sends events to a socket (SocketHandler for TCP or DatagramHandler for UDP).
Create a socket receiver to receive and handle the events, as documented here. This acts like a consolidated StringIO across multiple Celery-run processes.
If you are using multiprocessing, you can also use the approach described here. Though that post talks about Python 3.2, the functionality is also available for Python 2.x using logutils.
Update: If you want to avoid a separate receiver process, you can log to a database directly, using a handler similar to that in this answer. If you want to buffer all the logging till the end of the process, you can use a MemoryHandler in conjunction with a database handler to achieve this.
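As a rough sketch of the first step, attaching a SocketHandler to the root logger in the Celery-run process might look like this; the host is a placeholder, and the receiver from the logging cookbook would listen on the other end:

import logging
import logging.handlers

# In the process run under Celery:
socket_handler = logging.handlers.SocketHandler(
    'localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)
logging.getLogger().addHandler(socket_handler)

# From here on, every record that reaches the root logger is also
# pickled and sent to the socket receiver.
logging.getLogger('myapp.tasks').info('status message from a task')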
For the StringIO handler, you could add an extra handler for the root logger that would grab everything, but at the same time add a dummy filter (Logger.addFilter) that filters everything out (so nothing is actually logged to StringIO).
You could then write a decorator for spam that removes the filter (Logger.removeFilter) before the function executes, and adds the dummy filter back after.
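A minimal sketch of that idea. Note that it attaches the blocking filter to the StringIO handler itself (Handler.addFilter) rather than to the logger, so only the capture handler is muted; the names capture_handler, BlockAll and spam are illustrative:

import functools
import io
import logging

class BlockAll(logging.Filter):
    """Dummy filter that rejects every record."""
    def filter(self, record):
        return False

capture_buffer = io.StringIO()
capture_handler = logging.StreamHandler(capture_buffer)
block_all = BlockAll()
capture_handler.addFilter(block_all)            # handler stays silent by default
logging.getLogger().addHandler(capture_handler)

def capture_logging(func):
    """Decorator: un-mute the capture handler while func runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        capture_handler.removeFilter(block_all)
        try:
            return func(*args, **kwargs)
        finally:
            capture_handler.addFilter(block_all)
    return wrapper

@capture_logging
def spam():
    logging.getLogger('myapp.maintenance').warning('doing the actual work')

spam()
print(capture_buffer.getvalue())   # contains the warning emitted inside spam()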

Python multi-threaded application with memory leak from the thread-specific logger instances

I have a server subclass spawning threaded response handlers; the handlers in turn start application threads. Everything is going smoothly except that, when I use ObjGraph, I see the correct number of application threads running (I am load testing and have it throttled to keep 35 application instances running).
Invoking objgraph.typestats() provides a breakdown of how many instances of each object are currently live in the interpreter (according to the GC). Looking at that output for memory leaks, I find 700 logger instances - which would be the total number of response handlers spawned by the server.
I have called logger.removeHandler(memoryhandler) and logger.removeHandler(filehandler) when the application thread exits the run() method to ensure that there are no lingering references to the logger instance, and the logger instance is completely isolated within the application thread (there are no external references to it). As a final stab at eliminating these logger instances, the last statement in run() is del self.logger.
To get the logger in __init__() I give it a suitably large random number as its name so it will be distinct for file access - I use the same large number as part of the log file name to avoid application log collisions.
The long and the short of it is that I have 700 logger instances tracked by the GC but only 35 active threads - how do I go about killing off these loggers? A more cumbersome engineering solution is to create a pool of loggers and just acquire one for the life of the application thread, but that means more code to maintain when the GC should simply handle this automatically.
Don't create potentially unbounded numbers of loggers, that's not good practice - there are other ways of getting context-sensitive information into your logs, as documented here.
You also don't need to have a logger as an instance attribute: loggers are singletons so you can just get a particular one by name from anywhere. The recommended practice is to name loggers at module level using
logger = logging.getLogger(__name__)
which suffices for most scenarios.
From your question I can't tell whether you appreciate that handlers and loggers aren't the same thing - for example you talk about removeHandler calls (which might serve to free the handler instances because their reference counts go to zero, but you won't free any logger instances by doing so).
Generally, loggers are named after parts of your application which generate events of interest.
If you want each thread to e.g. write to a different file, you can create a new filename each time, and then close the handler when you're done and the thread is about to terminate (that closing is important to free handler resources). Or, you can log everything to one file with thread ids or other discriminators included in the log output, and use post-processing on the log file.
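For the single-file variant, including the thread in every line only takes a format string; something like this (the format and filename are just examples):

import logging
import threading

logging.basicConfig(
    filename='combined.log',
    level=logging.INFO,
    format='%(asctime)s [%(threadName)s] %(name)s %(levelname)s %(message)s',
)

# One module-level logger, shared by all threads.
logger = logging.getLogger(__name__)

def handle_request(n):
    logger.info('handling request %d', n)

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()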
I ran into the same memory leak when using logging.Logger(); you can try manually closing the handler file descriptors when the logger is no longer needed, like:
for handler in logger.handlers:
    handler.close()
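A slightly fuller version of the same cleanup, which also detaches each handler so the logger keeps no reference to it (sketch only):

# Close and detach every handler attached to this logger.
for handler in list(logger.handlers):   # copy the list before mutating it
    handler.close()
    logger.removeHandler(handler)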
