Temporarily changing Python logging handlers - python

I'm working on an app that uses the standard logging module to do its logging. We have a setup where we log to a bunch of files based on levels etc. We also use Celery to run some jobs out of the main app (usually time-consuming maintenance work).
The Celery task does nothing other than call functions (let's say spam) which do the actual work. These functions use the logging module to output status messages. Now, I want to write a decorator that hijacks all the logging calls made by spam and puts them into a StringIO so that I can put them somewhere.
One of the solutions I had was to insert a handler for the root logger while the function is executing that grabs everything. However, this is messing with global shared objects which might be problematic later.
I came across this answer but it's not exactly what I'm looking for.

The thing about the StringIO is, there could be multiple processes running (Celery tasks), hence multiple StringIOs, right?
You can do something like this:
In the processes run under Celery, add to the root logger a handler which sends events to a socket (SocketHandler for TCP or DatagramHandler for UDP).
Create a socket receiver to receive and handle the events, as documented here. This acts like a consolidated StringIO across multiple Celery-run processes.
If you are using multiprocessing, you can also use the approach described here. Though that post talks about Python 3.2, the functionality is also available for Python 2.x using logutils.
Update: If you want to avoid a separate receiver process, you can log to a database directly, using a handler similar to that in this answer. If you want to buffer all the logging till the end of the process, you can use a MemoryHandler in conjunction with a database handler to achieve this.
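The sending side might look roughly like this inside the worker process; the host and port are placeholders for wherever your receiver actually listens:
import logging
import logging.handlers

# Attach a SocketHandler to the root logger in the Celery worker process.
# Records are pickled and shipped to the receiver, which does the real writing.
socket_handler = logging.handlers.SocketHandler(
    'localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT)
logging.getLogger().addHandler(socket_handler)

logging.getLogger(__name__).info("this record goes to the socket receiver")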

For the StringIO handler, you could add an extra handler for the root logger that would grab everything, but at the same time add a dummy filter (Logger.addFilter) that filters everything out (so nothing is actually logged to StringIO).
You could then write a decorator for spam that removes the filter (Logger.removeFilter) before the function executes, and adds the dummy filter back after.
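A rough sketch of that idea, with the dummy filter attached to the capturing handler itself (the handler and filter names are made up for the example):
import functools
import io
import logging

_buffer = io.StringIO()
_capture_handler = logging.StreamHandler(_buffer)

class _DropAll(logging.Filter):
    def filter(self, record):
        return False  # block every record while this filter is attached

_drop_all = _DropAll()
_capture_handler.addFilter(_drop_all)
logging.getLogger().addHandler(_capture_handler)  # installed once on the root logger

def capture_logs(func):
    # Let records reach the StringIO handler only while func runs.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        _capture_handler.removeFilter(_drop_all)
        try:
            return func(*args, **kwargs)
        finally:
            _capture_handler.addFilter(_drop_all)
    return wrapper
Putting the filter on the handler rather than the logger keeps the effect local to the capturing handler, so the rest of your logging setup is untouched.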

Related

Separate logging for task queues

I'm using a task queue with python (RQ). Since workers run concurrently, without any configuration messages from all workers are mixed up.
I want to organize logging such that at any time I can get the exact full log for a given task run by a given worker. Workers run on different machines, so preferably logs would be sent over the network to some central collector, but to get started, I'd also be happy with local logging to file, as long as the messages of each task end up in a separate log file.
My question has two parts:
how to implement this in python code. I suppose that, for the "log to file" case, I could do something like this at the beginning of each task function:
logging.basicConfig(filename="some_unique_id_for_this_task_and_worker.log", level=logging.DEBUG, format="whatever")
logging.debug("My message...")
# etc.
but when it comes to logging over the network, I'm struggling to understand how the logger should be configured so that all log messages from the same task are recognizable at the collector. This is purposely vague because I haven't chosen a given technology or protocol to do this collection yet, and I'm looking for suggestions.
Assuming that the previous requirement can be accomplished, when logging over the network, what's a centralized solution that can give me the full log for a given task? I mean really showing me the full text log, not using a search interface returning events or lines (as, e.g., IIRC, in Splunk or Elasticsearch).
Thanks
Since you're running multiple processes (the RQ workers) you could probably use one of the recipes in the logging cookbook. If you want to use a SocketHandler and a socket server to receive and send messages to a file, you should also look at this recipe in the cookbook. It has a part related to running a socket listener in production.
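If you just want per-task files to get started, a rough sketch of attaching a handler for the duration of one task could look like this; the task_id variable and filename scheme are assumptions, not RQ API:
import logging

def run_task(task_id, do_work):
    # Attach a file handler only for the duration of this task.
    logger = logging.getLogger("tasks")
    handler = logging.FileHandler("task-%s.log" % task_id)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        do_work()
    finally:
        logger.removeHandler(handler)
        handler.close()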

Python Multiprocessing returning results with Logging and running frozen on Windows

I need some help with implementing logging while multiprocessing and running the application frozen under Windows. There are dozens of topics on this subject and I have spent a lot of time reviewing and testing those. I have also extensively reviewed the documentation, but I cannot figure out how to implement this in my code.
I have created a minimum example which runs fine on Linux, but crashes on Windows (even when not frozen). The example I created is just one of many iterations I have put my code through.
You can find the minimum example on github. Any assistance to get this example working would be greatly appreciated.
Thank you.
Marc.
The basics
On Linux, a child process is created by the fork method by default. That means the child process inherits almost everything from the parent process.
On Windows, the child process is created by the spawn method.
That means a child process starts almost from scratch; it re-imports and re-executes any code that is outside of the guard clause if __name__ == '__main__'.
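In other words, anything at module level runs again in every spawned child, so process creation has to sit behind the guard, roughly like this:
import multiprocessing

def work():
    print("running in the child process")

if __name__ == '__main__':
    # Without this guard, Windows (spawn) would re-execute this block when it
    # re-imports the module in the child, spawning processes recursively.
    p = multiprocessing.Process(target=work)
    p.start()
    p.join()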
Why it worked or failed
On Linux, because the logger object is inherited, your program will start logging.
But it is far from perfect, since you log directly to the file.
Sooner or later log lines will overlap, or an I/O error will occur on the file, due to the race condition between processes.
On Windows, since you didn't pass the logger object to the child process and it re-imports your pymp_global module, logger is a None object. So when you try logging with a None object, it crashes for sure.
The solution
Logging with multiprocessing is not an easy task.
For it to work on Windows, you must either pass a logger object to the child processes or log with a QueueHandler. Another, similar solution for inter-process communication is to use a SocketHandler.
The idea is that only one thread or process does the actual logging. The other processes just send their log records to it. This prevents the race condition and ensures the log is written out once the logging process gets time to do its job.
So how to implement it?
I have encountered this logging problem before and have already written the code.
You can just use the logger-tt package.
#pymp.py
from logging import getLogger
from logger_tt import setup_logging
setup_logging(use_multiprocessing=True)
logger = getLogger(__name__)
# other code below
For other modules
#pymp_common.py
from logging import getLogger
logger = getLogger(__name__)
# other code below
This saves you from writing all the logging config code everywhere manually.
You may consider changing the log_config file to suit your needs.
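If you would rather stay with the standard library, a minimal sketch of the same single-writer idea using QueueHandler/QueueListener (Python 3.2+; the file name is just an example) could look like this:
import logging
import logging.handlers
import multiprocessing

def worker(queue):
    # Each child only enqueues records; it never touches the log file.
    root = logging.getLogger()
    root.addHandler(logging.handlers.QueueHandler(queue))
    root.setLevel(logging.DEBUG)
    logging.getLogger(__name__).info("hello from %s", multiprocessing.current_process().name)

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    # The listener runs in the parent process and is the only writer to the file.
    listener = logging.handlers.QueueListener(queue, logging.FileHandler("app.log"))
    listener.start()
    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()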

What is a celery.utils.log.ProcessAwareLogger object doing in logging.Logger.manager.loggerDict

I am inspecting the logging.Logger.manager.loggerDict by doing:
import logging
logging.Logger.manager.loggerDict
and the dict is as follows:
{
'nose.case': <celery.utils.log.ProcessAwareLogger object at 0x112c8dcd0>,
'apps.friends': <logging.PlaceHolder object at 0x1147720d0>,
'oauthlib.oauth2.rfc6749.grant_types.client_credentials': <celery.utils.log.ProcessAwareLogger object at 0x115c48710>,
'apps.adapter.views': <celery.utils.log.ProcessAwareLogger object at 0x116a847d0>,
'apps.accounts.views': <celery.utils.log.ProcessAwareLogger object at 0x116976990>,
}
There are more but I truncated it
My questions are :
How come Celery is involved in the logging of various other non-Celery apps? Is it because logging is done in an async way and somehow the logging framework detects the presence of Celery and uses it?
For my own modules, which create their loggers with logger = logging.getLogger(__name__), I see that one is a logging.PlaceHolder object while the other two are celery.utils.log.ProcessAwareLogger objects, even though those latter two are used in views and not in Celery processes. How did they become this way?
Thanks
Celery itself replaces the (global) logger class, using the logging.setLoggerClass method, with a ProcessAwareLogger class that does a couple of things: avoid trying to log while in a signal handler, and add a process name to logs. This happens as soon as Celery's logging system is set up. You're seeing this class even on your own loggers because of the global nature of setLoggerClass.
As for why, exactly, Celery is designed like that, I think you'd have to ask a developer of Celery, but effectively it allows Celery to ensure that signal handler safety and process name are taken care of even if you use your own loggers in your app.
The python logging docs note:
If you are implementing asynchronous signal handlers using the signal module, you may not be able to use logging from within such handlers. This is because lock implementations in the threading module are not always re-entrant, and so cannot be invoked from such signal handlers.
Celery uses signal so this may be a reason for wanting to globally enforce its logger class.
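A small illustration of that global effect; the custom class here is just a stand-in for ProcessAwareLogger:
import logging

class MyLogger(logging.Logger):
    # Stand-in for celery.utils.log.ProcessAwareLogger.
    pass

logging.setLoggerClass(MyLogger)

# Any logger created after this call uses MyLogger, no matter which module asks for it.
print(type(logging.getLogger("apps.adapter.views")))  # <class '__main__.MyLogger'>
Loggers that already existed before the call keep the class they were created with, and the logging.PlaceHolder entries are not loggers at all: they mark names (like apps.friends) that have child loggers but were never requested directly, which is why loggerDict can show a mix of classes.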

Logging from Multiple Modules to the Same Text File

I've inherited a heap of Python code, that runs a bunch of different processes, but doesn't log anything. I want to set up a good logging process for some of the more important tasks. (I'll set it up for everything eventually.)
The way the code base is set up, there are a bunch of modules that are reused by multiple scripts. What I'd like to do is set the logging up so that messages are logged to stdout, as well as to a text file associated with the script that called it.
From what I've gathered this should be possible, e.g. logging.basicConfig() appears to do almost what I want.
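Something along these lines is what I'm imagining, though I'm not sure it's the right approach (the filename is just an example):
# main_script.py
import logging
import sys

root = logging.getLogger()
root.setLevel(logging.INFO)
formatter = logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")

file_handler = logging.FileHandler("main_script.log")   # file tied to this script
file_handler.setFormatter(formatter)
root.addHandler(file_handler)

stdout_handler = logging.StreamHandler(sys.stdout)      # also echo to stdout
stdout_handler.setFormatter(formatter)
root.addHandler(stdout_handler)

# each shared module would then just do:
#   logger = logging.getLogger(__name__)
# and its messages propagate up to the root handlers above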
How do I configure my logging so that all the modules log to the same text file, and to stdout at the same time?
Edit: The difference between this, and What is the most pythonic way of logging for multiple modules and multiple handlers with specified encoding? is that I also want to be able to call the modules from different scripts. Possibly at the same time.

How do I collect up logs for an App Engine request using Python logging?

Using Google App Engine, Python 2.7, threadsafe:true, webapp2.
I would like to include all logging.XXX() messages in my API responses, so I need an efficient way to collect up all the log messages that occur during the scope of a request. I also want to operate in threadsafe:true, so I need to be careful to get only the right log messages.
Currently, my strategy is to add a logging.Handler at the start of my webapp2 dispatch method, and then remove it at the end. To collect logs only for my thread, I instantiate the logging.Handler with the name of the current thread; the handler will simply throw out log records that are from a different thread. I am using thread name and not thread ID because I was getting some unexpected results on dev_appserver when using the ID.
Questions:
Is it efficient to constantly be adding/removing logging.Handler objects in this fashion? I.e., every request will add, then remove, a Handler. Is this "cheap"?
Is this the best way to get only the logging messages for my request? My big assumption is that each request gets its own thread, and that thread name will actually select the right items.
Am I fundamentally misunderstanding Python logging? Perhaps I should only have a single additional Handler added once at the "module-level" statically, and my dispatch should do something lighter.
Any advice is appreciated. I don't have a good understanding of what Python (and specifically App Engine Python) does under the hood with respect to logging. Obviously, this is eminently possible because the App Engine Log Viewer does exactly the same thing: it displays all the log messages for that request. In fact, if I could piggyback on that somehow, that would be even better. It absolutely needs to be super-cheap though - i.e., an RPC call is not going to cut it.
I can add some code if that will help.
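For illustration, a rough sketch of the kind of per-thread handler I'm describing (simplified, names made up):
import logging
import threading

class ThreadLogCollector(logging.Handler):
    # Collect records emitted by the current request's thread only.
    def __init__(self):
        logging.Handler.__init__(self)
        self.thread_name = threading.current_thread().name
        self.records = []

    def emit(self, record):
        if record.threadName == self.thread_name:
            self.records.append(self.format(record))

# In webapp2's dispatch(): attach at the start of the request, detach at the end.
# collector = ThreadLogCollector()
# logging.getLogger().addHandler(collector)
# try:
#     ...handle the request...
# finally:
#     logging.getLogger().removeHandler(collector)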
I found lots of goodness here:
from google.appengine.api import logservice
entries = logservice.logs_buffer().parse_logs()
