Context-dependent log level in Python

I'm prototyping a web application framework in Python (mostly for educative purposes) and I'm stuck on one feature I've wanted for such a long time: per-route log level.
The goal of this feature is to identify some specific entry points for which we're performing diagnostics. For example, I want to track what's going on when callers hit POST /sessions/login. Now, I want to get 100% of log entries for code hit by request processing for this URL. And this means everything, including whatever goes on in 3rd-party applications.
Example: fictional application has two routes: /sessions/login and /sessions/info. Both request handlers hit the same database code in package users, which uses logger myapp.users.db. Request processing for /sessions/login should emit log messages on logger myapp.users.db, but request processing for /sessions/info should not.
The problem is that this doesn't fit well with Python's logging library, which organizes loggers hierarchically. That hierarchy is nice for layering (e.g. controlling the log level per application layer), but it doesn't map onto per-request control.
What I really want is a context-dependent log level. The natural implementation that comes to mind is something that makes logger.getEffectiveLevel() return a thread-local log level (with debug middleware conditionally lowering the log level to debug if the request URL is subject to debugging). However, I'm looking at the logging flow in the Python documentation, and I don't understand how to implement this using any of the many different types of configuration hooks.
Question: how would you implement a context-dependent log level in Python?
Update: I found a partial solution.
import logging
import threading

context = threading.local()

class ContextualLogger(logging.Logger):
    def getEffectiveLevel(self):
        level = getattr(context, 'log_level', logging.NOTSET)
        if level == logging.NOTSET:
            level = super(ContextualLogger, self).getEffectiveLevel()
        return level

logging.setLoggerClass(ContextualLogger)
However, this doesn't work for the root logger. Any ideas?
Update: it's also possible to monkey patch the getEffectiveLevel() function.
import logging
import threading

context = threading.local()

# Monkey patch "getEffectiveLevel()" to consult the current setting in the
# `context.log_level` thread-local storage. If that value is present, use
# it to override the current value; else, compute the level using the usual
# infrastructure.
default_getEffectiveLevel = logging.Logger.getEffectiveLevel

def patched_getEffectiveLevel(self):
    level = getattr(context, 'log_level', logging.NOTSET)
    if level == logging.NOTSET:
        level = default_getEffectiveLevel(self)
    return level

logging.Logger.getEffectiveLevel = patched_getEffectiveLevel
Now, this works even for the root logger. I have to admit that I'm a little uncomfortable with monkey patching this function, but then again it falls back onto the usual infrastructure so it's actually not as dirty as it looks.
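For illustration, here is a rough sketch of how the debug middleware mentioned above could set and reset context.log_level around a request (the route set and helper names are made up; `context` is the thread-local from the snippet above):
import contextlib
import logging

# Hypothetical set of routes currently under diagnosis.
DEBUG_PATHS = {'/sessions/login'}

@contextlib.contextmanager
def forced_log_level(path):
    """Force DEBUG for the duration of a request whose path is being diagnosed."""
    if path in DEBUG_PATHS:
        context.log_level = logging.DEBUG
    try:
        yield
    finally:
        # NOTSET makes the patched getEffectiveLevel() fall back to the normal hierarchy.
        context.log_level = logging.NOTSET

# Hypothetical usage inside the framework's request dispatcher:
#     with forced_log_level(request.path):
#         handle_request(request)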

You're better off using a logging.Filter attached to your loggers (or handlers) that uses the context to either drop the event (by returning False from its filter method) or allow it to be logged (by returning True).
Though not exactly for your use case, I illustrated use of filters with thread-local context in this post.
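A minimal sketch of that idea, assuming the same context thread-local as in the question (this is illustrative, not the code from the linked post):
import logging
import threading

context = threading.local()

class DebugRequestFilter(logging.Filter):
    """Pass everything while the current request is under diagnosis; otherwise apply a normal threshold."""

    def filter(self, record):
        if getattr(context, 'debug_request', False):
            return True                         # diagnosed request: keep every record
        return record.levelno >= logging.INFO   # normal requests: INFO and above only

root = logging.getLogger()
root.setLevel(logging.DEBUG)      # loggers must let records through so the filter can decide

handler = logging.StreamHandler()
handler.addFilter(DebugRequestFilter())
root.addHandler(handler)

# Hypothetical middleware would set context.debug_request = True for /sessions/login requests.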

Related

How to disable the logging of Uvicorn?

I am working on FastAPI with Uvicorn. I want to disable the logging done by Uvicorn; I need only the logs which are logged by the server.
I referred to this blog and implemented the logging.
You could change the log level to get only the messages you need; there are a bunch of possible options:
uvicorn main:app --log-level critical
I think I had the same issue.
To disable the logger, one must first find the loggers that should be disabled. I did this by following this stackoverflow post and this GitHub issue.
For me, it works to disable just two of uvicorn's loggers:
import logging
# ....CODE....
uvicorn_error = logging.getLogger("uvicorn.error")
uvicorn_error.disabled = True
uvicorn_access = logging.getLogger("uvicorn.access")
uvicorn_access.disabled = True
At first I tried the answer provided by #Sanchouz, but this didn't work out for me. Further, setting propagate = False is regarded by some as a bad practice (see this). As I wanted to do it programmatically, I couldn't test the answer provided by #funnydman.
Hope this helps anyone, thinklex.
Problem definition
I had a similar problem and I found a solution. In my case, I created a small website with FastAPI to launch web scrapers in separate processes. I also created a class wrapper for loggers from the logging module. My problem was: when I started my app inside a Docker container with a uvicorn ... command that includes some settings for logging into a file, all logging from any web scraper would go into both the scraper's separate log file and the server's log file. I have a lot of stuff to log, so it was quite a problem.
Short answer
When you get your logger, just set its propagate property to False, like this:
logger = logging.getLogger(logger_name)
logger.propagate = False
Long answer
At first I spent some time debugging the internals of the logging module, and I found a function called callHandlers, which loops through the handlers of the current logger and its parents. I wrongly assumed that the root logger was responsible for the problem, but after some more testing it turned out that the root logger didn't actually have any handlers. That means one of uvicorn's loggers was responsible, which makes sense, also considering Thinklex's solution. I tried his solution too, but it doesn't fit my case, because it disables uvicorn's logging completely, and I don't want that, so I'll stick with preventing propagation on my loggers.

Python switch a logger off by default

In Python, I want to optionally log from a module - but have this logging off by default, enabled with a function call. (The output from this file will be very spammy - so best off by default)
I want my code to look something like this.
# module.py
import logging
log = logging.getLogger("module")
log.switch_off()    # desired API (not a real Logger method)

# client code
import module
module.log.switch_on()    # desired API (not a real Logger method)
I can't seem to find an option to disable a logger.
Options considered:
Using filters: I think this is a bit confusing for the client
Setting a level higher than any I use to log (e.g. logging.CRITICAL): I don't like that we could inadvertently end up with log lines in normal output if we actually use that level.
Use a flag and add ifs
Require the client to exclude our log events. See logging config
There are two pieces at play here. Python has logging.Logger objects and logging.Handler objects that work together to serve you logging information. Loggers handle the logic of collecting logging information and deciding whether log records should be passed on to the associated handlers: if the level of a log record is less severe than the level set on the logger, the record is not passed to the handlers.
Handlers have the same feature, and since handlers are the last line between log records and the final output, that is likely where you want to disable the interaction. To accomplish this, and avoid having logs inadvertently emitted elsewhere, you can add a new logging level to your application:
logging.addLevelName(logging.CRITICAL + 1, "DISABLELOGGING")
Note: This only maps the name to value for purposes of formatting, so you will need to add a member to the logging module as well:
logging.DISABLELOGGING = logging.CRITICAL + 1
Setting it to a value higher than CRITICAL ensures that no normal log event will pass and be emitted.
Then you just need to set your handler to the level you defined:
handler.setLevel(logging.DISABLELOGGING)
and now there should be no logs that pass the handler, and therefore no output shown.
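Putting those pieces together, here is a minimal sketch of how the spammy module from the question might expose switch_off()/switch_on() using this approach (the function names come from the question; everything else is illustrative):
import logging

logging.addLevelName(logging.CRITICAL + 1, "DISABLELOGGING")
logging.DISABLELOGGING = logging.CRITICAL + 1

log = logging.getLogger("module")
_handler = logging.StreamHandler()
log.addHandler(_handler)

def switch_off():
    """Silence this module's output by raising the handler level above anything real."""
    _handler.setLevel(logging.DISABLELOGGING)

def switch_on():
    """Re-enable output at the handler."""
    _handler.setLevel(logging.NOTSET)

switch_off()  # spammy module, so logging is off by default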

How can I measure the coverage (in production system)?

I would like to measure the coverage of my Python code which gets executed in the production system.
I want an answer to this question:
Which lines get executed often (hot spots) and which lines are never used (dead code)?
Of course this must not slow down my production site.
I am not talking about measuring the coverage of tests.
I assume you are not talking about test suite code coverage which the other answer is referring to. That is a job for CI indeed.
If you want to know which code paths are hit often in your production system, then you're going to have to do some instrumentation / profiling. This will have a cost. You cannot add measurements for free. You can do it cheaply though and typically you would only run it for short amounts of time, long enough until you have your data.
Python has cProfile to do full profiling, measuring call counts per function etc. This will give you the most accurate data but will likely have relatively high impact on performance.
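For reference, a minimal cProfile sketch (do_work is a placeholder for a representative chunk of production work):
import cProfile
import pstats

def do_work():
    # Placeholder for a representative chunk of production work.
    sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
do_work()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats('cumulative').print_stats(10)  # top 10 entries by cumulative time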
Alternatively, you can do statistical profiling which basically means you sample the stack on a timer instead of instrumenting everything. This can be much cheaper, even with high sampling rate! The downside of course is a loss of precision.
Even though it is surprisingly easy to do in Python, this stuff is still a bit much to put into an answer here. There is an excellent blog post by the Nylas team on this exact topic though.
The sampler below was lifted from the Nylas blog with some tweaks. After you start it, it fires an interrupt every millisecond and records the current call stack:
import collections
import signal

class Sampler(object):
    def __init__(self, interval=0.001):
        self.stack_counts = collections.defaultdict(int)
        self.interval = interval

    def start(self):
        # Sample on the virtual-time alarm signal.
        signal.signal(signal.SIGVTALRM, self._sample)
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)

    def _sample(self, signum, frame):
        stack = []
        while frame is not None:
            formatted_frame = '{}({})'.format(
                frame.f_code.co_name,
                frame.f_globals.get('__name__'))
            stack.append(formatted_frame)
            frame = frame.f_back
        formatted_stack = ';'.join(reversed(stack))
        self.stack_counts[formatted_stack] += 1
        # Re-arm the one-shot timer for the next sample.
        signal.setitimer(signal.ITIMER_VIRTUAL, self.interval, 0)
You inspect stack_counts to see what your program has been up to. This data can be plotted as a flame graph, which makes it easy to see in which code paths your program is spending the most time.
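A quick, hypothetical way to try out the Sampler above (do_work is a placeholder; note that ITIMER_VIRTUAL only ticks while the process is actually burning CPU):
def do_work():
    sum(i * i for i in range(10000))

sampler = Sampler()
sampler.start()
for _ in range(20000):
    do_work()

# Show the five most frequently sampled stacks.
for stack, count in sorted(sampler.stack_counts.items(), key=lambda kv: -kv[1])[:5]:
    print(count, stack)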
If I understand it right, you want to learn which parts of your application are used most often by users.
TL;DR
Use one of the metrics frameworks for Python if you do not want to do it by hand. Some of them are listed below:
DataDog
Prometheus
Prometheus Python Client
Splunk
It is usually done at the function level, and the right approach depends on the application:
If it is a desktop app with internet access:
You can create a simple db and collect how many times your functions are called. To accomplish this, you can write a simple helper and call it inside every function that you want to track (see the sketch at the end of this answer). After that you can define an asynchronous task to upload your data.
If it is a web application:
You can track which functions are called from JS (mostly preferred for user-behaviour tracking) or from the web API. It is good practice to start from the outside and work inwards: first detect which endpoints are frequently called (if you are using a proxy like nginx, you can analyze the server logs to gather this information; it is the easiest and cleanest way). After that, insert a logger into every other function that you want to track and simply analyze your logs every week or month.
But if you want to analyze your production code line by line (it is a very bad idea), you can start your application under a Python profiler. Python already has one: cProfile.
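As a minimal sketch of the "call a helper inside every function you want to track" idea, here is a call-counting decorator (all names are made up; the database/upload step is left out):
import collections
import functools

call_counts = collections.Counter()

def tracked(func):
    """Count how many times the decorated function is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__qualname__] += 1
        return func(*args, **kwargs)
    return wrapper

@tracked
def login(user):
    return "hello " + user

login("alice")
login("bob")
print(call_counts)  # Counter({'login': 2})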
Maybe make a text file and, in every method of your program, append a line referencing it, like "Method one executed". Run the web application about 10 times, thoroughly, as a viewer would, and afterwards write a Python program that reads the file, counts specific parts of it (or even a pattern), and outputs the counts.

Intercept all logging in Python

I am new to Python and even newer to Python logging.
My situation is: I have a system where I need to trace data through the process, so I decided to use Python's logging system itself to trace the information.
Initially I created a new logging Handler whose emit function sends the record to another server, but only if the record carries, among its extra attributes, a variable I am using for tracing.
So far so good: at every log level (from debug to critical) I can trace the data. My problem is that if someone sets the log level to CRITICAL and I am tracing data using level INFO, I won't get the trace, since the record won't be processed.
I thought of two solutions. The first is to create a custom log level to use for the trace, which I don't think is the right choice. The second, which I believe is the right one, is to intercept all logs and check whether that extra variable is present; if the record has it, I will send the log to the server regardless of the log level.
Since I am new to Python, I can't quite understand how the logging system works. Do I need another function in my handler? Do I need to create a custom LogRecord?
import logging

class RQHandler(logging.Handler):
    def __init__(
        self, formatter=JSONFormatter(), level=logging.NOTSET,
        connection_pool=None
    ):
        # run the regular Handler __init__
        logging.Handler.__init__(self, level)
        self.formatter = formatter  # JSONFormatter is defined elsewhere in my project

    def emit(self, record):
        # Send to the other server
        ...
In the logging module, Handler is a subclass of Filterer, which has the method filter(self, record).
It looks like a simple solution might be to override the filter method in RQHandler.
Look through the original Filterer code to see what it's doing first, but you should be able to override it to return True to force the LogRecord to always be emitted.
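A rough sketch of that override (the trace_id attribute name is an assumption; note that the record still has to pass the logger's own level check before it ever reaches the handler):
import logging

class RQHandler(logging.Handler):
    def emit(self, record):
        # Placeholder for "send to the other server".
        print('forwarding:', record.getMessage())

    def filter(self, record):
        # Records that carry the tracing attribute always pass this handler's
        # filter stage; everything else goes through the normal Filterer logic.
        if getattr(record, 'trace_id', None) is not None:
            return True
        return super(RQHandler, self).filter(record)

logger = logging.getLogger('traced')
logger.setLevel(logging.DEBUG)   # the logger itself must not drop the record earlier
logger.addHandler(RQHandler())

logger.info('user data', extra={'trace_id': 'abc123'})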

Advantages of logging vs. print() + logging best practices

I'm currently working on the 1.0.0 release of the pyftpdlib module.
This new release will introduce some backward-incompatible changes, in that certain APIs will no longer accept bytes but unicode.
While I'm at it, as part of this breakage, I was contemplating the possibility of getting rid of my logging functions, which currently use the print statement, and using the logging module instead.
As of right now pyftpdlib delegates the logging to 3 functions:
import sys

def log(s):
    """Log messages intended for the end user."""
    print s

def logline(s):
    """Log commands and responses passing through the command channel."""
    print s

def logerror(s):
    """Log traceback outputs occurring in case of errors."""
    print >> sys.stderr, s
The user willing to customize logs (e.g. write them to a file) is
supposed to just overwrite these 3 functions as in:
>>> from pyftpdlib import ftpserver
>>>
>>> def log2file(s):
... open('ftpd.log', 'a').write(s)
...
>>> ftpserver.log = ftpserver.logline = ftpserver.logerror = log2file
Now I'm wondering: what benefits would getting rid of this approach and using the logging module instead bring?
From a module vendor perspective, how exactly am I supposed to
expose logging functionalities in my module?
Am I supposed to do this:
import logging
logger = logging.getLogger("pyftpdlib")
...and state in my doc that "logger" is the object which is supposed
to be used in case the user wants to customize how logs behave?
Is it legitimate to deliberately set a pre-defined format output as in:
FORMAT = '[%(asctime)s] %(message)s'
logging.basicConfig(format=FORMAT)
logger = logging.getLogger('pyftpdlib')
...?
Can you think of a third-party module I can take cues from where the logging functionality is exposed and consolidated as part of the public API?
Thanks in advance.
Libraries (an FTP server or client library) should never initialize the logging system.
So it's OK to instantiate a logger object and point at logging.basicConfig in the documentation (or provide a function along the lines of basicConfig with fancier output, and let the user choose among logging configuration strategies: plain basicConfig or the library-provided configuration).
Frameworks (e.g. Django) or servers (an FTP server daemon) should initialize the logging system to a reasonable default and allow for customization of the logging configuration.
Typically, libraries should just add a NullHandler, which is simply a do-nothing handler. The end user or application developer who uses your library can then configure the logging system. See the section Configuring Logging for a Library in the logging documentation for more information. In particular, see the note which begins
It is strongly advised that you do not add any handlers other than NullHandler to your library's loggers.
In your case I would simply create a logging handler, as per the logging documentation,
import logging
logging.getLogger('pyftpdlib').addHandler(logging.NullHandler())
Edit: The logging implementation sketched out in the question seems perfectly reasonable. In your documentation, just mention logger and discuss (or point users to) the Logger.setLevel and Handler.setFormatter methods for customising the output from your library. Rather than using logging.basicConfig(format=FORMAT), you could consider using logging.config.fileConfig to manage the settings for your output, and document the configuration file somewhere in your documentation, again pointing the user to the logging module documentation for the format expected in that file.
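For example, an application using the library might configure the pyftpdlib logger along these lines (a sketch of user-side code, not part of pyftpdlib itself):
import logging

handler = logging.FileHandler('ftpd.log')
handler.setFormatter(logging.Formatter('[%(asctime)s] %(levelname)s %(message)s'))

logger = logging.getLogger('pyftpdlib')
logger.setLevel(logging.INFO)
logger.addHandler(handler)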
Here is a resource I used to make a customizable logger. I didn't change much; I just added an if statement to pass in whether I want to log to a file or just the console.
Check this Colorer out. It's really nice for colorizing the output so DEBUG looks different than WARN which looks different than INFO.
The Logging module bundles a heck of a lot of nice functionality, like SMTP logging, file rotation logging (so you can save a couple old log files, but not make 100s of them every time something goes wrong).
If you ever want to migrate to Python 3, using the logging module will remove the need to change your print statements.
Logging is awesome, depending on what you're doing. I've only lightly used it before, to see where I am in a program (if you're running this function, color output this way), but it has significantly more power than a regular print statement.
You can look at Django (just create a sample project) and see how it initializes its logging subsystem.
There is also a contextual logger helper that I wrote some time ago; this logger automatically takes the name of the module/class/function it was initialized from. This is very useful for debug messages, where you can see right away which module emits the messages and how the call flow goes.
