Capture all stdout/stderr within structlog to generate JSON logs - python

I am currently trying to move away from print() calls and towards centralized log collection using the ELK stack, with the structlog module generating structured JSON log lines. This works perfectly fine for modules that I wrote myself, using a loggingHelper module that I can import and use with
logger = Logger()
in other modules and scripts. This is the loggingHelper module class:
import getpass
import inspect
import logging
import os
import sys

import structlog
from structlog.stdlib import LoggerFactory


class Logger:
    """
    Wrapper class to import within other modules and scripts.
    All the config and log binding (script
    """
    def __init__(self):
        self.__log = None
        logging.basicConfig(level=logging.DEBUG, format='%(message)s')
        structlog.configure(logger_factory=LoggerFactory(),
                            processors=[structlog.stdlib.add_log_level,
                                        structlog.processors.TimeStamper(fmt="iso"),
                                        structlog.processors.JSONRenderer()])
        logger = structlog.get_logger()
        main_script = os.path.basename(sys.argv[0]) if sys.argv[0] else None
        frame = inspect.stack()[1]
        log_invocation = os.path.basename(frame[0].f_code.co_filename)
        user = getpass.getuser()
        """
        Who executed the __main__, what was the executed __main__ file,
        where did the log event happen?
        """
        self.__log = logger.bind(executedScript=main_script,
                                 logBirth=log_invocation,
                                 executingUser=user)

    def info(self, msg, **kwargs):
        self.__log.info(msg, **kwargs)

    def debug(self, msg, **kwargs):
        self.__log.debug(msg, **kwargs)

    def error(self, msg, **kwargs):
        self.__log.error(msg, **kwargs)

    def warn(self, msg, **kwargs):
        self.__log.warning(msg, **kwargs)
This produces nicely formatted output (one JSON object per line) that filebeat is able to read and forward to Elasticsearch.
However, third-party libraries completely break up the well-formatted logs:
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "context": "SELECT displayname FROM point_of_sale WHERE name = '123'", "level": "debug", "timestamp": "2019-03-15T12:52:42.792398Z", "message": "querying local"}
{"executingUser": "xyz", "logBirth": "efood.py", "executedScript": "logAlot.py", "level": "debug", "timestamp": "2019-03-15T12:52:42.807922Z", "message": "query successful: got 0 rows"}
building service object
auth version used is: v4
Traceback (most recent call last):
File "logAlot.py", line 26, in <module>
ef.EfoodDataControllerMerchantCenter().get_displayname(123)
File "/home/xyz/src/toolkit/commons/connectors/efood.py", line 1126, in get_displayname
return efc.select_from_local(q)['displayname'].values[0]
IndexError: index 0 is out of bounds for axis 0 with size 0
As you can see, both info-level and error-level messages from the third-party library (googleapiclient) are printed without going through the logging processors.
What would be the best (and most pythonic) way of capturing and formatting everything that happens during the execution of one script using the loggingHelper module I wrote? Is this even best practice?
Edit: Currently the logger indeed writes to stdout itself, which is then redirected to a file in crontab using >> and 2>&1. This looks like bad practice to me if I want to redirect everything that is written to stdout/stderr by third-party library logging, because this would lead to a loop, correct? Thus my goal is not redirecting, but rather capturing everything in my logging processor. Changed the title accordingly.
Also, here is a rough overview of what I am trying to achieve. I am very open to general criticism and to suggestions that deviate from this.

Configuring the logging module
As you already figured out, structlog requires you to configure the logging functionality that already exists in Python:
http://www.structlog.org/en/stable/standard-library.html
logging.basicConfig supports options for stream and filename here
https://docs.python.org/3/library/logging.html#logging.basicConfig.
Either you specify a filename, which the logger will open and direct all its output to. Depending on your setup, this could be the file you normally redirect to:
import logging
logging.basicConfig(level=logging.DEBUG, format='%(message)s', filename='output.txt')
Or you can pass a StringIO object to basicConfig, which you can later read from and then redirect to your desired output destination:
import logging
import io
stream = io.StringIO()
logging.basicConfig(level=logging.DEBUG, format='%(message)s', stream=stream)
More about StringIO can be read here
https://docs.python.org/3/library/io.html#io.TextIOBase
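For example (a minimal sketch, assuming you later want to forward the captured text yourself):
import io
import logging

stream = io.StringIO()
logging.basicConfig(level=logging.DEBUG, format='%(message)s', stream=stream)

logging.getLogger(__name__).debug("something happened")

# later: read everything captured so far and send it wherever you need
captured = stream.getvalue()
with open('output.txt', 'a') as f:
    f.write(captured)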
As @bruno pointed out in his answer, do not do this in an __init__, as you may end up calling this piece of code several times in the same process.

First things first: you should NOT do any logger config (logging.basicConfig, logging.dictConfig, etc.) in your class initializer - the logging configuration should be done once and only once at process startup. The whole point of the logging module is to completely decouple logging calls from the logging configuration.
Second point: I'm no structlog expert (and that's an understatement - it's actually the very first time I hear about this package), but the result you get is what was to be expected from your code snippet: only your own code uses structlog; all other libs (stdlib or 3rd-party) will still use the stdlib logger and emit plain-text logs.
From what I've seen in the structlog docs, it seems to provide a way to wrap the stdlib's loggers using structlog.stdlib.LoggerFactory and to add specific formatters to get a more consistent output. I have not tested this (yet) and the official doc is a bit sparse and lacking in usable practical examples (at least I couldn't find any), but this article seems to have a more explicit example (to be adapted to your own context and use case, of course).
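For what it's worth, here is a rough sketch of that approach, adapted from structlog's standard-library integration documentation (untested against your setup, so treat it as a starting point rather than a drop-in solution): attach a structlog.stdlib.ProcessorFormatter to a stdlib handler, so records emitted by third-party libraries are rendered by the same JSON renderer as your own structlog events.
import logging
import structlog

timestamper = structlog.processors.TimeStamper(fmt="iso")

# processors applied to records coming from plain stdlib loggers (3rd-party libs)
pre_chain = [structlog.stdlib.add_log_level, timestamper]

handler = logging.StreamHandler()
handler.setFormatter(structlog.stdlib.ProcessorFormatter(
    processor=structlog.processors.JSONRenderer(),
    foreign_pre_chain=pre_chain,
))

root = logging.getLogger()
root.addHandler(handler)
root.setLevel(logging.DEBUG)

structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        timestamper,
        # hand the event dict over to the ProcessorFormatter attached above
        structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
    ],
    logger_factory=structlog.stdlib.LoggerFactory(),
)
With a configuration along these lines, records that googleapiclient sends through the stdlib logging module should come out as JSON lines as well; note that it still won't catch bare print() calls or tracebacks written directly to stderr.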
CAVEAT: as I said, I have never used structlog (this is the first time I hear of this lib), so I might have misunderstood some things, and you will of course have to experiment to find out how to properly configure the whole thing to get it to work as expected.
As a side note: on unix-like systems, stdout is supposed to be for the program's output (I mean "expected output" => the program's actual results), while all error / reporting / debugging messages belong on stderr. Unless you have compelling reasons to do otherwise, you should try to stick to this convention (at least for command-line tools, so you can chain / pipeline them the unix way).

Related

Python logging performance when level is higher

One advantage of using logging in Python instead of print is that you can set the level of the logging. When debugging you could set the level to DEBUG and everything at that level and above will get printed. If you set the level to ERROR then only error messages will get printed.
In a high-performance application this property is desirable. You want to be able to print some logging information during development/testing/debugging but not when you run it in production.
I want to ask if logging will be an efficient way to suppress debug and info logging when you set the level to ERROR. In other words, would doing the following:
logging.basicConfig(level=logging.ERROR)
logging.debug('something')
be as efficient as
if not in_debug:
    print('...')
Obviously the second snippet costs almost nothing, because checking a boolean is fast, and when not in debug mode the code will be faster because it does not print unnecessary stuff. That comes at the cost of having all those if statements, though. If logging delivers the same performance without the if statements, it is of course much more desirable.
There's no need for you to check in_debug in your source code, because the logging module already does that.
Here's an excerpt, with some comments and whitespace removed:
class Logger(Filterer):
    def debug(self, msg, *args, **kwargs):
        if self.isEnabledFor(DEBUG):
            self._log(DEBUG, msg, args, **kwargs)
Just make sure you follow the pylint guidelines on how to pass parameters to the logging functions, so the message formatting isn't done before the logging code decides whether the record will actually be emitted. See PyLint message: logging-format-interpolation
I first learned about it through pylint warnings, but here's the official documentation that says to use % formatting and pass arguments to be evaluated later: https://docs.python.org/3/howto/logging.html#optimization
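In other words (a small illustrative example, not from the linked documentation):
import logging

logging.basicConfig(level=logging.ERROR)

name, count = "job42", 10000

# Lazy %-style interpolation: the message is only formatted if the record
# is actually emitted, so this costs almost nothing while DEBUG is disabled.
logging.debug("processing %s with %d items", name, count)

# Eager f-string: the string is built even though DEBUG is disabled.
logging.debug(f"processing {name} with {count} items")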

How to redirect logger warnings of external module with NullHandler?

Apologies if this question is similar to others posted on SO, but I have tried many of the answers given and could not achieve what I am attempting to do.
I have some code that calls an external module:
import trafilatura
# after obtaining article_html
text = trafilatura.extract(article_html, language="en")
This sometimes prints out a warning on the console, which comes from the following code in the trafilatura module:
# at the top of the file
LOGGER = logging.getLogger(__name__)
# in the method that I'm calling
LOGGER.warning('HTML lang detection failed')
I'd like to not print this and other messages produced by the module directly to the console, but to store them somewhere such that I can edit the messages and decide what to do with them. (Specifically, I want to save the messages in slightly modified form but only given certain circumstances.) I am not using the logging library in my own code.
I have tried the following solution suggestions:
buf = io.StringIO()
with contextlib.redirect_stderr(buf):  # I also tried redirect_stdout
    text = trafilatura.extract(article_html, language="en")
and
buf = io.StringIO()
sysout = sys.stdout
syserr = sys.stderr
sys.stdout = sys.stderr = buf
text = trafilatura.extract(article_html, language="en")
sys.stdout = sysout
sys.stderr = syserr
However, in both cases buf remains empty and trafilatura still prints its logging messages to the console. Testing the redirects above with other calls (e.g. print("test")) they seem to catch those just fine, so apparently LOGGER.warning() from trafilatura is just not printing to stderr or stdout?
I thought I could set a different output stream target for trafilatura's LOGGER, but it is using a NullHandler so I could neither figure out its stream target nor do I know how to change it:
# from trafilatura's top-level __init__.py
logging.getLogger(__name__).addHandler(NullHandler())
Any ideas? Thanks in advance.
The idea here is to work within Python's standard logging lib. Adding a NullHandler is actually standard recommended practice for libraries that define a logger, because it prevents falling back to stderr when no logging configuration is present.
What is likely happening here is that those logs are propagating to the root logger which got some handler attached somewhere else. You can stop that by getting the logger of the module in your code and setting it to not propagate:
# assuming that "trafilatura" is the __name__ of the module:
logger = logging.getLogger("trafilatura")
logger.propagate = False
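If you want to capture and rewrite the messages rather than just silence them, one option (a sketch, not part of the original answer; the handler class name is just illustrative) is to attach your own handler to that logger:
import logging

class CollectingHandler(logging.Handler):
    """Stores log records so they can be inspected and rewritten later."""
    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)

collector = CollectingHandler()
traf_logger = logging.getLogger("trafilatura")
traf_logger.addHandler(collector)
traf_logger.propagate = False  # keep the messages away from the root logger / console

# ... run trafilatura.extract(...) here ...

for record in collector.records:
    message = record.getMessage()
    # decide what to do with each message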

Python context management and verbosity

To facilitate testing code as I write it, I include verbosity in almost every module I write, as follows:
class MyObj(object):
    def __init__(self, arg0, kwarg0="default", verbosity=0):
        self.a0 = arg0
        self.k0 = kwarg0
        self.vb = verbosity

    def my_method(self):
        if self.vb > 2:
            print(f"{self} is doing a thing now...")
or
def my_func(arg0, arg1, verbosity=0):
    if verbosity > 2:
        print(f"doing something to {arg0} and {arg1}...")
    if verbosity > 5:  # Added on later edit
        import ipdb; ipdb.set_trace()  # to clarify requirement
    do_something()
The executable scripts that import these will have collected (from the command line or elsewhere) a verbosity argument which gets passed all the way down the stack.
It's occurred to me to use a context manager so that I wouldn't have to initialize this variable at every level of the stack, something like having this in the driver script:
with args.verbosity as vb:
    my_func("x", "y")
Can I do that and then use vb in my_func without having to include it in the signature? Is there a better way to achieve this kind of control?
SUBSEQUENT EDIT: it's clear from the first answers (thank you for those) that I need to check out the logging module, but in some cases I want to stop execution in the middle to inspect things at a particular stack level (see the ipdb code I am adding with this edit). Would you still recommend that I use logging? (I'm assuming there's a way to get the logging level if I felt compelled to occasionally litter my code with if statements like that one.)
Finally, I'm still interested in whether the context management solution would be expected to work (even if it's not the optimal solution).
To facilitate testing code as I write it, I include verbosity in almost every module I write ...
Don't litter your code with if-statements and prints for this kind of purpose. It makes the code messy, repetitive and less readable.
The use-case is exactly what stdlib logging is for: you can unconditionally log events which describe what the program is doing, at various verbosity levels, and the messages will be displayed - or not - depending on the logging system's configuration.
import logging

log = logging.getLogger(__name__)

def my_func(arg0, arg1):
    log.info("doing something to %s and %s...", arg0, arg1)
    do_something()

if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG, format="%(message)s")
    my_func(123, 456)
In the example above, the message will print because it is at level INFO which is above the verbosity level that I've configured logging with (DEBUG). If you configure logging at level WARNING, then it won't display.
Generally the user will control the logging configuration settings (levels, formats, streams, files) via a config file, environment variables, or command-line arguments. It is up to the end-user to choose the specific logging configuration that meets their needs; as the developer, you can just log events anytime. No need to worry about where the log events end up going, or whether they end up going anywhere at all.
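To cover the verbosity flag and the ipdb part of your edit, here is one possible sketch (the flag names are just illustrative): map a counted -v option to a logging level at startup, and use Logger.isEnabledFor() to gate the breakpoint.
import argparse
import logging

log = logging.getLogger(__name__)

def my_func(arg0, arg1):
    log.info("doing something to %s and %s...", arg0, arg1)
    if log.isEnabledFor(logging.DEBUG):  # roughly your "verbosity > 5" case
        import ipdb; ipdb.set_trace()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args()
    # 0 -> WARNING, 1 -> INFO, 2 or more -> DEBUG
    level = [logging.WARNING, logging.INFO, logging.DEBUG][min(args.verbose, 2)]
    logging.basicConfig(level=level, format="%(message)s")
    my_func("x", "y")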
Another way to do this is with levels of logging. For example, Python's builtin logging module has error, warning, info, and debug levels:
import logging

logger = logging.getLogger()
logger.info('Normal message')
logger.debug('Message that only gets printed with high verbosity')
Simply configure the logging level to debug, warn, etc., and you're basically done! Plus you get lots of native logging goodies.
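Note that the snippet above won't show anything by default: the root logger still needs a level and a handler configured somewhere at startup, e.g. (a minimal sketch):
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s: %(message)s")

logger = logging.getLogger()
logger.info('Normal message')
logger.debug('Message that only gets printed with high verbosity')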

Save each level of logging in a different file

I am working with the Python logging module, but I don't know how to get each logging level written to a different file.
For example, I want the debug-level messages (and only debug) saved in the file debug.log.
This is the code I have so far. I have tried with handlers, but it does not work; I don't know if it is even possible to do what I want. The code uses handlers and a different file for each level of logging, but the debug file ends up with the messages of all the other levels as well.
import logging as _logging
self.logger = _logging.getLogger(__name__)
_handler_debug = _logging.FileHandler(self.debug_log_file)
_handler_debug.setLevel(_logging.DEBUG)
self.logger.addHandler(_handler_debug)
_handler_info = _logging.FileHandler(self.info_log_file)
_handler_info.setLevel(_logging.INFO)
self.logger.addHandler(_handler_info)
_handler_warning = _logging.FileHandler(self.warning_log_file)
_handler_warning.setLevel(_logging.WARNING)
self.logger.addHandler(_handler_warning)
_handler_error = _logging.FileHandler(self.error_log_file)
_handler_error.setLevel(_logging.ERROR)
self.logger.addHandler(_handler_error)
_handler_critical = _logging.FileHandler(self.critical_log_file)
_handler_critical.setLevel(_logging.CRITICAL)
self.logger.addHandler(_handler_critical)
self.logger.debug("debug test")
self.logger.info("info test")
self.logger.warning("warning test")
self.logger.error("error test")
self.logger.critical("critical test")
And the debug.log contains this:
debug test
info test
warning test
error test
critical test
I'm working with classes, so I have adapted the code a bit.
Possible duplicate to: python logging specific level only
Please see the accepted answer there.
You need to use filters for each handler that restrict the output to the log level supplied.
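For example, something along these lines (a sketch of the filter approach, shown for the debug handler only; the filter class name is made up, and the same pattern applies to every other handler):
import logging

class LevelOnlyFilter(logging.Filter):
    """Only lets records of exactly one level through."""
    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno == self.level

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)

handler_debug = logging.FileHandler("debug.log")
handler_debug.setLevel(logging.DEBUG)
handler_debug.addFilter(LevelOnlyFilter(logging.DEBUG))
logger.addHandler(handler_debug)

logger.debug("debug test")   # ends up in debug.log
logger.info("info test")     # kept out of debug.log by the filter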

Where can I check tornado's log file?

I think there was a default log file, but I didn't find it yet.
Sometimes the HTTP request process throws an exception on the screen, but I hope it also goes somewhere on disk, or I wouldn't know what went wrong during a long-running test.
P.S.: writing an exception handler is another topic; first I'd like to know the answer to my question.
I found something here:
https://groups.google.com/forum/?fromgroups=#!topic/python-tornado/px4R8Tkfa9c
But it also didn't mention where I can find those logs.
It uses the standard Python logging module by default.
Here is definition:
access_log = logging.getLogger("tornado.access")
app_log = logging.getLogger("tornado.application")
gen_log = logging.getLogger("tornado.general")
It doesn't write to files by default. You can run it under supervisord and define in the supervisord config where the log files should be located; supervisord will capture Tornado's output and write it to files.
Also, you can do it this way:
tornado.options.options['log_file_prefix'].set('/opt/logs/my_app.log')
tornado.options.parse_command_line()
But in this case, measure the performance. I don't suggest writing to files directly from the Tornado application if it can be delegated.
FYI: parse_command_line just enables pretty console logging.
With newer versions, you may do
args = sys.argv
args.append("--log_file_prefix=/opt/logs/my_app.log")
tornado.options.parse_command_line(args)
or, as @ColeMaclean mentioned, providing
--log_file_prefix=PATH
on the command line.
There's no logfile by default.
You can use the --log_file_prefix=PATH command line option to set one.
Tornado just uses the Python stdlib's logging module, if you're trying to do anything more complicated.
Use RotatingFileHandler:
import logging
from logging.handlers import RotatingFileHandler
log_path = "/path/to/tornado.access.log"
logger_ = logging.getLogger("tornado.access")
logger_.setLevel(logging.INFO)
logger_.propagate = False
handler = RotatingFileHandler(log_path, maxBytes=1024*1024*1024, backupCount=3)
handler.setFormatter(logging.Formatter("[%(name)s][%(asctime)s][%(levelname)s][%(pathname)s:%(lineno)d] > %(message)s"))
logger_.addHandler(handler)
