Combining Python trace information and logging

I'm trying to write a highly modular Python logging system (using the logging module) and include information from the trace module in the log message.
For example, I want to be able to write a line of code like:
my_logger.log_message(MyLogFilter, "this is a message")
and have it include the trace of where the "log_message" call was made, instead of the actual logger call itself.
I almost have the following code working except for the fact that the trace information is from the logging.debug() call rather than the my_logger.log_message() one.
import logging


class MyLogFilter(logging.Filter):
    def __init__(self):
        self.extra = {"error_code": 999}
        self.level = "debug"

    def filter(self, record):
        for key in self.extra.keys():
            setattr(record, key, self.extra[key])
        return True


class myLogger(object):
    def __init__(self):
        fid = logging.FileHandler("test.log")
        formatter = logging.Formatter('%(pathname)s:%(lineno)i, %(error_code)i, %(message)s')
        fid.setFormatter(formatter)
        self.my_logger = logging.getLogger(name="test")
        self.my_logger.setLevel(logging.DEBUG)
        self.my_logger.addHandler(fid)

    def log_message(self, lfilter, message):
        xfilter = lfilter()
        self.my_logger.addFilter(xfilter)
        log_funct = getattr(self.my_logger, xfilter.level)
        log_funct(message)


if __name__ == "__main__":
    logger = myLogger()
    logger.log_message(MyLogFilter, "debugging")
This is a lot of trouble to go through in order to make a simple logging.debug call, but in reality I will have many different versions of MyLogFilter at different logging levels, each carrying a different value of the "error_code" attribute, and I'm trying to keep the log_message() call as short and sweet as possible because it will be repeated numerous times.
I would appreciate any information about how to do what I want to, or if I'm completely off on the wrong track and if that's the case, what I should be doing instead.
I would like to stick to the internal python modules of "logging" and "trace" if that's possible instead of using any external solutions.

or if I'm completely off on the wrong track and if that's the case, what I should be doing instead.
My strong suggestion is that you view logging as a solved problem and avoid reinventing the wheel.
If you need more than the standard library's logging module provides, it's probably time for something like structlog (pip install structlog).
Structlog will give you:
data binding
cloud native structured logging
pipelines
...and more
It will handle most local and cloud use cases.
Below is one common configuration that writes plain logging to a .log file, colorized logging to stdout, and can be extended further to log to, for example, AWS CloudWatch.
Notice there is an included processor: StackInfoRenderer -- this adds stack information to all logging calls that pass a truthy value for stack_info (this argument also exists in stdlib's logging, by the way). If you only want stack info for exceptions, pass something like exc_info=True in your logging calls instead.
main.py
from structlog import get_logger
from logging_config import configure_local_logging

configure_local_logging()

logger = get_logger()
logger.info("Some random info")
logger.debug("Debugging info with stack", stack_info=True)

try:
    assert 'foo' == 'bar'
except Exception as e:
    logger.error("Error info with an exc", exc_info=e)
logging_config.py
import logging.config

import structlog


def configure_local_logging(filename=__name__):
    """Provides a structlog colorized console and file renderer for logging in e.g. ING tickets"""
    timestamper = structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M:%S")
    pre_chain = [
        structlog.stdlib.add_log_level,
        timestamper,
    ]

    logging.config.dictConfig({
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "plain": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": structlog.dev.ConsoleRenderer(colors=False),
                "foreign_pre_chain": pre_chain,
            },
            "colored": {
                "()": structlog.stdlib.ProcessorFormatter,
                "processor": structlog.dev.ConsoleRenderer(colors=True),
                "foreign_pre_chain": pre_chain,
            },
        },
        "handlers": {
            "default": {
                "level": "DEBUG",
                "class": "logging.StreamHandler",
                "formatter": "colored",
            },
            "file": {
                "level": "DEBUG",
                "class": "logging.handlers.WatchedFileHandler",
                "filename": filename + ".log",
                "formatter": "plain",
            },
        },
        "loggers": {
            "": {
                "handlers": ["default", "file"],
                "level": "DEBUG",
                "propagate": True,
            },
        }
    })

    structlog.configure_once(
        processors=[
            structlog.stdlib.add_log_level,
            structlog.stdlib.PositionalArgumentsFormatter(),
            timestamper,
            structlog.processors.StackInfoRenderer(),
            structlog.processors.format_exc_info,
            structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
        ],
        context_class=dict,
        logger_factory=structlog.stdlib.LoggerFactory(),
        wrapper_class=structlog.stdlib.BoundLogger,
        cache_logger_on_first_use=True,
    )
Structlog can do quite a bit more than this. I suggest you check it out.
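As a small taste of the data binding mentioned above, here is a minimal sketch (the key names are just examples, not anything required by structlog):
from structlog import get_logger

log = get_logger()
# bind() returns a new logger that attaches these key/value pairs to every event it emits
request_log = log.bind(request_id="abc123", user="alice")
request_log.info("request_received")
request_log.info("request_completed", status=200)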

It turns out the missing piece to the puzzle is using the "traceback" module rather than the "trace" one. It's simple enough to parse the output of traceback to pull out the source filename and line number of the ".log_message()" call.
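For reference, a minimal sketch of that traceback-based approach (attaching the caller's location to the filter's extra dict is just one way to surface it; the "caller" key is my own choice, not part of the original code):
import traceback

class myLogger(object):
    # ... same __init__ as above ...

    def log_message(self, lfilter, message):
        # traceback.extract_stack() lists frames oldest-to-newest; the last entry is this
        # call to log_message(), so the caller sits one frame earlier at index -2
        caller_file, caller_line, _func, _text = traceback.extract_stack()[-2]
        xfilter = lfilter()
        xfilter.extra["caller"] = "%s:%d" % (caller_file, caller_line)
        self.my_logger.addFilter(xfilter)
        log_funct = getattr(self.my_logger, xfilter.level)
        log_funct(message)
To actually see the location in the log file, you would then add something like %(caller)s to the Formatter string.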
If my logging needs become any more complicated, I'll definitely look into structlog. Thank you for that information, as I'd never heard of it before.

Related

How do you specify output log file using structlog?

I feel like this should be super simple but I cannot figure out how to specify the path for the logfile when using structlog. The documentation states that you can use traditional logging alongside structlog so I tried this:
logger = structlog.getLogger(__name__)
logging.basicConfig(filename=logfile_path, level=logging.ERROR)
logger.error("TEST")
The log file gets created but of course "TEST" doesn't show up inside it. It's just blank.
For structlog log entries to appear in that file, you have to tell structlog to use stdlib logging for output. You can find three different approaches in the docs, depending on your other needs.
I was able to get an example working by following the docs to log to both stdout and a file.
import logging.config
import structlog

timestamper = structlog.processors.TimeStamper(fmt="iso")

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "default": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
        },
        "file": {
            "level": "DEBUG",
            "class": "logging.handlers.WatchedFileHandler",
            "filename": "test.log",
        },
    },
    "loggers": {
        "": {
            "handlers": ["default", "file"],
            "level": "DEBUG",
            "propagate": True,
        },
    }
})

structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        structlog.stdlib.PositionalArgumentsFormatter(),
        timestamper,
        structlog.processors.StackInfoRenderer(),
        structlog.processors.format_exc_info,
        structlog.stdlib.ProcessorFormatter.wrap_for_formatter,
    ],
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)

structlog.get_logger("test").info("hello")
If you just wanted to log to a file, you could use the snippet hynek suggested.
logging.basicConfig(filename='test.log', encoding='utf-8', level=logging.DEBUG)
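If you do go that route, structlog still has to be pointed at stdlib logging so the entries end up in the file; a minimal sketch (the processor choice here is just one option) could look like:
import logging
import structlog

# route everything through stdlib logging, which writes to the file
logging.basicConfig(filename="test.log", level=logging.DEBUG)

structlog.configure(
    processors=[
        structlog.stdlib.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        # render the event dict to a plain string so stdlib treats it as the message
        structlog.processors.KeyValueRenderer(key_order=["timestamp", "level", "event"]),
    ],
    logger_factory=structlog.stdlib.LoggerFactory(),
    wrapper_class=structlog.stdlib.BoundLogger,
    cache_logger_on_first_use=True,
)

structlog.get_logger(__name__).info("hello file")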

What does "()" do in python log config

I have seen a python dict log config in uvicorn's source code.
In that, they have defined formatters as
{
    "default": {
        "()": "uvicorn.logging.DefaultFormatter",
        "fmt": "%(levelprefix)s %(asctime)s %(message)s",
        "datefmt": "%Y-%m-%d %H:%M:%S",
    },
    "access": {
        "()": "uvicorn.logging.AccessFormatter",
        "fmt": '%(levelprefix)s %(asctime)s :: %(client_addr)s - "%(request_line)s" %(status_code)s',
        "use_colors": True
    },
}
Also, we can see that they defined a logger with an empty name (not sure what I should call it) as:
"": {"handlers": ["default"], "level": "INFO"},
^^^^ - see, Empty key
So, here are my questions:
What does "()" do in the formatters section of the logging config?
What does "" do in the loggers section of the logging config?
This dictionary is used to configure logging with logging.config.dictConfig().
The "()" key indicates that custom instantiation is required [source]:
In all cases below where a ‘configuring dict’ is mentioned, it will be checked for the special '()' key to see if a custom instantiation is required. If so, the mechanism described in User-defined objects below is used to create an instance; otherwise, the context is used to determine what to instantiate.
In the case of the formatter config in the OP's question, the "()" indicates that those classes should be used to instantiate a Formatter.
I do not see the empty string in the loggers section of the dictionary, but here are the related docs:
loggers - the corresponding value will be a dict in which each key is a logger name and each value is a dict describing how to configure the corresponding Logger instance.
The configuring dict is searched for the following keys:
level (optional). The level of the logger.
propagate (optional). The propagation setting of the logger.
filters (optional). A list of ids of the filters for this logger.
handlers (optional). A list of ids of the handlers for this logger.
The specified loggers will be configured according to the level, propagation, filters and handlers specified.
So a "" key in the loggers dictionary would instantiate a logger with the name "", like logging.getLogger("").
One might use a custom logging formatter for a variety of reasons. uvicorn uses a custom formatter to log different levels in different colors. The Python Logging Cookbook has an example of using a custom formatter to use UTC times instead of local times in logging messages.
import logging
import logging.config
import time

class UTCFormatter(logging.Formatter):
    converter = time.gmtime

LOGGING = {
    ...
    'formatters': {
        'utc': {
            '()': UTCFormatter,
            'format': '%(asctime)s %(message)s',
        },
        'local': {
            'format': '%(asctime)s %(message)s',
        }
    },
    ...
}

if __name__ == '__main__':
    logging.config.dictConfig(LOGGING)
    logging.warning('The local time is %s', time.asctime())
Here is the output. Note that in the first line, UTC time is used instead of local time, because the UTCFormatter is used.
2015-10-17 12:53:29,501 The local time is Sat Oct 17 13:53:29 2015
2015-10-17 13:53:29,501 The local time is Sat Oct 17 13:53:29 2015

Add stdout of subprocess to JSON report if test case fails

I'm investigating methods of adding to the JSON report generated by either pytest-json or pytest-json-report: I'm not hung up on either plugin. So far, I've done the bulk of my evaluation using pytest-json. So, for example, the JSON object has this for a test case
{
    "name": "fixture_test.py::test_failure1",
    "duration": 0.0012421607971191406,
    "run_index": 2,
    "setup": {
        "name": "setup",
        "duration": 0.00011181831359863281,
        "outcome": "passed"
    },
    "call": {
        "name": "call",
        "duration": 0.0008759498596191406,
        "outcome": "failed",
        "longrepr": "def test_failure1():\n> assert 3 == 4, \"3 always equals 3\"\nE AssertionError: 3 always equals 3\nE assert 3 == 4\n\nfixture_test.py:19: AssertionError"
    },
    "teardown": {
        "name": "teardown",
        "duration": 0.00014257431030273438,
        "outcome": "passed"
    },
    "outcome": "failed"
}
This is from experiments I'm running. In practice, some of the test cases work by spawning a subprocess via Popen, and the assertion is that a certain string appears in its stdout. In the event that a test case fails, I need to add a key/value to the call dictionary containing the stdout of that subprocess. I have tried in vain thus far to find the correct fixture or apparatus to accomplish this. It seems that pytest_exception_interact may be the way to go, but drilling into the JSON structure has thus far eluded me. All I need to do is add/modify the JSON structure at the point of an error. It seems that pytest_runtest_call is too heavy-handed.
Alternatively, is there a means of altering the value of longrepr in the above? I've been unable to find the correct way of doing either of these and it's time to ask.
As it would appear, the pytest-json project is rather defunct. The developer/owner of pytest-json-report has this to say (under Related Tools at this link).
pytest-json has some great features but appears to be unmaintained. I borrowed some ideas and test cases from there.
The pytest-json-report project handles exactly the case that I'm requiring: capturing stdout from a subprocess and putting it into the JSON report. A crude example of doing so follows:
import subprocess as sp
import pytest
import sys
import re

def specialAssertHandler(output, assertMessage):
    # because pytest automatically captures stdout/stderr, this is all that's needed;
    # when the report is generated, this will be in a field named "stdout"
    print(output)
    return assertMessage

def test_subProcessStdoutCapture():
    # NOTE: if your version of Python 3 is sufficiently mature, add text=True also
    proc = sp.Popen(['find', '.', '-name', '*.json'], stdout=sp.PIPE)
    if sys.version_info[0] == 3:
        # NOTE: on the Ubuntu I was using, this is the version of Python, and the
        # return of proc.stdout.read() is a bytes object, not a string
        output = proc.stdout.read().decode()
    elif sys.version_info[0] == 2:
        # The other version of Python I'm using is 2.7.15; it's exceedingly frustrating
        # that the language changed so much between 2 and 3. In 2, the output
        # was already a string object
        output = proc.stdout.read()
    m = re.search('some string', output)
    assert m is not None, specialAssertHandler(output, "did not find 'some string' in output")
With the above, using pytest-json-report, the full output of the subprocess is captured by the infrastructure and placed into the aforementioned report. An excerpt showing this is below:
{
    "nodeid": "expirment_test.py::test_stdout",
    "lineno": 25,
    "outcome": "failed",
    "keywords": [
        "PyTest",
        "test_stdout",
        "expirment_test.py"
    ],
    "setup": {
        "duration": 0.0002694129943847656,
        "outcome": "passed"
    },
    "call": {
        "duration": 0.02718186378479004,
        "outcome": "failed",
        "crash": {
            "path": "/home/afalanga/devel/PyTest/expirment_test.py",
            "lineno": 32,
            "message": "AssertionError: Expected to find always\nassert None is not None"
        },
        "traceback": [
            {
                "path": "expirment_test.py",
                "lineno": 32,
                "message": "AssertionError"
            }
        ],
        "stdout": "./.report.json\n./report.json\n./report1.json\n./report2.json\n./simple_test.json\n./testing_addition.json\n\n",
        "longrepr": "..."
    },
    "teardown": {
        "duration": 0.0004875659942626953,
        "outcome": "passed"
    }
}
The field longrepr holds the full text of the test case, but in the interest of brevity it is elided here. In the field crash, the value of assertMessage from my example is placed. This shows that it is possible to place such messages into the report at the point of occurrence instead of in post-processing.
I think it may be possible to "cleverly" handle this using the hook I referenced in my original question, pytest_exception_interact. If I find it is so, I'll update this answer with a demonstration.
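For what it's worth, a rough sketch of that hook is below. Whether pytest-json-report copies the extra section into its JSON output is an assumption I have not verified, and the node attribute name is hypothetical.
# conftest.py -- rough sketch only
def pytest_exception_interact(node, call, report):
    # pytest calls this hook when a test raises an unexpected exception
    captured = getattr(node, "_subprocess_output", None)  # hypothetical attribute set by the test
    if captured is not None:
        # TestReport.sections is a list of (title, content) pairs attached to the report;
        # whether the JSON plugin surfaces it is an assumption
        report.sections.append(("subprocess stdout", captured))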

Format Airflow Logs in JSON

I have a requirement to log the Apache Airflow logs to stdout in JSON format. Airflow does not seem to provide this capability out of the box. I have found a couple of Python modules that are capable of this task, but I cannot get the implementation to work.
Currently, I am applying a class in airflow/utils/logging.py to modify the logger, shown below:
import datetime

from pythonjsonlogger import jsonlogger

class StackdriverJsonFormatter(jsonlogger.JsonFormatter, object):
    def __init__(self, fmt="%(levelname) %(asctime) %(nanotime) %(severity) %(message)", style='%', *args, **kwargs):
        jsonlogger.JsonFormatter.__init__(self, fmt=fmt, *args, **kwargs)

    def process_log_record(self, log_record):
        if log_record.get('level'):
            log_record['severity'] = log_record['level']
            del log_record['level']
        else:
            log_record['severity'] = log_record['levelname']
            del log_record['levelname']
        if log_record.get('asctime'):
            log_record['timestamp'] = log_record['asctime']
            del log_record['asctime']
        now = datetime.datetime.now().strftime('%Y-%m-%dT%H:%M:%S.%fZ')
        log_record['nanotime'] = now
        return super(StackdriverJsonFormatter, self).process_log_record(log_record)
I am implementing this code in /airflow/settings.py as shown below:
from airflow.utils import logging as logconf

def configure_logging(log_format=LOG_FORMAT):
    handler = logconf.logging.StreamHandler(sys.stdout)
    formatter = logconf.StackdriverJsonFormatter()
    handler.setFormatter(formatter)
    logging = logconf.logging.getLogger()
    logging.addHandler(handler)
    ''' code below was original airflow source code
    logging.root.handlers = []
    logging.basicConfig(
        format=log_format, stream=sys.stdout, level=LOGGING_LEVEL)
    '''
I have tried a couple different variations of this and can't get the python-json-logger to transform the logs to JSON. Perhaps I'm not getting to the root logger? Another option I have considered is manually formatting the logs to a JSON string. No luck with that yet either. Any alternative ideas, tips, or support are appreciated.
Cheers!
I don't know if you ever solved this problem, but after some frustrating tinkering, I ended up getting this to play nice with airflow. For reference, I followed a lot of this article to get it working: https://www.astronomer.io/guides/logging/. The main issue was that the airflow logging only accepts a string template for the logging format, which json-logging can't plug into. So you have to create your own logging classes and connect it to a custom logging config class.
Copy the log template here into your $AIRFLOW_HOME/config folder, and change DEFAULT_CONFIG_LOGGING to CONFIG_LOGGING. When you're successful, bring up Airflow and you'll get a log message on startup that says Successfully imported user-defined logging config from logging_config.LOGGING_CONFIG. If this is the first .py file in the config folder, don't forget to add a blank __init__.py file so Python picks it up.
Write your custom JsonFormatter to inject into your handler. I did mine off of this one.
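The linked formatter is not reproduced here, but a minimal sketch of what such a CustomJsonFormatter could look like (field names chosen to match the '(timestamp) (level) (name) (message)' format string used in the handlers below) is:
from datetime import datetime

from pythonjsonlogger import jsonlogger

class CustomJsonFormatter(jsonlogger.JsonFormatter):
    def add_fields(self, log_record, record, message_dict):
        # add_fields() is python-json-logger's extension point for injecting extra keys
        super(CustomJsonFormatter, self).add_fields(log_record, record, message_dict)
        if not log_record.get('timestamp'):
            log_record['timestamp'] = datetime.utcnow().isoformat() + 'Z'
        if log_record.get('level'):
            log_record['level'] = log_record['level'].upper()
        else:
            log_record['level'] = record.levelname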
Write the custom log handler classes. Since I was looking for JSON logging, mine look like this:
from logging.handlers import RotatingFileHandler

from airflow.utils.log.file_processor_handler import FileProcessorHandler
from airflow.utils.log.file_task_handler import FileTaskHandler
from airflow.utils.log.logging_mixin import RedirectStdHandler
from pythonjsonlogger import jsonlogger

# CustomJsonFormatter is the JSON formatter written in step 2

class JsonStreamHandler(RedirectStdHandler):
    def __init__(self, stream):
        super(JsonStreamHandler, self).__init__(stream)
        json_formatter = CustomJsonFormatter('(timestamp) (level) (name) (message)')
        self.setFormatter(json_formatter)

class JsonFileTaskHandler(FileTaskHandler):
    def __init__(self, base_log_folder, filename_template):
        super(JsonFileTaskHandler, self).__init__(base_log_folder, filename_template)
        json_formatter = CustomJsonFormatter('(timestamp) (level) (name) (message)')
        self.setFormatter(json_formatter)

class JsonFileProcessorHandler(FileProcessorHandler):
    def __init__(self, base_log_folder, filename_template):
        super(JsonFileProcessorHandler, self).__init__(base_log_folder, filename_template)
        json_formatter = CustomJsonFormatter('(timestamp) (level) (name) (message)')
        self.setFormatter(json_formatter)

class JsonRotatingFileHandler(RotatingFileHandler):
    def __init__(self, filename, mode, maxBytes, backupCount):
        super(JsonRotatingFileHandler, self).__init__(filename, mode, maxBytes, backupCount)
        json_formatter = CustomJsonFormatter('(timestamp) (level) (name) (message)')
        self.setFormatter(json_formatter)
Hook them up to the logging configs in your custom logging_config.py file.
'handlers': {
    'console': {
        'class': 'logging_handler.JsonStreamHandler',
        'stream': 'sys.stdout'
    },
    'task': {
        'class': 'logging_handler.JsonFileTaskHandler',
        'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
        'filename_template': FILENAME_TEMPLATE,
    },
    'processor': {
        'class': 'logging_handler.JsonFileProcessorHandler',
        'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
        'filename_template': PROCESSOR_FILENAME_TEMPLATE,
    }
}
...
and
DEFAULT_DAG_PARSING_LOGGING_CONFIG = {
    'handlers': {
        'processor_manager': {
            'class': 'logging_handler.JsonRotatingFileHandler',
            'formatter': 'airflow',
            'filename': DAG_PROCESSOR_MANAGER_LOG_LOCATION,
            'mode': 'a',
            'maxBytes': 104857600,  # 100MB
            'backupCount': 5
        }
    }
...
And JSON logs should be emitted, both in the DAG logs and in the console output.
Hope this helps!

Organizing my config variable for webapp2

For simplicity, I think I need to rewrite this as just one statement:
config = {'webapp2_extras.jinja2': {'template_path': 'templates',
                                    'filters': {
                                        'timesince': filters.timesince,
                                        'datetimeformat': filters.datetimeformat},
                                    'environment_args': {'extensions': ['jinja2.ext.i18n']}}}
config['webapp2_extras.sessions'] = \
    {'secret_key': 'my-secret-key'}
Then I want to know where to put it if I use multiple files with multiple request handlers. Should I just put it in one file and import it into the others? Since the secret key is sensitive, what are your recommendations for handling it in source control? Should I always change the secret before or after committing?
Thank you
Just add 'webapp2_extras.sessions' to your dict initializer:
config = {'webapp2_extras.jinja2': {'template_path': 'templates',
                                    'filters': {
                                        'timesince': filters.timesince,
                                        'datetimeformat': filters.datetimeformat},
                                    'environment_args': {'extensions': ['jinja2.ext.i18n']}},
          'webapp2_extras.sessions': {'secret_key': 'my-secret-key'}}
This would be clearer if the nesting were explicit, though:
config = {
    'webapp2_extras.jinja2': {
        'template_path': 'templates',
        'filters': {
            'timesince': filters.timesince,
            'datetimeformat': filters.datetimeformat
        },
        'environment_args': {'extensions': ['jinja2.ext.i18n']},
    },
    'webapp2_extras.sessions': {'secret_key': 'my-secret-key'}
}
I would recommend storing those in a datastore Entity for more flexibility and caching them in the instance memory at startup.
You could also consider having a config.py file excluded from the source control, if you want to get things done quickly.
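For the source-control question, one common pattern (all names here are placeholders) is to keep a small config.py out of the repository and import it wherever the WSGI application is built:
config.py
# excluded from source control, e.g. listed in .gitignore
config = {
    'webapp2_extras.jinja2': {'template_path': 'templates'},
    'webapp2_extras.sessions': {'secret_key': 'my-secret-key'},
}
main.py
import webapp2
from config import config

# MainHandler is a placeholder request handler defined elsewhere
app = webapp2.WSGIApplication([('/', MainHandler)], config=config)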
