I am trying to figure out the best approach to apply some custom processing on Python logging messages with minimal impact to our codebase.
The problem is this: we have many different projects logging a lot of things, and among these can be found some AWS keys. As a security requirement, we need to strip out all AWS keys from the logs, and there are multiple ways to go about this:
1. The naive approach would be to go into each and every project and modify each logging call to manually strip out keys. This is the least preferred approach as it would be the most manual.
2. Implement a different module that provides the same functions as the logging module (info, error, ...), where each function first applies a regex to filter out AWS keys and then calls the actual logging method behind the scenes. Each project can then be changed to something like import custom_logging_module as logging, and none of the logging calls need to be modified. The drawback of this approach is that every logging call appears in the log as coming from this module, so you can't track where your messages originate.
3. Not sure in what form yet, but it sounds like it would be possible to implement a custom Logger or LogRecord and register it when initializing logging. This wouldn't have the problems of the previous approach.
I have done some research on approach #3 but couldn't really find a way to do this. Does anyone have experience applying some custom processing on logging messages that would apply to this use case?
You could use a custom LogRecord class to achieve this, as long as you could identify keys in text unambiguously. For example:
import logging
import re

KEY = 'PK_SOME_PUBLIC_KEY'
SECRET_KEY = 'SK_SOME_PRIVATE_KEY'

class StrippingLogRecord(logging.LogRecord):
    pattern = re.compile(r'\b[PS]K_\w+\b', re.I)

    def getMessage(self):
        message = super(StrippingLogRecord, self).getMessage()
        message = self.pattern.sub('-- key redacted --', message)
        return message

if hasattr(logging, 'setLogRecordFactory'):
    # 3.x has this
    logging.setLogRecordFactory(StrippingLogRecord)
else:
    # 2.x needs monkey-patching
    logging.LogRecord = StrippingLogRecord

logging.basicConfig(level=logging.DEBUG)
logging.debug('Message with a %s', KEY)
logging.debug('Message with a %s', SECRET_KEY)
In my example I've assumed you could use a simple regex to spot keys, but a more sophisticated alternative method could be used if that's not workable.
Note that the above code should be run before any of the code which logs keys.
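As a rough sketch of what a more specific pattern might look like: the version below assumes access key IDs take the form AKIA followed by 16 uppercase letters or digits. That format is an assumption to check against your own keys, and secret keys would need a separate pattern. The name RedactingLogRecord and the in-memory handler are only there for illustration, and this variant is Python 3 only.

```python
import io
import logging
import re

# Assumption: access key IDs look like "AKIA" plus 16 uppercase letters/digits.
AWS_KEY_ID = re.compile(r'\bAKIA[0-9A-Z]{16}\b')


class RedactingLogRecord(logging.LogRecord):
    """Hypothetical record class that redacts key IDs in the final message."""

    def getMessage(self):
        message = super().getMessage()
        return AWS_KEY_ID.sub('-- key redacted --', message)


# Install the factory before anything logs (Python 3.2+).
logging.setLogRecordFactory(RedactingLogRecord)

# Capture output in memory so the effect can be inspected.
stream = io.StringIO()
logger = logging.getLogger('redaction_demo')
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.DEBUG)
logger.propagate = False

logger.debug('Using key %s', 'AKIAABCDEFGHIJKLMNOP')  # made-up value, not a real key
captured = stream.getvalue()
```

Because the substitution happens in getMessage(), it runs after the %s arguments have been merged in, so keys passed as arguments are covered as well as keys written directly into the format string.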
Related
I am new to Python and just trying to learn and find better ways to write code. I want to create a custom class for logging that uses the logging package internally. I want the functions in this class to be reusable, so that I can do my logging from other scripts rather than writing custom code in each and every script. Is there a good link you can share? Is this the right way to handle logging? I just want to avoid writing the same code in every script if I can reuse it from one module.
I would highly appreciate any reply.
You can build a custom class that uses the built-in Python logging library. There isn't really a single right way to handle logging: the library gives you five standard levels indicating the severity of events (DEBUG, INFO, WARNING, ERROR, and CRITICAL), and how you use these levels is application specific. Here's another good explanation of the package.
It's indeed a good idea to keep all your logging configuration (formatters, level, handlers) in one place.
1. Create a class wrapping a custom logger with your configuration.
2. Expose methods for logging at different levels.
3. Import this class wherever you want.
4. Create an instance of this class to log where you want.
To make sure all your custom logging objects have the same config, you should make the logging class own the configuration.
I don't think there's any links I can share for the whole thing but you can find links for the individual details I mentioned easily enough.
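To make the idea concrete, here is a minimal sketch of such a wrapper. The class name AppLogger, the format string, and the choice of a stream handler are all assumptions for illustration; swap in whatever handlers and formatters your scripts need.

```python
import logging


class AppLogger:
    """Hypothetical wrapper that owns the logging configuration."""

    FORMAT = '%(asctime)s %(name)s %(levelname)s %(message)s'

    def __init__(self, name, level=logging.INFO):
        self._logger = logging.getLogger(name)
        self._logger.setLevel(level)
        # getLogger returns the same object for the same name, so attach
        # a handler only once to avoid duplicate output.
        if not self._logger.handlers:
            handler = logging.StreamHandler()
            handler.setFormatter(logging.Formatter(self.FORMAT))
            self._logger.addHandler(handler)

    def debug(self, msg, *args):
        self._logger.debug(msg, *args)

    def info(self, msg, *args):
        self._logger.info(msg, *args)

    def warning(self, msg, *args):
        self._logger.warning(msg, *args)

    def error(self, msg, *args):
        self._logger.error(msg, *args)


# In any script: import the class, create an instance, and log.
log = AppLogger('my_script')
log.info('job finished in %d s', 12)
```

One trade-off with a wrapper like this is that attributes such as %(funcName)s or %(lineno)d would point at the wrapper methods rather than at the calling script, so the sketch leaves them out of the format.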
This question is geared toward OOP best practices.
Background:
I've created a set of scripts that are either automatically triggered by cron jobs or are constantly running in the background to collect data in real time. In the past, I've used Python's smtplib to send myself notifications when errors occur or a job is successfully completed. Recently, I migrated these programs to the Google Cloud platform, which by default blocks popular SMTP ports. To get around this I used Linux's mail command to continue sending myself the reports.
Originally, my hacky solution was to have two separate modules for sending alerts that were initiated based on an argument I passed to the main script.
Ex:
$ python mycode.py my_arg
if sys.argv[1] == 'my_arg':
    mailer = Class1()
else:
    mailer = Class2()
I want to improve upon this and create a module that automatically handles this without the added code. The question I have is whether it is "proper" to include a conditional statement while initializing the class to handle the situation.
Ex:
import sys

class Alert(object):
    def __init__(self, other_args):
        # startswith() also matches "linux2", which Python 2 reports
        if sys.platform.startswith("linux"):
            # Google Cloud Platform: instantiate Class1 variables and methods
            pass
        else:
            # local copy: instantiate Class2 variables and methods
            pass
My gut instinct says this is wrong but I'm not sure what the proper approach would be.
I'm mostly interested in answers regarding how to create OO classes/modules that handle environmental dependencies to provide the same service. In my case, a blocked port requires a different set of code altogether.
Edit: After some suggestions here are my favorite readings on this topic.
http://python-3-patterns-idioms-test.readthedocs.io/en/latest/Factory.html
This seems like a wonderful use-case for a factory class, which encapsulates the conditional, and always returns an instance of one of N classes, all of which implement the same interface, so that the rest of your code can use it without caring about the concrete class being used.
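A minimal sketch of that idea follows. The class names SmtpMailer and MailCommandMailer, the function make_mailer, and the send() method are hypothetical stand-ins for the asker's Class1 and Class2, and the bodies are placeholders that send nothing. Choosing by sys.platform simply mirrors the question; any other environment check could go in its place.

```python
import sys


class SmtpMailer:
    """Hypothetical mailer that would send through smtplib."""

    def send(self, subject, body):
        # Placeholder: a real version would talk to an SMTP server.
        return 'smtp: ' + subject


class MailCommandMailer:
    """Hypothetical mailer that would shell out to the mail command."""

    def send(self, subject, body):
        # Placeholder: a real version would invoke `mail` via subprocess.
        return 'mail: ' + subject


def make_mailer(platform=None):
    """Return a mailer suited to the platform; callers only use send()."""
    platform = sys.platform if platform is None else platform
    if platform.startswith('linux'):
        return MailCommandMailer()
    return SmtpMailer()


mailer = make_mailer()
result = mailer.send('Job finished', 'All rows loaded.')
```

The conditional lives in exactly one place, and the rest of the code only ever calls mailer.send(), so adding a third environment later means touching the factory and nothing else.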
This is one way to do it, but I would rather create a dynamic class instance. To do that, you would have only one class instead of selecting between two different classes. The class would take some arguments and return a result depending on the arguments provided. There are quite a few examples out there, and I'm sure you can adapt them to your use case. Try searching for how to create a dynamic class in Python.
The Python logging library lets you log at different levels:
https://docs.python.org/3/howto/logging.html#logging-levels
But I would like to use it to log based on custom tags, for example "show_intermediate_results" or "display_waypoints_in_the_code" or "report_time_for_each_module" and so on...
Those tags cannot be ranked on a severity ladder. During development I would sometimes want to see them and sometimes not, depending on what I am developing or debugging at the moment.
So the question is whether I can use the logging library to do that.
Btw, I DO want to use the library and not write something myself, because I want it to be thread safe.
As per the documentation, you can use logging.Filter objects with Logger and Handler instances for more sophisticated filtering than is provided by levels.
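Here is one way that could look, as a sketch rather than a recommendation. It assumes each tagged call passes its tag through the extra argument under a key named tag; that key name and the TagFilter class are my own choices, not something the library defines.

```python
import io
import logging


class TagFilter(logging.Filter):
    """Pass only records whose 'tag' attribute is in the enabled set."""

    def __init__(self, enabled_tags):
        super().__init__()
        self.enabled_tags = set(enabled_tags)

    def filter(self, record):
        # Records logged without a tag are let through unchanged.
        tag = getattr(record, 'tag', None)
        return tag is None or tag in self.enabled_tags


# Capture output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(TagFilter({'show_intermediate_results'}))

logger = logging.getLogger('tag_demo')
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False

logger.debug('partial sum = 42', extra={'tag': 'show_intermediate_results'})
logger.debug('reached waypoint A', extra={'tag': 'display_waypoints_in_the_code'})
logger.debug('untagged message')
output = stream.getvalue()
```

Only the first and third messages end up in output. Changing which tags you see while debugging is then a matter of editing the set passed to TagFilter, and the handler's own locking is still doing the thread-safety work.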
This is a Python question but a Django-specific solution is acceptable.
For a class I'm writing I would like to prefix log output on a per-instance basis. I do not want to interfere with the logging destinations that were set up. These are the solutions I can think of:
1. create a new logger as a sub-logger of the module, reconfigure the parent handlers with different formatters: mylog.info("foo") => prefixfoo
2. create a wrapper log class with the info(), warn() etc methods, each adding the prefix before calling the wrapped logger: mylog.info("foo")
3. store the prefix in the instance and manually add: log.info(self.p+"foo")
4. create a prefix-adding function that I manually wrap all log calls with: log.info(p("foo"))
Obviously I prefer solution 1 but I don't know how to do that.
What is the best solution? I'm a newbie Python programmer so I'm probably trying to solve the wrong problem :-)
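For what it's worth, the wrapper idea in solution 2 does not have to be written by hand: the standard library's logging.LoggerAdapter already provides info(), warning() and so on, and lets you override process() to rewrite the message. The sketch below is one possible shape for it. PrefixAdapter, Worker and the logger name prefix_demo are hypothetical, and the in-memory handler is only there to show the result.

```python
import io
import logging


class PrefixAdapter(logging.LoggerAdapter):
    """Prepend a per-instance prefix; handlers and formatters are untouched."""

    def process(self, msg, kwargs):
        return '%s%s' % (self.extra['prefix'], msg), kwargs


class Worker:
    """Hypothetical class that wants its own prefix on every message."""

    def __init__(self, name):
        base = logging.getLogger('prefix_demo')
        self.log = PrefixAdapter(base, {'prefix': '[%s] ' % name})

    def run(self):
        self.log.info('foo')


# Demo destination; in real code the existing handlers would be used as-is.
stream = io.StringIO()
demo_logger = logging.getLogger('prefix_demo')
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False

Worker('w1').run()
Worker('w2').run()
output = stream.getvalue()
```

One caveat: the prefix becomes part of the format string, so a prefix containing a literal % would need escaping if the call also passes arguments.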
I'd like something equivalent to
calling method: $METHOD_NAME
args: $ARGS
output: $OUTPUT
to be automatically logged to a file (via the logging module, possibly) for every (user-defined) method call. The best solution I can come up with is to write a decorator that will do this, and then add it to every function. Is there a better way?
Thanks
You could look at the trace module in the standard library, which allows you to trace program execution, generate annotated statement coverage listings, print caller/callee relationships and list functions executed during a program run. It can be used in another program or from the command line.
You can also log to disk:
import sys
import trace

# create a Trace object, telling it what to ignore, and whether to
# do tracing or line-counting or both.
tracer = trace.Trace(
    ignoredirs=[sys.prefix, sys.exec_prefix],
    trace=0,
    count=1)

# run the new command using the given tracer
tracer.run('main()')

# make a report, placing output in /tmp
r = tracer.results()
r.write_results(show_missing=True, coverdir="/tmp")
One approach that might simplify things a bit would be to use a metaclass to automatically apply your decorator for you. It'd cut down on the typing at the expense of requiring you to delve into the arcane and mysterious world of metaclass programming.
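A small sketch of what that could look like in Python 3 is below. The names log_call, LoggedMeta and Calculator, and the logger name call_log, are made up for the example. It wraps only plain functions defined in the class body and skips dunder methods; both choices are assumptions you may want to change.

```python
import functools
import logging
import types

logger = logging.getLogger('call_log')


def log_call(func):
    """Log the name, arguments and return value of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info('calling method: %s', func.__name__)
        logger.info('args: %r %r', args, kwargs)
        result = func(*args, **kwargs)
        logger.info('output: %r', result)
        return result
    return wrapper


class LoggedMeta(type):
    """Wrap every plain function defined in the class body with log_call."""

    def __new__(mcls, name, bases, namespace):
        for attr, value in list(namespace.items()):
            # Skip dunder methods such as __init__ to keep the log readable.
            if isinstance(value, types.FunctionType) and not attr.startswith('__'):
                namespace[attr] = log_call(value)
        return super().__new__(mcls, name, bases, namespace)


class Calculator(metaclass=LoggedMeta):
    def add(self, a, b):
        return a + b


total = Calculator().add(2, 3)
```

Any new method added to Calculator, or to a subclass of it, gets wrapped without further typing. To send the output to a file, attach a logging.FileHandler to the call_log logger and set its level to INFO.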
It depends on how exactly you are going to use it.
The most generic approach would be to follow the path of the stdlib's 'profile' module and thereby have control over each call, but it's somewhat slow.
If you know which modules you need to track before handing control to them, I'd go with iterating over all their members and wrapping them with a tracking decorator. This way the tracked code stays clean, and it doesn't take much coding to implement.
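Roughly, the member-wrapping idea might be sketched like this. The helper names tracked and track_module are hypothetical, and the throwaway demo module is built in place only so the example runs on its own; in practice you would pass a module you imported.

```python
import functools
import inspect
import logging
import types

logger = logging.getLogger('call_log')


def tracked(func):
    """Log each call together with its arguments and result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        logger.info('%s %r %r -> %r', func.__name__, args, kwargs, result)
        return result
    return wrapper


def track_module(module):
    """Replace each function defined in module with a tracked version."""
    for name, member in inspect.getmembers(module, inspect.isfunction):
        # Wrap only functions the module defines itself, and only once.
        defined_here = member.__module__ == module.__name__
        if defined_here and not hasattr(member, '__wrapped__'):
            setattr(module, name, tracked(member))


# Stand-in module for the demo.
demo = types.ModuleType('demo')
exec('def double(x):\n    return x * 2\n', demo.__dict__)

track_module(demo)
result = demo.double(21)
```

The wrapping has to happen before other code does from module import name, since such an import binds the original, unwrapped function.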
A decorator would be a simple approach for a smaller project; however, with decorators you have to be careful about passing arguments to make sure that they don't get mangled on their way through. A metaclass would probably be more of the "right" way to do it, without having to worry about adding decorators to every new method.
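On the point about arguments: a sketch of a decorator that forwards positional and keyword arguments untouched, and uses functools.wraps so the wrapped function keeps its name and docstring, could look like the following. The names log_calls and scale and the logger name call_log are invented for the example.

```python
import functools
import logging

logger = logging.getLogger('call_log')


def log_calls(func):
    """Log name, arguments and output without altering the call itself."""
    @functools.wraps(func)  # keeps __name__ and __doc__ of the original
    def wrapper(*args, **kwargs):
        logger.info('calling method: %s', func.__name__)
        logger.info('args: %r, %r', args, kwargs)
        # Forward everything exactly as it was received.
        output = func(*args, **kwargs)
        logger.info('output: %r', output)
        return output
    return wrapper


@log_calls
def scale(value, factor=2):
    """Multiply value by factor."""
    return value * factor


scaled = scale(3, factor=4)
```

Accepting *args and **kwargs and passing both straight through is what keeps defaults and keyword arguments working as they did before decoration.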