In a nutshell, I write ETL pipelines. They are usually described in high-level scripts. In them, I use different internal libraries (we manage them) that provide utility functions, tooling, or internal data structures.
What are the common best practices for logging when dealing with multiple packages imported from different repositories?
My questions are:
1) Should I put logs in libraries? Or only in top-level scripts?
On one hand, it could be useful to display some information in library functions/classes. On the other hand, it imposes a particular logger on the library's clients.
I checked a few open-source projects and it seems they contain no logging at all.
2) If we do put logging in all shared libraries, what is the best practice in Python for passing a single logger to everything?
I want my logging format and strategy to be consistent across every library call, since everything runs as a whole. Should I initialize my logger in the main script and pass the same logger to every object I create? That seems redundant to me. I saw another pattern where every class that needs logging inherits from a logging class, but that seems like overkill and complicates the overall architecture.
I read in another Stack Overflow answer that every logger is actually a child of its parent package's logger. How does that apply when the packages come from different repositories?
Thanks.
Add a logger with no handlers (or with just a NullHandler) to the library and do all the internal logging with that. Give it a name that is related to the library. When you do that, any app that uses the lib can get the logger and add a handler to access the logs as needed.
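For instance, a minimal sketch of the library side (the package name mylib and the function do_work are made up for illustration):
import logging

# mylib/module.py -- hypothetical library module
logger = logging.getLogger(__name__)        # resolves to e.g. "mylib.module"
logger.addHandler(logging.NullHandler())    # silent unless the app adds handlers

def do_work():
    logger.debug("internal detail, visible only if the app opts in")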
An example would be the requests library which does something similar to that.
import logging
import requests

# grab the library's logger by its name and attach our own handler to it
r = logging.getLogger('requests')
r.addHandler(logging.StreamHandler())
r.setLevel(logging.DEBUG)

requests.get('http://stackoverflow.com')
will print
Starting new HTTP connection (1): stackoverflow.com
http://stackoverflow.com:80 "GET / HTTP/1.1" 301 143
Starting new HTTPS connection (1): stackoverflow.com
https://stackoverflow.com:443 "GET / HTTP/1.1" 200 23886
I'm working on a project and using a library from the project's requirements. The library implements logging and automatically logs to a file when I call its functions.
Now I'm implementing logging myself, and I only want my own messages to be logged to the file, not the library's messages.
One solution I thought of would be switching the logging file each time I call a function from the library and then removing that file, but that seems overly complicated and cluttered. What can I do to avoid logging the messages from the library?
P.S.:
I'm using the logging library and I initialize it as:
logging.basicConfig(level=logging.INFO, filename=loggingFile, format="%(message)s")
which means all messages, from me and from the library, get logged to loggingFile.
Libraries should not directly output anything to logs - that should be done only by handlers configured by an application. A library that logs output is an anti-pattern - if a library definitely does that, I'd log a bug against that library's issue tracker.
On the other hand, it might be that the library is only outputting stuff because you have configured output via your basicConfig() call. If you need more than basic configuration, don't use basicConfig() - use the other APIs provided.
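For example, a minimal sketch of one alternative: attach a handler to your own named logger instead of the root logger, so records from the library's loggers never reach your file (the logger name "myapp" and the file name are placeholders):
import logging

loggingFile = "app.log"  # placeholder, as in the question

# configure only our own logger, not the root logger
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

handler = logging.FileHandler(loggingFile)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

logger.info("this ends up in loggingFile; the library's messages do not")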
So I use Pyramid and I need to log all outgoing requests. I added this to configuration.ini:
[logger_requests]
level = DEBUG
handlers = console
qualname = urllib3
And this works fine.
1 2019-12-19T14:44:14.888+02:00 kazibo-msi APPNAME - DEBUG [urllib3.connectionpool][139843373852416 route="/status" x_request_id="9f7286e1-c6be-4136-83ba-2666fe1f854f"] https://website.com:443 "GET /rest/billing/debt/health HTTP/1.1" 200 1502
But I also need to log the time elapsed making the request. Using the requests package I can get it like this:
requests.get(url='https://somewebsite.com/data').elapsed
But how can I add this information to the log now? I know about the option to call logger.log(...) myself, but I would like to avoid that.
For code that I control I'd usually wrap things in my own utility that I can instrument instead of trying to patch/modify how urllib3 works or performs its own logging. This could be just a few functions you use across the codebase or a custom requests.Session subclass, etc.
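As a rough sketch of the Session-subclass idea (the logger name and log format here are arbitrary choices, not anything requests prescribes):
import logging
import requests

log = logging.getLogger("myapp.http")  # arbitrary logger name

class TimedSession(requests.Session):
    """Session that logs the elapsed time of every request it makes."""
    def request(self, method, url, **kwargs):
        response = super().request(method, url, **kwargs)
        log.debug("%s %s -> %s in %s",
                  method, url, response.status_code, response.elapsed)
        return response

session = TimedSession()
session.get("https://somewebsite.com/data")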
I am developing a new library and I am completely new to the concept of logging.
I have added logging to it using Python's logging module. My setup has a FileHandler at DEBUG level and a StreamHandler at WARNING level. The Python documentation about logging says libraries should have only NullHandlers.
Here is the documentation link https://docs.python.org/3/howto/logging.html#library-config
Will it be a problem if I still keep the file and stream handlers in my library?
I am not able to understand why one should create logs in libraries if they cannot have their own customized handlers.
It would be very helpful if someone could clear my understanding gap about implementing logging in libraries.
A secondary question: how will an application developer who uses my library be able to access/enable the logs that I created in the library if I set a NullHandler?
To your first question - from the Python docs:
"The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers 'under the hood', you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements."
As a user of your library, I may want to show logs from your_pkg.foo.baz, but not from the your_pkg.foo module.
Adding handlers from within your library may take that choice away from me (depending on the log levels set on the loggers and handlers).
To your second question -
adding a NullHandler lets a user meet their own logging needs by configuring handlers themselves, e.g. via logging.getLogger("your_pkg.foo.baz").addHandler(...).
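For instance, a minimal sketch of both sides (the package names are made up):
import logging

# library side (e.g. in your_pkg/__init__.py): silent by default
logging.getLogger("your_pkg").addHandler(logging.NullHandler())

# application side: opt in to output from your_pkg.foo.baz only
baz_logger = logging.getLogger("your_pkg.foo.baz")
baz_logger.addHandler(logging.StreamHandler())
baz_logger.setLevel(logging.DEBUG)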
To fully understand the logging mechanism (loggers, handlers, filters, and propagation), you could look at the logging flow diagram in the Python docs.
App engine "modules" are a new (and experimental, and confusingly-named) feature in App Engine: https://developers.google.com/appengine/docs/python/modules. Developers are being urged to convert use of the "backends" feature to use of this new feature.
There seem to be two ways to start an instance of a module: to send a HTTP request to it (i.e. at http://modulename.appname.appspot.com for the appname application and modulename module), or to call google.appengine.api.modules.start_module().
The Simple Way
The simple way to start an instance of a module would seem to be to create an HTTP request. However, in my case this results in only two outcomes, neither of which is what I want:
If I use the name of the backend that my application defines, i.e. http://backend.appname.appspot.com, the request is properly routed to the backend and properly denied (because backend access is defined by default to be private).
Anything else results in the request being routed to the sole frontend instance of the default module, even using random character strings as module names, such as http://sdlsdjfsldfsdf.appname.appspot.com. This even holds for made-up instance IDs such as in the case of http://99.sdlsdjfsldfsdf.appname.appspot.com, etc. And of course (this is the problem) for the actual name of my module as well.
Starting via the API
The documentation says that calling start_module() with the name of a module and version should cause the specified version of the specified module to start up. However, I'm getting an UnexpectedStateError whenever I call this function with valid arguments.
The Unfortunate State of Affairs
Because I can't get this to work, I'm wondering if there is some subtlety that the documentation might not have mentioned. My setup is pretty straightforward, so I'm wondering if this is a widespread problem to which someone has found a solution.
It turns out that versions cannot be numeric. This problem seems to have been happening because our module's version was "1" and not (for example) "v1".
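For example (the module name is made up; start_module is the API function mentioned above):
from google.appengine.api import modules

# works: the version string is not purely numeric
modules.start_module("my-module", "v1")

# in our case this raised UnexpectedStateError: purely numeric version
# modules.start_module("my-module", "1")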
With modules, they changed the terminology around a little bit. What used to be "backends" are now "basic scaling" or "manual scaling" instances.
"Automatic scaling" and "basic scaling" instances start when they process a request, while "manual scaling" instances run constantly.
Generally to start an instance you would send an HTTP request to your module's URL.
start_module() seems to have limited use for modules with "manual scaling" instances, or restarting modules that have been stopped with stop_module().
You can add:
login: admin
to the handler for your backend. This way an admin user can call your backend and trigger it to run. With login: admin, you can also have URLFetch requests issued from elsewhere in your app (i.e. from a frontend) trigger your backend.
I'm writing a web-app that uses several 3rd party web APIs, and I want to keep track of the low-level requests and responses for ad-hoc analysis. So I'm looking for a recipe that will get Python's urllib2 to log all bytes transferred via HTTP. Maybe a sub-classed Handler?
Well, I've found how to setup the built-in debugging mechanism of the library:
import logging
import sys
import urllib2

# turn on httplib's built-in debug output for HTTP and HTTPS connections
hh = urllib2.HTTPHandler()
hsh = urllib2.HTTPSHandler()
hh.set_http_debuglevel(1)
hsh.set_http_debuglevel(1)
opener = urllib2.build_opener(hh, hsh)
urllib2.install_opener(opener)  # make urlopen() use the debugging opener

# note: httplib prints its debug output directly to stdout; it does not
# actually travel through the logging machinery configured below
logger = logging.getLogger()
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.NOTSET)
But I'm still looking for a way to dump all the information transferred.
This looks pretty tricky to do. There are no hooks in urllib2, urllib, or httplib (which this builds on) for intercepting either input or output data.
The only thing that occurs to me, other than switching tactics to use an external tool (of which there are many, and most people use such things), would be to write a subclass of socket.socket in your own new module (say, "capture_socket") and then insert that into httplib using "import capture_socket; import httplib; httplib.socket = capture_socket". You'd have to copy all the necessary references (anything of the form "socket.foo" that is used in httplib) into your own module, but then you could override things like recv() and sendall() in your subclass to do what you like with the data.
Complications would likely arise if you were using SSL, and I'm not sure whether this would be sufficient or if you'd also have to make your own socket._fileobject as well. It appears doable though, and perusing the source in httplib.py and socket.py in the standard library would tell you more.
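A rough sketch of that idea, under the stated assumptions (Python 2, plain HTTP; as noted above, whether it intercepts anything depends on which socket functions your httplib version actually calls, and SSL or socket._fileobject may need extra work):
# capture_socket.py -- hypothetical module for the monkey-patching idea
import socket as real_socket

class socket(real_socket.socket):  # same attribute name httplib looks up
    def recv(self, *args):
        data = real_socket.socket.recv(self, *args)
        with open("http_capture.log", "ab") as f:
            f.write("<< " + data)
        return data

    def sendall(self, data, *args):
        with open("http_capture.log", "ab") as f:
            f.write(">> " + data)
        return real_socket.socket.sendall(self, data, *args)

# copy the other names httplib uses from the real socket module
error = real_socket.error
getaddrinfo = real_socket.getaddrinfo
AF_INET = real_socket.AF_INET
SOCK_STREAM = real_socket.SOCK_STREAM

Then, in the application:
import capture_socket
import httplib
httplib.socket = capture_socket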