I am trying to make Scrapy output colorized logs. I am not very familiar with Python logging, but my understanding is that I must create my own Formatter and make Scrapy use it. I succeeded in making a Formatter that colorizes the output using Clint.
My problem is that I can't make it work correctly within Scrapy. I would have expected the logger object in my spider to have a handler; then I would have switched the formatter on that handler. But when I look at what is inside spider.logger.logger, I see that handlers is an empty list. I tried to add my formatter on a new stream handler by doing:
crawler.spider.logger.logger.addHandler(sh)
where sh is a handler using my color formatter.
This has the effect of making Scrapy output each message twice. The first message is colorized but doesn't have Scrapy's formatting; the second one has Scrapy's formatting but no colors.
How can I make Scrapy output colorized logs while keeping the same format that can be set in settings.py?
Thanks
If you mean to colorize LogRecord only, you can customize LOG_FORMAT in settings.py with ANSI escape codes.
Example:
LOG_FORMAT = '\x1b[0;0;34m%(asctime)s\x1b[0;0m \x1b[0;0;36m[%(name)s]\x1b[0;0m \x1b[0;0;31m%(levelname)s\x1b[0;0m: %(message)s'
If you also want to colorize different log levels with different colors, you can override scrapy.utils.log._get_handler (see the source code).
Put this near the top of your settings.py:
import copy

import scrapy.utils.log

_get_handler = copy.copy(scrapy.utils.log._get_handler)

def _get_handler_custom(*args, **kwargs):
    handler = _get_handler(*args, **kwargs)
    handler.setFormatter(your_custom_formatter)
    return handler

scrapy.utils.log._get_handler = _get_handler_custom
What it does is replace the formatter after calling the original _get_handler, and then reattach the patched function to scrapy.utils.log.
This is a hacky solution and might not be best practice, but it just works.
A more proper way to achieve this is to override logging.StreamHandler. There is a bunch of discussion on SO that can point you in the right direction.
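For example, a rough sketch of such a subclass (the class name and color choices here are my own, not from Scrapy or the stdlib):

import logging

class ColorStreamHandler(logging.StreamHandler):
    # map each level to an ANSI color code
    COLORS = {
        logging.DEBUG: '\x1b[34m',     # blue
        logging.INFO: '\x1b[32m',      # green
        logging.WARNING: '\x1b[33m',   # yellow
        logging.ERROR: '\x1b[31m',     # red
        logging.CRITICAL: '\x1b[41m',  # red background
    }

    def format(self, record):
        # format the record normally, then wrap it in the level's color
        message = super(ColorStreamHandler, self).format(record)
        color = self.COLORS.get(record.levelno)
        return '%s%s\x1b[0m' % (color, message) if color else message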
Here is the full working code I use in my projects (it relies on the third-party package colorlog).
settings.py
import copy

from colorlog import ColoredFormatter
import scrapy.utils.log

color_formatter = ColoredFormatter(
    (
        '%(log_color)s%(levelname)-5s%(reset)s '
        '%(yellow)s[%(asctime)s]%(reset)s'
        '%(white)s %(name)s %(funcName)s %(bold_purple)s:%(lineno)d%(reset)s '
        '%(log_color)s%(message)s%(reset)s'
    ),
    datefmt='%y-%m-%d %H:%M:%S',
    log_colors={
        'DEBUG': 'blue',
        'INFO': 'bold_cyan',
        'WARNING': 'red',
        'ERROR': 'bg_bold_red',
        'CRITICAL': 'red,bg_white',
    }
)

_get_handler = copy.copy(scrapy.utils.log._get_handler)

def _get_handler_custom(*args, **kwargs):
    handler = _get_handler(*args, **kwargs)
    handler.setFormatter(color_formatter)
    return handler

scrapy.utils.log._get_handler = _get_handler_custom
Scrapy built-in loggers:
scrapy.utils.log
scrapy.crawler
scrapy.middleware
scrapy.core.engine
scrapy.extensions.logstats
scrapy.extensions.telnet
scrapy.core.scraper
scrapy.statscollectors
are very verbose.
I was trying to set a different log level for these loggers (DEBUG) than for the user spider (INFO). This way I can reduce the 'noise'.
This helper function works, but only sometimes:
import logging

def set_loggers_level(level=logging.DEBUG):
    loggers = [
        'scrapy.utils.log',
        'scrapy.crawler',
        'scrapy.middleware',
        'scrapy.core.engine',
        'scrapy.extensions.logstats',
        'scrapy.extensions.telnet',
        'scrapy.core.scraper',
        'scrapy.statscollectors',
    ]
    for logger_name in loggers:
        logger = logging.getLogger(logger_name)
        logger.setLevel(level)
        for handler in logger.handlers:
            handler.setLevel(level)
I call it from my UserSpider's __init__:
class UserSpider(scrapy.Spider):

    def __init__(self, *args, **kwargs):
        # customize loggers: some loggers can't be reset at this point
        helpers.set_loggers_level()
        super(UserSpider, self).__init__(*args, **kwargs)
This approach sometimes works and sometimes doesn't.
What would be the correct solution?
You can just set LOG_LEVEL appropriately in your settings.py; read more here: https://doc.scrapy.org/en/latest/topics/settings.html#std:setting-LOG_LEVEL
LOG_LEVEL
Default: 'DEBUG'
Minimum level to log. Available levels are: CRITICAL, ERROR, WARNING, INFO, DEBUG. For more info see Logging.
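For example, to silence DEBUG messages project-wide, add this line to settings.py:

LOG_LEVEL = 'INFO'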
If project wide settings are not focused enough, you can set them per-spider by using custom_settings:
class MySpider(scrapy.Spider):
    name = 'myspider'

    custom_settings = {
        'LOG_LEVEL': 'INFO',
    }
Source:
https://doc.scrapy.org/en/latest/topics/settings.html#settings-per-spider
Setting different log levels per log handler is not very reliable.
At the end of the day, a better approach is to launch the Scrapy CLI tool from another script and filter the log output with a parser as needed.
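As a sketch of that idea (assuming Scrapy's default LOG_FORMAT, and a placeholder spider name):

import subprocess
import sys

# launch the crawl as a child process; Scrapy writes its log to stderr
proc = subprocess.Popen(
    ['scrapy', 'crawl', 'myspider'],  # 'myspider' is a placeholder
    stderr=subprocess.PIPE,
    universal_newlines=True,
)
for line in proc.stderr:
    # keep everything except DEBUG lines; the prefix matches the default
    # LOG_FORMAT '%(asctime)s [%(name)s] %(levelname)s: %(message)s'
    if '] DEBUG:' not in line:
        sys.stdout.write(line)
proc.wait()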
I stumbled upon the same issue. I tried various methods, but it looks like since Scrapy uses the standard logging module, you have to set the level globally, which results in Scrapy printing all the debug information.
I've found it more reliable to use a bool flag with print statements for DEBUG output, and to use the logger for INFO, ERROR, and WARNING.
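That approach might look like this minimal sketch (the DEBUG flag and helper are my own names):

import logging

DEBUG = True  # flip to False to silence debug output
logger = logging.getLogger(__name__)

def debug(msg):
    # plain print for debug chatter, bypassing the logging tree entirely
    if DEBUG:
        print('DEBUG: %s' % msg)

debug('only shown when the flag is set')
logger.info('always goes through the logging machinery')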
I have the following test script, which mimics what I usually do to set up an extended logger:
import logging

TRACE_LL = 25
TRACE_LSTR = 'TRACE'
LOG_FORMATTER = '%(asctime)s - %(levelname)-10s - %(message)s'

class MyLogger(logging.Logger):

    def __init__(self, log_name):
        logging.Logger.__init__(self, log_name)
        self.setLevel(TRACE_LL)
        hdlr = logging.StreamHandler()
        formatter = logging.Formatter(LOG_FORMATTER)
        hdlr.setFormatter(formatter)
        self.addHandler(hdlr)

    def trace(self, txt, *args, **kwargs):
        self.log(TRACE_LL, txt, *args, **kwargs)

def getlog(name):
    return logging.getLogger(name)

def setup():
    logging.setLoggerClass(MyLogger)
    logging.addLevelName(TRACE_LL, TRACE_LSTR)

setup()
log = getlog('mylog')
log.trace('Trace this')
Running this works as expected:
2014-01-07 07:22:59,982 - TRACE - Trace this
But running pylint on this causes trouble:
» pylint -E getlog_test.py
No config file found, using default configuration
************* Module getlog_test
E: 29,0: Instance of 'RootLogger' has no 'trace' member (but some types could not be inferred)
I get hundreds of those messages in my codebase, because I am using logging extensively.
How can I solve the pylint error?
As an alternative, disabling it would also be enough, but only for the RootLogger instance: I still want to know if other parts of the code have this problem.
Although I find pylint valuable, in my experience, it produces many false warnings and errors. You can disable one or more checks for any lines of your code by surrounding that code with comments of the form:
# pylint: disable=E1103
code that pylint trips over
# pylint: enable=E1103
where E1103 is the error to be suppressed. You can suppress multiple errors the same way with a comma-separated list of error codes. Pylint's documentation on this is here.
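Applied to the example above, that would look like:

log = getlog('mylog')
log.trace('Trace this')  # pylint: disable=E1103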
This is because Pylint is not smart enough to grasp that you'll get MyLogger instances when calling getLogger.
Besides inline enabling/disabling of messages, which is easier at first, you may want to take a look at the pylint-brain project (https://bitbucket.org/logilab/pylint-brain).
There you'll find how to write a little astroid plugin that adds a 'trace' method to the default logging loggers (or, even better, tells pylint that logging.getLogger() returns MyLogger instances, but that is a bit more tricky).
This is definitely better in the long run.
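For a rough idea of what such a plugin can look like, here is a sketch (the astroid API has changed across versions, so treat this as an outline rather than drop-in code):

# sketch of a pylint/astroid plugin: teach pylint that logging.Logger
# has a 'trace' method; load it with pylint's --load-plugins option
import astroid
from astroid import MANAGER

def transform_logger_class(cls):
    # only touch logging.Logger itself
    if cls.name == 'Logger' and cls.root().name == 'logging':
        stub = astroid.extract_node(
            'def trace(self, txt, *args, **kwargs): pass')
        cls.locals['trace'] = [stub]

def register(linter):
    # required plugin entry point; the transform is registered at import time
    pass

MANAGER.register_transform(astroid.ClassDef, transform_logger_class)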
I am running a web server, Tornado, and I am trying to redirect all the log output to a file using the following command, but I don't see the output in the file:
/usr/bin/python -u index.py 2>&1 >> /tmp/tornado.log
I pass the -u option to the Python interpreter, and I still don't see any output logged to my log file.
However, I see the output on stdout when I do the following:
/usr/bin/python index.py
Tornado uses the built-in logging module. You can easily attach a file handler to the root logger and set its level to NOTSET so it records everything, or some other level if you want to filter.
Reference docs: logging, logging.handlers
Example that works with Tornado's logging:
import logging

# the root logger is created upon the first import of the logging module

# create a file handler to add to the root logger
filehandler = logging.FileHandler(
    filename='test.log',
    mode='a',
    encoding=None,
    delay=False,
)

# set the file handler's level to your desired logging level, e.g. INFO
filehandler.setLevel(logging.INFO)

# create a formatter for the file handler and attach it
formatter = logging.Formatter('%(asctime)s.%(msecs)d [%(name)s](%(process)d): %(levelname)s: %(message)s')
filehandler.setFormatter(formatter)

# add filters if you want your handler to only handle events from specific
# loggers, e.g. "main.sub.classb" or something like that. I'll leave this
# commented out.
# filehandler.addFilter(logging.Filter(name='root.child'))

# set the root logger's level to be at most as high as your handler's
if logging.root.level > filehandler.level:
    logging.root.setLevel(filehandler.level)

# finally, add the handler to the root. after you do this, the root logger
# will write records to file.
logging.root.addHandler(filehandler)
More often than not, I actually wish to suppress Tornado's loggers (because I have my own, catch their exceptions anyway, and they just end up polluting my logs), and this is where adding a filter to your file handlers can come in very handy.
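Continuing the example above, a small filter like this (a sketch; the class name is mine) rejects anything coming from Tornado's loggers (tornado.access, tornado.application, tornado.general):

class NoTornadoFilter(logging.Filter):
    # reject any record whose logger lives under the 'tornado' namespace
    def filter(self, record):
        return not record.name.startswith('tornado')

filehandler.addFilter(NoTornadoFilter())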
In my Django application I have set up my logging to log all levels to a file, which works well.
During management commands (and only there), I want to log (some levels) to the console as well.
How can I (dynamically) set up the logging to achieve this?
It was actually quite easy; all I had to do was add a new handler to each logger I wanted to redirect:
import logging

loggernames = [ ... ]

level = logging.DEBUG
handler = logging.StreamHandler()
handler.setLevel(level)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))

for name in loggernames:
    logging.getLogger(name).addHandler(handler)
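Since the extra console output should only appear during management commands, the wiring can live inside the command itself. A sketch (the command and logger names are placeholders):

import logging
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Example command that mirrors log output to the console.'

    def handle(self, *args, **options):
        handler = logging.StreamHandler()
        handler.setLevel(logging.DEBUG)
        handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
        for name in ['myapp', 'myapp.tasks']:  # placeholder logger names
            logging.getLogger(name).addHandler(handler)
        # ... actual command logic goes here ...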
I'm looking for a simple way to extend the logging functionality defined in the standard Python library. I just want the ability to choose whether or not my logs are also printed to the screen.
Example: Normally to log a warning you would call:
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s: %(message)s', filename='log.log', filemode='w')
logging.warning("WARNING!!!")
This sets up the log's configuration and writes the warning to the log file.
I would like to have something along the lines of a call like:
logging.warning("WARNING!!!", True)
where the True argument signifies that the log message is also printed to stdout.
I've seen some examples of implementations that override the logger class, but I am new to the language and don't really follow what is going on, or how to implement this idea. Any help would be greatly appreciated :)
The Python logging module defines these classes:
Loggers that emit log messages.
Handlers that send those messages to a destination.
Formatters that format log messages.
Filters that filter log messages.
A Logger can have Handlers. You add them by invoking the addHandler() method. A Handler can have Filters and Formatters. You similarly add them by invoking the addFilter() and setFormatter() methods, respectively.
It works like this:
import logging

# make a logger
main_logger = logging.getLogger("my logger")
main_logger.setLevel(logging.INFO)

# make some handlers
console_handler = logging.StreamHandler()  # by default, sys.stderr
file_handler = logging.FileHandler("my_log_file.txt")

# set logging levels
console_handler.setLevel(logging.WARNING)
file_handler.setLevel(logging.INFO)

# add handlers to logger
main_logger.addHandler(console_handler)
main_logger.addHandler(file_handler)
Now, you can use this object like this:
main_logger.info("logged in the FILE")
main_logger.warning("logged in the FILE and on the CONSOLE")
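If you also want timestamps and level names in the output, attach a Formatter to each handler (continuing the example above):

# give both handlers the same layout
formatter = logging.Formatter('%(asctime)s %(levelname)s: %(message)s')
console_handler.setFormatter(formatter)
file_handler.setFormatter(formatter)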
If you just run python on your machine, you can type the above code into the interactive console and you should see the output. The log file will get created in your current directory, if you have permission to create files there.
I hope this helps!
It is possible to override logging.getLoggerClass() to add new functionality to loggers. I wrote a simple class that prints green messages to stdout.
The most important parts of my code:
import logging

class ColorLogger(logging.getLoggerClass()):
    __GREEN = '\033[0;32m%s\033[0m'
    __FORMAT = {
        'fmt': '%(asctime)s %(levelname)s: %(message)s',
        'datefmt': '%Y-%m-%d %H:%M:%S',
    }

    def __init__(self, format=__FORMAT):
        formatter = logging.Formatter(**format)
        self.root.setLevel(logging.INFO)
        self.root.handlers = []
        # (...)
        handler = logging.StreamHandler()
        handler.setFormatter(formatter)
        self.root.addHandler(handler)

    def info(self, message):
        self.root.info(message)

    # (...)

    def info_green(self, message):
        # the color template is the format string, the message is its argument
        self.root.info(self.__GREEN, message)

    # (...)

if __name__ == '__main__':
    logger = ColorLogger()
    logger.info("This message has default color.")
    logger.info_green("This message is green.")
Handlers send the log records (created by loggers) to the appropriate destination.
(from the docs: http://docs.python.org/library/logging.html)
Just set up multiple handlers with your logging object, one to write to file, another to write to the screen.
UPDATE
Here is an example function you can call in your classes to get logging set up with a handler:
def set_up_logger(self):
    # create logger object
    self.log = logging.getLogger("command")
    self.log.setLevel(logging.DEBUG)

    # create console handler and set min level recorded to debug messages
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)

    # add the handler to the log object
    self.log.addHandler(ch)
You would just need to set up another handler for files, along the lines of the StreamHandler code that's already there, and add it to the logging object. The line ch.setLevel(logging.DEBUG) means that this particular handler will take logging messages that are DEBUG or higher. You'll likely want to set yours to WARNING or higher, since you only want the more important things to go to the console. So, your logging would work like this:
self.log.info("Hello, World!") -> goes to file
self.log.error("OMG!!") -> goes to file AND console
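For completeness, the file handler mentioned above could be added like this (a sketch continuing set_up_logger; the file name is hypothetical):

# console only shows the more important messages
ch.setLevel(logging.WARNING)

# everything from DEBUG up goes to the file
fh = logging.FileHandler('command.log')  # hypothetical file name
fh.setLevel(logging.DEBUG)
self.log.addHandler(fh)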