Scrapy built-in loggers:
scrapy.utils.log
scrapy.crawler
scrapy.middleware
scrapy.core.engine
scrapy.extensions.logstats
scrapy.extensions.telnet
scrapy.core.scraper
scrapy.statscollectors
are very verbose.
I was trying to set a different log level, DEBUG, than user spider log level, INFO. This way I can reduce the 'noise'.
This helper function works, some times:
def set_loggers_level(level=logging.DEBUG):
loggers = [
'scrapy.utils.log',
'scrapy.crawler',
'scrapy.middleware',
'scrapy.core.engine',
'scrapy.extensions.logstats',
'scrapy.extensions.telnet',
'scrapy.core.scraper',
'scrapy.statscollectors'
]
for logger_name in loggers:
logger = logging.getLogger(logger_name)
logger.setLevel(level)
for handler in logger.handlers:
handler.setLevel(level)
I call it from UserSpider init:
class UserSpider(scrapy.Spider):
def __init__(self, *args, **kwargs):
# customize loggers: Some loggers can't be reset a this point
helpers.set_loggers_level()
super(UserSpider, self).__init__(*args, **kwargs)
This approach works some time, others not.
What will be the correct solution?
You can just set LOG_LEVEL appropriately in your settings.py, read more here: https://doc.scrapy.org/en/latest/topics/settings.html#std:setting-LOG_LEVEL
LOG_LEVEL
Default: 'DEBUG'
Minimum level to log. Available levels are: CRITICAL, ERROR, WARNING, INFO, DEBUG. For more info see Logging.
If project wide settings are not focused enough, you can set them per-spider by using custom_settings:
class MySpider(scrapy.Spider):
name = 'myspider'
custom_settings = {
'LOG_LEVEL': 'INFO',
}
Source:
https://doc.scrapy.org/en/latest/topics/settings.html#settings-per-spider
Settings different log levels per Log handlers is not very realiable.
At the end of the day the better approach will be launch scrapy cli tool from another script and filter logs output with a parser has needed.
I stumbled upon the same issue. I tried various method but it looks like since Scrapy uses logging module, you have to set it at global level which result in Scrapy to print all the debug information.
I've found more reliable solution to use bool flag with print statement fro DEBUG and use logger for INFO, ERROR and WARNING.
Related
Overview
I want to use httpimport as a logging library common to several scripts. This module generates logs of its own which I do not know how to silence.
In other cases such as this one, I would have used
logging.getLogger('httpimport').setLevel(logging.ERROR)
but it did not work.
Details
The following code is a stub of the "common logging code" mentioned above:
# toconsole.py
import logging
import os
log = logging.getLogger(__name__)
log.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s %(message)s')
handler_console = logging.StreamHandler()
level = logging.DEBUG if 'DEV' in os.environ else logging.INFO
handler_console.setLevel(level)
handler_console.setFormatter(formatter)
log.addHandler(handler_console)
# disable httpimport logging except for errors+
logging.getLogger('httpimport').setLevel(logging.ERROR)
A simple usage such as
import httpimport
httpimport.INSECURE = True
with httpimport.remote_repo(['githublogging'], 'http://localhost:8000/') :
from toconsole import log
log.info('yay!')
gives the following output
[!] Using non HTTPS URLs ('http://localhost:8000//') can be a security hazard!
2019-08-25 13:56:48,671 yay!
yay!
The second (bare) yay! must be coming from httpimport, namely from its logging setup.
How can I disable the logging for such a module, or better - raise its level so that only errors+ are logged?
Note: this question was initially asked at the Issues section of the GitHub repository for httpimport but the author did not know either how to fix that.
Author of httpimport here.
I totally forgot I was using the basicConfig logger thing.
It is fixed in master right now (0.7.2) - will be included in next PyPI release:
https://github.com/operatorequals/httpimport/commit/ff2896c8f666c3f16b0f27716c732d68be018ef7
The reason why this is happening is because when you do import httpimport they do the initial configuration for the logging machinery. This happens right here. What this means is that the root logger already has a StreamHandler attached to it. Because of this, and the fact that all loggers inherit from the root logger, when you do log.info('yay') it not only uses your Handler and Formatter, but it also propagates all they way to the root logger, which also emits the message.
Remember that whoever calls basicConfig first when an application starts that sets up the default configuration for the root logger, which in turn, is inherited by all loggers, unless otherwise specified.
If you have a complex logging configuration you need to ensure that you call it before you do any third-party imports which might call basicConfig. basicConfig is idempotent meaning the first call seals the deal, and subsequent calls have no effect.
Solutions
You could do log.propagate = False and you will see that the 2nd yay will not show.
You could attach the Formatter directly to the already existent root Handler by doing something like this (without adding another Handler yourself)
root = logging.getLogger('')
formatter = logging.Formatter('%(asctime)s %(message)s')
root_handler = root.handlers[0]
root_handler.setFormatter(formatter)
You could do a basicConfig call when you initialize your application (if you had such a config available, with initial Formatters and Handlers, etc. that will elegantly attach everything to the root logger neatly) and then you would only do something like logger = logging.getLogger(__name__) and logger.info('some message') that would work the way you'd expect because it would propagate all the way to the root logger which already has your configuration.
You could remove the initial Handler that's present on the root logger by doing something like
root = logging.getLogger('')
root.handlers = []
... and many more solutions, but you get the idea.
Also do note that logging.getLogger('httpimport').setLevel(logging.ERROR) this works perfectly fine. No messages below logging.ERROR would be logged by that logger, it's just that the problem wasn't from there.
If you however want to completely disable a logger you can just do logger.disabled = True (also do note, again, that the problem wasn't from the httpimport logger, as aforementioned)
One example demonstrated
Change your toconsole.py with this and you won't see the second yay.
import logging
import os
log = logging.getLogger(__name__)
log.setLevel(logging.DEBUG)
root_logger = logging.getLogger('')
root_handler = root_logger.handlers[0]
formatter = logging.Formatter('%(asctime)s %(message)s')
root_handler.setFormatter(formatter)
# or you could just keep your old code and just add log.propagate = False
# or any of the above solutions and it would work
logging.getLogger('httpimport').setLevel(logging.ERROR)
I am trying to make Scrapy output colorized logs. I am not so familiar with Python logging, but my understanding is that I must make my own Formatter and make it use by Scrapy. I succeeded in making a Formatter to colorized the output using Clint.
My problem is that I can't make it work within Scrapy correctly. I would have expected the logger object in my spider to have a handler, then I would have switched the formatter of that handler. When I looks what is inside spider.logger.logger, I see that handler is an empty list. I tried to add my formatter in a new stream handler doing.
crawler.spider.logger.logger.addHandler(sh)
where sh is a handler using my color formatter.
This have for effect to make scrappy output each messages twice. First message is colorized but doesn't have Scrapy formatting. The second one has Scrapy formatting with no colors.
How can I make Scrapy output colorized logs keeping the same format that can be set in settings.py
Thanks
If you mean to colorize LogRecord only, you can customize LOG_FORMAT in settings.py with ANSI escape codes.
Example:
LOG_FORMAT = '\x1b[0;0;34m%(asctime)s\x1b[0;0m \x1b[0;0;36m[%(name)s]\x1b[0;0m \x1b[0;0;31m%(levelname)s\x1b[0;0m: %(message)s'
If you also want to colorize different log levels with different colors, you can override scrapy.utils.log._get_handler(source code).
Put this near the top of your settings.py
import scrapy.utils.log
_get_handler = copy.copy(scrapy.utils.log._get_handler)
def _get_handler_custom(*args, **kwargs):
handler = _get_handler(*args, **kwargs)
handler.setFormatter(your_custom_formatter)
return handler
scrapy.utils.log._get_handler = _get_handler_custom
What it does is reset the formatter after calling the original _get_handler, and then reattach it to scrapy.utils.log.
This is a hacky solution and might not be the best practice, but it just works.
A more proper way to achieve this is to override logging.StreamHandler. There is a bunch of discussion on SO which can lead you to the right direction.
Here I provide my full working codes used in my projects (a third-party package colorlog is in use).
settings.py
import copy
from colorlog import ColoredFormatter
import scrapy.utils.log
color_formatter = ColoredFormatter(
(
'%(log_color)s%(levelname)-5s%(reset)s '
'%(yellow)s[%(asctime)s]%(reset)s'
'%(white)s %(name)s %(funcName)s %(bold_purple)s:%(lineno)d%(reset)s '
'%(log_color)s%(message)s%(reset)s'
),
datefmt='%y-%m-%d %H:%M:%S',
log_colors={
'DEBUG': 'blue',
'INFO': 'bold_cyan',
'WARNING': 'red',
'ERROR': 'bg_bold_red',
'CRITICAL': 'red,bg_white',
}
)
_get_handler = copy.copy(scrapy.utils.log._get_handler)
def _get_handler_custom(*args, **kwargs):
handler = _get_handler(*args, **kwargs)
handler.setFormatter(color_formatter)
return handler
scrapy.utils.log._get_handler = _get_handler_custom
I have the following test script, which mimics what I usually do to setup an extended logger:
import logging
TRACE_LL = 25
TRACE_LSTR = 'TRACE'
LOG_FORMATTER = '%(asctime)s - %(levelname)-10s - %(message)s'
class MyLogger(logging.Logger):
def __init__(self, log_name):
logging.Logger.__init__(self, log_name)
self.setLevel(TRACE_LL)
hdlr = logging.StreamHandler()
formatter = logging.Formatter(LOG_FORMATTER)
hdlr.setFormatter(formatter)
self.addHandler(hdlr)
def trace(self, txt, *args, **kwargs):
self.log(TRACE_LL, txt, *args, **kwargs)
def getlog(name):
return logging.getLogger(name)
def setup():
logging.setLoggerClass(MyLogger)
logging.addLevelName(TRACE_LL, TRACE_LSTR)
setup()
log = getlog('mylog')
log.trace('Trace this')
Running this works as expected:
2014-01-07 07:22:59,982 - TRACE - Trace this
But running pylint on this causes trouble:
ยป pylint -E getlog_test.py
No config file found, using default configuration
************* Module getlog_test
E: 29,0: Instance of 'RootLogger' has no 'trace' member (but some types could not be inferred)
I get hundreds of those messages in my codebase, because I am using logging extensively.
How can I solve the pylint error?
As an alternative, disabling it would also be enough, but only for the RootLogger instance: I still want to know if other parts of the code have this problem.
Although I find pylint valuable, in my experience, it produces many false warnings and errors. You can disable one or more checks for any lines of your code by surrounding that code with comments of the form:
# pylint: disable=E1103
code that pylint trips over
# pylint: enable=E1103
where E1103 is the error to be suppressed. You can suppress multiple errors the same way with a comma-separated list of error codes. Pylint's documentation on this is here.
This is because Pylint is not smart enough to grasp that you'll get MyLogger instances on calling getLogger.
Beside using inline enabling/disabling of messages, which is easier at first, you may want to take a look at the pylint-brain project (https://bitbucket.org/logilab/pylint-brain).
You'll find there how to write a little astroid plugin that could add a 'trace' method to default logging loggers (or even better, inform that logging.getLogger() return MyLogger instance but that is a bit more tricky).
This is definitly better in the long run.
I'm new to Python logging and I can easily see how it is preferrable to the home-brew solution I have come up with.
One question I can't seem to find an answer to: how do I squelch logging messages on a per-method/function basis?
My hypothetical module contains a single function. As I develop, the log calls are a great help:
logging.basicConfig(level=logging.DEBUG,
format=('%(levelname)s: %(funcName)s(): %(message)s'))
log = logging.getLogger()
my_func1():
stuff...
log.debug("Here's an interesting value: %r" % some_value)
log.info("Going great here!")
more stuff...
As I wrap up my work on 'my_func1' and start work on a second function, 'my_func2', the logging messages from 'my_func1' start going from "helpful" to "clutter".
Is there single-line magic statement, such as 'logging.disabled_in_this_func()' that I can add to the top of 'my_func1' to disable all the logging calls within 'my_func1', but still leave logging calls in all other functions/methods unchanged?
Thanks
linux, Python 2.7.1
The trick is to create multiple loggers.
There are several aspects to this.
First. Don't use logging.basicConfig() at the beginning of a module. Use it only inside the main-import switch
if __name__ == "__main__":
logging.basicConfig(...)
main()
logging.shutdown()
Second. Never get the "root" logger, except to set global preferences.
Third. Get individual named loggers for things which might be enabled or disabled.
log = logging.getLogger(__name__)
func1_log = logging.getLogger( "{0}.{1}".format( __name__, "my_func1" )
Now you can set logging levels on each named logger.
log.setLevel( logging.INFO )
func1_log.setLevel( logging.ERROR )
You could create a decorator that would temporarily suspend logging, ala:
from functools import wraps
def suspendlogging(func):
#wraps(func)
def inner(*args, **kwargs):
previousloglevel = log.getEffectiveLevel()
try:
return func(*args, **kwargs)
finally:
log.setLevel(previousloglevel)
return inner
#suspendlogging
def my_func1(): ...
Caveat: that would also suspend logging for any function called from my_func1 so be careful how you use it.
You could use a decorator:
import logging
import functools
def disable_logging(func):
#functools.wraps(func)
def wrapper(*args,**kwargs):
logging.disable(logging.DEBUG)
result = func(*args,**kwargs)
logging.disable(logging.NOTSET)
return result
return wrapper
#disable_logging
def my_func1(...):
I took me some time to learn how to implement the sub-loggers as suggested by S.Lott.
Given how tough it was to figure out how to setup logging when I was starting out, I figure it's long past time to share what I learned since then.
Please keep in mind that this isn't the only way to set up loggers/sub-loggers, nor is it the best. It's just the way I use to get the job done to fit my needs. I hope this is helpful to someone. Please feel free to comment/share/critique.
Let's assume we have a simple library we like to use. From the main program, we'd like to be able to control the logging messages we get from the library. Of course we're considerate library creators, so we configure our library to make this easy.
First, the main program:
# some_prog.py
import os
import sys
# Be sure to give Vinay Sajip thanks for his creation of the logging module
# and tireless efforts to answer our dumb questions about it. Thanks Vinay!
import logging
# This module will make understanding how Python logging works so much easier.
# Also great for debugging why your logging setup isn't working.
# Be sure to give it's creator Brandon Rhodes some love. Thanks Brandon!
import logging_tree
# Example library
import some_lib
# Directory, name of current module
current_path, modulename = os.path.split(os.path.abspath(__file__))
modulename = modulename.split('.')[0] # Drop the '.py'
# Set up a module-local logger
# In this case, the logger will be named 'some_prog'
log = logging.getLogger(modulename)
# Add a Handler. The Handler tells the logger *where* to send the logging
# messages. We'll set up a simple handler that send the log messages
# to standard output (stdout)
stdout_handler = logging.StreamHandler(stream=sys.stdout)
log.addHandler(stdout_handler)
def some_local_func():
log.info("Info: some_local_func()")
log.debug("Debug: some_local_func()")
if __name__ == "__main__":
# Our main program, here's where we tie together/enable the logging infra
# we've added everywhere else.
# Use logging_tree.printout() to see what the default log levels
# are on our loggers. Make logging_tree.printout() calls at any place in
# the code to see how the loggers are configured at any time.
#
# logging_tree.printout()
print("# Logging level set to default (i.e. 'WARNING').")
some_local_func()
some_lib.some_lib_func()
some_lib.some_special_func()
# We know a reference to our local logger, so we can set/change its logging
# level directly. Let's set it to INFO:
log.setLevel(logging.INFO)
print("# Local logging set to 'INFO'.")
some_local_func()
some_lib.some_lib_func()
some_lib.some_special_func()
# Next, set the local logging level to DEBUG:
log.setLevel(logging.DEBUG)
print("# Local logging set to 'DEBUG'.")
some_local_func()
some_lib.some_lib_func()
some_lib.some_special_func()
# Set the library's logging level to DEBUG. We don't necessarily
# have a reference to the library's logger, but we can use
# logging_tree.printout() to see the name and then call logging.getLogger()
# to create a local reference. Alternately, we could dig through the
# library code.
lib_logger_ref = logging.getLogger("some_lib")
lib_logger_ref.setLevel(logging.DEBUG)
# The library logger's default handler, NullHandler() won't output anything.
# We'll need to add a handler so we can see the output -- in this case we'll
# also send it to stdout.
lib_log_handler = logging.StreamHandler(stream=sys.stdout)
lib_logger_ref.addHandler(lib_log_handler)
lib_logger_ref.setLevel(logging.DEBUG)
print("# Logging level set to DEBUG in both local program and library.")
some_local_func()
some_lib.some_lib_func()
some_lib.some_special_func()
print("# ACK! Setting the library's logging level to DEBUG output")
print("# all debug messages from the library. (Use logging_tree.printout()")
print("# To see why.)")
print("# Let's change the library's logging level to INFO and")
print("# only some_special_func()'s level to DEBUG so we only see")
print("# debug message from some_special_func()")
# Raise the logging level of the libary and lower the logging level
# of 'some_special_func()' so we see only some_special_func()'s
# debugging-level messages.
# Since it is a sub-logger of the library's main logger, we don't need
# to create another handler, it will use the handler that belongs
# to the library's main logger.
lib_logger_ref.setLevel(logging.INFO)
special_func_sub_logger_ref = logging.getLogger('some_lib.some_special_func')
special_func_sub_logger_ref.setLevel(logging.DEBUG)
print("# Logging level set to DEBUG in local program, INFO in library and")
print("# DEBUG in some_lib.some_special_func()")
some_local_func()
some_lib.some_lib_func()
some_lib.some_special_func()
Next, our library:
# some_lib.py
import os
import logging
# Directory, name of current module
current_path, modulename = os.path.split(os.path.abspath(__file__))
modulename = modulename.split('.')[0] # Drop the '.py'
# Set up a module-local logger. In this case the logger will be
# named 'some_lib'
log = logging.getLogger(modulename)
# In libraries, always default to NullHandler so you don't get
# "No handler for X" messages.
# Let your library callers set up handlers and set logging levels
# in their main program so the main program can decide what level
# of messages they want to see from your library.
log.addHandler(logging.NullHandler())
def some_lib_func():
log.info("Info: some_lib.some_lib_func()")
log.debug("Debug: some_lib.some_lib_func()")
def some_special_func():
"""
This func is special (not really). It just has a function/method-local
logger in addition to the library/module-level logger.
This allows us to create/control logging messages down to the
function/method level.
"""
# Our function/method-local logger
func_log = logging.getLogger('%s.some_special_func' % modulename)
# Using the module-level logger
log.info("Info: some_special_func()")
# Using the function/method-level logger, which can be controlled separately
# from both the library-level logger and the main program's logger.
func_log.debug("Debug: some_special_func(): This message can be controlled at the function/method level.")
Now let's run the program, along with the commentary track:
# Logging level set to default (i.e. 'WARNING').
Notice there's no output at the default level since we haven't generated any WARNING-level messages.
# Local logging set to 'INFO'.
Info: some_local_func()
The library's handlers default to NullHandler(), so we see only output from the main program. This is good.
# Local logging set to 'DEBUG'.
Info: some_local_func()
Debug: some_local_func()
Main program logger set to DEBUG. We still see no output from the library. This is good.
# Logging level set to DEBUG in both local program and library.
Info: some_local_func()
Debug: some_local_func()
Info: some_lib.some_lib_func()
Debug: some_lib.some_lib_func()
Info: some_special_func()
Debug: some_special_func(): This message can be controlled at the function/method level.
Oops.
# ACK! Setting the library's logging level to DEBUG output
# all debug messages from the library. (Use logging_tree.printout()
# To see why.)
# Let's change the library's logging level to INFO and
# only some_special_func()'s level to DEBUG so we only see
# debug message from some_special_func()
# Logging level set to DEBUG in local program, INFO in library and
# DEBUG in some_lib.some_special_func()
Info: some_local_func()
Debug: some_local_func()
Info: some_lib.some_lib_func()
Info: some_special_func()
Debug: some_special_func(): This message can be controlled at the function/method level.
It's also possible to get only debug messages only from some_special_func(). Use logging_tree.printout() to figure out which logging levels to tweak to make that happen!
This combines #KirkStrauser's answer with #unutbu's. Kirk's has try/finally but doesn't disable, and unutbu's disables w/o try/finally. Just putting it here for posterity:
from functools import wraps
import logging
def suspend_logging(func):
#wraps(func)
def inner(*args, **kwargs):
logging.disable(logging.FATAL)
try:
return func(*args, **kwargs)
finally:
logging.disable(logging.NOTSET)
return inner
Example usage:
from logging import getLogger
logger = getLogger()
#suspend_logging
def example()
logger.info("inside the function")
logger.info("before")
example()
logger.info("after")
If you are using logbook silencing logs is very simple. Just use a NullHandler - https://logbook.readthedocs.io/en/stable/api/handlers.html
>>> logger.warn('TEST')
[12:28:17.298198] WARNING: TEST
>>> from logbook import NullHandler
>>> with NullHandler():
... logger.warn('TEST')
I'm looking for a simple way to extend the logging functionality defined in the standard python library. I just want the ability to choose whether or not my logs are also printed to the screen.
Example: Normally to log a warning you would call:
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(levelname)s: %(message)s', filename='log.log', filemode='w')
logging.warning("WARNING!!!")
This sets the configurations of the log and puts the warning into the log
I would like to have something along the lines of a call like:
logging.warning("WARNING!!!", True)
where the True statement signifys if the log is also printed to stdout.
I've seen some examples of implementations of overriding the logger class
but I am new to the language and don't really follow what is going on, or how to implement this idea. Any help would be greatly appreciated :)
The Python logging module defines these classes:
Loggers that emit log messages.
Handlers that put those messages to a destination.
Formatters that format log messages.
Filters that filter log messages.
A Logger can have Handlers. You add them by invoking the addHandler() method. A Handler can have Filters and Formatters. You similarly add them by invoking the addFilter() and setFormatter() methods, respectively.
It works like this:
import logging
# make a logger
main_logger = logging.getLogger("my logger")
main_logger.setLevel(logging.INFO)
# make some handlers
console_handler = logging.StreamHandler() # by default, sys.stderr
file_handler = logging.FileHandler("my_log_file.txt")
# set logging levels
console_handler.setLevel(logging.WARNING)
file_handler.setLevel(logging.INFO)
# add handlers to logger
main_logger.addHandler(console_handler)
main_logger.addHandler(file_handler)
Now, you can use this object like this:
main_logger.info("logged in the FILE")
main_logger.warning("logged in the FILE and on the CONSOLE")
If you just run python on your machine, you can type the above code into the interactive console and you should see the output. The log file will get crated in your current directory, if you have permissions to create files in it.
I hope this helps!
It is possible to override logging.getLoggerClass() to add new functionality to loggers. I wrote simple class which prints green messages in stdout.
Most important parts of my code:
class ColorLogger(logging.getLoggerClass()):
__GREEN = '\033[0;32m%s\033[0m'
__FORMAT = {
'fmt': '%(asctime)s %(levelname)s: %(message)s',
'datefmt': '%Y-%m-%d %H:%M:%S',
}
def __init__(self, format=__FORMAT):
formatter = logging.Formatter(**format)
self.root.setLevel(logging.INFO)
self.root.handlers = []
(...)
handler = logging.StreamHandler()
handler.setFormatter(formatter)
self.root.addHandler(handler)
def info(self, message):
self.root.info(message)
(...)
def info_green(self, message):
self.root.info(self.__GREEN, message)
(...)
if __name__ == '__main__':
logger = ColorLogger()
logger.info("This message has default color.")
logger.info_green("This message is green.")
Handlers send the log records (created by loggers) to the appropriate
destination.
(from the docs: http://docs.python.org/library/logging.html)
Just set up multiple handlers with your logging object, one to write to file, another to write to the screen.
UPDATE
Here is an example function you can call in your classes to get logging set up with a handler.
def set_up_logger(self):
# create logger object
self.log = logging.getLogger("command")
self.log.setLevel(logging.DEBUG)
# create console handler and set min level recorded to debug messages
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
# add the handler to the log object
self.log.addHandler(ch)
You would just need to set up another handler for files, ala the StreamHandler code that's already there, and add it to the logging object. The line that says ch.setLevel(logging.DEBUG) means that this particular handler will take logging messages that are DEBUG or higher. You'll likely want to set yours to WARNING or higher, since you only want the more important things to go to the console. So, your logging would work like this:
self.log.info("Hello, World!") -> goes to file
self.log.error("OMG!!") -> goes to file AND console