Send log messages from all celery tasks to a single file - python

I'm wondering how to setup a more specific logging system. All my tasks use
logger = logging.getLogger(__name__)
as a module-wide logger.
I want celery to log to "celeryd.log" and my tasks to "tasks.log", but I have no idea how to get this working. Using CELERYD_LOG_FILE from django-celery I can route all celeryd-related log messages to celeryd.log, but there is no trace of the log messages created in my tasks.

Note: This answer is outdated as of Celery 3.0, where you now use get_task_logger() to get your per-task logger set up. Please see the Logging section of the What's new in Celery 3.0 document for details.
Celery has dedicated support for logging, per task. See the Task documentation on the subject:
You can use the worker's logger to add diagnostic output to the worker log:
@celery.task()
def add(x, y):
    logger = add.get_logger()
    logger.info("Adding %s + %s" % (x, y))
    return x + y
There are several logging levels available, and the worker's loglevel setting decides
whether or not they will be written to the log file.
Of course, you can also simply use print, as anything written to standard out/-err will be
written to the log file as well.
Under the hood this is all still the standard Python logging module. You can set the CELERYD_HIJACK_ROOT_LOGGER option to False to allow your own logging setup to work; otherwise Celery will configure the handling for you.
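For example, if you want to keep your own handlers, a minimal sketch of a django-celery-era settings module might look like this (the handler and logger names are illustrative, not prescribed by Celery; Django applies LOGGING itself, while outside Django you would pass the same dict to logging.config.dictConfig()):

# settings.py (sketch) - keep Celery from hijacking the root logger
CELERYD_HIJACK_ROOT_LOGGER = False

# One file for the worker's own messages, one for your task loggers.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'celeryd_file': {'class': 'logging.FileHandler', 'filename': 'celeryd.log'},
        'tasks_file': {'class': 'logging.FileHandler', 'filename': 'tasks.log'},
    },
    'loggers': {
        'celery': {'handlers': ['celeryd_file'], 'level': 'INFO'},
        'foo.tasks': {'handlers': ['tasks_file'], 'level': 'INFO', 'propagate': False},
    },
}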
However, for tasks, the .get_logger() call does allow you to set up a separate log file per individual task. Simply pass in a logfile argument and it'll route log messages to that separate file:
@celery.task()
def add(x, y):
    logger = add.get_logger(logfile='tasks.log')
    logger.info("Adding %s + %s" % (x, y))
    return x + y
Last but not least, you can just configure your top-level package in the Python logging module and give it a file handler of its own. I'd set this up using the celery.signals.after_setup_task_logger signal; here I assume all your modules live in a package called foo.tasks (as in foo.tasks.email and foo.tasks.scaling):
from celery.signals import after_setup_task_logger
import logging

def foo_tasks_setup_logging(**kw):
    logger = logging.getLogger('foo.tasks')
    if not logger.handlers:
        handler = logging.FileHandler('tasks.log')
        formatter = logging.Formatter(logging.BASIC_FORMAT)  # you may want to customize this.
        handler.setFormatter(formatter)
        logger.addHandler(handler)
        logger.propagate = False

after_setup_task_logger.connect(foo_tasks_setup_logging)
Now any logger whose name starts with foo.tasks will have all its messages sent to tasks.log instead of to the root logger (which doesn't see any of these messages because .propagate is False).

Just a hint: Celery has its own logging helper:
from celery.utils.log import get_task_logger
logger = get_task_logger(__name__)
Also, Celery logs all output from the task. More details in the Celery docs on Task Logging.
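Putting the hint together with a task, a minimal Celery 3.0+ sketch (the module name foo.tasks is just an example, not something the docs mandate):

# foo/tasks.py - per-task logger on Celery 3.0+
from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)  # logger named after this module, e.g. "foo.tasks"

@shared_task
def add(x, y):
    logger.info("Adding %s + %s", x, y)  # ends up in the worker's task log
    return x + y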

Append --concurrency=1 --loglevel=INFO to the command that runs the celery worker,
e.g.: python xxxx.py celery worker --concurrency=1 --loglevel=INFO
It is better to set the log level inside each Python file as well.

Related

Python: logging object loses its file handler when passed to an RQ task queue

Problem
I pass a logging (logger) object, which is supposed to add lines to test.log, to a function background_task() that is run by rq (a task queue manager). logger has a FileHandler assigned to it to allow logging to test.log. Until background_task() is run you can see the file handler present in logger.handlers, but when the logger is passed to background_task() and background_task() is run by rq worker, logger.handlers ends up empty.
But if I ditch rq (and Redis) and just run background_task() right away, the content of logger.handlers is preserved. So it has something to do with rq (and, probably, task queuing in general; it's a new topic for me).
Steps to reproduce
Run add_job.py: python3 add_job.py. You'll see the output of print(logger.handlers) called from within add_job(): there will be a handlers list containing FileHandler added in get_job_logger().
Run command rq worker to start executing the queued task. You'll see the output of print(logger.handlers) once again but this time called from within background_task() and the list will be empty! Handlers of the logging (logger) object somehow get lost when the function that accepts a logger as an argument is run by rq (rq worker). What gives?
Here's how it looks in the terminal:
$ python3 add_job.py
[<FileHandler /home/user/project/test.log (INFO)>]
$ rq worker
17:44:45 Worker rq:worker:2bbad3623e95438f81396c662cb01284: started, version 1.10.1
17:44:45 Subscribing to channel rq:pubsub:2bbad3623e95438f81396c662cb01284
17:44:45 *** Listening on default...
17:44:45 default: tasks.background_task(<RootLogger root (INFO)>) (5a5301be-efc3-49a7-ab0c-f7cf0a4bd3e5)
[]
Source code
add_job.py
import logging
from logging import FileHandler
from redis import Redis
from rq import Queue
from tasks import background_task

def add_job():
    r = Redis()
    qu = Queue(connection=r)
    logger = get_job_logger()
    print(logger.handlers)
    job = qu.enqueue(background_task, logger)

def get_job_logger():
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    logger_file_handler = FileHandler('test.log')
    logger_file_handler.setLevel(logging.INFO)
    logger.addHandler(logger_file_handler)
    return logger

if __name__ == '__main__':
    add_job()
tasks.py
def background_task(logger):
    print(logger.handlers)
Answered here.
The FileHandler does not get carried over into the worker. You create the FileHandler in the main process, and rq worker runs the task in a separate process; memory is not shared like that.
Hm, I see... Thanks!
I assumed the FileHandler was being serialized or whatnot when written to Redis as a part of the whole logger object and then reinitialized when popping out of the queue.
Anyway, I'll try passing a file path to the function and initializing a logger from within it. That way the FileHandler object stays in one process.
EDIT: yeah, it works
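For reference, a minimal sketch of that workaround, enqueuing the path instead of the logger (the logger name rq_job is made up for illustration):

# add_job.py (sketch): enqueue a file path, not a logger object
from redis import Redis
from rq import Queue
from tasks import background_task

def add_job():
    qu = Queue(connection=Redis())
    qu.enqueue(background_task, 'test.log')

# tasks.py (sketch): build the logger inside the worker process
import logging

def background_task(log_path):
    logger = logging.getLogger('rq_job')
    if not logger.handlers:  # avoid stacking handlers across repeated jobs
        logger.addHandler(logging.FileHandler(log_path))
    logger.setLevel(logging.INFO)
    logger.info('handlers inside the worker: %s', logger.handlers)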

python-daemon and logging: set logging level interactively

I have a python-daemon process that logs to a file via a ThreadedTCPServer (inspired by the cookbook example: https://docs.python.org/2/howto/logging-cookbook.html#sending-and-receiving-logging-events-across-a-network, as I will have many such processes writing to the same file). I am controlling the spawning of the daemon process using subprocess.Popen from an ipython console, and this is how the application will be run. I am able to successfully write to the log file from both the main ipython process, as well as the daemon process, but I am unable to change the level of both by just simply setting the level of the root logger in ipython. Is this something that should be possible? Or will it require custom functionality to set the logging.level of the daemon separately?
Edit: As requested, here is an attempt to provide a pseudo-code example of what I am trying to achieve. I hope that this is a sufficient description.
daemon_script.py
import logging
import daemon
from other_module import function_to_run_as_daemon

class Daemon(object):
    def __init__(self):
        self.daemon_name = __name__
        logging.basicConfig()  # <--- required, or I don't get any log messages
        self.logger = logging.getLogger(self.daemon_name)
        self.logger.debug("Created logger successfully")

    def run(self):
        with daemon.DaemonContext(files_preserve=[self.logger.handlers[0].stream]):
            self.logger.debug("Daemonised successfully - about to enter function")
            function_to_run_as_daemon()

if __name__ == "__main__":
    d = Daemon()
    d.run()
Then in ipython I would run something like
>>> import logging
>>> rootlogger = logging.getLogger()
>>> rootlogger.info( "test" )
INFO:root:"test"
>>> subprocess.Popen( ["python" , "daemon_script.py"] )
DEBUG:__main__:"Created logger successfully"
DEBUG:__main__:"Daemonised successfully - about to enter function"
# now i'm finished debugging and testing, i want to reduce the level for all the loggers by changing the level of the handler
# Note that I also tried changing the level of the root handler, but saw no change
>>> rootlogger.handlers[0].setLevel(logging.INFO)
>>> rootlogger.info( "test" )
INFO:root:"test"
>>> print( rootlogger.debug("test") )
None
>>> subprocess.Popen( ["python" , "daemon_script.py"] )
DEBUG:__main__:"Created logger successfully"
DEBUG:__main__:"Daemonised successfully - about to enter function"
I think that I may not be approaching this correctly, but it's not clear to me what would work better. Any advice would be appreciated.
The logger you create in your daemon won't be the same as the logger you made in ipython. You could test this to be sure, by just printing out both logger objects themselves, which will show you their memory addresses.
I think a better pattern would be to pass whether you want to be in "debug" mode or not when you run the daemon. In other words, call Popen like this:
subprocess.Popen( ["python" , "daemon_script.py", "debug"] )
It's up to you: you could pass a string meaning "debug mode is on" as above, or you could pass the log level constant that means "debug", e.g.:
subprocess.Popen( ["python" , "daemon_script.py", "10"] )
(https://docs.python.org/2/library/logging.html#levels)
Then in the daemon's __init__ method, use sys.argv for example to get that argument and use it:
...
import sys

def __init__(self):
    self.daemon_name = __name__
    logging.basicConfig()  # <--- required, or I don't get any log messages
    log_level = int(sys.argv[1])  # Probably don't actually just blindly convert it without error handling
    self.logger = logging.getLogger(self.daemon_name)
    self.logger.setLevel(log_level)
...
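If you would rather pass a level name like "DEBUG" than a bare number, logging.getLevelName() maps names to numbers; a small hedged variation on the snippet above (parse_log_level is a hypothetical helper, not part of the answer):

import logging
import sys

def parse_log_level(arg, default=logging.INFO):
    # Accept either a numeric level ("10") or a level name ("DEBUG").
    if arg.isdigit():
        return int(arg)
    level = logging.getLevelName(arg.upper())
    return level if isinstance(level, int) else default

# e.g. python daemon_script.py DEBUG    or    python daemon_script.py 10
log_level = parse_log_level(sys.argv[1]) if len(sys.argv) > 1 else logging.INFO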

dask distributed 1.19 client logging?

The following code used to emit logs at some point, but no longer seems to do so. Shouldn't configuration of the logging mechanism in each worker permit logs to appear on stdout? If not, what am I overlooking?
import logging
from distributed import Client, LocalCluster
import numpy as np

def func(args):
    i, x = args
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s %(name)s %(levelname)s %(message)s')
    logger = logging.getLogger('func %i' % i)
    logger.info('computing svd')
    return np.linalg.svd(x)

if __name__ == '__main__':
    lc = LocalCluster(10)
    c = Client(lc)
    data = [np.random.rand(50, 50) for i in range(50)]
    fut = c.map(func, zip(range(len(data)), data))
    results = c.gather(fut)
    lc.close()
As per this question, I tried putting the logger configuration code into a separate function invoked via c.run(init_logging) right after instantiation of the client, but that didn't make any difference either.
I'm using distributed 1.19.3 with Python 3.6.3 on Linux. I have
logging:
  distributed: info
  distributed.client: info
in ~/.dask/config.yaml.
Evidently the submitted functions do not actually execute until one tries to retrieve the results from the generated futures, i.e., using the line
print(list(results))
before shutting down the local cluster. I'm not sure how to reconcile this with the section in the online docs that seems to state that direct submissions to a cluster are executed immediately.
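For completeness, a hedged sketch that combines the two ideas, configuring worker logging via Client.run() and retrieving the results before closing the cluster (init_logging is a hypothetical helper; func and data follow the question's code):

import logging
import numpy as np
from distributed import Client, LocalCluster

def init_logging():
    # runs once per worker process when passed to Client.run()
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s %(name)s %(levelname)s %(message)s')

def func(args):
    i, x = args
    logging.getLogger('func %i' % i).info('computing svd')
    return np.linalg.svd(x)

if __name__ == '__main__':
    lc = LocalCluster(10)
    c = Client(lc)
    c.run(init_logging)  # configure logging on every worker
    data = [np.random.rand(50, 50) for i in range(50)]
    fut = c.map(func, zip(range(len(data)), data))
    results = c.gather(fut)
    print(list(results))  # retrieve/inspect the results before shutting down
    lc.close()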

celery python threading with function importing

I'm working in Python and was originally using someone's code to thread and render a map with Mapnik. I've since tried to put it into a Flask API, with Celery as a backend.
Originally:
https://gist.github.com/Thetoxicarcade/57777a6714cb6fecaacf
"add an api":
https://gist.github.com/Thetoxicarcade/079cf03a3f3a061134f2
(yes I will edit this to make it shorter and better)
In general:
flask -> search(params) -> runCelery(params) -> returnOkay
celery worker (gimme params) -> fork a bunch of threads, render that place
*I may just rewrite this into a billion celery tasks?
except everything in runCelery is in the dark.
the worker task itself:
"""background working map algorithm"""
# I have no clue with these celery declarations.
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True
CELERY_ALWAYS_EAGER = True
worker = Celery('tasks', backend='rpc://', broker='redis://localhost')
#worker.task(name="Renderer")#bind=True
def async_render(key,minz,maxz,fake):
from celery.utils.log import get_task_logger
logs = logging.getLogger('brettapi')
file = logging.FileHandler('/home/aristatek/log{}'.format(key))
style = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
file.setFormatter(style)
logs.addHandler(file)
logs.setLevel(logging.DEBUG)
logs.debug('HELLO WORLD')
logs.debug('I AM {}'.format(key))
#osm file
home = os.environ['HOME']
tiledir = home + "/mapnik/mapnik/osm.xml"
#grab minz/maxz
if minz is None: minz = 6
if maxz is None: maxz = 17
name=reverseKey(key)
bounds=makeBorder(key)
file = home + "/mapnik/mapnik/tiles/" + name + "/"
print key,minz,maxz,fake,file, tiledir, name
print bounds
# polygons are defined by lowercase, spaceless name strings. (or they were.)
if fake is not True:
bbox = (-180, -90, 180, 90)
render_tiles(bbox, tiledir, file, 0, 5, "World")
render_tiles(bounds, tiledir, file, minz, maxz, name)
return "Finished"
Apparently, no matter how I try to get these celery instances to respond with logs or process information, they refuse to do so, which makes debugging them really tough.
I started a celery worker and was able to start celeryflower. I cannot seem to queue the task at all, and see nothing happening. :/
Part of this may be that I'm not importing functions, but even using pdb isn't helpful because of the mysticism of celery objects not obeying anything I throw at them.
It's a vague question because I hardly understand it anyway. The "read the docs" pages for celery are about as vague as possible. Do they mean properties or functions or variables within celery?? within the workers??
I'd like to know a way to get celery to respond, which would be meaningful because it means I'm going in the right direction.
Any help would be appreciated.
Edit -
It turns out that, for most of this, the functions involved need to be importable under their own names:
def things(key):
    thread_lots(key)

worker = Celery()

@worker.task(name='ThisTask')
def ThisTask(key):
    from testme import things
    things(key)

webserve = Flask()

@webserve.route('/bla')
def bla(key):
    ThisTask.apply_async((params), task_id=key)
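One hedged side note on the question's code: module-level constants like CELERY_ALWAYS_EAGER do nothing by themselves; they only take effect once pushed into the app's configuration, along these lines:

from celery import Celery

worker = Celery('tasks', backend='rpc://', broker='redis://localhost')

# Apply the eager settings to the app instead of leaving them as bare constants.
worker.conf.update(
    CELERY_ALWAYS_EAGER=True,                 # run tasks locally and synchronously (debugging aid)
    CELERY_EAGER_PROPAGATES_EXCEPTIONS=True,  # re-raise task exceptions in the caller
)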

Celery with RabbitMQ: AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'

I'm running the First Steps with Celery Tutorial.
We define the following task:
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y
Then call it:
>>> from tasks import add
>>> add.delay(4, 4)
But I get the following error:
AttributeError: 'DisabledBackend' object has no attribute '_get_task_meta_for'
I'm running both the celery worker and the rabbit-mq server. Rather strangely, celery worker reports the task as succeeding:
[2014-04-22 19:12:03,608: INFO/MainProcess] Task test_celery.add[168c7d96-e41a-41c9-80f5-50b24dcaff73] succeeded in 0.000435483998444s: 19
Why isn't this working?
Just keep reading the tutorial. It will be explained in the Keep Results chapter.
To start Celery you need to provide just the broker parameter, which is required to send messages about tasks. If you want to retrieve information about the state and results returned by finished tasks, you need to set the backend parameter. You can find the full list with descriptions in the Configuration docs: CELERY_RESULT_BACKEND.
I suggest having a look at:
http://www.cnblogs.com/fangwenyu/p/3625830.html
There you will see that
instead of
app = Celery('tasks', broker='amqp://guest@localhost//')
you should be writing
app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')
This is it.
In case anyone made the same easy-to-make mistake as I did: the tutorial doesn't say so explicitly, but the line
app = Celery('tasks', backend='rpc://', broker='amqp://')
is an EDIT of the line in your tasks.py file. Mine now reads:
app = Celery('tasks', backend='rpc://', broker='amqp://guest@localhost//')
When I run python from the command line I get:
$ python
>>> from tasks import add
>>> result = add.delay(4,50)
>>> result.ready()
False
All tutorials should be easy to follow, even when a little drunk. So far this one doesn't reach that bar.
What is not clear by the tutorial is that the tasks.py module needs to be edited so that you change the line:
app = Celery('tasks', broker='pyamqp://guest@localhost//')
to include the RPC result backend:
app = Celery('tasks', backend='rpc://', broker='pyamqp://')
Once done, Ctrl + C the celery worker process and restart it:
celery -A tasks worker --loglevel=info
The tutorial is confusing in that it invites the assumption that the app object is created in the client testing session, which it is not.
In your project directory find the settings file.
Then run the below command in your terminal:
sudo vim settings.py
copy/paste the below config into your settings.py:
CELERY_RESULT_BACKEND='djcelery.backends.database:DatabaseBackend'
Note: This is your backend for storing the messages in the queue if you are using the django-celery package for your Django project.
Celery relies on both a backend AND a broker.
This solved it for me using only Redis:
app = Celery("tasks", backend='redis://localhost', broker='redis://localhost')
Remember to restart the worker in your terminal after changing the config.
I solved this error by adding the app after the taskID:
response = AsyncResult(taskID, app=celery_app)
where celery_app = Celery('ANYTHING', broker=BROKER_URL, backend=BACKEND_URL).
If you want to get the status of the celery task, to know whether it is "PENDING", "SUCCESS" or "FAILURE":
status = response.status
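Put together, a small sketch of that approach (the broker/backend URLs and the task id are placeholders standing in for BROKER_URL, BACKEND_URL and taskID above):

from celery import Celery
from celery.result import AsyncResult

BROKER_URL = 'redis://localhost'    # placeholder
BACKEND_URL = 'redis://localhost'   # placeholder
taskID = 'some-task-id'             # placeholder: the id returned by delay()/apply_async()

celery_app = Celery('ANYTHING', broker=BROKER_URL, backend=BACKEND_URL)

response = AsyncResult(taskID, app=celery_app)
print(response.status)   # "PENDING", "SUCCESS" or "FAILURE"
print(response.ready())  # True once the task has finished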
My case was simple: I was using an interactive Python console and Python had cached the imported module. I killed the console and started it again, and everything worked as it should.
import celery

app = celery.Celery('tasks', broker='redis://localhost:6379',
                    backend='mongodb://localhost:27017/celery_tasks')

@app.task
def add(x, y):
    return x + y
In the Python console:
>>> from tasks import add
>>> result = add.delay(4, 4)
>>> result.ready()
True
Switching from Windows to Linux solved the issue for me.
Windows is not guaranteed to work; it's mentioned here.
I had the same issue; what resolved it for me was to import the celery file (celery.py) in the __init__.py of your app, with something like:
from .celery import CELERY_APP as celery_app
__all__ = ('celery_app',)
if you use a celery.py file as described here
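For reference, a minimal sketch of that layout for a Django project package (here assumed to be called proj, as in the Celery/Django docs the answer refers to):

# proj/celery.py
import os
from celery import Celery

os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

CELERY_APP = Celery('proj')
CELERY_APP.config_from_object('django.conf:settings', namespace='CELERY')
CELERY_APP.autodiscover_tasks()

# proj/__init__.py
from .celery import CELERY_APP as celery_app

__all__ = ('celery_app',)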
