prevent duplicate celery logging - python

How do I prevent duplicate celery logs in an application like this?
# test.py
import logging
import logging.handlers

from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')
app.logger = logging.getLogger("new_logger")

file_handler = logging.handlers.RotatingFileHandler("app.log", maxBytes=1024*1024, backupCount=1)
file_handler.setFormatter(logging.Formatter('custom_format %(message)s'))
app.logger.addHandler(file_handler)

@app.task
def foo(x, y):
    app.logger.info("log info from foo")
I start the application with: celery -A test worker --loglevel=info --logfile celery.log
Then I cause foo to be run with python -c "from test import foo; print foo.delay(4, 4)"
This results in the "log info from foo" being displayed in both celery.log and app.log.
Here is app.log contents:
custom_format log info from foo
And here is celery.log contents:
[2017-07-26 21:17:24,962: INFO/MainProcess] Connected to redis://localhost:6379/0
[2017-07-26 21:17:24,967: INFO/MainProcess] mingle: searching for neighbors
[2017-07-26 21:17:25,979: INFO/MainProcess] mingle: all alone
[2017-07-26 21:17:25,991: INFO/MainProcess] celery@jd-t430 ready.
[2017-07-26 21:17:38,224: INFO/MainProcess] Received task: test.foo[e2c5e6aa-0d2d-4a16-978c-388a5e3cf162]
[2017-07-26 21:17:38,225: INFO/ForkPoolWorker-4] log info from foo
[2017-07-26 21:17:38,226: INFO/ForkPoolWorker-4] Task test.foo[e2c5e6aa-0d2d-4a16-978c-388a5e3cf162] succeeded in 0.000783085000876s: None
I considered removing the custom logger handler from the Python code, but I don't want to just use celery.log because it doesn't support rotating files. I considered starting celery with --logfile /dev/null, but then I would lose the mingle and other logs that don't show up in app.log.
Can I prevent "log info from foo" from showing up in celery.log? Given that I created the logger from scratch and only setup logging to app.log why is "log info from foo" showing up in celery.log anyway?
Is it possible to get the celery MainProcess and Worker logs (e.g. Connected to redis://localhost:6379/0) to be logged by a RotatingFileHandler (e.g. go in my app.log)?

Why is "log info from foo" showing up in celery.log?
The logging system is basically a tree of logging.Logger objects, with the root logging.Logger at the top of the tree (you get the root by calling logging.getLogger() without parameters).
When you call logging.getLogger("child") you get a reference to the logging.Logger that processes the "child" logs. The problem is that when you call logging.getLogger("child").info(), the info message is delivered to "child" but also to the parent of "child", and to its parent, and so on until it reaches the root.
To avoid sending logs to the parent you have to set logging.getLogger("child").propagate = False.
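As a minimal sketch (assuming the same test.py as above, with the custom logger named "new_logger"), disabling propagation right after creating the logger keeps the task's messages out of the handlers Celery attaches further up the tree for --logfile:
import logging

app_logger = logging.getLogger("new_logger")
# Do not forward records to ancestor loggers (and thus not to the handler behind celery.log).
app_logger.propagate = False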

Related

Why do Celery periodic tasks fire a function only once

I've built a small web scraper function to get some data from the web and populate it to my db which works just well.
Now I would like to fire this function periodically every 20 seconds using Celery periodic tasks.
I walked through the documentation and everything seems to be set up for development (using Redis as the broker).
This is my tasks.py file in project/stocksapp where my periodically fired functions are:
# Celery imports
from celery.task.schedules import crontab
from celery.decorators import periodic_task
from celery.utils.log import get_task_logger
from datetime import timedelta

logger = get_task_logger(__name__)

# periodic functions
@periodic_task(
    run_every=(timedelta(seconds=20)),
    name="getStocksDataDax",
    ignore_result=True
)
def getStocksDataDax():
    print("fired")
Now when I start the worker, the function seems to be fired once and only once (the database gets populated). But after that, the function doesn't get fired anymore, even though the console output suggests it is:
C:\Users\Jonas\Desktop\CFD\CFD>celery -A CFD beat -l info
celery beat v4.4.2 (cliffs) is starting.
LocalTime -> 2020-05-15 23:06:29
Configuration ->
. broker -> redis://localhost:6379/0
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> celerybeat-schedule
. logfile -> [stderr]@%INFO
. maxinterval -> 5.00 minutes (300s)
[2020-05-15 23:06:29,990: INFO/MainProcess] beat: Starting...
[2020-05-15 23:06:30,024: INFO/MainProcess] Scheduler: Sending due task getStocksDataDax (getStocksDataDax)
[2020-05-15 23:06:50,015: INFO/MainProcess] Scheduler: Sending due task getStocksDataDax (getStocksDataDax)
[2020-05-15 23:07:10,015: INFO/MainProcess] Scheduler: Sending due task getStocksDataDax (getStocksDataDax)
[2020-05-15 23:07:30,015: INFO/MainProcess] Scheduler: Sending due task getStocksDataDax (getStocksDataDax)
[2020-05-15 23:07:50,015: INFO/MainProcess] Scheduler: Sending due task getStocksDataDax (getStocksDataDax)
[2020-05-15 23:08:10,016: INFO/MainProcess] Scheduler: Sending due task getStocksDataDax (getStocksDataDax)
[2020-05-15 23:08:30,016: INFO/MainProcess] Scheduler: Sending due task getStocksDataDax (getStocksDataDax)
[2020-05-15 23:08:50,016: INFO/MainProcess] Scheduler: Sending due task getStocksDataDax (getStocksDataDax)
project/project/celery.py
from __future__ import absolute_import, unicode_literals
import os
from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'CFD.settings')

app = Celery('CFD',
             broker='redis://localhost:6379/0',
             backend='amqp://',
             include=['CFD.tasks'])

app.conf.broker_transport_options = {'visibility_timeout': 3600}

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))
The function itself takes about 1 second to run in total.
Where could the issue be in this setup that prevents the worker/Celery from firing the function every 20 seconds as intended?
celery -A CFD beat -l info only starts the Celery beat process. You also need a separate Celery worker process - in a different terminal, run something like celery -A CFD worker -c 8 -O fair -l info.
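For local development only, you could also run a single process that embeds the beat scheduler inside the worker (the -B/--beat flag; the Celery docs advise against this in production):
celery -A CFD worker -B -l info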

Logging with python/celery/rabbitMQ

I'm looking for the best way to keep track of my workers and queues and I'm looking into logging.
I've seen examples in the celery documentation that suggests setting up logging as follows:
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@app.task
def add(x, y):
    logger.info('Adding {0} + {1}'.format(x, y))
    return x + y
Where does the logging file go? Also, what information is stored in the log file - is it just what is passed to logger.info?
Does the logfile store the results returned by the workers, or is that separate?
Where does the logging file go?
As far as I can see, you don't have any FileHandlers configured, which means the logger writes its messages to the console.
Let's check. Here is an example tasks.py:
# celery 4.0.2
import celery
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)

@celery.task(name='add')
def add(x, y):
    logger.info('Adding {0} + {1}'.format(x, y))
    return x + y

app = celery.Celery(
    __name__,
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/0',
)

app.conf.beat_schedule = {
    # run task each 2 seconds
    'add-every-2-seconds': {
        'task': 'add',
        'schedule': 2.0,
        'args': (1, 2)
    },
}
Run Celery (celery worker -A tasks.app --loglevel=info --beat) and check the console. You will see something like this:
[2017-04-08 18:18:55,924: INFO/Beat] Scheduler: Sending due task add-every-2-seconds (add)
[2017-04-08 18:18:55,930: INFO/MainProcess] Received task: add[44a6877c-84a2-4a26-815e-1f637fdf9c0c]
[2017-04-08 18:18:55,932: INFO/PoolWorker-2] add[44a6877c-84a2-4a26-815e-1f637fdf9c0c]: Adding 1 + 2
[2017-04-08 18:18:55,934: INFO/PoolWorker-2] Task add[44a6877c-84a2-4a26-815e-1f637fdf9c0c] succeeded in 0.00191404699945s: 3
[2017-04-08 18:18:57,924: INFO/Beat] Scheduler: Sending due task add-every-2-seconds (add)
[2017-04-08 18:18:57,928: INFO/MainProcess] Received task: add[c386d360-57d3-4352-8a89-f86bb2376e4e]
[2017-04-08 18:18:57,930: INFO/PoolWorker-3] add[c386d360-57d3-4352-8a89-f86bb2376e4e]: Adding 1 + 2
[2017-04-08 18:18:57,931: INFO/PoolWorker-3] Task add[c386d360-57d3-4352-8a89-f86bb2376e4e] succeeded in 0.00146738500007s: 3
This means the logger works and writes our messages. Now let's add a FileHandler for our tasks:
import logging
from logging import FileHandler

logger = get_task_logger(__name__)
task_handler = FileHandler('task.log')
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
task_handler.setFormatter(formatter)
logger.addHandler(task_handler)
Run Celery again and check the folder where tasks.py is stored. You should see a new file (task.log). Example of its contents:
2017-04-08 18:35:02,052 - tasks - INFO - Adding 1 + 2
...
Does the logfile store the results returned by the workers?
By default, this information is just printed to the console. But you can register specific loggers and handlers, and customize the behavior using signals or a custom Task/Loader class.
You can also pass the -f LOGFILE / --logfile=LOGFILE argument when running Celery.
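As one hedged sketch of the signal approach (the handler names and the rotating.log path here are just examples, not from the question), Celery's after_setup_logger and after_setup_task_logger signals let you attach your own handlers to the worker and task loggers:
import logging
import logging.handlers

from celery.signals import after_setup_logger, after_setup_task_logger

def _add_rotating_handler(logger):
    # Example rotating handler; file name and sizes are arbitrary.
    handler = logging.handlers.RotatingFileHandler(
        'rotating.log', maxBytes=1024 * 1024, backupCount=3)
    handler.setFormatter(logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'))
    logger.addHandler(handler)

# Fired once the worker's global logger is set up.
@after_setup_logger.connect
def setup_worker_logging(logger=None, **kwargs):
    _add_rotating_handler(logger)

# Fired once the per-task logger is set up.
@after_setup_task_logger.connect
def setup_task_logging(logger=None, **kwargs):
    _add_rotating_handler(logger)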
Hope this helps.

simple celery test with Print doesn't go to Terminal

EDIT 1:
Actually, the print statements output to the Celery terminal, instead of the terminal where the Python program is run - as @PatrickAllen indicated.
OP
I've recently started to use Celery, but can't even get a simple test going where I print a line to the terminal after a 30 second wait.
In my tasks.py:
from celery import Celery

celery = Celery(__name__, broker='amqp://guest@localhost//', backend='amqp://guest@localhost//')

@celery.task
def test_message():
    print("schedule task says hello")
In the main module for my package, I have:
import tasks

if __name__ == '__main__':
    <do something>
    tasks.test_message.apply_async(countdown=30)
I run it from terminal:
celery -A tasks worker --loglevel=info
The task runs correctly, but nothing appears on the terminal of the main program. Celery output:
[2016-03-06 17:49:46,890: INFO/MainProcess] Received task: tasks.test_message[4282fa1a-8b2f-4fa2-82be-d8f90288b6e2] eta:[2016-03-06 06:50:16.785896+00:00]
[2016-03-06 17:50:17,890: WARNING/Worker-2] schedule task says hello
[2016-03-06 17:50:17,892: WARNING/Worker-2] The client is not currently connected.
[2016-03-06 17:50:18,076: INFO/MainProcess] Task tasks.test_message[4282fa1a-8b2f-4fa2-82be-d8f90288b6e2] succeeded in 0.18711688100120227s: None
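This is expected: by default the worker captures stdout/stderr from tasks and feeds it to its own logger at WARNING level, which is why the message shows up as WARNING/Worker-2 in the worker's output rather than in the caller's terminal. A hedged sketch of the relevant settings (Celery 4-style lowercase names; older versions use CELERY_REDIRECT_STDOUTS and CELERY_REDIRECT_STDOUTS_LEVEL):
# In tasks.py, after creating the app (named `celery` as in the question):
celery.conf.worker_redirect_stdouts = True              # set to False to leave stdout alone
celery.conf.worker_redirect_stdouts_level = 'WARNING'   # level used for redirected prints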

Celery - one task in one second

I use Celery to make requests to a server (in tasks). I have a hard limit - only 1 request per second (from one IP).
I read this, so it's what I want - 1/s.
In celeryconfig.py I have:
CELERY_DISABLE_RATE_LIMITS = False
CELERY_DEFAULT_RATE_LIMIT = "1/s"
But I get messages that I am making too many requests per second.
In call.py I use groups.
I think rate_limit does not work because I have a mistake in celeryconfig.py.
How can I fix that? Thanks!
When you start a celery worker with
celery -A your_app worker -l info
the default concurrency is equal to the number of cores your machine has. So, even though you set a rate limit of '1/s', it tries to process multiple tasks concurrently.
Also, setting a rate limit in the Celery config is a bad idea: right now you have only one task, but if you add new tasks to your app, their rate limits will affect each other.
A simple way to achieve your one task per second is this.
tasks.py
import time
from celery import Celery

app = Celery('tasks', backend='amqp', broker='amqp://guest@localhost//')

@app.task()
def task1():
    time.sleep(1)
    return 'task1'
Now start your worker with a concurrency of ONE:
celery -A tasks worker -l info -c 1
This will execute only one task per second. Here is my log with the above code.
[2014-10-13 19:27:41,158: INFO/MainProcess] Received task: task1[209008d6-bb9d-4ce0-80d4-9b6c068b770e]
[2014-10-13 19:27:41,161: INFO/MainProcess] Received task: task1[83dc18e0-22ec-4b2d-940a-8b62006e31cd]
[2014-10-13 19:27:41,168: INFO/MainProcess] Received task: task1[e1b25558-0bb2-405a-8009-a7b58bbfa4e1]
[2014-10-13 19:27:41,171: INFO/MainProcess] Received task: task1[2d864be0-c969-4c52-8a57-31dbd11eb2d8]
[2014-10-13 19:27:42,335: INFO/MainProcess] Task task1[209008d6-bb9d-4ce0-80d4-9b6c068b770e] succeeded in 1.170940883s: 'task1'
[2014-10-13 19:27:43,457: INFO/MainProcess] Task task1[83dc18e0-22ec-4b2d-940a-8b62006e31cd] succeeded in 1.119711205s: 'task1'
[2014-10-13 19:27:44,605: INFO/MainProcess] Task task1[e1b25558-0bb2-405a-8009-a7b58bbfa4e1] succeeded in 1.1454614s: 'task1'
[2014-10-13 19:27:45,726: INFO/MainProcess] Task task1[2d864be0-c969-4c52-8a57-31dbd11eb2d8] succeeded in 1.119111023s: 'task1'
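Celery also supports a per-task rate limit, which may match the requirement more directly than sleeping inside the task; a hedged sketch (note that Celery applies rate limits per worker instance, not cluster-wide):
@app.task(rate_limit='1/s')  # at most one execution of this task per second, per worker
def task1():
    return 'task1'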

Can't see my celery logs when running beat

I am starting celery via supervisord, see the entry below.
[program:celery]
user = foobar
autostart = true
autorestart = true
directory = /opt/src/slicephone/cloud
command = /opt/virtenvs/django_slice/bin/celery beat --app=cloud -l DEBUG -s /home/foobar/run/celerybeat-schedule --pidfile=/home/foobar/run/celerybeat.pid
priority = 100
stdout_logfile_backups = 0
stderr_logfile_backups = 0
stdout_logfile_maxbytes = 10MB
stderr_logfile_maxbytes = 10MB
stdout_logfile = /opt/logs/celery.stdout.log
stderr_logfile = /opt/logs/celery.stderr.log
pip freeze | grep celery
celery==3.1.0
But any usage of:
@celery.task
def test_rabbit_running():
    import logging
    from celery.utils.log import get_task_logger
    logger = get_task_logger(__name__)
    logger.setLevel(logging.DEBUG)
    logger.info("foobar")
doesn't show up in the logs. Instead I get entries like the following.
celery.stdout.log
celery beat v3.1.0 (Cipater) is starting.
Configuration ->
. broker -> redis://localhost:6379//
. loader -> celery.loaders.app.AppLoader
. scheduler -> celery.beat.PersistentScheduler
. db -> /home/foobar/run/celerybeat-schedule
. logfile -> [stderr]@%DEBUG
. maxinterval -> now (0s)
celery.stderr.log
[2013-11-12 05:42:39,539: DEBUG/MainProcess] beat: Waking up in 2.00 seconds.
INFO Scheduler: Sending due task test_rabbit_running (retail.tasks.test_rabbit_running)
[2013-11-12 05:42:41,547: INFO/MainProcess] Scheduler: Sending due task test_rabbit_running (retail.tasks.test_rabbit_running)
DEBUG retail.tasks.test_rabbit_running sent. id->34268340-6ffd-44d0-8e61-475a83ab3481
[2013-11-12 05:42:41,550: DEBUG/MainProcess] retail.tasks.test_rabbit_running sent. id->34268340-6ffd-44d0-8e61-475a83ab3481
DEBUG beat: Waking up in 6.00 seconds.
What do I have to do to make my logging calls appear in the log files?
It doesn't log anything because beat doesn't execute any tasks (and that's OK - beat only schedules them).
See also Celerybeat not executing periodic tasks.
I'd try putting the logging call inside a task, as the name of the utility function get_task_logger implies, or just start with a simple print, or set up your own logging as suggested in Django Celery Logging Best Practice (the best way to go, IMO).
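In other words, the supervisord entry above only runs beat, which schedules tasks but never executes them, so the task body (and its logger.info call) never runs. A hedged sketch of an additional supervisord program for a worker process (paths copied from the beat entry above; the program name and worker log file names are illustrative):
; hypothetical worker entry to pair with the beat entry above
[program:celery_worker]
user = foobar
autostart = true
autorestart = true
directory = /opt/src/slicephone/cloud
command = /opt/virtenvs/django_slice/bin/celery worker --app=cloud -l DEBUG
priority = 100
stdout_logfile = /opt/logs/celery_worker.stdout.log
stderr_logfile = /opt/logs/celery_worker.stderr.log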
