Django: How to ignore tasks with Celery?

Without changing the code itself, is there a way to ignore tasks in Celery?
For example, when using Django mail, there is a dummy backend setting. This is perfect, since it lets me deactivate mail sending in some environments (like testing or staging) from a .env file. The code that handles mail sending is not cluttered with if statements or decorators.
For Celery tasks, I know I could do it in code using mocks or decorators, but I'd like to do it in a clean, 12-factor-compliant way, like with Django mail. Any ideas?
EDIT to explain why I want to do this:
One of the main motivations behind this is that it creates coupling between the Django web server and the Celery tasks.
For example, when running unit tests, if the broker server (Redis for me) is not running and the delay() method is called, it freezes forever, because there is no timeout when Celery tries to send a task to Redis.
From an architectural point of view, this is very bad. I'd like my unit tests to run properly without requiring a running Celery broker!
Thanks!

As far as the coupling is concerned, your Django application would still be tied to Celery if you used a dummy backend; your tasks just wouldn't execute. Maybe that is acceptable in your case, but in my opinion it can cause problems. For example, if the piece of code you are testing submits a task to Celery and later tries to retrieve the result of that task, it will fail, because the dummy backend would never execute the task.
For unit testing, as you mentioned in your question, you can use the task_always_eager setting. If you turn it on, your Django app will no longer depend on a running worker. It will execute tasks synchronously in the same thread and return the result.
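For illustration, here is a minimal sketch of driving that setting from the environment, in the spirit of the dummy mail backend. The CELERY_TASK_EAGER environment variable name is made up, and this assumes your Celery app loads Django settings with the usual CELERY_ namespace:
import os

# settings.py: the CELERY_TASK_EAGER env var name is hypothetical, not a Celery convention
CELERY_TASK_ALWAYS_EAGER = os.environ.get('CELERY_TASK_EAGER', 'false').lower() == 'true'
# re-raise exceptions from eagerly executed tasks so tests see failures
CELERY_TASK_EAGER_PROPAGATES = CELERY_TASK_ALWAYS_EAGER
With that in place, a testing or staging .env can set CELERY_TASK_EAGER=true and tasks run inline without a broker, while production leaves it unset.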

Related

Should I use plain Python code or Celery? Django.

I have a heavy function (a lot of calculations are done) which outputs an individual number for each user in my Django project. This number changes only a little over time, so to minimize server load I thought about running the function once a day, saving the output, and just referencing the saved output. I know this kind of thing is usually handled with Celery, but the package requires a lot of extra site packages and modules, so I thought about writing a simple function like:
from datetime import datetime, timedelta

last_run = None       # last time the function was called
cached_result = None

def whatever():
    global last_run, cached_result
    if last_run is None or datetime.now() - last_run > timedelta(days=1):
        # ... the heavy calculation ...
        cached_result = ...
        last_run = datetime.now()
    return cached_result
I like to keep my code clean and not install packages that aren't really required, so I would like to know if there are any downsides to "just" using Python, or any gain from doing it with Celery. The task does not need to be asynchronous, so I don't care about that.
Is there a clear use case for when Celery should and should not be used? Is there a performance loss/gain?
I hope somebody can explain this properly.
Celery is a clear winner, but I would like to explain this with pros and cons.
Pros:
You can control Celery from Django very easily. Running a task, cancelling a task, and checking the state/progress of a task can all be done from within Django.
Running a periodic task with Celery is very simple: just register the task from Django, run the Celery worker, and voilà, you are done (a sketch follows below). No need to mess around with crontab or background processes.
Celery is very easy to set up and run. You might already know that if you have gone through Celery's introduction.
Cons:
One of the cons is that you need at least one message broker (Redis, RabbitMQ, or another) running alongside Celery for queuing purposes. Although RabbitMQ is not heavy, you do need to install it.
Another is that the Celery worker itself takes some memory. That won't be an issue on a server, but locally the memory consumption might seem high to you.
I would suggest Celery because it gives you more control over your tasks than a simple background process would.
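As a rough sketch of the periodic-task point above (the broker URL, task name, and schedule are all made up):
from celery import Celery
from celery.schedules import crontab

app = Celery('myproj', broker='redis://localhost:6379/0')

@app.task
def recompute_user_numbers():
    # the heavy once-a-day calculation goes here; save its output wherever you like
    ...

app.conf.beat_schedule = {
    'recompute-user-numbers-daily': {
        'task': recompute_user_numbers.name,
        'schedule': crontab(hour=3, minute=0),  # once a day at 03:00
    },
}
Running the worker together with the beat scheduler (celery -A myproj worker -B) then executes the task daily, with no crontab entry involved.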

python raven timing out when using django logging from a celery worker

I am using raven to log from my Celery jobs to Sentry. I am finding that whenever I use the Django logging system to log to Sentry, each update can take minutes (but the log succeeds). If I remove Sentry from my logging configuration, it is instant.
I tried reverting back to using raven directly via:
import raven
client = raven.Client("DSN")
client.captureMessage("message")
This works with no delay inside the worker.
But if I try to use the Django-specific client instead, as below, the delay appears:
from raven.contrib.django.raven_compat.models import client
client.captureMessage("message")
It is usually a little over 2 minutes, so it looks like a timeout, but the operation succeeds.
The delays are adding up and making my jobs queue unreliable.
If you're using the default Celery worker model, things should generally just work. If you're using something else, that may be less true.
The Python client uses a threaded worker by default. That is, upon instantiation it creates a queue and a thread to process messages asynchronously. Depending on how that happens it can cause problems (e.g. with pre-fork workers, or if you're using something like gevent without patching threads).
You can try changing the transport to be synchronous to confirm this is related:
https://docs.getsentry.com/hosted/clients/python/transports/
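For example, here is a minimal sketch of forcing a synchronous transport through the Django integration; the DSN is a placeholder, and this assumes raven's synchronous HTTPTransport is importable from raven.transport.http:
# settings.py: hypothetical DSN; the synchronous transport sends events in-request
# instead of handing them to the client's background worker thread
from raven.transport.http import HTTPTransport

RAVEN_CONFIG = {
    'dsn': 'https://public:secret@sentry.example.com/1',
    'transport': HTTPTransport,
}
If the delay disappears with the synchronous transport, the threaded worker is the culprit, and the fix belongs in how the Celery worker forks or patches threads.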

Display progress of a long running Python task in Django

I currently have a typical Django structure set up for a project and one web application.
The web application is set up so that a user inputs some information, and this information is taken as the input to run a Python program.
This Python program can sometimes take quite a while to finish (it grabs things from the web and does some text-mining scoring), sometimes multiple minutes.
On the command line, this program would periodically display where it was in the process (first it would say how many things it had found to score against, then it would report how far through scoring those things it was), which was very useful. However, when I moved this over to a Django setup, I no longer have this capability (at least, not in the same way, since the output now goes to log files).
The way I set it up is that there is an input view and a results view. The results view takes the input and runs the Python program, and it won't display anything until the entire program has run. So on the user side, the browser just sits there, sometimes for minutes, before the results are displayed. Obviously, this is not ideal.
Does anyone know of the best way to bring status information on a task to Django?
I've looked into Celery a little bit, but since I'm still a beginner in Django I think I'm confusing myself with some of the documentation. For instance: even if the task is sent off asynchronously to a worker, how does the browser grab the current state of the program? Also, consistent documentation for Celery on Django seems to be lacking (I've seen people set up Celery in many different ways in their Django projects).
I would appreciate any input here, I've been stuck on this for a while now.
My first suggestion is to mentally separate Celery from Django when you start to think about the two. They can run in the same environment, but Celery is to asynchronous processes what Django is to HTTP requests.
Also remember that Celery, unlike Django, requires other services to function: a message broker. So by using Celery you increase your architectural requirements.
To address your specific use case, you'll need a way for each Celery task to publish messages to a message broker, and your web client will need to subscribe to those messages.
There's a lot involved here, but the short version is that you can use Redis as your Celery message broker as well as the pub/sub service that gets messages back to the browser. You can then use e.g. django-redis-websockets to subscribe the browser to the task-state messages in Redis.
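As a rough sketch of that idea, the task itself can publish progress to Redis while it works; the channel naming scheme and Redis URL below are invented:
# tasks.py: publish progress to a Redis channel so a pub/sub bridge can push it to the browser
import redis
from celery import shared_task

redis_client = redis.StrictRedis.from_url('redis://localhost:6379/0')

@shared_task(bind=True)
def score_items(self, items):
    total = len(items)
    for done, item in enumerate(items, start=1):
        # ... grab the item from the web and run the text-mining scoring here ...
        redis_client.publish('task-progress:%s' % self.request.id,
                             '%d/%d' % (done, total))
A simpler variant, if websockets feel like too much, is to call self.update_state(state='PROGRESS', meta={'current': done, 'total': total}) inside the loop and have a small Django view read AsyncResult(task_id).info and return it as JSON for the browser to poll.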

Run a Celery worker that connects to the Django Test DB

BACKGROUND: I'm working on a project that uses Celery to schedule tasks that will run at a certain time in the future. These tasks push the state of a finite state machine (FSM) forward. Here's an example:
A future reminder is scheduled to be sent to the user in 2 days.
When that scheduled task runs, an email is sent, and the FSM is advanced to the next state
The next state is to schedule a reminder to run in another two days
When that task runs, it sends another email and advances the state
Etc...
I'm currently using CELERY_ALWAYS_EAGER as suggested by this SO answer
The problem with using that technique in tests is that the task code, which is meant to run in a separate thread, runs in the same thread that schedules it. This causes the FSM state not to be saved properly, which makes it hard to test. I haven't been able to determine exactly what causes it, but it seems as if, at the bottom of the call stack, the current state is saved, yet as you return up the call stack a previous state gets saved over it. I could spend more time figuring out what goes wrong when the code is not running the way it should, but it seems more logical to get the code running the way it should and make sure it's doing what it should.
QUESTION: I would therefore like to know if there is a way to run a full Celery setup that Django can use during a test run. If it could happen automagically, that would be ideal, but even some manual intervention would be better than having to test the behavior by hand. I'm thinking something might be possible if I set a breakpoint in the tests, run the Celery worker so that it connects to the test DB, and then continue the Django tests. Has anyone tried something like this before?
What you are trying to do is not unit testing but rather functional/integration testing.
I would recommend using a BDD framework (Behave, Lettuce) and running the BDD tests from a CI server (Travis CI or Jenkins) against an external server (a staging environment, for example).
So, the process could be:
Push changes to GitHub
GitHub triggers a build on the CI server
CI server runs unit tests
CI server deploys to an integration environment (or staging, if you don't have one)
CI server runs end-to-end integration tests against the newly deployed code
If everything succeeds, the build is promoted to "can be deployed to production" or something like that
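For illustration, a tiny Behave step sketch for the end-to-end step; the staging URL, endpoint, and response field are all invented:
# features/steps/reminders.py: hypothetical end-to-end check against the staging deployment
import requests
from behave import when, then

@when('the scheduled reminder task has run')
def step_reminder_ran(context):
    # in the integration environment a real Celery worker processes the task;
    # the test only observes the outcome through the public API
    context.fsm = requests.get('https://staging.example.com/api/fsm-state').json()

@then('the user has received a reminder email')
def step_email_sent(context):
    assert context.fsm['reminders_sent'] >= 1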

How do I get django celery to write to the test database for my functional tests?

I am working on a Django application. We're using Celery to queue writes to our Mongo database. I'm trying to write a functional test (using Selenium) for a function that queues something in Celery.
The problem is that Celery writes to the main Mongo database instead of the test database. How can I set up my functional tests to work with an instance of Celery that writes to the test database?
We're using 'django_nose.NoseTestSuiteRunner' as our TEST_RUNNER.
UPDATE:
I haven't been able to figure out how to use another instance of Celery for the tests, but I have found a way to bypass Celery for the functional tests.
In my settings.py:
import sys

FUNC_TEST_COMMAND = ['functional']
func_test_command = [arg for arg in sys.argv if arg in FUNC_TEST_COMMAND]
if len(func_test_command) > 0:
    CELERY_ALWAYS_EAGER = True
This mimics the behaviour of an AsyncResult without sending anything through a message queue when running the functional test suite. (See http://celery.readthedocs.org/en/2.4/configuration.html#celery-always-eager for more info.)
This solution is probably not ideal for functional tests, because it cuts out one of the application layers.
Using CELERY_ALWAYS_EAGER = True does indeed bypass Celery's asynchronous processing. In order to write to the test database, you'll need to start your celeryd worker with the connection settings of the test database.
I'd suggest that you take a look at LiveServerTestCase if you're using an automated test client to run the functional tests.
Then make sure you have a separate settings module to run your tests with that properly configures Celery to use your project's database for transport.
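A minimal sketch of such a settings module, assuming the legacy kombu Django ORM transport from the Celery 2.x/3.x era; the module paths are illustrative:
# settings_test.py: route Celery messages and task writes through the test database
from myproject.settings import *  # noqa: base settings module name is hypothetical

BROKER_URL = 'django://'  # kombu's (legacy) Django ORM transport
INSTALLED_APPS = list(INSTALLED_APPS) + ['kombu.transport.django']

# Django names the test database by prefixing the normal name with 'test_'
DATABASES['default']['NAME'] = 'test_' + DATABASES['default']['NAME']
A Celery worker started with DJANGO_SETTINGS_MODULE pointing at this module then reads and writes the same database your LiveServerTestCase run is using.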
