I want to send tasks from a web server (running Django) to a remote machine that hosts a RabbitMQ server and some workers that I implemented with Celery.
If I follow the usual Celery approach, it seems I have to share the code between both machines, which means replicating the worker logic in the web app code.
So:
Is there a best practice for doing that? Since the code is redundant, I am thinking about using a git submodule (=> replicated in the web app code repo, and in the workers code repo)
Should I use something other than Celery instead?
Am I missing something?
One way to manage this is to store your workers in your Django project. Django and Celery play nicely with each other, allowing you to use parts of your Django project in your Celery app. http://celery.readthedocs.org/en/latest/django/first-steps-with-django.html
Deploying this would mean that your web application would not use the modules involved with your Celery workers, and on your Celery machine your Django views and such would never be used. This usually only results in a couple of megs of unused Django application code...
You can use send_task. It takes the same parameters as apply_async, but you only have to give the task name. You can send tasks without loading the task module in Django:
app.send_task('tasks.add', args=[2, 2], kwargs={})
http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.send_task
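To illustrate why dispatching by name removes the need to share code, here is a toy sketch using only the Python standard library. It is not Celery and not how Celery is implemented: queue.Queue stands in for RabbitMQ, and publish_by_name, REGISTRY and run_one are hypothetical names made up for this illustration. The point is only that the sending side needs the string 'tasks.add', not the function behind it.

```python
import queue

# Stand-in for the broker (RabbitMQ in the question); illustration only.
broker = queue.Queue()


# --- web side: knows only the task *name*, never imports worker code ---
def publish_by_name(name, args=(), kwargs=None):
    """Put a message describing the call on the queue (hypothetical helper)."""
    broker.put({"task": name, "args": list(args), "kwargs": kwargs or {}})


# --- worker side: owns the implementations and a name -> function registry ---
def add(x, y):
    return x + y


REGISTRY = {"tasks.add": add}


def run_one():
    """Take one message off the queue and run the task it names."""
    msg = broker.get_nowait()
    func = REGISTRY[msg["task"]]
    return func(*msg["args"], **msg["kwargs"])


publish_by_name("tasks.add", args=[2, 2])
result = run_one()
```

With real Celery the web side would only need an app configured with the broker URL; the task name passed to send_task has to match the name the worker registered.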
I realize similar questions have been asked; however, they have all been about a specific problem, whereas I don't even know how I would go about doing what I need to.
That is: From my Django webapp I need to scrape a website periodically while my webapp runs on a server. The first options that I found were "django-background-tasks" (which doesn't seem to work the way I want it to) and 'celery-beat' which recommends getting another server if I understood correctly.
I figured just running a separate thread would work, but I can't seem to make that work without it interrupting the server and vice versa, and it's not the "correct" way of doing it.
Is there a way to run a task periodically in Django without needing a separate server or a request to be made to the app?
"'celery-beat' which recommends getting another server if I understood correctly."
You can host Celery (and any other needed components) on the same server as your Django app. They would be separate processes entirely.
It's not an uncommon setup to have a Django app + Celery worker(s) + message queue all bundled into the same server deployment. Deploying on separate servers may be ideal, just as it would be ideal to distribute your Django app across many servers, but is by no means necessary.
I'm not sure if this is the "correct" way, but it was a cheap and easy way for me to do it. I just created custom Django management commands and had them run by a scheduler such as cron; in my case I used Heroku Scheduler for my app.
I am writing a Gevent/Flask server in Python. Some of the requests my Flask app takes need to run in the background; there is an endpoint for the client to poll the server for the task's result.
If you search the wisdom of the Internet for the best way to do this, everybody seems to be in favor of setting up one or several worker processes such as Celery or RQ, with a message queue or store such as RabbitMQ or Redis.
My app is small and my deployment is modest. This seems like too much of a hassle for me. I already have cooperative multitasking with Gevent, so I thought I'd just create a greenlet to do the background work in-process, that is, within the Flask app process itself.
This is not the mainstream solution, so my question is: Am I missing something? What am I missing? Is there something in this solution that makes it particularly bad?
I'm wondering if there is a way to make a request to my Django backend that runs asynchronously. At page load, I need to kick off a process that takes ~30 seconds, but while it is running I cannot perform any other actions on the page that require a response from Django (specifically waiting on data for jqGrids).
Is there an easy way to tell Django that certain methods should be run asynchronously?
Django has no native way to run asynchronous tasks, but you could look at Celery and use a django-celery task.
Celery web: http://www.celeryproject.org/
Django-Celery web: https://pypi.python.org/pypi/django-celery
You need to use Celery. Celery is an asynchronous task queue/job queue based on distributed message passing. You can read more about Celery here.
This is a great tutorial for setting up celery.
I have a small infrastructure plan that does not include Django. But, because of my experience with Django, I really like Celery. All I really need is Redis + Celery to make my project. Instead of using the local filesystem, I'd like to keep everything in Redis. My current architecture uses Redis for everything until it is ready to dump the results to AWS S3. Admittedly I don't have a great reason for using Redis instead of the filesystem. I've just invested so much into architecting this with Docker and scalability in mind, it feels wrong not to.
I was searching for a non-Django database scheduler too a while back, but it looked like there's nothing else. So I took the Django scheduler code and modified it to use SQLAlchemy. Should be even easier to make it use Redis instead.
It turns out that you can!
First I created this little project from the tutorial on celeryproject.org.
That went great so I built a Dockerized demo as a proof of concept.
Things I learned from this project:
Docker
  - using --link to create network connections between containers
  - running commands inside containers
Dockerfile
  - using FROM to build images iteratively
  - using official images
  - using CMD for images that "just work"
Celery
  - using Celery without Django
  - using Celerybeat without Django
  - using Redis as a queue broker
  - project layout
  - task naming requirements
Python
  - proper project layout for setuptools/setup.py
  - installation of project via pip
  - using entry_points to make console_scripts accessible
  - using setuid and setgid to de-escalate privileges for the celery daemon
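As a rough idea of what the Celery side of such a setup can look like, here is a sketch of a standalone configuration module, assuming it is loaded with app.config_from_object("celeryconfig"). It is not taken from the demo project: the host name "redis" (a linked container alias), the module path myproject.tasks and the task name are all hypothetical. The lowercase setting names are the Celery 4+ spelling; older releases use BROKER_URL, CELERY_RESULT_BACKEND, CELERY_IMPORTS and CELERYBEAT_SCHEDULE instead.

```python
# celeryconfig.py (fragment); no Django settings involved.

# Redis as the queue broker and as the result store.
# The host name "redis" assumes a linked container with that alias.
broker_url = "redis://redis:6379/0"
result_backend = "redis://redis:6379/1"

# Modules the worker imports at startup so their tasks get registered.
# "myproject.tasks" is a hypothetical module path.
imports = ("myproject.tasks",)

# Schedule read by the `celery beat` process.
beat_schedule = {
    "add-every-30-seconds": {
        # Must match the name the task is registered under on the worker.
        "task": "myproject.tasks.add",
        "schedule": 30.0,  # interval in seconds
        "args": (2, 2),
    },
}
```

The "task" entry is where the task naming requirement shows up: beat sends the task by name, so the string has to match the name the worker registered.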
I want to give Celery a try. I'm interested in a simple way to schedule crontab-like tasks, similar to Spring's Quartz.
I see from Celery's documentation that it requires running celeryd as a daemon process. Is there a way to avoid running another external process and simply run this embedded in my Django instance? Since I'm not interested in distributing the work at the moment, I'd rather keep it simple.
Add the CELERY_ALWAYS_EAGER = True option to your Django settings file and all your tasks will be executed locally, synchronously in the calling process, instead of being sent to a worker. For periodic tasks, it seems you still have to run celery beat as well.
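A settings fragment along these lines might look as follows. This is a sketch, not a drop-in configuration: which spelling applies depends on the Celery version, the Celery 4+ names assume the settings are loaded with namespace="CELERY", and "tasks.cleanup" is a hypothetical task name.

```python
# settings.py (fragment)

# Celery 3.x spelling, as in the answer above: tasks called with
# .delay() / .apply_async() run in the calling process.
CELERY_ALWAYS_EAGER = True

# Celery 4+ spelling, assuming app.config_from_object(
#     "django.conf:settings", namespace="CELERY").
CELERY_TASK_ALWAYS_EAGER = True

# Periodic schedule (Celery 4+ spelling). It is read by the separate
# `celery beat` process. "tasks.cleanup" is a hypothetical task name.
CELERY_BEAT_SCHEDULE = {
    "cleanup-every-hour": {
        "task": "tasks.cleanup",
        "schedule": 3600.0,  # interval in seconds
    },
}
```

Eager mode only changes how a task call is executed; it does not schedule anything by itself, which is why beat is still needed for crontab-like jobs.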