Using Celery for long-running async jobs - Python

I have several Python programs doing long polling on different machines, and I'm considering a queue-based mechanism to manage the load and provide async job functionality.
These programs are standalone and aren't part of any framework.
I'm primarily thinking about Celery because of its support for multiprocessing and for sharing tasks across multiple Celery workers. Is Celery a good choice here, or am I better off simply using an event-based system with RabbitMQ directly?

I would say yes - Celery is definitely a good choice! We have tasks that sometimes run for over 20 hours, and Celery works just fine. Furthermore, it is extremely simple to set up and use (Celery + Redis is super simple).
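For reference, here is a minimal sketch of that kind of setup, assuming a local Redis broker and result backend; the app name, task name and URLs are placeholders, not anything from the question:

    # Minimal sketch of a long-running Celery task backed by Redis
    # (app name, task name and connection URLs are placeholders).
    from celery import Celery

    app = Celery("jobs",
                 broker="redis://localhost:6379/0",
                 backend="redis://localhost:6379/1")

    @app.task(bind=True, acks_late=True)  # acks_late: redeliver if a worker dies mid-task
    def long_poll(self, endpoint):
        # hours of polling work can live here; the return value goes to the result backend
        return "finished polling %s" % endpoint

    # Start workers on as many machines as you like, e.g.:
    #   celery -A jobs worker --concurrency=4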

Related

How to execute plain Celery tasks on Airflow workers

I currently have Airflow set up and working correctly using the CeleryExecutor as a backend to provide horizontal scaling. This works remarkably well, especially with the worker nodes sitting in an autoscaling group on EC2.
In addition to Airflow, I use plain Celery to handle simple asynchronous tasks (that don't need a whole pipeline) coming from Flask/Python. Until now, these plain Celery tasks were very low volume, and I just ran the plain Celery worker on the same machine as Flask. There is now a requirement to run a massive number of plain Celery tasks in the system, so I need to scale my plain Celery as well.
One way to do this would be to run the plain Celery worker service on the Airflow worker servers as well (to benefit from the autoscaling etc.), but this doesn't seem to be an elegant solution since it creates two different "types" of Celery worker on the same machine. My question is whether there is some combination of configuration settings I can pass to my plain Celery app that will cause @celery.task-decorated functions to be executed directly on my Airflow worker cluster as plain Celery tasks, completely bypassing the Airflow middleware.
Thanks for the help.
The application is airflow.executors.celery_executor.app, if I remember correctly. Try celery -A airflow.executors.celery_executor.app inspect active against your current Airflow infrastructure to test it. However, I suggest you do not do this, because your Celery tasks may affect the execution of Airflow DAGs, and that may affect your SLAs.
What we do at the company I work for is exactly what you suggested - we maintain a large Celery cluster, and we sometimes offload execution of some Airflow tasks to our Celery cluster, depending on the use case. This is particularly handy when a task in our Airflow DAG actually triggers tens of thousands of small jobs. Our Celery cluster runs 8 million tasks on a busy day.
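As a rough illustration of the mechanism discussed above (which the first answer advises against), binding a plain task to Airflow's own Celery app might look like the sketch below. The task name, module layout and queue name are made up, and the import path is the one mentioned above, which can differ between Airflow versions:

    # Sketch only: bind a plain Celery task to the Celery app that Airflow's
    # CeleryExecutor already uses, so it runs on the existing Airflow workers.
    # The Airflow workers must be able to import this module to execute the task.
    from airflow.executors.celery_executor import app  # path as mentioned above

    @app.task(name="myproject.resize_image")  # hypothetical task name
    def resize_image(path):
        # ... the actual work ...
        return path

    if __name__ == "__main__":
        # Send it to the queue your Airflow workers listen on ("default" is an assumption).
        resize_image.apply_async(args=["/tmp/example.png"], queue="default")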

Queue background tasks in Python application on Windows

I am trying to build a Flask application on Windows where a user uploads a big Excel file that is then processed in Python, which takes 4-5 minutes. I need to process those tasks in the background after the user uploads the file.
I looked at RQ, Celery, etc., but those aren't working on Windows, and I have never worked on Linux. I need some advice on how to achieve this.
Celery and RQ can work on Windows, but they have some issues.
For RQ, use this,
and for Celery use this.
I don't think it's accurate to say that you can't run RQ on Windows; it just has some limitations (as you can see in the documentation).
Since you can run Redis on Windows, you might want to give other Redis-based task queues a try. One such example is huey. There are at least examples of people who have run it successfully on Windows (e.g. look at this SO question).
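If you want to try that route, a minimal huey sketch might look like this; the module name, queue name and the processing function are assumptions, not something from the question:

    # tasks.py - hypothetical huey setup on Windows with a local Redis instance
    from huey import RedisHuey

    huey = RedisHuey("excel-jobs", host="localhost", port=6379)

    @huey.task()
    def process_excel(path):
        # ... the 4-5 minute Excel processing would go here ...
        return path

    # Start the consumer in a separate terminal:
    #   huey_consumer.py tasks.huey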
I solved this by using WSL (Windows Subsystem for Linux) and running my RQ worker inside WSL.
I am not sure whether I will face any issues in the future, but as of now it is queuing and processing tasks as I desire.
This might be useful for somebody with the same problem.
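For anyone following this approach, the Flask side of an RQ setup could look roughly like the sketch below, with the worker started inside WSL; process_excel, the module name and the timeout value are assumptions:

    # Hypothetical sketch: enqueue the Excel processing from Flask with RQ,
    # while the worker process runs inside WSL against the same Redis instance.
    from redis import Redis
    from rq import Queue
    from tasks import process_excel  # your long-running processing function

    q = Queue(connection=Redis())  # Redis must be reachable from both Windows and WSL

    def handle_upload(saved_path):
        job = q.enqueue(process_excel, saved_path, job_timeout=600)
        return job.id  # poll this id later to report progress to the user

    # Inside the WSL shell, start the worker against the same Redis instance:
    #   rq worker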

Django scheduled Tasks - Alternative to Cron or Independent Daemon

In creating scheduled tasks I've used both cron and a specially set-up daemon for Django.
Cron is silly-simple, and the daemon (in my opinion) might be excessive. The daemon sets up an independent Django instance.
Django itself (if I'm not mistaken) runs as a daemon anyway, correct?
I'm wondering - how do you schedule tasks within the Django environment without straying from standard usage?
You can use Celery to run periodic tasks, but depending on what you are trying to do it could be overkill.
If your use case is simple, cron plus a management command is way easier. You can use Kronos, django-cron or any of these libraries for this.
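A minimal sketch of the cron + management command approach; the app name "myapp" and the command name are placeholders:

    # myapp/management/commands/send_reminders.py - hypothetical management command
    from django.core.management.base import BaseCommand

    class Command(BaseCommand):
        help = "Send reminder emails (meant to be invoked from cron)"

        def handle(self, *args, **options):
            # ... the scheduled work goes here ...
            self.stdout.write("reminders sent")

    # Crontab entry, e.g. every day at 07:00:
    #   0 7 * * * /path/to/venv/bin/python /path/to/project/manage.py send_reminders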

Asynchronous replacement for Celery

We're using Celery for background tasks in our Django project.
Unfortunately, many of our tasks hold blocking sockets that can stay open for a long time, so Celery becomes fully loaded and stops responding.
Gevent could help me with the sockets, but Celery has only experimental support for gevent (and, as I found in practice, it doesn't work well).
So I am considering switching to another task queue system.
I can choose between two different ways:
Write my own task system. This is the least preferred choice, because it would require a lot of time.
Find a good, well-tried replacement for Celery that will work after monkey patching.
Is there any analogue of Celery that will guarantee execution of my tasks even after a sudden exit?
ZeroMQ might be suitable for your use case.
See https://serverfault.com/questions/80679/how-to-pick-between-rabbitmq-and-zeromq-or-something-else
You will, however, need to write your own messaging layer to persist messages.
Have you tried Celery + eventlet? It works well in our project.
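As a rough sketch of what that looks like (the project name, broker URL and worker command below are assumptions, not a drop-in config):

    # Sketch of an I/O-bound task intended for Celery's eventlet pool.
    import requests  # blocking HTTP I/O that eventlet makes cooperative
    from celery import Celery

    app = Celery("proj", broker="redis://localhost:6379/0")

    @app.task(acks_late=True)  # unacknowledged tasks are redelivered after a crash
    def fetch(url):
        # Under the eventlet pool each blocking call yields to other greenlets,
        # so one worker process can keep many sockets open at once.
        return requests.get(url, timeout=300).text

    # Start the worker with the eventlet pool and high concurrency, e.g.:
    #   celery -A proj worker -P eventlet -c 1000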

Running celery in django not as an external process?

I want to give Celery a try. I'm interested in a simple way to schedule crontab-like tasks, similar to Spring's Quartz.
I see from Celery's documentation that it requires running celeryd as a daemon process. Is there a way to avoid running another external process and simply run this embedded in my Django instance? Since I'm not interested in distributing the work at the moment, I'd rather keep it simple.
Add the CELERY_ALWAYS_EAGER = True option to your Django settings file and all your tasks will be executed locally. It seems that for periodic tasks you have to run celery beat as well.
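For reference, the settings sketch below shows the eager mode the answer refers to; note that CELERY_ALWAYS_EAGER is the old-style setting name, and newer Celery versions call it task_always_eager on the app config:

    # settings.py sketch: execute Celery tasks in-process instead of on a worker.
    CELERY_ALWAYS_EAGER = True                  # run tasks locally, blocking until done
    CELERY_EAGER_PROPAGATES_EXCEPTIONS = True   # re-raise task exceptions so they surface

Keep in mind that eager mode removes the asynchronous behaviour entirely, which is handy for development and testing but defeats the purpose of a queue in production.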
