Should I use plain Python code or Celery? (Django)

I have a heavy function (a lot of calculations are done) which outputs an individual number for each user in my Django project. This number changes only a little over time, so to minimize the server load I thought about running the function once a day, saving the output and just referencing the output. I know that these kinds of things are usually handled with Celery, but the package requires a lot of site packages and extra modules, so I thought about writing a simple function like:
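    from datetime import datetime, timedelta
    last_run = None  # time the function was last called
    cached_result = None
    def whatever():
        global last_run, cached_result
        now = datetime.now()
        # recompute only if more than one day has passed since the last run
        if last_run is None or now - last_run > timedelta(days=1):
            cached_result = ...  # heavy calculation goes here
            last_run = now
        return cached_result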
I like to keep my code clean and avoid installing packages that are not really required, so I would like to know if there are any downsides to "just" using plain Python, or any gain from doing it with Celery. The task does not need to be asynchronous, so I don't care about that.
Is there a clear use case for when Celery should be used and when not? Is there a performance loss/gain?
I hope somebody can explain that properly.

Celery is the clear winner, but I would like to explain this with pros and cons.
Pros:
You can control Celery from Django very easily. Running a Celery task, cancelling a task, and checking the state/progress of a task can all be done from within Django.
A periodic task running with Celery is very simple: just register the task from Django, run the Celery worker together with the beat scheduler, and voilà, you are done. No need to mess around with crontab or background processes.
Celery is very easy to set up and run. You might already know that if you have gone through the Celery introduction.
Cons:
One of the cons is that you need at least one message broker, such as Redis, RabbitMQ or another supported one, running alongside Celery for queuing. RabbitMQ is not heavy, but you still need to install it once.
Another is that the Celery worker itself takes some memory. That won't be an issue on a server, but locally the memory consumption might seem high to you.
I would suggest Celery because it gives you more control over your task than a simple background process.
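A sketch of the Django settings behind the broker and periodic-task points above, assuming the Celery app is set up with `app.config_from_object("django.conf:settings", namespace="CELERY")` so each setting takes a `CELERY_` prefix. The broker URL and the dotted task path are hypothetical placeholders.

```python
from datetime import timedelta

# settings.py (sketch): assumes namespace="CELERY", so Celery's
# broker_url and beat_schedule settings become CELERY_BROKER_URL etc.

# The message broker that queues tasks (Redis here; RabbitMQ would use an amqp:// URL).
CELERY_BROKER_URL = "redis://localhost:6379/0"

# Periodic schedule read by the beat scheduler.
CELERY_BEAT_SCHEDULE = {
    "recompute-user-numbers-daily": {
        # Hypothetical dotted path to a task decorated with @shared_task.
        "task": "scores.tasks.recompute_user_numbers",
        "schedule": timedelta(days=1),  # run once a day
    },
}
```

With this in place, something like `celery -A proj worker -B` (where `proj` is a placeholder project name) runs a worker with an embedded beat scheduler, which is convenient in development; in production beat is usually run as its own `celery -A proj beat` process so that only one scheduler exists.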

Related

Which way is the best for running background processes?

On the server side: I need a way to execute some tasks in the background, frequently, and to start them at specific times.
I use Python for the back end (Sanic framework), Vue.js for the front end, MongoDB as the main DB and Redis for caching.
I'm also using Docker containers (docker-compose).
I have worked with Celery before, but I want to know which solution is best for production, one that is guaranteed to be stable and reliable.
On the client side: besides the server-side case above, I sometimes need to run a job scheduler on clients, i.e. embedded devices such as a Raspberry Pi that can run Python or JavaScript.
So, what are your solutions for these use cases?
In production we have both long- and short-running tasks, and in total our Celery cluster executes up to 6M tasks per day, so naturally I would recommend Celery. It is made for this purpose, and if you are a Python developer you have another reason to pick Celery. Finally, Celery is the only Python task queue system known to me that has a highly available (HA) scheduler (https://github.com/mixkorshun/celery-beatx and https://github.com/sibson/redbeat).
There are two other (Python) projects that should be mentioned as alternatives to Celery - Huey (https://github.com/coleifer/huey) and Apache Airflow (https://github.com/apache/airflow).
I'm one of the core devs for Sanic. I agree with the other answers that Celery is a great option. For anyone who needs a more lightweight solution, I have a post about an alternative approach that uses only Sanic: https://community.sanicframework.org/t/how-to-use-asyncio-queues-in-sanic/166/4
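The linked post is Sanic-specific; what follows is not its code but a stdlib-only sketch of the general idea, assuming background jobs are coroutines: a few long-lived consumer tasks pull jobs off an `asyncio.Queue` inside the same process. `worker`, `slow_double` and `main` are hypothetical names.

```python
import asyncio


async def worker(name, queue, results):
    """Pull (job, arg) pairs off the queue forever and run them."""
    while True:
        job, arg = await queue.get()
        try:
            results.append((name, await job(arg)))
        except Exception as exc:  # keep the worker alive if one job fails
            results.append((name, exc))
        finally:
            queue.task_done()  # lets queue.join() know this item is finished


async def slow_double(x):
    """Hypothetical background job: pretend to do I/O, then return a value."""
    await asyncio.sleep(0.01)
    return x * 2


async def main():
    queue = asyncio.Queue()
    results = []
    workers = [asyncio.create_task(worker(f"w{i}", queue, results)) for i in range(2)]
    for n in range(5):
        queue.put_nowait((slow_double, n))  # what a request handler would do
    await queue.join()  # wait until every queued job has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(main())
```

Unlike Celery, queued jobs here live only in that process's memory, so they are lost on restart and are not shared between server processes. In a Sanic app the workers would be started once at server start-up rather than inside `asyncio.run`.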
Starting a new process in the background in Python is as simple as calling os.fork() (available on Unix-like systems only). For a comprehensive example, see https://python-course.eu/forking.php
EDIT:
For a fully featured solution, I'd recommend forking a background process as described above, and then using a library like https://github.com/dbader/schedule to execute jobs at scheduled intervals in that background process.
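A minimal sketch of the fork-plus-scheduler approach, assuming a Unix-like system (`os.fork` is not available on Windows). To keep it dependency-free it uses the standard-library `sched` module in place of the `schedule` library the answer recommends; `run_every` and `start_background` are hypothetical names.

```python
import os
import sched
import time
import traceback


def run_every(interval, job, runs=None):
    """Call job() every `interval` seconds; stop after `runs` calls (None = forever)."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    count = 0

    def tick():
        nonlocal count
        job()
        count += 1
        if runs is None or count < runs:
            scheduler.enter(interval, 1, tick)  # schedule the next run

    scheduler.enter(interval, 1, tick)
    scheduler.run()  # blocks until no events are left


def start_background(interval, job, runs=None):
    """Fork a child process that runs the schedule; the parent gets the child's PID."""
    pid = os.fork()
    if pid == 0:  # child process
        code = 0
        try:
            run_every(interval, job, runs)
        except BaseException:
            traceback.print_exc()
            code = 1
        os._exit(code)  # skip the parent's atexit handlers and buffered I/O
    return pid  # parent process
```

The parent should reap the child with `os.waitpid` (or handle SIGCHLD) so no zombie process is left behind. There is no retry, monitoring or result storage here, which are among the things Celery adds on top.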

Django: How to ignore tasks with Celery?

Without changing the code itself, is there a way to ignore tasks in Celery?
For example, Django mail has a dummy backend setting. This is perfect, since it lets me deactivate mail sending from a .env file in some environments (like testing or staging), and the code that sends mail does not have to change with if statements or decorators.
For Celery tasks, I know I could do it in code using mocks or decorators, but I'd like to do it in a clean, 12-factor compliant way, as with Django mail. Any idea?
EDIT to explain why I want to do this:
One of the main motivations behind this is that the current setup creates coupling between the Django web server and Celery tasks.
For example, when running unit tests, if the broker server (Redis, in my case) is not running and the delay() method is called, it hangs forever, because there is no timeout when Celery tries to send a task to Redis.
From an architecture point of view, this is very bad. I'd like my unit tests to run properly without requiring a running Celery broker!
Thanks!
As far as the coupling is concerned, your Django application would still be tied to Celery if you used a dummy backend; your tasks just wouldn't execute. Maybe this is acceptable in your case, but in my opinion it can cause problems. For example, if the code you are testing submits a task to Celery and later tries to retrieve that task's result, it will fail, because the dummy backend never executes the task.
For unit testing, as you mentioned in your question, you can use the task_always_eager setting. If you turn it on, your Django app will no longer depend upon a running worker. It will execute tasks in the same thread in a synchronous fashion and return the result.
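A sketch of how the `task_always_eager` suggestion can be driven from the environment, 12-factor style, assuming the Celery app reads Django settings with `namespace="CELERY"` (hence the `CELERY_` prefix). The environment variable name `CELERY_EAGER` and the `env_flag` helper are hypothetical.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False):
    """Read a boolean from the environment (e.g. one loaded from a .env file)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in TRUTHY


# settings.py, assuming app.config_from_object("django.conf:settings", namespace="CELERY").
# When the hypothetical CELERY_EAGER variable is set, .delay() runs the task
# synchronously in the calling process instead of sending it to the broker.
CELERY_TASK_ALWAYS_EAGER = env_flag("CELERY_EAGER")
# Re-raise task exceptions in eager mode so a failing task also fails the test.
CELERY_TASK_EAGER_PROPAGATES = CELERY_TASK_ALWAYS_EAGER
```

In a test or staging `.env` file, `CELERY_EAGER=1` then removes the need for a running broker, while production simply leaves the variable unset.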

Tool selection - is Celery the right system to use in this instance?

I have been trying to learn more about Celery, but it's difficult to understand what "workers" and the "queue" literally are and what they mean in programming terms. I apologize if this question is very basic, but I can't seem to find a straight answer in simple terms.
I have a Flask/Python app that I want to add task-assignment functionality to. For example, when one user completes a task, I want them to be able to flag it for their team members to check. Additionally, I'd like to be able to schedule task assignments, for example have a user complete 10 tasks of a given category per week.
Celery seems like a good way to queue tasks and ensure that they are completed, but the focus of this system seems to be scheduling resource-intensive processes for asynchronous processing, not literal task assignment and queueing.
My question boils down to: is Celery the proper tool for assigning tasks to users, even if it's not for the purpose of resource saving? Have I misunderstood what the capabilities of Celery are? If so, what would be the tools to use to implement this feature?
Thank you!
Celery is meant for handling the queuing of automated tasks that are executed by worker processes. I recommend against using it for assigning tasks to your users.
If you're looking for a lightweight solution for storing queues of tasks for users, you could use lists in Redis: https://redis.io/topics/data-types
You could probably also use whatever database you're already using.
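A minimal sketch of the "use the database you already have" option, using the standard-library `sqlite3` module so it runs on its own; in a Flask app the same table would more likely live in your existing database or ORM. The `assignment` table, its columns and the helper functions are all hypothetical. Each user's open tasks, ordered by ID, behave like a per-user Redis list.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS assignment (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    assignee TEXT NOT NULL,
    description TEXT NOT NULL,
    done INTEGER NOT NULL DEFAULT 0
)
"""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def assign(conn, assignee, description):
    """Append a task to the end of a user's queue (like RPUSH on a per-user Redis list)."""
    with conn:  # commits the transaction
        conn.execute(
            "INSERT INTO assignment (assignee, description) VALUES (?, ?)",
            (assignee, description),
        )


def next_task(conn, assignee):
    """Return (id, description) of the user's oldest open task, or None."""
    return conn.execute(
        "SELECT id, description FROM assignment "
        "WHERE assignee = ? AND done = 0 ORDER BY id LIMIT 1",
        (assignee,),
    ).fetchone()


def complete(conn, task_id):
    """Mark a task as done so it leaves the user's queue."""
    with conn:
        conn.execute("UPDATE assignment SET done = 1 WHERE id = ?", (task_id,))
```

Scheduled assignments such as a weekly quota would then be a periodic job that calls `assign`; that periodic job, not the user-facing queue itself, is where a scheduler such as Celery beat or cron could help.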

Celery django explanation

I have been learning about Django recently and have stumbled upon Celery. I don't understand what it does, and their site didn't help. Can anyone explain the concept and its real-world applications (in simple terms)?
Celery is an "asynchronous task queue/job queue based on distributed message passing". It is just a task queue: something you put tasks into so that they get done as soon as possible. You have a Celery instance that you integrate directly with your Django or Python app; this is what you use to talk to Celery. Then you configure Celery to have 'workers' that perform the tasks you give them. The whole point is to be able to run tasks that don't fit well into the normal request/response cycle that Django handles so well.
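To make "queue" and "workers" concrete, here is a stdlib analogy, not Celery itself: a `queue.Queue` stands in for the broker and two threads stand in for workers. In real Celery the queue lives in a separate broker such as Redis or RabbitMQ, workers are separate processes (possibly on other machines), and task arguments are serialized. `send_welcome_email` is a hypothetical job.

```python
import queue
import threading


def worker(tasks, results):
    """A 'worker': take tasks off the queue and run them until told to stop."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: no more work
            tasks.task_done()
            break
        func, args = item
        results.append(func(*args))  # list.append is atomic in CPython
        tasks.task_done()


def send_welcome_email(address):
    """Hypothetical slow job that does not belong in the request/response cycle."""
    return f"sent to {address}"


tasks = queue.Queue()  # plays the role of the broker
results = []
workers = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(2)]
for w in workers:
    w.start()

# The 'web app' side: enqueue the work and return immediately instead of doing it inline.
for addr in ["a@example.com", "b@example.com", "c@example.com"]:
    tasks.put((send_welcome_email, (addr,)))

for _ in workers:
    tasks.put(None)  # one stop sentinel per worker
for w in workers:
    w.join()
```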
What kinds of tasks are these? Well, as said before, they don't fit into the normal request/response cycle. The best example I can think of is emails- if you're building a web app and you want to keep your users, you need to keep them engaged and coming back, and a good way to do that is by sending emails. You send them once a week or once a day and they can maybe configure when to send. This would fit horribly within the request/response cycle, but it's perfect for something like Celery.
Other examples are long-running jobs with lots of computation. While you would typically use something like Hadoop for really big computations, you can schedule some queries with Celery. You could also use it to schedule builds if you're doing something like Travis. The uses go on and on, but you probably get the point.

Script needs to be run as a Celery task. What consequences does this have?

My task is to write a script using OpenCV which will later run as a Celery task. What consequences does this have? What do I have to pay attention to? Is it enough in the end to add two lines of code, or could it be that I have to rewrite my whole script?
I read that Celery is an "asynchronous task queue/job queuing system based on distributed message passing", but I won't pretend to know completely what that all entails.
I will update the question as soon as I get more details.
Celery implies a daemon using a broker (a data hub used to queue tasks). The celeryd daemon and the broker (RabbitMQ, Redis, MongoDB or another) should always be running in the background.
Your tasks will be queued, which means they won't all run at the same time. You can choose the maximum number that can run at the same time; the rest wait for the others to finish before starting. It also means some concurrency is usually expected, so you must write tasks that play nicely with others doing the same thing.
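A stdlib illustration of the concurrency limit described above, using a `ThreadPoolExecutor` as a stand-in for a Celery worker; with Celery itself the limit is set with the worker's `--concurrency` option. The `task` function and the counters are hypothetical and exist only to record how many tasks run at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2  # analogous to the worker's concurrency setting

running = 0
peak = 0
lock = threading.Lock()


def task(n):
    """Hypothetical task that records how many tasks are running at once."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)  # simulated work
    with lock:
        running -= 1
    return n * n


with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    # All six are submitted at once; only MAX_CONCURRENT run, the rest wait their turn.
    squares = list(pool.map(task, range(6)))
```

Because several tasks may run at the same time, anything they share (output files, database rows) has to be safe for concurrent access, which is what "play nicely with others" means in practice.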
Celery is not meant to run scripts but tasks, written as python functions. You can of course execute external scripts from Python, but your entry point is always a Python function.
Celery uses Kombu, which uses a message broker to dispatch the tasks. This implies the data you pass to your tasks should be serializable.
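A sketch of what the last two points mean for an OpenCV script, assuming the script's logic can be moved into one function: the task receives a file path (a plain string) rather than an image array and returns a plain dict, so arguments and result survive Celery's default JSON serializer (pickle can be enabled, but JSON has been the default since Celery 4.0). `analyse_image` and `check_json_args` are hypothetical, and the OpenCV call is left as a placeholder.

```python
import json
import os


def analyse_image(image_path, threshold=0.5):
    """Hypothetical entry point that a Celery task would call.

    Takes a path instead of an image array and returns a plain dict,
    so both the arguments and the result can be JSON-encoded.
    """
    # The real OpenCV work (e.g. cv2.imread(image_path)) would go here;
    # the file size is only a placeholder result.
    size = os.path.getsize(image_path)
    return {"path": image_path, "bytes": size, "threshold": threshold}


def check_json_args(*args, **kwargs):
    """Raise TypeError if the arguments could not be encoded with json."""
    json.dumps({"args": args, "kwargs": kwargs})


# tasks.py would then be a thin wrapper (requires Celery, so shown only as a comment):
# from celery import shared_task
#
# @shared_task
# def analyse_image_task(image_path, threshold=0.5):
#     return analyse_image(image_path, threshold)
```

Keeping the logic in a plain function also means the script can still be run and tested without Celery.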
