If you want to run a Python script, let's say every day at 6 pm, is it better to go with a crontab entry or with an Advanced Python Scheduler solution with regard to power, memory, and CPU consumption?
In my eyes a cron job is therefore better, because I do not see the advantage of a permanently running Advanced Python Scheduler.
You should probably use cron if two conditions are met:
It is available on all platforms your code needs to run on.
Starting a script at a set time is sufficient for your needs.
Mirroring these are two reasons to build your own solution:
Your program needs to be portable across many operating systems, including those that don't have cron available (like MS Windows).
You need to schedule things in a way other than on a set start time, e.g. on a set interval, or if some other condition is met.
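For a sense of what each option looks like in practice, here is a minimal sketch of both approaches. The job body, the script path in the cron comment, and the 6 pm time are placeholders taken from the question, not a definitive setup:

```python
# Equivalent crontab entry (runs every day at 18:00, path is hypothetical):
# 0 18 * * * /usr/bin/python3 /home/user/job.py

from apscheduler.schedulers.blocking import BlockingScheduler

def run():
    print("running the daily job")  # placeholder for the real work

sched = BlockingScheduler()
# the 'cron' trigger mirrors the crontab expression above
sched.add_job(run, "cron", hour=18, minute=0)
sched.start()  # blocks and keeps the process alive, unlike cron
```

The trade-off is visible here: the crontab line costs nothing while idle, whereas the APScheduler variant keeps a Python process resident all day.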
Agreed, cron is better from a resources point of view.
From a functional point of view, a cron job is better if your requirement is just to run a script at a specific time or schedule it at regular intervals. But if your requirement is more complicated, you should check out Advanced Python Scheduler.
Hope it helps.
I also agree cron is better. But when choosing a solution, you should consider your specific requirements. Sometimes you can use Celery to do this.
Celery is an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.
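As a rough illustration of Celery's scheduling side, a periodic task driven by celery beat might look like the sketch below. The module name, broker URL, and task name are made up for the example:

```python
# tasks.py - minimal Celery app with a beat schedule (hypothetical names)
from celery import Celery
from celery.schedules import crontab

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def nightly_report():
    print("running the nightly report")  # placeholder for the real work

# Run the task every day at 18:00. Start a worker with an embedded beat:
#   celery -A tasks worker -B
app.conf.beat_schedule = {
    "nightly-report": {
        "task": "tasks.nightly_report",
        "schedule": crontab(hour=18, minute=0),
    },
}
```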
On the server-side: I need a way to execute some tasks in the background, run them frequently, and start them at a specific time.
My programming language is Python for the back-end (Sanic framework), Vue.js for the front-end, MongoDB as the main DB, and Redis for caching.
I'm also using Docker containers (docker-compose).
I have also worked with Celery before, but I want to know the best solution for production, one that is guaranteed to be stable and reliable.
On the client-side: the question above concerns the server-side, but sometimes I also need to run a job scheduler on clients: embedded devices such as a Raspberry Pi that can run Python or JavaScript.
So, what are your solutions for these use cases?
In production we have both long- and short-running tasks, and in total our Celery cluster executes up to 6M tasks per day, so naturally I would recommend Celery. It is made for this purpose, and if you are a Python developer you have another reason to pick Celery. Finally, Celery is the only Python task queue system known to me that has an HA scheduler (https://github.com/mixkorshun/celery-beatx and https://github.com/sibson/redbeat).
There are two other (Python) projects that should be mentioned as alternatives to Celery - Huey (https://github.com/coleifer/huey) and Apache Airflow (https://github.com/apache/airflow).
I'm one of the core devs for Sanic. I would agree with the other answers that Celery is a great option. For anyone in need of a more lightweight solution, I have a post about an alternative approach using only Sanic: https://community.sanicframework.org/t/how-to-use-asyncio-queues-in-sanic/166/4
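The gist of that lighter-weight approach, roughly sketched for recent Sanic versions, is below. The route, queue placement, and handler names are invented for illustration; see the linked post for the full discussion:

```python
# app.py - rough sketch of an in-process job queue inside Sanic
import asyncio
from sanic import Sanic, response

app = Sanic("scheduler_demo")

async def worker(queue: asyncio.Queue):
    while True:
        job = await queue.get()        # wait for the next job
        print(f"running job: {job}")   # replace with the real work

@app.before_server_start
async def setup(app, loop):
    app.ctx.queue = asyncio.Queue()
    app.add_task(worker(app.ctx.queue))  # background task managed by Sanic

@app.post("/jobs")
async def enqueue(request):
    await request.app.ctx.queue.put(request.json)
    return response.json({"queued": True})
```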
Starting a new process in the background in Python is as simple as calling os.fork(). For a comprehensive example, see https://python-course.eu/forking.php
EDIT:
For a fully featured solution, I'd recommend forking a background process as described above, and then using a library like https://github.com/dbader/schedule to execute jobs at scheduled intervals in that background process.
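A very small sketch of that combination follows. The job body and the hard-coded time are placeholders, and note that os.fork() is Unix-only:

```python
import os
import time
import schedule  # https://github.com/dbader/schedule

def job():
    print("running the scheduled job")  # placeholder for the real work

if os.fork() == 0:
    # child process: run the scheduler loop in the background
    schedule.every().day.at("18:00").do(job)
    while True:
        schedule.run_pending()
        time.sleep(30)
else:
    # parent process: carry on (or exit) while the child keeps scheduling
    pass
```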
In my Django project, I need to collect data from about 50 remote servers into the local database every minute or every 30 seconds. Though this works with crontab on the remote servers, I want to do it inside the project. At first I considered django-celery; however, it is geared toward asynchronous processing, and the data-collection task cannot be delayed, so I think it may not fit. What if I do this with a Python timer, and what do I need to pay more attention to? Excuse my ignorance of Python and Django. I'd appreciate other advice or ideas. Many thanks.
Basically you can use Celery's periodic tasks with the expires option, which ensures that your tasks will not be executed twice.
Also, you could run your own script with an infinite loop that runs the calculation. If your calculation takes more than a minute, you can spawn your tasks using eventlet or gevent. Another option: you could create Celery tasks from this script and be sure that your tasks execute every N seconds, as you prefer.
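A hedged sketch of the periodic-task-with-expires idea, with hypothetical module, broker, and task names:

```python
# collector.py - collect data every 30 seconds; let a run expire
# if it has not started before the next one is due.
from celery import Celery

app = Celery("collector", broker="redis://localhost:6379/0")

@app.task
def collect_data():
    print("pulling data from the remote servers")  # placeholder

app.conf.beat_schedule = {
    "collect-every-30s": {
        "task": "collector.collect_data",
        "schedule": 30.0,              # seconds
        "options": {"expires": 25},    # skip stale runs instead of stacking them
    },
}
```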
We need a service that we can use to schedule events. For instance, we might have a task that needs to run at 3 o'clock (one time) or that runs every 2 hours (multiple times). Preferably each task could be configured with an AMQP queue that it would publish to.
We could easily implement this by creating an OS timer event. My concern is how to recover if this service ever went down. We could use cron if it allowed scheduling on the fly.
I was looking for a way to avoid reinventing the wheel. If there isn't a project out there that does this already, we will just create one. This is a pretty common thing, though, so I'd be surprised if no one's put one out there by now.
Celery solves this problem.
celery.schedules lets you define periodic tasks, and you can override is_due to do things like scheduling once a month. You can schedule tasks to execute at a specific time using periodic_task or celery beat (which I believe is now the standard approach). Yet another way is to use the eta argument to Task.apply_async.
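For example, a one-off task scheduled for the next 3 o'clock via the eta argument could be sketched like this (the task module and name are placeholders):

```python
# Rough sketch of the eta approach: run a task once at the next 3:00.
from datetime import datetime, timedelta
from tasks import my_task  # hypothetical task module

now = datetime.now()  # naive local time, for brevity
run_at = now.replace(hour=3, minute=0, second=0, microsecond=0)
if run_at <= now:
    run_at += timedelta(days=1)

# the worker holds the task until the eta is reached
my_task.apply_async(eta=run_at)
```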
The title is a bit fuzzy because I don't know the right vocabulary.
Here's the thing I am trying to do: I have a script/program on the server for running checks. Now my co-workers want this script to be startable from a website, with the logs viewable there as well. The checks can be quite long-running, usually more than a few hours.
For that, I gathered, I'd have to monitor the processes with the website script and show their logs. The chosen language would be either PHP or Python.
I'd very much appreciate a hint or opinion on how such a thing is generally done and what the best practices are, as I'm unsure how to start with this one. A reliable way to start/monitor the processes would be especially welcome.
If you choose Python, check out Celery (although it may be a little bit overkill if you want to keep things simple). It allows you to run asynchronous tasks, and you can easily monitor them. There is also a Django integration for Celery (django-celery) that includes a web monitor for the tasks.
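A minimal sketch of that start-and-monitor pattern, with hypothetical task and view names (the actual wiring depends on your web framework):

```python
# views.py - start the long-running check and poll its state by id
from tasks import run_checks  # hypothetical @app.task wrapping the check script

def start_checks():
    result = run_checks.delay()   # returns immediately, worker does the work
    return result.id              # hand the id back to the web page

def check_status(task_id):
    result = run_checks.AsyncResult(task_id)
    return result.state           # PENDING, STARTED, SUCCESS, FAILURE, ...
```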
I need a framework which will allow me to do the following:
Allow tasks to be defined dynamically (I'll read an external configuration file and create the tasks/jobs; a task might, for instance, spawn an external command)
Provide a way of specifying dependencies on existing tasks (e.g. task A will be run after task B is finished)
Be able to run tasks in parallel in multiple processes if the execution order allows it (i.e. no task interdependencies)
Allow a task to depend on some external event (I don't know exactly how to describe this, but some tasks finish and will produce results after a while, like a background job; I need to specify that some of the tasks depend on this background-job-completed event)
Undo/Rollback support: if one task fails, try to undo everything that has been executed before (I don't expect this to be implemented in any framework, but I guess it's worth asking..)
So, obviously, this looks more or less like a build system, but I don't seem to be able to find something that will allow me to dynamically create tasks; most things I've seen already have them defined in the "Makefile".
Any ideas?
I've been doing a little more research and I've stumbled upon doit which provides the core functionality I need, without being overkill (not saying that Celery wouldn't have solved the job, but this does it better for my use case).
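For anyone landing here later, a rough sketch of what a dynamically generated dodo.py could look like, assuming a hypothetical tasks.json configuration file with made-up fields:

```python
# dodo.py - build doit tasks from an external configuration file
import json

# hypothetical format: [{"name": "a", "cmd": "run_a.sh", "after": []},
#                       {"name": "b", "cmd": "run_b.sh", "after": ["a"]}]
with open("tasks.json") as fh:
    CONFIG = json.load(fh)

def task_run():
    """Yield one sub-task per entry in the configuration file."""
    for entry in CONFIG:
        yield {
            "name": entry["name"],
            "actions": [entry["cmd"]],  # spawn the external command
            # dependencies on other sub-tasks, e.g. "run:a"
            "task_dep": [f"run:{dep}" for dep in entry.get("after", [])],
        }
```

Independent tasks can then be run in parallel with, e.g., `doit -n 4`.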
Another option is to use make.
Write a Makefile manually, or let a Python script write it
Use meaningful intermediate output file stages
Run make, which then calls out to the processes. The processes would be a Python (build) script with parameters that tell it which files to work on and what task to do.
Parallel execution is supported with -j
It also deletes output files if tasks fail
This circumvents some of the python parallelisation problems (GIL, serialisation).
Obviously only straightforward on *nix platforms.
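As a sketch of the "let a Python script write it" step, with placeholder targets and commands (the build.py script and its flags are hypothetical):

```python
# generate_makefile.py - write a Makefile from a task description in Python
tasks = {
    # target: (prerequisite targets, command to run)
    "stage1.out": ([], "python build.py --task stage1 --out stage1.out"),
    "stage2.out": (["stage1.out"], "python build.py --task stage2 --out stage2.out"),
}

with open("Makefile", "w") as fh:
    fh.write("all: " + " ".join(tasks) + "\n\n")
    for target, (deps, cmd) in tasks.items():
        # recipe lines must be indented with a tab
        fh.write(f"{target}: {' '.join(deps)}\n\t{cmd}\n\n")

# afterwards, `make -j 4` runs independent targets in parallel
```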
AFAIK, there is no such framework in Python which does exactly what you describe. So your options include either building something on your own or hacking away some bits of your requirements and modelling them with an existing tool. Which smells like Celery.
You may have a Celery task which reads a configuration file containing some Python functions' source code, then use eval or exec to execute them (note that ast.literal_eval only handles literals, so it won't run function definitions).
Celery provides a way to define subtasks (dependencies between tasks), so if you are aware of your dependencies, you can model them accordingly (see the sketch at the end of this answer).
Provided that you know the execution order of your tasks you can route them to as many worker machines as you want.
You can periodically poll this background job's result and then start your tasks that are dependent on it.
Undo/Rollback: this might be tricky and depends on what you want to undo; results? state?
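To illustrate the subtask/dependency point above, here is a hedged sketch of chaining two tasks so that one only runs after the other has finished (app, broker, and task names are made up):

```python
# pipeline.py - task_a runs only after task_b, by chaining their signatures
from celery import Celery, chain

app = Celery("pipeline", broker="redis://localhost:6379/0")

@app.task
def task_b():
    return "b done"

@app.task
def task_a(previous_result):
    return f"a done after: {previous_result}"

# task_b's result is passed to task_a; the chain enforces the ordering
workflow = chain(task_b.s(), task_a.s())
workflow.apply_async()
```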