Python framework for task execution and dependency handling

I need a framework which will allow me to do the following:
Allow tasks to be defined dynamically (I'll read an external configuration file and create the tasks/jobs; a task could be, for instance, spawning an external command)
Provide a way of specifying dependencies on existing tasks (e.g. task A will be run after task B is finished)
Be able to run tasks in parallel in multiple processes if the execution order allows it (i.e. no task interdependencies)
Allow a task to depend on some external event (I don't know exactly how to describe this, but some tasks finish and only produce their results after a while, like a background job; I need to specify that some tasks depend on this background-job-completed event)
Undo/rollback support: if one task fails, try to undo everything that was executed before it (I don't expect this to be implemented in any framework, but I guess it's worth asking...)
So, obviously, this looks more or less like a build system, but I can't seem to find one that lets me create tasks dynamically; most of the tools I've seen already have the tasks defined in the "Makefile".
Any ideas?

I've been doing a little more research and I've stumbled upon doit, which provides the core functionality I need without being overkill (not that Celery wouldn't have solved the problem, but doit fits my use case better).
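For reference, a minimal sketch of how dynamically generated tasks might look in doit, assuming a made-up tasks.json config that maps each task name to a shell command and a list of dependencies (the file name and schema are illustrative only):

# dodo.py -- sketch of dynamic task creation with doit.
# Assumes a hypothetical tasks.json like:
#   {"taskB": {"cmd": "echo B", "depends_on": []},
#    "taskA": {"cmd": "echo A", "depends_on": ["taskB"]}}
import json

def task_from_config():
    """Yield one doit (sub)task per entry in the config file."""
    with open("tasks.json") as fh:
        config = json.load(fh)
    for name, spec in config.items():
        yield {
            "name": name,
            "actions": [spec["cmd"]],            # shell command to spawn
            "task_dep": [f"from_config:{dep}"    # run only after these finish
                         for dep in spec.get("depends_on", [])],
        }

Running something like doit -n 4 should then execute independent tasks in parallel processes while still respecting the declared dependencies.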

Another option is to use make.
Write a Makefile manually, or have a Python script generate it (a sketch of this is shown below)
Use meaningful intermediate output files as stages
Run make, which then calls out to the processes. Each process would be a Python (build) script given parameters that tell it which files to work on and which task to do.
Parallel execution is supported with -j
Output files of failed tasks can also be deleted automatically (GNU make's .DELETE_ON_ERROR)
This sidesteps some of Python's parallelisation problems (the GIL, serialisation).
Obviously only straightforward on *nix platforms.
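A minimal sketch of the "let a Python script write the Makefile" idea; the task table, the .done stamp-file convention, and the build_step.py script are made up for illustration:

# generate_makefile.py -- sketch: turn a task description into a Makefile.
tasks = {
    "taskA": {"cmd": "python build_step.py --stage A", "deps": ["taskB"]},
    "taskB": {"cmd": "python build_step.py --stage B", "deps": []},
}

with open("Makefile", "w") as mk:
    mk.write(".DELETE_ON_ERROR:\n\n")    # drop half-written outputs on failure
    mk.write("all: " + " ".join(f"{t}.done" for t in tasks) + "\n\n")
    for name, spec in tasks.items():
        dep_files = " ".join(f"{d}.done" for d in spec["deps"])
        mk.write(f"{name}.done: {dep_files}\n")
        mk.write(f"\t{spec['cmd']}\n")
        mk.write("\ttouch $@\n\n")        # stamp file marks the task as done

Running make -j4 against the generated Makefile then executes tasks with no interdependencies in up to four parallel processes.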

AFAIK, there is no framework in Python that does exactly what you describe, so your options are either to build something of your own or to bend some of your requirements and model them with an existing tool. Which smells like Celery.
You may have a Celery task that reads a configuration file containing the source code of some Python functions and then runs them with exec/eval (note that ast.literal_eval only evaluates literals, not arbitrary code).
Celery lets you compose tasks with its canvas primitives (chains, groups, chords), so if you know your dependencies you can model them accordingly (see the sketch after this answer).
Provided that you know the execution order of your tasks, you can route them to as many worker machines as you want.
You can periodically poll the background job's result and then start the tasks that depend on it.
Undo/Rollback: this might be tricky and depends on what you want to undo; results? state?
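As a rough illustration of the dependency modelling, a minimal sketch using Celery's chain/group primitives; the broker URL and the task bodies are placeholders:

# Sketch: task_a depends on task_b, task_c is independent.
from celery import Celery, chain, group

app = Celery("pipeline", broker="redis://localhost:6379/0")

@app.task
def task_b():
    return "B done"

@app.task
def task_a(previous_result):
    return f"A done (after {previous_result})"

@app.task
def task_c():
    return "C done"

# chain() runs task_a only after task_b has finished and passes it the
# result; group() lets the independent branch run in parallel on any worker.
workflow = group(chain(task_b.s(), task_a.s()), task_c.s())
# workflow.apply_async()  # requires a running broker and worker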

Related

What's the difference between FastAPI background tasks and Celery tasks?

Recently I read something about this, and the gist was that Celery is the more capable option.
Now I can't find detailed information about the difference between these two, or about what the best way to use each of them would be.
Straight from the documentation:
If you need to perform heavy background computation and you don't necessarily need it to be run by the same process (for example, you don't need to share memory, variables, etc), you might benefit from using other bigger tools like Celery.
They tend to require more complex configurations, a message/job queue manager, like RabbitMQ or Redis, but they allow you to run background tasks in multiple processes, and especially, in multiple servers.
To see an example, check the Project Generators, they all include Celery already configured.
But if you need to access variables and objects from the same FastAPI app, or you need to perform small background tasks (like sending an email notification), you can simply just use BackgroundTasks.
Have a look at this answer as well.
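To make the BackgroundTasks side of that comparison concrete, a minimal sketch; the endpoint and the notification function are invented for illustration:

# Sketch: a small in-process background task with FastAPI.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()

def send_notification(email: str, message: str) -> None:
    # In a real app this might call an email/SMS service.
    print(f"notify {email}: {message}")

@app.post("/signup")
async def signup(email: str, background_tasks: BackgroundTasks):
    # The task runs after the response is sent, in the same process.
    background_tasks.add_task(send_notification, email, message="welcome!")
    return {"status": "ok"}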

Best practice submitting SLURM jobs via Python

This is kind of a general best practice question.
I have a Python script which iterates over some arguments and calls another script with those arguments (it's basically a grid search for some simple deep learning models). This works fine on my local machine, but now I need the resources of my university's computer cluster, which uses SLURM.
I have some logic in the Python script that I think would be difficult, and maybe out of place, to implement in a shell script. I also can't just throw all the jobs at the cluster at once, because I want to skip certain parameter combinations depending on the outcome (loss) of others. Now I'd like to submit the SLURM jobs directly from my Python script and still handle the more complex logic there. My question is: what is the best way to implement something like this, and would running a Python script on the login node be bad manners? Should I use the subprocess module? Snakemake? Joblib? Or are there other, more elegant ways?
Snakemake and Joblib are valid options; they will handle the communication with the Slurm cluster. Another possibility is FireWorks. It is a bit more tedious to get running: it needs a MongoDB database and has a vocabulary that takes some getting used to, but in the end it can do very complex things. You can, for instance, create a workflow that submits jobs to multiple clusters, runs other jobs that depend on the output of previous ones, and automatically resubmits the ones that failed, with different parameters if needed.
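If you go the subprocess route instead, a minimal sketch of submitting a job with sbatch from Python and polling until it leaves the queue; the batch script name and its arguments are placeholders:

# Sketch: submit a SLURM job and wait for it before deciding what to run next.
import re
import subprocess
import time

def submit(script: str, *args: str) -> str:
    """Submit a batch script and return the SLURM job id."""
    out = subprocess.run(
        ["sbatch", script, *args],
        check=True, capture_output=True, text=True,
    ).stdout
    # sbatch prints e.g. "Submitted batch job 123456"
    return re.search(r"\d+", out).group()

def wait_for(job_id: str, poll: int = 30) -> None:
    """Block until the job no longer appears in squeue."""
    while True:
        out = subprocess.run(
            ["squeue", "-h", "-j", job_id],
            capture_output=True, text=True,
        ).stdout
        if not out.strip():
            return
        time.sleep(poll)

# job = submit("train.sh", "--lr", "0.01")
# wait_for(job)  # then inspect the loss and decide which jobs to submit next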

What options exist for segregating Python environments in a multi-user dask.distributed cluster?

I'm specifically interested in avoiding conflicts when multiple users upload (upload_file) slightly different versions of the same python file or zip contents.
It would seem this is not really a supported use case as the worker process is long-running and subject to the environment changes/additions of others.
I like the library for easy, on-demand local/remote context switching, so would appreciate any insight on what options we might have, even if it means some seamless deploy-like step for user-specific worker processes.
Usually the solution to having different user environments is to launch and destroy networks of different Dask workers/schedulers on the fly on top of some other job scheduler like Kubernetes, Marathon, or Yarn.
If you need to reuse the same set of Dask workers, then you could also be careful to specify the workers= keyword consistently, but this would be error-prone (a sketch is shown below).
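For completeness, a minimal sketch of that workers= approach, assuming each user has been assigned dedicated worker addresses; the scheduler and worker addresses here are placeholders:

# Sketch: pin one user's tasks to their own workers on a shared scheduler.
from dask.distributed import Client

def process(x):
    return x * 2

client = Client("tcp://scheduler:8786")       # shared scheduler address

# Restrict execution to the workers set aside for this user, so code
# uploaded by other users does not affect these tasks (and vice versa).
future = client.submit(process, 21, workers=["tcp://worker-alice:41234"])
print(future.result())                        # 42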

Recommendations for a script queuer

There probably are such applications, but I couldn't find them, so I'm asking here.
I'd like a dynamic python script scheduler.
It runs one script from beginning to end, then reads the next script in the queue and executes it.
I'd like to dynamically add new Python scripts to the queue (and probably also delete them).
If I list the jobs, it should show me all the jobs that have been executed and those that are still in the queue.
Do you know any program that provides such functionality?
I know about Load Sharing Facility (LSF), but I don't need to distribute jobs to clusters;
I just need to queue jobs on my machine...
If you need an in-process scheduler, you can try APScheduler, which is quite simple to use and thoroughly documented. You could even build a custom scheduler program based on APScheduler, communicating with a minimal GUI through an SQL database.
If you are looking for very basic functionality, you could also have a program poll a chosen directory (the pending-jobs pool) for e.g. batch files. Each time it finds one, it moves it to another directory (the running-jobs pool), runs it, and finally moves it to a last directory (the executed-jobs pool). Adding a job is as simple as adding a batch file, monitoring the queue consists of viewing the pending-jobs directory's contents, and so on (a sketch of this approach is shown below).
Such a script can, IMHO, take less time to write than learning to use an advanced scheduler (APScheduler aside).
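A minimal sketch of that directory-polling queue, with invented directory names (pending/, running/, done/) and Python scripts as the jobs:

# poll_queue.py -- sketch of a file-based job queue.
import subprocess
import time
from pathlib import Path

PENDING, RUNNING, DONE = Path("pending"), Path("running"), Path("done")
for d in (PENDING, RUNNING, DONE):
    d.mkdir(exist_ok=True)

while True:
    queued = sorted(PENDING.glob("*.py"))     # oldest-named script first
    if not queued:
        time.sleep(5)                         # nothing queued, poll again
        continue
    job = queued[0]
    running = RUNNING / job.name
    job.rename(running)                       # claim the job
    subprocess.run(["python", str(running)])  # run it from beginning to end
    running.rename(DONE / running.name)       # archive the executed job

Listing pending/ and done/ then gives the "list jobs" view, and adding or removing files in pending/ adds or removes jobs.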

spawn safe, platform-independent dummy process in Python

I'm writing some code that needs to run on different OS platforms and interact with separate processes. To write tests for it, I need to be able to create processes from python that do nothing but wait to be signaled to stop. I would like to be able to create some processes that recursively create more.
Also (this part might be a little strange), it would be best for my testing if I were able to create processes that weren't children of the creating process, so I could emulate conditions where, e.g., os.waitpid won't have permission to interact with the process, or where one process signals a factory to create a process rather than creating it directly.
If you're using Python 2.6, the multiprocessing package has some stuff you might find useful.
There's a very simple example on my GitHub. If you run the spawner, it will create 3 processes that run separately but use a channel to talk back to the spawner, so if you kill the spawner process the others you have started will die. I'm afraid there's a lot of redundant code in there (I'm in the middle of a refactoring), but I hope it gives the basic idea.
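Along the same lines, a minimal sketch of spawning idle dummy processes with multiprocessing that just wait for an Event, including one level of recursive spawning; the function name and the depth parameter are made up for illustration:

# Sketch: dummy processes that do nothing but wait to be told to stop.
import multiprocessing
import time

def wait_for_stop(stop_event, depth=0):
    child = None
    if depth > 0:
        # Emulate a process that recursively creates another one.
        child = multiprocessing.Process(
            target=wait_for_stop, args=(stop_event, depth - 1))
        child.start()
    stop_event.wait()          # block until signalled to stop
    if child is not None:
        child.join()

if __name__ == "__main__":
    stop = multiprocessing.Event()
    procs = [multiprocessing.Process(target=wait_for_stop, args=(stop, 1))
             for _ in range(3)]
    for p in procs:
        p.start()
    time.sleep(1)              # ...exercise the code under test here...
    stop.set()                 # signal every dummy process to exit
    for p in procs:
        p.join()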
