I have a page where the user selects a Python script, and then this script executes.
My issue is that some scripts take a while to execute (up to 30 minutes), so I'd like to run them in the background while the user can still navigate the website.
I tried Celery, but since I'm on Windows I couldn't do better than running with --pool=solo, which, while it lets the user do something else, can only handle one task (and therefore one user) at a time.
I also saw this thread while searching for a solution, but didn't manage to really understand how it worked or how to implement it, nor determine whether it really answered my problem...
So here is my question: how can I have multiple threads/multiple processes with Celery on Windows? Or, if there's another way, how can I execute several tasks simultaneously in the background?
Have you identified whether your slow scripts are CPU-bound or I/O-bound?
If they're I/O-bound, you can use the eventlet or gevent pool, based on Strategy 1 in the blog post from distributedpython.com.
But if they're CPU-bound, you may have to consider something like a dedicated Celery Windows box (or Windows Docker container) and work around Celery's billiard issue on Windows by setting the environment variable FORKED_BY_MULTIPROCESSING=1, based on Strategy 2 in the blog post from distributedpython.com.
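For example, a minimal sketch of both strategies, assuming a Redis broker and a hypothetical tasks.py module (the worker commands at the bottom are the relevant part):

    # tasks.py (hypothetical module name)
    import os
    import subprocess

    # Strategy 2 workaround: must be set before the prefork pool spawns its
    # child processes on Windows
    os.environ.setdefault('FORKED_BY_MULTIPROCESSING', '1')

    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')  # assumed Redis broker

    @app.task
    def run_script(script_path):
        # placeholder for the long-running script execution
        subprocess.run(['python', script_path], check=True)

    # CPU-bound (Strategy 2): several prefork worker processes
    #   celery -A tasks worker --loglevel=info --concurrency=4
    # I/O-bound (Strategy 1): an eventlet pool with many green threads
    #   celery -A tasks worker --loglevel=info --pool=eventlet --concurrency=100

With the environment variable in place, the prefork pool can run more than one task at a time on Windows, so several users' scripts can execute concurrently.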
I am trying to build a Flask application on Windows where the user uploads a big Excel file, which is then processed in Python; the processing takes 4-5 minutes. I need to process those tasks in the background after the user uploads the file.
I tried RQ, Celery, etc., but those are not working on Windows and I have never worked on Linux. I need some advice on how to achieve this.
Celery and RQ can work on Windows, but both have some trouble.
For RQ use this,
and for Celery use this.
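To illustrate the kind of workaround those links point at (this is my own sketch, not necessarily the linked solution): RQ's default worker breaks on Windows because it relies on os.fork(), and its SimpleWorker runs jobs in the same process instead.

    # run_worker.py (hypothetical): an RQ worker that avoids os.fork()
    from redis import Redis
    from rq import Queue, SimpleWorker

    conn = Redis()                      # assumes a local Redis server
    queue = Queue(connection=conn)

    # SimpleWorker executes jobs in the worker process itself instead of forking
    worker = SimpleWorker([queue], connection=conn)
    worker.work()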
I don't think it's accurate to say that you can't run RQ on Windows; it just has some limitations (as you can see in the documentation).
Since you can run Redis on Windows, you might want to try other Redis-based task queues. One such example is huey. There are at least examples of people who have successfully run it on Windows (e.g. look at this SO question).
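A minimal huey setup looks roughly like this (a sketch only; the Redis host, app name and tasks.py module are assumptions):

    # tasks.py (hypothetical module name)
    from huey import RedisHuey

    huey = RedisHuey('excel-app', host='localhost')  # assumed app name / Redis host

    @huey.task()
    def process_excel(path):
        # the long-running processing would go here
        ...

    # Enqueue from the web view:    process_excel(path_to_uploaded_file)
    # Run the consumer separately:  huey_consumer.py tasks.huey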
I solved this by using WSL (Windows Subsystem for Linux) and running my RQ worker inside WSL.
I am not sure whether I will face any issues in the future, but as of now it is queuing and processing tasks as I desire.
Might be useful for somebody with the same problem.
I am pretty new to Python and to distributed systems.
I am using the ZeroMQ Ventilator-Worker-Sink configuration:
Ventilator - Worker - Sink
Everything is working fine at the moment; my problem is that I need a lot of workers, and every worker does the same work.
At the moment every worker lives in its own Python file and has its own output console.
If I make program changes, I have to change (or copy) the code in every file.
The next problem is that I have to start/run every file, so it's quite annoying to start 12 files.
What are the best solutions here? Threads, processes?
I should say that the goal is to run every worker on a different Raspberry Pi.
This appears to be more of a dev/ops problem. You have your worker code, which is presumably a single codebase, on multiple distributed machines or instances. You make a change to that codebase and you need the resulting code to be distributed to each instance, and then the process restarted.
To start, you should at minimum be using a source control system, like Git. With such a system you could at least go to each instance and pull the most recent commit and restart. Beyond that, you could set up a system like Ansible to go out and run those actions on each instance initiated from a single command.
There's a whole host of other tools, strategies and services that will help you do those things in a myriad of different ways. Using Docker to create a single worker container and then distribute and run that container on your various instances is probably one of the more popular ways to do what you're after, but it'll require a more fundamental change to your infrastructure.
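As a rough illustration of the "pull the latest commit and restart on each instance" idea (purely a sketch; the host list, paths and service name are made up):

    # deploy.py (hypothetical): update and restart the worker on every Raspberry Pi
    import subprocess

    HOSTS = ['pi@10.0.0.11', 'pi@10.0.0.12']  # assumed worker addresses

    for host in HOSTS:
        # pull the latest commit and restart the worker service on each machine
        subprocess.run(
            ['ssh', host, 'cd ~/worker && git pull && sudo systemctl restart worker'],
            check=True,
        )

Tools like Ansible or Docker essentially replace a script like this with something more robust and declarative.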
Hope this helps.
I currently have an executable that, when running, uses all the cores on my server. I want to add another server and have the jobs split between the two machines, with each job still using all the cores on the machine it runs on. If both machines are busy, I need the next job to queue until one of the two machines becomes free.
I thought this might be controlled by Python; however, I am a novice and not sure which Python package would be best for this problem.
I liked the "heapq" package for queuing the jobs, but it looks like it is designed for single-server use. I then looked into IPython.parallel, but it seemed more designed for creating a separate smaller job for every core (on either one or more servers).
I saw a huge list of different options here (https://wiki.python.org/moin/ParallelProcessing), but I could do with some guidance as to which way to go for a problem like this.
Can anyone suggest a package that may help with this problem, or a different way of approaching it?
Celery does exactly what you want: it makes it easy to distribute a task queue across multiple (many) machines.
See the Celery tutorial to get started.
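A rough sketch of how that could look for your case (assumptions: a Redis broker reachable from both servers, a hypothetical tasks.py module, and that the executable itself uses all cores):

    # tasks.py (hypothetical module name)
    import subprocess
    from celery import Celery

    app = Celery('tasks', broker='redis://broker-host:6379/0')  # assumed broker URL

    @app.task
    def run_job(args):
        # the executable already parallelises across all cores of whichever
        # machine the worker picks this job up on
        subprocess.run(['/path/to/executable'] + list(args), check=True)

Start one worker per server with a concurrency of 1 (celery -A tasks worker --concurrency=1) so each machine takes only one job at a time; additional jobs simply wait in the queue until a worker becomes free.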
Alternatively, IPython has its own multiprocessing library built in, based on ZeroMQ; see the introduction. I have not used this before, but it looks pretty straightforward.
I'd like to run periodic tasks on my django project, but I don't want all the complexity of celery/django-celery (with celerybeat) bundled in my project.
I'd also like to store the config (the schedule and which command to run) within my SCM.
My production machine is running Ubuntu 10.04.
While I could learn and use cron, I feel like there should be a higher-level (user-friendly) way to do it. (Much like UFW is to iptables.)
Is there such thing? Any tips/advice?
Thanks!
There are several Django-based scheduling apps, such as django-chronograph and django-chroniker and django-cron. I forked django-chronograph into django-chroniker to fix a few bugs and extend it for my own use case. I still use Celery in some projects, but like you point out, it's a bit overcomplicated and has a large stack.
In my personal opinion, I would learn how to use cron. This won't take more than 5 to 10 minutes, and it's an essential tool when working on a Linux server.
What you could do is set up a cron job that requests one page of your Django instance every minute, and have the Django script figure out what time it is and what needs to be done, depending on the configuration stored in your database. This is the approach I've seen in other similar applications.
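A sketch of that approach, with all names (the model, the URL and the job's run() method) invented for illustration:

    # myapp/views.py (hypothetical)
    from django.http import HttpResponse
    from django.utils import timezone
    from myapp.models import ScheduledJob   # assumed model storing the schedule

    def cron_tick(request):
        # cron requests this URL every minute; the view decides what is actually due
        for job in ScheduledJob.objects.filter(next_run__lte=timezone.now()):
            job.run()                        # assumed method that executes the command
        return HttpResponse('ok')

    # crontab entry on the server (assumed URL):
    # * * * * * curl -s http://localhost/cron/tick/ > /dev/null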
Does anyone know of a working and well-documented implementation of a daemon using Python? Please post a link here if you know of a project that fits these two requirements.
Three options I can think of:
Make a cron job that calls your script. Cron is a common name for a GNU/Linux daemon that periodically launches scripts according to a schedule you set. You add your script to a crontab or place a symlink to it in a special directory, and the daemon handles the job of launching it in the background. You can read more on Wikipedia. There are a variety of different cron daemons, but your GNU/Linux system should already have one installed.
A Pythonic approach (a library, for example) that lets your script daemonize itself. Yes, it will require a simple event loop (where your events are timer firings, possibly provided by the sleep function). Here is the one I recommend and use: A simple unix/linux daemon in Python (a minimal sketch follows below).
Use the Python multiprocessing module. The nitty-gritty of trying to fork a process etc. is hidden in this implementation. It's pretty neat.
I wouldn't recommend 2 or 3 because you would in fact be re-implementing cron's functionality. The Linux system paradigm is to let multiple simple tools interact to solve your problems. Unless there are additional reasons why you should make a daemon (beyond triggering periodically), choose the other approach.
Also, if you daemonize with a loop and a crash happens, make sure you have logs that will help you debug, and devise a way for the script to start again. If the script is instead added as a cron job, it will simply be triggered again at the next scheduled interval.
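Here is the minimal sketch promised under option 2. It uses the python-daemon package rather than the linked recipe's own Daemon class, so treat the specifics as an assumption:

    # worker.py (hypothetical): a script that daemonizes itself and loops on a timer
    import time
    import daemon   # pip install python-daemon

    def do_periodic_work():
        pass        # placeholder for the actual task

    def main_loop():
        while True:
            do_periodic_work()
            time.sleep(60)   # the "event" is simply a timer provided by sleep()

    if __name__ == '__main__':
        with daemon.DaemonContext():
            main_loop()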
If you just want to run a daemon, consider Supervisor, a daemon that itself controls and manages daemons.
If you want to look at the nitty-gritty, you can check out Supervisor's launch script or some of the responses to this lazyweb request.
Check this link for a double-fork daemon: http://code.activestate.com/recipes/278731-creating-a-daemon-the-python-way/
The code is readable and well documented. You may want to take a look at chapter 13 of W. Richard Stevens' book 'Advanced Programming in the UNIX Environment' for detailed information on Unix daemons.
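The core of that recipe is the classic UNIX double fork; stripped to its essentials, it looks something like this (a sketch only; the full recipe also handles PID files, signals and stream redirection):

    import os
    import sys

    def daemonize():
        # first fork: return control to the shell and detach from the parent
        if os.fork() > 0:
            sys.exit(0)
        os.setsid()            # become session leader, drop the controlling terminal
        # second fork: ensure the daemon can never reacquire a controlling terminal
        if os.fork() > 0:
            sys.exit(0)
        os.chdir('/')          # don't hold any directory in use
        os.umask(0)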