The title basically says it all. I have gunicorn running my app with 5 workers, and a data structure that all the workers need access to, which is updated on a schedule by APScheduler. Currently APScheduler runs once per worker, but I want it to run just once, period. Is there a way to do this? I've tried the --preload option, which lets me load the shared data structure just once, but it doesn't seem to let all the workers see it when it updates. I'm open to switching to uWSGI if that helps.
I'm not aware of any way to do this with either server, at least not without some sort of RPC. That is, run APScheduler in a separate process and then connect to it from each worker. You may want to look at projects like RPyC and Execnet for that.
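For a standard-library flavor of the RPC approach suggested above, `multiprocessing.managers.BaseManager` can serve the shared structure from one dedicated process (the same one running the APScheduler job) while each gunicorn worker connects as a client. This is only a sketch; the address, authkey, and `get_store` name are made up for illustration:

```python
from multiprocessing.managers import BaseManager

# Hypothetical address/authkey -- pick your own in a real deployment
ADDRESS = ('127.0.0.1', 50517)
AUTHKEY = b'change-me'

_store = {}  # the real dict lives only in the server process

class StoreManager(BaseManager):
    pass

# Clients receive a proxy whose dict methods (get, update, ...)
# are forwarded over the socket to the server process.
StoreManager.register('get_store', callable=lambda: _store)

def serve_forever():
    # Run this in the single scheduler process, alongside APScheduler.
    manager = StoreManager(address=ADDRESS, authkey=AUTHKEY)
    manager.get_server().serve_forever()

def connect():
    # Call this from each gunicorn worker to obtain the shared proxy.
    manager = StoreManager(address=ADDRESS, authkey=AUTHKEY)
    manager.connect()
    return manager.get_store()
```

The scheduler job mutates `_store` in place; workers calling `connect()` always see the latest contents, because every read goes through the server process rather than a per-worker copy.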
Related
I'm working with the Flask framework, trying to schedule a job that will be triggered 30 minutes from launch and will run only once.
I tried working with threading.Timer, but since my job makes a REST request I get RuntimeError: 'working outside of request context', which I just couldn't solve.
From this thread, I understand that using the threading module on a Flask server is not recommended:
How do you schedule timed events in Flask?
So I'm looking for a solution for a job triggered once at a fixed time (one that doesn't run on an interval).
It looks like APScheduler must be interval based.
I would be grateful for any help.
The APScheduler add_job method can take a date trigger, which does what you want: it fires exactly once at a given time.
Pro tips:
If you run APScheduler inside your Flask app process, then when you go into production with a WSGI server like gunicorn or uWSGI you will end up with your job being run multiple times (once per Flask worker).
When I hit this issue, the gunicorn --preload option didn't cut it for me.
So:
You can use flask-apscheduler with its REST API approach if that suits you.
Or separate APScheduler into its own daemon and either
use uWSGI mules,
or keep gunicorn running only the web app and use supervisor (or an equivalent) to start the scheduler daemon.
IMHO, separating gunicorn/Flask and APScheduler into two parts and using supervisor is the cleanest yet not overly complex solution.
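Under that separation, the supervisor side could look something like this (the `scheduler_daemon.py` entry point and paths are hypothetical; adapt them to your layout):

```ini
[program:scheduler]
command=/srv/myapp/venv/bin/python scheduler_daemon.py
directory=/srv/myapp
autostart=true
autorestart=true
stopsignal=TERM
```

gunicorn then runs only the web app, and supervisor keeps exactly one scheduler process alive regardless of how many gunicorn workers you configure.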
I'm in need of a way to execute external long running processes from a web app written in Django and Python.
Right now I'm using Supervisord and its API. My problem with this solution is that it's very static: I need to build the commands from my app rather than preconfigure Supervisord with every possible command. Both the command and its arguments are dynamic.
I need to execute the external process, save a pid/identifier, and later be able to check whether it's still alive and running, and to stop it.
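For reference, that start/check/stop cycle maps directly onto the standard library, assuming you can keep the `Popen` handle (or at least the pid) around between steps:

```python
import os
import signal
import subprocess

# start a dynamically built command and remember its pid
proc = subprocess.Popen(['sleep', '60'])
pid = proc.pid

# poll() returns None while the child is still running
assert proc.poll() is None

# later: stop it by pid
os.kill(pid, signal.SIGTERM)
proc.wait()
```

The catch, and the reason tools like supervisord or Celery exist, is surviving a restart of the web process: a bare pid saved in the database can be reused by an unrelated process, so a supervising daemon that owns the children is more robust.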
I've found https://github.com/mnaberez/supervisor_twiddler to add processes on the fly to supervisord. Maybe that's the best way to go?
Any other ideas how to best solve this problem?
I suggest you have a look at this post:
Processing long-running Django tasks using Celery + RabbitMQ + Supervisord + Monit
As the title suggests, there are a few additional components involved (mainly Celery and RabbitMQ), but these are good, proven technologies for this kind of requirement.
I've been able to deploy a test application using Pyramid with pserve and running pceleryd (I just send an email without blocking while it is sent).
But there's one point that I don't understand: I want to run my application with mod_wsgi, and I don't know whether I can do that without having to run pceleryd from a shell, or whether I can do something in the virtual host configuration instead.
Is it possible? How?
There are technically ways you could use Apache/mod_wsgi to manage a process distinct from those handling web requests, but the pain point is that Celery will want to fork off further worker processes. Forking further processes from a process managed by Apache can cause problems and so is not recommended.
You are thus better off starting the Celery process separately. One option is to use supervisord to start it up and manage it.
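A supervisord stanza for that could look roughly like this (paths and the `myapp` module name are placeholders):

```ini
[program:celery]
command=/srv/myapp/venv/bin/celery -A myapp worker --loglevel=info
directory=/srv/myapp
autostart=true
autorestart=true
stopasgroup=true   ; signal the whole group so forked worker children exit too
```

This keeps mod_wsgi serving requests only, while supervisord owns the Celery process tree and handles the forking that Apache would otherwise choke on.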
I want to give celery a try. I'm interested in a simple way to schedule crontab-like tasks, similar to Spring's quartz.
I see from Celery's documentation that it requires running celeryd as a daemon process. Is there a way to avoid running another external process and simply run this embedded in my Django instance? Since I'm not interested in distributing the work at the moment, I'd rather keep it simple.
Add CELERY_ALWAYS_EAGER = True to your Django settings file and all your tasks will be executed locally, inline. For periodic tasks, though, it seems you still have to run celery beat as well.
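Concretely, the relevant settings fragment (the second flag is optional but, I'd suggest, useful in development so task errors aren't silently swallowed):

```python
# settings.py -- run Celery tasks inline in the Django process
CELERY_ALWAYS_EAGER = True
# re-raise task exceptions immediately instead of recording them
CELERY_EAGER_PROPAGATES_EXCEPTIONS = True
```

Note these are the old-style uppercase setting names used by django-celery-era projects; newer Celery versions spell them `task_always_eager` and `task_eager_propagates`.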
I have written a Django app that makes use of Python threading to create a web spider, the spider operates as a series of threads to check links.
When I run this app using the django test server (built in), the app runs fine and the threads seem to start and stop on time.
However, running the app on Apache, it seems the threads aren't kicking off and running (after about 80 seconds there should be a queued database update, and these changes aren't occurring).
Does anyone have an idea what I'm missing here?
-- Edit: My question is: how does Apache handle threaded applications? That is, is there a limit on how many threads can be run from a single app?
Any help would be appreciated!
Most likely, you are missing the creation of new processes. Apache will not run in a single process, but fork new processes for requests every now and then (depending on a dozen or so configuration parameters). If you run django in each process, they will share no memory, and the results produced in one worker won't be visible to any of the others. In addition, the Apache process might terminate (on idle, or after a certain time), discarding your in-memory results.
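The per-process isolation described above is easy to demonstrate with the standard library; this toy sketch mirrors what happens when Apache forks workers:

```python
import multiprocessing as mp

# each forked process gets its own copy of `data`;
# mutations in the child never propagate back to the parent
data = {"count": 0}

def worker():
    data["count"] += 1  # visible only inside this child process

if __name__ == '__main__':
    p = mp.Process(target=worker)
    p.start()
    p.join()
    print(data["count"])  # parent still sees 0
```

Your spider threads live inside one such process; any in-memory results they accumulate vanish when Apache recycles that process, which is why the queued update never appears. Persisting results to the database (or moving the spider out of the request-handling processes entirely) avoids this.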