I'm working with the Flask framework, trying to schedule a job that will be triggered 30 minutes from launch and will run only once.
I tried working with threading.Timer, but since my job makes a REST request I'm getting RuntimeError: 'working outside of request context', which I just couldn't solve.
From this thread, I understand that it is not recommended to use the threading module on a Flask server:
How do you schedule timed events in Flask?
So I'm looking for a solution for a one-time, timed trigger job (not one that runs on an interval).
It looks like APScheduler only supports interval-based jobs.
I would be grateful for any help.
The APScheduler add_job method can take a date trigger, which lets you do exactly what you want.
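For example, a one-off job 30 minutes from startup could look roughly like this (a minimal sketch; my_job is a placeholder for your own function):

    from datetime import datetime, timedelta
    from apscheduler.schedulers.background import BackgroundScheduler

    def my_job():
        # placeholder for the real work, e.g. the REST call
        print("ran once, 30 minutes after scheduling")

    scheduler = BackgroundScheduler()
    scheduler.start()

    # the 'date' trigger runs the job exactly once at the given time
    scheduler.add_job(my_job, 'date',
                      run_date=datetime.now() + timedelta(minutes=30))

Note that the job runs outside any request, so anything that needs the Flask app should use app.app_context() (or be passed plain parameters) rather than relying on the request context.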
Pro tips:
If you run APScheduler inside your Flask app process, then when you go into production with a WSGI server like gunicorn or uWSGI, you will end up with your job being run multiple times (once per worker).
When I hit this issue, the gunicorn --preload option didn't cut it for me.
So:
You can use flask-apscheduler with its REST API approach if that suits you.
Or separate APScheduler into its own daemon, and either
use uWSGI mules,
or keep gunicorn running only the web app and use supervisor (or an equivalent) to start the scheduler daemon.
IMHO, splitting gunicorn/Flask and APScheduler into two parts and using supervisor is the cleanest yet not overly complex solution; a sketch of such a scheduler daemon follows below.
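The scheduler daemon itself can stay tiny. A rough sketch (the job body and timing are placeholders, and it assumes the daemon talks to your app over HTTP or the database rather than importing the Flask process):

    # scheduler_daemon.py - run as its own process, e.g. under supervisor
    from apscheduler.schedulers.blocking import BlockingScheduler

    def refresh_data():
        # placeholder: hit your app's REST endpoint, update the DB, etc.
        pass

    scheduler = BlockingScheduler()
    scheduler.add_job(refresh_data, 'interval', minutes=30)
    scheduler.start()  # blocks forever; supervisor keeps the process alive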
I realize similar questions have been asked, however they have all been about a specific problem, whereas I don't even know how I would go about doing what I need to.
That is: from my Django web app I need to scrape a website periodically while the app runs on a server. The first options I found were "django-background-tasks" (which doesn't seem to work the way I want it to) and 'celery-beat', which, if I understood correctly, recommends getting another server.
I figured just running a separate thread would work, but I can't seem to make that work without it interrupting the server and vice versa, and it's not the "correct" way of doing it.
Is there a way to run a task periodically in Django without needing a separate server or a request to be made to the app?
'celery-beat', which, if I understood correctly, recommends getting another server.
You can host celery (and any other needed components) on the same server as your Django app. They would be separate processes entirely.
It's not an uncommon setup to have a Django app + celery worker(s) + message queue all bundled into the same server deployment. Deploying on separate servers may be ideal, just as it would be ideal to distribute your Django app across many servers, but is by no means necessary.
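For the periodic scraping itself, celery beat just needs a schedule entry pointing at your task. Roughly like this (project name, broker URL, task, and interval are all placeholders for illustration):

    # proj/celery.py  (broker URL and names are placeholders)
    from celery import Celery
    from celery.schedules import crontab

    app = Celery('proj', broker='redis://localhost:6379/0')

    @app.task
    def scrape_site():
        # placeholder for the real scraping logic
        ...

    # celery beat reads this schedule and queues the task every 6 hours;
    # run "celery -A proj beat" and "celery -A proj worker" alongside Django
    app.conf.beat_schedule = {
        'scrape-every-6-hours': {
            'task': 'proj.celery.scrape_site',
            'schedule': crontab(minute=0, hour='*/6'),
        },
    }

For a small deployment the beat scheduler can even be embedded in the worker process with celery -A proj worker -B.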
I'm not sure if this is the "correct" way, but it was a cheap and easy way for me to do it: I just created custom Django management commands and have them run via a scheduler such as cron, or in my case Heroku Scheduler for my app.
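For reference, such a command is just a small class; a sketch (the app, command name, and scraping call are made up for illustration):

    # myapp/management/commands/scrape.py
    from django.core.management.base import BaseCommand

    class Command(BaseCommand):
        help = "Scrape the target site once"

        def handle(self, *args, **options):
            # placeholder for the real scraping logic
            self.stdout.write("scrape finished")

cron (or Heroku Scheduler) then just runs python manage.py scrape on whatever interval you need.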
I am writing a Gevent/Flask server in Python. Some of the requests my Flask app handles need to kick off work that runs in the background; there is an endpoint for the client to poll the server for the task's result.
If you search the wisdom of the Internet for the best way to do this, everybody seems to be in favor of setting up one or several worker processes such as Celery or RQ, with a message queue or store such as RabbitMQ or Redis.
My app is small and my deployment is modest. This seems like too much of a hassle for me. I already have cooperative multitasking with Gevent, so I thought I'd just create a greenlet to do the background work in-process, that is, within the Flask app process itself.
This is not the mainstream solution, so my question is: Am I missing something? What am I missing? Is there something in this solution that makes it particularly bad?
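Concretely, what I have in mind is roughly this (a minimal sketch; the dict-based task store only works because everything lives in one process):

    import uuid
    import gevent
    from flask import Flask, jsonify

    app = Flask(__name__)
    results = {}  # task_id -> result, held in this one process

    def do_work(task_id):
        gevent.sleep(5)                  # placeholder for the real background work
        results[task_id] = "done"

    @app.route('/start')
    def start():
        task_id = str(uuid.uuid4())
        results[task_id] = None
        gevent.spawn(do_work, task_id)   # run in a greenlet, return immediately
        return jsonify(task_id=task_id)

    @app.route('/poll/<task_id>')
    def poll(task_id):
        return jsonify(result=results.get(task_id))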
The title basically says it all. I have gunicorn running my app with 5 workers. I have a data structure that all the workers need access to and that is being updated on a schedule by APScheduler. Currently APScheduler is run once per worker, but I just want it run once, period. Is there a way to do this? I've tried the --preload option, which lets me load the shared data structure just once, but doesn't seem to let all the workers see it when it updates. I'm open to switching to uWSGI if that helps.
I'm not aware of any way to do this with either, at least not without some sort of RPC, that is, running APScheduler in a separate process and then connecting to it from each worker. You may want to look at projects like RPyC and Execnet for that.
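A very rough sketch of that idea with RPyC (the port, service, and shared structure are placeholders; the scheduler process owns the data and each gunicorn worker connects to it over RPC):

    # scheduler_service.py - separate process: APScheduler + an RPyC server
    import time
    import rpyc
    from rpyc.utils.server import ThreadedServer
    from apscheduler.schedulers.background import BackgroundScheduler

    shared_data = {}

    def refresh():
        shared_data['updated_at'] = time.time()   # placeholder for the real update

    class DataService(rpyc.Service):
        def exposed_get_data(self):
            return dict(shared_data)

    scheduler = BackgroundScheduler()
    scheduler.add_job(refresh, 'interval', minutes=5)
    scheduler.start()
    ThreadedServer(DataService, port=18861).start()

    # in each gunicorn worker:
    #   conn = rpyc.connect('localhost', 18861)
    #   data = conn.root.get_data()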
I want to send tasks from a web server (running Django) to a remote machine that hosts a RabbitMQ server and some workers that I implemented with Celery.
If I follow the Celery way of doing it, it seems I have to share the code between both machines, which means replicating the workers' logic code in the web app code.
So:
Is there a best practice for doing this? Since the code would be redundant, I am thinking about using a git submodule (=> replicated in the web app code repo and in the workers code repo).
Should I better use something else than Celery then?
Am I missing something?
One way to manage this is to store your workers in your Django project. Django and Celery play nicely with each other, allowing you to use parts of your Django project in your Celery app: http://celery.readthedocs.org/en/latest/django/first-steps-with-django.html
Deploying this would mean that your web application would not use the modules involved with your celery workers, and on your celery machine your django views and such would never be used. This usually only results in a couple of megs of unused django application code...
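The wiring from that guide boils down to something like this (the project name proj is a placeholder, and the details vary a bit between Celery versions):

    # proj/celery.py
    import os
    from celery import Celery

    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'proj.settings')

    app = Celery('proj')
    app.config_from_object('django.conf:settings')  # read CELERY_* settings from Django
    app.autodiscover_tasks()                        # pick up tasks.py from installed apps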
You can use send_task. It takes the same parameters as apply_async, but you only have to give the task name. Without loading the task module in Django, you can send tasks:
app.send_task('tasks.add', args=[2, 2], kwargs={})
http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.send_task
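So on the Django side you only need a Celery app pointed at the remote broker; no task code has to be shared (the broker URL below is a placeholder):

    from celery import Celery

    # only the broker address is needed here; the task implementations
    # live on the worker machine
    app = Celery(broker='amqp://user:password@rabbitmq-host//')

    result = app.send_task('tasks.add', args=[2, 2], kwargs={})
    # result.get() would also work, but only if a result backend is configured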
I've been able to deploy a test application using Pyramid with pserve and running pceleryd (I just send an email without blocking while it is sent).
But there's one point I don't understand: I want to run my application with mod_wsgi, and I don't know whether I can do that without having to run pceleryd from a shell, or whether I can do something in the virtualhost configuration instead.
Is it possible? How?
There are technically ways you could use Apache/mod_wsgi to manage a process distinct from that handling web requests, but the pain point is that Celery will want to fork off further worker processes. Forking further processes from a process managed by Apache can cause problems at times and so is not recommended.
You are thus better off starting the Celery process separately. One option is to use supervisord to start it up and manage it.
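A supervisord program entry for it could look something like this (all paths and the ini file are placeholders; the command is simply whatever you currently type in the shell to start pceleryd):

    [program:pceleryd]
    command=/path/to/venv/bin/pceleryd /path/to/app/production.ini
    directory=/path/to/app
    autostart=true
    autorestart=true
    redirect_stderr=true
    stdout_logfile=/var/log/pceleryd.log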