Using RQ with a Docker container hosted on Heroku - python

A client has a goal to use Heroku for a project, but I'm stuck on how to make it work.
Basically, I need to run a function that takes around 15 seconds and depends on a custom repo, scipy, and some other dependencies that are typically NOT hosted on Heroku. So I turned the app into a docker container and pushed it to Heroku. So far so good.
The results of that function need to be returned via an API response. So I planned to use RQ for a task queue, and set up a worker process. Since my app is already using Docker, I had to stick with another docker container for the worker.
I can submit the task via the main app, and the worker picks it up. However, the worker is in a separate docker container and cannot import the function. If I move the function entirely to the worker, I get the mirrored problem: I am unable to import the function into the main app when I call enqueue.
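For what it's worth, RQ's enqueue should also accept the function's dotted import path as a string instead of the callable itself, so only the worker container has to be able to import the real code. A minimal sketch under that assumption (the module name, function name, and Redis URL are placeholders):

import os
from redis import Redis
from rq import Queue

# Both containers point at the same Redis instance (URL is a placeholder).
redis_conn = Redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))
q = Queue(connection=redis_conn)

# Pass the function by dotted path; the web app never imports tasks.py,
# only the worker container needs it on its PYTHONPATH.
job = q.enqueue("tasks.run_analysis", 42, job_timeout=120)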
Does anybody have an idea how to resolve this? I feel like it is a complete mess right now.

Related

How can I properly kill a celery task in a kubernetes environment?

How can I properly kill celery tasks running on containers inside a kubernetes environment? The structure of the whole application (all written in Python) is as follows:
An SDK that makes requests to our API;
A Kubernetes structure with one pod running the API and other pods running celery containers to deal with some long-running tasks that can be triggered by the API. These celery containers autoscale.
Suppose we call an SDK method that in turn makes a request to the API, which triggers a task to be run on a celery container. What would be the correct/graceful way to kill this task if need be? I am aware that celery tasks have a revoke() method, but I tried this approach and it did not work, even using terminate=True and signal=signal.SIGKILL (maybe this has something to do with the fact that I am using Azure Service Bus as a broker?).
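For context, the revoke attempt presumably looked something like this (the Celery app import and the task id handling are placeholders; note that revoke travels over Celery's remote-control channel, so it depends on what the broker transport supports):

from celery.result import AsyncResult

from myproject.celery_app import app   # placeholder: your configured Celery app

def kill_task(task_id: str) -> None:
    # terminate=True asks the worker to SIGKILL the process running the task.
    AsyncResult(task_id, app=app).revoke(terminate=True, signal="SIGKILL")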
Perhaps a mapping between a celery task and its corresponding container name would help, but I could not find a way to get that information either.
Any help and/or ideas would be deeply appreciated.
The solution I found was to write to a file shared by both the API and the Celery containers. Whenever an interruption is captured, a flag is set to true in this file. Inside the celery containers I periodically check the contents of that file; if the flag is set to true, I gracefully clean things up and raise an error.
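A minimal sketch of that flag-file approach, assuming a volume mounted at /shared into both the API and the Celery containers (the broker URL, paths, and names are placeholders):

import os
import time

from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # placeholder broker

FLAG_PATH = "/shared/cancel.flag"   # written by the API when a stop is requested

class TaskCancelled(Exception):
    pass

def check_cancelled():
    # The API side creates this file (or writes "true" into it) on interruption;
    # the worker only ever reads it.
    if os.path.exists(FLAG_PATH):
        raise TaskCancelled("cancellation flag set by the API")

@app.task
def long_running_task():
    for _ in range(100):
        check_cancelled()   # periodic check, as described above
        time.sleep(1)       # stand-in for a chunk of real work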

Django asynchronous tasks locally

I have a web application that runs locally only (it does not run on a remote server). The web application is basically just a UI to adjust settings and see some information about the main application. A web UI was chosen over a native application for portability and ease of development. Now I want to be able to start and stop the main application through a button in the web application. However, I couldn't find a suitable way to start an asynchronous, managed task locally. I saw there is a library called celery, but it seems suited to a distributed environment, which mine is not.
My main need is to be able to start/stop the task, as well as to check whether the task is running (so I can display that in the UI). Is there any way to achieve this?
celery can work just fine locally. Distributed is just someone else's computer after all :)
You will have to install all the same requirements and the like. You can kick off workers by hand, or as a service, just like in the celery docs.
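As an illustration, a minimal local setup might look like this (the module name, task name, and Redis URLs are placeholders):

# tasks.py — lives on the same machine as the Django UI
from celery import Celery

app = Celery("local_app",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def main_application():
    ...  # the long-running work you want to start/stop from the web UI

# From a Django view:
#   result = main_application.delay()                  # start
#   result.state                                       # check ("PENDING", "SUCCESS", ...)
#   app.control.revoke(result.id, terminate=True)      # stop

Start a worker by hand with celery -A tasks worker, and the UI talks to it only through the broker.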

How do I Schedule timed events in python Flask?

I'm working with the flask framework, trying to schedule a job that will be triggered 30 min after launch and will run only once.
I tried to work with threading.Timer, but since my job makes a REST request I'm getting RuntimeError: 'working outside of request context', which I just couldn't solve.
From this thread, I understand that it is not recommended using the threading module on a flask server:
How do you schedule timed events in Flask?
So I'm looking for a solution for a one-off timed job (not one that runs on an interval).
It looks like APScheduler must be interval based.
I would be grateful for any help.
The apscheduler add_job method can take a date trigger that will allow you to do what you want.
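Something along these lines should do it (APScheduler 3.x assumed; the job function is a placeholder):

from datetime import datetime, timedelta

from apscheduler.schedulers.background import BackgroundScheduler

def fire_once():
    # Call your REST endpoint here; this runs in a plain thread,
    # outside of any Flask request context.
    ...

scheduler = BackgroundScheduler()
scheduler.start()

# The 'date' trigger runs the job exactly once at the given time.
scheduler.add_job(fire_once, trigger="date",
                  run_date=datetime.now() + timedelta(minutes=30))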
Pro tips:
If you use apscheduler inside your flask app process, then when going into production with a wsgi server like gunicorn or uwsgi you will end up with your job being run multiple times (once per flask worker).
When facing this issue the gunicorn --preload option didn't cut it for me.
So:
You can use flask-apscheduler with its REST server approach if that suits you.
Or separate the apscheduler into its own daemon and either
use uwsgi mules,
or keep gunicorn running only the web app and use supervisor (or an equivalent) to start the scheduler daemon.
IMHO separating gunicorn/flask and apscheduler into two parts, with supervisor in charge of the scheduler, is the cleanest yet not overly complex solution.
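For the daemon route, the scheduler process can be as small as this (the file name is a placeholder; supervisor just keeps it running):

# scheduler_daemon.py — started by supervisor, separate from gunicorn/flask
from apscheduler.schedulers.blocking import BlockingScheduler

sched = BlockingScheduler()

# register jobs here, e.g. sched.add_job(...)

sched.start()   # blocks forever, which is exactly what a daemon process wants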

Celery tasks functions - web server vs remote server

I want to send tasks from a web server (running Django) to a remote machine that hosts a RabbitMQ server and some workers that I implemented with Celery.
If I follow the standard Celery approach, it seems I have to share the code between both machines, which means replicating the workers' logic in the web app code.
So:
Is there a best practice for this? Since the code would be duplicated, I am thinking about using a git submodule (replicated in both the web app repo and the workers repo).
Should I better use something else than Celery then?
Am I missing something?
One way to manage this is to store your workers in your django project. Django and celery play nicely with each other, allowing you to use parts of your django project in your celery app. http://celery.readthedocs.org/en/latest/django/first-steps-with-django.html
Deploying this would mean that your web application would not use the modules involved with your celery workers, and on your celery machine your django views and such would never be used. This usually only results in a couple of megs of unused django application code...
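The layout from the linked first-steps guide boils down to roughly this (the project name proj is a placeholder):

# proj/celery.py — lives inside the Django project that both machines share
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "proj.settings")

app = Celery("proj")
# Read CELERY_* settings from Django's settings module.
app.config_from_object("django.conf:settings", namespace="CELERY")
# Find tasks.py modules in the installed Django apps.
app.autodiscover_tasks()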
You can use send_task. It takes the same parameters as apply_async, but you only have to give the task name. Without loading the task module in django you can send tasks:
app.send_task('tasks.add', args=[2, 2], kwargs={})
http://celery.readthedocs.org/en/latest/reference/celery.html#celery.Celery.send_task
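For completeness, the producer side only needs a Celery app pointed at the same broker; it never imports the workers' task modules (the broker URL is a placeholder):

from celery import Celery

app = Celery("producer", broker="amqp://guest@remote-rabbitmq//")

result = app.send_task("tasks.add", args=[2, 2], kwargs={})
print(result.id)   # getting the return value additionally requires a result backend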

What is the proper way to deploy the same application/code to Elastic Beanstalk server and worker environments?

So I have a web service (flask + MySQL + celery) and I'm trying to figure out the proper way to deploy it on Elastic Beanstalk into separate Web Server and Worker environments/tiers. I currently have it working by launching the worker (using this answer) on the same instance as the web server, but obviously I want to have the worker(s) running in a separately auto-scaled environment. Note that the celery tasks rely on the main server code (e.g. making queries, etc) so they cannot be separated. Essentially it's an app with two entry points.
The only way I can think to do this is by having the code/config-script examine some env variable (e.g. ENV_TYPE = "worker" or "server") to determine whether to launch the standard flask app, or the celery worker.
The other caveat here is that I would have to "eb deploy" my code to two separate environments (server and worker), when I'd like/expect them to be deployed simultaneously since both use the same code base.
Apologies if this has been asked before, but I've looked around a lot and couldn't find anything, which I find surprising since this seems like a common use case.
Edit: Just found this answer, which addresses my concern for deploying twice (I guess it's technically deploy once and then update two environments, easily scriptable). But my question regarding how to bootstrap the application into server vs worker mode still stands.
Regarding the bootstrapping, if you set up an environment variable for an Elastic Beanstalk environment (docs here), then you never have to touch it again when you re-deploy your code with your script. You only need to add the environment variable when you create a new environment.
Thus when starting up, you can just check in Python for that ENV variable and then bootstrap from there and load what you need.
My preference, instead of creating an enum by specifying "worker" or "server", is to make the env variable a boolean like ENV_WORKER=1 or something. It removes the possibility of typos and is easier to read.
import os

if os.environ.get('ENV_WORKER') is not None:
    # Bootstrap worker stuff here (e.g. start the celery worker entry point)
    ...
else:
    # Specific stuff for server here (e.g. create and run the flask app)
    ...
