Celery missing backend result - python

I tried using Celery for some I/O tasks for my project, but I reached a dead end and I would really appreciate your help with this if possible.
What I'm basically trying to achieve is to first run a group of the same task on multiple remote machines (e.g. copying some files to them) and after that run another group of a different task type (e.g. installing a Python module on the machines).
I've tried implementing this stuff like this:
final_job = chain(group(copy_files_job) | HandleResults().s(), group(install_module_job) | HandleResults().s())
result = final_job.delay()
What I would also like is to report the result of each group of tasks back to a web interface. I'm not entirely sure this is the correct way of doing what I want with Celery.
But running this returns a NotImplementedError: Starting chords requires a result backend to be configured. That isn't actually true, though, because I use Redis as both broker and result backend, and it works fine if I don't add the second group with its handle-results task (group(install_module_job) | HandleResults().s()).
So it's clear that the error message isn't the right one. I guess Celery is trying to tell me that I configured final_job the wrong way, but I really don't know how else I can write what I'm trying to achieve.
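For reference, a minimal sketch of one way to write the two stages as explicit chords and chain them. The task names (copy_files, install_module, handle_results), the hosts list, and the stage keyword are placeholders; it assumes all tasks are registered on the same Celery app that has the Redis result backend configured, and the second chord's header uses immutable signatures so it ignores the first chord's return value. Depending on the Celery version, chaining chords directly can still be fragile.

from celery import chain, chord

hosts = ["host1", "host2"]  # placeholder list of remote machines

copy_stage = chord(
    (copy_files.s(host) for host in hosts),       # header: one copy task per machine
    handle_results.s(stage="copy"),               # body: runs once all copies finish
)
install_stage = chord(
    (install_module.si(host) for host in hosts),  # .si() ignores the previous stage's result
    handle_results.s(stage="install"),
)

final_job = chain(copy_stage, install_stage)
result = final_job.delay()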

Related

Handling background processes in kubernetes with python

I'm working on a project which uses Kubernetes to manage a collection of Flask servers and stores its data in Redis. I have to run a lot of background tasks which handle and process data, and also check on the progress of that processing. I'd like to know if there are frameworks or guides on how to do this well, because my current setup feels suboptimal.
Here's basically how I have it set up now:
from threading import Thread

from flask import Flask, jsonify, request
import redis

app = Flask(__name__)
redis_client = redis.Redis()

def process_data(data):
    # do processing
    return processed

def run_processor(data_key):
    # skip if the data is already processed or another pod is processing it
    if redis_client.exists(f"{data_key}_processed", f"{data_key}_processing") > 0:
        return
    redis_client.set(f"{data_key}_processing", 1)
    data = redis_client.get(data_key)
    processed = process_data(data)
    redis_client.set(f"{data_key}_processed", processed)
    redis_client.delete(f"{data_key}_processing")

@app.route("/start/data/processing/endpoint")
def handle_request():
    data_key = request.args["data_key"]  # however the key is obtained in the real app
    Thread(target=run_processor, args=(data_key,)).start()
    return jsonify(successful=True)
The idea is that I can call the handle_request endpoint as many times as I want and it will only run if the data is not processed and there isn't any other process already running, regardless of which pod is running it. One flaw I've already noticed is that the process could fail and leave f'{data_key}_processing' in place. I could fix that by adding and refreshing a timeout, but it feels hacky to me. Additionally, I don't have a good way to "check in" on a process which is currently running.
If there are any useful resources or even just terms I could google the help would be much obliged.
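One way to address both the stale _processing flag and the "check in" concern without pulling in a full framework is to take the flag atomically with Redis SET NX EX, so the lock expires on its own if a worker dies mid-run, and to publish a status key other pods can read. A rough sketch, reusing the key names and process_data from the question; the timeout and status values are placeholders:

import redis

redis_client = redis.Redis()

def run_processor(data_key, lock_timeout=300):
    if redis_client.exists(f"{data_key}_processed"):
        return  # already done
    # nx=True: only set if the key does not already exist (atomic "lock");
    # ex=lock_timeout: Redis deletes the flag itself if the worker crashes.
    if not redis_client.set(f"{data_key}_processing", 1, nx=True, ex=lock_timeout):
        return  # another pod is already working on this key
    try:
        redis_client.set(f"{data_key}_status", "running", ex=lock_timeout)  # lets other pods check in
        data = redis_client.get(data_key)
        processed = process_data(data)
        redis_client.set(f"{data_key}_processed", processed)
        redis_client.set(f"{data_key}_status", "done")
    finally:
        redis_client.delete(f"{data_key}_processing")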

Django background infinite loop process management

I'm having trouble trying to figure out the best way to run a background process with Django with specific requirements.
What I would like to be able to do:
Process(es) run(s) on an infinite loop once started (will need 2 background processes, no more, no less)
Start/Stop/Get_Status of each process
Able to access the Postgres DB (rules out the subprocess module, I think)
Even when no users have accessed the website, the process continues to run in the background if started.
Edit:
When the task that I need to run starts, it has to initialize itself with DB information in order to gather what it needs. After initialization, it compares new information with its prior results in order to get a delta value. Unfortunately, re-initializing each time the task runs defeats this purpose, so it must run in a continuous loop unless intentionally stopped by the user.
Options I have considered but haven't been able to find reliable documentation on how I can do what I want to be able to do:
Celery
RQ
django-background-task
My requirements.txt in virtualenv (currently trying to get celery working):
amqp==1.4.7
anyjson==0.3.3
billiard==3.3.0.21
celery==3.1.19
Django==1.8.6
django-crispy-forms==1.5.2
kombu==3.0.29
psycopg2==2.6.1
pytz==2015.7
redis==2.10.5
requests==2.8.1
uWSGI==2.0.11.2
wheel==0.24.0
If I didn't supply enough information about my problem, I apologize in advance (this is my first time posting).
I think Celery is just what you need. You can take a look at periodic tasks for the background work.
Also, it's very easy to start using Celery with Django. You can start learning it here.
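For what it's worth, a minimal sketch of what a periodic task could look like with the Celery 3.1 / django-celery versions in the requirements above. Every name here (poll_for_changes, the helper functions, the schedule key) is a placeholder, and prior results are persisted between runs so the task doesn't have to fully re-initialize each time:

# myapp/tasks.py
from celery import shared_task

@shared_task
def poll_for_changes():
    previous = load_previous_snapshot()   # prior results stored in Postgres
    current = gather_current_state()      # re-query the DB for new information
    delta = compute_delta(previous, current)
    store_snapshot(current, delta)        # persist for the next run

# settings.py -- run it every 30 seconds via celery beat
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    "poll-for-changes": {
        "task": "myapp.tasks.poll_for_changes",
        "schedule": timedelta(seconds=30),
    },
}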

How to continuously read data from xively in a (python) heroku app?

I am trying to write a Heroku app in Python which will read and store data from a Xively feed in real time. I want the app to run independently as a sort of 'backend process' that simply stores the data in a database. (It does not need to 'serve up' anything to site visitors.)
Right now I am working on the 'continuous reading' part. I have included my code below. It simply reads the datastream once, each time I hit my app's Heroku URL. How do I get it to operate continuously so that it keeps reading the data from Xively?
import os
from flask import Flask
import xively

app = Flask(__name__)

@app.route('/')
def run_xively_script():
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    return "level is %s" % (level)
I am new to web development, Heroku, and Python... I would really appreciate any help (pointers).
PS: I have read about the Heroku Scheduler, and from what I understand it can be used to schedule a task at specific time intervals, starting a one-off dyno for each run. But as I mentioned, my app is really meant to perform just one function: continuously reading and storing data from Xively. Is it necessary to schedule a separate task for that? And the one-off dyno that the scheduler starts will also consume dyno hours, which I think will exceed the free 750 dyno-hour limit (as my app's web dyno already consumes 720 dyno-hours per month)...
Using the scheduler, as you and @Calumb have suggested, is one method to go about this.
Another method would be for you to setup a trigger on Xively. https://xively.com/dev/docs/api/metadata/triggers/
Have the trigger fire when your feed is updated. The trigger should POST to your Flask app, and the Flask app can then take the new data, manipulate it, and store it as you wish. This would be the closest to real time, I'd think, because Xively pushes the update to your system.
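A rough sketch of what the receiving side of such a trigger could look like; the route name, the payload handling, and save_reading are placeholders, since the exact JSON Xively POSTs is described in the trigger docs linked above:

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/xively-trigger', methods=['POST'])
def xively_trigger():
    payload = request.get_json(force=True)  # the trigger's JSON body
    save_reading(payload)                   # placeholder: write the new value to your database
    return jsonify(ok=True)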
This question is more about high level architecture decisions and what you are trying to accomplish than a specific thing you should do.
Ultimately, Flask is probably not the best choice for an app that does what you are trying to do. You would be better off with just pure Python or pure Ruby. With that being said, using the Heroku scheduler (which you alluded to) makes it possible to do something like what you are trying to do.
The simplest way to accomplish your goal (assuming that you want to change a minimal amount of code and that constantly reading data is really what you want to do, both of which you should think carefully about) is to write a loop that runs when you call that task and grabs data for a few seconds. Just use a for loop and increment a counter for however many times you want to get the data.
Something like:
import time

for i in range(0, 5):
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    time.sleep(1)
However, Heroku limits how long a request can run before it must return a value; otherwise the router returns a 503 or 500. But you could use the scheduler to run this every so often.
Again, I think that Flask and Heroku are not the best solution for what it sounds like you are trying to do. I would review your use case and go back to the drawing board on the best method to accomplish it.
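As a sketch of that "pure Python" route: run an infinite polling loop as a separate worker dyno (declared in the Procfile as something like worker: python poll_xively.py) instead of inside the web process. save_to_database and the poll interval are placeholders, and worker dynos still consume dyno hours:

# poll_xively.py
import time
import xively

client = xively.XivelyAPIClient('FEED_KEY')

while True:
    feed = client.feeds.get('FEED_ID')
    level = feed.datastreams.get("level").current_value
    save_to_database(level)  # placeholder for your own storage code
    time.sleep(10)           # poll interval; tune to the feed's update rate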

GAE Backend fails to respond to start request

This is probably a truly basic thing that I'm simply having an odd time figuring out in a Python 2.5 app.
I have a process that will take roughly an hour to complete, so I made a backend. To that end, I have a backend.yaml that has something like the following:
- name: mybackend
  options: dynamic
  start: /path/to/script.py
(The script is just raw computation. There's no notion of an active web session anywhere.)
On toy data, this works just fine.
This used to be public, so I would navigate to the page, the script would start, and it would time out after about a minute (the HTTP timeout plus the 30 s shutdown grace period, I assume). I figured this was a browser issue, so I repeated the same thing with a cron job. No dice. I then switched to using a push queue and adding a targeted task, since on paper it looks like it would wait for 10 minutes. Same thing.
All three time out after that minute, which means I'm not decoupling the request from the backend like I believe I am.
I'm assuming that I need to write a proper handler for the backend to do the work, but I don't exactly know how to write that handler/webapp2 route. Do I handle /_ah/start or make a new endpoint for the backend? How do I handle the subdomain? It still seems like the wrong thing to do (I'm sticking a long process directly into a request of sorts), but I'm at a loss otherwise.
So the root cause ended up being doing the following in the script itself:
models = MyModel.all()
for model in models:
    pass  # magic happens
I was basically taking for granted that the query would automatically batch my Query.all() over many entities, but it was dying at around the 1000th entry. I originally wrote that it was just computation because I completely ignored the fact that the reads can fail.
The actual solution for solving the problem we wanted ended up being "Use the map-reduce library", since we were trying to look at each model for analysis.
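For jobs smaller than what the mapreduce library is meant for, here is a sketch of cursor-based batching with the old db API, which avoids relying on a single Query.all() iteration surviving the whole dataset; batch_size and the loop body are placeholders:

from google.appengine.ext import db

def process_all(batch_size=500):
    query = MyModel.all()
    cursor = None
    while True:
        if cursor:
            query.with_cursor(cursor)  # resume where the previous batch ended
        batch = query.fetch(batch_size)
        if not batch:
            break
        for model in batch:
            pass                       # magic happens, one batch at a time
        cursor = query.cursor()        # remember the position for the next loop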

Celery, Django.. making a Task / thread launch sub-task / threads?

I'm using Celery with Django and am trying to get a Task, like the one below, working:
from celery.task import Task

class task1(Task):
    def run(self):
        launch_some_other_task.delay()
But it doesn't seem to be working. I can go into more detail about my code, but I figured I would first ask whether this sort of thing should work at all, since it isn't working for me. I find it necessary because I'm using Selenium, a web testing framework, which sometimes just hangs without giving me any output, so I want to be able to kill the task off if a certain condition isn't met (a memcache variable being updated with a certain value within a specified number of seconds).
Thanks for any advice on this
Make sure you've added the following to your urls.py:
import djcelery
djcelery.setup_loader()
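Separately, since the goal is to kill a hung Selenium run after a certain number of seconds, here is a sketch of the same pair of tasks using a soft time limit; the decorator style, the 60-second value, and the helper names are placeholders, and soft time limits need a Unix worker:

from celery.task import task
from celery.exceptions import SoftTimeLimitExceeded

@task
def launch_some_other_task():
    pass  # the subtask body goes here

@task(soft_time_limit=60)              # raises SoftTimeLimitExceeded in the task after 60 s
def task1():
    try:
        launch_some_other_task.delay() # queue the subtask from inside a running task
        run_selenium_checks()          # placeholder for the work that can hang
    except SoftTimeLimitExceeded:
        cleanup()                      # placeholder: tear down the browser, etc.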
