How to continuously read data from Xively in a (Python) Heroku app?

I am trying to write a Heroku app in Python which will read and store data from a Xively feed in real time. I want the app to run independently as a sort of 'backend process' that simply stores the data in a database. (It does not need to serve anything to site visitors.)
Right now I am working on the 'continuous reading' part. I have included my code below. It simply reads the datastream once each time I hit my app's Heroku URL. How do I get it to operate continuously, so that it keeps on reading the data from Xively?
import os
from flask import Flask
import xively

app = Flask(__name__)

@app.route('/')
def run_xively_script():
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    return "level is %s" % level
I am new to web development, Heroku, and Python... I would really appreciate any help (pointers).
PS:
I have read about the Heroku Scheduler, and from what I understand it can be used to schedule a task at specific time intervals; when it does so, it starts a one-off dyno for the task. But as I mentioned, my app is really meant to perform just one function: continuously reading and storing data from Xively. Is it necessary to schedule a separate task for that? Also, the one-off dyno that the Scheduler starts will consume dyno hours too, which I think will exceed the free 750 dyno-hour limit (my app's web dyno is already consuming 720 dyno-hours per month)...

Using the Scheduler, as you and @Calumb have suggested, is one way to go about this.
Another method would be to set up a trigger on Xively. https://xively.com/dev/docs/api/metadata/triggers/
Have the trigger fire when your feed is updated. The trigger should POST to your Flask app, and the Flask app can then take the new data, manipulate it, and store it as you wish. This would be the closest to real time, I'd think, because Xively is pushing the update to your system.
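For illustration, here is a minimal sketch of the receiving side's parsing logic, with the Flask wiring omitted. Note that the payload shape used below (a triggering_datastream object carrying an id and a value) is an assumption to be checked against the Xively trigger docs:

```python
import json

def handle_trigger(payload_text):
    # Pull the datastream id and current value out of a trigger POST body.
    # NOTE: the field names here are assumptions; verify them against the
    # Xively trigger documentation before relying on them.
    payload = json.loads(payload_text)
    stream = payload["triggering_datastream"]
    return stream["id"], stream["value"]["value"]

# Hypothetical example body, for demonstration only:
example = json.dumps(
    {"triggering_datastream": {"id": "level", "value": {"value": "42.0"}}}
)
stream_id, value = handle_trigger(example)
```

Inside a real Flask route you would call handle_trigger(request.data) and then write the value to your database.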

This question is more about high level architecture decisions and what you are trying to accomplish than a specific thing you should do.
Ultimately, Flask is probably not the best choice for an app that does what you are trying to do; a plain Python (or Ruby) worker process would be a better fit. That said, the Heroku Scheduler (which you alluded to) makes something like it possible.
The simplest way to accomplish your goal (assuming that you want to change a minimal amount of code, and that constantly reading data is really what you want to do; both are worth reconsidering) is to write a loop that runs when the task is invoked and grabs data for a few seconds. Just use a for loop and increment a counter for however many times you want to get the data.
Something like:
import time

for i in range(5):
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    # store `level` in your database here
    time.sleep(1)
However, Heroku limits how long a request can run before it must return a response; past that limit the router gives up and returns a 500 or 503. You could, though, use the Scheduler to run this loop every certain amount of time.
Again, I think Flask on Heroku is not the best solution for what it sounds like you are trying to do. I would review your use case and go back to the drawing board on the best method to accomplish it.
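For completeness: on Heroku a truly continuous loop normally lives in a worker dyno (a `worker: python worker.py` line in the Procfile) rather than behind a web route, which sidesteps the router timeout entirely. A minimal sketch of such a loop, with fetch_level and store_level as hypothetical stand-ins for the Xively read and the database write:

```python
import time

def poll(fetch_level, store_level, iterations, interval=1.0):
    # Read the current value and store it, `iterations` times, sleeping
    # in between. A real worker would use `while True`; a bounded count
    # keeps this sketch easy to test.
    for _ in range(iterations):
        store_level(fetch_level())
        time.sleep(interval)

# Demo with in-memory stand-ins for the Xively read and the DB write:
readings = iter([1.0, 2.0, 3.0])
stored = []
poll(lambda: next(readings), stored.append, iterations=3, interval=0.0)
```

Note a worker dyno runs continuously, so it consumes dyno hours around the clock; that trade-off is the same one raised in the question's PS.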

Related

Handling background processes in Kubernetes with Python

I'm working on a project which uses kubernetes to manage a collection of flask servers and stores its data in redis. I have to run a lot of background tasks which handle and process data, and also check on the progress of that data processing. I'd like to know if there are frameworks or guides on how to do this optimally as my current setup leaves me feeling like it's suboptimal.
Here's basically how I have it set up now:
def process_data(data):
    # do processing
    return processed

def run_processor(data_key):
    # Skip if already processed, or if another pod is processing it
    if redis_client.exists(f"{data_key}_processed", f"{data_key}_processing") > 0:
        return
    redis_client.set(f"{data_key}_processing", 1)
    data = redis_client.get(data_key)
    processed = process_data(data)
    redis_client.set(f"{data_key}_processed", processed)
    redis_client.delete(f"{data_key}_processing")

@app.route("/start/data/processing/endpoint")
def handle_request():
    Thread(target=run_processor, args=(data_key,)).start()
    return jsonify(successful=True)
The idea is that I can call the handle_request endpoint as many times as I want and it will only run if the data is not processed and there isn't any other process already running, regardless of which pod is running it. One flaw I've already noticed is that the process could fail and leave f'{data_key}_processing' in place. I could fix that by adding and refreshing a timeout, but it feels hacky to me. Additionally, I don't have a good way to "check in" on a process which is currently running.
If there are any useful resources or even just terms I could google the help would be much obliged.
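Not a framework, but one term worth googling is "distributed lock". A common fix for both flaws mentioned above is to take the lock atomically with an expiry - in Redis, SET key 1 NX EX ttl - so the check-then-set race disappears and a crashed pod's lock expires on its own. A sketch of the pattern, using a plain dict as a stand-in for Redis so it runs anywhere (the try_lock helper is hypothetical):

```python
import time

def try_lock(store, key, ttl_seconds, now=time.time):
    # Acquire `key` only if it is absent or expired; this mirrors the
    # atomic Redis command SET key 1 NX EX ttl. `store` is a plain dict
    # (key -> expiry timestamp) standing in for Redis.
    current = now()
    expiry = store.get(key)
    if expiry is not None and expiry > current:
        return False  # another worker holds a live lock
    store[key] = current + ttl_seconds
    return True

locks = {}
first = try_lock(locks, "data1_processing", ttl_seconds=60)
second = try_lock(locks, "data1_processing", ttl_seconds=60)
```

With redis-py the real call is redis_client.set(f"{data_key}_processing", 1, nx=True, ex=60), which returns True only for the caller that actually set the key, so only one pod proceeds.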

Improving the user experience of a slow Flask view

I have an anchor tag that hits a route which generates a report in a new tab. I am lazy-loading the report specs because I don't want to have copies of my data both in the original place and on the report object. But collecting that data takes 10-20 seconds.
from flask import render_template

@app.route('/report/')
@app.route('/report/<id>')
def report(id=None):
    report_specs = function_that_takes_20_seconds(id)
    return render_template('report.html', report_specs=report_specs)
I'm wondering what I can do so that the server responds immediately with a spinner and then when function_that_takes_20_seconds is done, load the report.
You are right: an HTTP view is not the place for long-running tasks.
You need to think about your system architecture: what you can prepare outside the view, and what computation must happen in real time inside it.
The usual solution is to add asynchronous processing and do the work in a separate process. People often use task queues like Celery for this.
Prepare the data in a scheduled process which runs e.g. every 10 minutes
Cache the results by storing them in a database
Have the HTTP view always return the last cached version
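Stripped of any framework, the cache-and-refresh pattern looks like this; compute_report stands in for your function_that_takes_20_seconds, and refresh is what the scheduled job would call:

```python
cache = {}

def compute_report(report_id):
    # Stand-in for the slow function_that_takes_20_seconds.
    return {"id": report_id, "rows": [1, 2, 3]}

def refresh(report_id):
    # Run by the scheduled job: recompute and store the latest result.
    cache[report_id] = compute_report(report_id)

def report_view(report_id):
    # The HTTP view: return the last cached version immediately
    # (None until the first scheduled refresh has run).
    return cache.get(report_id)

refresh("abc")
```

In a real deployment the cache would live in a database or Redis rather than a module-level dict, so it survives restarts and is shared between processes.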
Alternatively, make your view do an AJAX call via JavaScript ("the spinner approach"). This requires some basic JavaScript skills. However, it doesn't make the results appear any faster to the end user - it's just user-experience smoke and mirrors.

How can I run a script constantly in background of App Engine website?

I'm trying to use Google App Engine (Python) to make a simple web app. I want to maintain one number x in the datastore that models a random walk. I need a script running 24 hours a day that, every second, randomly chooses to either increment or decrement x (saving the change to the datastore). Users should be able to go to a url to see the current value of x.
I've thought of two ways to accomplish the constant script issue:
1) I can have an admin-access page that runs a continuous loop in javascript which, each second, makes an AJAX request to the server to update x. If I leave this page open on my computer 24 hours a day, this should work. The problem with this approach is that if my computer crashes then the script dies with it.
2) I can use a CRON job. But the interval between jobs cannot be smaller than 1 minute, so this doesn't really work.
It seems like there should be a simple way to just run a script constantly (that exists only server side) with Google App Engine.
I appreciate any advice. Thanks for your time!
Start a backend instance using Modules (either programmatically or by hitting a special URL accessible to admins only). Run the script for as long as the instance lives.
Note that an instance can die, just like your computer can crash. For this reason, you are probably better off with a Google Compute Engine instance (choose the smallest) than with an App Engine instance. Note that the Compute Engine instance will be many times cheaper.
Compute Engine instances can also fail, though it is much less likely. There are ways to create a fail-over implementation (when one instance is creating your random numbers while the other instance - which can run on some other platform - waits for the first one to fail), but this will obviously cost more.

Synchronizing data, want to track what has changed

I have a program that queries data from a database (MySQL) every minute.
while 1:
    self.alerteng.updateAndAnalyze()
    time.sleep(60)
but the data doesn't change frequently; maybe once an hour or once a day (it is changed by another C++ program).
I think the best way is to track the change: if a change happens, then I query and update my data.
Any advice?
It depends what you're doing, but SQLAlchemy's Events functionality might help you out.
It lets you run code whenever something happens in your database, i.e. after you insert a new row, or set a column value. I've used it in Flask apps to kick off notifications or other async processes.
http://docs.sqlalchemy.org/en/rel_0_7/orm/events.html#mapper-events
Here's toy code from a Flask app that'll run the kick_off_analysis() function whenever a new YourModel model is created in the database.
from sqlalchemy import event

@event.listens_for(YourModel, "after_insert")
def kick_off_analysis(mapper, connection, your_model):
    # do stuff here
    pass
Hope that helps you get started.
I don't know how expensive updateAndAnalyze() is, but I'm pretty sure it's like most SQL commands: not something you really want to poll.
You have a textbook case for the Observer pattern. You want MySQL to call something in your code whenever the data gets updated. I'm not positive of the exact mechanism, but there should be a way to set a trigger on the relevant tables so they can notify your code that the underlying data has changed. Then, instead of polling, you basically get "interrupted" with the knowledge that you need to do something. It will also eliminate the up-to-a-minute lag you're introducing, which will make whatever you're doing feel snappier.
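If a database trigger turns out to be impractical, a cheaper middle ground is to keep polling but poll something light - say SELECT MAX(updated_at) or a row count, assuming the C++ writer touches such a column - and only run the expensive updateAndAnalyze() when that marker moves. A sketch with a hypothetical get_version callable:

```python
def poll_on_change(get_version, on_change, last_version):
    # Run the expensive sync only when the cheap version marker has
    # moved; return the marker so the caller keeps it between polls.
    version = get_version()
    if version != last_version:
        on_change()
    return version

# Demo: the marker stays at 1 for two polls, then moves to 2,
# so the sync runs on the first and third polls only.
events = []
versions = iter([1, 1, 2])
last = None
for _ in range(3):
    last = poll_on_change(lambda: next(versions),
                          lambda: events.append("sync"), last)
```

The polling loop stays at once a minute, but the heavy query now runs roughly once an hour, matching how often the data actually changes.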

GAE Backend fails to respond to start request

This is probably a truly basic thing that I'm simply having an odd time figuring out in a Python 2.5 app.
I have a process that will take roughly an hour to complete, so I made a backend. To that end, I have a backend.yaml that has something like the following:
- name: mybackend
  options: dynamic
  start: /path/to/script.py
(The script is just raw computation. There's no notion of an active web session anywhere.)
On toy data, this works just fine.
This used to be public, so I would navigate to the page, the script would start, and it would time out after about a minute (the HTTP deadline plus the 30s shutdown grace period, I assume). I figured this was a browser issue, so I repeated the same thing with a cron job. No dice. I then switched to using a push queue and adding a targeted task, since on paper it looks like it would wait for 10 minutes. Same thing.
All 3 time out after that minute, which means I'm not decoupling the request from the backend like I believe I am.
I'm assuming that I need to write a proper handler for the backend to do the work, but I don't exactly know how to write the handler/webapp2 route. Do I handle _ah/start/ or make a new endpoint for the backend? How do I handle the subdomain? It still seems like the wrong thing to do (I'm sticking a long process directly into a request of sorts), but I'm at a loss otherwise.
So the root cause ended up being doing the following in the script itself:
models = MyModel.all()
for model in models:
    # Magic happens
    pass
I was basically taking it for granted that the query would automatically batch my Query.all() over many entities, but it was dying at around the 1000th entity. I originally wrote that it was raw computation only because I completely ignored the fact that the reads can fail.
The actual solution for solving the problem we wanted ended up being "Use the map-reduce library", since we were trying to look at each model for analysis.
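For reference, when map-reduce doesn't fit, the usual fix for a query dying partway through is to fetch in bounded batches and carry a cursor between them (the old db API's query cursors provide exactly this). A framework-free sketch, with fetch_page(cursor, limit) as a hypothetical stand-in for the datastore call:

```python
def iterate_in_batches(fetch_page, limit=100):
    # Yield every entity by repeatedly fetching `limit`-sized pages.
    # `fetch_page(cursor, limit)` must return (entities, next_cursor),
    # with next_cursor=None on the last page - the same contract the
    # App Engine db cursors give you.
    cursor = None
    while True:
        entities, cursor = fetch_page(cursor, limit)
        for entity in entities:
            yield entity
        if cursor is None:
            return

# Demo over an in-memory "datastore" of 250 items, pages of 100:
data = list(range(250))

def fetch_page(cursor, limit):
    start = cursor or 0
    end = start + limit
    return data[start:end], (end if end < len(data) else None)

seen = list(iterate_in_batches(fetch_page, limit=100))
```

Each batch stays under the per-query limits, and because the cursor is carried between batches, the loop can also be resumed after a failure instead of starting over.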
