Synchronizing data, want to track what has changed - Python

I have a program that queries data from a MySQL database every minute:

while 1:
    self.alerteng.updateAndAnalyze()
    time.sleep(60)

but the data doesn't change frequently; maybe once an hour or once a day (it is changed by another C++ program). I think the best way is to track the changes: if a change happens, then query and update my data.
Any advice?

It depends what you're doing, but SQLAlchemy's Events functionality might help you out.
It lets you run code whenever something happens in your database, e.g. after you insert a new row or set a column value. I've used it in Flask apps to kick off notifications or other async processes.
http://docs.sqlalchemy.org/en/rel_0_7/orm/events.html#mapper-events
Here's toy code from a Flask app that'll run the kick_off_analysis() function whenever a new YourModel row is inserted into the database:
from sqlalchemy import event

@event.listens_for(YourModel, "after_insert")
def kick_off_analysis(mapper, connection, your_model):
    # do stuff here, e.g. queue the analysis job
    pass
Hope that helps you get started.

I don't know how expensive updateAndAnalyze() is, but I'm pretty sure it's like most SQL commands: not something you really want to poll.
You have a textbook case for the Observer Pattern. You want MySQL to call something, somehow, in your code whenever it gets updated. I'm not positive of the exact mechanism for doing this, but there should be a way to set a trigger on your relevant tables that notifies your code that the underlying table has been updated. Then, instead of polling, you basically get "interrupted" with the knowledge that you need to do something. It will also eliminate that up-to-a-minute lag you're introducing, which will make whatever you're doing feel snappier.
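To make the "interrupt" side concrete, here is a purely illustrative sketch (the UDP mechanism and port are assumptions, not something MySQL gives you out of the box): the loop blocks until a notification arrives, instead of sleeping for 60 seconds:

import socket

# block until a notification datagram arrives, then re-query
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 9999))  # hypothetical local notification port

while True:
    data, addr = sock.recvfrom(1024)  # blocks until someone pokes us
    alerteng.updateAndAnalyze()       # re-query only when notified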

Related

Best way to constantly check for scheduled events on a website

So I am making a website, and part of its security requires a waiting period when trying to do something, for example deleting something; this would help in case someone's account was stolen and the thief tried to ruin it.
I'm already using SQLite so I'm going to create a table in there where scheduled events will be defined.
What I'm wondering is: what is the best way to constantly check these scheduled events? It may also be important to note that I want to check at least every hour. My immediate thought was creating a separate thread and running a function on it with a while loop that repeatedly runs a chunk of code, with a time.sleep(3600) at the end, like this:
def check_events(self):
    while True:
        # code
        time.sleep(3600)
I'm not sure, though, if this is the most efficient way of doing it.
That function currently sits inside my website's class, hence the self; is that something I need to move outside, or not?
I would either create a cron job on your server (which is the most straightforward), or use the schedule module to schedule your task; see this example:
import time

import schedule
from sharepoint_cleaner import main as cleaner
from sharepoint_uploader import main as uploader
from transfer_statistics import main as transfer_stats

schedule.every(1).hours.do(uploader)
schedule.every(1).hours.do(transfer_stats)
schedule.every().sunday.do(cleaner)

while True:
    schedule.run_pending()
    time.sleep(10)
https://github.com/ansys/automatic-installer/blob/4d59573f8623c838aadfd49c312eeaca964c6601/sharepoint/scheduler.py#L3

Is it possible to trigger a script or program if any data is updated in a database, like MySQL?

It doesn't have to be exactly a trigger inside the database. I just want to know how I should design this so that, when changes are made inside MySQL or SQL Server, some script can be triggered.
One way would be to keep a counter for the last updated row in the database, and then keep polling (checking) the database from Python for new records at short intervals.
If the value of the counter has increased, you could use the subprocess module to call another Python script.
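A rough sketch of that approach, assuming a running id column on the watched table and the mysql.connector driver (the table name, credentials, and driver are all assumptions):

import subprocess
import time

import mysql.connector  # assumed driver; any DB-API module would do

last_seen = 0

while True:
    conn = mysql.connector.connect(user="app", password="secret", database="mydb")
    cur = conn.cursor()
    cur.execute("SELECT MAX(id) FROM mytable")  # "mytable" is hypothetical
    (latest,) = cur.fetchone()
    conn.close()

    if latest is not None and latest > last_seen:
        last_seen = latest
        # hand the new high-water mark to another script
        subprocess.call(["python", "process_changes.py", str(latest)])

    time.sleep(5)  # short polling interval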
It's possible to execute an external script from a MySQL trigger, but I've never used it and I don't know the implications of doing so.
MySQL provides a way to implement your own functions, called User Defined Functions (UDFs). With these you can define your own functions and call them from MySQL events. You need to write your logic in a C program following the interface provided by MySQL.
Fortunately, someone already wrote a library for calling an external program from MySQL: LIB_MYSQLUDF_SYS. After installing it, the following trigger should work:
CREATE TRIGGER Test_Trigger
AFTER INSERT ON MyTable
FOR EACH ROW
BEGIN
    DECLARE cmd CHAR(255);
    DECLARE result INT(10);
    SET cmd = CONCAT('/YOUR_SCRIPT');
    SET result = sys_exec(cmd);
END;

How to continuously read data from xively in a (python) heroku app?

I am trying to write a Heroku app in Python which will read and store data from a Xively feed in real time. I want the app to run independently, as a sort of 'backend process', simply storing the data in a database. (It does not need to 'serve up' anything to site visitors.)
Right now I am working on the 'continuous reading' part. I have included my code below. It simply reads the datastream once, each time I hit my app's Heroku URL. How do I get it to operate continuously so that it keeps on reading the data from Xively?
import os

from flask import Flask
import xively

app = Flask(__name__)

@app.route('/')
def run_xively_script():
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    return "level is %s" % (level,)
I am new to web development, Heroku, and Python... I would really appreciate any help (pointers).
PS: I have read about the Heroku Scheduler, and from what I understand it can be used to schedule a task at specific time intervals; when it does so, it starts a one-off dyno for the task. But as I mentioned, my app is really meant to perform just one function: continuously reading and storing data from Xively. Is it necessary to schedule a separate task for that? And the one-off dyno that the scheduler starts will also consume dyno hours, which I think will exceed the free 750 dyno-hour limit (as my app's web dyno already consumes 720 dyno-hours per month)...
Using the scheduler, as you and @Calumb have suggested, is one method to go about this.
Another method would be for you to set up a trigger on Xively: https://xively.com/dev/docs/api/metadata/triggers/
Have the trigger fire when your feed is updated. The trigger should POST to your Flask app, and the Flask app can then take the new data, manipulate it, and store it as you wish. This would be the closest to real time, I'd think, because Xively is pushing the update to your system.
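A minimal sketch of the receiving end, assuming the trigger POSTs JSON to a /xively-update endpoint (the route name and payload shape are assumptions):

import json

from flask import Flask, request

app = Flask(__name__)

@app.route('/xively-update', methods=['POST'])
def xively_update():
    payload = request.get_json(force=True)  # trigger body assumed to be JSON
    # store/manipulate the new datapoint here, e.g. write it to your database
    print("got update: %s" % json.dumps(payload))
    return "", 204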
This question is more about high-level architecture decisions and what you are trying to accomplish than about a specific thing you should do.
Ultimately, Flask is probably not the best choice for an app like this. You would be better off with plain Python or plain Ruby. That said, using the Heroku scheduler (which you alluded to) makes it possible to do something like what you are trying to do.
The simplest way to accomplish your goal (assuming that you want to change a minimal amount of code, and that constantly reading data is really what you want to do; you should reconsider both) is to write a loop that runs when you call that task and grabs data for a few seconds. Just use a for loop and increment a counter for however many times you want to get the data.
Something like:
import time

import xively

for i in range(0, 5):
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    time.sleep(1)
However, Heroku limits how long a request can run before it must return a value; otherwise the router returns a 503 or 500. But you could use the scheduler to run this every certain amount of time.
Again, I think that Flask and Heroku are not the best solution for what it sounds like you are trying to do. I would review your use case and go back to the drawing board on the best method to accomplish it.

GAE Backend fails to respond to start request

This is probably a truly basic thing that I'm simply having an odd time figuring out in a Python 2.5 app.
I have a process that will take roughly an hour to complete, so I made a backend. To that end, I have a backend.yaml that has something like the following:
- name: mybackend
  options: dynamic
  start: /path/to/script.py
(The script is just raw computation. There's no notion of an active web session anywhere.)
On toy data, this works just fine.
This used to be public, so I would navigate to the page, the script would start, and it would time out after about a minute (the HTTP timeout plus the 30s shutdown grace period, I assume). I figured this was a browser issue, so I repeated the same thing with a cron job. No dice. I then switched to using a push queue and adding a targeted task, since on paper it looks like it would wait for 10 minutes. Same thing.
All 3 time out after that minute, which means I'm not decoupling the request from the backend like I believe I am.
I'm assuming that I need to write a proper handler for the backend to do the work, but I don't exactly know how to write the handler/webapp2 route. Do I handle /_ah/start/ or make a new endpoint for the backend? How do I handle the subdomain? It still seems like the wrong thing to do (I'm sticking a long process directly into a request of sorts), but I'm at a loss otherwise.
So the root cause ended up being doing the following in the script itself:
models = MyModel.all()
for model in models:
    # magic happens
    pass
I was basically taking it for granted that the query would automatically batch my Query.all() over many entities, but it was dying at around the 1000th entry or so. I originally wrote that it was computation only because I completely ignored the fact that the reads can fail.
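For reference, here is a sketch of explicit batching with query cursors on the old db API (MyModel is the question's model; the batch size and loop body are illustrative):

from google.appengine.ext import db  # old db API, as in the question

query = MyModel.all()
cursor = None

while True:
    if cursor:
        query.with_cursor(cursor)  # resume where the previous batch ended
    batch = query.fetch(100)       # bounded batch instead of a blind iterator
    if not batch:
        break
    for model in batch:
        pass  # magic happens, one bounded batch at a time
    cursor = query.cursor()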
The actual solution for the problem we wanted to solve ended up being "use the map-reduce library", since we were trying to look at each model for analysis.

python: how to get notifications for mysql database changes?

In Python, is there a way to get notified that a specific table in a MySQL database has changed?
It's theoretically possible but I wouldn't recommend it:
Essentially you have a trigger on the table that calls a UDF, which communicates with your Python app in some way.
Pitfalls include: what happens if there's an error?
What if it blocks? (Anything that happens inside a trigger should ideally be near-instant.)
What if it's inside a transaction that gets rolled back?
I'm sure there are many other problems I haven't thought of as well.
A better way if possible is to have your data access layer notify the rest of your app. If you're looking for when a program outside your control modifies the database, then you may be out of luck.
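As a bare-bones illustration of the data-access-layer idea (all names here are invented), wrap your writes and fire callbacks after each one:

class DataAccessLayer(object):
    """Hypothetical write wrapper that notifies listeners after each change."""

    def __init__(self, conn):
        self.conn = conn
        self.listeners = []  # callbacks fired after every successful write

    def execute_write(self, sql, params=()):
        cur = self.conn.cursor()
        cur.execute(sql, params)
        self.conn.commit()
        for listener in self.listeners:
            listener(sql, params)  # notify the rest of the app

def on_change(sql, params):
    print("database changed: %s" % sql)

# dal = DataAccessLayer(conn)
# dal.listeners.append(on_change)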
Another way that's less ideal, but IMO better than calling another program from within a trigger, is to have your triggers maintain some kind of "LastModified" table. Then your app just checks whether that datetime is greater than the one from its last check.
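The checking side of that approach can then be a few lines; the table and column names here are invented, and MySQLdb is just one possible driver:

import time

import MySQLdb  # assumed driver

last_seen = None

while True:
    conn = MySQLdb.connect(user="app", passwd="secret", db="mydb")
    cur = conn.cursor()
    cur.execute("SELECT MAX(modified_at) FROM last_modified")  # hypothetical table
    (stamp,) = cur.fetchone()
    conn.close()

    if stamp is not None and stamp != last_seen:
        last_seen = stamp
        print("something changed, refreshing data")

    time.sleep(10)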
If by "changed" you mean a row has been updated, deleted, or inserted, then there is a workaround:
You can create a trigger in MySQL.
DELIMITER $$

CREATE TRIGGER ai_tablename_each AFTER INSERT ON tablename FOR EACH ROW
BEGIN
    DECLARE exec_result INTEGER;
    SET exec_result = sys_exec(CONCAT('my_cmd '
                              ,'insert on table tablename '
                              ,',id=', NEW.id));
    IF exec_result = 0 THEN BEGIN
        INSERT INTO table_external_result (id, tablename, result)
        VALUES (null, 'tablename', 0);
    END; END IF;
END$$

DELIMITER ;
This will call the executable script my_cmd on the server, with some parameters (see sys_exec for more info).
my_cmd can be a Python program or anything else you can execute from the command line using the user account that MySQL runs under.
You'd have to create a trigger for every change (INSERT/UPDATE/DELETE) that you want your program to be notified of, and for each table.
You'd also need to find some way of linking your running Python program to the command-line utility that you call via sys_exec().
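One way to do that linking, sketched under the assumption that your long-running program listens on a local UDP port (the port, filename, and protocol are all made up for illustration):

# my_cmd.py - hypothetical helper invoked by sys_exec() from the trigger
import socket
import sys

# forward the trigger's arguments to the long-running program on localhost:9999
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(" ".join(sys.argv[1:]).encode(), ("127.0.0.1", 9999))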
Not recommended
This sort of behaviour is not recommended because it is likely to:
slow MySQL down;
make it hang/time out if my_cmd does not return;
notify you before the transaction ends, if you are using transactions;
and I'm not sure whether you'll get notified of a delete if the transaction rolls back.
It's an ugly design.
Links
sys_exec: http://www.mysqludf.org/lib_mysqludf_sys/index.php
Yes, though it may not be standard SQL: PostgreSQL supports this with LISTEN and NOTIFY; see the version 9.0 documentation:
http://www.postgresql.org/docs/9.0/static/sql-notify.html
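A minimal listener sketch using psycopg2 (the connection string and channel name are assumptions):

import select

import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb user=app")
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN table_changed;")  # channel name is hypothetical

while True:
    # wait up to 5 seconds for the server to send a notification
    if select.select([conn], [], [], 5) == ([], [], []):
        continue
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        print("got NOTIFY: %s %s" % (notify.channel, notify.payload))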
Not possible with standard SQL functionality.
It might not be a bad idea to try using a network monitor instead of a MySQL trigger, for example by extending a network monitor like this one:
http://sourceforge.net/projects/pynetmontool/
Then write a script that waits for activity on port 3306 (or whatever port your MySQL server listens on) and checks the database when the network activity meets certain filter conditions.
It's a very high-level idea that you'll have to research further, but you avoid the DB trigger problems and you won't have to write a cron job that runs every second.
