I am using a Postgres DB and a Python script that should be notified of DB changes via the Postgres NOTIFY functionality.
In all the examples I can find on this topic, the trigger in Postgres is implemented with BEFORE, as in this example:
CREATE TRIGGER notify_on_changes
BEFORE UPDATE OR INSERT OR DELETE
ON table_bla_bla
FOR EACH ROW
EXECUTE PROCEDURE notify_changes();
What is the reason for using BEFORE and not AFTER? I do not want to change anything before inserting, updating, or deleting a row.
Shouldn't it be better to use AFTER?
AFTER triggers have to be queued up in memory for later execution, so they are less efficient.
BEFORE triggers carry the risk that some other BEFORE trigger will modify the row after you have seen it but before it is written.
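Either way, the Python side just LISTENs on whatever channel notify_changes() sends to. Here is a minimal sketch of the listener using psycopg2, assuming the trigger function calls pg_notify('table_changes', ...); the channel name and connection string are hypothetical:

import select
import psycopg2
import psycopg2.extensions

conn = psycopg2.connect("dbname=mydb user=me")  # hypothetical DSN
# The listening connection is typically put in autocommit mode so
# notifications are delivered immediately, not at transaction end.
conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)

cur = conn.cursor()
cur.execute("LISTEN table_changes;")  # channel assumed to match notify_changes()

while True:
    # Block (up to 5 s) until the connection's socket becomes readable.
    if select.select([conn], [], [], 5) == ([], [], []):
        continue  # timeout, poll again
    conn.poll()
    while conn.notifies:
        notify = conn.notifies.pop(0)
        print("change on", notify.channel, "payload:", notify.payload)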
Related
Here is a typical request:
I built a DAG which updates daily from 2020-01-01. It runs an INSERT SQL query using {execution_date} as a parameter. Now I need to update the query and rerun it for the past 6 months.
I found out that I have to pause the Airflow process, DELETE the historical data, INSERT manually, and then re-activate the Airflow process, because Airflow catch-up does not remove historical data when I clear a run.
I'm wondering if it's possible to script the clear part, so that every time I click a run and clear it from the UI, Airflow runs a clear script in the background.
After some thought, here is what I think is a viable solution:
Instead of only INSERTing data in the DAG, run a DELETE query followed by the INSERT query.
For example, if I want to INSERT for {execution_date} - 1 (yesterday), instead of creating a DAG that just runs the INSERT query, I should first run a DELETE query that removes yesterday's data, and then INSERT the data.
By using this DELETE-INSERT method, both of my scenarios work automatically:
If it's just a normal run (i.e. no data of yesterday has been inserted yet and this is the first run of this DAG for {execution_date}), the DELETE part does nothing and INSERT inserts the data properly.
If it's a re-run, the DELETE part will purge the data already inserted, and INSERT will insert the data based on the updated script. No duplication is created.
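Concretely, the task body might look like this sketch, which runs the DELETE and the INSERT in one transaction so a failed re-run cannot leave the day half-loaded; the table, column, and DSN names are hypothetical:

from datetime import date, timedelta

import psycopg2  # or whichever DB-API driver the DAG uses

def load_day(execution_date: date) -> None:
    """Idempotent daily load: purge the target day, then insert it."""
    target_day = execution_date - timedelta(days=1)  # {execution_date} - 1
    conn = psycopg2.connect("dbname=mydb")  # hypothetical DSN
    try:
        with conn, conn.cursor() as cur:  # one transaction: commit or roll back together
            # First run: deletes nothing. Re-run: purges the stale rows.
            cur.execute("DELETE FROM daily_metrics WHERE day = %s", (target_day,))
            cur.execute(
                "INSERT INTO daily_metrics (day, value) "
                "SELECT day, value FROM staging_metrics WHERE day = %s",
                (target_day,),
            )
    finally:
        conn.close()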
It doesn't have to be exactly a trigger inside the database. I just want to know how I should design this so that, when changes are made inside MySQL or SQL Server, some script could be triggered.
One way would be to keep a counter of the last updated row in the database, and then poll (check) the database from Python at short intervals for new records, as sketched below.
If the value of the counter has increased, you can use the subprocess module to call another Python script.
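A minimal sketch of that polling loop, assuming a change_counter table with a single value column maintained by your triggers, and a hypothetical handle_changes.py handler script:

import subprocess
import time

import MySQLdb  # the mysqlclient driver

conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="mydb")
conn.autocommit(True)  # so each poll sees freshly committed rows
last_seen = 0

while True:
    cur = conn.cursor()
    cur.execute("SELECT value FROM change_counter")  # hypothetical counter table
    (current,) = cur.fetchone()
    cur.close()
    if current > last_seen:
        last_seen = current
        # Hand off to another script without blocking the polling loop.
        subprocess.Popen(["python", "handle_changes.py", str(current)])
    time.sleep(1)  # short polling interval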
It's possible to execute an external script from a MySQL trigger, but I have never used it and I don't know the implications of something like this.
MySQL provides a way to implement your own functions, called User Defined Functions (UDFs). With these you can define your own functions and call them from MySQL events. You need to write your own logic in a C program, following the interface provided by MySQL.
Fortunately, someone has already written a library for calling an external program from MySQL: LIB_MYSQLUDF_SYS. After installing it, the following trigger should work (note the DELIMITER change, which the multi-statement body needs in the mysql client):
DELIMITER $$
CREATE TRIGGER Test_Trigger
AFTER INSERT ON MyTable
FOR EACH ROW
BEGIN
    DECLARE cmd CHAR(255);
    DECLARE result INT;
    SET cmd = CONCAT('/YOUR_SCRIPT');
    SET result = sys_exec(cmd);
END$$
DELIMITER ;
I'm trying to call a stored procedure in my MSSQL database from a Python script, but it does not run completely when called via Python. This procedure consolidates transaction data into hour/daily blocks in a single table, which is later grabbed by the Python script. If I run the procedure in SQL Studio, it completes just fine.
When I run it via my script, it gets cut short about two-thirds of the way through. As a workaround, I currently make the program sleep for 10 seconds before moving on to the next SQL statement, but this is inefficient and unreliable, since some procedures may not finish in that time. I'm looking for a more elegant way to implement this.
Current Code:
cursor.execute("execute mySP")
time.sleep(10)
cursor.commit()
The most related article I can find to my issue is here:
make python wait for stored procedure to finish executing
I tried the solution using Tornado and I/O generators, but ran into the same issue as listed in the article, which was never resolved. I also tried the accepted solution of setting a running-status field in the database from my stored procedures: at the beginning of my SP, Status is updated to 1 in RunningStatus, and when the SP finishes, Status is updated to 0. Then I implemented the following Python code:
conn = pyodbc_connect(conn_str)
cursor = conn.cursor()
sconn = pyodbc_connect(conn_str)
scursor = sconn.cursor()

cursor.execute("execute mySP")
cursor.commit()
while 1:
    q = scursor.execute("SELECT Status FROM RunningStatus").fetchone()
    if q[0] == 0:
        break
When I implement this, the same problem happens as before: my stored procedure appears to finish executing before it is actually complete. If I eliminate cursor.commit(), as follows, the connection just hangs indefinitely until I kill the Python process.
conn = pyodbc_connect(conn_str)
cursor = conn.cursor()
sconn = pyodbc_connect(conn_str)
scursor = sconn.cursor()

cursor.execute("execute mySP")
while 1:
    q = scursor.execute("SELECT Status FROM RunningStatus").fetchone()
    if q[0] == 0:
        break
Any assistance in finding a more efficient and reliable way to implement this, as opposed to time.sleep(10), would be appreciated.
As the OP found out, inconsistent or incomplete processing of stored procedures from an application layer like Python may be due to straying from T-SQL scripting best practices.
As @AaronBertrand highlights in his Stored Procedures Best Practices Checklist blog post, consider the following, among other items:
Explicitly and liberally use BEGIN ... END blocks;
Use SET NOCOUNT ON to suppress the "rows affected" messages sent to the client for every data-modifying statement, which can interrupt the workflow;
Use semicolons for statement terminators.
Example
CREATE PROCEDURE dbo.myStoredProc
AS
BEGIN
SET NOCOUNT ON;
SELECT * FROM foo;
SELECT * FROM bar;
END
GO
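On the Python side, the usual companion fix is to drain every result set the procedure produces before committing, which forces the driver to wait until the procedure has actually finished. A minimal sketch with pyodbc (the connection string and procedure name stand in for the ones in the question):

import pyodbc

conn = pyodbc.connect(conn_str)  # conn_str as in the question
cursor = conn.cursor()

# SET NOCOUNT ON suppresses the per-statement "rows affected" messages
# that can make the driver believe the batch finished early.
cursor.execute("SET NOCOUNT ON; EXEC dbo.myStoredProc;")

# Advance through every result set; this blocks until the procedure is done.
while cursor.nextset():
    pass

conn.commit()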
I'm using MySQLdb in Python.
I have an update that may succeed or fail:
UPDATE table
SET reserved_by = PID,
    state = 'unavailable'
WHERE state = 'available'
  AND id = REQUESTED_ROW_ID
LIMIT 1;
As you may be able to infer, multiple processes are using the database, and I need processes to be able to securely grab rows for themselves, without race conditions causing problems.
My theory (perhaps incorrect) is that only one process will be able to succeed with this query (.rowcount=1) -- the others will fail (.rowcount=0) or get a different row (.rowcount=1).
The problem is, it appears that everything that happens through MySQLdb happens in a virtual world -- .rowcount reads =1, but you can't really know whether anything really happened, until you perform a .commit().
My questions:
1. In MySQL, is a single UPDATE atomic within itself? That is, if the same UPDATE above (with different PID values, but the same REQUESTED_ROW_ID) were sent to the same MySQL server at "once," am I guaranteed that one will succeed and the other will fail?
2. Is there a way to know, after calling conn.commit(), whether there was a meaningful change or not? Can I get a reliable .rowcount for the actual commit operation?
3. Does the .commit operation send the actual query (SETs and WHERE conditions intact), or does it just perform the SETs on affected rows, independent of the WHERE clauses that inspired them?
4. Is my problem solved neatly by .autocommit?
Turn autocommit on.
The commit operation just "confirms" updates already done. The alternative is rollback, which "undoes" any updates already made.
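With autocommit on, each UPDATE takes effect the moment it executes, so .rowcount tells you what really happened. A minimal sketch with MySQLdb, where my_table stands in for the question's table and the connection parameters and the two values are placeholders:

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="mydb")
conn.autocommit(True)  # every statement commits immediately

pid = 1234              # this process's identifier (placeholder)
requested_row_id = 42   # the row we are trying to claim (placeholder)

cursor = conn.cursor()
cursor.execute(
    "UPDATE my_table SET reserved_by = %s, state = 'unavailable' "
    "WHERE state = 'available' AND id = %s LIMIT 1",
    (pid, requested_row_id),
)

if cursor.rowcount == 1:
    print("row claimed")                  # the single UPDATE is atomic
else:
    print("another process won the race")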
In Python, is there a way to get notified that a specific table in a MySQL database has changed?
It's theoretically possible but I wouldn't recommend it:
Essentially you'd have a trigger on the table that calls a UDF, which communicates with your Python app in some way.
Pitfalls include: what happens if there's an error?
What if it blocks? Anything that happens inside a trigger should ideally be near-instant.
What if it's inside a transaction that gets rolled back?
I'm sure there are many other problems that I haven't thought of as well.
A better way if possible is to have your data access layer notify the rest of your app. If you're looking for when a program outside your control modifies the database, then you may be out of luck.
Another way that's less ideal, but in my opinion better than calling another program from within a trigger, is to have triggers update some kind of "LastModified" table. Then your app just checks whether that datetime is greater than when it last checked, as sketched below.
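A minimal sketch of that check, assuming the triggers maintain a last_modified table with table_name and modified_at columns (all names hypothetical):

import time

import MySQLdb

conn = MySQLdb.connect(host="localhost", user="me", passwd="secret", db="mydb")
conn.autocommit(True)  # each poll should see the triggers' latest writes
last_checked = None

while True:
    cur = conn.cursor()
    cur.execute(
        "SELECT modified_at FROM last_modified WHERE table_name = %s",
        ("tablename",),
    )
    row = cur.fetchone()
    cur.close()
    if row and (last_checked is None or row[0] > last_checked):
        last_checked = row[0]
        print("tablename changed at", row[0])  # react to the change here
    time.sleep(1)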
If by "changed" you mean a row has been updated, deleted, or inserted, then there is a workaround.
You can create a trigger in MySQL:
DELIMITER $$
CREATE TRIGGER ai_tablename_each AFTER INSERT ON tablename FOR EACH ROW
BEGIN
    DECLARE exec_result INT;
    SET exec_result = sys_exec(CONCAT('my_cmd '
        ,'insert on table tablename '
        ,',id=', NEW.id));
    IF exec_result = 0 THEN
        INSERT INTO table_external_result (id, tablename, result)
        VALUES (null, 'tablename', 0);
    END IF;
END$$
DELIMITER ;
This will call the executable my_cmd on the server, with some parameters (see sys_exec in the Links section for more info).
my_cmd can be a Python program or anything else you can execute from the command line using the user account that MySQL runs under.
You'd have to create a trigger for every change (INSERT/UPDATE/DELETE) that you'd want your program to be notified of, and for each table.
Also you'd need to find some way of linking your running Python program to the command-line util that you call via sys_exec().
Not recommended
This sort of behaviour is not recommended, because it is likely to:
slow MySQL down;
make it hang/time out if my_cmd does not return;
notify you before the transaction ends, if you are using transactions (and I'm not sure you'll get notified of a delete at all if the transaction rolls back);
and it's an ugly design.
Links
sys_exec: http://www.mysqludf.org/lib_mysqludf_sys/index.php
Yes, though this may not be standard SQL: PostgreSQL supports it with LISTEN and NOTIFY (notification payloads were added in version 9.0).
http://www.postgresql.org/docs/9.0/static/sql-notify.html
Not possible with standard SQL functionality.
It might not be a bad idea to try using a network monitor instead of a MySQL trigger, extending a network monitor like this one:
http://sourceforge.net/projects/pynetmontool/
You would then write a script that waits for activity on port 3306 (or whatever port your MySQL server listens on) and checks the database when the network activity meets certain filter conditions.
It's a very high-level idea that you'll have to research further, but you avoid the DB trigger problems and you won't have to write a cron job that runs every second.