On a SQL Server database, I would like to run an external (Python) script whenever a new row is inserted into a table. The Python script needs to process the data from this row.
Is a DML trigger AFTER INSERT a safe method to use here? I saw several warnings/discouragements in other questions (see, e.g., How to run a script on every insert or Trigger script when SQL Data changes). From what I understand so far, the script may fail when the INSERT is not yet committed, because then the script cannot see/load the row? However, as I understand the example in https://www.mssqltips.com/sqlservertip/5909/sql-server-trigger-example/, during the execution of the trigger there exists a virtual table named inserted that holds the data being affected by the trigger execution. So technically, I should be able to pass the row that the Python script needs by retrieving it directly from this inserted table?
I am new to triggers, which is why I am asking, so thank you for any clarification on best practices here! :)
After some testing, I found that the following trigger seems to successfully pass the row from the inserted table to an external Python (SQL Server Machine-Learning-Services) script:
CREATE TRIGGER index_new_row
ON dbo.triggertesttable
AFTER INSERT
AS
DECLARE @new_row nvarchar(max) = (SELECT * FROM inserted FOR JSON AUTO);
EXEC sp_execute_external_script @language = N'Python',
    @script = N'
import pandas as pd
OutputDataSet = pd.read_json(new_row)
',
    @params = N'@new_row nvarchar(max)',
    @new_row = @new_row
GO
When testing this with an insert on dbo.triggertesttable, this demo Python script works like a select statement on the inserted table, so it returns all rows that were inserted.
Related
I'm writing a Python script which connects to an Oracle DB. I collect specific reference IDs into a variable and then execute a stored procedure in a for loop. It works fine, but it takes a very long time to complete.
Here's the code:
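sql = "SELECT STATEMENT"  # placeholder for the query that returns the reference IDs
cursor.execute(sql)
result = cursor.fetchall()
for i in result:
    # one stored procedure call (one round trip) per reference ID
    cursor.callproc('DeleteStoredProcedure', [i[0]])
    print("Deleted:", i[0])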
The first SQL SELECT statement collects around 600 reference IDs, but it takes around 3 minutes to execute the stored procedure for all of them, which is very long if we have 10K or more records.
BTW, the stored procedure is configured to delete rows from three different tables based on the reference ID, and it runs quickly from Toad for Oracle.
Is there any way to improve the performance?
I think you could create just one stored procedure that executes the SELECT statement and does whatever DeleteStoredProcedure does.
Or, you can use threads to execute the stored procedure calls concurrently: https://docs.python.org/3/library/threading.html
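To illustrate the first suggestion (let the database do the selection and the deletion together instead of looping in Python), here is a minimal sketch. It uses the standard-library sqlite3 module as a stand-in because SQLite has no stored procedures; the table names, the refs table and the expired column are all made up for the example. In Oracle the same statements would live inside the single stored procedure.

```python
import sqlite3

TABLES = ("table_a", "table_b", "table_c")  # hypothetical stand-ins for the three tables


def delete_expired(conn):
    """Remove all rows for the selected reference IDs with one DELETE per table."""
    with conn:  # commits on success, rolls back on error
        for table in TABLES:
            conn.execute(
                f"DELETE FROM {table} "
                "WHERE ref_id IN (SELECT ref_id FROM refs WHERE expired = 1)"
            )


# Demo data: reference IDs 1 and 3 are selected for deletion, 2 is kept.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE refs (ref_id INTEGER PRIMARY KEY, expired INTEGER)")
conn.executemany("INSERT INTO refs VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (ref_id INTEGER)")
    conn.executemany(f"INSERT INTO {table} VALUES (?)", [(1,), (2,), (3,)])
conn.commit()

delete_expired(conn)
remaining = {t: [r[0] for r in conn.execute(f"SELECT ref_id FROM {t}")] for t in TABLES}
```

The point is that the number of statements sent to the database no longer grows with the number of reference IDs, which is usually where a per-row loop loses its time.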
Here is a typical request:
I built a DAG which updates daily from 2020-01-01. It runs an INSERT SQL query using {execution_date} as a parameter. Now I need to update the query and rerun for the past 6 months.
I found out that I have to pause the Airflow process, DELETE the historical data, INSERT manually and then re-activate the Airflow process, because Airflow catch-up does not remove historical data when I clear a run.
I'm wondering if it's possible to script the clear part, so that every time I clear a run from the UI, Airflow runs a clear script in the background?
After some thought, I think here is a viable solution:
Instead of running only an INSERT in the DAG, run a DELETE query first and then the INSERT query.
For example, if I want to INSERT for {execution_date} - 1 (yesterday), instead of creating a DAG that just runs the INSERT query, I should first run a DELETE query that removes data of yesterday, and then INSERT the data.
By using this DELETE-INSERT method, both of my scenarios work automatically:
If it's just a normal run (i.e. no data of yesterday has been inserted yet and this is the first run of this DAG for {execution_date}), the DELETE part does nothing and INSERT inserts the data properly.
If it's a re-run, the DELETE part will purge the data already inserted, and INSERT will insert the data based on the updated script. No duplication is created.
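A minimal sketch of this DELETE-then-INSERT pattern, using sqlite3 only so that it runs anywhere. The daily_stats table, its columns and the load_day function are hypothetical; in the real DAG the day would come from {execution_date} and the rows from the actual query.

```python
import sqlite3


def load_day(conn, day, rows):
    """Replace all rows for `day` so that re-running the task never duplicates data."""
    with conn:  # DELETE and INSERT are committed together or not at all
        conn.execute("DELETE FROM daily_stats WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO daily_stats (day, value) VALUES (?, ?)",
            [(day, value) for value in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_stats (day TEXT, value INTEGER)")

load_day(conn, "2020-01-01", [1, 2])    # first run: the DELETE removes nothing
load_day(conn, "2020-01-01", [10, 20])  # re-run with the updated query result
values = [r[0] for r in conn.execute("SELECT value FROM daily_stats ORDER BY value")]
```

Running both steps in one transaction means a failed INSERT does not leave the day empty.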
It doesn't have to be exactly a trigger inside the database. I just want to know how I should design this, so that when changes are made inside MySQL or SQL server, some script could be triggered.
One way would be to keep a counter of the last processed row in the database, and then keep polling (checking) the database from Python for new records at short intervals.
If the value of the counter has increased, you could use the subprocess module to call another Python script.
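A rough sketch of that polling idea, assuming the table has an auto-incrementing id that can serve as the counter. The events table, poll_once and run_script are made-up names, and sqlite3 stands in for MySQL or SQL Server.

```python
import sqlite3
import subprocess
import sys


def poll_once(conn, last_seen_id, on_new_row):
    """Call on_new_row for each row with id > last_seen_id; return the new high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM events WHERE id > ? ORDER BY id", (last_seen_id,)
    ).fetchall()
    for row_id, payload in rows:
        on_new_row(row_id, payload)
        last_seen_id = row_id
    return last_seen_id


def run_script(row_id, payload):
    # Launch a separate Python process per new row; the inline -c program
    # is a placeholder for the path of a real script.
    subprocess.run(
        [sys.executable, "-c", "import sys; print('processing', sys.argv[1:])",
         str(row_id), payload],
        check=True,
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)")
conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])

seen = []
last_id = poll_once(conn, 0, lambda row_id, payload: seen.append((row_id, payload)))
```

In a real service you would call poll_once(conn, last_id, run_script) in a loop with a time.sleep between iterations, and store last_id somewhere durable so a restart does not reprocess old rows.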
It's possible to execute an external script from a MySQL trigger, but I have never used it and I don't know the implications of something like this.
MySQL provides a way to implement your own functions, called User Defined Functions (UDFs). With these you can define your own functions and call them from MySQL triggers. You need to write your own logic in a C program by following the interface provided by MySQL.
Fortunately, someone already wrote a library to call an external program from MySQL: LIB_MYSQLUDF_SYS. After installing it, the following trigger should work:
CREATE TRIGGER Test_Trigger
AFTER INSERT ON MyTable
FOR EACH ROW
BEGIN
  DECLARE cmd CHAR(255);
  DECLARE result int(10);
  SET cmd=CONCAT('/YOUR_SCRIPT');
  SET result = sys_exec(cmd);
END;
I have a python script to execute a stored procedure to purge the tables in database. This SP further calls another SP which has delete statements for each table in database. Something like below -
Python calls - Stored procedure Purge_DB
Purge_DB calls - Stored procedure Purge_Table
Purge_Table has definition to delete data from each table.
When I run this Python script, the transaction log grows rapidly, and after running the script 2-3 times I get the "transaction log full" error.
Please note that the deletion happens in a transaction.
BEGIN TRAN
EXEC (@DEL_SQL)
COMMIT TRAN
Earlier I was executing the same SP using VB script and never got any issue related to transaction log.
Does Python write to the transaction log differently?
Why is the log size much bigger with Python than VB script?
This is resolved now.
pyodbc starts a transaction when the execute method is called (autocommit is off by default), and that transaction remains open until commit() is called explicitly. Since the purge SP was called for more than 100 tables, the transaction log kept growing until the transaction was closed in the Python code, which is why this job filled it up.
I have set the autocommit property of the pyodbc connection to True, so each SQL statement executed on that connection is now committed automatically as soon as it runs. Please refer to the documentation here:
https://github.com/mkleehammer/pyodbc/wiki/Database-Transaction-Management
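To make the difference visible without a SQL Server instance, here is a small illustration using the standard-library sqlite3 module as a stand-in for pyodbc (the two drivers are configured differently, so treat this as an analogy, not pyodbc code). With pyodbc the equivalent switch is pyodbc.connect(connection_string, autocommit=True) or connection.autocommit = True, as described on the wiki page above.

```python
import sqlite3

# Default mode: the first data-modifying statement implicitly opens a transaction.
manual = sqlite3.connect(":memory:")
manual.execute("CREATE TABLE t (x INTEGER)")
manual.execute("DELETE FROM t")
open_before_commit = manual.in_transaction  # the transaction is still open here
manual.commit()
open_after_commit = manual.in_transaction   # closed by the explicit commit

# Autocommit mode: each statement is committed as soon as it finishes.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE t (x INTEGER)")
auto.execute("DELETE FROM t")
open_in_autocommit = auto.in_transaction
```

In the default mode every statement issued before commit() belongs to the same open transaction, which matches the log growth described above.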
In Python, is there a way to get notified that a specific table in a MySQL database has changed?
It's theoretically possible but I wouldn't recommend it:
Essentially, you have a trigger on the table that calls a UDF, which communicates with your Python app in some way.
Pitfalls include: what happens if there's an error?
What if it blocks? Anything that happens inside a trigger should ideally be near-instant.
What if it's inside a transaction that gets rolled back?
I'm sure there are many other problems that I haven't thought of as well.
A better way if possible is to have your data access layer notify the rest of your app. If you're looking for when a program outside your control modifies the database, then you may be out of luck.
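A minimal sketch of what "the data access layer notifies the rest of the app" could look like. ItemStore, subscribe and add_item are hypothetical names, and sqlite3 stands in for MySQL so the example is self-contained.

```python
import sqlite3


class ItemStore:
    """Hypothetical data access layer that announces its own writes."""

    def __init__(self, conn):
        self._conn = conn
        self._listeners = []
        conn.execute("CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)")

    def subscribe(self, callback):
        """Register callback(action, row_id) to be called after each write."""
        self._listeners.append(callback)

    def add_item(self, name):
        with self._conn:  # commit first ...
            cur = self._conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
        for callback in self._listeners:  # ... then notify, so listeners see committed data
            callback("insert", cur.lastrowid)
        return cur.lastrowid


events = []
store = ItemStore(sqlite3.connect(":memory:"))
store.subscribe(lambda action, row_id: events.append((action, row_id)))
store.add_item("widget")
```

Because the notification is sent after the commit, the rollback and blocking problems listed above do not arise; the obvious limitation is that writes made by other programs go unnoticed.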
Another way that's less ideal, but imo better than calling another program from within a trigger, is to set up some kind of "LastModified" table that gets updated by triggers. Then in your app just check whether that datetime is greater than when you last checked.
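Here is a small sketch of the "LastModified" idea, again with sqlite3 standing in for MySQL (the trigger syntax differs slightly between the two). The last_modified table, the items_ai trigger and changed_since are names invented for the example, and only an INSERT trigger is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE last_modified (table_name TEXT PRIMARY KEY, modified_at TEXT);
INSERT INTO last_modified VALUES ('items', '');
-- The trigger only stamps a timestamp, so it stays near-instant.
CREATE TRIGGER items_ai AFTER INSERT ON items
BEGIN
    UPDATE last_modified
    SET modified_at = strftime('%Y-%m-%d %H:%M:%f', 'now')
    WHERE table_name = 'items';
END;
""")


def changed_since(conn, table_name, last_checked):
    """Return (changed, current_stamp) for table_name relative to last_checked."""
    (stamp,) = conn.execute(
        "SELECT modified_at FROM last_modified WHERE table_name = ?", (table_name,)
    ).fetchone()
    return stamp > last_checked, stamp


conn.execute("INSERT INTO items (name) VALUES ('widget')")
changed_first, stamp = changed_since(conn, "items", "")
changed_again, _ = changed_since(conn, "items", stamp)
```

The app keeps the returned stamp and passes it to the next check. Two changes within the same millisecond would share a stamp, so a monotonically increasing version number may be a safer choice than a datetime.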
If by changed you mean if a row has been updated, deleted or inserted then there is a workaround.
You can create a trigger in MySQL
DELIMITER $$
CREATE TRIGGER ai_tablename_each AFTER INSERT ON tablename FOR EACH ROW
BEGIN
  DECLARE exec_result integer;
  SET exec_result = sys_exec(CONCAT('my_cmd '
                                    ,'insert on table tablename '
                                    ,',id=',new.id));
  IF exec_result = 0 THEN BEGIN
    INSERT INTO table_external_result (id, tablename, result)
    VALUES (null, 'tablename', 0);
  END; END IF;
END$$
DELIMITER ;
This will call the executable script my_cmd on the server with some parameters (see sys_exec for more info).
my_cmd can be a Python program or anything you can execute from the commandline using the user account that MySQL uses.
You'd have to create a trigger for every change (INSERT/UPDATE/DELETE) that you'd want your program to be notified of, and for each table.
Also you'd need to find some way of linking your running Python program to the command-line util that you call via sys_exec().
Not recommended
This sort of behaviour is not recommended because it is likely to:
slow MySQL down;
make it hang/timeout if my_cmd does not return;
if you are using transactions, you will be notified before the transaction ends;
I'm not sure if you'll get notified of a delete if the transaction rolls back;
It's an ugly design
Links
sys_exec: http://www.mysqludf.org/lib_mysqludf_sys/index.php
Yes, it may not be standard SQL, but PostgreSQL supports this with LISTEN and NOTIFY (see the documentation for version 9.0):
http://www.postgresql.org/docs/9.0/static/sql-notify.html
Not possible with standard SQL functionality.
It might not be a bad idea to try using a network monitor instead of a MySQL trigger. You could extend a network monitor like this one:
http://sourceforge.net/projects/pynetmontool/
Then write a script that waits for activity on port 3306 (or whatever port your MySQL server listens on) and checks the database when the network activity meets certain filter conditions.
It's a very high level idea that you'll have to research further, but you don't run into the DB trigger problems and you won't have to write a cron job that runs every second.