I'm writing a Python script that connects to an Oracle DB. I'm collecting specific Reference IDs into a variable and then executing a stored procedure in a for loop. It works fine, but it takes a very long time to complete.
Here's the code:
sql = "SELECT STATEMENT"  # placeholder for the actual query
cursor.execute(sql)
result = cursor.fetchall()
for i in result:
    cursor.callproc('DeleteStoredProcedure', [i[0]])
    print("Deleted:", i[0])
The first SELECT statement collects around 600 Ref IDs, but it takes around 3 minutes to run the stored procedure for all of them, which is far too long if we have 10K or more records.
BTW, the stored procedure is configured to delete rows from three different tables based on the reference ID. And it runs quickly from Oracle Toad.
Is there any way to improve the performance?
I think you could create just one stored procedure that executes the SELECT statement and does whatever DeleteStoredProcedure does, so the loop happens inside the database instead of in Python.
Or, you can use threads to execute the stored procedure calls concurrently: https://docs.python.org/3/library/threading.html
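Another way to cut the per-call overhead is to push all the IDs to the database in a single round trip. A minimal sketch, assuming cx_Oracle / python-oracledb and the procedure name from the question (the connection itself is omitted, and the DB calls are left commented since they need a live database):

```python
# Batch the deletes: executemany() on an anonymous PL/SQL block sends all
# IDs in one round trip instead of one callproc() per row.
ref_ids = [101, 102, 103]  # in practice: [row[0] for row in cursor.fetchall()]

# executemany() expects one parameter sequence per execution
params = [[ref_id] for ref_id in ref_ids]

# cursor.executemany("begin DeleteStoredProcedure(:1); end;", params)
# connection.commit()
```

For 10K IDs this replaces 10K network round trips with one, which is usually where the 3 minutes go.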
On a SQL Server database, I would like to run an external (Python) script whenever a new row is inserted into a table. The Python script needs to process the data from this row.
Is a DML Trigger AFTER INSERT a safe method to use here? I saw several warnings/discouragements in other questions (see, e.g., How to run a script on every insert or Trigger script when SQL Data changes). From what I understand so far, the script may fail when the INSERT is not yet committed, because then the script cannot see/load the row. However, as I understand the example in https://www.mssqltips.com/sqlservertip/5909/sql-server-trigger-example/, during the execution of the trigger there exists a virtual table named inserted that holds the data being affected by the trigger execution. So technically, I should be able to pass the row that the Python script needs by retrieving it directly from this inserted table?
I am new to triggers, which is why I am asking - so thank you for any clarification on best practices here! :)
After some testing, I found that the following trigger seems to successfully pass the row from the inserted table to an external Python (SQL Server Machine-Learning-Services) script:
CREATE TRIGGER index_new_row
ON dbo.triggertesttable
AFTER INSERT
AS
DECLARE @new_row nvarchar(max) = (SELECT * FROM inserted FOR JSON AUTO);
EXEC sp_execute_external_script @language = N'Python',
    @script = N'
import pandas as pd
OutputDataSet = pd.read_json(new_row)
',
    @params = N'@new_row nvarchar(max)',
    @new_row = @new_row;
GO
When testing this with an insert on dbo.triggertesttable, this demo Python script works like a select statement on the inserted table, so it returns all rows that were inserted.
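To see what the script actually receives, here is a small standalone sketch: FOR JSON AUTO yields a JSON array with one object per inserted row, and pd.read_json turns it back into a frame. The column names below are invented for illustration:

```python
from io import StringIO
import pandas as pd

# Shape of what SELECT * FROM inserted FOR JSON AUTO produces: a JSON
# array with one object per inserted row (columns invented here).
new_row = '[{"id": 1, "name": "alpha"}]'
OutputDataSet = pd.read_json(StringIO(new_row))
```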
I'm trying to call a stored procedure in my MSSQL database from a Python script, but it does not run completely when called via Python. This procedure consolidates transaction data into hour/daily blocks in a single table which is later grabbed by the Python script. If I run the procedure in SQL studio, it completes just fine.
When I run it via my script, it gets cut short about two-thirds of the way through. Currently I have a workaround: making the program sleep for 10 seconds before moving on to the next SQL statement. However, this is not time-efficient and is unreliable, since some procedures may not finish in that time. I'm looking for a more elegant way to implement this.
Current Code:
cursor.execute("execute mySP")
time.sleep(10)
cursor.commit()
The most related article I can find to my issue is here:
make python wait for stored procedure to finish executing
I tried the solution using Tornado and I/O generators, but ran into the same issue as listed in the article, which was never resolved. I also tried the accepted solution of setting a running-status field in the database from my stored procedures: at the beginning of my SP, Status is updated to 1 in RunningStatus, and when the SP finishes, Status is updated to 0. Then I implemented the following Python code:
conn = pyodbc_connect(conn_str)
cursor = conn.cursor()
sconn = pyodbc_connect(conn_str)
scursor = sconn.cursor()

cursor.execute("execute mySP")
cursor.commit()
while 1:
    q = scursor.execute("SELECT Status FROM RunningStatus").fetchone()
    if q[0] == 0:
        break
When I implement this, the same problem happens as before: the loop reports the stored procedure as finished before it is actually complete. If I eliminate cursor.commit(), as follows, the connection just hangs indefinitely until I kill the Python process.
conn = pyodbc_connect(conn_str)
cursor = conn.cursor()
sconn = pyodbc_connect(conn_str)
scursor = sconn.cursor()

cursor.execute("execute mySP")
while 1:
    q = scursor.execute("SELECT Status FROM RunningStatus").fetchone()
    if q[0] == 0:
        break
Any assistance in finding a more efficient and reliable way to implement this, as opposed to time.sleep(10) would be appreciated.
As the OP found out, inconsistent or incomplete processing of stored procedures from an application layer like Python may be due to straying from T-SQL scripting best practices.
As @AaronBertrand highlights in his Stored Procedures Best Practices Checklist blog post, consider the following among other items:
Explicitly and liberally use BEGIN ... END blocks;
Use SET NOCOUNT ON to suppress the messages sent to the client for each affected row, which can interrupt the workflow;
Use semicolons for statement terminators.
Example
CREATE PROCEDURE dbo.myStoredProc
AS
BEGIN
SET NOCOUNT ON;
SELECT * FROM foo;
SELECT * FROM bar;
END
GO
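On the Python side, it can also help to make pyodbc walk through every result set the batch produces before committing; otherwise execution can appear to stop partway through. A hedged sketch (`drain_results` is a hypothetical helper; `nextset()` is the standard pyodbc/DBAPI cursor call):

```python
def drain_results(cursor):
    """Step through all remaining result sets so the batch runs to completion."""
    while cursor.nextset():  # nextset() returns False when no sets remain
        pass

# Usage with pyodbc (not executable without a database):
# cursor.execute("execute mySP")
# drain_results(cursor)
# cursor.commit()
```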
I am using MobaXterm to run my Python script.
The script fetches records from three tables. I can see the output of my query in MySQL Workbench, but when the same query runs in my script, I get the output "Killed".
What is the reason? My query seems correct.
select tsp.data_ip, tsp.IP, tvp.vm_d_ip, tvp.IP FROM cmdb.t_server tsp,cmdb.t_vm tvp,t_ip ip where tvp.SERIALNUMBER= 'AD123' or tsp.SERIALNUMBER= 'AD123' and (ip.ip=tsp.d_ip or ip.ip=tsp.IP or ip.ip=tvp.dip or ip.ip=tvp.IP);
The reason this happens in the Python script is the number of records involved: the query runs longer than the script is allowed to wait, so the process gets killed.
As seen in the SELECT, it queries three tables at the same time using comma-style joins, with a WHERE clause mixing multiple 'and'/'or' conditions (note that AND binds tighter than OR, so the conditions may not group as intended).
Explicit joins should be used instead.
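A possible rewrite with explicit joins, kept as a Python query string since that is how the script runs it. The intended grouping is a guess (the original WHERE clause mixes AND/OR without parentheses, so its meaning is ambiguous); table and column names are taken from the question:

```python
# Hypothetical rewrite: explicit JOINs plus unambiguous grouping.
query = """
SELECT tsp.data_ip, tsp.IP, tvp.vm_d_ip, tvp.IP
FROM cmdb.t_server tsp
JOIN t_ip ip ON ip.ip IN (tsp.d_ip, tsp.IP)
JOIN cmdb.t_vm tvp ON ip.ip IN (tvp.dip, tvp.IP)
WHERE tsp.SERIALNUMBER = 'AD123'
   OR tvp.SERIALNUMBER = 'AD123'
"""
# cursor.execute(query)
```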
I have a python script that is querying HiveServer2 using pyhs2, like so:
import pyhs2

conn = pyhs2.connect(host='localhost',
                     port=10000,
                     user='user',
                     password='password',
                     database='default')
cur = conn.cursor()
cur.execute("SELECT name, data, number, time FROM table WHERE date = '2014-01-01' AND number IN (1,5,6,22) ORDER BY name, time ASC")

line = cur.fetchone()
while line is not None:
    # <do some processing, including writing to stdout>
    line = cur.fetchone()
I have also tried using fetchall() instead of fetchone(), but that just seems to hang forever.
My query runs just fine and returns ~270 million rows. For testing, I dumped the output from Hive into a flat, tab-delimited file and wrote the guts of my Python script against that, so I didn't have to wait for the query to finish every time I ran it. That script, reading the flat file, finishes in ~20 minutes. What confuses me is that I don't see the same performance when I query Hive directly; in fact, it takes about 5 times longer to finish processing. I am pretty new to Hive and Python, so maybe I am making some bone-headed error, but the examples I see online show a setup like this. I just want to iterate through my Hive results, getting one row at a time as quickly as possible, much like I did with my flat file. Any suggestions?
P.S. I have found this question that sounds similar:
Python slow on fetchone, hangs on fetchall
but that ended up being a SQLite issue, and I have no control over my Hive set up.
Have you considered using fetchmany()?
That would be the DBAPI answer for pulling data in chunks: batches big enough that the per-row overhead stops being an issue, and smaller than the full result set so memory isn't an issue.
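A runnable sketch of the fetchmany() pattern, shown with sqlite3 only so it runs standalone; with pyhs2 the cursor calls have the same DBAPI shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(10)])

cur.execute("SELECT x FROM t ORDER BY x")
rows_seen = []
while True:
    batch = cur.fetchmany(4)  # tune the batch size to trade memory vs. round trips
    if not batch:
        break
    rows_seen.extend(r[0] for r in batch)
```

With ~270 million rows, a much larger batch size (thousands) would be more appropriate than 4.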
I've got a postgresql-query that returns 120 rows {integer, boolean, integer, varchar(255), varchar(255), bigint, text} in about 70ms when done in the database running psql.
Using python/django with django.db.connection.cursor.execute() it takes 10s to run, on the same machine.
I've tried putting all the rows into an array, and also into a single string (18k characters; returning only the first 500 characters takes the same time) so that only one row is returned, but with no gain.
Any ideas as to why there is such a dramatic slowdown in running a query from within python and in the db?
EDIT
I had to increase work_mem to get the function running timely in psql. Other functions/queries don't show the same pattern; the difference between psql and Python is only a few milliseconds.
EDIT
Cutting down the work_mem to 1MB shows similar numbers in psql and the django shell. Could it be that django is not going by the memory set in work_mem?
EDIT
Ugh. The problem was that the work_mem set in psql is not valid globally; if I set the memory in the function, the call is timely. I suppose setting this in the configuration file would work globally.
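Two ways to make that setting stick without editing the server config, sketched here as SQL strings a Django/psycopg2 cursor could run (the function name is hypothetical; the execution itself is commented out since it needs a live database):

```python
# Session-local: raise work_mem only for the current connection.
session_sql = "SET work_mem = '64MB';"

# Per-function: attach the setting to the function so every call gets it,
# regardless of which client connects.
function_sql = "ALTER FUNCTION my_report_fn() SET work_mem = '64MB';"

# with connection.cursor() as cur:      # e.g. django.db.connection
#     cur.execute(session_sql)
#     cur.execute("SELECT * FROM my_report_fn();")
```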
If the timing between "in situ" queries and psql queries differs much, then the first and usual suspect is this: if the framework uses prepared statements, you have to check the timing in psql using prepared statements too. For example:
PREPARE foo AS SELECT * FROM sometable WHERE intcolumn = $1;
EXECUTE foo(42);
If the timing of the EXECUTE is in the same ballpark as your in situ query, then you can EXPLAIN and EXPLAIN ANALYZE the EXECUTE line.
If the timing is not in the same ballpark you have to look for something else.