Python code doesn't run the SQL stored procedure completely

I am not proficient in Python, but I have written Python code that executes a SQL Server stored procedure. The procedure calls several nested stored procedures, so it usually takes about 5 minutes to run in SSMS.
When I run it from Python, I can see the procedure only gets about halfway through without reporting an error, which makes me think it somehow needs more time to execute when called from Python.
I found other posts where people suggested subprocess, but I don't know how to write that. Below is an example (not mine) of Python code that executes a stored procedure.
mydb_lock = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                           'Server=localhost;'
                           'Database=InterelRMS;'
                           'Trusted_Connection=yes;'
                           'MARS_Connection=yes;'
                           'user=sa;'
                           'password=Passw0rd;')
mycursor_lock = mydb_lock.cursor()
sql_nodes = "Exec IVRP_Nodes"
mycursor_lock.execute(sql_nodes)
mydb_lock.commit()
How can I edit the above code to use subprocess? Is subprocess the right choice? Can you suggest any other method?
Many thanks.
Python 2.7 and 3
SQL Server
UPDATE 04/04/2022:
@AlwaysLearning, I tried
NEWcnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password+';Connection Timeout=0')
But there was no change. To check how far the procedure gets, I inserted the following two statements right after each other in the nested procedure, around where I thought the SP stopped:
INSERT INTO CheckTable (OrgID,Stage,Created) VALUES(@OrgID,2.5331,getdate())
INSERT INTO CheckTable (OrgID,Stage,Created) VALUES(@OrgID,2.5332,getdate())
Only the first INSERT completes. I'm using an Azure SQL database, if that helps.
UPDATE 05/04/2022:
I tried what @AlwaysLearning suggested: after opening the connection I added NEWcnxn.timeout = 4000, and it's working now.

I tried what @AlwaysLearning suggested: after opening the connection I added NEWcnxn.timeout = 4000, and it's working now. Many thanks.
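For reference, a minimal sketch of the working pattern, with placeholder server, database, and credentials (IVRP_Nodes is the procedure from the question; pyodbc's Connection.timeout is a query timeout in seconds):

import pyodbc

# Placeholder connection details -- substitute your own server/database/credentials.
NEWcnxn = pyodbc.connect(
    'DRIVER={ODBC Driver 13 for SQL Server};'
    'SERVER=myserver.database.windows.net;'
    'DATABASE=mydatabase;'
    'UID=myuser;PWD=mypassword;'
)
NEWcnxn.timeout = 4000  # query timeout in seconds; large enough for the long-running procedure

cursor = NEWcnxn.cursor()
cursor.execute("EXEC IVRP_Nodes")
NEWcnxn.commit()
NEWcnxn.close()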

Related

How to implement 'sqlite3_busy_timeout' in python?

I'm trying to run a python script with multiple threads, but I'm getting the following error:
sqlite3.OperationalError: database is locked
I've found out that I need to extend the sqlite3_busy_timeout to make it wait a bit longer before writing to the database.
The code used for this looks like the following:
db.configure("busyTimeout", 10000)  // This should make it wait for 10 seconds.
What I want to know is: how do I implement this? Where should I place it, before or after the SQLite command? Also, do I have to write anything before it, like c.execute("code")?
You can set the timeout with the busy_timeout pragma. Here is an example setting the busy timeout to 10 seconds:
import sqlite3

with sqlite3.connect('example.db') as db:
    db.execute('pragma busy_timeout=10000')  # value is in milliseconds
    # do more work with db...
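As an aside, the standard library's sqlite3.connect() also accepts a timeout argument (in seconds) that controls how long a connection waits on a locked database before raising "database is locked". A small sketch (the table is just for illustration):

import sqlite3

# timeout=10 makes this connection wait up to 10 seconds for the lock
# instead of failing immediately with "database is locked".
with sqlite3.connect('example.db', timeout=10) as db:
    db.execute('CREATE TABLE IF NOT EXISTS example_table (value TEXT)')
    db.execute('INSERT INTO example_table (value) VALUES (?)', ('hello',))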

pyodbc connection.close() very slow with Access Database

I am using pyodbc to open a connection to a Microsoft Access database file (.accdb) and run a SELECT query to create a pandas dataframe. I acquired the ODBC drivers from here. I have no issues with retrieving and manipulating the data but closing the connection at the end of my code can take 10-15 seconds which is irritating. Omitting conn.close() fixes my problem and prior research indicates that it's not critical to close my connection as I am the only one accessing the file. However, my concern is that I may have unexpected hangups down the road when I integrate the code into my tkinter gui.
import pyodbc
import pandas as pd
import time
start = time.time()
db_fpath = r'C:\mydb.accdb'
conn = pyodbc.connect(r'Driver={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={0};'.format(db_fpath))
df = pd.read_sql_query('SELECT * FROM table', conn)
conn.close()
print(time.time()-start)
I get the following results:
15.27361798286438 # with conn.close()
0.4076552391052246 # without conn.close()
If I wrap the code in a function but omit the call to conn.close(), I also encounter a hangup which makes me believe that whenever I release the connection from memory, it will cause a slowdown.
If I omit the SQL query (open the connection and then close it without doing anything), there is no hangup.
Can anyone duplicate my issue?
Is there something I can change about my connection or method of closing to avoid the slowdown?
EDIT: After further investigation, Dropbox was not the issue. I am actually using pd.read_sql_query() 3 times in succession using conn to import 3 different tables from my database. Trying different combinations of closing/not closing the connection and reading different tables (and restarting the kernel between tests), I determined that only when I read one specific table can I cause the connection closing to take significantly longer. Without understanding the intricacies of the ODBC driver or what's different about that table, I'm not sure I can do anything more. I would upload my database for others to try but it contains sensitive information. I also tried switching to pypyodbc to no effect. I think the original source of the table was actually a much older .mdb Access Database so maybe remaking that table from scratch would solve my issue.
At this point, I think the simplest solution is just to maintain the connection object in memory always to avoid closing it. My initial testing indicates this will work out although it is a bit of a pain.
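A rough sketch of that workaround, caching one module-level connection and reusing it for every query (the file path and table names are illustrative):

import pyodbc
import pandas as pd

_conn = None  # module-level cache: open the connection once, reuse it, never close it explicitly

def get_conn(db_fpath=r'C:\mydb.accdb'):
    """Open the Access connection on first use and hand back the cached one afterwards."""
    global _conn
    if _conn is None:
        _conn = pyodbc.connect(
            r'Driver={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={0};'.format(db_fpath)
        )
    return _conn

def read_table(table_name):
    # Reuses the cached connection, so the slow close() is deferred until the process exits.
    return pd.read_sql_query('SELECT * FROM [{0}]'.format(table_name), get_conn())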

SQL Stored Procedures not finishing when called from Python

I'm trying to call a stored procedure in my MSSQL database from a Python script, but it does not run completely when called via Python. The procedure consolidates transaction data into hourly/daily blocks in a single table, which is later grabbed by the Python script. If I run the procedure in SQL Server Management Studio, it completes just fine.
When I run it via my script, it gets cut short about two-thirds of the way through. My current workaround is to make the program sleep for 10 seconds before moving on to the next SQL statement, but this is neither time-efficient nor reliable, as some procedures may not finish in that time. I'm looking for a more elegant way to implement this.
Current Code:
cursor.execute("execute mySP")
time.sleep(10)
cursor.commit()
The most related article I can find to my issue is here:
make python wait for stored procedure to finish executing
I tried the solution using Tornado and I/O generators, but ran into the same issue described in that article, which was never resolved. I also tried the accepted solution, which sets a running-status flag in the database from the stored procedure. At the beginning of my SP, Status is updated to 1 in RunningStatus, and when the SP finishes, Status is updated to 0. Then I implemented the following Python code:
conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
sconn = pyodbc.connect(conn_str)
scursor = sconn.cursor()

cursor.execute("execute mySP")
cursor.commit()
while 1:
    q = scursor.execute("SELECT Status FROM RunningStatus").fetchone()
    if q[0] == 0:
        break
When I implement this, the same problem happens as before: my stored procedure appears to finish executing before it is actually complete. If I eliminate cursor.commit(), as follows, the connection just hangs indefinitely until I kill the Python process.
conn = pyodbc.connect(conn_str)
cursor = conn.cursor()
sconn = pyodbc.connect(conn_str)
scursor = sconn.cursor()

cursor.execute("execute mySP")
while 1:
    q = scursor.execute("SELECT Status FROM RunningStatus").fetchone()
    if q[0] == 0:
        break
Any assistance in finding a more efficient and reliable way to implement this, as opposed to time.sleep(10) would be appreciated.
As OP found out, inconsistent or incomplete processing of stored procedures from an application layer like Python may be due to straying from best practices of T-SQL scripting.
As @AaronBertrand highlights in his Stored Procedures Best Practices Checklist blog post, consider the following items, among others:
Explicitly and liberally use BEGIN ... END blocks;
Use SET NOCOUNT ON to avoid the rows-affected messages sent to the client for every action, which can interrupt the workflow;
Use semicolons for statement terminators.
Example
CREATE PROCEDURE dbo.myStoredProc
AS
BEGIN
    SET NOCOUNT ON;

    SELECT * FROM foo;
    SELECT * FROM bar;
END
GO
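On the Python side, a hedged sketch of calling such a procedure: autocommit is enabled so no explicit commit is needed, and any remaining result sets are drained with nextset() so the driver finishes processing everything the procedure returns (connection details are placeholders):

import pyodbc

# Placeholder connection string -- adjust driver/server/database for your environment.
conn = pyodbc.connect(
    'DRIVER={SQL Server};Server=localhost;Database=MyDatabase;Trusted_Connection=yes;',
    autocommit=True,
)
cursor = conn.cursor()
cursor.execute("EXEC dbo.myStoredProc")

# Consume every result set / row count the procedure emits so execution
# isn't abandoned partway through.
while cursor.nextset():
    pass

conn.close()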

PyODBC execute stored procedure does not complete

I have the following code; the stored procedure itself calls several other stored procedures. I can run the stored procedure and it completes without issues in SQL Server 2012. I am using Python 3.3.
cnxn = pyodbc.connect('DRIVER={SQL Server};Server=.\SQLEXPRESS;Database=MyDatabase;Trusted_Connection=yes;')
cursor = cnxn.cursor()
cnxn.timeout = 0
cnxn.autocommit = True
cursor.execute("""exec my_SP""")
The Python code is executing; I have determined this by inserting numerous print statements.
I did see the other question about making Python wait for the SP to finish. I tried adding a time.sleep() after the execute and varying the delay (up to 120 seconds), with no change.
The stored procedure appears to be partially executing, based on the results. The data suggests that it is even interrupting one of the sub-stored procedures, yet it is fine when the SP is run from query analyzer.
My best guess would be that this is something SQL config related, but I am lost in where to look.
Any thoughts?
Adding SET NOCOUNT OFF to my proc worked for me.
I had the same issue and solved it with a combination of setting a locking variable (see the answer from Ben Caine in this thread: make python wait for stored procedure to finish executing) and adding SET NOCOUNT ON right after CREATE PROCEDURE ... AS.
Just a follow-up: I have had limited success using the timing features at the link below and by reducing the nesting of stored procedures.
At the level I was calling from in the code above, there were 4 layers of nested SPs; pyodbc seems to behave a little better when you have 3 layers or fewer. It doesn't make a lot of sense to me, but it works.
make python wait for stored procedure to finish executing
Any input on the rationale behind this would be greatly appreciated.

Python/Hive interface slow with fetchone(), hangs with fetchall()

I have a python script that is querying HiveServer2 using pyhs2, like so:
import pyhs2

conn = pyhs2.connect(host='localhost',
                     port=10000,
                     user='user',
                     password='password',
                     database='default')
cur = conn.cursor()
cur.execute("SELECT name,data,number,time FROM table WHERE date = '2014-01-01' AND number in (1,5,6,22) ORDER BY name,time ASC")

line = cur.fetchone()
while line is not None:
    # <do some processing, including writing to stdout>
    # ...
    line = cur.fetchone()
I have also tried using fetchall() instead of fetchone(), but that just seems to hang forever.
My query runs just fine and returns ~270 million rows. For testing, I dumped the output from Hive into a flat, tab-delimited file and wrote the guts of my Python script against that, so I didn't have to wait for the query to finish every time I ran it. The script that reads the flat file finishes in ~20 minutes. What confuses me is that I don't see the same performance when I query Hive directly; in fact, it takes about 5 times longer to finish processing. I am pretty new to Hive and Python, so maybe I am making some bone-headed error, but the examples I see online show a setup like this. I just want to iterate through my Hive results, getting one row at a time as quickly as possible, much like I did with my flat file. Any suggestions?
P.S. I have found this question that sounds similar:
Python slow on fetchone, hangs on fetchall
but that ended up being a SQLite issue, and I have no control over my Hive set up.
Have you considered using fetchmany()?
That is the DB-API way to pull data in chunks: bigger than one row, where per-row overhead is an issue, and smaller than all rows, where memory is an issue.
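A rough sketch of what that looks like with the cursor from the question, assuming pyhs2's cursor supports the standard DB-API fetchmany(); the chunk size is something to tune:

CHUNK_SIZE = 10000  # large enough to amortize round-trip overhead, small enough to fit comfortably in memory

cur.execute("SELECT name,data,number,time FROM table "
            "WHERE date = '2014-01-01' AND number in (1,5,6,22) "
            "ORDER BY name,time ASC")

while True:
    rows = cur.fetchmany(CHUNK_SIZE)
    if not rows:
        break
    for line in rows:
        # do some processing, including writing to stdout
        pass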
