pyodbc connection.close() very slow with Access Database - python

I am using pyodbc to open a connection to a Microsoft Access database file (.accdb) and run a SELECT query to create a pandas dataframe. I acquired the ODBC drivers from here. I have no issues with retrieving and manipulating the data but closing the connection at the end of my code can take 10-15 seconds which is irritating. Omitting conn.close() fixes my problem and prior research indicates that it's not critical to close my connection as I am the only one accessing the file. However, my concern is that I may have unexpected hangups down the road when I integrate the code into my tkinter gui.
import pyodbc
import pandas as pd
import time
start = time.time()
db_fpath = r'C:\mydb.accdb'
conn = pyodbc.connect(r'Driver={{Microsoft Access Driver (*.mdb, *.accdb)}};DBQ={0};'.format(db_fpath))
df = pd.read_sql_query('SELECT * FROM table', conn)
conn.close()
print(time.time()-start)
I get the following results:
15.27361798286438 # with conn.close()
0.4076552391052246 # without conn.close()
If I wrap the code in a function but omit the call to conn.close(), I still encounter the hangup (presumably because the connection is garbage-collected when the function returns), which makes me believe that releasing the connection from memory is what causes the slowdown.
If I omit the SQL query (open the connection and then close it without doing anything), there is no hangup.
Can anyone duplicate my issue?
Is there something I can change about my connection or method of closing to avoid the slowdown?
EDIT: After further investigation, Dropbox was not the issue. I am actually using pd.read_sql_query() 3 times in succession using conn to import 3 different tables from my database. Trying different combinations of closing/not closing the connection and reading different tables (and restarting the kernel between tests), I determined that only when I read one specific table can I cause the connection closing to take significantly longer. Without understanding the intricacies of the ODBC driver or what's different about that table, I'm not sure I can do anything more. I would upload my database for others to try but it contains sensitive information. I also tried switching to pypyodbc to no effect. I think the original source of the table was actually a much older .mdb Access Database so maybe remaking that table from scratch would solve my issue.
At this point, I think the simplest solution is just to maintain the connection object in memory always to avoid closing it. My initial testing indicates this will work out although it is a bit of a pain.
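Roughly what I have in mind is a lazily created, module-level connection that every query reuses and that only gets closed when the program exits. This is just a sketch of the idea; get_connection() and read_table() are helper names I made up, and the path is a placeholder:

import atexit
import pyodbc
import pandas as pd

DB_FPATH = r'C:\mydb.accdb'  # placeholder path
_conn = None                 # module-level connection, created lazily and reused

def get_connection():
    """Return the shared connection, creating it on first use."""
    global _conn
    if _conn is None:
        _conn = pyodbc.connect(
            r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=' + DB_FPATH + ';')
        # Close once, at interpreter exit, instead of after every query
        atexit.register(_conn.close)
    return _conn

def read_table(table_name):
    """Read one table into a DataFrame using the shared connection."""
    return pd.read_sql_query('SELECT * FROM [{0}]'.format(table_name), get_connection())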

Related

Python code doesn't run the SQL stored procedure completely

I am not proficient in Python, but I have written a Python script that executes a SQL Server stored procedure which itself calls multiple stored procedures, so it usually takes 5 minutes or so to run in SSMS.
When I run the Python code, I can see that the stored procedure only runs about halfway through without raising an error, which makes me think it somehow needs more time to execute when called from Python.
I found other posts where people suggested subprocess, but I don't know how to code this. Below is an example of a Python script (not mine) that executes a stored procedure.
mydb_lock = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                           'Server=localhost;'
                           'Database=InterelRMS;'
                           'Trusted_Connection=yes;'
                           'MARS_Connection=yes;'
                           'user=sa;'
                           'password=Passw0rd;')
mycursor_lock = mydb_lock.cursor()
sql_nodes = "Exec IVRP_Nodes"
mycursor_lock.execute(sql_nodes)
mydb_lock.commit()
How can I edit the above code to use subprocess? Is subprocess the right choice? Is there any other method you can suggest?
Many thanks.
Python 2.7 and 3
SQL Server
UPDATE 04/04/2022:
@AlwaysLearning, I tried
NEWcnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password+';Connection Timeout=0')
But there was no change. To check how far the code gets, I inserted the following two lines right after each other inside the nested procedure, at the point where I thought the SP stopped.
INSERT INTO CheckTable (OrgID,Stage,Created) VALUES(@OrgID,2.5331,getdate())
INSERT INTO CheckTable (OrgID,Stage,Created) VALUES(@OrgID,2.5332,getdate())
Only the first query is completed. I use Azure DB if that helps.
UPDATE 05/04/2022:
I tried what @AlwaysLearning suggested: after my connection, I added NEWcnxn.timeout = 4000 and it's working now. Many thanks.
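For anyone else who lands here, this is roughly the pattern that worked for me. The server/database/user/password values below are placeholders; note that Connection Timeout in the connection string is only the login timeout, while the timeout attribute is the query timeout that actually mattered:

import pyodbc

# Placeholders -- substitute your own Azure SQL connection details
server = 'myserver.database.windows.net'
database = 'InterelRMS'
username = 'sa'
password = 'Passw0rd'

NEWcnxn = pyodbc.connect(
    'DRIVER={ODBC Driver 13 for SQL Server};'
    'SERVER=' + server + ';DATABASE=' + database + ';'
    'UID=' + username + ';PWD=' + password + ';'
    'Connection Timeout=0')

# Raise the query timeout (in seconds) so the long-running nested
# procedure is not cancelled partway through.
NEWcnxn.timeout = 4000

cursor = NEWcnxn.cursor()
cursor.execute("Exec IVRP_Nodes")
NEWcnxn.commit()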

What is the proper way of using MySQLdb connections and cursors across multiple functions in Python

I'm kind of new to Python and its MySQLdb connector.
I'm writing an API to return some data from a database using the RESTful approach. In PHP, I wrapped the Connection management part in a class, acting as an abstraction layer for MySQL queries.
In Python:
I define the connection early on in the script: con = mdb.connect('localhost', 'user', 'passwd', 'dbname')
Then, in all subsequent methods:
import MySQLdb as mdb

def insert_func():
    with con:
        cur = con.cursor(mdb.cursors.DictCursor)
        cur.execute("INSERT INTO table (col1, col2, col3) VALUES (%s, %s, %s)", (val1, val2, val3))
        rows = cur.fetchall()
        # do something with the results
        return someval
etc.
I use mdb.cursors.DictCursor because I prefer to be able to access database columns in an associative array manner.
Now the problems start popping up:
in one function, I issue an insert query to create a 'group' with unique 'groupid'.
This 'group' has a creator. Every user in the database holds a JSON array in the 'groups' column of his/her row in the table.
So when I create a new group, I want to assign the groupid to the user that created it.
I update the user's record using a similar function.
I've wrapped the 'insert' and 'update' parts in two separate function defs.
The first time I run the script, everything works fine.
The second time I run the script, the script runs endlessly (I suspect due to some idle connection to the MySQL database).
When I interrupt it using CTRL + C, I get one of the following errors:
"'Cursor' object has no attribute 'connection'"
"commands out of sync; you can't run this command now"
or any other KeyboardInterrupt exception, as would be expected.
It seems to me that these errors are caused by some erroneous way of handling connections and cursors in my code.
I read that it is good practice to use with con: so that the connection will automatically close itself after the query. I use with con: in each function, so the connection should be closed, but I also decided to define the connection globally so any function can use it, and that seems incompatible with the with con: context management. I suspect the cursor needs to be 'context managed' in a similar way, but I do not know how to do this (to my knowledge, PHP doesn't use cursors for MySQL, so I have no experience using them).
I now have the following questions:
Why does it work the first time but not the second? (it will however, work again, once, after the CTRL + C interrupt).
How should I go about using connections and cursors when using multiple functions (that can be called upon in sequence)?
I think there are two main issues going on here: one is in the Python code itself, and the other is the way you're structuring your interaction with the DB.
First, you're not closing your connection. How long it should stay open depends on your application's needs; you have to decide that. Reference this SO question:
from contextlib import closing

with closing(connection.cursor()) as cursor:
    ...  # use the cursor
# cursor closed. Guaranteed.
connection.close()
Right now, you have to interrupt your program with Ctrl+C because there's nothing that makes your with statement stop running.
Second, start thinking about your interactions with the DB in terms of 'transactions': do something, commit it to the DB; if it didn't work, roll back; if it did, close the connection. Here's a tutorial.
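Here is a rough sketch of that commit/rollback pattern with MySQLdb; insert_group() and its query are illustrative, not taken from your code:

from contextlib import closing
import MySQLdb as mdb

def insert_group(con, name):
    """Insert a row inside an explicit transaction: commit on success, roll back on failure."""
    try:
        with closing(con.cursor(mdb.cursors.DictCursor)) as cur:
            cur.execute("INSERT INTO groups (name) VALUES (%s)", (name,))
            group_id = cur.lastrowid
        con.commit()      # make the change permanent
        return group_id
    except mdb.Error:
        con.rollback()    # undo the partial work
        raise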
With connections, as with file handles, the rule of thumb is: open late, close early.
So I would recommend sharing connections only where they are trying to do one thing. Or, if you multiprocess, each process gets its own connection, again following open late, close early. And if you are doing a sequential operation (say, in a loop), open and close outside the loop. Having global connections can get messy, mainly because you then have to keep track of which function uses the connection at what time and what it tries to do with it.
The issue of "cannot run command now" is because your keyboard interrupt kills the active connection.
As to the first part of your question: the endless hang could be coming from anywhere. Each instance of Python gets its own connection, so when you run the script a second time it should get its own connection too. Open up a mysql client and run
show full processlist
to see what's going on.

Python Mysql Connector not getting new content

I am making a simple Python script which checks a MySQL table every x seconds and prints the result to the console. I use the MySQL Connector driver.
However, running the script only prints the initial values. By that I mean that if I change the values in the database while my script is running, the change is not picked up by the script and it keeps printing the initial values.
The code which retrieves the values in a while loop is as follows:
def get_fans():
    global cnx
    query = 'SELECT * FROM config'
    while True:
        cursor = cnx.cursor()
        cursor.execute(query)
        for (config_name, config_value) in cursor:
            print config_name, config_value
        print "\n"
        cursor.close()
        time.sleep(3)
Why is this happening?
Most likely, it's an autocommit issue. The MySQL Connector driver documentation states that it has autocommit turned off by default. Make sure you commit your implicit transactions when changing your table. Also, because the default isolation level is REPEATABLE READ, you have:
All consistent reads within the same transaction read the snapshot
established by the first read.
So I guess you have to manage transactions even for your polling script, or change the isolation level to READ COMMITTED.
The better way, though, is to restore the MySQL client's default autocommit-on mode. Even though PEP 249 suggests having it initially disabled, that is merely a proposal and most likely a serious design mistake: not only does it make novices wonder about uncommitted changes and make even read-only workloads slower, it also complicates data-layer design and breaks the "explicit is better than implicit" Python zen. Even sluggish things like Django have rethought it.
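Something along these lines should work for your polling loop; the connection parameters are placeholders, and the alternative of committing before each read is noted in a comment:

import time
import mysql.connector

# Placeholder connection details -- substitute your own
cnx = mysql.connector.connect(user='user', password='passwd',
                              host='127.0.0.1', database='dbname',
                              autocommit=True)  # no open transaction, so every SELECT sees fresh data

def get_fans():
    query = 'SELECT * FROM config'
    while True:
        cursor = cnx.cursor()
        # Alternatively, keep autocommit off and call cnx.commit() here so that
        # REPEATABLE READ starts a new snapshot before each poll.
        cursor.execute(query)
        for (config_name, config_value) in cursor:
            print(config_name, config_value)
        cursor.close()
        time.sleep(3)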

Python/Hive interface slow with fetchone(), hangs with fetchall()

I have a python script that is querying HiveServer2 using pyhs2, like so:
import pyhs2

conn = pyhs2.connect(host='localhost',
                     port=10000,
                     user='user',
                     password='password',
                     database='default')
cur = conn.cursor()
cur.execute("SELECT name,data,number,time FROM table WHERE date = '2014-01-01' AND number in (1,5,6,22) ORDER BY name,time ASC")
line = cur.fetchone()
while line is not None:
    # <do some processing, including writing to stdout>
    # ...
    line = cur.fetchone()
I have also tried using fetchall() instead of fetchone(), but that just seems to hang forever.
My query runs just fine and returns ~270 million rows. For testing, I dumped the output from Hive into a flat, tab-delimited file and wrote the guts of my Python script against that, so I didn't have to wait for the query to finish every time I ran it. The script that reads the flat file finishes in ~20 minutes. What confuses me is that I don't see the same performance when I query Hive directly; in fact, it takes about 5 times longer to finish processing. I am pretty new to Hive and Python, so maybe I am making some bone-headed error, but the examples I see online show a setup like this. I just want to iterate through my Hive results, getting one row at a time as quickly as possible, much like I did with my flat file. Any suggestions?
P.S. I have found this question that sounds similar:
Python slow on fetchone, hangs on fetchall
but that ended up being a SQLite issue, and I have no control over my Hive set up.
Have you considered using fetchmany()?
That would be the DB-API answer for pulling data in chunks: bigger than single rows, where the per-row overhead is an issue, and smaller than all rows, where memory is an issue.
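A rough sketch of that loop, assuming pyhs2's cursor implements the DB-API fetchmany() as described; the batch size is an arbitrary starting point to tune:

BATCH_SIZE = 10000  # bigger batches mean fewer round trips, more memory

cur.execute("SELECT name,data,number,time FROM table "
            "WHERE date = '2014-01-01' AND number in (1,5,6,22) "
            "ORDER BY name,time ASC")
while True:
    rows = cur.fetchmany(BATCH_SIZE)   # pull one chunk at a time
    if not rows:
        break
    for line in rows:
        pass  # <do some processing, including writing to stdout>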

Python in Windows: large number of inserts using pyodbc causes memory leak

I am trying to populate an MS SQL 2005 database using Python on Windows. I am inserting millions of rows, and by 7 million I am using almost a gigabyte of memory. The test below eats up 4 MB of RAM for each 100k rows inserted:
import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
cursor = connection.cursor()
connection.autocommit = True
while 1:
    cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", 1, 2, 3, 4, 5, 6)
connection.close()
Hack solution: I ended up spawning a new process using the multiprocessing module to get the memory back when the process exits. I'm still confused about why inserting rows this way consumes so much memory. Any ideas?
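Roughly what my hack looks like; insert_batch() and batches_of_rows() are illustrative names, not my exact code:

from multiprocessing import Process
import pyodbc

def insert_batch(rows):
    """Runs in a child process, so all memory it allocates is freed when the process exits."""
    connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
    connection.autocommit = True
    cursor = connection.cursor()
    for row in rows:
        cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", row)
    connection.close()

if __name__ == '__main__':
    for batch in batches_of_rows():      # hypothetical generator yielding lists of row tuples
        worker = Process(target=insert_batch, args=(batch,))
        worker.start()
        worker.join()                    # one worker at a time, so memory is capped at one batch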
I had the same issue, and it looks like a pyodbc issue with parameterized inserts: http://code.google.com/p/pyodbc/issues/detail?id=145
Temporarily switching to a static insert with the VALUES clause populated eliminates the leak, until I can try a build from the current source.
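A sketch of that workaround, building the VALUES clause as a literal string so pyodbc binds no parameters (only safe for trusted data, since it bypasses parameter escaping):

# Parameterized version that leaks (per the pyodbc issue above):
# cursor.execute("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", 1, 2, 3, 4, 5, 6)

# Static workaround: inline the values so no parameters are bound.
values = (1, 2, 3, 4, 5, 6)
sql = "insert into x (a,b,c,d,e,f) VALUES ({0})".format(
    ",".join(str(v) for v in values))
cursor.execute(sql)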
I faced the same problem too.
I had to read more than 50 XML files, each about 300 MB, and load them into SQL Server 2005.
I tried the following:
Using the same cursor by dereferencing.
Closing/opening the connection.
Setting the connection to None.
Finally I ended up bootstrapping each XML file load using the Process module.
Now I have replaced that process with IronPython and System.Data.SqlClient.
This gives better performance and also a better interface.
Maybe close & re-open the connection every million rows or so?
Sure it doesn't solve anything, but if you only have to do this once you could get on with life!
Try creating a separate cursor for each insert. Reuse the cursor variable each time through the loop to implicitly dereference the previous cursor. Add a connection.commit() after each insert.
You may only need something as simple as a time.sleep(0) at the bottom of each loop to allow the garbage collector to run.
You could also try forcing a garbage collection every once in a while with gc.collect() after importing the gc module.
Another option might be to use cursor.executemany() and see if that clears up the problem. The nasty thing about executemany(), though, is that it takes a sequence rather than an iterator (so you can't pass it a generator). I'd try the garbage collector first.
EDIT: I just tested the code you posted, and I am not seeing the same issue. Are you using an old version of pyodbc?
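If you want to try the executemany() route, here is a minimal sketch under the same table layout as your test; rows_to_insert is a hypothetical iterable, and it batches into lists because executemany() needs a sequence rather than a generator:

import gc
import pyodbc

connection = pyodbc.connect('DRIVER={SQL Server};SERVER=x;DATABASE=x;UID=x;PWD=x')
connection.autocommit = True
cursor = connection.cursor()

BATCH_SIZE = 100000
batch = []
for row in rows_to_insert:               # hypothetical iterable of (a,b,c,d,e,f) tuples
    batch.append(row)
    if len(batch) >= BATCH_SIZE:
        cursor.executemany("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", batch)
        del batch[:]                      # reuse the list
        gc.collect()                      # nudge the collector between batches, per the suggestion above
if batch:
    cursor.executemany("insert into x (a,b,c,d,e,f) VALUES (?,?,?,?,?,?)", batch)
connection.close()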
