Make Python access to MS Access faster on Windows

I am on Windows Server 2003, accessing a locally stored MS Access 2000 MDB from Python 2.5.4 scripts using pyodbc 2.1.5.
Database access is very slow this way (I am on a fast machine and all other db operations are normal), and I wonder if there is a better way to access the MDB from Python. Maybe a better ODBC driver?
This is an example of the kind of script I use:
import pyodbc
# open the .mdb through the Microsoft Access ODBC driver
cstring = r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=t:\data.mdb'
conn = pyodbc.connect(cstring)
cursor = conn.cursor()
sql = "UPDATE ..."
cursor.execute(sql)
conn.commit()
conn.close()

Try setting up your connection once at program startup and then reusing it everywhere, rather than closing it after every execute or commit.
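A minimal sketch of that pattern, reusing the connection string from the question (run_update is just an illustrative helper name):

import pyodbc

# open the connection once at startup...
conn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=t:\data.mdb')

def run_update(sql):
    # ...and reuse it for every statement instead of reconnecting each time
    cursor = conn.cursor()
    cursor.execute(sql)
    conn.commit()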

Tony's suggestion makes the most sense to me. However, if it's not enough, you could also try a later version of the driver, such as this one that works with Office 2007 files (as well as older versions, of course). You can download and install it even if you don't have Office.
Once you have it installed, try a connection string like this (with pyodbc you need the ODBC form of the string, not the OLE DB provider string):
DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=T:\data.mdb;
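For example, assuming the 32-/64-bit build of the ACE driver matches your Python build, the question's script only needs the driver name changed:

import pyodbc

# the ACE driver opens both .mdb and .accdb files
cstring = r'DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=t:\data.mdb'
conn = pyodbc.connect(cstring)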

Related

pyodbc fast_executemany with Access ODBC crashes Python interpreter

I'm trying to generate and insert many (more than 1,000,000) rows into an MS Access database. For the generation I use NumPy functions, so I want to access the database from Python. I started with pyodbc:
import numpy as np
import pyodbc as db

connection_string = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Users.../DataCreation.accdb;"
connection = db.connect(connection_string)
cur = connection.cursor()
k = 0
numberofdatasets = 1000
for l in range(50):
    params = np.empty(numberofdatasets, dtype=[('valnr', int), ('val', float)])
    for j in range(numberofdatasets):
        params[j] = (k, np.random.random())  # stand-in for "some value generated with a numpy function"
        k = k + 1
    params = np.array(params).tolist()
    cur.executemany("INSERT INTO DataFinal VALUES (1,?,1,?);", params)
connection.commit()
connection.close()
This works, but takes way too long to be useful to me. I timed it, and the problem is the
cur.executemany
I searched the internet and found the fast_executemany flag. But when I add the line
cur.fast_executemany = True
my kernel dies. Does anyone have an idea why? I'm using 64-bit Windows 10, Python 3.6, Spyder 3.2.8 and MS Access 2016. Please don't suggest not using MS Access; I'm aware there are more efficient databases for this, but right now it is all I can use. I am also aware that it might not be best to first generate the numpy array and then turn it into a list. My next try was turbodbc and its function
cursor.executemanycolumns
but this threw an error from the driver, so I believe that is a different problem. Any help is appreciated, but maybe I should add that I just started using Python with databases, and I prefer to understand the problem at least a bit rather than just copy some mystery code :) Thanks.
The pyodbc fast_executemany feature uses an ODBC mechanism called "parameter arrays". Not all ODBC drivers support parameter arrays, and apparently the Microsoft Access ODBC driver is one that doesn't. As mentioned in the pyodbc Wiki:
Note that this feature ... is currently only recommended for applications running on Windows that use Microsoft's ODBC Driver for SQL Server.
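If the same script sometimes runs against a driver that does support parameter arrays, a minimal sketch is to enable the flag conditionally rather than unconditionally (connection, connection_string and params as in the question):

cur = connection.cursor()
# enable fast_executemany only for a driver known to support parameter arrays,
# e.g. Microsoft's ODBC Driver for SQL Server; leave it off for the Access driver
if 'sql server' in connection_string.lower():
    cur.fast_executemany = True
cur.executemany("INSERT INTO DataFinal VALUES (1,?,1,?);", params)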

Using jaydebeapi3 to connect to Apache Phoenix

I have a program in which I have been using the phoenixdb package developed by Lukas Lalinsky, but during the past few days it seems to have become very unstable. I think this is due to the size of the database (it is constantly growing). By unstable I mean that around half my queries fail with a runtime exception.
So I have moved on and tried to find a more stable way to connect to my Phoenix "server", and I want to try a JDBC connection. As far as I understand, Phoenix should have great integration with JDBC.
I do however have problems with understanding how to set up the initial connection.
I read the Usage section of the JayDeBeApi documentation, but I don't know what the driver class is, where it is located, whether I have to download it myself, or how to set it up.
I was hoping someone in here would know and hopefully explain it in detail.
Thanks!
EDIT:
I've managed to figure out that my connect statement should be something along these lines:
import jaydebeapi as jdbc
conn = jdbc.connect('org.apache.phoenix.jdbc.PhoenixDriver', ['jdbc:phoenix:<ip>:<port>:', '', ''], '<location-of-phoenix-client.jar>')
However I still don't know where to get my hands on that phoenix-client.jar file and how to reference to it.
I managed to find the solution after having set up a Java project and testing out JDBC in that development environment and getting a successful connection.
To get the JDBC connection working in Java I used the JDBC driver found in the Phoenix distribution from Apache here. I used the driver that matched my Phoenix and HBase versions - phoenix-4.9.0-HBase-1.2-client.jar
Once that setup was completed and I could connect to Phoenix using Java I started trying to set it up using Python. I started a connection to Phoenix with the following:
import jaydebeapi as jdbc
import os

cwd = os.getcwd()
jar = cwd + '/phoenix-4.9.0-HBase-1.2-client.jar'     # path to the Phoenix client jar
drivername = 'org.apache.phoenix.jdbc.PhoenixDriver'  # JDBC driver class inside that jar
url = 'jdbc:phoenix:<ip>:<port>/'                     # JDBC URL pointing at the Phoenix/HBase cluster
conn = jdbc.connect(drivername, url, jar)
Now I had a successful connection through JDBC to Phoenix using Python. Hope someone else out there can use this question in the future.
I created a cursor and could issue commands like the following:
cursor = conn.cursor()
sql = """SELECT ...."""
cursor.execute(sql)
resp = cursor.fetchone() # could use .fetchall() or .fetchmany() if needed
I hope this helps someone out there!

pyODBC insert failing silently

I'm using Python 3.4 (ActiveState) and pyodbc 3.0.7 on a Windows 7 box to connect to a SQL Server 2008 RC2 database running on Windows NT 6.1.
The problem I'm having is that the code below fails silently. No changes are made to the database.
connection = pyodbc.connect("DRIVER={SQL Server};SERVER=(local);DATABASE=Kerb;UID=sa;PWD=password", autocommit=True)
cursor = connection.cursor()
cursor.execute('''INSERT INTO [My].[Sample] (Case) VALUES (1);''')
I've also attempted to force the insert with a commit statement (which, unless I'm mistaken, shouldn't be necessary given autocommit=True); this also fails with no output.
cursor.execute('''INSERT INTO [My].[Sample] (Case) VALUES (1);''')
cursor.commit()
So my solution so far has been to add a sleep, which has solved the problem. But I worry about implementing this solution in production as it doesn't take into account network lag, etc.
cursor.execute('''INSERT INTO [My].[Sample] (Case) VALUES (1);''')
time.sleep(1)
I believe my question may be related to:
pyODBC and SQL Server 2008 and Python 3
If anyone has any ideas for further debugging or has documentation regarding this bit of asynchronous behavior I would love to hear it.
Thanks!
Unfortunately it appears that PyODBC cannot execute insert statements without the use of a timeout. I have started using PyMSSQL and the timeout is no longer required for a successful commit.
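A minimal sketch of that switch, assuming the pymssql package is installed and using the same server, credentials, and table as the question ('localhost' stands in for the question's (local), and Case is bracketed because it is a reserved word):

import pymssql

conn = pymssql.connect(server='localhost', user='sa', password='password', database='Kerb')
cur = conn.cursor()
cur.execute("INSERT INTO [My].[Sample] ([Case]) VALUES (1);")
conn.commit()   # pymssql does not autocommit by default
conn.close()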

Python Connect to Oracle DB

I currently use PYODBC to connect to MS SQL Server and MYSQL, but now need to access an Oracle database as well.
I have Oracle SQL Developer installed on my work computer (but there doesn't seem to be a separate Net Manager client, per other SO posts), which I can use to access the DB.
Ideally, I would run what I need to in python, but am having difficulties. As it stands, I have created a linked server object to the Oracle DB in a MS SQL Server DB as a work around, but this isn't ideal.
What do I need to do to get PYODBC (or a substitute) to connect to Oracle? Thanks very kindly.
I ran into the same issue where I could connect to a database via Oracle SQL Developer but not via pyodbc. Someone else did most of the database setup, so I wasn't sure of the proper connection parameters. I'll run you through how I was able to connect on a Windows computer.
In the Start Menu I typed "odbc" and selected "Microsoft ODBC Administrator". Under the "System DSN" tab I found my DSN name (we'll call it myDSN) and its corresponding driver (mine was "Oracle in OraClient11g_home2"). I also had to specify a username and password for my database, so my connection line now looks like this:
cnxn = pyodbc.connect(driver='{Oracle in OraClient11g_home2}', dsn='myDSN', uid='HODOR', pwd='hodor')
Maybe at this point it will work for you, but I still wasn't able to connect. This computer is a mess of 32- and 64-bit drivers, so I figured I was pointing to the wrong one. So, once again into the Start Menu, where under All Programs I found a folder called "Oracle in OraClient11g_home2" and, right under it, one called "Oracle in OraClient11g_home32Bit". I changed my connection line in Python to the following:
cnxn = pyodbc.connect(driver='{Oracle in OraClient11g_home32Bit}', dsn='myDSN', uid='HODOR', pwd='hodor')
And it connected.
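As an aside, recent pyodbc versions can list the ODBC drivers visible to your Python process, which helps spot 32-/64-bit mismatches like the one above (a short sketch, assuming pyodbc.drivers() is available in your version):

import pyodbc

# prints the driver names this Python build can actually use,
# e.g. 'Oracle in OraClient11g_home32Bit'
for name in pyodbc.drivers():
    print(name)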

SQL queries through PYODBC fail silently on one machine, works on another

I am working on a program to automate parsing data from XML files and storing it into several databases. (Specifically the USGS realtime water quality service, if anyone's interested, at http://waterservices.usgs.gov/rest/WaterML-Interim-REST-Service.html) It's written in Python 2.5.1 using LXML and PYODBC. The databases are in Microsoft Access 2000.
The connection function is as follows:
def get_AccessConnection(db):
    connString = 'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=' + db
    cnxn = pyodbc.connect(connString, autocommit=False)
    cursor = cnxn.cursor()
    return cnxn, cursor
where db is the filepath to the database.
The program:
a) opens the connection to the database
b) parses 2 to 8 XML files for that database and builds the values from them into a series of records to insert into the database (using a nested dictionary structure, not a user-defined type)
c) loops through the series of records, cursor.execute()-ing an SQL query for each one
d) commits and closes the database connection
If the cursor.execute() call throws an error, it writes the traceback and the query to the log file and moves on.
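In outline, the relevant loop looks something like the following sketch (insert_queries and logfile are illustrative stand-ins for the structures described above, not the actual names in the program):

import traceback

cnxn, cursor = get_AccessConnection(db)
for sql in insert_queries:
    try:
        cursor.execute(sql)
    except Exception:
        # write the traceback and the failing query to the log, then move on
        logfile.write(traceback.format_exc() + '\n' + sql + '\n')
cnxn.commit()   # commit once at the end...
cnxn.close()    # ...and close the connection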
When my coworker runs it on his machine, for one particular database, specific records will simply not be there, with no errors recorded. When I run the exact same code on the exact same copy of the database over the exact same network path from my machine, all the data that should be there is there.
My coworker and I are both on Windows XP computers with Microsoft Access 2000 and the same versions of Python, lxml, and pyodbc installed. I have no idea how to check whether we have the same version of the Microsoft ODBC drivers. I haven't been able to find any difference between the records that are there and the records that aren't. I'm in the process of testing whether the same problem happens with the other databases, and whether it happens on a third coworker's computer as well.
What I'd really like to know is ANYTHING anyone can think of that would cause this, because it doesn't make sense to me. To summarize: Python code executing SQL queries silently fails for half of them on one computer and works perfectly on another.
Edit:
No more problem. I just had my coworker run it again, and the database was updated completely with no missing records. Still no idea why it failed in the first place, nor whether or not it will happen again, but "problem solved."
"I have no idea how to check whether we have the same version of the Microsoft ODBC drivers."
I think you're looking for Control Panel | Administrative Tools | Data Sources (ODBC). Click the "Drivers" tab.
I think either Access 2000 or Office 2000 shipped with a desktop edition of SQL Server called "MSDE". Might be worth installing that for testing. (Or production, for that matter.)
