I'm trying to generate and insert many (more than 1,000,000) rows into an MS Access database. Because I generate the values with numpy functions, I want to access the database from Python. I started with pyodbc:
import numpy as np
import pyodbc as db

connection_string = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Users.../DataCreation.accdb;"
connection = db.connect(connection_string)
cur = connection.cursor()

k = 0
numberofdatasets = 1000
for l in range(50):
    params = np.empty(numberofdatasets, dtype=[('valnr', int), ('val', float)])
    for j in range(numberofdatasets):
        params[j] = (k, np.random.rand())  # placeholder: some value generated with a numpy function
        k = k + 1
    params = params.tolist()  # executemany expects a sequence of tuples
    cur.executemany("INSERT INTO DataFinal VALUES (1,?,1,?);", params)
    connection.commit()
connection.close()
This works, but takes far too long to be useful to me. I timed it and the problem is the
cur.executemany
I searched the internet and found the fast_executemany flag. But when I add the line
cur.fast_executemany = True
my kernel dies. Does anyone have an idea why? I'm using 64-bit Windows 10, Python 3.6, Spyder 3.2.8 and MS Access 2016. Please don't suggest not using MS Access; I'm aware there are more efficient databases for this, but right now it's all I can use. I'm also aware that it might not be best to first generate the numpy array and then turn it into a list. My next try was turbodbc and its function
cursor.executemanycolumns
but this threw an error from the driver, so I believe that is a different problem. Any help is appreciated, but I should perhaps add that I have only just started using Python with databases, and I prefer to understand the problem at least a bit rather than just copy some mystery code :) Thanks.
The pyodbc fast_executemany feature uses an ODBC mechanism called "parameter arrays". Not all ODBC drivers support parameter arrays, and apparently the Microsoft Access ODBC driver is one that doesn't. As the pyodbc wiki mentions:
Note that this feature ... is currently only recommended for applications running on Windows that use Microsoft's ODBC Driver for SQL Server.
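For what it's worth, on a driver that does support parameter arrays (e.g., Microsoft's ODBC Driver for SQL Server) the flag is simply set on the cursor before calling executemany. A minimal sketch; the driver, server, and database names below are assumptions for illustration:
import pyodbc

# Hypothetical SQL Server target; driver/server/database names are assumptions.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;"
    "DATABASE=mydb;Trusted_Connection=yes;"
)
cur = conn.cursor()
cur.fast_executemany = True  # safe here because this driver supports parameter arrays
cur.executemany("INSERT INTO DataFinal VALUES (1,?,1,?);", [(1, 0.5), (2, 0.7)])
conn.commit()
conn.close()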
Related
I'm trying to connect to an old-school MS SQL Server via jTDS for a variety of analysis tasks: first just using Python with SQLAlchemy, and also using Tableau and Presto.
Focusing on SQLAlchemy first, at the moment I'm getting an error of:
Data source name not found and no default driver specified
Based on this thread, Connecting to SQL Server 2012 using sqlalchemy and pyodbc, I tried the following:
import urllib
import sqlalchemy as sa

params = urllib.parse.quote_plus("DRIVER={FreeTDS};"
                                 "SERVER=x-y.x.com;"
                                 "DATABASE=;"
                                 "UID=user;"
                                 "PWD=password")
engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
Connecting works fine through DBeaver, using a jTDS SQL Server (MSSQL) driver (which is labelled as legacy).
Curious as to how to resolve this issue, I'll keep researching away, but would appreciate any help.
I imagine there is an old driver I need to track down online and integrate into SQLAlchemy to begin with, and then perhaps migrate this data to something newer.
Appreciate your time.
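As a side note for anyone hitting the same message: "Data source name not found and no default driver specified" usually means the name in DRIVER={...} does not match any ODBC driver actually registered on the machine. A quick way to check what pyodbc can see (a diagnostic sketch, not a fix in itself):
import pyodbc

# Prints the ODBC driver names registered on this machine; "FreeTDS" will
# only appear here if FreeTDS is installed and registered with the ODBC manager.
print(pyodbc.drivers())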
I am trying to switch a pyodbc connection to sqlalchemy engine. My working pyodbc connection is:
con = pyodbc.connect('DSN=SageLine50v23;UID=#####;PWD=#####;')
This is what I've tried.
con = create_engine('pyodbc://'+username+':'+password+'#'+url+'/'+db_name+'?driver=SageLine50v23')
I am trying to connect to my Sage 50 accounting data but just can't work out how to build the connection string. This is where I downloaded the odbc driver https://my.sage.co.uk/public/help/askarticle.aspx?articleid=19136.
I got some original help for the pyodbc connection (which is working) from this website: https://www.cdata.com/kb/tech/sageuk-odbc-python-linux.rst, but I would like to use SQLAlchemy for its connection with pandas. Any ideas? I assume the issue is with the pyodbc:// part.
According to this thread Sage 50 uses MySQL to store its data. However, Sage also provides its own ODBC driver which may or may not use the same SQL dialect as MySQL itself.
SQLAlchemy needs to know which SQL dialect to use, so you could try using the mysql+pyodbc://... prefix for your connection URI. If that doesn't work (presumably because "Sage SQL" is too different from "MySQL SQL") then you may want to ask Sage support if they know of a SQLAlchemy dialect for their product.
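A minimal sketch of that idea, reusing the DSN that already works with plain pyodbc (the DSN name and masked credentials are taken from the question; whether the dialect actually matches is the open question):
import urllib.parse
import sqlalchemy as sa

# Pass the working pyodbc connection string through odbc_connect and tell
# SQLAlchemy to treat the backend as MySQL. This only helps if the Sage
# driver accepts a close-enough dialect of MySQL's SQL.
params = urllib.parse.quote_plus("DSN=SageLine50v23;UID=#####;PWD=#####;")
engine = sa.create_engine("mysql+pyodbc:///?odbc_connect={}".format(params))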
I have a program in which I have been using the phoenixdb package developed by Lukas Lalinsky, but during the past few days it seems to have become very unstable. I think this is due to the size of the database, as it is constantly growing. By unstable I mean that around half of my queries fail with a runtime exception.
So I have moved on and tried to find a more stable way to connect to my Phoenix "server", and I want to try out a JDBC connection. As far as I understand, Phoenix should have great integration with JDBC.
I do, however, have problems understanding how to set up the initial connection.
I read the Usage section of the JayDeBeApi package, but I don't know what the driver class is, where it is located, whether I have to download it myself, how to set it up, and so forth.
I was hoping someone in here would know and hopefully explain it in detail.
Thanks!
EDIT:
I've managed to figure out that my connect statement should be something along these lines:
import jaydebeapi as jdbc
conn = jdbc.connect('org.apache.phoenix.jdbc.PhoenixDriver', ['jdbc:phoenix:<ip>:<port>:', '', ''], '<location-of-phoenix-client.jar>')
However, I still don't know where to get my hands on that phoenix-client.jar file, or how to reference it.
I managed to find the solution after setting up a Java project, testing JDBC in that development environment, and getting a successful connection.
To get the JDBC connection working in Java I used the JDBC driver found in the Phoenix distribution from Apache here. I used the driver that matched my Phoenix and HBase versions - phoenix-4.9.0-HBase-1.2-client.jar
Once that setup was completed and I could connect to Phoenix using Java I started trying to set it up using Python. I started a connection to Phoenix with the following:
import jaydebeapi as jdbc
import os

cwd = os.getcwd()
jar = cwd + '/phoenix-4.9.0-HBase-1.2-client.jar'  # client jar from the Phoenix distribution
drivername = 'org.apache.phoenix.jdbc.PhoenixDriver'
url = 'jdbc:phoenix:<ip>:<port>/'
conn = jdbc.connect(drivername, url, jar)  # the jar path tells JayDeBeApi where to load the driver class from
Now I had a successful connection through JDBC to Phoenix using Python. Hope someone else out there can use this question in the future.
I created a cursor using the following and could issue commands like in the following:
cursor = conn.cursor()
sql = """SELECT ...."""
cursor.execute(sql)
resp = cursor.fetchone() # could use .fetchall() or .fetchmany() if needed
I hope this helps someone out there!
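One small addition of my own, not from the original answer: jaydebeapi follows the DB-API qmark parameter style, so values can be bound rather than interpolated into the SQL string. The table and column names here are made up for illustration:
# Hypothetical table/column names, shown only to illustrate qmark binding.
cursor.execute("SELECT val FROM my_table WHERE id = ?", (42,))
row = cursor.fetchone()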
I'm using Python 3.4 (ActiveState) and pyodbc 3.0.7 on a Windows 7 box to connect to a SQL Server 2008 R2 database running on Windows NT 6.1.
The problem I'm having is that the code below fails silently. No changes are made to the database.
connection = pyodbc.connect("DRIVER={SQL Server};SERVER=(local);DATABASE=Kerb;UID=sa;PWD=password", autocommit=True)
cursor = connection.cursor()
cursor.execute('''INSERT INTO [My].[Sample] ([Case]) VALUES (1);''')
I've also attempted to force the insert with a commit statement (which, unless I'm mistaken, shouldn't be necessary due to autocommit=True); this also fails with no output.
cursor.execute('''INSERT INTO [My].[Sample] ([Case]) VALUES (1);''')
cursor.commit()
So my solution so far has been to add a sleep, which has solved the problem. But I worry about implementing this solution in production as it doesn't take into account network lag, etc.
cursor.execute('''INSERT INTO [My].[Sample] ([Case]) VALUES (1);''')
time.sleep(1)
I believe my question may be related to:
pyODBC and SQL Server 2008 and Python 3
If anyone has any ideas for further debugging or has documentation regarding this bit of asynchronous behavior I would love to hear it.
Thanks!
Unfortunately it appears that pyodbc cannot execute these insert statements without the added delay. I have started using pymssql and the delay is no longer required for a successful commit.
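For anyone making the same switch, a minimal pymssql equivalent of the connection above; the server, credentials, and table come from the question, so treat this as a sketch rather than a drop-in:
import pymssql

# 'localhost' replaces the ODBC-style '(local)'; adjust to your server.
conn = pymssql.connect(server='localhost', user='sa', password='password', database='Kerb')
cursor = conn.cursor()
cursor.execute("INSERT INTO [My].[Sample] ([Case]) VALUES (1);")
conn.commit()
conn.close()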
I am on a Windows Server 2003 machine, accessing a locally stored MS Access 2000 MDB from Python 2.5.4 scripts using pyodbc 2.1.5.
The DB access is very slow this way (I am on a fast machine and all other DB operations are normal), and I wonder if there is a better way to access the MDB from Python. Maybe a better ODBC driver?
This is an example script like I use:
import pyodbc

cstring = r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=t:\data.mdb'  # raw string keeps the backslash literal
conn = pyodbc.connect(cstring)
cursor = conn.cursor()
sql = "UPDATE ..."
cursor.execute(sql)
conn.commit()
conn.close()
Try setting up your connection once on program startup and then reusing it everywhere, rather than closing it after every execute or commit.
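A minimal sketch of that pattern (the UPDATE statement stays a placeholder, as in the question):
import pyodbc

# Open the connection once at program startup and keep it around.
conn = pyodbc.connect(r'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=t:\data.mdb')

def run(sql):
    # Reuse the long-lived connection for every statement instead of
    # reconnecting each time; commit per unit of work.
    cursor = conn.cursor()
    cursor.execute(sql)
    conn.commit()

# Call run("UPDATE ...") wherever needed, and close only at shutdown:
# conn.close()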
Tony's suggestion makes the most sense to me. However, if it's not enough, you could also try a later version of the driver, such as this one that works with Office 2007 files (as well as older versions, of course). You can download and install it even if you don't have Office.
Once you have it installed, try an ODBC connection string like this (the newer driver registers under a different name than the old one):
Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=T:\data.mdb;