I am using Python 2.7, pyodbc and MySQL 5.5, on Windows.
I have a query which returns millions of rows and I would like to process it in chunks using the fetchmany function.
Here is a portion of the code:
import pyodbc
connection = pyodbc.connect('Driver={MySQL ODBC 5.1 Driver};Server=127.0.0.1;Port=3306;Database=XXXX;User=root;Password=;Option=3;')
cursor_1 = connection.cursor()
strSQLStatement = 'SELECT x1, x2 from X'
cursor_1.execute(strSQLStatement)
# the error occurs here
x1 = cursor_1.fetchmany(10)
print x1
connection.close()
My problem:
I get the error "MySQL client ran out of memory".
I guess this is because cursor_1.execute tries to read everything into memory, so I tried the following (one by one), but to no avail:
In the ODBC Data Source Administrator (Administrative Tools) I ticked "Don't cache results of forward-only cursors".
connection.query("SET GLOBAL query_cache_size = 40000")
My question:
Does pyodbc have the possibility to run the query and serve the results only on demand?
The MySQL manual suggests invoking mysql with the --quick option. Can this also be done when not using the command line?
Thanks for your help.
P.S.: suggestions for an alternative MySQL module are also welcome, but I use Portable Python so my choice is limited.
Using MySQLdb with SSCursor will solve your issues.
Unfortunately the documentation isn't great, but SSCursor is mentioned in the user guide and you can find an example in this Stack Overflow question.
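A minimal sketch of the idea, reusing the connection details from the question (host, database and credentials are the question's placeholders):
import MySQLdb
import MySQLdb.cursors

# SSCursor streams rows from the server instead of buffering the whole
# result set client-side, so fetchmany() only pulls what you ask for.
connection = MySQLdb.connect(host='127.0.0.1', port=3306, db='XXXX',
                             user='root', passwd='',
                             cursorclass=MySQLdb.cursors.SSCursor)
cursor = connection.cursor()
cursor.execute('SELECT x1, x2 FROM X')
while True:
    rows = cursor.fetchmany(10)
    if not rows:
        break
    for row in rows:
        print row  # process each row here
connection.close()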
Use the LIMIT clause on the query string.
http://dev.mysql.com/doc/refman/5.5/en/select.html
By using
SELECT x1, x2 from X LIMIT 0,1000
You'll only get the first 1,000 records; then by doing:
SELECT x1, x2 from X LIMIT 1000,1000
you'd get the next 1,000 records. (MySQL's LIMIT takes an offset and a row count, so the second number stays at the chunk size while the first one grows; LIMIT 1000,2000 would return 2,000 rows, not the next 1,000.)
Loop this appropriately to get all your records. (I don't know Python, so I can't help there, but see the sketch below.)
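A hedged Python sketch of that loop, reusing cursor_1 from the question; the ORDER BY column is an assumption (without one, MySQL does not guarantee stable chunk boundaries between queries):
chunk_size = 1000
offset = 0
while True:
    # fixed row count, growing offset; ORDER BY keeps the paging deterministic
    query = 'SELECT x1, x2 FROM X ORDER BY x1 LIMIT %d, %d' % (offset, chunk_size)
    cursor_1.execute(query)
    rows = cursor_1.fetchall()
    if not rows:
        break
    for row in rows:
        print row  # process each row here
    offset += chunk_size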
I'm trying to generate and insert many (>1,000,000) rows into an MS Access database. For the generation I use numpy functions, so I am trying to access the database with Python. I started with pyodbc:
import numpy as np
import pyodbc as db

connection_string = "Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Users.../DataCreation.accdb;"
connection = db.connect(connection_string)
cur = connection.cursor()
k = 0
numberofdatasets = 1000
for l in range(50):
    params = np.empty(numberofdatasets, dtype=[('valnr', int), ('val', float)])
    for j in range(numberofdatasets):
        params[j] = (k, np.random.rand())  # stand-in for the value generated with a numpy function
        k = k + 1
    params = np.array(params).tolist()
    cur.executemany("INSERT INTO DataFinal VALUES (1,?,1,?);", params)
connection.commit()
connection.close()
This works, but takes way too long to be useful to me. I timed it, and the problem is the
cur.executemany
I searched the internet and found the fast_executemany flag. But when I add the line
cur.fast_executemany = True
my kernel dies. Does anyone have an idea why? I'm using 64-bit Windows 10, Python 3.6, Spyder 3.2.8 and MS Access 2016. Please don't suggest not using MS Access; I'm aware there are more efficient databases for this, but right now it is all I can use. I am also aware that it might not be best to first generate the numpy array and then turn it into a list. My next try was turbodbc and its function
cursor.executemanycolumns
but this threw an error from the driver and is therefore a different problem, I believe. Any help is appreciated, but maybe I should add that I have just started using Python with databases, and I prefer to understand the problem at least a bit rather than just copy some mystery code :) Thanks.
The pyodbc fast_executemany feature uses an ODBC mechanism called "parameter arrays". Not all ODBC drivers support parameter arrays, and apparently the Microsoft Access ODBC driver is one that doesn't. As mentioned in the pyodbc Wiki:
Note that this feature ... is currently only recommended for applications running on Windows that use Microsoft's ODBC Driver for SQL Server.
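For reference, a hedged sketch of how the flag would be used with a driver that does support parameter arrays; the SQL Server driver name, server and database are placeholder assumptions, and the params list stands in for the one built in the question:
import pyodbc

# fast_executemany is safe here because Microsoft's SQL Server ODBC driver
# supports parameter arrays; with the Access driver the same flag crashes.
cnxn = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                      'Server=myserver;Database=mydb;Trusted_Connection=yes;')
crsr = cnxn.cursor()
crsr.fast_executemany = True
params = [(0, 0.5), (1, 1.5)]  # stand-in for the (valnr, val) list from the question
crsr.executemany("INSERT INTO DataFinal VALUES (1,?,1,?);", params)
cnxn.commit()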
I'm relatively new to databases, so this question may have a simple answer (I've been searching for hours).
I want to write a Python script that pulls the SQL Code stored in SQL Server Management Studio. I can successfully connect to the database using pyodbc and run queries against the database tables, but I would like to be able to pull, for example, the SQL code stored in a procedure, function, view, etc. without running it.
This seems like something that would be relatively simple. I know it can be done in Powershell, but I would prefer to use Python. Is there some sort of module or pyodbc hack that will do this?
You can use the sp_helptext command in SQL Server, which will give you the SQL Server object source code line by line:
stored_proc_text = ""
res = cursor.execute('sp_helptext my_stored_procedure')
for row in res:
stored_proc_text += row[0]
print(stored_proc_text)
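If you want the source of every stored procedure rather than one named object, the same call can be looped over SQL Server's sys.procedures catalog view; a hedged sketch, reusing the cursor from above:
# List all stored procedures, then pull each one's source with sp_helptext.
names = [row.name for row in cursor.execute('SELECT name FROM sys.procedures')]
for name in names:
    rows = cursor.execute('EXEC sp_helptext ?', name).fetchall()
    print(''.join(r[0] for r in rows))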
Good luck!
I have been at this for a while, trying all kinds of different packages from open source, IBM, and many others. I have not yet found one that works without some sort of confusing install method that I cannot get to work, or some sort of integration with other third-party pieces that I cannot seem to get working.
I am simply trying to run SQL statements against an Informix server using Python, no different than with MySQL and other tools. Using cursors or full result dumps, I really do not care. I want to be able to build a query string statically or dynamically and then tell whatever tool/module to execute said query and return the results (if any).
I have tried:
ibm_db 2.0.5.1 (https://pypi.python.org/pypi/ibm_db)
IBM Informix Client SDK
pymssql
unixODBC
I looked at Jython (formerly JPython) but do not want to use it.
What I have managed:
I have been able to install and get the IBM Informix Client SDK installed and working. I can connect to my Informix DB server and perform queries.
I have mySQL working and connecting and querying.
I have written a Java program to perform queries using a Java driver, compiled it, and combined it with a bash script to perform queries and email the results.
I am just stumped. I am looking for assistance on what to download (URLs) and how to go about installing it (tips and tricks, environment variables, where to install it, etc.). I want something that does not depend on Java or writing Java. I am looking for a solution that will give me the ability to write Python to query, insert, update, and delete from an Informix database and its tables. I want to combine my previously written Java program and bash script into a Python script.
Frustrated and looking for any assistance.
Thank you for listening and please ask questions if you do not understand my plea.
Informix on Linux is a bag of pain. My personal setup to get Informix Connect working with CPython 3 stacks the Informix Client SDK with unixODBC and pyodbc. There are some hoops to jump through, none of which are documented. Almost all of the setup is completely useless yet required to keep some parts of the Informix driver from bailing out. Note that some options are case- and space-sensitive (Description=Informix != description = Informix).
Install the Informix Client SDK. You don't need all the garbage that comes in the package, just Informix Connect. I assume you use the default path /opt/IBM/informix
Add /opt/IBM/informix/lib/cli and /opt/IBM/informix/lib/esql to your dynamic linker lookup paths. On Fedora you can do this by putting them in a new file /etc/ld.so.conf.d/informix.conf.
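For example, assuming the default install path from the previous step, /etc/ld.so.conf.d/informix.conf holds one library path per line (run ldconfig as root afterwards to refresh the linker cache):
/opt/IBM/informix/lib/cli
/opt/IBM/informix/lib/esql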
Create a new /etc/odbc.ini and add the following:
[ODBC Data Sources]
Infdrv1=IBM INFORMIX ODBC DRIVER
[Infdrv1]
Driver=/opt/IBM/informix/lib/cli/iclit09b.so
Description=Informix
Database=WHATEVER_YOUR_DB_NAME_IS
Servername=WHATEVER_YOUR_SERVER_NAME_IS
CLIENT_LOCALE=en_us.8859-1 # MAY BE DIFFERENT
DB_LOCALE=en_us.819 # MAY BE DIFFERENT
[ODBC]
UNICODE=UCS-2
Create a new /etc/odbcinst.ini and add the following:
[IBM INFORMIX ODBC DRIVER]
Description=Informix Driver
Driver=libifcli.so
You need to set the environment variables INFORMIXDIR and ODBCINI. On Fedora you may add a new file /etc/profile.d/informix.sh containing:
export INFORMIXDIR=/opt/IBM/informix
export ODBCINI=/etc/odbc.ini
Edit /opt/IBM/informix/etc/sqlhosts and put your basic connection information there. In the simplest case it has only one line of four tab-separated fields (server name, protocol, host name, service name) that reads
YOUR_SERVER_NAME\tonsoctcp\tYOUR_HOST_NAME\tpdap-np
Note that pdap-np is actually port 1526, which is also the Informix "Turbo" driver TCP port; see your /etc/services.
Create an empty .odbc.ini in your $HOME, e.g. by running touch $HOME/.odbc.ini. It needs to be there, and it needs to be 0 bytes. I love this part.
Install unixODBC and pyodbc from your favorite repository.
Remember to get your env-changes going, e.g. via reboot. You can now connect like this:
import pyodbc

DRIVER = 'IBM INFORMIX ODBC DRIVER'
SERVER = 'YOUR_SERVER_NAME'
DATABASE = 'YOUR_DB_NAME'
USER = 'YOUR_USER'
PASS = 'YOUR_PASSWORD'
constr = 'DRIVER={%s};SERVER=%s;DATABASE=%s;UID=%s;PWD=%s' % (DRIVER, SERVER, DATABASE, USER, PASS)
con = pyodbc.connect(constr, autocommit=False)
From there on you can get your cursor, execute queries, fetch results and so on; a short sketch follows the list below. Note that there are numerous bugs and quirks in IBM's ODBC driver; off the top of my head:
Rows that contain NULLs may cause a segfault, as the IBM driver puts a 32-bit int where a 64-bit int is expected to signal that the value is null. If you are affected by this, you need to patch unixODBC for all possible column types to deal with it.
Columns without names cause the driver to segfault (e.g. SELECT COUNT(*) FROM foobar needs to be SELECT COUNT(*) AS c FROM foobar).
Make sure your encoding actually works as expected. UTF-8 is apparently not enterprise-enough for IBM; UCS-2 is the only thing I got to work.
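A brief sketch of using the connection from above, applying the column-alias workaround from the list (foobar is a placeholder table name):
cur = con.cursor()
cur.execute('SELECT COUNT(*) AS c FROM foobar')  # alias the column; unnamed columns crash the driver
print(cur.fetchone()[0])
cur.close()
con.close()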
I am running MySQL 5.1 on my Windows Vista installation. The table in question uses MyISAM and has about 10 million rows. It is used to store text messages posted by users on a website.
I am trying to run the following query on it:
query = "select id, text from messages order by id limit %d offset %d" %(limit, offset)
where limit is set to a fixed value (in this case 20000) and offset is incremented in steps of 20000.
This query goes into an infinite loop when offset = 240000. It happens at this particular value and no other.
I isolated this query into a script and ran it, and got the same results. I then tried to run the last query (with offset = 240000) directly, and it worked !
I then tried executing the same queries directly in a mysql client to make sure that the error was not in the python DB accessor module. All the queries returned results, except the one with offset = 240000.
I then looked at the mysql server logs and saw the following.
[ERROR] C:\Program Files\MySQL\MySQL Server 5.1\bin\mysqld: Sort aborted
This probably means that when I stopped the Python process (out of frustration), the mysqld process was 'sorting' something. When I looked at the my.ini file, I saw a lot of MAX_* options. I am currently experimenting with these, but am just throwing this out there in the meanwhile.
Any help appreciated!
Have you checked the table with myisamchk?
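If taking the server offline for myisamchk is inconvenient, a rough in-server counterpart is the CHECK TABLE statement, which runs a similar MyISAM integrity check; a hedged sketch, assuming a DB-API connection like the ones elsewhere on this page and the question's messages table:
cursor = connection.cursor()
cursor.execute('CHECK TABLE messages')
for row in cursor.fetchall():
    print(row)  # each result row holds Table, Op, Msg_type, Msg_text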
I am working on a program to automate parsing data from XML files and storing it in several databases. (Specifically the USGS realtime water quality service, if anyone's interested, at http://waterservices.usgs.gov/rest/WaterML-Interim-REST-Service.html) It's written in Python 2.5.1 using lxml and pyodbc. The databases are in Microsoft Access 2000.
The connection function is as follows:
def get_AccessConnection(db):
    connString = 'DRIVER={Microsoft Access Driver (*.mdb)};DBQ=' + db
    cnxn = pyodbc.connect(connString, autocommit=False)
    cursor = cnxn.cursor()
    return cnxn, cursor
where db is the filepath to the database.
The program:
a) opens the connection to the database
b) parses 2 to 8 XML files for that database and builds the values from them into a series of records to insert into the database (using a nested dictionary structure, not a user-defined type)
c) loops through the series of records, cursor.execute()-ing an SQL query for each one
d) commits and closes the database connection
If the cursor.execute() call throws an error, it writes the traceback and the query to the log file and moves on (see the sketch below).
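A condensed, hedged sketch of steps (a) through (d); db_path, the table name, the records structure and the log_file handle are placeholders, not the actual program:
import traceback

cnxn, cursor = get_AccessConnection(db_path)  # (a) open the connection
for record in records:                        # (b) records built from the parsed XML
    sql = 'INSERT INTO SomeTable VALUES (?, ?, ?)'
    try:
        cursor.execute(sql, record)           # (c) one INSERT per record
    except Exception:
        log_file.write(traceback.format_exc() + '\n' + sql + '\n')
cnxn.commit()                                 # (d) commit and close
cnxn.close()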
When my coworker runs it on his machine, for one particular database, specific records will simply not be there, with no errors recorded. When I run the exact same code on the exact same copy of the database over the exact same network path from my machine, all the data that should be there is there.
My coworker and I are both on Windows XP computers with Microsoft Access 2000 and the same versions of Python, lxml, and pyodbc installed. I have no idea how to check whether we have the same version of the Microsoft ODBC drivers. I haven't been able to find any difference between the records that are there and the records that aren't. I'm in the process of testing whether the same problem happens with the other databases, and whether it happens on a third coworker's computer as well.
What I'd really like to know is ANYTHING anyone can think of that would cause this, because it doesn't make sense to me. To summarize: Python code executing SQL queries will silently fail half of them on one computer and work perfectly on another.
Edit:
No more problem. I just had my coworker run it again, and the database was updated completely with no missing records. Still no idea why it failed in the first place, nor whether it will happen again, but "problem solved."
"I have no idea how to check whether we have the same version of the Microsoft ODBC drivers."
I think you're looking for Control Panel | Administrative Tools | Data Sources (ODBC). Click the "Drivers" tab.
I think either Access 2000 or Office 2000 shipped with a desktop edition of SQL Server called "MSDE". Might be worth installing that for testing. (Or production, for that matter.)