Before I can get to work with, say, the sqlite3 library in Python, I need to create a connection to the database and then a cursor on that connection.
import sqlite3

connection = sqlite3.connect(path)  # path to the database file
cursor = connection.cursor()
Queries and result fetching are done with the cursor object, not the connection.
From what I've seen, this is the standard setup for Python SQL APIs. (Obviously this is not the usual kind of "cursor" that points to a row in a result set.) In contrast, PHP's mysqli and PDO libraries put a query() method directly on the connection object.
So why the two-step arrangement in Python, and in any other library that works this way? What is the use case for maintaining separate connection and cursor objects?
This is most likely just an arbitrary design decision. Most database APIs have some type of object that represents the results of a query, which you can then iterate through to get each of the rows. There are basically two ways you can do this:
Perform a query on the connection object, and it returns a new results object.
Create a results object, and then perform the query on this object to fill it in.
There isn't any significant difference between the two arrangements, and Python has chosen the second method (the results object is called a cursor).
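A quick sketch of the two styles, using sqlite3 as a stand-in (whose nonstandard Connection.execute() shortcut happens to behave like the first style):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (x INTEGER)')

# style 1: query the connection, get a results object back
results = conn.execute('SELECT x FROM t')  # returns a fresh cursor

# style 2: create the results object first, then run the query on it
cur = conn.cursor()
cur.execute('SELECT x FROM t')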
Perhaps the first method seems more logical, since most of the cursor methods (e.g. .fetchone()) aren't really useful until after you perform a query. On the other hand, this design separates the object that just represents the database connection from the object that represents all aspects of a specific query. The Python cursor class does have some methods that apply to a specific query and must be called before .execute(): .setinputsizes() and .setoutputsize(), which pre-allocate buffers for the query.
Python is hardly unique in this style. Its cursor is not too different from the mysqli_stmt and PDOStatement classes of the modern PHP APIs. In PDO you don't get a statement until you call PDO::prepare(), but with mysqli you have a choice: you can call mysqli::prepare() to get a statement, or you can use mysqli::stmt_init() to get a fresh statement and then call prepare() and execute() on that object. This is quite similar to the Python style.
Related
How do I get both the rowcount and the instances of an SQLAlchemy ORM Query in a single roundtrip to the database?
If I call query.all(), I get a standard list, which I can get the len() of, but that requires loading the whole result set into memory at once.
If I call iter(query), I get back a standard Python generator, with no access to the .rowcount of the underlying ResultProxy.
If I call query.count() and then iter(query), I'm doing two roundtrips of a potentially expensive query to the database.
If I managed to get hold of a ResultProxy for a Query, that would give me the .rowcount; then I could use Query.instances() to get the same generator that Query.__iter__() would give me.
But is there a convenient way of getting at the ResultProxy of a Query other than repeating what Query.__iter__() and Query._execute_and_instances() do? Seems rather inconvenient.
Notice
As mentioned in this answer (thanks @terminus), getting ResultProxy.rowcount might or might not be useful, and the SQLAlchemy documentation explicitly warns against relying on it for pure "select" statements.
That said, in the case of psycopg2, the .rowcount of the underlying cursor is documented to return the correct number of records returned by any query, even "SELECT" queries, unless you're using the stream_results=True option (thanks @SuperShoot).
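For what it's worth, a minimal sketch of the psycopg2 behaviour described above (the connection string and the users table are placeholders):

import psycopg2

conn = psycopg2.connect('dbname=test')  # placeholder connection string
cur = conn.cursor()
cur.execute('SELECT * FROM users')
print(cur.rowcount)    # row count of the SELECT, buffered client-side
rows = cur.fetchall()  # one roundtrip: count and rows from the same cursor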
I have a big PostgreSQL database. I would like to somehow store the full database in a Python object whose form/structure would reflect that of the database. Namely, I imagine something like
* An object "database", with an attribute .tables, which is a kind of list of "table" objects; a table object has an attribute "list_of_keys" (the list of column names) and an attribute "rows", which reflects all the rows of the corresponding table in the database.
Now, the main point I need is: I want to be able to perform a search in the database object with exactly the same SQL syntax that I would use in the corresponding SQL database. Thus something like
database.execute("SELECT * FROM .....")
where, I repeat, "database" is a purely Python object (which was filled with data coming from an SQL database, but which is now independent of it).
My aim is: I want to be able to apply the same algorithm either to an SQL database or to a Python object such as described above, without changing my code. So, I imagine, let "database" be either a usual database connector/cursor (like with psycopg2, for example) or a Python object as I described, and the same piece of code
database.execute("SELECT BLABLABLA")
would work in both cases.
Is there any known module which allows that?
Thanks.
It might get a bit complicated, but take a look at SQLite's in-memory storage:
import sqlite3
cnx = sqlite3.connect(':memory:')
cnx.execute('CREATE TABLE ...')
There are some differences in the SQL syntax, but the basic stuff works fine. This might also take a good amount of RAM, depending on your data.
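If the source really is PostgreSQL, one rough way to build such a detached in-memory copy might look like this (a sketch assuming psycopg2 and a single users table with a known schema; names and connection details are placeholders):

import sqlite3
import psycopg2  # source connection; details are placeholders

pg = psycopg2.connect('dbname=mydb')
pg_cur = pg.cursor()
pg_cur.execute('SELECT name, phone FROM users')
rows = pg_cur.fetchall()

cnx = sqlite3.connect(':memory:')
cnx.execute('CREATE TABLE users (name TEXT, phone TEXT)')
cnx.executemany('INSERT INTO users VALUES (?, ?)', rows)

# the same kind of SELECT now runs against the in-memory copy
for row in cnx.execute('SELECT * FROM users'):
    print(row)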
I'm attempting to collect up a list of dictionaries and do a bulk insert into a MySQL db with SQLAlchemy.
According to these docs for version 0.5, you do this with an executemany function call on a connection object. This is the only place where I've been able to find that executemany exists.
However, in these docs for 0.7, I find that even though executemany is referenced, they do not use it in the code snippet, and in fact, it no longer exists in the connection class.
It seems that the two functions were combined, but if so, how is the connection.execute method different from the session.execute method? It seems in the docs that session.execute does not support bulk inserts, so how would one go about inserting several thousand dictionaries into a single table?
I think you're misreading the 0.5 link; the example you're pointing to still uses execute(). SQLAlchemy has never exposed an explicit executemany() method. executemany() is specifically a function of the underlying DBAPI, which SQLAlchemy will make use of if the given parameter set is detected as a list of parameters.
session.execute() supports the same functionality as connection.execute(), except the parameter list is given using the named argument "params". The docstring isn't explicit about this, which should likely be adjusted.
You can also get a transaction-specific Connection object from the Session using the session.connection() method.
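To make the bulk insert concrete, a minimal sketch (users is a hypothetical Table object and engine a hypothetical Engine):

conn = engine.connect()
conn.execute(
    users.insert(),
    [
        {'name': 'Jack', 'phone': '555-1234'},
        {'name': 'Chloe', 'phone': '555-5678'},
    ],
)  # a list of parameter dicts is detected and routed to the DBAPI's executemany()

The same list should also work through session.execute(users.insert(), params=[...]), per the note about the "params" argument above.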
I'm completely new to Python's sqlite3 module (and SQL in general for that matter), and this just completely stumps me. The abundant lack of descriptions of cursor objects (rather, their necessity) also seems odd.
This snippet of code is the preferred way of doing things:
import sqlite3
conn = sqlite3.connect("db.sqlite")
c = conn.cursor()
c.execute('''INSERT INTO users VALUES ('Jack Bauer', '555-555-5555')''')
conn.commit()
c.close()
This one isn't, even though it works just as well and without the (seemingly pointless) cursor:
import sqlite3
conn = sqlite3.connect("db.sqlite")
conn.execute('''INSERT INTO users VALUES ('Jack Bauer', '555-555-5555')''')
conn.commit()
Can anyone tell me why I need a cursor?
It just seems like pointless overhead. For every method in my script that accesses a database, I'm supposed to create and destroy a cursor?
Why not just use the connection object?
Just a misapplied abstraction, it seems to me. A db cursor is an abstraction meant for data set traversal.
From the Wikipedia article on the subject:
In computer science and technology, a database cursor is a control structure that enables traversal over the records in a database. Cursors facilitate subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of database records. The database cursor characteristic of traversal makes cursors akin to the programming language concept of iterator.
And:
Cursors can not only be used to fetch data from the DBMS into an application but also to identify a row in a table to be updated or deleted. The SQL:2003 standard defines positioned update and positioned delete SQL statements for that purpose. Such statements do not use a regular WHERE clause with predicates. Instead, a cursor identifies the row. The cursor must be opened and already positioned on a row by means of a FETCH statement.
If you check the docs on the Python sqlite3 module, you can see that a Python module cursor is needed even for a CREATE TABLE statement, so it's used in cases where a mere connection object should suffice - as correctly pointed out by the OP. Such an abstraction is different from what people understand a db cursor to be, hence the confusion/frustration on the part of users. Regardless of efficiency, it's just conceptual overhead. It would be nice if the docs pointed out that the Python module cursor is a bit different from what a cursor is in SQL and databases.
According to the official docs, connection.execute() is a nonstandard shortcut that creates an intermediate cursor object:
Connection.execute
This is a nonstandard shortcut that creates a cursor object by calling the cursor() method, calls the cursor’s execute() method with the parameters given, and returns the cursor.
You need a cursor object to fetch results. Your example works because it's an INSERT and thus you aren't trying to get any rows back from it, but if you look at the sqlite3 docs, you'll notice that there aren't any .fetchXXXX methods on connection objects, so if you tried to do a SELECT without a cursor, you'd have no way to get the resulting data.
Cursor objects allow you to keep track of which result set is which, since it's possible to run multiple queries before you're done fetching the results of the first.
12.6.8. Using sqlite3 efficiently
12.6.8.1. Using shortcut methods
Using the nonstandard execute(), executemany() and executescript() methods of the Connection object, your code can be written more concisely because you don’t have to create the (often superfluous) Cursor objects explicitly. Instead, the Cursor objects are created implicitly and these shortcut methods return the cursor objects. This way, you can execute a SELECT statement and iterate over it directly using only a single call on the Connection object.
(sqlite3 documentation; emphasis mine.)
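For example, with the shortcut a SELECT can be run and iterated with no explicit cursor in sight (the cursor still exists, it's just created implicitly):

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (name TEXT)')
conn.execute("INSERT INTO users VALUES ('Jack Bauer')")
for row in conn.execute('SELECT name FROM users'):  # iterates the implicit cursor
    print(row)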
Why not just use the connection object?
Because those methods of the connection object are nonstandard, i.e. they are not part of Python Database API Specification v2.0 (PEP 249).
As long as you use the standard methods of the Cursor object, you can be sure that if you switch to another database implementation that follows the above specification, your code will be fully portable. Perhaps you will only need to change the import line.
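A hedged sketch of that portable style (sqlite3 here, but the cursor-facing lines would be unchanged under any PEP 249 driver):

import sqlite3 as db           # switching drivers would change this line...
conn = db.connect(':memory:')  # ...and the driver-specific connect() arguments

cur = conn.cursor()
cur.execute('CREATE TABLE t (x INTEGER)')
cur.execute('INSERT INTO t VALUES (1)')
cur.execute('SELECT x FROM t')
print(cur.fetchall())  # standard PEP 249 cursor methods throughout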
But if you use connection.execute, there is a chance that switching won't be that straightforward. That's the main reason you might want to use cursor.execute instead.
However if you are certain that you're not going to switch, I'd say it's completely OK to take the connection.execute shortcut and be "efficient".
It gives us the ability to have multiple separate working environments through the same connection to the database.
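A small self-contained sqlite3 illustration of that: two cursors on the same connection, each holding its own position in its own result set:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (n INTEGER)')
conn.executemany('INSERT INTO t VALUES (?)', [(1,), (2,), (3,)])

c1 = conn.cursor()
c2 = conn.cursor()
c1.execute('SELECT n FROM t ORDER BY n')
c2.execute('SELECT n FROM t ORDER BY n DESC')
print(c1.fetchone())  # (1,)
print(c2.fetchone())  # (3,) - independent of c1's position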
So if I do
import MySQLdb
conn = MySQLdb.connect(...)
cur = conn.cursor()
cur.execute("SELECT * FROM HUGE_TABLE")
print "hello?"
print cur.fetchone()
It looks to me like MySQLdb fetches the entire huge table before it gets to the "print". I previously assumed it did some sort of "cursor/state" lazy retrieval in the background, but it doesn't look like it to me.
Is this right? If so, is it because it has to be this way, or is this due to a limitation of the MySQL wire protocol? Does this mean that Java/Hibernate behave the same way?
I guess I need to use the "LIMIT 1" MySQL clause and relatives if I want to walk through a large table without pulling in the whole thing at once? Or no? Thanks in advance.
In the _mysql module, use the following call:
conn.use_result()
That tells the connection you want to fetch rows one by one, leaving the remainder on the server (but leaving the cursor open).
The alternative (and the default) is:
conn.store_result()
This tells the connection to fetch the entire result set after executing the query, and subsequent fetches will just iterate through the result set, which is now in memory in your Python app. If your result set is very large, you should consider using LIMIT to restrict it to something you can handle.
Note that MySQL does not allow another query to be run until you have fetched all the rows from the one you have left open.
In the MySQLdb module, the equivalent is to use one of these two cursor mixin classes from MySQLdb.cursors:
CursorUseResultMixIn
CursorStoreResultMixIn
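In practice you would usually reach these through the concrete cursor classes built on them, e.g. SSCursor for the use_result behaviour; a sketch with placeholder connection details:

import MySQLdb
import MySQLdb.cursors

# SSCursor is built on CursorUseResultMixIn: rows stay on the server
conn = MySQLdb.connect(host='localhost', db='test',
                       cursorclass=MySQLdb.cursors.SSCursor)
cur = conn.cursor()
cur.execute('SELECT * FROM HUGE_TABLE')
row = cur.fetchone()  # fetches a single row; the rest are not pulled into memory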
This is consistent with every other language I've used. fetchone() just retrieves the first row of the result set, which in this case is the entire table. It's more of a convenience method than anything; it's designed to be easy to use when you KNOW there's only one result coming down, or you only care about the first.
oursql is an alternative MySQL DB-API interface that exposes a few more of the lower-level details, and also provides other facilities for dealing with large datasets.