I'm using the sqlalchemy expression language (i.e. Core). I'm trying to build a query that should only return one result.
I want to do
query = select([table]).where(cond).one()
or
query = select([table]).where(cond).first()
but that only yields
AttributeError: 'Select' object has no attribute 'one'
The closest I have come is
query = select([table]).where(cond).limit(1)
but that is not entirely satisfactory because I get a list of results where I want a single result. I can work around by inserting extra logic but I'd be much happier to find a way to do this cleanly. I also would prefer not to use plain text queries.
Any ideas? Much appreciated.
one and first are methods available on ORM Query objects. On closer inspection you can see that they cause execution of the query and then post-process the result to get the first/only entry, do error checking, etc.
The SQL dialects I have checked don't actually seem to have this functionality built in for queries (oops, I thought they did). They have LIMIT or something similar.
The only option is to work around it using limit plus some logic of your own, applied either to the execute call or to its result.
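A minimal sketch of that workaround (assuming an existing engine, table and cond, and written against the older select([...]) Core API used above): limit to two rows so the "more than one result" case that one() would catch can still be detected.

from sqlalchemy import select

stmt = select([table]).where(cond).limit(2)   # 2 rows is enough to detect "more than one"
rows = engine.execute(stmt).fetchall()

if not rows:
    raise ValueError("no rows matched")           # roughly what one() raises
if len(rows) > 1:
    raise ValueError("more than one row matched")
row = rows[0]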
Before I can get to work with, say, the sqlite3 library in Python, I need to make a connection to the database and a cursor to the connection.
import sqlite3

connection = sqlite3.connect(path)
cursor = connection.cursor()
Queries and result fetching are done with the cursor object, not the connection.
From what I've seen this is a standard set-up for Python SQL APIs. (Obviously this is not the usual "cursor" that points to a row in a result set.) In contrast, PHP's mysqli or PDO libraries put a query() method directly on the connection object.
So why the two-step arrangement in Python, and any other library that works this way? What is the use case for maintaining separate connection and cursor objects?
This is most likely just an arbitrary design decision. Most database APIs have some type of object that represents the results of a query, which you can then iterate through to get each of the rows. There are basically two ways you can do this:
Perform a query on the connection object, and it returns a new results object.
Create a results object, and then perform the query on this object to fill it in.
There isn't any significant difference between the two arrangements, and Python has chosen the second method (the results object is called cursor).
Perhaps the first method seems more logical, since most of the cursor methods (e.g. .fetchone()) aren't really useful until after you perform a query. On the other hand, this design separates the object that just represents the database connection from the object that represents all aspects of a specific query. The Python cursor class does have some methods that apply to a specific query and must be called before .execute(): .setinputsizes() and .setoutputsize() to pre-allocate buffers for the query.
Python is hardly unique in this style. Its cursor is not too different from the mysqli_stmt and PDOStatement classes of the modern PHP APIs. In PDO you don't get a statement until you call PDO::prepare(), but with mysqli you have a choice: you can call mysqli::prepare() to get a statement, or you can use mysqli::stmt_init() to get a fresh statement and then call prepare() and execute() on this object. This is quite similar to the Python style.
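A minimal sketch of the two-step pattern (sqlite3, in-memory database, made-up table): one connection can hand out several cursors, and each cursor tracks its own query and its own position in the result set independently, which is one practical reason to keep the two objects separate.

import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE t (x INTEGER)")
connection.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

evens = connection.cursor()
odds = connection.cursor()
evens.execute("SELECT x FROM t WHERE x % 2 = 0")
odds.execute("SELECT x FROM t WHERE x % 2 = 1")

print(evens.fetchone())  # (2,)
print(odds.fetchall())   # [(1,), (3,)]
connection.close()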
As far as I understand, prepared statements are (mainly) a database feature that allows you to separate parameters from the code that uses such parameters. Example:
PREPARE fooplan (int, text, bool, numeric) AS
INSERT INTO foo VALUES($1, $2, $3, $4);
EXECUTE fooplan(1, 'Hunter Valley', 't', 200.00);
A parameterized query substitutes the manual string interpolation, so instead of doing
cursor.execute("SELECT FROM tablename WHERE fieldname = %s" % value)
we can do
cursor.execute("SELECT FROM tablename WHERE fieldname = %s", [value])
Now, it seems that prepared statements are, for the most part, used in the database language and parameterized queries are mainly used in the programming language connecting to the database, although I have seen exceptions to this rule.
The problem is that asking about the difference between prepared statement and parameterized query brings a lot of confusion. Their purpose is admittedly the same, but their methodology seems distinct. Yet, there are sources indicating that both are the same. MySQLdb and Psycopg2 seem to support parameterized queries but don’t support prepared statements (e.g. here for MySQLdb and in the TODO list for postgres drivers or this answer in the sqlalchemy group). Actually, there is a gist implementing a psycopg2 cursor supporting prepared statements and a minimal explanation about it. There is also a suggestion of subclassing the cursor object in psycopg2 to provide the prepared statement manually.
I would like to get an authoritative answer to the following questions:
Is there a meaningful difference between prepared statement and parameterized query? Does this matter in practice? If you use parameterized queries, do you need to worry about prepared statements?
If there is a difference, what is the current status of prepared statements in the Python ecosystem? Which database adapters support prepared statements?
Prepared statement: A reference to a pre-interpreted query routine on the database, ready to accept parameters
Parametrized query: A query made by your code in such a way that you are passing values in alongside some SQL that has placeholder values, usually ? or %s or something of that flavor.
The confusion here seems to stem from the (apparent) lack of distinction between the ability to directly get a prepared statement object and the ability to pass values into a 'parametrized query' method that acts very much like one... because it is one, or at least makes one for you.
For example: the C interface of the SQLite3 library has a lot of tools for working with prepared statement objects, but the Python API makes almost no mention of them. You can't prepare a statement and use it multiple times whenever you want. Instead, you can use executemany(sql, seq_of_params) on the cursor (or connection), which takes the SQL code, creates a prepared statement internally, then uses that statement in a loop to process each of the parameter tuples in the iterable you gave it.
Many other SQL libraries in Python behave the same way. Working with prepared statement objects can be a real pain, and can lead to ambiguity, and in a language like Python which has such a lean towards clarity and ease over raw execution speed they aren't really the greatest option. Essentially, if you find yourself having to make hundreds of thousands or millions of calls to a complex SQL query that gets re-interpreted every time, you should probably be doing things differently. Regardless, sometimes people wish they could have direct access to these objects because if you keep the same prepared statement around the database server won't have to keep interpreting the same SQL code over and over; most of the time this will be approaching the problem from the wrong direction and you will get much greater savings elsewhere or by restructuring your code.*
Perhaps more importantly in general is the way that prepared statements and parametrized queries keep your data sanitary and separate from your SQL code. This is vastly preferable to string formatting! You should think of parametrized queries and prepared statements, in one form or another, as the only way to pass variable data from your application into the database. If you try to build the SQL statement otherwise, it will not only run significantly slower but will also leave you vulnerable to SQL injection and other problems.
*e.g., by producing the data that is to be fed into the DB in a generator function then using executemany() to insert it all at once from the generator, rather than calling execute() each time you loop.
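A hedged sketch of that footnote (sqlite3, with a made-up table and data): generate the rows lazily and let executemany() reuse one internally prepared statement for all of them, rather than calling execute() once per row.

import sqlite3

def rows():
    # produce the data to be fed into the DB lazily
    for i in range(100000):
        yield (i, i * i)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE squares (n INTEGER, n_squared INTEGER)")
with conn:  # commits on successful exit
    conn.executemany("INSERT INTO squares VALUES (?, ?)", rows())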
tl;dr
A parametrized query is a single operation which generates a prepared statement internally, then passes in your parameters and executes.
edit: A lot of people see this answer! I want to also clarify that many database engines also have concepts of a prepared statement that can be constructed explicitly with plaintext query syntax, then reused over the lifetime of a client's session (in postgres for example). Sometimes you have control over whether the query plan is cached to save even more time. Some frameworks use these automatically (I've seen rails' ORM do it aggressively), sometimes usefully and sometimes to their detriment when there are permutations of form for the queries being prepared.
Also if you want to nit pick, parametrized queries do not always use a prepared statement under the hood; they should do so if possible, but sometimes it's just formatting in the parameter values. The real difference between 'prepared statement' and 'parametrized query' here is really just the shape of the API you use.
First, your question shows very good preparation - well done.
I am not sure if I am the person to provide an authoritative answer, but I will try to explain my understanding of the situation.
A prepared statement is an object created on the database server side as a result of a PREPARE statement, turning the provided SQL statement into a sort of temporary procedure with parameters. A prepared statement has the lifetime of the current database session and is discarded when the session is over.
The SQL statement DEALLOCATE allows destroying a prepared statement explicitly.
Database clients can use the SQL statement EXECUTE to run the prepared statement by calling its name with parameters.
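A hedged sketch (psycopg2; the connection string, table and plan name are made up): even when the driver exposes no prepared-statement API, these PREPARE / EXECUTE / DEALLOCATE statements can be sent as ordinary SQL for the lifetime of the session.

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

cur.execute("PREPARE find_foo (int) AS SELECT * FROM foo WHERE id = $1")
for foo_id in (1, 2, 3):
    cur.execute("EXECUTE find_foo (%s)", (foo_id,))  # reuses the server-side plan
    print(cur.fetchall())
cur.execute("DEALLOCATE find_foo")  # or just let the session end
conn.close()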
Parametrized statement is an alias for prepared statement, since a prepared statement usually has some parameters.
Parametrized query seems to be a less frequently used alias for the same thing (24 million Google hits for parametrized statement, 14 million for parametrized query). It is possible that some people use the term for another purpose.
Advantages of prepared statements are:
faster execution of the actual prepared statement call (not counting the time for PREPARE)
resistance to SQL injection attacks
Players in executing an SQL query
A real application will probably have the following participants:
application code
ORM package (e.g. sqlalchemy)
database driver
database server
From the application's point of view it is not easy to know whether the code will really use a prepared statement on the database server, as any of the participants may lack support for prepared statements.
Conclusions
In application code, avoid building SQL queries by hand, as that is prone to SQL injection attacks. For this reason it is recommended to use whatever the ORM provides for parametrized queries, even if it does not result in prepared statements on the database server side, as the ORM code can be optimized to prevent this sort of attack.
Decide whether a prepared statement is worth it for performance reasons. If you have a simple SQL query that is executed only a few times, it will not help; sometimes it will even slow down execution a bit.
The effect will be biggest for a complex query that is executed many times and has a relatively short execution time. In such a case, you may follow these steps:
Check that the database you are going to use supports the PREPARE statement. In most cases it will be present.
Check that the driver you use supports prepared statements and, if not, try to find another one that does.
Check support for this feature at the ORM package level. It sometimes varies driver by driver (e.g. sqlalchemy states some limitations on prepared statements with MySQL due to how MySQL manages them).
If you are in search of a real authoritative answer, I would head to the authors of sqlalchemy.
An SQL statement can't be executed immediately: the DBMS must interpret it before execution.
Prepared statements are statements that have already been interpreted; the DBMS just swaps in the parameters and the query starts immediately. This is a feature of certain DBMSs, and with it you can achieve fast responses (comparable with stored procedures).
Parametrized statements are just a way of composing the query string in your programming language. Since the DBMS doesn't care how the SQL string was formed, you get the slower response from the DBMS.
If you measure the time taken to execute the same query 3-4 times (a select with different conditions), you will see the same time every run with parametrized queries, whereas with a prepared statement the time drops from the second execution onward (the first time the DBMS has to interpret the script anyway).
I think the comment about using executemany fails to deal with the case where an application stores data into the database OVER TIME and wants each insert statement to be as efficient as possible. Thus the desire to prepare the insert statement once and re-use the prepared statement.
Alternatively, one could put the desired statement(s) in a stored proc and re-use it.
In my Python (Flask) code, I need to get the first element and the last one sorted by a given variable from a SQLAlchemy query.
I first wrote the following code :
first_valuation = Valuation.query.filter_by(..).order_by(sqlalchemy.desc(Valuation.date)).first()
# Do some things
last_valuation = Valuation.query.filter_by(..).order_by(sqlalchemy.asc(Valuation.date)).first()
# Do other things
As these queries can be heavy for the PostgreSQL database, and as I am duplicating my code, I think it would be better to use only one request, but I don't know SQLAlchemy well enough to do it...
(When are queries actually triggered, for example?)
What is the best solution to this problem ?
1) How to get First and Last record from a sql query? — this is about how to get the first and last records in one query.
2) Here are docs on sqlalchemy query. Specifically pay attention to union_all (to implement answers from above).
It also has info on when queries are triggered (basically, queries are triggered when you use methods that return results, like first() or all(). That means Valuation.query.filter_by(..).order_by(sqlalchemy.desc(Valuation.date)) will not emit a query to the database).
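A hedged sketch of that union_all idea (assuming Flask-SQLAlchemy's db handle and the question's Valuation model; the filter criteria are left as a placeholder). Each branch is wrapped as a subquery so its own ORDER BY/LIMIT stays legal inside the UNION ALL, and both rows come back in a single round trip:

from sqlalchemy import select, union_all

base = Valuation.query.filter_by()  # substitute the real filter criteria here
newest = base.order_by(Valuation.date.desc()).limit(1).subquery()
oldest = base.order_by(Valuation.date.asc()).limit(1).subquery()

rows = db.session.execute(union_all(select([newest]), select([oldest]))).fetchall()
# UNION ALL does not guarantee row order, so sort the two rows by date;
# the names follow the question's convention (first_valuation holds the newest row).
last_valuation, first_valuation = sorted(rows, key=lambda r: r.date)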
Also, if memory is not a problem, I'd say get all() objects from your first query and just grab the first and last results via Python:
results = Valuation.query.filter_by(..).order_by(sqlalchemy.desc(Valuation.date)).all()
first_valuation = results[0]
last_valuation = results[-1]
It will be faster than performing two (even unified) queries, but will potentially eat a lot of memory, if your database is large enough.
No need to complicate the process so much.
first_valuation = Valuation.query.filter_by(..).order_by(sqlalchemy.desc(Valuation.date)).first()
# Do some things
last_valuation = Valuation.query.filter_by(..).order_by(sqlalchemy.asc(Valuation.date)).first()
This is what you have, and it's good enough. It's not heavy for any database. If you think that it's becoming too heavy, then you can always add an index.
Don't try to get all the results using all() and picking items out of the list. When you call all(), it loads everything into memory, which is extremely bad if you have a lot of results. It's much better to execute just two queries to get those two items.
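A minimal sketch of the index suggestion (assuming Flask-SQLAlchemy and its db handle; columns other than date are illustrative): declaring the sort column with index=True keeps both ORDER BY ... LIMIT 1 queries cheap.

class Valuation(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    date = db.Column(db.DateTime, index=True)  # the index makes both first() lookups fast
    # ... the model's other columns ...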
I'm attempting to collect up a list of dictionaries, and do a bulk insert into a mysql db with sqlalchemy.
According to these docs for version 0.5, you do this with an executemany function call off of a connection object. This is the only place where I've been able to find that executemany exists.
However, in these docs for 0.7, I find that even though executemany is referenced, they do not use it in the code snippet, and in fact, it no longer exists in the connection class.
It seems that the two functions were combined, but if so, how is the connection.execute method different from the session.execute method? It seems in the docs that session.execute does not support bulk inserts, so how would one go about inserting several thousand dictionaries into a single table?
I think you're misreading the 0.5 link; the example you're pointing to still uses execute(). SQLAlchemy has never exposed an explicit executemany() method. executemany() is specifically a function of the underlying DBAPI, which SQLAlchemy will make use of if the given parameter set is detected as a list of parameters.
session.execute() supports the same functionality as connection.execute(), except the parameter list is given using the named argument "params". The docstring isn't explicit about this, which should likely be adjusted.
You can also get a transaction-specific Connection object from the Session using the session.connection() method.
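A hedged sketch of that "list of parameters" behaviour (the engine URL and table are illustrative; the original question targets MySQL): passing a list of dicts to execute() lets SQLAlchemy fall through to the DBAPI's executemany().

from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine("sqlite://")  # stand-in for the real MySQL engine
metadata = MetaData()
users = Table("users", metadata,
              Column("id", Integer, primary_key=True),
              Column("name", String(50)))
metadata.create_all(engine)

rows = [{"id": i, "name": "user%d" % i} for i in range(1000)]

with engine.begin() as conn:            # commits on successful exit
    conn.execute(users.insert(), rows)  # list of dicts -> DBAPI executemany()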