I'm attempting to collect a list of dictionaries and do a bulk insert into a MySQL db with SQLAlchemy.
According to these docs for version 0.5, you do this with an executemany function call off of a connection object. This is the only place where I've been able to find that executemany exists.
However, in these docs for 0.7, I find that even though executemany is referenced, they do not use it in the code snippet, and in fact, it no longer exists in the connection class.
It seems that the two functions were combined, but if so, how is the connection.execute method different from the session.execute method? It seems in the docs that session.execute does not support bulk inserts, so how would one go about inserting several thousand dictionaries into a single table?
I think you're misreading the 0.5 link; the example you're pointing to still uses execute(). SQLAlchemy has never exposed an explicit executemany() method. executemany() is specifically a function of the underlying DBAPI, which SQLAlchemy will make use of if the given parameter set is detected as a list of parameters.
session.execute() supports the same functionality as connection.execute(), except the parameter list is given using the named argument "params". The docstring isn't explicit about this, which should likely be adjusted.
You can also get a transaction-specific Connection object from the Session using the session.connection() method.
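For illustration, a minimal sketch of the bulk-insert case (the table, columns, and in-memory SQLite URL are stand-ins, not from the original question):
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String
engine = create_engine("sqlite://")  # stand-in for your MySQL URL
metadata = MetaData()
users = Table("users", metadata,
              Column("id", Integer, primary_key=True),
              Column("name", String(50)))
metadata.create_all(engine)
rows = [{"name": "alice"}, {"name": "bob"}]  # several thousand dicts work the same way
with engine.begin() as conn:
    conn.execute(users.insert(), rows)  # a list of dicts triggers the DBAPI's executemany()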
Let's say I have a query like this ...
baseQuery = MyDbObj.query.filter_by(someProp='foo')
If, at a later point, I extend that query with something else (let's say, another filter) ...
derivedQuery = baseQuery.filter_by(anotherProp='bar')
will this result in the original query being modified internally, or is a new query instance created?
Background: My use case is that I have multiple cases that differ only in one filter. Right now there is a ton of copy-pasted query code (not my fault, I inherited this codebase) which I am cleaning up. For the cases where only one query is ultimately executed, I don't care if the original query gets modified. However, I also have cases where two queries are executed, so there it matters that I can extend two queries from a base query without them interfering with each other.
Though maybe a solution here could be to do that filtering in Python itself and not make two queries against the DB in the first place (I will keep that as a second option).
SQLAlchemy creates a copy when filtering. So when you do
derivedQuery = baseQuery.filter_by(anotherProp='bar')
then derivedQuery is a copy of baseQuery with the filter applied. See the docs for more details.
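A quick way to convince yourself (a minimal sketch assuming SQLAlchemy 1.4+; the model and attribute names echo the hypothetical ones above):
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker
Base = declarative_base()
class MyDbObj(Base):
    __tablename__ = "my_db_obj"
    id = Column(Integer, primary_key=True)
    someProp = Column(String)
    anotherProp = Column(String)
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
baseQuery = session.query(MyDbObj).filter_by(someProp="foo")
derivedQuery = baseQuery.filter_by(anotherProp="bar")
print(derivedQuery is baseQuery)         # False: filter_by() returned a copy
print("anotherProp" in str(baseQuery))   # False: the base query is unchanged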
How do I get both the rowcount and the instances of an SQLAlchemy ORM Query in a single roundtrip to the database?
If I call query.all(), I get a standard list, which I can get the len() of, but that requires loading the whole resultset in memory at once.
If I call iter(query), I get back a standard python generator, with no access to the .rowcount of the underlying ResultProxy.
If I call query.count() then iter(query) I'm doing two roundtrips of a potentially expensive query to the database.
If I managed to get hold of a ResultProxy for a Query, that would give me the .rowcount, and then I could use Query.instances() to get the same generator that Query.__iter__() would give me.
But is there a convenient way of getting at the ResultProxy of a Query other than repeating what Query.__iter__() and Query._execute_and_instances() do? Seems rather inconvenient.
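For what it's worth, a hedged sketch of that idea on SQLAlchemy 1.x (where Query.instances() is still public); here query is an existing ORM Query and handle() is a hypothetical per-row function:
result = query.session.execute(query.statement)  # one roundtrip, yields a ResultProxy
total = result.rowcount                          # driver-dependent for SELECTs, see below
for obj in query.instances(result):              # hydrate ORM objects from the same result
    handle(obj)                                  # hypothetical per-row processing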
Notice
As mentioned in this answer (thanks @terminus), getting the ResultProxy.rowcount might or might not be useful, and is explicitly warned against in the SQLAlchemy documentation for pure "select" statements.
That said, in the case of psycopg2, the .rowcount of the underlying cursor is documented to return the correct number of records returned by any query, even "SELECT" queries, unless you're using the stream_results=True option (thanks @SuperShoot).
Before I can get to work with, say, the sqlite3 library in Python, I need to make a connection to the database and a cursor to the connection.
connection = sqlite3.connect(path)
cursor = connection.cursor()
Queries and result fetching are done with the cursor object, not the connection.
From what I've seen this is a standard set-up for Python SQL APIs. (Obviously this is not the usual "cursor" that points to a row in a result set.) In contrast, PHP's mysqli or PDO libraries put a query() method directly on the connection object.
So why the two-step arrangement in Python, and any other library that works this way? What is the use case for maintaining separate connection and cursor objects?
This is most likely just an arbitrary design decision. Most database APIs have some type of object that represents the results of a query, which you can then iterate through to get each of the rows. There are basically two ways you can do this:
Perform a query on the connection object, and it returns a new results object.
Create a results object, and then perform the query on this object to fill it in.
There isn't any significant difference between the two arrangements, and Python has chosen the second method (the results object is called cursor).
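With sqlite3, for example, arrangement 2 looks like this: the results object is created first, then the query is performed on it:
import sqlite3
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()    # create the results object up front...
cursor.execute("SELECT 1")      # ...then perform the query on it to fill it in
print(cursor.fetchone())        # (1,)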
Perhaps the first method seems more logical, since most of the cursor methods (e.g. .fetchone()) aren't really useful until after you perform a query. On the other hand, this design separates the object that just represents the database connection from the object that represents all aspects of a specific query. The Python cursor class does have some methods that apply to a specific query and must be called before .execute(): .setinputsizes() and .setoutputsize() to pre-allocate buffers for the query.
Python is hardly unique in this style. Its cursor is not too different from the mysqli_stmt and PDOStatement classes of the modern PHP APIs. In PDO you don't get a statement until you call PDO::prepare(), but with mysqli you have a choice: you can call mysqli::prepare() to get a statement, or you can use mysqli::stmt_init() to get a fresh statement and then call prepare() and execute() on this object. This is quite similar to the Python style.
I have a big PostgreSQL database. I would like to somehow store the full database in a Python object whose form/structure would reflect that of the database. Namely, I imagine something like
* A database object with an attribute .tables, which is a kind of list of "table" objects; a table object has an attribute "list_of_keys" (a list of the column names) and an attribute "rows", which reflects all the rows of the corresponding table in the database.
Now, the main point I need is: I want to be able to perform a search in the database object with exactly the same SQL syntax that I would use in the corresponding SQL database. Thus something like
database.execute("SELECT * FROM .....")
where, I repeat, "database" is a purely Python object (which was filled with data coming from an SQL database, but which is now independent of it).
My aim is: I want to be able to apply the same algorithm either on an SQL database or on a Python object such as described above, without changing my code. So, I imagine, let "database" be either a usual database connector/cursor (like with psycopg, for example) or a Python object as I described, and the same piece of code
database.execute("SELECT BLABLABLA")
would work in both cases.
Is there any known module which allows that?
Thanks.
It might get a bit complicated, but take a look at SQLite's in-memory storage:
import sqlite3
cnx = sqlite3.connect(':memory:')
cnx.execute('CREATE TABLE ...')
There are some differences in the SQL syntax, but the basic stuff works fine. This might also take a good amount of RAM, depending on your data.
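A rough sketch of the idea (the table, its columns, and the PostgreSQL DSN are hypothetical; assumes psycopg2 is available):
import sqlite3
import psycopg2
pg = psycopg2.connect("dbname=mydb")    # hypothetical DSN
pg_cur = pg.cursor()
pg_cur.execute("SELECT name, phone FROM users")
cnx = sqlite3.connect(":memory:")       # the purely in-process copy
cnx.execute("CREATE TABLE users (name TEXT, phone TEXT)")
cnx.executemany("INSERT INTO users VALUES (?, ?)", pg_cur.fetchall())
for row in cnx.execute("SELECT * FROM users"):  # same SQL, no server involved
    print(row)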
I'm completely new to Python's sqlite3 module (and SQL in general for that matter), and this just completely stumps me. The general lack of descriptions of cursor objects (or rather, of why they're necessary) also seems odd.
This snippet of code is the preferred way of doing things:
import sqlite3
conn = sqlite3.connect("db.sqlite")
c = conn.cursor()
c.execute('''insert into users values ('Jack Bauer', '555-555-5555')''')
conn.commit()
c.close()
This one isn't, even though it works just as well and without the (seemingly pointless) cursor:
import sqlite3
conn = sqlite3.connect("db.sqlite")
conn.execute('''insert into users values ('Jack Bauer', '555-555-5555')''')
conn.commit()
Can anyone tell me why I need a cursor?
It just seems like pointless overhead. For every method in my script that accesses a database, I'm supposed to create and destroy a cursor?
Why not just use the connection object?
Just a misapplied abstraction it seems to me. A db cursor is an abstraction, meant for data set traversal.
From the Wikipedia article on the subject:
In computer science and technology, a database cursor is a control structure that enables traversal over the records in a database. Cursors facilitate subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of database records. The database cursor characteristic of traversal makes cursors akin to the programming language concept of iterator.
And:
Cursors can not only be used to fetch data from the DBMS into an application but also to identify a row in a table to be updated or deleted. The SQL:2003 standard defines positioned update and positioned delete SQL statements for that purpose. Such statements do not use a regular WHERE clause with predicates. Instead, a cursor identifies the row. The cursor must be opened and already positioned on a row by means of FETCH statement.
If you check the docs on the Python sqlite module, you can see that a Python module cursor is needed even for a CREATE TABLE statement, so it's used for cases where a mere connection object should suffice - as correctly pointed out by the OP. Such an abstraction is different from what people understand a db cursor to be, and hence the confusion/frustration on the part of users. Regardless of efficiency, it's just conceptual overhead. It would be nice if it were pointed out in the docs that the Python module cursor is a bit different from what a cursor is in SQL and databases.
According to the official docs, connection.execute() is a nonstandard shortcut that creates an intermediate cursor object:
Connection.execute
This is a nonstandard shortcut that creates a cursor object by calling the cursor() method, calls the cursor’s execute() method with the parameters given, and returns the cursor.
You need a cursor object to fetch results. Your example works because it's an INSERT and thus you aren't trying to get any rows back from it, but if you look at the sqlite3 docs, you'll notice that there aren't any .fetchXXXX methods on connection objects, so if you tried to do a SELECT without a cursor, you'd have no way to get the resulting data.
Cursor objects allow you to keep track of which result set is which, since it's possible to run multiple queries before you're done fetching the results of the first.
12.6.8. Using sqlite3 efficiently
12.6.8.1. Using shortcut methods
Using the nonstandard execute(), executemany() and executescript() methods of the Connection object, your code can be written more concisely because you don’t have to create the (often superfluous) Cursor objects explicitly. Instead, the Cursor objects are created implicitly and these shortcut methods return the cursor objects. This way, you can execute a SELECT statement and iterate over it directly using only a single call on the Connection object.
(sqlite3 documentation; emphasis mine.)
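For example, a SELECT can be iterated straight off the connection, because the shortcut creates the cursor implicitly and returns it:
import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Jack Bauer')")
for row in conn.execute("SELECT name FROM users"):  # iterate the returned cursor directly
    print(row)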
Why not just use the connection object?
Because those methods of the connection object are nonstandard, i.e. they are not part of Python Database API Specification v2.0 (PEP 249).
As long as you use the standard methods of the Cursor object, you can be sure that if you switch to another database implementation that follows the above specification, your code will be fully portable. Perhaps you will only need to change the import line.
But if you use connection.execute, there is a chance that switching won't be that straightforward. That's the main reason you might want to use cursor.execute instead.
However if you are certain that you're not going to switch, I'd say it's completely OK to take the connection.execute shortcut and be "efficient".
Cursors give us the ability to have multiple separate working environments through the same connection to the database.
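For example, two cursors on the same connection each keep their own position in their own result set:
import sqlite3
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(3)])
c1 = conn.cursor()
c2 = conn.cursor()
c1.execute("SELECT n FROM t ORDER BY n")
c2.execute("SELECT n * 10 FROM t ORDER BY n")
print(c1.fetchone(), c2.fetchone())  # (0,) (0,)
print(c1.fetchone(), c2.fetchone())  # (1,) (10,)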