Are query objects in SqlAlchemy immutable? - python

Let's say I've got a query like this ...
baseQuery = MyDbObj.query.filter_by(someProp='foo')
If, at a later point, I extend that query with something else (let's say, another filter) ...
derivedQuery = baseQuery.filter_by(anotherProp='bar')
will this result in the original query being modified internally, or is a new query instance created?
Background: My use case is that I have multiple cases that differ in only one filter. Right now there is a ton of copy-pasted query code (not my fault, I inherited this codebase) which I am cleaning up. For the cases where only one query is ultimately executed, I don't care whether the original query gets modified. However, I also have cases where two queries are executed, so there it matters that I can derive two queries from a base query without them interfering with each other.
Though maybe a solution here could be to do that filtering in Python itself, rather than making two queries against the DB in the first place (I will keep that as a second option).

SQLAlchemy's Query is generative: each filtering call returns a copy. So when you do
derivedQuery = baseQuery.filter_by(anotherProp='bar')
then derivedQuery is a new Query object with the additional filter applied, and baseQuery is left unchanged. See the docs for more details.
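A quick way to confirm this yourself, sketched with a minimal stand-in model (MyDbObj and its columns here are hypothetical, modeled on the question; assumes SQLAlchemy 1.4+ for the sqlalchemy.orm.declarative_base import):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class MyDbObj(Base):
    __tablename__ = 'my_db_obj'
    id = Column(Integer, primary_key=True)
    someProp = Column(String)
    anotherProp = Column(String)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

baseQuery = session.query(MyDbObj).filter_by(someProp='foo')
base_sql_before = str(baseQuery)   # SQL the base query compiles to

derivedQuery = baseQuery.filter_by(anotherProp='bar')

# filter_by() returned a new Query object; the base query is untouched.
assert derivedQuery is not baseQuery
assert str(baseQuery) == base_sql_before
```

Because every generative call copies, you can hand the same base query to several call sites and let each one add its own filters independently.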

How can I prefetch_related() everything related to an object?

I'm trying to export all data connected to a User instance to a CSV file. In order to do so, I need to get it from the DB first. Using something like
data = SomeModel.objects.filter(owner=user)
on every possible model seems very inefficient, so I want to use prefetch_related(). My question is: is there any way to prefetch, at once, all the instances of the different models whose FK points at my User?
Actually, you don't need to "prefetch everything" in order to create a CSV file (or anything else), and you really don't want to. Python's CSV support is designed to work "row by row," and that's what you want to do here: in a loop, read one row at a time from the database and write it one row at a time to the file.
Remember that Django is lazy. Functions like filter() specify what the filtration is going to be, but things really don't start happening until you start to iterate over the actual collection. That's when Django will build the query, submit it to the SQL engine, and start retrieving the data that's returned ... one row at a time.
Let the SQL engine, Python and the operating system take care of "efficiency." They're really good at that sort of thing.
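The row-by-row pattern looks roughly like this. Here fetch_rows is a hypothetical stand-in for iterating a lazy queryset such as SomeModel.objects.filter(owner=user).iterator(); only the csv part is real stdlib:

```python
import csv
import io

def fetch_rows():
    # Stand-in for a lazy queryset: yields one row at a time, so only
    # one row needs to be in memory at any moment.
    yield ('alice', 'alice@example.com')
    yield ('bob', 'bob@example.com')

buf = io.StringIO()          # in a real export this would be an open file
writer = csv.writer(buf)
writer.writerow(['name', 'email'])   # header row
for row in fetch_rows():             # read one row, write one row
    writer.writerow(row)
```

With a real queryset you would swap fetch_rows() for the queryset's iterator, and the memory profile stays flat regardless of how many rows the user owns.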

SQLAlchemy: if just using core, and not the ORM, can I get a different class back than RowProxy?

This really boils down to questions like Can I assign values in RowProxy using the sqlalchemy? and the recommendation there of just casting result rows to dict.
I want to assign values on the result rows and I am mostly using SQLAlchemy, with raw sql, for its brilliant multi-RDBMS support (with some use of introspection too).
Basically, a lot of my selects look like
results = connection.execute("select foo from bar where zoom = ?", binds)
results = [dict(row) for row in results]
But it would be even better if I could just specify a different result class, like plain old dict, either when initiating the connection, or on execute. Probably faster too, as well as more convenient.
I took a look at https://docs.sqlalchemy.org/en/latest/core/engines.html and at the source of sqlalchemy.engine.result and it looks like _process_row on the ResultProxy classes is where RowProxy gets registered. But I saw no way to modify that through the API.
I realize this is an edge case compared to ORM use, but I really can't use the ORM in my case. It's perfectly fine if it's not possible; I just don't want to overlook something already included.
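For reference, here is a runnable version of that dict-casting pattern (the bar table and its columns are stand-ins from the question; assumes SQLAlchemy 1.4+, where row._mapping is the supported way to view a row as a mapping):

```python
from sqlalchemy import create_engine, text

engine = create_engine('sqlite://')
with engine.connect() as connection:
    # Throwaway schema and data so the select has something to hit.
    connection.execute(text("create table bar (foo text, zoom integer)"))
    connection.execute(text("insert into bar values ('a', 1), ('b', 2)"))

    # The pattern from the question: run raw SQL, cast each row to a dict.
    results = connection.execute(
        text("select foo from bar where zoom = :zoom"), {"zoom": 1}
    )
    rows = [dict(row._mapping) for row in results]
```

The list comprehension is a one-time copy per result set; there is no public hook to make execute() return plain dicts directly, so this post-processing step is the usual approach.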

sqlalchemy automatically extend query or update or insert upon table definition

In my app I have a mixin that defines two fields, start_date and end_date. I've added this mixin to all table declarations that require these fields.
I've also defined a function that returns filters (conditions) to test a timestamp (e.g. now) to be >= start_date and < end_date. Currently I'm manually adding these filters whenever I need to query a table with these fields.
However, sometimes my colleagues or I forget to add the filters, so I wonder whether it is possible to automatically extend any query on such a table, e.g. via an additional function in the mixin that SQLAlchemy invokes whenever it "compiles" the statement. I'm using 'compile' only as an example here; I don't actually know when or how best to do this.
Any idea how to achieve this?
In case it works for SELECT, does it also work for INSERT and UPDATE?
Thanks a lot for your help
Juergen
Take a look at this example. You can change the criteria expressed in the private method to refer to your start and end dates.
Note that this query will be less efficient because it overrides the get method to bypass the identity map.
I'm not sure what the enable_assertions(False) call does; I'd recommend understanding that before proceeding.
I tried extending Query but had a hard time. Eventually (and unfortunately) I moved back to my previous approach of little helper functions that return filters and apply them to queries.
I still wish I could find an approach that automatically adds certain filters whenever a table (Base) has certain columns.
Juergen
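The helper-function fallback described above can be sketched like this (TimeRangeMixin, active_at, and Thing are hypothetical names of mine, not from the thread; assumes SQLAlchemy 1.4+):

```python
from datetime import datetime
from sqlalchemy import Column, DateTime, Integer, and_, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class TimeRangeMixin:
    # Declarative mixin: these columns are copied onto each subclass.
    start_date = Column(DateTime)
    end_date = Column(DateTime)

    @classmethod
    def active_at(cls, ts):
        # Returns the filter condition (ts >= start_date and ts < end_date).
        # Callers still have to remember to apply it - the drawback noted above.
        return and_(cls.start_date <= ts, cls.end_date > ts)

class Thing(TimeRangeMixin, Base):
    __tablename__ = 'thing'
    id = Column(Integer, primary_key=True)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

session.add(Thing(start_date=datetime(2020, 1, 1), end_date=datetime(2030, 1, 1)))
session.commit()

rows = session.query(Thing).filter(Thing.active_at(datetime(2024, 6, 1))).all()
none = session.query(Thing).filter(Thing.active_at(datetime(2035, 1, 1))).all()
```

Putting the classmethod on the mixin at least keeps the condition in one place, even if applying it stays manual.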

Get first AND last element with SQLAlchemy

In my Python (Flask) code, I need to get both the first element and the last one, sorted by a given variable, from a SQLAlchemy query.
I first wrote the following code:
first_valuation = Valuation.query.filter_by(..).order_by(sqlalchemy.desc(Valuation.date)).first()
# Do some things
last_valuation = Valuation.query.filter_by(..).order_by(sqlalchemy.asc(Valuation.date)).first()
# Do other things
As these queries can be heavy for the PostgreSQL database, and as I am duplicating my code, I think it would be better to use only one request, but I don't know SQLAlchemy well enough to do it...
(When are queries actually triggered, for example?)
What is the best solution to this problem?
1) See How to get First and Last record from a sql query? This is about how to get the first and last records in one query.
2) Here are the docs on the sqlalchemy query. In particular, pay attention to union_all (to implement the answers from above).
They also cover when queries are triggered: basically, queries are emitted when you use methods that return results, like first() or all(). That means Valuation.query.filter_by(..).order_by(sqlalchemy.desc(Valuation.date)) will not emit a query to the database.
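That laziness is easy to verify with an event hook. Below is a sketch with a minimal stand-in Valuation model (assumes SQLAlchemy 1.4+; the table is left empty since only the timing of the SELECT matters here):

```python
from sqlalchemy import Column, Date, Integer, create_engine, event
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Valuation(Base):
    __tablename__ = 'valuation'
    id = Column(Integer, primary_key=True)
    date = Column(Date)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

selects = []

@event.listens_for(engine, 'before_cursor_execute')
def record(conn, cursor, statement, parameters, context, executemany):
    # Record every SELECT actually sent to the database.
    if statement.lstrip().upper().startswith('SELECT'):
        selects.append(statement)

query = session.query(Valuation).order_by(Valuation.date.desc())
assert selects == []        # building the query sent nothing to the DB

first_valuation = query.first()
assert len(selects) == 1    # .first() triggered exactly one SELECT
```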
Also, if memory is not a problem, I'd say get all() objects from your first query and just get first and last result via python:
results = Valuation.query.filter_by(..).order_by(sqlalchemy.desc(Valuation.date)).all()
first_valuation = results[0]
last_valuation = results[-1]
It will be faster than performing two (even unioned) queries, but it can eat a lot of memory if your database is large enough.
No need to complicate the process so much.
first_valuation = Valuation.query.filter_by(..).order_by(sqlalchemy.desc(Valuation.date)).first()
# Do some things
last_valuation = Valuation.query.filter_by(..).order_by(sqlalchemy.asc(Valuation.date)).first()
This is what you have, and it's good enough. It's not heavy for any database. If you think it's becoming too heavy, you can always add an index.
Don't try to get all the results with all() and pick items out of the list. When you call all(), everything is loaded into memory, which is extremely bad if you have a lot of results. It's much better to execute just two queries to get those two items.

Model.objects.only("columnname") doesn't work. It shows me everything

I have a model called Theme. It has a lot of columns, but I need to retrieve only the field called "name", so I did this:
Theme.objects.only("name")
But it doesn't work, it is still retrieving all the columns.
PS: I don't want to use values() because it returns only Python dictionaries. I need model instances back, so that I can access their attributes and methods.
Using only or its counterpart defer does not prevent accessing the deferred attributes. It only delays retrieval of said attributes until they are accessed. So take the following:
for theme in Theme.objects.all():
    print(theme.name)
    print(theme.other_attribute)
This will execute a single query when the loop starts. Now consider the following:
for theme in Theme.objects.only('name'):
    print(theme.name)
    print(theme.other_attribute)
In this case, the other_attribute is not loaded in the initial query at the start of the loop. However, it is added to the model's list of deferred attributes. When you try to access it, another query is executed to retrieve the value of other_attribute. In the second case, a total of n+1 queries is executed for n Theme objects.
The only and defer methods should only be used in advanced use-cases, after the need for optimization arises and after properly analysing your code. Even then, there are often workarounds that work better than deferring fields. Please read the note at the bottom of the defer documentation.
If what you want is a single column, I think what you are looking for is .values() instead of .only().
