SQLAlchemy how LazyLoading works

Hi, I would like to understand how SQLAlchemy lazy loading works. Assuming I have this query:
results = (
    session.query(Parent).
    options(lazyload(Parent.children)).
    filter(Parent.id == 1).
    all()
)
for parent in results:
    logging.error(parent.children)
I want to know: if I access parent.children in the for loop, will this issue a new SELECT statement, or are the parent.children records already cached somehow? I'm thinking about how this will affect performance; I just want the most optimized way.
1. Should I use lazy loading?
2. Will accessing per item on the loop create a new sqlalchemy
3. How do I find out if a query is being run by SQLAlchemy? (I just want to find out if accessing each entry will create a SELECT statement.)

1. Maybe.
2. Do you mean issue a new query? The answer is yes, that's the point of lazyload(): the relationship collection attribute is populated lazily, when it is first accessed. If, on the other hand, you wish to avoid the possible N+1 situation, you could for example use joinedload() instead to populate children in the same query.
3. Use logging. Pass echo=True in your engine configuration.
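
A minimal sketch of the difference described above, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The Parent and Child models, the record listener and the count_selects helper are made up for illustration. The listener counts statements in code instead of reading the echo=True log.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine, event
from sqlalchemy.orm import (
    Session, declarative_base, joinedload, lazyload, relationship,
)

Base = declarative_base()


class Parent(Base):  # hypothetical model
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child")


class Child(Base):  # hypothetical model
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))


engine = create_engine("sqlite://")  # add echo=True to log every statement
Base.metadata.create_all(engine)

statements = []


@event.listens_for(engine, "before_cursor_execute")
def record(conn, cursor, statement, parameters, context, executemany):
    # Called once for every statement sent to the database.
    statements.append(statement)


with Session(engine) as session:
    session.add_all([Parent(children=[Child(), Child()]) for _ in range(3)])
    session.commit()


def count_selects(loader_option):
    """Load all parents, touch parent.children, and count the SELECTs issued."""
    with Session(engine) as session:
        statements.clear()
        for parent in session.query(Parent).options(loader_option).all():
            len(parent.children)  # first access triggers the lazy load, if any
        return sum(s.lstrip().upper().startswith("SELECT") for s in statements)


lazy_count = count_selects(lazyload(Parent.children))
joined_count = count_selects(joinedload(Parent.children))
print(lazy_count, joined_count)
```

With lazyload() the count grows with the number of parents (one SELECT for the parents plus one per parent), while joinedload() stays at a single SELECT.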

Related

Flask and SQLAlchemy sort in display without new query?

I'm displaying the results from an SQLAlchemy (Flask-SQLAlchemy) query on a particular view. However, the sort order is set only by what I originally passed into the query ( order_by(desc(SelectedTable.date_changed)) ). I'm now trying to add functionality so that each displayed column can be selected to order the presentation.
Is there a way to change how a returned query object is sorted once it has been returned, to create this behavior? Or will I need to build custom queries for each possible column that could be sorted by, ascending and descending?
Is there a recipe for implementing something like this? I've searched Google, this site, and the Flask, Flask-SQLAlchemy, and SQLAlchemy docs for something along these lines but haven't seen anything that touches on the subject. I'm beginning to think that I'll need to use custom queries or, to avoid new queries, try some JavaScript in the Jinja template to achieve this.
Thanks!

flask-sqlalchemy delete query failing with "Could not evaluate current criteria in Python"

I have a query using flask-sqlalchemy in which I want to delete all the stocks from the database where their ticker matches one in a list. This is the current query I have:
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete()
Where new_tickers is a list of str of valid tickers.
The error I am getting is the following:
sqlalchemy.exc.InvalidRequestError: Could not evaluate current criteria in Python: "Cannot evaluate clauselist with operator <function comma_op at 0x1104e4730>". Specify 'fetch' or False for the synchronize_session parameter.

You need to pass one of the synchronize_session options for bulk delete:
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session=False)
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session='evaluate')
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session='fetch')
Basically, SQLAlchemy tracks loaded objects in the session, in Python, as you call its methods. When you delete rows in bulk, how should SQLAlchemy remove the corresponding objects from the session? This is controlled by the synchronize_session parameter of the delete method, which has three possible values:
'evaluate': evaluates the query's criteria directly in Python to determine which objects need to be removed from the session. This is the default in SQLAlchemy 1.x and is very efficient, but it is not very robust, and complicated criteria cannot be evaluated. If it can't evaluate the criteria, it raises an error (internally sqlalchemy.orm.evaluator.UnevaluatableError, reported as the InvalidRequestError shown in the question).
'fetch': performs a SELECT before the DELETE and uses that result to determine which objects in the session need to be removed. This is less efficient (potentially much less efficient) but can handle any valid query.
False: doesn't attempt to update the session, so it's very efficient. However, if you continue to use the session after the delete, you may get inaccurate results.
Which option you use depends heavily on how your code uses the session. In the simple case where you just need to delete rows matching some criteria and don't touch the affected objects afterwards, False should work fine (the example in the question fits this scenario).
SQLAlchemy Delete Method Reference
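
A small sketch of what the options above mean in practice, assuming plain SQLAlchemy 1.4 or later (not Flask-SQLAlchemy) and in-memory SQLite. The Stock model here and the delete_and_check helper are hypothetical stand-ins for the ones in the question.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Stock(Base):  # hypothetical stand-in for the question's model
    __tablename__ = "stock"
    id = Column(Integer, primary_key=True)
    ticker = Column(String, unique=True)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)


def delete_and_check(synchronize_session):
    """Bulk-delete AAA and BBB; report (rows deleted, is AAA still in the session)."""
    with Session(engine) as session:
        session.query(Stock).delete(synchronize_session=False)  # start clean
        session.add_all([Stock(ticker=t) for t in ("AAA", "BBB", "CCC")])
        session.commit()

        loaded = session.query(Stock).filter_by(ticker="AAA").one()
        deleted = (
            session.query(Stock)
            .filter(Stock.ticker.in_(["AAA", "BBB"]))
            .delete(synchronize_session=synchronize_session)
        )
        still_tracked = loaded in session  # checked before commit expires state
        session.commit()
        return deleted, still_tracked


unsynced = delete_and_check(False)
fetched = delete_and_check("fetch")
print(unsynced, fetched)
```

In both calls the rows are gone from the table. The difference is only whether the already-loaded AAA object is still tracked by the session afterwards: with False it is, with 'fetch' it has been removed.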

Try it with this code:
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session=False)
https://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=delete#sqlalchemy.orm.query.Query.delete

get_or_create in Peewee

The paragraph titled Get or create on the peewee documentation says:
While peewee has a get_or_create() method, this should really not be
used outside of tests as it is vulnerable to a race condition. The
proper way to perform a get or create with peewee is to rely on the
database to enforce a constraint.
And then it goes on with an example that only shows the create part, not the get part.
What is the best way to perform a get or create with peewee?

Everything you are doing inside a transaction is atomic.
So as long as you are calling get_or_create() inside a transaction, that paragraph is wrong.
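
For reference, the approach the quoted paragraph points at (let a unique constraint decide who creates the row, then fall back to a read) can be sketched as follows. This is an illustration under assumptions: it uses the standard-library sqlite3 module rather than peewee to stay dependency-free, and the user table and get_or_create helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL)"
)


def get_or_create(username):
    """Return (user_id, created). Tries the INSERT first, then falls back to SELECT."""
    try:
        with conn:  # commits on success, rolls back if the INSERT fails
            cursor = conn.execute(
                "INSERT INTO user (username) VALUES (?)", (username,)
            )
        return cursor.lastrowid, True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the insert, so the row already exists.
        row = conn.execute(
            "SELECT id FROM user WHERE username = ?", (username,)
        ).fetchone()
        return row[0], False


first = get_or_create("alice")
second = get_or_create("alice")
print(first, second)
```

Because the database rejects the duplicate, two concurrent callers cannot both create the row; the loser of the race ends up in the except branch and reads the existing record.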

getting the id of a created record in SQLAlchemy

How can I get the id of the created record in SQLAlchemy?
I'm doing:
engine.execute("insert into users values (1,'john')")

When you execute a plain text statement, you're at the mercy of the DBAPI you're using as to whether or not the new PK value is available, and by what means. With the SQLite and MySQL DBAPIs you'll have it as result.lastrowid, which just gives you the value of .lastrowid for the cursor. With PG, Oracle, etc., there's no ".lastrowid". As someone else said, you can use "RETURNING" for those, in which case results are available via result.fetchone() (although using RETURNING with Oracle, again without taking advantage of SQLAlchemy expression constructs, requires several awkward steps). If RETURNING isn't available, you can use direct sequence access (NEXTVAL in PG) or a "post fetch" operation (CURRVAL in PG, @@identity or scope_identity() in MSSQL).
Sounds complicated, right? That's why you're better off using table.insert(). SQLAlchemy's primary system for providing newly generated PKs is designed to work with these constructs. Once you're there, the result.last_inserted_ids() method (result.inserted_primary_key in later releases) gives you the newly generated (possibly composite) PK in all cases, regardless of backend. The above methods of .lastrowid, sequence execution, RETURNING, etc. are all dealt with for you (0.6 uses RETURNING when available).
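
A short sketch of both routes, assuming a current SQLAlchemy release (1.4 or later) and in-memory SQLite. The users table definition is hypothetical, and inserted_primary_key is the modern name for what the answer calls last_inserted_ids().

```python
from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine, text

engine = create_engine("sqlite://")
metadata = MetaData()
users = Table(  # hypothetical table matching the question's "users"
    "users",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String),
)
metadata.create_all(engine)

with engine.begin() as conn:
    # Plain-text SQL: the new key is only available through the DBAPI cursor,
    # which works here because the SQLite driver supports lastrowid.
    raw_result = conn.execute(text("INSERT INTO users (name) VALUES ('john')"))
    raw_id = raw_result.lastrowid

    # Expression construct: SQLAlchemy reports the generated key itself.
    result = conn.execute(users.insert().values(name="jane"))
    new_pk = result.inserted_primary_key  # one entry per primary key column

print(raw_id, new_pk[0])
```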

There's an extra clause you can add: RETURNING
i.e.
INSERT INTO users (name, address) VALUES ('richo', 'beaconsfield') RETURNING id
Then just retrieve a row like your insert was a SELECT statement.

Django objects.filter, how "expensive" would this be?

I am trying to make a search view in Django. It is a search form with freetext input + some options to select, so that you can filter on years and so on. This is some of the code I have in the view so far, the part that does the filtering. And I would like some input on how expensive this would be on the database server.
soknad_list = Soknad.objects.all()
if var1:
    soknad_list = soknad_list.filter(pub_date__year=var1)
if var2:
    soknad_list = soknad_list.filter(muncipality__name__exact=var2)
if var3:
    soknad_list = soknad_list.filter(genre__name__exact=var3)
# TEXT SEARCH
stop_word_list = re.compile(STOP_WORDS, re.IGNORECASE)
search_term = '%s' % request.GET['q']
cleaned_search_term = stop_word_list.sub('', search_term)
cleaned_search_term = cleaned_search_term.strip()
if len(cleaned_search_term) != 0:
    soknad_list = soknad_list.filter(
        Q(dream__icontains=cleaned_search_term)
        | Q(tags__icontains=cleaned_search_term)
        | Q(name__icontains=cleaned_search_term)
        | Q(school__name__icontains=cleaned_search_term)
    )
So what I do is first build a list of all objects, then check which variables exist (I fetch these from GET at an earlier point) and filter the results for each one that does. But this doesn't seem too elegant, and it probably runs a lot of queries to achieve the result, so is there a better way to do this?
It does exactly what I want, but I guess there is a better/smarter way to do this. Any ideas?

filter() itself doesn't execute a query. No query is executed until you explicitly fetch items from the queryset (e.g. with get()); list(query) also executes it.

You can see the query that will be generated by using:
soknad_list.query.as_sql()[0]
You can then put that into your database shell to see how long the query takes, or use EXPLAIN (if your database backend supports it) to see how expensive it is.

As Aaron mentioned, you should get hold of the query text that is going to be run against the database and use EXPLAIN (or some other method) to view the query execution plan. Once you have the execution plan for the query, you can see what is going on in the database itself. A lot of operations that seem very expensive to run through procedural code are trivial for any database to run, especially if you provide indexes that the database can use to speed up your query.
If I read your question correctly, you're worried that Soknad.objects.all() retrieves every row in the Soknad table and that filter() then trims those results down in memory. From looking at the Django documentation, that is not what happens: QuerySets are lazy, so each filter() call only adds to the WHERE clause, and the database does the filtering when the queryset is finally evaluated.
The most optimal solution for the free-text part would be to use a full-text search engine (Lucene, ferret, etc.) to handle this for you. If that is not available or practical, the next best option is to construct a query predicate (WHERE clause) before issuing your query to the database and let the database perform the filtering, which is what the chained filter() calls already do.
However, as with all things that involve the database, the real answer is 'it depends.' The best suggestion is to try out several different approaches using data that is close to production and benchmark them over at least 3 iterations before settling on a final solution to the problem. It may be just as fast, or even faster, to filter in memory rather than filter in the database.
