SQLAlchemy: use related object when session is closed - python

I have many models with relational links to each other which I have to use. My code is very complicated, so I cannot keep the session alive after a query. Instead, I try to preload all the objects:
def db_get_structure():
    with Session(my_engine) as session:
        deps = {x.id: x for x in session.query(Department).all()}
        ...
    return (deps, ...)

def some_logic(id):
    struct = db_get_structure()
    return some_other_logic(struct.deps[id].owner)
However, I get the following error anyway regardless of the fact that all the objects are already loaded:
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Department at 0x10476e780> is not bound to a Session; lazy load operation of attribute 'owner' cannot proceed
Is it possible to link preloaded objects with each other so that the relations still work after the session is closed?
I know about joined queries (.options(joinedload(...))), but this approach leads to more code lines and a bigger DB request, and I think this should be solvable more simply, because all the objects are already loaded into Python objects.
It's even possible now to request the related objects like struct.deps[struct.deps[id].owner_id], but I think the ORM should do this and provide shorter notation struct.deps[id].owner using some "cached load".
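For comparison, here is a minimal sketch of the eager-loading route with throwaway models (the Owner class and the Department.owner relationship are assumptions, since the real models are not shown); once the relationship is loaded before the session closes, accessing it on a detached instance works without any further SQL:

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, Session, selectinload

Base = declarative_base()

class Owner(Base):
    __tablename__ = "owner"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Department(Base):
    __tablename__ = "department"
    id = Column(Integer, primary_key=True)
    owner_id = Column(Integer, ForeignKey("owner.id"))
    owner = relationship(Owner)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Department(id=1, owner=Owner(id=1, name="alice")))
    session.commit()

def db_get_structure():
    with Session(engine) as session:
        # selectinload fetches all owners in one extra SELECT up front,
        # so no lazy load is needed after the session closes
        query = session.query(Department).options(selectinload(Department.owner))
        return {d.id: d for d in query}

deps = db_get_structure()
print(deps[1].owner.name)  # "alice" - no DetachedInstanceError
```

selectinload issues one additional query for the whole relationship rather than a JOIN per row, which keeps the individual statements small even for large structures.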

Whenever you access an attribute on a DB entity that has not yet been loaded from the DB, SQLAlchemy will issue an implicit SQL statement to fetch that data. My guess is that this is what happens when you access struct.deps[id].owner after the session is closed.
If the object in question has been removed from the session it is in a "detached" state and SQLAlchemy protects you from accidentally running into inconsistent data. In order to work with that object again it needs to be "re-attached".
I've done this already fairly often with session.merge:
attached_object = new_session.merge(detached_object)
But this will reconcile the object instance with the DB, potentially issuing updates to the DB if necessary. The detached_object is taken as the "truth".
I believe you can do the reverse (attaching it by reading from the DB instead of writing to it) by using session.refresh(detached_object), but I need to verify this. I'll update the post if I find something.
Both ways have to talk to the DB with at least a select to ensure the data is consistent.
In order to avoid loading, use session.merge(..., load=False). But this has some very important caveats. Have a look at the docs of session.merge() for details.
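A minimal sketch of the load=False path with a throwaway model (the Department name and columns are illustrative only). The important caveat it demonstrates: the detached instance must be clean and fully loaded (not expired), otherwise merge(load=False) raises:

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Department(Base):
    __tablename__ = "department"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    dep = Department(id=1, name="R&D")
    session.add(dep)
    session.commit()       # commit expires dep's attributes
    session.refresh(dep)   # re-load them so the detached copy is complete

# dep is detached now; re-attach it to a new session without any SELECT:
with Session(engine) as new_session:
    attached = new_session.merge(dep, load=False)
    print(attached in new_session)  # True
    print(attached.name)            # "R&D", read from the in-memory state
```

Without the session.refresh() step, dep would leave the first block in an expired state and the merge would have nothing to copy.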
I will need to read up on your link you added concerning your "complicated code". I would like to understand why you need to throw away your session the way you do it. Maybe there is an easier way?


Problem in SQLAlchemy adding an object and returning value [duplicate]

Does updating an object after issuing session.commit() work correctly? Or do I need to refresh the object?
I think this is sufficient for the question, but I may provide more information to clear my question if needed.
Edit:
By updating, I meant setting some attribute i.e. column values of the object.
Short answer: No, you do not need to refresh manually, sqlalchemy will do it for you.
It is useful to know when it happens, so below is short overview.
From documentation of Session.commit():
By default, the Session also expires all database loaded state on all ORM-managed attributes after transaction commit. This is so that subsequent operations load the most recent data from the database. This behavior can be disabled using the expire_on_commit=False option to sessionmaker or the Session constructor.
Basically, given you did not set expire_on_commit=False, object will be refreshed automatically as soon as you try accessing (reading, not setting) its attributes after session.commit().
my_obj = session.query(MyType).get(1)
my_obj.field1 = 'value1'
session.commit()  # commits and expires my_obj
my_obj.field1 = 'new value'  # still the same object; the pending change is tracked
print(my_obj.field1)  # at this point SA first refreshes the object from the database, keeping the new values of changed fields on top
In fact, if you enable logging, you will see that sqlalchemy emits new SELECT statements as soon as you access (read) persistent instances' attributes.
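The expiry can also be observed directly with sqlalchemy.inspect; a small sketch with a throwaway model (MyType and field1 mirror the snippet above):

```python
from sqlalchemy import create_engine, inspect, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class MyType(Base):
    __tablename__ = "my_type"
    id = Column(Integer, primary_key=True)
    field1 = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)  # expire_on_commit=True is the default
session.add(MyType(id=1, field1="value1"))
session.commit()

my_obj = session.get(MyType, 1)
session.commit()                  # expires my_obj again
print(inspect(my_obj).expired)    # True - attributes are gone from memory
print(my_obj.field1)              # "value1" - this access emits a SELECT
print(inspect(my_obj).expired)    # False - the object has been refreshed
```

With expire_on_commit=False passed to Session, the first inspect() call would report False and no extra SELECT would be emitted.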

Mongoengine - Can I get the same document object instance from two subsequent queries?

This is the use case: I have a server that receives instructions from many clients. Each client's instructions are handled by its own Session object, which holds all the information about the state of the session and queries mongoengine for the data it needs.
Now, suppose session1 queries mongoengine and gets document "A" as a document object.
Later, session2 also queries and gets document "A", as another separate document object.
Now we have 2 document objects representing document "A", and to get them consistent I need to call A.update() and A.reload() all the time, which seems unnecessary.
Is there any way I can get a reference to the same document object over the two queries? This way both sessions could make changes to the document object and those changes would be seen by the other sessions, since they would be made to the same python object.
I've thought about making a wrapper for mongoengine that caches the documents that we have as document objects at runtime and ensures there are no multiple objects for the same document at any given time. But my knowledge of mongoengine is too rudimentary to do it at the time.
Any thoughts on this? Is my entire design flawed? Is there any easy solution?
I don't think going in that direction is a good idea. From what I understand you are in a web application context, you might be able to get something working for threads within a single process but you won't be able to share instances across different processes (and it gets even worse if you have processes running on different machines).
One way to address this is optimistic concurrency control: you maintain a field like a "version identifier" that gets incremented whenever the instance is updated, and whenever you save the object you run a query like "update the object if the version identifier matches, else fail".
This means that if there are concurrent requests, one of them will succeed (the first one to be flushed) and the other will fail because the version identifier it holds is outdated. MongoEngine has no built-in support for this, but more info can be found here https://github.com/MongoEngine/mongoengine/issues/1563
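The pattern itself is independent of MongoEngine; here is a minimal sketch with a plain dict standing in for the collection. In MongoEngine the check-and-set would have to be a single atomic query, e.g. something like Doc.objects(id=doc.id, version=doc.version).update(set__value=..., inc__version=1), though treat that exact call as an assumption:

```python
# In-memory sketch of optimistic concurrency control: an update only
# succeeds if the version it was based on is still the current one.

store = {"A": {"value": 0, "version": 1}}  # stand-in for the collection

def update_if_current(doc_id, based_on_version, new_value):
    doc = store[doc_id]
    if doc["version"] != based_on_version:
        return False  # someone else updated first; caller must reload
    doc["value"] = new_value
    doc["version"] += 1
    return True

# Two sessions read the same document (version 1) ...
v = store["A"]["version"]

# ... session 1 writes first and wins:
print(update_if_current("A", v, 10))  # True
# ... session 2 writes with the now-stale version and fails:
print(update_if_current("A", v, 20))  # False
print(store["A"])  # {'value': 10, 'version': 2}
```

Note that in a real database the read-compare-write must happen in one atomic operation (a conditional update), not as separate steps as in this single-threaded sketch.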

Django model isn't persisting data to the DB in real time

I'm using Django Python framework, and MySQL DBMS.
In the screenshot below, I'm creating the new_survey_draft object using the SurveyDraft.objects.create() as shown, assuming that it should create a new row in the surveydraft DB table, but as also shown in the screenshot, and after debugging my code, the new_survey_draft object was created with id=pk=270 , while the DB table shown in the other window to the right doesn't have the new row with the id=270.
Even when setting a break point in publish_survey_draft(), called after the object instantiation, I called SurveyDraft.objects.get(pk=270) which returned the object, but there was still no row with id=270 in the DB table.
And finally, after resuming the code and returning from all definitions, the row was successfully added to the DB table with the id=270.
I'm wondering what's happening behind the scenes: is it possible that Django stores data in objects without persisting it to the DB in real time, and only persists the data all together at some later execution point?
I've been stuck in this for hours and couldn't find anything helpful online, so I really appreciate any advice regarding the issue.
After digging deep into this issue, I found that there is a concept called atomic requests, enabled in my Django project by setting ATOMIC_REQUESTS to True in settings.py under the DATABASES dictionary, as explained here
It works like this. Before calling a view function, Django starts a transaction. If the response is produced without problems, Django commits the transaction. If the view produces an exception, Django rolls back the transaction.
That's why the changes were not persisting in the database while debugging my code using break points, since the changes will only be committed to the DB once the successful response is returned.
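For reference, this is roughly what the setting looks like, together with the documented way to opt a single view out of it (the view name and connection options are placeholders):

```python
# settings.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        # ... connection options ...
        "ATOMIC_REQUESTS": True,  # wrap every view in a transaction
    }
}

# views.py - opt one view out so its writes commit immediately
from django.db import transaction

@transaction.non_atomic_requests
def my_view(request):
    ...
```

With the decorator applied, writes inside that view become visible to other connections as soon as each statement's implicit transaction commits, which is also why they show up during debugging.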

Intercept all queries on a model in SQLAlchemy

I need to intercept all queries that concern a model in SQLAlchemy, in a way that I can inspect it at the point where any of the query methods (all(), one(), scalar(), etc.) is executed.
I have thought about the following approaches:
1. Subclass the Query class
I could subclass sqlalchemy.orm.Query and override the execution code, starting basically from something like this.
However, I am writing a library that can be used in other SQLAlchemy applications, and thus the creation of the declarative base, let alone engines and sessions, is outside my scope.
Maybe I have missed something and it is possible to override the Query class for my models without knowledge of the session?
2. Use the before_execute Core Event
I have also thought of hooking into execution with the before_execute event.
The problem is that it is bound to an engine (see above). Also, I need to modify objects in the session, and I got the impression that I do not have access to a session from within this event.
What I want to be able to do is something like:
1. session.query(MyModel).filter_by(foo="bar").all() is executed.
2. Intercept that query and do something like storing the query in a log table within the same database (not literally that, but a set of different things that basically need the exact same functionality as this example operation).
3. Let the query execute like normal.
What I am trying to do in the end is inject items from another data store into the SQLAlchemy database on-the-fly upon querying. While this seems stupid - trust me, it might be less stupid than it sounds (or even more stupid) ;).
The before_compile query event might be useful for you.
from weakref import WeakSet
from sqlalchemy import event
from sqlalchemy.orm import Query

visited_queries = WeakSet()

@event.listens_for(Query, 'before_compile')
def log_query(query):
    # You can get the session
    session = query.session
    # Prevent recursion if you want to compile the query to log it!
    if query not in visited_queries:
        visited_queries.add(query)
        # do something with query.statement
You can look at query.column_descriptions to see if your model is being queried.

Django models - assign id instead of object

I apologize if my question turns out to be silly, but I'm rather new to Django, and I could not find an answer anywhere.
I have the following model:
class BlackListEntry(models.Model):
    user_banned = models.ForeignKey(auth.models.User, related_name="user_banned")
    user_banning = models.ForeignKey(auth.models.User, related_name="user_banning")
Now, when i try to create an object like this:
BlackListEntry.objects.create(user_banned=int(user_id),user_banning=int(banning_id))
I get a following error:
Cannot assign "1": "BlackListEntry.user_banned" must be a "User" instance.
Of course, if i replace it with something like this:
user_banned = User.objects.get(pk=user_id)
user_banning = User.objects.get(pk=banning_id)
BlackListEntry.objects.create(user_banned=user_banned,user_banning=user_banning)
everything works fine. The question is:
Does my solution hit the database to retrieve both users, and if yes, is it possible to avoid it, just passing ids?
The answer to your question is: YES.
Django will hit the database (at least) three times: two queries to retrieve the two User objects and a third one to insert your desired information. This causes absolutely unnecessary overhead.
Just try:
BlackListEntry.objects.create(user_banned_id=int(user_id),user_banning_id=int(banning_id))
This is the default name pattern for the FK fields generated by the Django ORM. This way you can set the information directly and avoid the queries.
If you wanted to query for the already saved BlackListEntry objects, you can navigate the attributes with a double underscore, like this:
BlackListEntry.objects.filter(user_banned__id=int(user_id),user_banning__id=int(banning_id))
This is how you access related objects' attributes in Django querysets: with a double underscore. Then you can compare against the value of the attribute.
Though very similar, the two work completely differently. The first one sets an attribute directly, while the second one is parsed by Django, which splits it at the '__' and queries the database the right way, taking the second part as the name of an attribute.
You can always compare user_banned and user_banning with the actual User objects, instead of their ids. But there is no use for this if you don't already have those objects with you.
Hope it helps.
I do believe that when you fetch the users, it is going to hit the DB.
To avoid it, you would have to write raw SQL to do the update, using the method described here:
https://docs.djangoproject.com/en/dev/topics/db/sql/
If you decide to go that route, keep in mind you are responsible for protecting yourself from SQL injection attacks.
Another alternative would be to cache the user_banned and user_banning objects.
But in all likelihood, simply grabbing the users and creating the BlackListEntry won't cause you any noticeable performance problems. Caching or executing raw SQL will only provide a small benefit. You're probably going to run into other issues before this becomes a problem.
