Why SQLAlchemy send extra SELECTs when accessing a persisted model property - python

Given a simple declarative based class;
class Entity(db.Model):
__tablename__ = 'brand'
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(255), nullable=False)
And the next script
entity = Entity()
entity.name = 'random name'
db.session.add(entity)
db.session.commit()
# Just by accessing the property name of the created object a
# SELECT statement is sent to the database.
print entity.name
When I enable echo mode in SQLAlchemy, I can see in the terminal the INSERT statement and an extra SELECT just when I access a property (column) of the model (table row).
If I don't access to any property, the query is not created.
What is the reason for that behavior? In this basic example, We already have the value of the name property assigned to the object. So, Why is needed an extra query? It to secure an up to date value, or something like that?

By default, SQLAlchemy expires objects in the session when you commit. This is controlled via the expire_on_commit parameter.
The reasoning behind this is that the row behind the instance could have been modified outside of the transaction, so if you are not careful you could run into data races, but if you know what you are doing you can safely turn it off.

Related

SQLAlchemy fails to delete object with foreign key relationship [duplicate]

My User model has a relationship to the Address model. I've specified that the relationship should cascade the delete operation. However, when I query and delete a user, I get an error that the address row is still referenced. How do I delete the user and the addresses?
class User(db.Model):
id = db.Column(db.Integer, primary_key=True)
addresses = db.relationship('Address', cascade='all,delete', backref='user')
class Address(db.Model):
id = db.Column(db.Integer, primary_key=True)
user_id = db.Column(db.Integer, db.ForeignKey(User.id))
db.session.query(User).filter(User.my_id==1).delete()
IntegrityError: (IntegrityError) update or delete on table "user" violates foreign key constraint "addresses_user_id_fkey" on table "address"
DETAIL: Key (my_id)=(1) is still referenced from table "address".
'DELETE FROM "user" WHERE "user".id = %(id_1)s' {'id_1': 1}
You have the following...
db.session.query(User).filter(User.my_id==1).delete()
Note that after "filter", you are still returned a Query object. Therefore, when you call delete(), you are calling delete() on the Query object (not the User object). This means you are doing a bulk delete (albeit probably with just a single row being deleted)
The documentation for the Query.delete() method that you are using says...
The method does not offer in-Python cascading of relationships - it is
assumed that ON DELETE CASCADE/SET NULL/etc. is configured for any
foreign key references which require it, otherwise the database may
emit an integrity violation if foreign key references are being
enforced.
As it says, running delete in this manner will ignore the Python cascade rules that you've set up. You probably wanted to do something like..
user = db.session.query(User).filter(User.my_id==1).first()
db.session.delete(user)
Otherwise, you may wish to look at setting up the cascade for your database as well.

Bug in SQLAlchemy Rollback after DB Exception?

My SQLAlchemy application (running on top of MariaDB) includes two models MyModelA and MyModelB where the latter is a child-record of the former:
class MyModelA(db.Model):
a_id = db.Column(db.Integer, nullable=False, primary_key=True)
my_field1 = db.Column(db.String(1024), nullable=True)
class MyModelB(db.Model):
b_id = db.Column(db.Integer, nullable=False, primary_key=True)
a_id = db.Column(db.Integer, db.ForeignKey(MyModelA.a_id), nullable=False)
my_field2 = db.Column(db.String(1024), nullable=True)
These are the instances of MyModelA and MyModelB that I create:
>>> my_a = MyModelA(my_field1="A1")
>>> my_a.aid
1
>>> MyModelB(a_id=my_a.aid, my_field2="B1")
I have the following code that deletes the instance of MyModelA where a_id==1:
db.session.commit()
try:
my_a = MyModelA.query.get(a_id=1)
assert my_a is not None
print "#1) Number of MyModelAs: %s\n" % MyModelA.query.count()
db.session.delete(my_a)
db.session.commit()
except IntegrityError:
print "#2) Cannot delete instance of MyModelA because it has child record(s)!"
db.session.rollback()
print "#3) Number of MyModelAs: %s\n" % MyModelA.query.count()
When I run this code look at the unexpected results I get:
#1) Number of MyModelAs: 1
#2) Cannot delete instance of MyModelA because it has child record(s)!
#3) Number of MyModelAs: 0
The delete supposedly fails and the DB throws an exception which causes a rollback. However even after the rollback, the number of rows in the table indicates that the row -- which supposedly wasn't deleted -- is actually gone!!!
Why is this happening? How can I fix this? It seems like a bug in SQLAlchemy.
TL;DR
Your problem might be related to the lack of explicit relationship declaration.
For example, here there is a sample of objects' relationship. In addition to the usage of a ForeignKey field, the class explicitly uses the relationship directive to define that connection. In the session API documentation, the following text appears:
object references should be constructed at the object level, not at the foreign key level
Which might imply to the way of SQLAlchemy to manage relations. I am not deeply familiar with the underlying mechanisms, but it is possible that this is what happens. Your session only includes the MyModelA object. Since you did not use the relationship() directive in the definition of MyModelB, objects of MyModelA type are not aware to the fact that some other object might refer them through a ForeignKey. Hence, when the session is about to commit, it is not aware to the fact that deleting the object affects some other MyModelB object, and its transaction mechanism does not take that into account.
I suggest that adding the relationship explicitly might prevent that behavior.

SQLAlchemy Automatically Create Entry If Doesn't Exist As Foreign Key

I have a large number of .create() calls that rely on a ForeignKey in another table (Users). However, there is no point in the code where I actually create users.
Is there a way for there to be a Users entry created for each foreign key is specified on another table in SQLAlchemy?
For example:
class Rr(db.Model):
__tablename__ = 'rr'
id = db.Column(db.Integer, primary_key=True)
submitter = db.Column(db.String(50), db.ForeignKey('user.username'))
class User(db.Model):
__tablename__ = 'user'
username = db.Column(db.String, primary_key=True)
so If I call Rr(id, submitter=John) is there a way for a John entry to be created in the user table if it does not already exist?
I understand that I can create a wrapper around the .create() method such that it checks the submitter and creates one if it doesn't exist but this seems excess as there are a large number of models that want Users to be automatically created.
I can't think of any orm or sql implementation that does what you ask but there is something that effectively accomplishes what you seek to do described in this SO answer: Does SQLAlchemy have an equivalent of Django's get_or_create?
basically get the User from the db if it exists, if it doesn't create it.
The only down side to this method is that you would need to do 2 queries instead of one but I don't think there is a way to do what you seek in one query

Enforcing uniqueness using SQLAlchemy association proxies

I'm trying to use association proxies to make dealing with tag-style records a little simpler, but I'm running into a problem enforcing uniqueness and getting objects to reuse existing tags rather than always create new ones.
Here is a setup similar to what I have. The examples in the documentation have a few recipes for enforcing uniqueness, but they all rely on having access to a session and usually require a single global session, which I cannot do in my case.
from sqlalchemy import Column, Integer, String, create_engine, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.ext.associationproxy import association_proxy
Base = declarative_base()
engine = create_engine('sqlite://', echo=True)
Session = sessionmaker(bind=engine)
def _tag_find_or_create(name):
# can't use global objects here, may be multiple sessions and engines
# ?? No access to session here, how to do a query
tag = session.query(Tag).filter_by(name=name).first()
tag = Tag.query.filter_by(name=name).first()
if not tag:
tag = Tag(name=name)
return tag
class Item(Base)
__tablename__ = 'item'
id = Column(Integer, primary_key=True)
tags = relationship('Tag', secondary='itemtag')
tagnames = association_proxy('tags', 'name', creator=_tag_find_or_create)
class ItemTag(Base)
__tablename__ = 'itemtag'
id = Column(Integer, primary_key=True)
item_id = Column(Integer, ForeignKey('item.id'))
tag_id = Column(Integer, ForeignKey('tag.id'))
class Tag(Base)
__tablename__ = 'tag'
id = Column(Integer, primary_key=True)
name = Column(String(50), nullable=False)
# Scenario 1
session = Session()
item = Item()
session.add(item)
item.tagnames.append('red')
# Scenario 2
item2 = Item()
item2.tagnames.append('blue')
item2.tagnames.append('red')
session.add(item2)
Without the creator function, I just get tons of duplicate Tag items. The creator function seems like the most obvious place to put this type of check, but I'm unsure how to do a query from inside the creator function.
Consider the two scenarios provided at the bottom of the example. In the first example, it seems like there should be a way to get access to the session in the creator function, since the object the tags are being added to is already associated with a session.
In the second example, the Item object isn't yet associated with a session, so the validation check can't happen in the creator function. It would have to happen later when the object is actually added to a session.
For the first scenario, how would I go about getting access to the session object in the creator function?
For the second scenario, is there a way to "listen" for when the parent object is added to a session and validate the association proxies at that point?
For the first scenario, you can use object_session.
As for the question overall: true, you need access to the current session; if using scoped_session in your application is appropriate, then the second part of the Recipe you link to should work fine to use. See Contextual/Thread-local Sessions for more info.
Working with events and change objects when they change from transient to persistent state will not make your code pretty or very robust. So I would immediately add new Tag objects to the session, and if the transaction is rolled back, they would not be in the database.
Note that in a multi-user environment you are likely to have race condition: the same tag is new and created in simultaneously by two users. The user who commits last will fail (if you have a unique constraint on the database).
In this case you might consider be without the unique constraint, and have a (daily) procedure to clean those duplicates up (and reassign relations). With time there would be less and less new items, and less possibilities for such clashes.

How does Django foreign key access work

Say I have a model like this.
class Job(models.Model):
client = models.ForeignKey(Contacts, null=True)
and lets say I have job j. I know I can access the client belonging to j like this
j.client
but there is also
j.client_id
So my question is how does accessing j.client work?
Does django store client__id then when j.client is called it does a query to find the correct object?
Or is the object reference stored to j and accessing client__id is getting the id from the Contact object?
I've looked around the source code a bit but couldn't find the answer to my question
What you are probably talking about is client and client_id (single underscore).
The client_id attribute is a regular (integer) attribute. This is the foreign key that is saved to the database. You will only ever see a client_id column in the database, even though you specify the ForeignKey as client.
The client attribute is an object descriptor instance. It is a special class that overrides the __get__ and __set__ methods, so settings and accessing that attributes invokes that class's methods. This is the magic that gives you access to the actual related model instance. __get__ will retrieve the correct model instance from the database if it isn't loaded already, based on the client_id attribute. __set__ will also set the client_id attribute to the primary key of the related object, so that client_id is always up-to-date.
Note that this attribute is also available in query lookups, and is quite handy. E.g., if you have just the primary key of a foreign object, and not the model instance itself, the following queries look very similar:
job = Job.objects.filter(client__id=pk)
job = Job.objects.filter(client_id=pk)
However, underneath the first query accesses an attribute on the related object (double underscore) and performs an OUTER JOIN. The second query only ever accesses a local attribute, thus not having to perform the OUTER JOIN statement and saving performance.
This is explained in the docs:
https://docs.djangoproject.com/en/dev/ref/models/fields/#database-representation
In the database there is only client_id field (single underscore)
On the model instance you will have client attribute, when you access it this will cause Django to load the related object from the db and instantiate as another model instance.
You will also have client_id attribute (one underscore) which has the primary key value of the related object, as stored in the db field.
When doing ORM queries you are able to use client__id (double underscore) syntax to lookup against fields on the related model, eg you could also do client__name if Client model had a name field. This will become a SQL JOIN query across both models.
eg
Job.objects.get(client__id=1)
Job.objects.filter(client__name='John')
client = Client.objects.get(pk=1)
Job.objects.get(client=client)
j.client gives you the models.Model object. You can access it's properties like ...
client = j.client
id = client.id
name = client.name
But there should not be a j.client__id field. You should use j.client.id to get the id field. Although you can use j.client__id field to do filters and such.
So,
id = j.client.id # good
id = j.client__id # bad
and
job = Job.objects.get(client__id=1) # good
job = Job.objects.get(client.id=1) # bad

Categories