SQLAlchemy - Update ForeignKey when setting the relationship

I have a class:
class ExampleClass(Base):
    __tablename__ = 'chart'
    id = Column(Integer, primary_key=True)
    element_id = Column(Integer, ForeignKey('anotherTable.id'))
    element = relationship(AnotherClass)
    element2_id = Column(Integer, ForeignKey('anotherTable2.id'))
    element2 = relationship(AnotherClass2)
I want to do a lookup based on element_id and element2_id:
class ExampleClass(Base):
    ...
    def get_with_element2(self, element2):
        return session.query(ExampleClass).\
            filter_by(element_id=self.element_id,
                      element2_id=element2.id).first()
The problem I find is that if I instantiate a new ExampleClass object and assign it an element, the element_id field is not being set:
a = ExampleClass(element=element_obj)
a.element_id => None
How can I solve this? What's the best way to deal with this kind of situation?

First off, all the examples below assume that your ExampleClass instance is at least in the pending state if not the "persistent" state (that is, session.add(a)). In other words, if you aren't yet interacting with a Session and have not added the ExampleClass object to one, then you won't get any of the database-level behavior of relationship(), of which maintaining foreign key column values is the primary feature. You are of course free to make this assignment directly:
a = ExampleClass(element_id=element_obj.id)
but this is obviously not making use of the automation provided by the relationship() construct.
The assignment of foreign key attributes by relationship() occurs during a flush, which is a process that only occurs when interaction with the database is necessary, such as before you emit a SQL statement using session.query() or before you complete your transaction using session.commit().
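To make the timing concrete, here is a minimal sketch, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. AnotherClass is reduced to a bare primary key (an assumption, since the question does not show it) and element2 is left out for brevity. It shows element_id staying None while the object is only pending, and being filled in by the flush.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class AnotherClass(Base):
    # Assumed minimal definition; the question does not show this class.
    __tablename__ = "anotherTable"
    id = Column(Integer, primary_key=True)


class ExampleClass(Base):
    __tablename__ = "chart"
    id = Column(Integer, primary_key=True)
    element_id = Column(Integer, ForeignKey("anotherTable.id"))
    element = relationship(AnotherClass)


engine = create_engine("sqlite://")  # in-memory database for the sketch
Base.metadata.create_all(engine)

with Session(engine) as session:
    element_obj = AnotherClass()
    a = ExampleClass(element=element_obj)
    session.add(a)                    # "a" is now pending; nothing has been sent yet

    before_flush = a.element_id       # the relationship has not synced the FK yet

    session.flush()                   # INSERTs are emitted; FK values are copied over
    after_flush = a.element_id
    fk_matches_pk = after_flush == element_obj.id
```

Calling session.commit() or running a query with autoflush enabled would trigger the same flush implicitly.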
Generally, the philosophy of relationship() is that you'd deal only with the "element" and "element2" attributes here, and let the foreign key attributes be handled behind the scenes. You can write your query like this:
session.query(ExampleClass).\
    filter_by(element=self.element).\
    filter_by(element2=element2)
The ORM takes a comparison such as SomeClass.somerelationship == someobject and converts it into the foreign-key expression SomeClass.some_fk == some_id; the difference is that evaluation of the ultimate value of "some_id" is deferred until right before the query is executed. Before the query is executed, the Query() object tells the Session to "autoflush", which has the effect of your ExampleClass row being inserted and the primary key identifier of element_obj being assigned to the element_id attribute on the ExampleClass object.
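A small sketch of that behavior, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database, and a pared-down AnotherClass (an assumption, since the question does not show it). Both objects are still pending when the query is built; autoflush inserts them just before the SELECT runs, so the relationship comparison finds the row.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class AnotherClass(Base):
    # Assumed minimal definition for illustration only.
    __tablename__ = "anotherTable"
    id = Column(Integer, primary_key=True)


class ExampleClass(Base):
    __tablename__ = "chart"
    id = Column(Integer, primary_key=True)
    element_id = Column(Integer, ForeignKey("anotherTable.id"))
    element = relationship(AnotherClass)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:   # autoflush is on by default
    element_obj = AnotherClass()
    a = ExampleClass(element=element_obj)
    session.add(a)

    # element_obj has no primary key yet; its value is looked up only when
    # the query executes, which is after the autoflush has run.
    found = session.query(ExampleClass).filter_by(element=element_obj).first()

    same_object = found is a
    fk_populated = a.element_id is not None
```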
You could get a similar effect while still using the FK attributes like this, though this is mostly useful for understanding how it works:
# bindparam() takes a key as its first argument; callable_ defers the lookup
session.query(ExampleClass).\
    filter_by(element_id=bindparam('element_id', callable_=lambda: self.element_id)).\
    filter_by(element2_id=element2.id)
Or, to be even more explicit, do the flush first:
session.flush()
session.query(ExampleClass).\
    filter_by(element_id=self.element_id).\
    filter_by(element2_id=element2.id)
So to the degree you'd want to refer to foreign-key attributes like element_id explicitly, you'd also need to do the things relationship() does for you explicitly, as well. If you deal strictly with object instances and the relationship()-bound attribute, and leave typical defaults like autoflush enabled, it will generally do the "right thing" and make sure attributes are ready when needed.

Related

How does Django foreign key access work

Say I have a model like this.
class Job(models.Model):
    client = models.ForeignKey(Contacts, null=True)
and let's say I have a job j. I know I can access the client belonging to j like this
j.client
but there is also
j.client_id
So my question is how does accessing j.client work?
Does django store client__id then when j.client is called it does a query to find the correct object?
Or is the object reference stored to j and accessing client__id is getting the id from the Contact object?
I've looked around the source code a bit but couldn't find the answer to my question

What you are probably talking about is client and client_id (single underscore).
The client_id attribute is a regular (integer) attribute. This is the foreign key that is saved to the database. You will only ever see a client_id column in the database, even though you specify the ForeignKey as client.
The client attribute is an object descriptor instance. It is a special class that overrides the __get__ and __set__ methods, so setting and accessing that attribute invokes that class's methods. This is the magic that gives you access to the actual related model instance. __get__ will retrieve the correct model instance from the database if it isn't loaded already, based on the client_id attribute. __set__ will also set the client_id attribute to the primary key of the related object, so that client_id is always up-to-date.
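The descriptor mechanics can be illustrated without Django at all. The following is a toy sketch in plain Python, not Django's actual implementation: ForwardDescriptor, Contact._db and the cache attribute name are all hypothetical, and the "database" is just a dictionary.

```python
class Contact:
    """Stand-in for a related model; _db plays the role of the table."""
    _db = {}

    def __init__(self, pk, name):
        self.pk = pk
        self.name = name
        Contact._db[pk] = self


class ForwardDescriptor:
    """Toy version of a forward foreign-key descriptor (hypothetical)."""

    def __init__(self, related, name):
        self.related = related
        self.cache_name = "_%s_cache" % name
        self.id_name = "%s_id" % name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        cached = instance.__dict__.get(self.cache_name)
        fk = getattr(instance, self.id_name)
        if cached is None and fk is not None:
            # Simulated query: load the related object by its primary key.
            cached = self.related._db[fk]
            instance.__dict__[self.cache_name] = cached
        return cached

    def __set__(self, instance, value):
        # Keep the plain "<name>_id" attribute in sync with the object.
        instance.__dict__[self.cache_name] = value
        setattr(instance, self.id_name, None if value is None else value.pk)


class Job:
    client = ForwardDescriptor(Contact, "client")

    def __init__(self, client_id=None):
        self.client_id = client_id


alice = Contact(1, "Alice")
bob = Contact(2, "Bob")

j = Job(client_id=1)
loaded = j.client          # triggers the simulated lookup via __get__
j.client = bob             # __set__ also updates j.client_id
```

After the assignment, j.client_id is 2 without anything having touched it directly, which mirrors the behavior described above.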
Note that this attribute is also available in query lookups, and is quite handy. E.g., if you have just the primary key of a foreign object, and not the model instance itself, the following queries look very similar:
job = Job.objects.filter(client__id=pk)
job = Job.objects.filter(client_id=pk)
However, underneath, the first query refers to an attribute on the related object (double underscore) and may perform a JOIN. The second query only ever accesses a local attribute, so it needs no JOIN and avoids that cost.

This is explained in the docs:
https://docs.djangoproject.com/en/dev/ref/models/fields/#database-representation
In the database there is only a client_id field (single underscore).
On the model instance you will have a client attribute; accessing it causes Django to load the related object from the database and instantiate it as another model instance.
You will also have a client_id attribute (one underscore), which holds the primary key value of the related object, as stored in the database field.
When doing ORM queries you can use the client__id (double underscore) syntax to look up against fields on the related model; e.g., you could also use client__name if the Client model had a name field. This becomes a SQL JOIN query across both models.
For example:
Job.objects.get(client__id=1)
Job.objects.filter(client__name='John')
client = Client.objects.get(pk=1)
Job.objects.get(client=client)

j.client gives you the models.Model object. You can access its properties like ...
client = j.client
id = client.id
name = client.name
But there should not be a j.client__id attribute. Use j.client.id to get the id field, although you can use the client__id lookup in filters and such.
So,
id = j.client.id # good
id = j.client__id # bad
and
job = Job.objects.get(client__id=1) # good
job = Job.objects.get(client.id=1) # bad

What's the difference between Model.query and session.query(Model) in SQLAlchemy?

I'm a beginner in SQLAlchemy and found that queries can be written in two ways:
Approach 1:
DBSession = scoped_session(sessionmaker())
class _Base(object):
    query = DBSession.query_property()
Base = declarative_base(cls=_Base)
class SomeModel(Base):
    key = Column(Unicode, primary_key=True)
    value = Column(Unicode)
# When querying
result = SomeModel.query.filter(...)
Approach 2
DBSession = scoped_session(sessionmaker())
Base = declarative_base()
class SomeModel(Base):
    key = Column(Unicode, primary_key=True)
    value = Column(Unicode)
# When querying
session = DBSession()
result = session.query(SomeModel).filter(...)
Is there any difference between them?

In the code above, there is no difference. This is because, in line 3 of the first example:
the query property is explicitly bound to DBSession
there is no custom Query object passed to query_property
As @petr-viktorin points out in the answer here, there must be a session available before you define your model in the first example, which might be problematic depending on the structure of your application.
If, however, you need a custom query that adds additional query parameters automatically to all queries, then only the first example will allow that. A custom query class that inherits from sqlalchemy.orm.query.Query can be passed as an argument to query_property. This question shows an example of that pattern.
Even if a model object has a custom query property defined on it, that property is not used when querying with session.query, as in the last line of the second example. This means something like the first example is the only option if you need a custom query class.
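A minimal sketch of the custom-query pattern, assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. ValueQuery and its with_value() helper are hypothetical names invented for this illustration, and __tablename__ is added so that the model can actually be mapped.

```python
from sqlalchemy import Column, Unicode, create_engine
from sqlalchemy.orm import Query, declarative_base, scoped_session, sessionmaker

engine = create_engine("sqlite://")
DBSession = scoped_session(sessionmaker(bind=engine))


class ValueQuery(Query):
    """Hypothetical custom query class with one extra helper."""

    def with_value(self, value):
        return self.filter_by(value=value)


class _Base(object):
    # The custom class is only used for queries made through this property.
    query = DBSession.query_property(query_cls=ValueQuery)


Base = declarative_base(cls=_Base)


class SomeModel(Base):
    __tablename__ = "some_model"
    key = Column(Unicode, primary_key=True)
    value = Column(Unicode)


Base.metadata.create_all(engine)
DBSession.add_all([SomeModel(key="a", value="x"), SomeModel(key="b", value="y")])
DBSession.commit()

via_property = SomeModel.query.with_value("x").all()   # uses ValueQuery
plain_query = DBSession.query(SomeModel)               # ordinary Query
```

Here plain_query has no with_value() method, because session.query() does not go through the query property.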

I see these downsides to query_property:
You cannot use it on a different Session than the one you've configured (though you could always use session.query then).
You need a session object available before you define your schema.
These could bite you when you want to write tests, for example.
Also, session.query fits better with how SQLAlchemy works; query_property looks like it's just added on top for convenience (or similarity with other systems?).
I'd recommend you stick to session.query.

An answer (here) to a different SQLAlchemy question might help. That answer starts with:
You can use Model.query, because the Model (or usually its base class, especially in cases where declarative extension is used) is assigned Session.query_property. In this case the Model.query is equivalent to Session.query(Model).

Duplicating key names and parent as properties in Google App Engine (GAE) Datastore?

After reading about the GAE Datastore API, I am still unsure if I need to duplicate key names and parents as properties for an entity.
Let's say there are two kinds of entities: Employee and Division. Each employee has a division as its parent, and is identified by an account name. I use the account name as the key name for employees. But when modeling Employee, I would still keep these two as properties:
division = db.ReferenceProperty(Division)
account_name = db.StringProperty()
Obviously I have to manually keep division consistent with its parent, and account_name with its key name. The reasons I am doing this extra work are:
I am afraid the GQL/Datastore API may not support parent and key name as well as normal properties. Is there anything I can do with a property that I cannot do with a parent or key name (or are they essentially reference properties)? How do I use key names in GQL queries?
The meaning of key name and parent is not particularly clear. As the names are not self-descriptive, I have to inform other contributors that we use account name as key name...
But this is really unnecessary work, wasting time and storage space. I cannot get rid of the SQL-thinking that - why doesn't Google just let us define a property to be the key? and another to be the parent? Then we could name them and use as normal properties...
What's the best practice here?

Keep in mind that in the GAE Datastore you can never change the parent or key_name of an entity once it has been created. These values are permanent for the life of the entity.
If there is even a small chance that the account_name of an Employee could change, then you cannot use it as a key_name. If it never changes, then it could be a very good key_name and will allow you to do cheap gets for Employees using Employee.get_by_key_name() instead of expensive queries.
Parent is not meant to be equivalent to a foreign key. A better equivalent to a foreign key is a reference property.
The main reason you use parent is so that the parent and child entities are in the same entity group which allows you to operate on them both in a single transaction. If you just need a reference to the division from the Employee then just use a reference property. I suggest getting familiar with how entity groups work as this is very important on GAE data modeling:
https://developers.google.com/appengine/docs/python/datastore/entities#Transactions_and_Entity_Groups
Using parent can also cause write performance issues as there is a limit to how quickly you can write to a single entity group (approximately one write per second). When deciding whether to use parent or a reference property you need to think about which entities need to be modified in the same transaction. In many cases you can use Cross Group (XG) transactions instead. It is all about which trade-offs you want to make.
So my suggestions are:
If your account_name for an employee will absolutely never change then use it as a key_name. Otherwise just make it a basic property.
If you need to modify the Employee and the Division in the same transaction (and you can't get this to work with XG transactions) and you will never change the Division of an Employee then make the Division the parent of the Employee. Otherwise just model this relationship with a reference property.

When you create a new Employee object with a Division as a parent, it would go something like:
div = Division()
... #Complete the division properties
div.put()
emp = Employee(key_name=<account_name>, parent=div)
... #Complete the employee properties
emp.put()
Then, when you want to get a reference to the Division an Employee is part of:
div = emp.parent()
# Get the Employee account_name (which is the employee's key name):
account_name = emp.key().name()
You don't have to store a ReferenceProperty to the Division an Employee is part of, since that is already recorded in the parent. Additionally, you can get the account_name from the Employee entity's key as needed.
To query on the key:
emp = Employee.get_by_key_name(<account_name>, parent=<division>)
#OR
div = Division.get_by_key_name(<keyname>)
#Get all employees in a division
emps = Employee.all().ancestor(div)

SQLAlchemy modelling a complex relationship using reflection

I am querying a proprietary database which is maintained by a third party. The database has many tables each with large numbers of fields.
My problem refers to three tables of interest, Tree, Site and Meter.
The tree table describes nodes in a simple tree structure. Along with other data it has a foreign key referencing its own primary key. It also has an Object_Type field and an Object_ID field. The Site and Meter tables each have many fields.
A tree node has a one-to-one relationship with either a meter or a site. If the Object_Type field is 1 then the Object_ID field refers to the primary key in the Site table. If it is 2 then it refers to the primary key in the Meter table.
Following this example: https://bitbucket.org/sqlalchemy/sqlalchemy/src/408388e5faf4/examples/declarative_reflection/declarative_reflection.py
I am using reflection to load the table structures like so:
Base = declarative_base(cls=DeclarativeReflectedBase)
class Meter(Base):
    __tablename__ = 'Meter'
class Site(Base):
    __tablename__ = 'Site'
class Tree(Base):
    __tablename__ = 'Tree'
    Parent_Node_ID = Column(Integer, ForeignKey('Tree.Node_ID'))
    Node_ID = Column(Integer, primary_key=True)
    children = relationship("Tree", backref=backref('parent', remote_side=[Node_ID]))
Base.prepare(engine)
I have included the self-referential relationship and that works perfectly. How can I add the two relationships using Object_ID as the foreign key, with the appropriate check on the Object_Type field?

First, a note on reflection. I've found myself much better off not relying on it:
Defining the schema in code means you do not need a valid database connection to load or work with your code.
Reflection goes against the Python guideline that explicit is better than implicit. When you look at your code, you are better off seeing the elements (columns, etc.) than having them magically created outside your field of view.
This means more code, but more maintainable code.
Part of the reason I suggest this is that I cannot see the schema in your post.
If you create the tables and classes in your code rather than relying on reflection, you can then have better control over mapping.
In this case you want to use polymorphic mapping
create a TreeNode class as above.
create SiteNode and MeterNode as subclasses
Your code would then include something like:
mapper(TreeNode, tree_table, polymorphic_on=tree_table.c.object_type)
mapper(SiteNode, site_table, inherits=TreeNode,
       inherit_condition=site_table.c.node_id == tree_table.c.node_id,
       polymorphic_identity=1)
Hope this helps.

For tree.object_id to be a foreign key that can refer either to Site or Meter, you can have Site and Meter descend from a common base table (joined table inheritance), map them to the same table (single table inheritance), or, as someone said, have Tree be mapped to two different tables as well as a common base table. This last suggestion goes well with the idea that TreeNode already has a "type" field.
The final alternative, which might be easier, is to use two foreign keys on TreeNode directly, site_id and meter_id, as well as two relationships, "meter" and "site"; then use a Python @property to return one or the other:
class TreeNode(Base):
    # ...
    @property
    def object(self):
        return self.meter or self.site
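Filled out into something runnable, the two-foreign-key alternative might look like the sketch below. It assumes SQLAlchemy 1.4 or later, an in-memory SQLite database, and explicitly declared tables with made-up lowercase names; it does not use the reflected Object_Type/Object_ID schema from the question.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Site(Base):
    __tablename__ = "site"
    id = Column(Integer, primary_key=True)


class Meter(Base):
    __tablename__ = "meter"
    id = Column(Integer, primary_key=True)


class TreeNode(Base):
    __tablename__ = "tree"
    node_id = Column(Integer, primary_key=True)
    # Two nullable foreign keys; at most one is expected to be set per node.
    site_id = Column(Integer, ForeignKey("site.id"))
    meter_id = Column(Integer, ForeignKey("meter.id"))
    site = relationship(Site)
    meter = relationship(Meter)

    @property
    def object(self):
        # Return whichever related object is present.
        return self.meter or self.site


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    site = Site()
    meter = Meter()
    site_node = TreeNode(site=site)
    meter_node = TreeNode(meter=meter)
    session.add_all([site_node, meter_node])
    session.flush()

    site_node_object = site_node.object
    meter_node_object = meter_node.object
```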

The underlying query of a mapped attribute

If I have a SQLAlchemy-mapped instance, can I get an underlying dynamic query object corresponding to an attribute of that instance?
For example:
e = Employee()
e.projects
#how do I get a query object loaded with the underlying sql of e.projects

I think you're describing the lazy="dynamic" option of relationship(). Something like:
class Employee(Base):
    __tablename__ = "employees"
    ...
    projects = relationship(..., lazy="dynamic")
which will cause Employee().projects to return a sqlalchemy.orm.Query instance instead of a collection containing the related items. However, that means there's no (simple) way to access the collection directly. If you still need that (most likely you really do want it to be lazily loaded), set up two relationship()s instead:
class Employee(Base):
    __tablename__ = "employees"
    ...
    projects_query = relationship(..., lazy="dynamic")
    projects = relationship(..., lazy="select")
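For a runnable picture of what lazy="dynamic" gives you, here is a small sketch assuming SQLAlchemy 1.4 or later and an in-memory SQLite database. The Project model, its columns and the one-to-many direction are assumptions made for the example, since the question does not show them.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Project(Base):
    # Assumed model; the original question does not define it.
    __tablename__ = "projects"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    employee_id = Column(Integer, ForeignKey("employees.id"))


class Employee(Base):
    __tablename__ = "employees"
    id = Column(Integer, primary_key=True)
    projects = relationship(Project, lazy="dynamic")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    e = Employee()
    e.projects.append(Project(name="alpha"))
    e.projects.append(Project(name="beta"))
    session.add(e)
    session.flush()

    # e.projects is a query-like object, so it can be filtered further
    # before any rows are loaded.
    alpha_count = e.projects.filter(Project.name == "alpha").count()
    all_names = sorted(p.name for p in e.projects)
```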
edit: You said
I need somehow to get the dynamic query object of an already lazy relationship mapped property.
Supposing we have an instance i of class Foo related to a class Bar by the property bars. First, we need to get the property that handles the relationship.
from sqlalchemy.orm.attributes import manager_of_class
p = manager_of_class(Foo).mapper.get_property('bars')
We'd like an expression that and_s together all of the columns on i that relate it to bars. If you need to operate on Foo through an alias, substitute it in here.
e = sqlalchemy.and_(*[getattr(Foo, c.key) == getattr(i, c.key)
                      for c in p.local_side])
Now we can create a query that expresses this relationship. Substitute aliases for Foo and Bar here as needed.
q = session.query(Foo) \
    .filter(e) \
    .join(Foo.bars) \
    .with_entities(Bar)
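As a different route to the same kind of query (not the construction used in this answer), SQLAlchemy also ships a with_parent() helper in sqlalchemy.orm that builds the "related to this instance" criterion for an ordinary, non-dynamic relationship. A minimal sketch, assuming SQLAlchemy 1.4 or later, an in-memory SQLite database, and simple one-to-many Foo/Bar models invented for the example:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship, with_parent

Base = declarative_base()


class Bar(Base):
    __tablename__ = "bar"
    id = Column(Integer, primary_key=True)
    foo_id = Column(Integer, ForeignKey("foo.id"))


class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bars = relationship(Bar)          # plain lazy="select" relationship


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    i = Foo(bars=[Bar(), Bar()])
    other = Foo(bars=[Bar()])
    session.add_all([i, other])
    session.flush()

    # A query for the Bar rows related to i, which can be refined further.
    q = session.query(Bar).filter(with_parent(i, Foo.bars))
    related_count = q.count()
    narrowed = q.filter(Bar.id > 0).all()
    same_as_collection = set(narrowed) == set(i.bars)
```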

Not sure about the question in general, but you definitely can enable SQL logging by setting echo=True, which will log the SQL statement as soon as you try to get value of the attribute.
Depending on your relationship configuration, it might have been eagerly pre-loaded.
