Sqlalchemy query on multiple relationship between two tables

I am having trouble with the following setup of a sqlalchemy ORM connected to a postgresql db.
from sqlalchemy import BigInteger, Column, ForeignKey, Sequence
from sqlalchemy.orm import declarative_base, relationship
Base = declarative_base()
class Map(Base):
    __tablename__ = "map"
    id = Column(BigInteger, Sequence(name="myseq"), primary_key=True)
    cmp_1_id = Column(BigInteger, ForeignKey("component.id"))
    cmp_2_id = Column(BigInteger, ForeignKey("component.id"))
class Component(Base):
    __tablename__ = "component"
    id = Column(BigInteger, Sequence(name="myseq"), primary_key=True)
    map_1 = relationship("Map", back_populates="cmp_1", foreign_keys=Map.cmp_1_id, uselist=False)
    map_2 = relationship("Map", back_populates="cmp_2", foreign_keys=Map.cmp_2_id, uselist=False)
# The Map side is attached after both classes exist, so the join conditions can name Component.id.
Map.cmp_1 = relationship(
    "Component", back_populates="map_1", primaryjoin=Map.cmp_1_id == Component.id
)
Map.cmp_2 = relationship(
    "Component", back_populates="map_2", primaryjoin=Map.cmp_2_id == Component.id
)
Now I want to query a specific Map object whose cmp_1 object has a certain "known_value" of other_attribute. I tried various statements, using both the Query API and the Select API, and with a colleague finally found this working solution:
(session.query(Map.id)
    .join(Map.cmp_1)
    .where(Component.other_attribute == "known_value")
    .one()[0]
)
During my research on the topic I read through some other SO articles, which raised further questions. So here they come:
My main question: why can't I do the query like this:
(session.query(Alias_Map_Expanded.id)
    .where(Map.cmp_1.other_attribute == "known_value")
).one()[0]
This raises the exception AttributeError: Neither 'InstrumentedAttribute' object nor 'Comparator' object associated with Map.cmp_1 has an attribute 'other_attribute'
More generally: how would I design the model better (as in: more robust and easier to jump between relations) so that I could do the above? The relationships need to be One (Component) To Many (Map), i.e. a Map object points to one or two components (cmp_2 is optional). In turn, a component can be pointed to from multiple Map rows/objects.
Based on this: Should I always define a foreign key along with a relationship to not break the relationship inside the db? Update: I removed this question because I now find it rather misleading and not really worth having it answered.
Based on that: I guess I also need to use the post_update to not have a circular dependency? Or do I misinterpret the use of post_update?
Thanks in advance!

After some thorough consulting of the extensive sqlalchemy docs I found some answers:
To my first question and the related query: in my ORM classes I did not specify a loading strategy, leaving it at the default, "lazy". The value of other_attribute is therefore not loaded by the first query; it would take a second call to query1_result.other_attribute, upon which the related content would be queried separately. I first assumed I would need to specify an eager loading strategy for the proposed query to work.
However, I then figured out that even with eager loading I still cannot filter on related objects using class-level ORM syntax: Map.cmp_1 is a relationship attribute on the class, not a loaded Component instance, so it does not expose Component's columns. The filter (the "where" clause) needs to be formulated at the SQL level, i.e. with a join like the first example I gave above.
Second question (model design): there is most likely no meaningful answer to that, especially without deeper knowledge of my database structure.
Third question, based on the first link: I think my question is somewhat strange and maybe even misleading. I will remove it from the original post.
Last question, based on the second link: I haven't investigated this much further, it being the least important to me, but I think I got the concept of post_update wrong and will not need it for my purpose.
I got all of it from sqlalchemy docs, so in case you hit that question and have a similar problem, work your way through the extensive documentation conglomeration. The answer is most likely there.
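To illustrate the point about filtering on a related object, here is a minimal sketch against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or later, a simplified version of the model (plain Integer keys, no sequence, no back-references), and an other_attribute string column on Component, which the question uses but never shows. It compares the explicit join with the relationship comparator's has() method, which is one way to keep the filter close to the relationship.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Component(Base):
    __tablename__ = "component"
    id = Column(Integer, primary_key=True)
    other_attribute = Column(String)  # assumed column, not shown in the question


class Map(Base):
    __tablename__ = "map"
    id = Column(Integer, primary_key=True)
    cmp_1_id = Column(Integer, ForeignKey("component.id"))
    cmp_2_id = Column(Integer, ForeignKey("component.id"))  # optional second link
    # foreign_keys tells the ORM which column each relationship follows.
    cmp_1 = relationship("Component", foreign_keys=[cmp_1_id])
    cmp_2 = relationship("Component", foreign_keys=[cmp_2_id])


engine = create_engine("sqlite://")  # in-memory database for the sketch
Base.metadata.create_all(engine)

with Session(engine) as session:
    known = Component(other_attribute="known_value")
    other = Component(other_attribute="something_else")
    first_map = Map(cmp_1=known, cmp_2=other)
    session.add_all([first_map, Map(cmp_1=other)])
    session.commit()
    first_id = first_map.id

    # Explicit join along the relationship, as in the working query.
    joined_id = session.execute(
        select(Map.id).join(Map.cmp_1).where(Component.other_attribute == "known_value")
    ).scalar_one()

    # has() renders an EXISTS subquery against the related row instead.
    exists_id = session.execute(
        select(Map.id).where(Map.cmp_1.has(Component.other_attribute == "known_value"))
    ).scalar_one()
```

Both statements select the same Map row here. The join form is usually the better fit when columns of the related table are also needed in the result; has() only tests for the existence of a matching related row.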

Related

Why am I getting a duplicate foreign key error?

I'm trying to use Python and SQLAlchemy to insert data into the database, but it's giving me a duplicate foreign key error. I didn't have any problems when I was executing the SQL queries to create the tables earlier.

You're getting the duplicate because you've written the code as a one-to-one relationship, when it is at least a one-to-many relationship.
A database does not allow the same value twice in a column that is constrained to be unique, such as a primary key. If you insert a value that already exists there and the tables are not set up with the right relationship, the database rejects the insert with the error you're getting.
The code below models a one-to-many relationship for your tables, using Flask to connect to the database. If you aren't using Flask yourself, translate it to your setup.
# Assumes db is a Flask-SQLAlchemy instance, e.g. db = SQLAlchemy(app).
class ChildcareUnit(db.Model):
    Childcare_id = db.Column('ChildcareUnit_id', db.Integer, primary_key=True)
    fullname = db.Column(db.String(250), nullable=False)
    shortname = db.Column(db.String(250), nullable=False)
    # One childcare unit has many menus.
    _Menu = db.relationship('Menu', back_populates='ChildcareUnits')
    def __init__(self, fullname, shortname):
        self.fullname = fullname
        self.shortname = shortname
    def __repr__(self):
        return '<ChildcareUnit %r>' % self.Childcare_id
class Menu(db.Model):
    menu_id = db.Column('menu_id', db.Integer, primary_key=True)
    menu_date = db.Column('Date', db.Date, nullable=True)
    # Flask-SQLAlchemy derives the table name "childcare_unit" from the class name.
    idChildcareUnit = db.Column(db.Integer, db.ForeignKey('childcare_unit.ChildcareUnit_id'))
    ChildcareUnits = db.relationship('ChildcareUnit', back_populates='_Menu')
    def __init__(self, menu_date):
        self.menu_date = menu_date
    def __repr__(self):
        return '<Menu %r>' % self.menu_id
A couple of differences to note here. The columns are now db.Column() rather than Column(). Flask-SQLAlchemy exposes SQLAlchemy's classes and functions through the db object, so db.Column is the same Column, just accessed through the Flask extension.
Also, look at the db.relationship() attributes I've added to both classes. These tell the ORM that the two tables have a one-to-many relationship. For the relationship to be navigable from both sides, each class declares a relationship that names the other, as you can see.
Lastly, look at __repr__. It does not generate foreign keys and has no effect on the database or on query speed. It only defines the text shown when an object is displayed in the shell, a debugger, or a log, which makes it well worth including.
Python offers two methods for turning an object into text: __repr__ and __str__.
__repr__ is meant to be unambiguous and aimed at developers, for example by including the primary key.
__str__ is meant to be human friendly and is what print() and str() use. If it is not defined, Python falls back to __repr__.
You can define only __repr__ while you're developing, and add __str__ later if you need a friendlier display.
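As a quick check of what these two methods do in plain Python, here is a small sketch. The Menu class below is a hypothetical stand-in without any database behind it; it only shows that __repr__ and __str__ control how an object is rendered as text.

```python
class Menu:
    """Hypothetical plain-Python stand-in for the Menu model."""

    def __init__(self, menu_id, menu_date):
        self.menu_id = menu_id
        self.menu_date = menu_date

    def __repr__(self):
        # Developer-facing: unambiguous, shown in the shell and in logs.
        return "<Menu %r>" % self.menu_id

    def __str__(self):
        # User-facing: used by print() and str().
        return "Menu for %s" % self.menu_date


menu = Menu(7, "2024-01-15")
repr_text = repr(menu)
str_text = str(menu)
```

Neither method touches keys, constraints, or queries; they only affect what you see when the object is displayed.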

Django deferring the foreign key lookup

I'm working on a Django project and trying to speed up the calls. I noticed that Django automatically does a second query to evaluate any foreign key relationships. For instance, if my models look like:
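from django.db import models
class Person(models.Model):
    name = models.CharField("blah", max_length=100)  # max_length is an assumed value
class Address(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)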
Then I make:
p1 = Person(name="Bob")
p1.save()
address1 = Address(person=p1)
address1.save()
print(p1.id)  # let it be 1 because it is the first entry
then when I call:
Address.objects.filter(person_id=1)
I get:
Query #1: SELECT address.id, address.person_id FROM address
Query #2: SELECT person.id, person.name FROM person
I want to get rid of the second call, query #2. I have tried using "defer" from the Django documentation, but that did not work (in fact it makes even more calls). "values" is a possibility, but in actual practice there are many more fields I want to pull. The only thing I want is for it not to evaluate the FOREIGN KEY. I would be happy to get the person_id back, or not. This would drastically reduce the runtime, especially when I run a command like Address.objects.all(), because Django evaluates every foreign key.

Having just seen your other question on the same issue, I'm going to guess that you have defined a __unicode__ method that references the ForeignKey field. If you query for some objects in the shell and output them, the __unicode__ method will be called, which requires a query to get the ForeignKey. The solution is to either rewrite that method so it doesn't need that reference, or - as I say in the other question - use select_related().
Next time, please provide full code, including some that actually demonstrates the problem you are having.

How can you keep the Django ORM from making mistakes when you pass the wrong kind of object?

We found this while testing: one machine was set up with MyISAM as the default engine and one with InnoDB as the default engine. We have code similar to the following:
class StudyManager(models.Manager):
    def scored(self, school=None, student=None):
        qset = self.all()  # a manager has no .objects attribute; query it directly
        if school:
            qset = qset.filter(school=school)
        if student:
            qset = qset.filter(student=student)
        return qset.order_by('something')
The problem code looked like this:
print Study.objects.scored(student).count()
which meant that the "student" was being treated as a school. This got through testing with MyISAM because student.id == school.id: MyISAM can't do a rollback, so the database gets completely re-created for each test (resetting the autoincrement id field). InnoDB caught these errors because rollback evidently does not reset the autoincrement fields.
The problem is that, during testing, there could be many other errors going uncaught due to duck typing, since all models have an id field. I'm worried about the ids on objects lining up (in production or in testing) and that causing problems or hiding bugs.
I could add asserts like so:
class StudyManager(models.Manager):
    def scored(self, school=None, student=None):
        qset = self.all()  # a manager has no .objects attribute; query it directly
        if school:
            assert isinstance(school, School)
            qset = qset.filter(school=school)
        if student:
            assert isinstance(student, Student)
            qset = qset.filter(student=student)
        return qset.order_by('something')
But this looks nasty, and is a lot of work (to go back and retrofit). It's also slower in debug mode.
I've thought about the idea that the id field for the models could be coerced into model_id (student_id for Student, school_id for School) so that schools would not have a student_id, this would only involve specifying the primary key field, but django has a shortcut for that in .pk so I'm guessing that might not help in all cases.
Is there a more elegant solution to catching this kind of bug? Being an old C++ hand, I kind of miss type safety.

This is an aspect of Python and has nothing to do with Django per se.
By defining default values for function parameters you do not eliminate the concept of positional arguments; you simply make it possible to not specify all parameters when invoking the function. @mVChr is correct in saying that you need to get in the habit of using the parameter name(s) when you call the routine, particularly when there is inherent ambiguity in just what it is being called with.
You might also consider having two separate routines whose names quite clearly identify their expected parameter types.
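One way to enforce that habit, assuming Python 3 (the question's code is Python 2, which lacks this syntax), is to make the filters keyword-only with a bare * in the signature. The sketch below uses a plain function over a list of dictionaries as a hypothetical stand-in for the manager method, so it runs without Django.

```python
def scored(studies, *, school=None, student=None):
    """Filter study records; the bare * makes both filters keyword-only."""
    if school is not None:
        studies = [s for s in studies if s["school"] == school]
    if student is not None:
        studies = [s for s in studies if s["student"] == student]
    return studies


studies = [
    {"school": 1, "student": 1},
    {"school": 1, "student": 2},
    {"school": 2, "student": 1},
]

# The caller has to say which filter is meant.
by_student = scored(studies, student=2)

# Passing the value positionally is now rejected instead of silently
# being treated as a school.
try:
    scored(studies, 2)
    positional_allowed = True
except TypeError:
    positional_allowed = False
```

With this signature, a call such as scored(student) fails immediately with a TypeError, regardless of whether the ids happen to line up.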

SQLAlchemy modelling a complex relationship using reflection

I am querying a proprietary database which is maintained by a third party. The database has many tables each with large numbers of fields.
My problem refers to three tables of interest, Tree, Site and Meter.
The tree table describes nodes in a simple tree structure. Along with other data it has a foreign key referencing its own primary key. It also has an Object_Type field and an Object_ID field. The Site and Meter tables each have many fields.
A tree node has a one-to-one relationship with either a meter or a site. If the Object_Type field is 1 then the Object_ID field refers to the primary key in the Site table. If it is 2 then it refers to the primary key in the Meter table.
Following this example: https://bitbucket.org/sqlalchemy/sqlalchemy/src/408388e5faf4/examples/declarative_reflection/declarative_reflection.py
I am using reflection to load the table structures like so:
Base = declarative_base(cls=DeclarativeReflectedBase)
class Meter(Base):
    __tablename__ = 'Meter'
class Site(Base):
    __tablename__ = 'Site'
class Tree(Base):
    __tablename__ = 'Tree'
    Parent_Node_ID = Column(Integer, ForeignKey('Tree.Node_ID'))
    Node_ID = Column(Integer, primary_key=True)
    children = relationship("Tree", backref=backref('parent', remote_side=[Node_ID]))
Base.prepare(engine)
I have included the self-referential relationship and that works perfectly. How can I add the two relationships using Object_ID as the foreign key, with the appropriate check on the Object_Type field?

First, a note on reflection. I've found myself much better off not relying on it:
Defining the tables explicitly means you do not need a valid database connection to load or work with your code.
Reflection goes against the Python guideline that explicit is better than implicit. When you look at your code, you are better off seeing the elements (columns etc.) than having them magically created outside your field of view.
This means more code, but code that is more maintainable.
I suggest this at least in part because I cannot see the schema in your posting.
If you create the tables and classes in your code rather than relying on reflection, you have better control over the mapping.
In this case you want to use polymorphic mapping:
Create a TreeNode class as above.
Create SiteNode and MeterNode as subclasses.
Your code would then include something like:
mapper(TreeNode, tree_table, polymorphic_on=tree_table.c.object_type)
mapper(SiteNode, site_table, inherits=TreeNode,
       inherit_condition=site_table.c.node_id == tree_table.c.node_id,
       polymorphic_identity=1)
Hope this helps.

For tree.object_id to be a foreign key that can refer either to Site or to Meter, you have three options: have Site and Meter descend from a common base table (joined table inheritance), map them to the same table (single table inheritance), or, as someone said, have Tree be mapped to two different tables as well as a common base table. This last suggestion goes well with the fact that TreeNode already has a "type" field.
The final alternative, which might be easier, is to use two foreign keys on TreeNode directly, site_id and meter_id, as well as two relationships, "meter" and "site", and then use a Python @property to return one or the other:
class TreeNode(Base):
    # ...
    @property
    def object(self):
        return self.meter or self.site
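A fuller version of this last alternative might look like the sketch below. It assumes SQLAlchemy 1.4 or later, an in-memory SQLite database, and a hypothetical schema written out explicitly (the table and column names, including the name columns, are illustrative and not taken from the third-party database).

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Site(Base):
    __tablename__ = "site"
    id = Column(Integer, primary_key=True)
    name = Column(String)  # illustrative column


class Meter(Base):
    __tablename__ = "meter"
    id = Column(Integer, primary_key=True)
    name = Column(String)  # illustrative column


class TreeNode(Base):
    __tablename__ = "tree"
    node_id = Column(Integer, primary_key=True)
    # Two nullable foreign keys; each node is expected to fill exactly one.
    site_id = Column(Integer, ForeignKey("site.id"))
    meter_id = Column(Integer, ForeignKey("meter.id"))
    site = relationship("Site")
    meter = relationship("Meter")

    @property
    def object(self):
        # Returns whichever related row is present.
        return self.meter or self.site


engine = create_engine("sqlite://")  # in-memory database for the sketch
Base.metadata.create_all(engine)

with Session(engine) as session:
    site_node = TreeNode(site=Site(name="HQ"))
    meter_node = TreeNode(meter=Meter(name="M-1"))
    session.add_all([site_node, meter_node])
    session.commit()
    site_label = site_node.object.name
    meter_label = meter_node.object.name
    site_is_site = isinstance(site_node.object, Site)
```

The trade-off is that nothing in this schema stops a node from filling both keys or neither; a CHECK constraint or application-level validation would be needed if that matters.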

App Engine, Cross reference between two entities

I would like to have two types of entities that refer to each other,
but Python does not yet know the name of the second entity class in the body of the first.
How should I code this?
class Business(db.Model):
    bus_contact_info_ = db.ReferenceProperty(reference_class=Business_Info)
class Business_Info(db.Model):
    my_business_ = db.ReferenceProperty(reference_class=Business)
If you advise using a reference in only one class and the implicitly created property
(which is a query object) in the other,
then I wonder about the CPU quota penalty of using a query versus directly calling get() on a key.
Please advise how to write this code in Python.

Queries are a little slower, and so they do use a bit more resources. ReferenceProperty does not require reference_class. So you could always define Business like:
class Business(db.Model):
    bus_contact_info_ = db.ReferenceProperty()
There may also be better options for your data structure. Check out the modelling relationships article for some ideas.
Is this a one-to-one mapping? If this is a one-to-one mapping, you may be better off denormalizing your data.
Does it ever change? If not (and it is one-to-one), perhaps you could use entity groups and structure your data so that you could just directly use the keys / key names. You might be able to do this by making BusinessInfo a child of Business, then always use 'i' as the key_name. For example:
business = Business().put()
business_info = BusinessInfo(key_name='i', parent=business).put()
# Get business_info from business:
business_info = db.get(db.Key.from_path('BusinessInfo', 'i', parent=business))
# Get business from business_info:
business = db.get(business_info.parent())
