I've been looking for solutions to this but I couldn't find anything on finding duplicates with SQLAlchemy.
I have a parent-child type relationship, and I'm looking to find all the duplicates in the children on a specific column.
I tried iterating over each parent and counting on the column, but it gave me results that didn't make sense.
parents = session.query(parent).all()
for parent in parents:
    dups = session.query(child).filter_by(parentid=parent.id).group_by(child.foo_column).count()
    if dups > 0:
        # do action on duplicates
How can I get the duplicate children, or is there even a single query that could return all the duplicates?
EDIT:
Table definitions:
class parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)

class child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parentid = Column(Integer, ForeignKey('parent.id'))
    foo_column = Column(Integer, ForeignKey('foo.id'))
    parent = relationship('parent',
                          backref=backref('children'))
    foo = relationship('foo')
The foo_column I'm interested in contains just integer ids, so two children are duplicates when they have the same foo_column value (foo1.id == foo2.id).
What you are trying to achieve requires a self join. Think of how you would do it in SQL. Your query would look something like:
SELECT child_1.id AS dup1, child_2.id AS dup2
FROM child AS child_1 JOIN child AS child_2
    ON child_1.parentid = child_2.parentid
WHERE child_1.foo_column = child_2.foo_column
    AND child_1.id < child_2.id;
The id comparison keeps a row from matching itself and returns each duplicate pair only once.
Translating that to SQLAlchemy is straightforward:
from sqlalchemy.orm import aliased

child_2 = aliased(child)
dups = (
    session.query(child)
    .join(child_2, child.parentid == child_2.parentid)
    .filter(child.foo_column == child_2.foo_column)
    .filter(child.id < child_2.id)
    .with_entities(child.id.label('dup1'), child_2.id.label('dup2'))
)
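For anyone who wants to try the self join end to end, here is a minimal sketch against an in-memory SQLite database. It assumes SQLAlchemy 1.4 or newer; the class names are capitalized and the foreign key to foo is dropped purely to keep the example small, and the sample rows are made up.

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parentid = Column(Integer, ForeignKey("parent.id"))
    # Assumption: a plain integer column; the original also points it at foo.id.
    foo_column = Column(Integer)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Parent(id=1),
        Parent(id=2),
        Child(id=1, parentid=1, foo_column=10),
        Child(id=2, parentid=1, foo_column=10),  # duplicate of child 1
        Child(id=3, parentid=1, foo_column=20),
        Child(id=4, parentid=2, foo_column=10),  # same value, but another parent
    ])
    session.commit()

    child_2 = aliased(Child)
    rows = (
        session.query(Child.id.label("dup1"), child_2.id.label("dup2"))
        .select_from(Child)
        .join(child_2, Child.parentid == child_2.parentid)
        .filter(Child.foo_column == child_2.foo_column)
        .filter(Child.id < child_2.id)  # skip self-matches and mirrored pairs
        .order_by(Child.id, child_2.id)
        .all()
    )
    dup_pairs = [tuple(row) for row in rows]

print(dup_pairs)
```

Each tuple holds the ids of two children that share both a parent and a foo_column value, so no per-parent loop is needed.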
Related
I am trying to find a way to select specific columns together with a SQLAlchemy relationship.
I have these two tables:
class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    label = Column(String)
    children = relationship("Child", back_populates="parent")

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('public.parent.id'))
    parent = relationship("Parent", back_populates="children")
When I use
db.query(models.Parent).first()
I get the Parent object with its list of children, as expected. What I would like to do is select only a few columns, like this:
db.query(models.Parent.id, models.Parent.children)
This does not work, and I get the following error:
Could not locate column in row for column Children
You can use options() with the load_only() function:
from sqlalchemy import select
from sqlalchemy.orm import load_only, selectinload

stmt = select(Parent)
results = await db.execute(
    stmt
    .options(load_only(Parent.id))
    .options(selectinload(Parent.children))
)
To pick only some columns of the related objects, chain load_only() onto the relationship loader like this (this assumes Child has a name column):
.options(selectinload(Parent.children).load_only(Child.id, Child.name))
I tested this with execute(), but query() works as well:
session.query(User).options(load_only(User.name, User.fullname))
See also:
https://docs.sqlalchemy.org/en/14/orm/loading_columns.html#sqlalchemy.orm.load_only
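A synchronous sketch of the same idea, in case it helps to see it run. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database; the name column on Child and the sample rows are hypothetical additions for the demo, and the 'public.' schema prefix is dropped because SQLite has no such schema.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, select
from sqlalchemy.orm import (
    Session, declarative_base, load_only, relationship, selectinload,
)

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    label = Column(String)
    children = relationship("Child", back_populates="parent")


class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    name = Column(String)  # hypothetical column, not in the question's model
    parent_id = Column(Integer, ForeignKey("parent.id"))
    parent = relationship("Parent", back_populates="children")


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

# Insert sample data in one session...
with Session(engine) as session:
    session.add(
        Parent(id=1, label="p", children=[Child(id=1, name="a"), Child(id=2, name="b")])
    )
    session.commit()

# ...and query it in a fresh one so nothing is already cached.
with Session(engine) as session:
    stmt = select(Parent).options(
        load_only(Parent.id),                               # only Parent.id is fetched up front
        selectinload(Parent.children).load_only(Child.id),  # children come in a second SELECT
    )
    parent = session.execute(stmt).scalars().one()
    label_loaded = "label" in parent.__dict__  # deferred columns are absent until accessed
    child_ids = sorted(child.id for child in parent.children)
```

Note that the result is still a Parent object, not a row of columns; label is simply left unloaded and would be fetched lazily if you accessed it.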
I have a table called Node. Each node has many children of type Node, and a Node object can also have many parents. I also wrote an algorithm to find siblings (nodes that share parents).
How do I model this? Do I make a separate table for the relationships, or do I keep everything in the same table? Here is what I tried, which obviously failed:
class Node(Model):
    id = Column(String, primary_key=True)
    name = Column(String)
    parent_id = db.Column(db.String, db.ForeignKey('node.id'))
    children = db.relationship('node', remote_side=[id], uselist=True)
    parents = db.relationship('node', remote_side=[id], uselist=True)
    siblings = db.relationship('node', remote_side=[id], uselist=True)
I have no idea how to make this happen.
I actually thought about using a graph database for the Node objects and classic SQL for the other tables, but I am not sure it is worth the fuss.
Although I don't know how to implement "siblings" yet, here is how to have many self-referential many-to-many relationships:
Connection = Table('connection',
    Model.metadata,  # assumption: Table() needs the MetaData of your declarative base
    Column('child_id', String, ForeignKey('node.id')),
    Column('parent_id', String, ForeignKey('node.id')),
    UniqueConstraint('parent_id', 'child_id', name='unique_usage')
)

class Node(Model):
    id = Column(String, primary_key=True)
    name = Column(String)
    # this is the result list of type Node
    # where the current node is the "other party" or "child"
    parents = relationship('Node', secondary=Connection,
                           primaryjoin=id == Connection.c.child_id,
                           secondaryjoin=id == Connection.c.parent_id)
    # this is the result list of type Node
    # where the current node is the "parent"
    children = relationship('Node', secondary=Connection,
                            primaryjoin=id == Connection.c.parent_id,
                            secondaryjoin=id == Connection.c.child_id)
Basically, for each many-to-many relationship you want, create the table representing the relationship, then add the relationship to your model. You can have two-way relationships for each one of them.
I will edit my answer later when I figure out how to make siblings.
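One possible way to get siblings, offered as a sketch rather than a definitive answer: leave them out of the schema and derive them in Python as "the other children of my parents". The example below assumes SQLAlchemy 1.4 or newer with a plain declarative base instead of Flask-SQLAlchemy's Model; the siblings property and the sample nodes are my own additions.

```python
from sqlalchemy import Column, ForeignKey, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

connection = Table(
    "connection", Base.metadata,
    Column("parent_id", String, ForeignKey("node.id"), primary_key=True),
    Column("child_id", String, ForeignKey("node.id"), primary_key=True),
)


class Node(Base):
    __tablename__ = "node"
    id = Column(String, primary_key=True)
    name = Column(String)

    # primaryjoin ties *this* node to the association table,
    # secondaryjoin ties the association table to the nodes being returned.
    parents = relationship(
        "Node", secondary=connection,
        primaryjoin=id == connection.c.child_id,
        secondaryjoin=id == connection.c.parent_id,
        back_populates="children",
    )
    children = relationship(
        "Node", secondary=connection,
        primaryjoin=id == connection.c.parent_id,
        secondaryjoin=id == connection.c.child_id,
        back_populates="parents",
    )

    @property
    def siblings(self):
        """Nodes sharing at least one parent with this node (hypothetical helper)."""
        found = {
            sibling
            for parent in self.parents
            for sibling in parent.children
            if sibling is not self
        }
        return sorted(found, key=lambda node: node.id)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    a, b, c, d = (Node(id=x, name=x) for x in "abcd")
    a.children = [b, c]  # b and c share parent a
    d.children = [c]     # c has a second parent, d
    session.add_all([a, b, c, d])
    session.commit()

    sibling_ids_of_b = [n.id for n in session.get(Node, "b").siblings]
    sibling_ids_of_d = [n.id for n in session.get(Node, "d").siblings]
```

This issues lazy loads per parent, so for large graphs a dedicated query joining the connection table to itself would likely be a better fit.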
I have two tables:
class Parent(Base):
    __tablename__ = 'parents'
    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    min_quantity = sqlalchemy.Column(sqlalchemy.Integer, default=0, nullable=False)
    children = sqlalchemy.orm.relationship('Child', backref=sqlalchemy.orm.backref('parent'))

class Child(Base):
    __tablename__ = 'children'
    id = sqlalchemy.Column(sqlalchemy.Integer, primary_key=True)
    expiration_date = sqlalchemy.Column(sqlalchemy.Date)
    parent_id = sqlalchemy.Column(sqlalchemy.Integer, sqlalchemy.ForeignKey('parents.id'))
What is the simplest way to select only Parent objects that have at least min_quantity children with an expiration_date < datetime.today()?
I could query every Parent object and then use a list comprehension:
parents = session.query(Parent).all()
ok_parents = [parent for parent in parents if len([child for child in parent.children if child.expiration_date < datetime.today()]) >= parent.min_quantity]
But that doesn't feel like efficient code.
I suppose I can do this in a single query with the right options, and I suspect I will have to use having(), but since my knowledge of SQLAlchemy (and SQL in general) is still limited, I haven't managed to solve this problem.
So basically, I'm open to any suggestion.
You're correct that you can use a HAVING clause. For example, you could query parents joined with children, filter the rows by expiration date, group by parent id, and keep only the groups whose row count is at least min_quantity:
ok_parents = session.query(Parent).\
    join(Child).\
    filter(Child.expiration_date < datetime.today()).\
    group_by(Parent.id).\
    having(func.count(1) >= Parent.min_quantity).\
    all()
A caveat in the above query is that a parent with no children can not be a result because of the inner join. To work around that you'd use outerjoin(Child) and filter the resulting rows by either having no child (Child.id.is_(None)), or expiration date.
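Here is a runnable sketch of an outer-join variant, assuming SQLAlchemy 1.4 or newer and in-memory SQLite. Note that it is a variation on the workaround above, not the same thing: it moves the expiration test into the join's ON clause and counts Child.id rather than rows, so parents whose children are all unexpired are still grouped with a count of zero. The sample data is invented.

```python
from datetime import date, timedelta

from sqlalchemy import Column, Date, ForeignKey, Integer, and_, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    min_quantity = Column(Integer, default=0, nullable=False)
    children = relationship("Child", backref="parent")


class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    expiration_date = Column(Date)
    parent_id = Column(Integer, ForeignKey("parents.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

today = date.today()
expired = today - timedelta(days=1)
fresh = today + timedelta(days=1)

with Session(engine) as session:
    session.add_all([
        # two expired children, needs two: qualifies
        Parent(id=1, min_quantity=2,
               children=[Child(expiration_date=expired), Child(expiration_date=expired)]),
        # one expired child, needs two: does not qualify
        Parent(id=2, min_quantity=2,
               children=[Child(expiration_date=expired), Child(expiration_date=fresh)]),
        # no children, needs zero: qualifies only thanks to the outer join
        Parent(id=3, min_quantity=0),
    ])
    session.commit()

    ok_parents = (
        session.query(Parent)
        .outerjoin(
            Child,
            and_(Child.parent_id == Parent.id, Child.expiration_date < today),
        )
        .group_by(Parent.id)
        # count(Child.id) ignores the NULL row an outer join yields for "no match"
        .having(func.count(Child.id) >= Parent.min_quantity)
        .order_by(Parent.id)
        .all()
    )
    ok_ids = [parent.id for parent in ok_parents]
```

With func.count(1) instead, the childless parent would be counted as having one row, which is why the column-based count is used here.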
Here are my models (ignoring imports):
class Parent(Base):
    __tablename__ = 'parents'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    children = relationship('Child', backref='parent', lazy='dynamic')

class Child(Base):
    __tablename__ = 'children'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey('parents.id'))
Next I create a parent and a child, and relate them:
dbsession = session()
child = Child(name='bar')
dbsession.add(child)
parent = Parent(name='foo')
parent.children.append(child)
dbsession.add(parent)
dbsession.commit()
And all that works fine (so ignore any errors I may have made copying it to here). Now I'm trying to break the relationship, while keeping both the parent and the child in the database, and I'm coming up empty.
I appreciate any help.
I'm not sure exactly what you mean by breaking a relationship, or why you want to, but I think this might work:
child = dbsession.query(Child).filter(Child.name=='bar').one()
child.parent_id = None
dbsession.add(child)
dbsession.commit()
This post gives more info about blanking a foreign key: Can a foreign key be NULL and/or duplicate?
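As a sketch of an ORM-level alternative, you can also clear the relationship attribute and let SQLAlchemy null out the foreign key on flush. The example assumes SQLAlchemy 1.4 or newer and in-memory SQLite, and it uses the default lazy loading instead of lazy='dynamic' to keep the demo simple; that choice is mine, not the question's.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()


class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Assumption: default lazy loading rather than lazy='dynamic'.
    children = relationship("Child", backref="parent")


class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    parent_id = Column(Integer, ForeignKey("parents.id"))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    parent = Parent(name="foo")
    child = Child(name="bar")
    parent.children.append(child)
    session.add(parent)  # the child is added too via the save-update cascade
    session.commit()

    # Break the link: on flush the ORM sets children.parent_id to NULL.
    child.parent = None
    session.commit()

    remaining_children = list(parent.children)
    child_parent_id = child.parent_id
    row_counts = (session.query(Parent).count(), session.query(Child).count())
```

Both rows stay in the database because no delete or delete-orphan cascade is configured on the relationship.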
I'm using SQLAlchemy to represent a relationship between authors. I'd like to have authors related to other authors (coauthorship), with extra data in the relation, such that given an author a I can find their coauthors.
This is how it is done between two different classes:
class Association(Base):
    __tablename__ = 'association'
    left_id = Column(Integer, ForeignKey('left.id'), primary_key=True)
    right_id = Column(Integer, ForeignKey('right.id'), primary_key=True)
    extra_data = Column(String(80))
    child = relationship('Child', backref='parent_assocs')

class Parent(Base):
    __tablename__ = 'left'
    id = Column(Integer, primary_key=True)
    children = relationship('Association', backref='parent')

class Child(Base):
    __tablename__ = 'right'
    id = Column(Integer, primary_key=True)
But how would I do this in my case?
The nature of a coauthorship is that it is bidirectional. So, when you insert the tuple (id_left, id_right) into the coauthorship table through a coauthorship object, is there a way to also insert the reverse relation easily? I'm asking because I want to use association proxies.
If you'd like to literally have pairs of rows in the association table, that is, for every (id_left, id_right) that's inserted you also insert (id_right, id_left), you'd use an attribute event to listen for append events on either side and produce an append in the other direction.
If you just want to be able to navigate between Parent/Child in either direction, just a single row of id_left, id_right is sufficient. The examples in the docs regarding this kind of mapping illustrate the whole thing.
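To illustrate the attribute-event idea, here is a simplified sketch. It assumes SQLAlchemy 1.4 or newer, and it uses a plain secondary table instead of an association object with extra data, so treat it as a starting point only. The Author model, the table name, and the _mirroring guard are all hypothetical; the guard is there because the append event fires before the item is actually in the collection, so a simple "already present?" check would recurse.

```python
from sqlalchemy import (
    Column, ForeignKey, Integer, String, Table, create_engine, event, select,
)
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

# Hypothetical association table; a real one could carry extra columns.
coauthorship = Table(
    "coauthorship", Base.metadata,
    Column("left_id", Integer, ForeignKey("author.id"), primary_key=True),
    Column("right_id", Integer, ForeignKey("author.id"), primary_key=True),
)


class Author(Base):
    __tablename__ = "author"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    coauthors = relationship(
        "Author", secondary=coauthorship,
        primaryjoin=id == coauthorship.c.left_id,
        secondaryjoin=id == coauthorship.c.right_id,
    )


_mirroring = set()  # (owner, value) pairs whose mirrored append is in progress


@event.listens_for(Author.coauthors, "append")
def mirror_coauthor(target, value, initiator):
    """When value is appended to target.coauthors, append target to value.coauthors."""
    if (target, value) in _mirroring:
        return  # this append is itself the mirrored one, do not bounce back
    _mirroring.add((value, target))
    try:
        value.coauthors.append(target)
    finally:
        _mirroring.discard((value, target))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    alice, bob = Author(name="alice"), Author(name="bob")
    alice.coauthors.append(bob)  # the listener also appends alice to bob.coauthors
    session.add_all([alice, bob])
    session.commit()

    bob_coauthors = [author.name for author in bob.coauthors]
    row_count = len(session.execute(select(coauthorship)).all())
```

A matching "remove" listener would be needed to keep the two rows in sync on deletion; that part is left out here.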