SQLAlchemy many-to-many AND filter - python

I have two models, joined by a many-to-many relationship
image_tag = Table('image_tag', Base.metadata,
    Column('image_id', Integer, ForeignKey('images.id')),
    Column('tag_id', Integer, ForeignKey('tags.id'))
)
class Image(Base):
    __tablename__ = 'images'
    id = Column(Integer, primary_key=True)
    tags = relationship('Tag', secondary=image_tag, backref=backref('images', order_by=id.desc()), lazy="joined")
class Tag(Base):
    __tablename__ = 'tags'
    id = Column(Integer, primary_key=True)
    tag = Column(String(64), unique=True)
Now let's say I want to filter Images by tag – easy:
_tag = "foo"
Image.query.filter(Image.tags.any(tag=_tag)).all()
But what if I want to filter by many tags and only want to match those Images which match all of the tags?
tags = ["foo", "bar"]
???
Any help is incredibly appreciated. Thanks!

I see two possibilities: either combine two EXISTS clauses that check for each element separately, or specify a WHERE clause like this:
WHERE (SELECT COUNT(*) FROM ... WHERE tags.tag in ("foo", "bar")) = 2
Both solutions are kind of ugly, but I see the problem more in the way SQL works than in SQLAlchemy.
In both cases I recommend that you build a normal SQL query first, test it, and then build the equivalent in SQLAlchemy.
By the way, I could imagine both solutions being very inefficient on large sets of data, but I don't see a better solution here.
Initially, I would have tried something like WHERE (SELECT tags.tag FROM ...) = ("foo", "bar") but that does not seem to be valid SQL (at least MySQL threw an error at me) because the subquery in WHERE must return a scalar result.
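For what it's worth, here is a minimal sketch of how the two variants described above might look in SQLAlchemy, run against an in-memory SQLite database. The sample rows and the variable names (wanted, by_exists, by_count) are my own assumptions, and the models are trimmed down from the question (plain backref, no lazy="joined"). The counting variant only works if the wanted list has no duplicates and tag names are unique.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine, func
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

image_tag = Table(
    "image_tag", Base.metadata,
    Column("image_id", Integer, ForeignKey("images.id"), primary_key=True),
    Column("tag_id", Integer, ForeignKey("tags.id"), primary_key=True),
)

class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    tag = Column(String(64), unique=True)

class Image(Base):
    __tablename__ = "images"
    id = Column(Integer, primary_key=True)
    tags = relationship("Tag", secondary=image_tag, backref="images")

engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

wanted = ["foo", "bar"]  # must not contain duplicates for the counting variant

with Session(engine) as session:
    foo, bar, baz = Tag(tag="foo"), Tag(tag="bar"), Tag(tag="baz")
    session.add_all([
        Image(id=1, tags=[foo, bar]),
        Image(id=2, tags=[foo]),
        Image(id=3, tags=[foo, bar, baz]),
    ])
    session.commit()

    # Variant 1: one EXISTS clause per wanted tag, all combined with AND.
    by_exists = (
        session.query(Image)
        .filter(*[Image.tags.any(Tag.tag == name) for name in wanted])
        .order_by(Image.id)
        .all()
    )

    # Variant 2: join, keep only the wanted tags, and require that the
    # number of matching tags per image equals the number of wanted tags.
    by_count = (
        session.query(Image)
        .join(Image.tags)
        .filter(Tag.tag.in_(wanted))
        .group_by(Image.id)
        .having(func.count(Tag.id) == len(wanted))
        .order_by(Image.id)
        .all()
    )

    exists_ids = [image.id for image in by_exists]
    count_ids = [image.id for image in by_count]
    print(exists_ids, count_ids)
```

Image 2 only carries "foo", so neither variant should return it. Which of the two performs better probably depends on the database and the data, so it is worth checking the query plans on a realistic data set.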

Related

How to count child table items with or without join to parent table using SQLAlchemy?

I used SQLAlchemy to create a SQLite database which stores bibliographic data of some document, and I want to query the author number of each document.
I know how to do this in raw SQL, but how can I achieve the same result using SQLAlchemy? Is it possible without using a join?
Here are the classes that I have defined:
class WosDocument(Base):
    __tablename__ = 'wos_document'
    document_id = Column(Integer, primary_key=True)
    unique_id = Column(String, unique=True)
    ......
    authors = relationship('WosAuthor', back_populates='document')
class WosAuthor(Base):
    __tablename__ = 'wos_author'
    author_id = Column(Integer, primary_key=True, autoincrement=True)
    document_unique_id = Column(String, ForeignKey('wos_document.unique_id'))
    document = relationship('WosDocument', back_populates='authors')
    last_name = Column(String)
    first_name = Column(String)
And my goal is to get the same result as this SQL query does:
SELECT a.unique_id, COUNT(*)
FROM wos_document AS a
LEFT JOIN wos_author AS b
ON a.unique_id = b.document_unique_id
GROUP BY a.unique_id
I tried the code below:
session.query(WosDocument.unique_id, len(WosDocument.authors)).all()
session.query(WosDocument.unique_id, func.count(WosDocument.authors)).all()
The first line raised an error. The second line doesn't give me the desired result: it returns only one row, and I don't recognize what it is:
[('000275510800023', 40685268)]
Since WosDocument Object has a one-to-many relationship authors, I supposed that I can query the author number of each document without using join explicitly, but I can't find out how to do this with SQLAlchemy.
Can you help me? Thanks!
If you have defined the relationship correctly in your model, then the query would look like this:
db.session.query(ParentTable.pk, func.count('*').label("count")).join(ChildTable).group_by(ParentTable.pk).all()
The documentation for join() is at
https://docs.sqlalchemy.org/en/latest/orm/query.html#sqlalchemy.orm.query.Query.join
If you don't join() explicitly, you would need to deal with something like parent.relations as a field.
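Applied to the models from the question, the grouped count might be sketched as below. Two things here are my own additions rather than part of the answer above: I use outerjoin() and count the author_id column, because COUNT(*) over a LEFT JOIN reports 1 for a document with no authors, and I add a correlated scalar subquery as one possible way to get the count without an explicit join. The sample rows are made up.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

class WosDocument(Base):
    __tablename__ = "wos_document"
    document_id = Column(Integer, primary_key=True)
    unique_id = Column(String, unique=True)
    authors = relationship("WosAuthor", back_populates="document")

class WosAuthor(Base):
    __tablename__ = "wos_author"
    author_id = Column(Integer, primary_key=True, autoincrement=True)
    document_unique_id = Column(String, ForeignKey("wos_document.unique_id"))
    last_name = Column(String)
    document = relationship("WosDocument", back_populates="authors")

engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        WosDocument(unique_id="doc-a",
                    authors=[WosAuthor(last_name="X"), WosAuthor(last_name="Y")]),
        WosDocument(unique_id="doc-b"),  # a document without any authors
    ])
    session.commit()

    # Variant 1: LEFT OUTER JOIN plus GROUP BY. Counting author_id (not *)
    # means a document without authors is reported with 0.
    joined_counts = [
        tuple(row)
        for row in session.query(WosDocument.unique_id,
                                 func.count(WosAuthor.author_id))
        .outerjoin(WosDocument.authors)
        .group_by(WosDocument.unique_id)
        .order_by(WosDocument.unique_id)
    ]

    # Variant 2: no explicit join; a correlated scalar subquery counts the
    # authors that belong to each document row.
    author_count = (
        select(func.count(WosAuthor.author_id))
        .where(WosAuthor.document_unique_id == WosDocument.unique_id)
        .scalar_subquery()
    )
    subquery_counts = [
        tuple(row)
        for row in session.query(WosDocument.unique_id, author_count.label("n"))
        .order_by(WosDocument.unique_id)
    ]
    print(joined_counts, subquery_counts)
```

As far as I can tell, the attempt func.count(WosDocument.authors) from the question fails because the relationship attribute is not a column, so no join or grouping is applied and a single aggregate row comes back.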

child class to filter parent class by condition in sqlAlchemy

In Flask-SQLAlchemy, I have a class User, which has, amongst others, the column status, like so:
class User(Model):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True, autoincrement=True, nullable=False)
    status = Column(String(255))
    age = Column(Integer)
    yet_another_property = Column(String(255))
I would now like to have a model ActiveUser which represents all entries in the user table where status is 'active'. Expressed in MySQL, this would be
SELECT * FROM user
WHERE
user.status = 'active';
I think, this should somehow work with ActiveUser being a child class of User and via single table inheritance (as described at http://docs.sqlalchemy.org/en/latest/orm/inheritance.html#single-table-inheritance), like so:
class User(Model):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True, autoincrement=True, nullable=False)
    status = Column(String(255))
    age = Column(Integer)
    yet_another_property = Column(String(255))
    __mapper_args__ = {
        'polymorphic_on': status,
        'polymorphic_identity': 'user'
    }
class ActiveUser(User):
    __mapper_args__ = {
        'polymorphic_on': status,
        'polymorphic_identity': 'active'
    }
My first question is: Would this work?
It gets a bit more complicated, since I actually want to filter on multiple properties. What I really want is a class ActiveAdultUser, which represents all Users that are in active status and have an age greater than or equal to 18. Again, expressed in MySQL:
SELECT * FROM user
WHERE
user.status = 'active'
AND
user.age >= 18;
My second question is: So, how would I do that?
Of course I know, I can do queries for active adult Users by applying .filter(and_(User.status == 'active', User.age >=18)) to a query on User. But for the sake of clean code I would like to have this filter on the Model level.
I also thought of overriding the query function on the ActiveUser model, but this looks rather hacky and I don't know if I can rely on this in any circumstance.
I do not think you can do that with polymorphic as you cannot put a "greater than" comparison there. You can only have a fixed polymorphic_identity. You also cannot polymorph on two different fields, only one.
To me this is a case where you are planning to "simplify" something that is already relatively simple. An SQL query with two filtering conditionals is hardly a complicated or even a messy statement.
If you are adamant you want to do this, the correct method is to subclass Query and write your conditional in there. https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/PreFilteredQuery gives an example how to do this.
Personally I would just add two filters to my queries. It is far less messy and fragile than subclassing something that may change in the future versions. Query syntax is likely to be far more persistent.
Another way of removing redundant code would be prewriting the query:
q = (User.age >= 18, User.status == "active")
foo = Session.query(User).filter(*q).all()
or even
q = Session.query(User).filter(User.age >= 18, User.status == "active")
foo = q.filter(User.yet_another_property == "x").all()  # put any additional filters here
# or if no additional filters needed
foo = q.all()
If you define q in a suitable place of your code, you can use it throughout your program. This would remove the need to repeat those two filtering conditions in every query.
You can produce multiple mappers for one class – found in the non-traditional mappings section of the documentation:
However, there is a class of mapper known as the non primary mapper which allows additional mappers to be associated with a class, but with a limited scope of use. This scope typically applies to being able to load rows from an alternate table or selectable unit, but still producing classes which are ultimately persisted using the primary mapping.
So instead of a subclass you'd produce a non primary mapper ActiveUser:
In [9]: ActiveUser = mapper(
   ...:     User,
   ...:     User.__table__.select().
   ...:         where(User.status == "active").
   ...:         alias(),
   ...:     non_primary=True)
   ...:
In [10]: print(session.query(ActiveUser))
SELECT anon_1.id AS anon_1_id, anon_1.status AS anon_1_status, anon_1.age AS anon_1_age, anon_1.yet_another_property AS anon_1_yet_another_property
FROM (SELECT user.id AS id, user.status AS status, user.age AS age, user.yet_another_property AS yet_another_property
FROM user
WHERE user.status = ?) AS anon_1
On the other hand all of this might be rather unnecessary and you could simply query for active users when you need such.
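One caveat on the non primary mapper shown above: the non_primary parameter has been deprecated since SQLAlchemy 1.3, and the documentation points to aliased() as the replacement. A rough sketch of that alternative, assuming SQLAlchemy 1.4 or later, could look like this. The sample rows and the name active_adults are made up for the example, and I left out yet_another_property for brevity.

```python
from sqlalchemy import Column, Integer, String, create_engine, select
from sqlalchemy.orm import Session, aliased, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "user"
    id = Column(Integer, primary_key=True)
    status = Column(String(255))
    age = Column(Integer)

# The filter lives in one place: a subquery selecting only active adults.
active_adults = (
    select(User)
    .where(User.status == "active", User.age >= 18)
    .subquery()
)
# ActiveAdultUser behaves like User in queries but reads from the subquery.
ActiveAdultUser = aliased(User, active_adults)

engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        User(id=1, status="active", age=30),
        User(id=2, status="active", age=12),
        User(id=3, status="inactive", age=40),
    ])
    session.commit()

    rows = session.query(ActiveAdultUser).order_by(ActiveAdultUser.id).all()
    ids = [user.id for user in rows]
    all_are_users = all(isinstance(user, User) for user in rows)
    print(ids, all_are_users)
```

The objects that come back are plain User instances, so this only pre-filters what is loaded; it does not give you a separate ActiveAdultUser class to instantiate.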

Flask sqlalchemy - many to many relationship - filtering on children level

My filter on children doesn't work, and I am not sure what I am doing wrong.
country.py
product_country = Table('product_country', Base.metadata,
    Column('product_id', Integer, ForeignKey('product.id'), primary_key=True),
    Column('country_id', Integer, ForeignKey('country.id'), primary_key=True)
)
class Country(Base):
    __tablename__ = "country"
    id = Column(Integer, primary_key=True)
    name = Column(String(200))
    products = relationship(Product, secondary=product_country, backref='countries')
product.py
class Product(Base):
    __tablename__ = "product"
    id = Column(Integer, primary_key=True)
    color = Column(Integer)
    ....
Then the SQLAlchemy query:
country = s.query(Country).join(Country.products).filter(Country.id==1).filter(Product.color==1).first()
Well, I get the country with id=1, which is what I want, but in the list country.products I would expect only products with color = 1; instead it contains all products assigned to the country. Could you please help me? Thank you.
That's a misunderstanding. The relationship loading is separate from your query, for good reasons. In other words, the join you've used is not used to eagerly load the relationship. You can tell the query that it contains an eager-load join with contains_eager(). Read "The Zen of Joined Eager Loading" to understand why, and "Using contains_eager() to load a custom-filtered collection result" for an example of how to do what you're trying. In your case just add
options(contains_eager(Country.products))
to your query.
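A self-contained sketch of the contains_eager() suggestion, with made-up sample data, might look like the following. One deviation from the question is deliberate: I use .one() instead of .first(), because .first() adds LIMIT 1 to the joined statement and, as far as I understand it, that would cut the eagerly loaded collection down to a single product.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, contains_eager, declarative_base, relationship

Base = declarative_base()

product_country = Table(
    "product_country", Base.metadata,
    Column("product_id", Integer, ForeignKey("product.id"), primary_key=True),
    Column("country_id", Integer, ForeignKey("country.id"), primary_key=True),
)

class Product(Base):
    __tablename__ = "product"
    id = Column(Integer, primary_key=True)
    color = Column(Integer)

class Country(Base):
    __tablename__ = "country"
    id = Column(Integer, primary_key=True)
    name = Column(String(200))
    products = relationship("Product", secondary=product_country, backref="countries")

engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Country(id=1, name="Exampleland", products=[
        Product(id=1, color=1), Product(id=2, color=2), Product(id=3, color=1),
    ]))
    session.commit()

# A fresh session, so nothing is cached from the insert above.
with Session(engine) as session:
    country = (
        session.query(Country)
        .join(Country.products)
        .filter(Country.id == 1, Product.color == 1)
        # Populate Country.products from the rows of this (filtered) join.
        .options(contains_eager(Country.products))
        .one()
    )
    product_ids = sorted(product.id for product in country.products)
    print(product_ids)
```

Keep in mind that the collection on that Country object is now deliberately incomplete for as long as the object stays loaded in the session.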

Python SQLAlchemy: Filter joined tables with subqueries

In sqlalchemy I defined a model for my database, in this case two tables "sample" and "experiment", which are linked over a many-to-many relationship to each other:
class Sample(Base):
    __tablename__ = 'sample'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    date = Column(String)
class Experiment(Base):
    __tablename__ = 'experiment'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    description = Column(String)
    result = Column(String)
    samples = relationship('Sample', secondary='sample_has_experiment', backref="experiments")
t_sample_has_experiment = Table(
    'sample_has_experiment', metadata,
    Column('sample_id', ForeignKey('sample.id'), primary_key=True, nullable=False, index=True),
    Column('experiment_id', ForeignKey('experiment.id'), primary_key=True, nullable=False, index=True)
)
In my database, I have a sample "Dilithium", and in experiments two experiments "Cold fusion" and "Worm hole".
I try to query the "Worm hole" experiment over a join to the sample table:
samples = s.query(Obj.Sample).join(Obj.Sample.experiments).\
    filter(Obj.Experiment.name == "Worm hole").all()
for sample in samples:
    for experiment in sample.experiments:
        print(experiment.name)
But as a result I still get both experiments, "Worm hole" AND "Cold fusion". So it looks like the filtering is not applied. How can I filter so that I only receive the "Worm hole" Experiment object? Thank you.
#Duplicate: Indeed it looks like the same question, but the given answer does not solve my problem. In my opinion, my suggested query does exactly the same thing as the query in the proposed answer.
What your code says is:
    For all samples that were part of the wormhole experiment
        Print all experiments that sample is part of
That is, given a particular sample, sample.experiments is always all the experiments that sample belongs to, not just the experiment through which you got to that sample.
You can see this if you add a new sample that does not belong to the wormhole experiment. It should not appear in your query.
So, my answer to "why isn't my query filtering on the join" is that I strongly suspect it is.
If you want the sample objects that were in the wormhole experiment, then use something like
samples = session.query(Experiment).filter_by(name='wormhole').one().samples
Edit: Actually I realized what my mistake in the first place was. I have to query all tables and then apply joins and filters:
qry = (s.query(Obj.Sample, Obj.Experiment)  # Query both tables!
    .filter(Obj.Sample.name == "...")
    .filter(Obj.Experiment.name == "...")
    .join(Obj.Experiment, Obj.Sample.experiments).all())
Alternative way: The use of a subquery leads to the desired behaviour:
wormhole_exp = s.query(Obj.Experiment).filter(Obj.Experiment.name == "wormhole").subquery()
print(s.query(Obj.Sample, wormhole_exp).join(wormhole_exp, Obj.Sample.experiments).one())
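To make the "query both tables" idea from the edit concrete, here is a small sketch under a few assumptions: the models are trimmed to the columns that matter, the relationship is declared on Sample (so Sample.experiments exists without relying on a backref being configured first), and the second sample "Tritanium" is invented for the example.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, Table, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship

Base = declarative_base()

sample_has_experiment = Table(
    "sample_has_experiment", Base.metadata,
    Column("sample_id", ForeignKey("sample.id"), primary_key=True),
    Column("experiment_id", ForeignKey("experiment.id"), primary_key=True),
)

class Experiment(Base):
    __tablename__ = "experiment"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class Sample(Base):
    __tablename__ = "sample"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    experiments = relationship("Experiment", secondary=sample_has_experiment,
                               backref="samples")

engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    cold_fusion = Experiment(name="Cold fusion")
    worm_hole = Experiment(name="Worm hole")
    session.add_all([
        Sample(name="Dilithium", experiments=[cold_fusion, worm_hole]),
        Sample(name="Tritanium", experiments=[cold_fusion]),  # hypothetical sample
    ])
    session.commit()

    # Each result row is a (Sample, Experiment) pair taken from the join,
    # so the filter on Experiment applies to the experiment in the pair.
    pairs = (
        session.query(Sample, Experiment)
        .join(Sample.experiments)
        .filter(Experiment.name == "Worm hole")
        .all()
    )
    names = [(sample.name, experiment.name) for sample, experiment in pairs]
    print(names)
```

Reading the experiment from the pair, rather than from sample.experiments, is what keeps "Cold fusion" out of the output.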

Tracking member tagging of artists

I'm writing a web-based jukebox for my own amusement and as a learning experience.
I'm finally getting around to finishing my models except for one big component: member genre tags. I know I'll have to relate three models and it'll involve using association_proxy and collection classes.
Here's the relevant models (I had an abstract model to handle declaring the id and name fields, but that caused issues that I'll look at later):
class Member(db.Model):
    __tablename__ = 'members'
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(128))
    # tagged_artists is a backref from MemberTaggedArtist
    tags = association_proxy('tagged_artists', 'artist',
        creator=lambda k, v: MemberTaggedArtist(tag=k, artist=v)
    )
class Artist(db.Model):
    __tablename__ = 'artists'
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(128))
class Tag(db.Model):
    __tablename__ = 'tags'
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(128))
class MemberTaggedArtist(db.Model):
    __tablename__ = 'membertaggedartists'
    id = db.Column('id', db.Integer, primary_key=True)
    member_id = db.Column('member_id', db.Integer, db.ForeignKey('members.id'))
    artist_id = db.Column('artist_id', db.Integer, db.ForeignKey('artists.id'))
    tag_id = db.Column('tag_id', db.Integer, db.ForeignKey('tags.id'))
    member = db.relationship(Member, backref=db.backref(
        'tagged_artists',
        collection_class=attribute_mapped_collection('tag')
    ))
    artist = db.relationship(Artist, backref='applied_tags')
    tag = db.relationship(Tag, backref='applied_artists')
What I'd like to happen is this:
>>> member = Member(name='justanr')
>>> artist = Artist(name='Gorguts')
>>> tag = Tag('Death Metal')
>>> member.tags['Death Metal'].append('Gorguts')
>>> member.tags
... {'Death Metal':['Gorguts']}
What currently happens is this (note, I built a mixin to handle repr calls):
>>> member.tags
... {Tag (ID:1 Name:Death Metal): MemberTaggedArtist (ID: 1 Member:justanr Artist:Gorguts Tag:Death Metal)}
I haven't been working with association_proxy long enough to understand what I'm doing wrong and even the brief tutorial in the documentation is giving me issues (I'm not sure why and I don't think it's because I'm using Flask-SQLAlchemy).
In short, I'm attempting to build an association proxy to create a dict of lists and I'm completely lost. I'm unsure what values I should proxy along, whether using one middle table is overcomplicating this, and how to construct a secondary (and possibly tertiary) middle table.
I finally came up with a solution. I guess I forgot what I knew about SQL. I recently watched several talks on SQLA that made me realize that I had an 'O' problem: I was focusing specifically on the Object portion of ORM instead of thinking "SQLA just makes writing queries easier, it doesn't do it for me." It also didn't help that FSQLA's query object lulled me into complacency. Don't get me wrong, it's fantastic for basic queries and joins, but for something this complex...not really.
I also didn't realize that my question was actually several questions. One part was "How do I write this query with SQLA's tools?" The other was, "Help me puzzle the correct association_proxy to get this desired outcome."
I'm still not sure about the association proxy portion of it yet, or if I even want to do that anymore. But I finally puzzled the solution to my first question out:
# Query database for tag names and count for a particular artist
artist = models.Artist.find_by_some_means(magic, vars)
q = db.session.query(
    models.Tag.id.label('id'),
    models.Tag.name.label('tag'),
    db.func.count(models.MemberTaggedArtist.member_id).label('count')
)
q = q.join(models.MemberTaggedArtist, models.Tag.id == models.MemberTaggedArtist.tag_id)
q = q.filter(models.MemberTaggedArtist.artist_id == artist.id)
q = q.group_by(models.MemberTaggedArtist.tag_id)
# order by tag count first, then tag name if two tags happen to have the same count
# there's probably a much better way to get there without recalculating the result
q = q.order_by(db.desc(db.func.count(models.MemberTaggedArtist.member_id)))
q = q.order_by(models.Tag.name)
My original planned query was something along the lines of:
SELECT tags.name AS tag, tags.id AS id, count(mta.member_id) AS count
FROM tags, membertaggedartists AS mta
WHERE tags.id = mta.tag_id AND mta.artist_id = :artist_id
GROUP BY id
ORDER BY count DESC, name ASC
And SQLA interprets my request as:
SELECT tags.id AS id, tags.name AS tag, count(membertaggedartists.member_id) AS count
FROM tags
JOIN membertaggedartists ON tags.id = membertaggedartists.tag_id
WHERE membertaggedartists.artist_id = :artist_id_1
GROUP BY membertaggedartists.tag_id
ORDER BY count(membertaggedartists.member_id) DESC, tags.name
Which is remarkably similar (and uses an explicit join unlike mine, which negates the need for one of my original WHERE clauses).
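On the comment in the code above about recalculating the count for the ORDER BY: one option is to build the labeled count expression once and reuse it. The sketch below assumes plain SQLAlchemy rather than Flask-SQLAlchemy and a stripped-down schema (no Member or Artist tables, made-up rows). I named the label tag_count instead of count to stay clear of the tuple-style count() method on result rows, and I group by the selected tag columns, which stricter databases tend to require.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Tag(Base):
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    name = Column(String(128))

class MemberTaggedArtist(Base):
    __tablename__ = "membertaggedartists"
    id = Column(Integer, primary_key=True)
    member_id = Column(Integer)  # foreign keys to members/artists omitted here
    artist_id = Column(Integer)
    tag_id = Column(Integer, ForeignKey("tags.id"))

engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([
        Tag(id=1, name="Death Metal"),
        Tag(id=2, name="Avant-garde"),
        Tag(id=3, name="Technical"),
        MemberTaggedArtist(member_id=1, artist_id=1, tag_id=1),
        MemberTaggedArtist(member_id=2, artist_id=1, tag_id=1),
        MemberTaggedArtist(member_id=1, artist_id=1, tag_id=2),
        MemberTaggedArtist(member_id=3, artist_id=1, tag_id=3),
        MemberTaggedArtist(member_id=1, artist_id=2, tag_id=1),  # another artist
    ])
    session.commit()

    # Build the labeled aggregate once and reuse it in SELECT and ORDER BY.
    tag_count = func.count(MemberTaggedArtist.member_id).label("tag_count")
    q = (
        session.query(Tag.id.label("id"), Tag.name.label("tag"), tag_count)
        .join(MemberTaggedArtist, Tag.id == MemberTaggedArtist.tag_id)
        .filter(MemberTaggedArtist.artist_id == 1)
        .group_by(Tag.id, Tag.name)
        .order_by(tag_count.desc(), Tag.name)
    )
    rows = [tuple(row) for row in q.all()]
    print(rows)
```

Whether the database actually computes the aggregate once or twice is up to its planner, so this is more about keeping the Python side free of repetition than about speed.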
