Identify which values in a list don't exist in a table column using SQLAlchemy - Python

I have a list:
cities = ['Rome', 'Barcelona', 'Budapest', 'Ljubljana']
I also have an SQLAlchemy model, as follows:
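from sqlalchemy import Column
from sqlalchemy.dialects.mysql import INTEGER, VARCHAR
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Fly(Base):
    __tablename__ = 'fly'
    pkid = Column('pkid', INTEGER(unsigned=True), primary_key=True, nullable=False)
    city = Column('city', VARCHAR(45), unique=True, nullable=False)
    country = Column('country', VARCHAR(45))
    flight_no = Column('Flight', VARCHAR(45))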
I need to check whether ALL the values in the given cities list exist in the fly table, using SQLAlchemy. It should return true only if ALL the cities exist in the table. If even a single city doesn't exist in the table, I need to return false along with the list of cities that don't exist. How can I do that? Any ideas/hints/suggestions? I'm using MySQL.

One way would be to create a (temporary) relation based on the given list and take the set difference between it and the cities from the fly table. In other words, create a union of the values from the list [1]:
from sqlalchemy import union, select, literal, exists
# SQLAlchemy 1.x style; in 2.0 write select(literal(v)) instead of select([literal(v)])
cities_union = union(*[select([literal(v)]) for v in cities])
Then take the difference (note that MySQL only supports EXCEPT from version 8.0.31 onwards):
sq = cities_union.select().except_(select([Fly.city]))
and check that no rows are left after the difference:
res = session.query(~exists(sq)).scalar()
To get the list of cities missing from the fly table, omit the (NOT) EXISTS:
res = session.execute(sq).fetchall()
[1] Other database vendors may offer alternative ways of producing relations from arrays, such as PostgreSQL and its unnest().
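The SQL this approach builds can be tried locally. The following minimal sketch swaps SQLAlchemy and MySQL for Python's built-in sqlite3 module (SQLite supports both UNION and EXCEPT), so it shows the shape of the generated SQL rather than the SQLAlchemy calls themselves; the helper `missing_cities` and the sample rows are hypothetical.

```python
import sqlite3

def missing_cities(conn, cities):
    """Return (all_present, missing) for a list of city names."""
    if not cities:
        return True, []  # an empty UNION is not valid SQL, and nothing can be missing
    # "SELECT ? UNION SELECT ? ..." with one bound parameter per city,
    # mirroring union(*[select([literal(v)]) for v in cities]).
    values_sql = " UNION ".join("SELECT ?" for _ in cities)
    # Compound operators run left to right: (v1 UNION v2 ...) EXCEPT fly.city
    diff_sql = f"{values_sql} EXCEPT SELECT city FROM fly"
    # NOT EXISTS over the difference is 1 only when every city is present.
    (all_present,) = conn.execute(f"SELECT NOT EXISTS ({diff_sql})", cities).fetchone()
    # Without the NOT EXISTS wrapper, the difference is the list of missing cities.
    missing = [row[0] for row in conn.execute(diff_sql, cities)]
    return bool(all_present), missing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fly (pkid INTEGER PRIMARY KEY, city VARCHAR(45) UNIQUE NOT NULL)")
conn.executemany("INSERT INTO fly (city) VALUES (?)", [("Rome",), ("Budapest",)])

cities = ['Rome', 'Barcelona', 'Budapest', 'Ljubljana']
ok, missing = missing_cities(conn, cities)
print(ok, sorted(missing))  # False ['Barcelona', 'Ljubljana']
```

Both questions are answered by the same difference query, once wrapped in NOT EXISTS and once on its own, which is the split the answer above makes with `~exists(sq)` and `session.execute(sq)`.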

Related

How to query with like() when using many-to-many relationships in SQLAlchemy?

I have the following many-to-many relationship defined in SQLAlchemy:
from sqlalchemy import Column, ForeignKey, Integer, String
from sqlalchemy.orm import relationship

training_ids_association_table = db.Table(
    "training_ids_association",
    db.Model.metadata,
    Column("training_id", Integer, ForeignKey("training_sessions.id")),
    Column("ids_id", Integer, ForeignKey("image_data_sets.id")),
)

class ImageDataSet(db.Model):
    __tablename__ = "image_data_sets"
    id = Column(Integer, primary_key=True)
    tags = Column(String)
    trainings = relationship("TrainingSession", secondary=training_ids_association_table, back_populates="image_data_sets")

class TrainingSession(db.Model):
    __tablename__ = "training_sessions"
    id = Column(Integer, primary_key=True)
    image_data_sets = relationship("ImageDataSet", secondary=training_ids_association_table, back_populates="trainings")
Note the field ImageDataSet.tags, which can contain a list of string items (i.e. tags), separated by a slash character. If possible I would rather stick to that format instead of creating a new table just for these tags.
What I want now is to query the TrainingSession table for all entries that have a certain tag set in their related ImageDataSets. If an ImageDataSet has only one tag saved in the tags field, the following works:
TrainingSession.query.filter(TrainingSession.image_data_sets.any(tags=find_tag))
However, as soon as there are multiple tags in the tags field (e.g. something like "tag1/tag2/tag3"), the filter above of course no longer works. So I tried using like():
.filter(TrainingSession.image_data_sets.like(f'%{find_tag}%'))
But this leads to a NotImplementedError in SQLAlchemy. Is there a way to achieve what I am trying to do here, or do I necessarily need another table for the tags of each ImageDataSet?
You can apply any filter on a related model's columns if you join that model first:
query = session.query(TrainingSession). \
join(TrainingSession.image_data_sets). \
filter(ImageDataSet.tags.like(f"%{find_tag}%"))
This query is translated to the following SQL statement:
SELECT training_sessions.id FROM training_sessions
JOIN training_ids_association ON training_sessions.id = training_ids_association.training_id
JOIN image_data_sets ON image_data_sets.id = training_ids_association.ids_id
WHERE image_data_sets.tags LIKE %(find_tag)s
Note that you may run into a problem when storing tags as strings with separators: if some records have the tags tag1, tag12 and tag123, they will all pass the filter LIKE '%tag1%'.
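The false-positive problem is easy to reproduce. This sketch uses sqlite3 with a hypothetical `image_data_sets` table holding slash-separated tags. The second query shows one common workaround that is not part of the answer above: wrap both the column and the search term in the separator so that only whole tags match. In SQLAlchemy the same condition could presumably be built with the `concat()` operator, but that form is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE image_data_sets (id INTEGER PRIMARY KEY, tags TEXT)")
conn.executemany(
    "INSERT INTO image_data_sets (tags) VALUES (?)",
    [("tag1/tag2",), ("tag12",), ("tag123/tag3",), ("tag3",)],
)

def ids_naive(tag):
    # Plain substring match: '%tag1%' also matches 'tag12' and 'tag123'.
    rows = conn.execute(
        "SELECT id FROM image_data_sets WHERE tags LIKE ? ORDER BY id",
        (f"%{tag}%",),
    )
    return [r[0] for r in rows]

def ids_whole_tag(tag):
    # '/' || tags || '/' turns 'tag1/tag2' into '/tag1/tag2/', so the
    # pattern '%/tag1/%' can only match a complete tag.
    rows = conn.execute(
        "SELECT id FROM image_data_sets WHERE '/' || tags || '/' LIKE ? ORDER BY id",
        (f"%/{tag}/%",),
    )
    return [r[0] for r in rows]

print(ids_naive("tag1"))      # [1, 2, 3]
print(ids_whole_tag("tag1"))  # [1]
```

The workaround still treats `%` and `_` inside a tag as wildcards, which is one more reason an ARRAY column or a separate tags table is the cleaner fix.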
It would be better to switch to an ARRAY column if your database supports this column type (PostgreSQL, for example). Your column could then be defined like this:
tags = Column(ARRAY(String))
And the query may look like this:
query = session.query(TrainingSession). \
join(TrainingSession.image_data_sets). \
filter(ImageDataSet.tags.any(find_tag))

SQLAlchemy: partial unique constraint where a field has a certain value

In my Flask project I need a table with a unique constraint on one column whenever the values in another column are identical. So I am trying to do something like this:
if premiumuser_id = "a value I don't know in advance" then track_id=unique
This is similar to Creating partial unique index with sqlalchemy on Postgres, but I use SQLite (where partial indexes should also be possible: https://docs.sqlalchemy.org/en/13/dialects/sqlite.html?highlight=partial%20indexes#partial-indexes) and the condition is different.
So far my code looks like this:
class Queue(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    track_id = db.Column(db.Integer)
    premiumuser_id = db.Column(
        db.Integer, db.ForeignKey("premium_user.id"), nullable=False
    )
    # __table_args__ must be a tuple (or dict), hence the wrapping parentheses and trailing comma
    __table_args__ = (
        db.Index(
            "idx_partially_unique_track",
            "track_id",
            unique=True,
            sqlite_where="and here I'm lost",
        ),
    )
All the examples I've found use boolean or fixed values. What should the syntax for sqlite_where look like for the condition premiumuser_id = "a value I don't know in advance"?
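If the intent is that a `track_id` may appear only once per `premiumuser_id` (whatever that user's id turns out to be), a partial index may not be needed at all: a unique index over both columns expresses exactly that. This is one reading of the requirement, not a confirmed answer. The sketch uses sqlite3 with a trimmed-down `queue` table; in the Flask-SQLAlchemy model the equivalent would be something like `__table_args__ = (db.UniqueConstraint("premiumuser_id", "track_id"),)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue ("
    " id INTEGER PRIMARY KEY,"
    " track_id INTEGER,"
    " premiumuser_id INTEGER NOT NULL)"
)
# Uniqueness is enforced on the (premiumuser_id, track_id) pair,
# so each user's queue can hold a given track only once.
conn.execute("CREATE UNIQUE INDEX idx_user_track ON queue (premiumuser_id, track_id)")

def add(user_id, track_id):
    """Try to queue a track; return False if it is a duplicate for that user."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO queue (premiumuser_id, track_id) VALUES (?, ?)",
                (user_id, track_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# same track for a second user is fine; the repeat for user 1 is rejected
results = [add(1, 42), add(2, 42), add(1, 42)]
print(results)  # [True, True, False]
```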

How to return specific dictionary keys from within a nested list from a jsonb column in sqlalchemy

I am attempting to return some named columns from a jsonb data set that is stored with PostgreSQL.
I am able to run a raw query that meets my needs directly, however I am trying to run the query utilising SQLAlchemy, in order to ensure that my code is 'pythonic' and easy to read.
The query that returns the correct result (two columns) is:
SELECT
tmp.item->>'id',
tmp.item->>'name'
FROM (SELECT jsonb_array_elements(t.data -> 'users') AS item FROM tpeople t) as tmp
Example JSON (each user has 20+ keys):
{ "results":247, "users": [
{"id":"202","regdate":"2015-12-01","name":"Bob Testing"},
{"id":"87","regdate":"2014-12-12","name":"Sally Testing"},
{"id":"811", etc etc}
...
]}
The table is simple enough: a PK, the datetime of the JSON extraction, and the jsonb column for the extract.
CREATE SEQUENCE "tpeople_record_id_seq" INCREMENT 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1;

CREATE TABLE tpeople
(
    record_id bigint NOT NULL DEFAULT nextval('"tpeople_record_id_seq"'::regclass),
    scrape_time timestamp without time zone NOT NULL,
    data jsonb NOT NULL,
    CONSTRAINT "tpeople_pkey" PRIMARY KEY (record_id)
);
Additionally I have a People Class that looks as follows:
from sqlalchemy import BigInteger, Column, DateTime, Text, text
from sqlalchemy.dialects.postgresql import JSONB

class people(Base):
    __tablename__ = 'tpeople'
    record_id = Column(BigInteger, primary_key=True, server_default=text("nextval('\"tpeople_record_id_seq\"'::regclass)"))
    scrape_time = Column(DateTime, nullable=False)
    data = Column(JSONB(astext_type=Text()), nullable=False)
Presently my code to return the two columns looks like this:
from db.db_conn import get_session  # generic connector for my db
from model.models import people
from sqlalchemy import func

sess = get_session()
sub = sess.query(func.jsonb_array_elements(people.data["users"]).label("item")).subquery()
test = sess.query(sub.c.item).select_entity_from(sub).all()
SQLAlchemy generates the following SQL:
SELECT anon_1.item AS anon_1_item
FROM (SELECT jsonb_array_elements(tpeople.data -> %(data_1)s) AS item
FROM tpeople) AS anon_1
{'data_1': 'users'}
But nothing I try lets me get only certain keys within the item itself, the way the raw SQL does. Some of the approaches I have tried are as follows (they all error out):
test = sess.query("sub.item.id").select_entity_from(sub).all()
test = sess.query(sub.item.["id"]).select_entity_from(sub).all()
aas = func.jsonb_to_recordset(people.data["users"])
res = sess.query("id").select_from(aas).all()
sub = select(func.jsonb_array_elements(people.data["users"]).label("item"))
Presently I can extract the columns I need in a simple for loop, but this seems like a hacky way to do it, and I'm sure there is something dead obvious I'm missing.
for row in test:
    print(row.item['id'])
After searching for a few hours, I eventually found someone who had accidentally done this while trying to get a different result:
sub = sess.query(func.jsonb_array_elements(people.data["users"]).label("item")).subquery()
# .op('->>') emits PostgreSQL's ->> operator, which returns the key's value as text
tmp = sub.c.item.op('->>')('id')
tmp2 = sub.c.item.op('->>')('name')
test = sess.query(tmp, tmp2).all()
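The same "unnest the array, then pick keys" shape can be tried locally with SQLite's JSON functions, where `json_each()` plays roughly the role of `jsonb_array_elements()` and `json_extract()` that of `->>`. This is a swapped-in illustration, not PostgreSQL, and it assumes the sqlite3 build includes the JSON functions (built in by default since SQLite 3.38); the two users are taken from the example JSON above.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tpeople (record_id INTEGER PRIMARY KEY, data TEXT NOT NULL)")
payload = {
    "results": 2,
    "users": [
        {"id": "202", "regdate": "2015-12-01", "name": "Bob Testing"},
        {"id": "87", "regdate": "2014-12-12", "name": "Sally Testing"},
    ],
}
conn.execute("INSERT INTO tpeople (data) VALUES (?)", (json.dumps(payload),))

# json_each() yields one row per element of $.users; its "value" column holds
# the element, and json_extract() pulls individual keys out of it.
rows = conn.execute(
    """
    SELECT json_extract(item.value, '$.id'), json_extract(item.value, '$.name')
    FROM tpeople AS t, json_each(t.data, '$.users') AS item
    ORDER BY item.key
    """
).fetchall()
print(rows)  # [('202', 'Bob Testing'), ('87', 'Sally Testing')]
```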

Country-to-City + Country-to-Capital = One-To-Many + One-To-One?

How would you augment this one-to-many relationship to make one item on the 'many' side a distinguished one?
Using a concrete example, how would the notion of a capital be introduced in the following model? Would it be an additional one-to-one mapping? Would it interfere with the existing one-to-many? Would merely introducing a db.Boolean is_capital be sufficient? Would the latter be idiomatic, or is there a more fitting solution?
class City(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    country_id = db.Column(db.Integer, db.ForeignKey('country.id'))
    country = db.relationship('Country', back_populates='cities')

class Country(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    cities = db.relationship('City', back_populates='country')
You can either make a new Capital table, which contains one row per Country containing the Country ID and City ID, or you could just add a column to the Country table containing the Capital City ID. Either of these will provide more efficient lookup and more compact storage than a boolean is_capital for each City.
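A minimal sketch of the second option (a capital column on the country table), using sqlite3 and hypothetical `country` and `city` tables. Unlike a per-city boolean, a single `capital_id` column cannot mark two capitals for the same country. In an SQLAlchemy model, the two foreign keys between the tables typically require `foreign_keys=` on each `relationship()` to disambiguate them, and `post_update=True` on one side to break the insert cycle; treat that as a pointer to check against the docs rather than tested code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        -- One-to-one pointer to the distinguished city; nullable so the
        -- country row can exist before its cities are inserted.
        capital_id INTEGER REFERENCES city(id)
    );
    CREATE TABLE city (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        -- Ordinary one-to-many link back to the country.
        country_id INTEGER REFERENCES country(id)
    );
    """
)
conn.execute("INSERT INTO country (id, name) VALUES (1, 'Italy')")
conn.executemany(
    "INSERT INTO city (id, name, country_id) VALUES (?, ?, 1)",
    [(10, "Rome"), (11, "Milan")],
)
# Set the capital only after the city exists; post_update does roughly this
# (a separate UPDATE after the INSERTs) in the ORM.
conn.execute("UPDATE country SET capital_id = 10 WHERE id = 1")

capital = conn.execute(
    "SELECT city.name FROM country JOIN city ON city.id = country.capital_id "
    "WHERE country.name = 'Italy'"
).fetchone()[0]
city_names = [r[0] for r in conn.execute(
    "SELECT name FROM city WHERE country_id = 1 ORDER BY name")]
print(capital, city_names)  # Rome ['Milan', 'Rome']
```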

Dynamically add filter to SQLAlchemy TextClause

Assume I have a SQLAlchemy table which looks like:
class Country:
    name = VARCHAR
    population = INTEGER
    continent = VARCHAR
    num_states = INTEGER
My application allows viewing the name and population of all countries, so I have a TextClause that looks like:
"select name, population from Country"
I allow raw queries in my application, so I don't have the option of changing this to a selectable.
At runtime, I want to let my users choose a field name and a field value to filter on. For example, a user can say they only want to see the name and population of countries where continent is Asia, so I want to add this filter dynamically:
.where(Country.c.continent == 'Asia')
But I can't add .where to a TextClause.
Similarly, a user may choose to see the name and population of countries where num_states is greater than 10, so I want to add this filter dynamically:
.where(Country.c.num_states > 10)
But again I can't add .where to a TextClause.
What are the options I have to solve this problem?
Could a subquery help here in any way?
Add a filter based on the conditions: filter() is how WHERE conditions are added in SQLAlchemy.
Country.query.filter(Country.num_states > 10).all()
You can also do this:
query = Country.query.filter(Country.continent == 'Asia')
if user_input == 'states':
    query = query.filter(Country.num_states > 10)
results = query.all()
This is not doable in a general sense without parsing the query. In relational algebra terms, the user applies projection and selection operations to a table, and you want to apply selection operations to it. Since the user can apply arbitrary projections (e.g. user supplies SELECT id FROM table), you are not guaranteed to be able to always apply your filters on top, so you have to apply your filters before the user does. That means you need to rewrite it to SELECT id FROM (some subquery), which requires parsing the user's query.
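The projection argument can be reproduced in a few lines. This sketch uses sqlite3 and a hypothetical `country` table with made-up values: wrapping the user's query in a subquery and filtering afterwards works for a column the user projected, but fails for one they projected away.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE country (name TEXT, population INTEGER, continent TEXT, num_states INTEGER)"
)
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?, ?)",
    [("Japan", 125, "Asia", 47), ("France", 68, "Europe", 13)],  # illustrative values
)
user_sql = "SELECT name, population FROM country"

def filter_after(column, value):
    # Selection applied *after* the user's projection: the outer WHERE can
    # only see the columns the user selected. (column comes from fixed code
    # in this demo, never from user input.)
    wrapped = f"SELECT * FROM ({user_sql}) AS q WHERE q.{column} = ?"
    return conn.execute(wrapped, (value,)).fetchall()

print(filter_after("name", "Japan"))  # [('Japan', 125)]
try:
    filter_after("continent", "Asia")
except sqlite3.OperationalError as exc:
    print("fails:", exc)  # continent was projected away by the user's query
```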
However, depending on the database you are using, we can sort of cheat by having the database engine do the parsing for you. The way to do this is with a CTE that shadows the table name.
Using your example, it looks like the following. The user supplies the query:
SELECT name, population FROM country;
You shadow country with a CTE:
WITH country AS (
SELECT * FROM country
WHERE continent = 'Asia'
) SELECT name, population FROM country;
Unfortunately, because of the way SQLAlchemy's CTE support works, it is tough to get it to generate a CTE for a TextClause. The workaround is to generate the string yourself with a custom compilation extension, something like this:
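from sqlalchemy import literal_column, select, text
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement, Executable

class WrappedQuery(Executable, ClauseElement):
    # Renders "WITH <name> AS (<outer>) <inner>" so that <name> shadows the real table
    inherit_cache = False  # opt out of SQLAlchemy 1.4+ statement caching for this construct

    def __init__(self, name, outer, inner):
        self.name = name
        self.outer = outer
        self.inner = inner

@compiles(WrappedQuery)
def compile_wrapped_query(element, compiler, **kwargs):
    return "WITH {} AS ({}) {}".format(
        element.name,
        compiler.process(element.outer, **kwargs),
        compiler.process(element.inner, **kwargs))

c = Country.__table__
# SQLAlchemy 1.4+/2.0 style; on 1.3 this was select(["*"])
cte = select(literal_column("*")).select_from(c).where(c.c.continent == "Asia")
query = WrappedQuery("country", cte, text("SELECT name, population FROM country"))
session.execute(query)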
From my tests, this only works in PostgreSQL. SQLite and SQL Server both treat it as recursive instead of shadowing, and MySQL did not support CTEs before version 8.0.
I couldn't find anything nice for this in the documentation. I ended up resorting to pretty much plain string processing... but at least it works!
from sqlalchemy.sql import text

query = "select name, population from Country"
params = {}
if continent is not None:
    # Leading space so the clause doesn't fuse with "Country"; the value is a bound
    # parameter rather than str.format(), which avoids quoting bugs and SQL injection.
    query += " WHERE continent = :continent"
    params["continent"] = continent
text_clause = text(query)
with sql_connection() as conn:
    results = conn.execute(text_clause, params)
You could also chain this logic with more clauses, although you'll have to create a boolean flag for the first WHERE clause and then use AND for the subsequent ones.
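Instead of a boolean flag, one option is to collect the conditions in a list and join them with AND, keeping the values as bound parameters. Column names cannot be bound, so this sketch only accepts columns and operators from a fixed allow-list; `ALLOWED` and `build_query` are hypothetical names, and the sample rows are made up. The demo runs on sqlite3, which also understands the `:name` placeholder style that SQLAlchemy's text() uses, so the same `(sql, params)` pair should be usable as `conn.execute(text(sql), params)`.

```python
import sqlite3

# Columns a user may filter on, mapped to the operators allowed for each
# (hypothetical allow-list; identifiers can't be bound parameters).
ALLOWED = {
    "continent": {"="},
    "num_states": {"=", ">", "<"},
}

def build_query(base_sql, filters):
    """filters: list of (column, operator, value); base_sql must not already have a WHERE."""
    clauses, params = [], {}
    for i, (column, op, value) in enumerate(filters):
        if op not in ALLOWED.get(column, set()):
            raise ValueError(f"filter not allowed: {column} {op}")
        name = f"p{i}"
        clauses.append(f"{column} {op} :{name}")
        params[name] = value
    if clauses:
        base_sql += " WHERE " + " AND ".join(clauses)
    return base_sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (name TEXT, population INTEGER, continent TEXT, num_states INTEGER)")
conn.executemany(
    "INSERT INTO country VALUES (?, ?, ?, ?)",
    [("Japan", 125, "Asia", 47), ("Nepal", 30, "Asia", 7), ("France", 68, "Europe", 13)],
)

sql, params = build_query(
    "SELECT name, population FROM country",
    [("continent", "=", "Asia"), ("num_states", ">", 10)],
)
print(sql)  # SELECT name, population FROM country WHERE continent = :p0 AND num_states > :p1
print(conn.execute(sql, params).fetchall())  # [('Japan', 125)]
```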
