SQLAlchemy: partial unique constraint where a field has a certain value - python

In my Flask project I need a table with a unique constraint on one column that applies only among rows that share the same value in another column. So I am trying to do something like this:
if premiumuser_id = "a value I don't know in advance" then track_id=unique
This is similar to "Creating partial unique index with sqlalchemy on Postgres", but I use SQLite (where partial indexes should also be possible: https://docs.sqlalchemy.org/en/13/dialects/sqlite.html?highlight=partial%20indexes#partial-indexes) and the condition is different.
So far my code looks like this:
class Queue(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    track_id = db.Column(db.Integer)
    premiumuser_id = db.Column(
        db.Integer, db.ForeignKey("premium_user.id"), nullable=False
    )
    # __table_args__ must be a tuple (or dict), hence the trailing comma
    __table_args__ = (
        db.Index(
            "idx_partially_unique_track",
            "track_id",
            unique=True,
            sqlite_where="and here I'm lost",
        ),
    )
All examples I've found operate with boolean or fixed values. What should the syntax for sqlite_where look like for the condition: premiumuser_id = "a value I don't know in advance"?
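One possible reading of the requirement is that a track may appear at most once per premium user. If that is the intent, a partial index may not be needed at all: a composite unique index over (premiumuser_id, track_id) expresses "track_id is unique among rows with the same premiumuser_id" without knowing the value in advance. In SQLAlchemy that would be db.UniqueConstraint("premiumuser_id", "track_id") inside __table_args__. The sketch below uses only the standard-library sqlite3 module to show the behaviour at the SQLite level; the index name and the helper functions are made up for illustration.

```python
import sqlite3


def build_queue_db():
    """Create an in-memory database with a composite unique index (illustrative schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE queue ("
        " id INTEGER PRIMARY KEY,"
        " track_id INTEGER,"
        " premiumuser_id INTEGER NOT NULL)"
    )
    # Uniqueness is enforced per (premiumuser_id, track_id) pair, not per track_id alone.
    conn.execute(
        "CREATE UNIQUE INDEX uq_queue_user_track ON queue (premiumuser_id, track_id)"
    )
    return conn


def try_enqueue(conn, premiumuser_id, track_id):
    """Return True if the row was inserted, False if the unique index rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO queue (track_id, premiumuser_id) VALUES (?, ?)",
                (track_id, premiumuser_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_queue_db()
print(try_enqueue(conn, 1, 100))  # first time user 1 queues track 100
print(try_enqueue(conn, 2, 100))  # same track, different user
print(try_enqueue(conn, 1, 100))  # same track again for user 1
```

Only the third call should be rejected. If the requirement really is limited to one specific user, a partial index with a fixed WHERE condition would be needed instead, and the value would have to be known when the index is created.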

Related

How to query with like() when using many-to-many relationships in SQLAlchemy?

I have the following many-to-many relationship defined in SQLAlchemy:
training_ids_association_table = db.Table(
    "training_ids_association",
    db.Model.metadata,
    Column("training_id", Integer, ForeignKey("training_sessions.id")),
    Column("ids_id", Integer, ForeignKey("image_data_sets.id")),
)

class ImageDataSet(db.Model):
    __tablename__ = "image_data_sets"
    id = Column(Integer, primary_key=True)
    tags = Column(String)
    trainings = relationship("TrainingSession", secondary=training_ids_association_table, back_populates="image_data_sets")

class TrainingSession(db.Model):
    __tablename__ = "training_sessions"
    id = Column(Integer, primary_key=True)
    image_data_sets = relationship("ImageDataSet", secondary=training_ids_association_table, back_populates="trainings")
Note the field ImageDataSet.tags, which can contain a list of string items (i.e. tags), separated by a slash character. If possible I would rather stick to that format instead of creating a new table just for these tags.
What I want now is to query the table TrainingSession for all entries that have a certain tag set in their related ImageDataSets. If an ImageDataSet has only one tag saved in the tags field, then the following works:
TrainingSession.query.filter(TrainingSession.image_data_sets.any(tags=find_tag))
However, as soon as there are multiple tags in the tags field (e.g. something like "tag1/tag2/tag3"), then of course this filter above does not work any more. So I tried it with a like:
.filter(TrainingSession.image_data_sets.like(f'%{find_tag}%'))
But this leads to a NotImplementedError in SQLAlchemy. So is there a way to achieve what I am trying to do here, or do I necessarily need another table for the tags per ImageDataSet?
You can apply any filters on related model columns if you join this model first:
query = session.query(TrainingSession). \
    join(TrainingSession.image_data_sets). \
    filter(ImageDataSet.tags.like(f"%{find_tag}%"))
This query is translated to the following SQL statement:
SELECT training_sessions.id FROM training_sessions
JOIN training_ids_association ON training_sessions.id = training_ids_association.training_id
JOIN image_data_sets ON image_data_sets.id = training_ids_association.ids_id
WHERE image_data_sets.tags LIKE %(find_tag)s
Note that you may run into a problem when storing tags as a single string with separators. If some records have the tags tag1, tag12, and tag123, they will all match the filter LIKE '%tag1%'.
It would be better to switch to an ARRAY column if your database supports this column type (PostgreSQL, for example). Your column may be defined like this:
tags = Column(ARRAY(String))
And the query may look like this:
query = session.query(TrainingSession). \
    join(TrainingSession.image_data_sets). \
    filter(ImageDataSet.tags.any(find_tag))
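If an ARRAY column is not available and the slash-separated format has to stay, one common workaround for the tag1/tag12 false positives is to wrap both the stored string and the search term in the separator before comparing. The following is a minimal sketch of that idea using the standard-library sqlite3 module rather than SQLAlchemy; the table and column names follow the question, while the helper names and sample rows are invented for illustration.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory copy of the question's schema with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE training_sessions (id INTEGER PRIMARY KEY);
        CREATE TABLE image_data_sets (id INTEGER PRIMARY KEY, tags TEXT);
        CREATE TABLE training_ids_association (training_id INTEGER, ids_id INTEGER);
        INSERT INTO training_sessions (id) VALUES (1), (2), (3);
        INSERT INTO image_data_sets (id, tags) VALUES
            (1, 'tag1/tag2'), (2, 'tag12'), (3, 'tag123/tag1');
        INSERT INTO training_ids_association VALUES (1, 1), (2, 2), (3, 3);
        """
    )
    return conn


def sessions_with_tag(conn, find_tag):
    """Return ids of training sessions whose image data sets carry exactly find_tag."""
    # '/tag1/tag2/' LIKE '%/tag1/%' matches, '/tag12/' LIKE '%/tag1/%' does not.
    # Caveat: '%' and '_' inside find_tag would still act as LIKE wildcards.
    pattern = f"%/{find_tag}/%"
    rows = conn.execute(
        """
        SELECT DISTINCT ts.id
        FROM training_sessions AS ts
        JOIN training_ids_association AS a ON ts.id = a.training_id
        JOIN image_data_sets AS ids ON ids.id = a.ids_id
        WHERE '/' || ids.tags || '/' LIKE ?
        ORDER BY ts.id
        """,
        (pattern,),
    ).fetchall()
    return [row[0] for row in rows]


conn = make_demo_db()
print(sessions_with_tag(conn, "tag1"))   # sessions 1 and 3, but not the 'tag12' one
print(sessions_with_tag(conn, "tag12"))  # only session 2
```

This keeps the existing column format but cannot use an ordinary index for the search, so a separate tags table or an ARRAY column is still likely to scale better.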

About unique=True and (unique=True, index=True) in sqlalchemy

When I create tables using Flask-SQLAlchemy like this:
class Te(Model):
    __tablename__ = 'tt'
    id = Column(db.Integer(), primary_key=True)
    t1 = Column(db.String(80), unique=True)
    t3 = Column(db.String(80), unique=True, index=True)
then in Sequel Pro I get the following CREATE TABLE statement:
CREATE TABLE `tt` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`t1` varchar(80) DEFAULT NULL,
`t3` varchar(80) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `t1` (`t1`),
UNIQUE KEY `ix_tt_t3` (`t3`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Does this mean that t1 is exactly the same as t3 in MySQL? So when you define unique=True, is it not required to also define index=True?
Thanks.
I think you are confusing the purpose of index in SQLAlchemy. In SQL databases, indexes are used to speed up queries.
See the SQLAlchemy documentation on defining constraints and indexes.
You can see the effect of index=True in the generated SQL:
UNIQUE KEY `ix_tt_t3` (`t3`)
By default SQLAlchemy names such an index ix_<table>_<column> (its naming convention is ix_%(column_0_label)s), which matches the ix_tt_t3 in the generated SQL.
So whether or not you set index=True only affects performance, whereas the unique key means that a value cannot occur more than once in that column of the 'tt' table.
Hope this helps,
It is not required. A unique constraint is more often than not implemented using a unique index, but you need not care about that detail, if all you want is uniqueness.
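The point that a unique constraint is usually backed by a unique index can be checked directly. The sketch below uses SQLite through the standard-library sqlite3 module as a stand-in, purely for illustration; MySQL reports its indexes differently, and the helper names here are invented. Column t1 gets a UNIQUE constraint, while t3 gets an explicitly named unique index, mirroring what unique=True and unique=True, index=True produced in the question.

```python
import sqlite3


def make_tt_db():
    """Create table tt with a UNIQUE constraint on t1 and a named unique index on t3."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tt ("
        " id INTEGER PRIMARY KEY,"
        " t1 VARCHAR(80) UNIQUE,"
        " t3 VARCHAR(80))"
    )
    conn.execute("CREATE UNIQUE INDEX ix_tt_t3 ON tt (t3)")
    return conn


def unique_index_names(conn, table):
    """List the names of all unique indexes SQLite keeps for the given table."""
    # Each PRAGMA index_list row starts with (seq, name, unique, ...).
    # The table name is interpolated directly, so it must come from trusted code.
    rows = conn.execute(f"PRAGMA index_list('{table}')").fetchall()
    return sorted(row[1] for row in rows if row[2] == 1)


conn = make_tt_db()
print(unique_index_names(conn, "tt"))
```

Both columns end up with a unique index; the only visible difference is that the one created for the constraint on t1 gets an automatically generated name.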

Identify what values in a list doesn't exist in a Table column using SQLAlchemy

I have a list cities = ['Rome', 'Barcelona', 'Budapest', 'Ljubljana']
Then,
I have a sqlalchemy model as follows -
class Fly(Base):
    __tablename__ = 'fly'
    pkid = Column('pkid', INTEGER(unsigned=True), primary_key=True, nullable=False)
    city = Column('city', VARCHAR(45), unique=True, nullable=False)
    country = Column('country', VARCHAR(45))
    flight_no = Column('Flight', VARCHAR(45))
I need to check whether ALL the values in the given cities list exist in the fly table using SQLAlchemy. Return true only if ALL the cities exist in the table. If even a single city doesn't exist in the table, I need to return false and the list of cities that don't exist. How can I do that? Any ideas/hints/suggestions? I'm using MySQL.
One way would be to create a (temporary) relation based on the given list and take the set difference between it and the cities from the fly table. In other words, create a union of the values from the list (see note 1):
from sqlalchemy import exists, literal, select, union
cities_union = union(*[select([literal(v)]) for v in cities])
Then take the difference:
sq = cities_union.select().except_(select([Fly.city]))
and check that no rows are left after the difference:
res = session.query(~exists(sq)).scalar()
To get the list of cities missing from the fly table, omit the (NOT) EXISTS:
res = session.execute(sq).fetchall()
Note 1: Other database vendors may offer alternative methods for producing relations from arrays, such as PostgreSQL and its unnest().
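For a short list, a simpler alternative to building the set difference in SQL is to fetch the cities that do exist with an IN filter and compute the difference in Python. In SQLAlchemy the query part would be something like session.query(Fly.city).filter(Fly.city.in_(cities)). The sketch below shows the same idea with the standard-library sqlite3 module so that it runs on its own; the function names and sample rows are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3


def missing_cities(conn, cities):
    """Return the cities from the list that have no row in the fly table."""
    if not cities:
        return []
    placeholders = ", ".join("?" for _ in cities)
    rows = conn.execute(
        f"SELECT city FROM fly WHERE city IN ({placeholders})", list(cities)
    ).fetchall()
    found = {row[0] for row in rows}
    # Preserve the order of the input list in the result.
    return [city for city in cities if city not in found]


def all_cities_exist(conn, cities):
    """Return (True, []) if every city exists, else (False, list of missing cities)."""
    missing = missing_cities(conn, cities)
    return (not missing), missing


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fly (pkid INTEGER PRIMARY KEY, city VARCHAR(45) UNIQUE NOT NULL)"
)
conn.executemany("INSERT INTO fly (city) VALUES (?)", [("Rome",), ("Budapest",)])

cities = ['Rome', 'Barcelona', 'Budapest', 'Ljubljana']
print(all_cities_exist(conn, cities))  # (False, ['Barcelona', 'Ljubljana'])
```

One caveat: the comparison in Python is case-sensitive, whereas the database collation may not be, so the two can disagree on names that differ only in case.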

Country-to-City + Country-to-Capital = One-To-Many + One-To-One?

How would you augment this one-to-many relationship so that one item on the 'many' side is a distinguished one?
Using a concrete example, how would the notion of a capital be introduced in the following model? Would it be an additional one-to-one mapping? Would it interfere with the existing one-to-many? Would merely introducing a db.Boolean is_capital be sufficient? Would the latter be idiomatic, or is there a more fitting solution?
class City(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    country_id = db.Column(db.Integer, db.ForeignKey('country.id'))
    country = db.relationship('Country', back_populates='cities')

class Country(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...
    cities = db.relationship('City', back_populates='country')
You can either make a new Capital table, which contains one row per Country containing the Country ID and City ID, or you could just add a column to the Country table containing the Capital City ID. Either of these will provide more efficient lookup and more compact storage than a boolean is_capital for each City.
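As a rough sketch of the second option (a capital column on the country table), the schema below is written with the standard-library sqlite3 module so it can be run as-is. The name columns, the capital_id column name, and the helper functions are assumptions made for the example; the question's models elide their other columns.

```python
import sqlite3


def build_geo_db():
    """Create country and city tables where country points at its capital city."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE country (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            -- nullable, because the country row is inserted before its cities
            capital_id INTEGER REFERENCES city (id)
        );
        CREATE TABLE city (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            country_id INTEGER REFERENCES country (id)
        );
        """
    )
    return conn


def capital_of(conn, country_name):
    """Return the capital's name, or None if the country has no capital set."""
    row = conn.execute(
        "SELECT city.name FROM country"
        " JOIN city ON city.id = country.capital_id"
        " WHERE country.name = ?",
        (country_name,),
    ).fetchone()
    return row[0] if row else None


conn = build_geo_db()
conn.execute("INSERT INTO country (id, name) VALUES (1, 'Hungary')")
conn.executemany(
    "INSERT INTO city (id, name, country_id) VALUES (?, ?, 1)",
    [(1, "Budapest"), (2, "Debrecen")],
)
conn.execute("UPDATE country SET capital_id = 1 WHERE id = 1")
print(capital_of(conn, "Hungary"))
```

With this layout there are two foreign-key paths between the tables, so in SQLAlchemy each relationship would most likely need an explicit foreign_keys argument to say which column it follows.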

MySQL, join vs string parsing

I should first warn you that this is a rather long question, so please bear with me. One of my projects (which I started very recently) had a table that looked like this:
name (varchar)
scores (text)
Example value is like this -
['AAA', '[{"score": 3.0, "subject": "Algebra"}, {"score": 5.0, "subject": "geography"}]']
As you can see the second field is the string representation of a JSON array.
In good faith, I redesigned this table into the following two tables
table-name:
    id - int, auto_inc, primary_key
    name - varchar
table-scores:
    id - int, auto_inc, primary_key
    subject - varchar
    score - float
    name - int, FK to table-name
I have the following code in my Python file to represent the tables (at this point I assume that you are familiar with Python and SQLAlchemy, so I will skip the imports to keep it short):
Base = declarative_base()

class Name(Base):
    __tablename__ = "name_table"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), index=True)

class Score(Base):
    __tablename__ = "score_table"
    id = Column(Integer, primary_key=True)
    subject = Column(String(255), index=True)
    score = Column(Float)
    # ForeignKey takes "<table name>.<column>", not the class name
    name = Column(ForeignKey('name_table.id'), nullable=False, index=True)
    Name = relationship(u'Name')
The first table has ~ 778284 rows whereas the second table has ~ 907214 rows.
After declaring them and populating them with the initial data, I ran an experiment. The goal: find all the subjects whose score is > 5.0 for a given name (for the moment, please assume that name is unique across the DB), run the same query 100 times, and take the average to find out how long the query takes. This is what I am doing (please assume that session is a valid DB session I obtained before calling this function):
def test_time():
    for i in range(0, 100):
        scores = session.query(Score, Name.name).join(Name).filter(Name.name=='AAA').filter(Score.score>5.0).all()
        array = []
        for score in scores:
            array.append((score[0].subject, score[0].score))
I am not doing anything with the array I am creating. But I am calling this function which runs this query 100 times and I am using default_timer from timeit to measure the time elapsed. Following is the result for three runs -
Avarage - 0.10969632864
Avarage - 0.105748419762
Avarage - 0.105768380165
Then, out of curiosity, I created another quick and dirty Python file and declared the following class there:
class DirtyTable(Base):
    __tablename__ = "dirty_table"
    name = Column(String(255), primary_key=True)
    scores = Column(Text)
And then I created the following function to achieve the same goal but this time reading the data from the second field, parse it back to python dict, run a for loop over all the elements of the list, add in the array only those elements whose score value is > 5.0. Here it goes -
def dirty_timer():
    for i in range(0, 100):
        scores = session.query(DirtyTable).filter_by(name='AAA').all()
        for score in scores:
            x = json.loads(score.scores)
            array = []
            for member in x:
                # x is the whole parsed list; member is a single score entry
                if member['score'] > 5.0:
                    array.append((member['subject'], member['score']))
This is the time of three runs -
Avarage - 0.0288228917122
Avarage - 0.0296836185455
Avarage - 0.0298663306236
Am I missing something? Normalizing the DB (I believe that is all I did by splitting the original table into two tables) gave me a worse result. How is that possible? What is wrong with my approach?
Please let me know your thoughts. Sorry for the long post but had to explain everything properly.
