How would you augment this one-to-many relationship to make one record on the 'many' side a distinguished one?
Using a concrete example, how would the notion of a capital be introduced in the following model? Would it be an additional one-to-one mapping? Would it interfere with the existing one-to-many? Would merely introducing a db.Boolean is_capital be sufficient? Would that last option be idiomatic, or is there a more fitting solution?
class City(db.Model):
id = db.Column(db.Integer, primary_key=True)
...
country_id = db.Column(db.Integer, db.ForeignKey('country.id'))
country = db.relationship('Country', back_populates='cities')
class Country(db.Model):
id = db.Column(db.Integer, primary_key=True)
...
cities = db.relationship('City', back_populates='country')
You can either make a new Capital table, with one row per Country containing the Country ID and City ID, or you can simply add a column to the Country table containing the Capital City ID. Either of these will give more efficient lookup and more compact storage than a boolean is_capital on each City.
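For illustration, here is a minimal sketch of the first option; the Capital model name and the capital / city attribute names are assumptions for the example, not something fixed by the question:
class Capital(db.Model):
    # One row per country; making country_id the primary key enforces a single capital
    country_id = db.Column(db.Integer, db.ForeignKey('country.id'), primary_key=True)
    city_id = db.Column(db.Integer, db.ForeignKey('city.id'), nullable=False)
    country = db.relationship('Country', backref=db.backref('capital', uselist=False))
    city = db.relationship('City')
With something like this in place, a country's capital is reachable as some_country.capital.city, and the existing City/Country one-to-many relationship is left untouched.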
I have the following many-to-many relationship defined in SQLAlchemy:
from sqlalchemy import Column, Integer, String, ForeignKey
from sqlalchemy.orm import relationship

training_ids_association_table = db.Table(
    "training_ids_association",
    db.Model.metadata,
    Column("training_id", Integer, ForeignKey("training_sessions.id")),
    Column("ids_id", Integer, ForeignKey("image_data_sets.id")),
)
class ImageDataSet(db.Model):
__tablename__ = "image_data_sets"
id = Column(Integer, primary_key=True)
tags = Column(String)
trainings = relationship("TrainingSession", secondary=training_ids_association_table, back_populates="image_data_sets")
class TrainingSession(db.Model):
__tablename__ = "training_sessions"
id = Column(Integer, primary_key=True)
image_data_sets = relationship("ImageDataSet", secondary=training_ids_association_table, back_populates="trainings")
Note the field ImageDataSet.tags, which can contain a list of string items (i.e. tags) separated by a slash character. If possible I would rather stick to that format instead of creating a new table just for these tags.
What I want now is to query the TrainingSession table for all entries that have a certain tag set in their related ImageDataSets. Now, if an ImageDataSet has only one tag saved in the tags field, then the following works:
TrainingSession.query.filter(TrainingSession.image_data_sets.any(tags=find_tag))
However, as soon as there are multiple tags in the tags field (e.g. something like "tag1/tag2/tag3"), this filter of course no longer works. So I tried it with a like:
.filter(TrainingSession.image_data_sets.like(f'%{find_tag}%'))
But this leads to a NotImplementedError in SQLAlchemy. So is there a way to achieve what I am trying to do here, or do I necessarily need another table for the tags per ImageDataSet?
You can apply any filters on related model columns if you join this model first:
query = session.query(TrainingSession). \
join(TrainingSession.image_data_sets). \
filter(ImageDataSet.tags.like(f"%{find_tag}%"))
This query is translated to the following SQL statement:
SELECT training_sessions.id FROM training_sessions
JOIN training_ids_association ON training_sessions.id = training_ids_association.training_id
JOIN image_data_sets ON image_data_sets.id = training_ids_association.ids_id
WHERE image_data_sets.tags LIKE %(find_tag)s
Note that you may run into a problem with storing tags as strings with separators: if some records have the tags tag1, tag12, and tag123, they will all pass the filter LIKE '%tag1%'.
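If you do want to keep the slash-separated format, one way to reduce those false positives is to pad the column with the delimiter before matching, so only whole tags match. This is just a sketch and assumes the tags themselves never contain a slash:
query = session.query(TrainingSession). \
    join(TrainingSession.image_data_sets). \
    filter(('/' + ImageDataSet.tags + '/').like(f'%/{find_tag}/%'))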
It would be better to switch to an ARRAY column if your database supports this column type (PostgreSQL, for example). Your column may be defined like this:
tags = Column(ARRAY(String))
And the query may look like this:
query = session.query(TrainingSession). \
join(TrainingSession.image_data_sets). \
filter(ImageDataSet.tags.any(find_tag))
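If it helps, the PostgreSQL ARRAY type also has a containment operator, so the same filter can be written without any (a sketch, assuming the ARRAY column above, imported from sqlalchemy.dialects.postgresql):
query = session.query(TrainingSession). \
    join(TrainingSession.image_data_sets). \
    filter(ImageDataSet.tags.contains([find_tag]))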
In my Flask project I need a table with a unique constraint on a column, if the values in another column are identical. So I am trying to do something like this:
if premiumuser_id = "a value I don't know in advance" then track_id=unique
This is similar to Creating partial unique index with sqlalchemy on Postgres, but I use sqlite (where partial indexes should also be possible: https://docs.sqlalchemy.org/en/13/dialects/sqlite.html?highlight=partial%20indexes#partial-indexes) and the condition is different.
So far my code looks like this:
class Queue(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    track_id = db.Column(db.Integer)
    premiumuser_id = db.Column(
        db.Integer, db.ForeignKey("premium_user.id"), nullable=False
    )
    __table_args__ = (
        db.Index(
            "idx_partially_unique_track",
            "track_id",
            unique=True,
            sqlite_where="and here I'm lost",
        ),
    )
All the examples I've found operate with boolean or fixed values. What should the syntax for sqlite_where look like for the condition premiumuser_id = "a value I don't know in advance"?
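For reference, a partial index's WHERE clause has to be a fixed predicate known when the schema is created, so it cannot refer to a value only known at runtime. The general shape of the sqlite_where argument is sketched below, where premiumuser_id IS NOT NULL is only a placeholder predicate, not the condition from the question:
__table_args__ = (
    db.Index(
        "idx_partially_unique_track",
        "track_id",
        unique=True,
        # placeholder predicate; the real condition must be fixed at DDL time
        sqlite_where=db.text("premiumuser_id IS NOT NULL"),
    ),
)
If the underlying goal is one row per (premium user, track) pair, a composite db.UniqueConstraint("premiumuser_id", "track_id") in __table_args__ may be the more direct fit.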
I have a list cities = ['Rome', 'Barcelona', 'Budapest', 'Ljubljana']
Then I have a SQLAlchemy model as follows -
class Fly(Base):
__tablename__ = 'fly'
pkid = Column('pkid', INTEGER(unsigned=True), primary_key=True, nullable=False)
city = Column('city', VARCHAR(45), unique=True, nullable=False)
country = Column('country', VARCHAR(45))
flight_no = Column('Flight', VARCHAR(45))
I need to check whether ALL the values in the given cities list exist in the fly table using SQLAlchemy. Return true only if ALL the cities exist in the table. If even a single city doesn't exist in the table, I need to return false along with the list of cities that don't exist. How do I do that? Any ideas/hints/suggestions? I'm using MySQL.
One way would be to create a (temporary) relation based on the given list and take the set difference between it and the cities from the fly table. In other words, create a union of the values from the list [1]:
from sqlalchemy import union, select, literal, exists
cities_union = union(*[select([literal(v)]) for v in cities])
Then take the difference:
sq = cities_union.select().except_(select([Fly.city]))
and check that no rows are left after the difference:
res = session.query(~exists(sq)).scalar()
For the list of cities missing from the fly table, omit the (NOT) EXISTS and just execute the difference:
res = session.execute(sq).fetchall()
[1] Other database vendors may offer alternative methods for producing relations from arrays, such as PostgreSQL and its unnest().
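Putting the pieces together, a small helper along these lines could return both the boolean and the missing cities in one call (the name check_cities is only an assumption for illustration):
from sqlalchemy import union, select, literal

def check_cities(session, cities):
    # Relation built from the Python list
    cities_union = union(*[select([literal(v)]) for v in cities])
    # Values present in the list but absent from fly.city
    sq = cities_union.select().except_(select([Fly.city]))
    missing = [row[0] for row in session.execute(sq)]
    return len(missing) == 0, missing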
I should first warn that this is a somewhat long question, so please bear with me. One of my projects (that I started very recently) had a table which looked like this:
name (varchar)
scores (text)
An example value looks like this -
['AAA', '[{"score": 3.0, "subject": "Algebra"}, {"score": 5.0, "subject": "geography"}]']
As you can see the second field is the string representation of a JSON array.
In good faith, I redesigned this table into the following two tables
table-name:
id - Int, auto_inc, primary_key
name - varchar
table-scores:
id - int, auto_inc, primary_key
subject - varchar
score- float
name - int, FK to table-name
I have the following code in my Python file to represent the tables (at this point I assume that you are familiar with Python and SQLAlchemy, so I will skip the specific imports to keep it shorter):
Base = declarative_base()

class Name(Base):
    __tablename__ = "name_table"
    id = Column(Integer, primary_key=True)
    name = Column(String(255), index=True)

class Score(Base):
    __tablename__ = "score_table"
    id = Column(Integer, primary_key=True)
    subject = Column(String(255), index=True)
    score = Column(Float)
    name = Column(ForeignKey('name_table.id'), nullable=False, index=True)
    Name = relationship(u'Name')
The first table has ~ 778284 rows whereas the second table has ~ 907214 rows.
After declaring them and populating them with the initial data, I ran an experiment. The goal: find all the subjects whose score is > 5.0 for a given name (here, for a moment, please assume that name is unique across the DB), run the same query 100 times, and take the average to find out how long it takes. Here is what I am doing (please assume session is a valid DB session I obtained before calling this function):
def test_time():
for i in range(0, 100):
scores = session.query(Score, Name.name).join(Name).filter(Name.name=='AAA').filter(Score.score>5.0).all()
array = []
for score in scores:
array.append((score[0].subject, score[0].score))
I am not doing anything with the array I am creating. But I am calling this function which runs this query 100 times and I am using default_timer from timeit to measure the time elapsed. Following is the result for three runs -
Avarage - 0.10969632864
Avarage - 0.105748419762
Avarage - 0.105768380165
Now, as I was curious, what I did was create another quick and dirty Python file and declare the following class there -
class DirtyTable(Base):
__tablename__ = "dirty_table"
name = Column(String(255), primary_key=True)
scores = Column(Text)
And then I created the following function to achieve the same goal, but this time reading the data from the second field, parsing it back into Python objects, running a for loop over all the elements of the list, and adding to the array only those elements whose score value is > 5.0. Here it goes -
def dirty_timer():
    for i in range(0, 100):
        scores = session.query(DirtyTable).filter_by(name='AAA').all()
        for score in scores:
            x = json.loads(score.scores)
            array = []
            for member in x:
                if member['score'] > 5.0:
                    array.append((member['subject'], member['score']))
These are the times for three runs -
Avarage - 0.0288228917122
Avarage - 0.0296836185455
Avarage - 0.0298663306236
Am I missing something? Normalizing the DB (which I believe is what I tried to do by breaking the original table into two tables) gave me a worse result. How is that possible? What is wrong with my approach?
Please let me know your thoughts. Sorry for the long post, but I had to explain everything properly.
I have a Company that has juniors and seniors. I would like to add users by adding groups instead of individually. Imagine I have Group 1, made up of 3 seniors: instead of adding those 3 individually, I'd like to be able to just add Group 1 and have the 3 seniors automatically added to the list of seniors. I'm a little stuck with my current implementation:
class Company(models.Model):
    juniors = models.ManyToManyField(User, related_name='junior_companies')
    seniors = models.ManyToManyField(User, related_name='senior_companies')
    junior_groups = models.ManyToManyField(Group, related_name='junior_companies')
    senior_groups = models.ManyToManyField(Group, related_name='senior_companies')

# currently, I use this signal to add users from a group when a group is added to a company
def group_changed(sender, **kwargs):
    if kwargs['action'] != 'post_add':
        return None
    co = kwargs['instance']
    group_id = kwargs['pk_set'].pop()
    juniors = MyGroup.objects.get(pk=group_id).user_set.all()
    co.juniors.add(*juniors)
    co.save()

m2m_changed.connect(...)
The main problem is this looks messy and I have to repeat it for seniors, and potentially other types of users as well.
Is there a more straightforward way to do what I'm trying to do?
Thanks in advance!
Are you trying to optimize and avoid having the group object used in your queries?
If you are OK with a small join query, you could use this syntax to get the juniors in the company with id = COMP_ID.
This way you don't need to handle the users directly and copy them all the time:
juniors = User.objects.filter(groups__company_id=COMP_ID, groups__type=Junior)
seniors = User.objects.filter(groups__company_id=COMP_ID, groups__type=Senior)
assuming that:
you add related_name "groups" to your m2m relation between groups and users
your groups have a type field which you manage
you called your foreign-key field 'company' on your Group model
These queries can be added as properties on the Company model, so they give the same programmatic peace of mind.
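As a rough sketch of that last point, the properties might look like the following; this assumes, as above, a Group model with a company foreign key (related_name='groups') and a type field whose Junior/Senior values you manage, none of which are established by the question:
class Company(models.Model):
    # ... group fields and anything else the model needs ...

    @property
    def juniors(self):
        # Same join query as above, exposed as a property on the company
        return User.objects.filter(groups__company_id=self.id, groups__type=Junior)

    @property
    def seniors(self):
        return User.objects.filter(groups__company_id=self.id, groups__type=Senior)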