Is this the best way to construct my query in SQLAlchemy - python

I am tracking laptimes for drivers that set fastest laps on a circuit over a period of a month. Every few days I update their latest laptimes, so each month they might have many laps completed. I'm trying to build a SQLAlchemy model that returns a list of all laptimes for that month (called a challenge) but filters to the fastest time per driver.
I have this working, but it doesn't feel very pythonic. I don't know if the filtering should be part of a relationship in the model (e.g. a field that returns the list of fastest laps, per driver, per challenge) or if this should be done outside of the model as I construct the view (using Flask).
Here is my model:
class Challenges(db.Model):
    __tablename__ = 'challenges'
    id = db.Column(db.Integer, primary_key=True)
    season_id = db.Column(db.Integer, db.ForeignKey('seasons.id'))
    type = db.Column(db.Text)
    track_id = db.Column(db.Integer, db.ForeignKey('tracks.id'))
    layout_id = db.Column(db.Integer, db.ForeignKey('layouts.id'))
    car_class_id = db.Column(db.Integer, db.ForeignKey('car_classes.id'))
    car_id = db.Column(db.Integer, db.ForeignKey('cars.id'))
    # one to many
    # laps = db.relationship('Laps', backref='challenge', lazy='dynamic')
    # one to one
    track = db.relationship('Tracks', backref='challenge', uselist=False)
    layout = db.relationship('Layouts', backref='challenge', uselist=False)
    car_class = db.relationship('Car_classes', backref='challenge', uselist=False)
    car = db.relationship('Cars', backref='challenge', uselist=False)
What I'd like to do is replace the 'Laps' relationship (commented out above, which returns ALL laps) with one that returns only the fastest lap per driver for that challenge. I currently have this SQLAlchemy code that runs in the Flask blueprint to get the data ready to display, but it feels like this should be inside the model as a 'leaderboard' relationship. I hope this makes sense. Here is my query:
leaderboard = db.session.query(
    Laps.challenge_id,
    Drivers.id,
    Drivers.name,
    Drivers.country,
    func.min(Laps.laptime),
    Laps.datetime,
    Laps.verified
).join(
    Drivers, Laps.driver_id == Drivers.id
).filter(
    Laps.challenge_id == challenges[0].id
).group_by(
    Laps.driver_id
).order_by(
    Laps.laptime.asc()
).all()
So this DOES give me what I need, but it feels really clunky. I'm trying to learn Python the correct way and don't want to use what feels like a workaround. If I simply place this code in the model it fails (Laps is not defined), and it's not defined as a relationship like I think it should be.
I'm very new to SQLAlchemy, so apologies if this is a simple error on my part. It's more important to me to do this right than simply to get it working. Thanks in advance for any expertise shared.
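One point worth checking in the query above: on some databases, selecting non-aggregated columns such as Laps.datetime next to func.min() with GROUP BY is either rejected or returns values from an arbitrary row of the group. A common alternative is to compute each driver's minimum laptime in a subquery and join back to the laps table, so the datetime and verified values come from the fastest lap itself. The sketch below shows that shape in plain SQL, using the standard-library sqlite3 module instead of SQLAlchemy so it runs without extra dependencies. The table and column names follow the question; the sample rows and the fastest_laps helper are invented for illustration.

```python
import sqlite3


def fastest_laps(conn, challenge_id):
    """Return one row per driver: that driver's fastest lap in the challenge."""
    sql = """
        SELECT d.id, d.name, l.laptime, l.datetime, l.verified
        FROM laps AS l
        JOIN drivers AS d ON d.id = l.driver_id
        JOIN (
            -- best laptime per driver within the challenge
            SELECT driver_id, MIN(laptime) AS best
            FROM laps
            WHERE challenge_id = ?
            GROUP BY driver_id
        ) AS f ON f.driver_id = l.driver_id AND f.best = l.laptime
        WHERE l.challenge_id = ?
        ORDER BY l.laptime ASC
    """
    return conn.execute(sql, (challenge_id, challenge_id)).fetchall()


# Hypothetical sample data, only to exercise the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drivers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE laps (
        id INTEGER PRIMARY KEY, challenge_id INTEGER, driver_id INTEGER,
        laptime REAL, datetime TEXT, verified INTEGER
    );
    INSERT INTO drivers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO laps VALUES
        (1, 1, 1, 92.5, '2020-01-03', 0),
        (2, 1, 1, 90.1, '2020-01-09', 1),
        (3, 1, 2, 91.0, '2020-01-05', 1),
        (4, 2, 2, 80.0, '2020-02-01', 1);
""")

leaderboard = fastest_laps(conn, 1)
for row in leaderboard:
    print(row)
```

If a driver sets exactly the same best time twice, this sketch returns both laps; a tie-breaker would be needed to keep only one. Whether such a query lives in a model method or in the view is a design choice the sketch does not settle.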

Related

SQL alchemy query a single column

I am sure this has been answered before, and I see a few related answers, but none seem to cover the issue I am facing. I am using a SQLAlchemy model backed by a SQL Server DB, and I am using it to query the DB with a session. The normal queries work fine with no errors. However, when I ask for only one field instead of all of them, it gives me an error (see below).
Boiled down to the simplest case, I have a model like so:
class FactoryShop(Base):
    # case insensitive, refers to the actual table in the DB called factoryshop
    __tablename__ = 'factoryshop'
    ID = Column(Integer, primary_key=True, autoincrement=True)
    Name = Column(String(255))
    Parts = Column(Integer)
    Strength = Column(Integer)
    Average = Column(Float)
    ...
Using a session I can query all columns like so:
>>> session.query(FactoryShop).filter(FactoryShop.Parts==20000)
<sqlalchemy.orm.query.Query object at 0x10578c280>
However if I try to just ask for the Name like below I get a long error. I searched for that specific error which involves 'selectable' but I didn't come across a relevant answer.
>>> session.query(FactoryShop.Name).filter(FactoryShop.Parts==20000)
AttributeError: Neither 'AnnotatedColumn' object nor 'Comparator' object has an attribute 'selectable'
If there is already an answer please point me to it and I will delete this one.
You are not querying for it correctly. But you are very close.
result = session.query(FactoryShop).filter(FactoryShop.Parts==20000).first()
Then, you can call result.Name to get the name of that FactoryShop Object.

Filtering a relationship attribute in SQLAlchemy

I have some code with a Widget object that must undergo some processing periodically. Widgets have a relationship to a Process object that tracks individual processing attempts and holds data about those attempts, such as state information, start and end times, and the result. The relationship looks something like this:
class Widget(Base):
    __tablename__ = 'widget'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    attempts = relationship('Process')

class Process(Base):
    __tablename__ = 'process'
    id = Column(Integer, primary_key=True)
    widget_id = Column(Integer, ForeignKey('widget.id'))
    start = Column(DateTime)
    end = Column(DateTime)
    success = Column(Boolean)
I want to have a method on Widget to check whether it's time to process that widget yet, or not. It needs to look at all the attempts, find the most recent successful one, and see if it is older than the threshold.
One option is to iterate Widget.attempts using a list comprehension. Assuming now and delay are reasonable datetime and timedelta objects, then something like this works when defined as a method on Widget:
def ready(self):
    recent_success = [
        attempt for attempt in self.attempts
        if attempt.success is True and attempt.end >= now - delay
    ]
    if recent_success:
        return False
    return True
That seems like good idiomatic Python, but it's not making good use of the power of the SQL database backing the data, and it's probably less efficient than running a similar SQL query especially once there are a large number of Process objects in the attempts list. I'm having a hard time figuring out the best way to implement this as a query, though.
It's easy enough to run the query inside Widget something like this:
def ready(self):
    recent_success = session.query(Process).filter(
        and_(
            Process.widget_id == self.id,
            Process.success == True,
            Process.end >= now - delay
        )
    ).order_by(Process.end.desc()).first()
    if recent_success:
        return False
    return True
But I run into problems in unit tests with getting session set properly inside the module that defines Widget. It seems to me that's a poor style choice, and probably not how SQLAlchemy objects are meant to be structured.
I could make the ready() function something external to the Widget class, which would fix the problems with setting session in unit tests, but that seems like poor OO structure.
I think the ideal would be if I could somehow filter Widget.attempts with SQL-like code that's more efficient than a list comprehension, but I haven't found anything that suggests that's possible.
What is actually the best approach for something like this?
You are thinking in the right direction. Any solution implemented on the Widget instance means you have to load and check every instance one by one. An external function that queries the database would perform better and be easier to test.
You can get all the Widget instances which need to be scheduled for next processing using this query:
q = (
    session
    .query(Widget)
    # keep widgets with NO successful attempt that ended within the delay window
    .filter(~Widget.attempts.any(and_(
        Process.success == True,
        Process.end >= now - delay,
    )))
)
widgets_to_process = q.all()
If you really want to have a property on the model, I would not create a separate query, but just use the relationship:
def ready(self, at_time):
    # at_time = now - delay
    recent_successes = [
        attempt
        for attempt in self.attempts
        if attempt.success and attempt.end >= at_time
    ]
    # ready only when there is no recent successful attempt
    return not recent_successes
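Passing the time values in as arguments also makes the check easy to unit test without a session. Here is a minimal sketch of that idea, keeping the question's meaning of ready (no successful attempt ended within the delay window). The dataclasses are plain stand-ins for the mapped Widget and Process classes, not SQLAlchemy models, and the sample dates are made up.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Process:
    # Stand-in for the mapped Process class (only the fields the check needs).
    end: Optional[datetime]
    success: bool


@dataclass
class Widget:
    # Stand-in for the mapped Widget class with its `attempts` collection.
    attempts: List[Process] = field(default_factory=list)

    def ready(self, now: datetime, delay: timedelta) -> bool:
        """True when no successful attempt finished within the last `delay`."""
        cutoff = now - delay
        return not any(
            a.success and a.end is not None and a.end >= cutoff
            for a in self.attempts
        )


now = datetime(2020, 1, 10, 12, 0)
delay = timedelta(days=1)

stale = Widget([Process(datetime(2020, 1, 1), True)])          # old success
fresh = Widget([Process(datetime(2020, 1, 10, 9, 0), True)])   # recent success
failed = Widget([Process(datetime(2020, 1, 10, 9, 0), False)]) # recent failure

print(stale.ready(now, delay), fresh.ready(now, delay), failed.ready(now, delay))
```

Because now and delay are parameters rather than module-level values, a test can pin them to fixed datetimes.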

Searching an exact match for the attribute of a relation

I have following SQLAlchemy DB models describing parts that go through several production steps:
class Part(db.Model):
    part_number = db.Column(db.Integer, primary_key=True)
    production_steps = db.relationship("ProductionStep")

class ProductionStep(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    part_number = db.Column(db.Integer, db.ForeignKey('part.part_number'))
    name = db.Column(db.String)
    status = db.Column(db.String)
Now I'd like to query all parts that have a production step with a certain name and status through a Flask-Restless search query.
Is this possible with a Flask-Restless search query? If yes, how can I achieve the specified behaviour?
I'm using Flask-Restless version 0.17.0.
I have tried following filters:
q={"filters":[{"and":[{"name":"production_steps__name","op":"==","val":"cutting"},
{"name":"production_steps__status","op":"any","val":"done"}]}]}
Which leads to following error:
sqlalchemy.exc.InvalidRequestError: Can't compare a collection to an object or collection; use contains() to test for membership.
Which sounds reasonable, so I also tried the following:
q={"filters":[{"and":
[{"name":"production_steps","op":"any","val":{"name":"name","op":"eq","val":"cutting"}},
{"name":"production_steps","op":"any","val":{"name":"status","op":"eq","val":"done"}}]
}]}
This query does work, but it also returns parts that match only one of the criteria (e.g. parts with a production step "cutting" where the status is not "done").
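One likely explanation, assuming each "any" filter is evaluated independently, is that the two conditions may be satisfied by different production steps of the same part. The plain-Python sketch below illustrates the difference between two separate "any" checks and a single "any" over both conditions. The dictionaries stand in for database rows and the data is invented for illustration; it does not use Flask-Restless.

```python
# Each part number maps to its production steps (hypothetical sample data).
parts = {
    1: [{"name": "cutting", "status": "done"}],
    2: [{"name": "cutting", "status": "open"},
        {"name": "drilling", "status": "done"}],
    3: [{"name": "drilling", "status": "open"}],
}


def two_separate_any(steps):
    # Mirrors two independent "any" filters joined with "and":
    # the name and the status may match on different steps.
    return (any(s["name"] == "cutting" for s in steps)
            and any(s["status"] == "done" for s in steps))


def single_any(steps):
    # Both conditions must hold for the same step.
    return any(s["name"] == "cutting" and s["status"] == "done" for s in steps)


loose = [pn for pn, steps in parts.items() if two_separate_any(steps)]
strict = [pn for pn, steps in parts.items() if single_any(steps)]
print(loose, strict)
```

Part 2 passes the two separate checks because its "cutting" step and its "done" step are different rows, which matches the unwanted results described above.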
As discussed in the comments, Flask-Restless does not seem to support queries like this.
Two possible workarounds:
Do two search queries: first, get the IDs of all ProductionSteps with the correct name and status. Second, query all Parts that have one of those IDs in the production_steps array, using the in operator.
Implement your own route that returns the Parts wanted. Code might look something like this:
@app.route('/part/outstanding', methods=['GET'])
def parts_outstanding():
    result = Part.query.join(Part.production_steps) \
        .filter_by(status='outstanding').all()
    # Custom serialization logic
    result_json = list(map(lambda part: part.to_dict(), result))
    return jsonify(
        num_results=len(result),
        objects=result_json,
        page=1,
        total_pages=1
    )
I'd advocate for doing two search queries. Implementing your own route seems kinda hacky.

Python sqlalchemy dynamic relationship

I'm trying to understand if it's possible to do something with Sqlalchemy, or if I'm thinking about it the wrong way. As an example, say I have two (these are just examples) classes:
class Customer(db.Model):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship('Address')

class Address(db.Model):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    address = Column(String)
    home = Column(Boolean)
    customer_id = Column(Integer, ForeignKey('customer.id'))
And later I want to perform a query that gets the customer and just their home address. Is it possible to do that with something like this:
db.session.query(Customer).join(Address, Address.home == True)
Would the above further refine/restrict the join so the results would only get the home address?
When in doubt if a query construct is what you want, try printing it:
In [29]: db.session.query(Customer).join(Address, Address.home == True)
Out[29]: <sqlalchemy.orm.query.Query at 0x7f14fa651e80>
In [30]: print(_)
SELECT customer.id AS customer_id, customer.name AS customer_name
FROM customer JOIN address ON address.home = true
It is clear that this is not what you want. Every customer is joined with every address that is a home address. Due to how entities are handled this might not be obvious at first. The duplicate rows per customer are ignored and you get a result of distinct Customer entities, even though the underlying query was wrong. The query also effectively just ignores the joined Addresses when forming results.
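To make the effect of that ON clause concrete, the sketch below runs the same kind of join directly against an in-memory SQLite database, using the standard-library sqlite3 module rather than SQLAlchemy. The table layout follows the question; the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (
        id INTEGER PRIMARY KEY, address TEXT, home INTEGER,
        customer_id INTEGER REFERENCES customer(id)
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO address VALUES
        (1, '1 Home St', 1, 1),
        (2, '2 Work Rd', 0, 1),
        (3, '3 Home Ave', 1, 2);
""")

# ON clause without the foreign-key condition, as in the printed query:
# every customer is paired with every home address.
unrestricted = conn.execute(
    "SELECT customer.name, address.address "
    "FROM customer JOIN address ON address.home = 1 "
    "ORDER BY customer.id, address.id"
).fetchall()

# ON clause that also ties each address to its own customer.
restricted = conn.execute(
    "SELECT customer.name, address.address "
    "FROM customer JOIN address "
    "ON customer.id = address.customer_id AND address.home = 1 "
    "ORDER BY customer.id, address.id"
).fetchall()

print(len(unrestricted), len(restricted))
```

With two customers and two home addresses, the first query yields four rows (every pairing), while the second yields one row per customer with that customer's own home address.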
The easiest solution would be to just query for customer and address tuples with required criteria:
db.session.query(Customer, Address).\
    join(Address).\
    filter(Address.home)
You could also do something like this
db.session.query(Customer).\
    join(Address, (Customer.id == Address.customer_id) & Address.home).\
    options(contains_eager(Customer.addresses))
but I'd highly recommend against it. You'd be lying to yourself about what the relationship collection contains and that might backfire at some point. Instead you should add a new one to one relationship to Customer with the custom join condition:
class Customer(db.Model):
    ...
    home_address = relationship(
        'Address', uselist=False,
        primaryjoin='and_(Customer.id == Address.customer_id, Address.home)')
and then you could use a joined eager load
db.session.query(Customer).options(joinedload(Customer.home_address))
Yeah, that's entirely possible, though you would probably want code like:
# if you know the customer's database id...
# get the first address in the database for the given id that says it's for home
home_address = db.session.query(Address).filter_by(customer_id=customer_id_here, home=True).first()
Instead of having a boolean for home, you might try a 'type' column instead, using an enum. This would let you easily pick an address for places like work, rather than just a binary choice of "either this address is for home or not".
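As a rough illustration of the enum idea, here is a plain-Python sketch. AddressType and pick_address are hypothetical names, and the dataclass is only a stand-in for the mapped Address class; how the enum is stored in a database column is left out.

```python
import enum
from dataclasses import dataclass
from typing import Iterable, Optional


class AddressType(enum.Enum):
    # Hypothetical set of address kinds; extend as needed.
    HOME = "home"
    WORK = "work"


@dataclass
class Address:
    # Stand-in for the mapped Address class, with `type` replacing `home`.
    address: str
    type: AddressType


def pick_address(addresses: Iterable[Address],
                 wanted: AddressType) -> Optional[Address]:
    """Return the first address of the requested type, or None."""
    return next((a for a in addresses if a.type is wanted), None)


addresses = [
    Address("2 Work Rd", AddressType.WORK),
    Address("1 Home St", AddressType.HOME),
]
home = pick_address(addresses, AddressType.HOME)
print(home)
```

The same lookup then works for work addresses, or for any type added later, without another boolean column.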
Update: You might also consider using the back_populates keyword argument with the relationship call, so if you have an address instance (called a), you can get the customer it's for with something like a.customer (which is the instance of the Customer class this address is associated with).

Why add self instead of self.last_seen in sqlalchemy db (Flask)

In this code the last_seen field is being refreshed with the current time whenever the user uses the site. However, in the call to the db, the author (Miguel Grinberg, "Flask Web Development") adds self instead of self.last_seen, which confuses me. I understand the basic principles of OOP, and I (thought I) understood what self is (a reference to the object being created), but I do NOT understand why we don't add self.last_seen in the last line, db.session.add(self). Full code below.
class User(UserMixin, db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(64), unique=True, index=True)
    username = db.Column(db.String(64), unique=True, index=True)
    role_id = db.Column(db.Integer, db.ForeignKey('roles.id'))
    password_hash = db.Column(db.String(128))
    confirmed = db.Column(db.Boolean, default=False)
    name = db.Column(db.String(64))
    location = db.Column(db.String(64))
    about_me = db.Column(db.Text())
    member_since = db.Column(db.DateTime(), default=datetime.utcnow)
    last_seen = db.Column(db.DateTime(), default=datetime.utcnow)

    def ping(self):
        self.last_seen = datetime.utcnow()
        db.session.add(self)
It looks very simple and I'm sure it is, but obviously I'm missing something, or haven't learned something I should have. If I knew what to google for an answer, I would certainly have done so, but I'm not even sure what to search for other than the principles of Python OOP, which I thought I already understood (I did review them). Any help would be greatly appreciated because this is driving me crazy, lol.
He is adding the updated model to the DB. The model changed, so db.session.add() will update the proper row behind the scenes. I don't believe SQLAlchemy would allow you to add just a property of a model, because it wouldn't know which row to update.
Perhaps an example would make this clearer. Let's take the following model:
class User(db.Model):
    __tablename__ = 'User'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(25))
Now there are 2 very important attributes on the model for inserting/updating it in the DB. The table name and the id. So to add that model to the DB with plain SQL we would need to do something like:
INSERT INTO User (name) VALUES ('Some string');
This is roughly what happens when you use db.session.add() on a new model. To update our model we would need to do something like:
UPDATE User
SET name='Some other String'
WHERE id=1;
Now if you were to only pass one attribute of a model to SQLAlchemy how would it be able to figure out what table you wanted to add to or which row was supposed to get changed?
If you just passed self.name to db.session.add() the query would end up looking like this:
UPDATE # There is no way to know the table
SET name='Some other String'
WHERE ; # There is no way to know which row needs to be changed
SQLAlchemy would most likely throw an exception if you tried. As for why it can't deduce the model from self that is probably way outside the scope of an SO question.
IanAuld is right-- but I'll make an effort to try and explain it in a long-winded fashion.
Let's put ourselves in SQLAlchemy's role, and let's pretend we are the db.session.add method.
self.last_seen is a datetime object, so let's pretend we're sitting at home, and an envelope comes through the door and it's addressed to db.session.add. Great, that's us, so we open it up and read the message, which just says 2014-07-29 and nothing else. We know we need to file it away in the filing cabinet somewhere, but we just don't have enough information to do so. All we know is we've got a datetime; we've got no idea what User it belongs to, or even if it does belong to a User at all. It's just a datetime-- we're stuck.
If instead the next thing that comes through the door is a parcel, again addressed to db.session.add, and again we open it-- this time it's a little model of a User, it's got a name, an email-- and even a last_seen datetime written on its arm. Now it's easy-- I can go right to the filing cabinet and have a look to see if I've already got it in there, and either make a few changes to make them match, or simply file this one away if it's new.
That's the difference-- with an ORM model, you're passing these full Users or Products, or anything else, around, and SQLAlchemy knows that it's a db.Model and therefore can know how and where to handle it by inspecting its details.
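The envelope-versus-parcel picture can be checked in a few lines of plain Python. The User class below is a hypothetical stand-in, not a real db.Model, and no session is involved; the point is only what each object carries with it.

```python
from datetime import datetime


class User:
    """Plain stand-in for the mapped model; not a real db.Model."""
    __tablename__ = "users"

    def __init__(self, id, last_seen):
        self.id = id
        self.last_seen = last_seen


user = User(id=1, last_seen=datetime(2014, 7, 29))

# The "envelope": this is all that add(self.last_seen) would receive.
value = user.last_seen

# A bare datetime knows nothing about a table or a row...
print(type(value).__name__, hasattr(value, "__tablename__"), hasattr(value, "id"))

# ...while the "parcel" (the whole object) carries both.
print(user.__tablename__, user.id)
```

The datetime has no link back to the object it was read from, which is why the whole instance is what gets handed to the session.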
