Searching an exact match for the attribute of a relation - python

I have the following SQLAlchemy DB models describing parts that go through several production steps:
class Part(db.Model):
    part_number = db.Column(db.Integer, primary_key=True)
    production_steps = db.relationship("ProductionStep")
class ProductionStep(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    part_number = db.Column(db.Integer, db.ForeignKey('part.part_number'))
    name = db.Column(db.String)
    status = db.Column(db.String)
Now I'd like to query all parts that have a production step with a certain name and status through a Flask-Restless search query.
Is this possible with a Flask-Restless search query? If yes, how can I achieve the specified behaviour?
I'm using Flask-Restless version 0.17.0.
I have tried the following filters:
q={"filters":[{"and":[{"name":"production_steps__name","op":"==","val":"cutting"},
{"name":"production_steps__status","op":"any","val":"done"}]}]}
This leads to the following error:
sqlalchemy.exc.InvalidRequestError: Can't compare a collection to an object or collection; use contains() to test for membership.
Which sounds reasonable, so I also tried the following:
q={"filters":[{"and":
[{"name":"production_steps","op":"any","val":{"name":"name","op":"eq","val":"cutting"}},
{"name":"production_steps","op":"any","val":{"name":"status","op":"eq","val":"done"}}]
}]}
This query does work, but it also returns parts that match only one of the criteria (e.g. parts with a production step "cutting" whose status is not "done").

As discussed in the comments, Flask-Restless does not seem to support queries like this.
Two possible workarounds:
Do two search queries: first get the IDs of all ProductionSteps with the correct name and status, then query all Parts that have one of those IDs in the production_steps array, using the in operator.
Implement your own route that returns the wanted Parts. The code might look something like this:
@app.route('/part/outstanding', methods=['GET'])
def parts_outstanding():
    result = Part.query.join(Part.production_steps) \
        .filter_by(status='outstanding').all()
    # Custom serialization logic
    result_json = list(map(lambda part: part.to_dict(), result))
    return jsonify(
        num_results=len(result),
        objects=result_json,
        page=1,
        total_pages=1
    )
I'd advocate for doing two search queries. Implementing your own route seems kinda hacky.
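The difference between the second filter attempt and the wanted behaviour can be sketched in plain SQL. The example below uses the standard-library sqlite3 module rather than Flask-Restless, and assumes (not verified against Flask-Restless 0.17.0) that each any filter turns into its own EXISTS test. The table names and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (part_number INTEGER PRIMARY KEY);
    CREATE TABLE production_step (
        id INTEGER PRIMARY KEY,
        part_number INTEGER REFERENCES part(part_number),
        name TEXT,
        status TEXT
    );
    INSERT INTO part VALUES (1), (2);
    -- Part 1: the cutting step is done.
    INSERT INTO production_step (part_number, name, status) VALUES (1, 'cutting', 'done');
    -- Part 2: cutting is still open, but a different step is done.
    INSERT INTO production_step (part_number, name, status) VALUES (2, 'cutting', 'open');
    INSERT INTO production_step (part_number, name, status) VALUES (2, 'drilling', 'done');
""")

# Two independent EXISTS tests: the name and the status may match different steps.
separate = [row[0] for row in conn.execute("""
    SELECT p.part_number FROM part p
    WHERE EXISTS (SELECT 1 FROM production_step s
                  WHERE s.part_number = p.part_number AND s.name = 'cutting')
      AND EXISTS (SELECT 1 FROM production_step s
                  WHERE s.part_number = p.part_number AND s.status = 'done')
    ORDER BY p.part_number
""")]

# One EXISTS test with both conditions: the same step must match name and status.
same_step = [row[0] for row in conn.execute("""
    SELECT p.part_number FROM part p
    WHERE EXISTS (SELECT 1 FROM production_step s
                  WHERE s.part_number = p.part_number
                    AND s.name = 'cutting' AND s.status = 'done')
    ORDER BY p.part_number
""")]

print(separate)   # [1, 2]
print(same_step)  # [1]
```

Part 2 shows up in the first result even though its cutting step is not done, which is the behaviour described in the question. A custom route would need to emit something like the second query.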

Related

How to return field as set() using peewee

I have been working a bit with the Peewee ORM and am trying to understand how to get the url field from the table. The condition is that the visible column must be true as well: if visible is True and store_id is 4, return all the urls as a set.
I have currently done something like this
from peewee import (
    Model,
    TextField,
    BooleanField
)
from playhouse.pool import PooledPostgresqlDatabase
# -------------------------------------------------------------------------
# Connection to Postgresql
# -------------------------------------------------------------------------
postgres_pool = PooledPostgresqlDatabase(
    'xxxxxxx',
    host='xxxxxxxx',
    user='xxxxxxxx',
    password='xxxxxx',
    max_connections=20,
    stale_timeout=30,
)
# ------------------------------------------------------------------------------- #
class Products(Model):
    store_id = TextField(column_name='store_id')
    url = TextField(column_name='url')
    visible = BooleanField(column_name='visible')
    class Meta:
        database = postgres_pool
        db_table = "develop"
    @classmethod
    def get_urls(cls):
        try:
            return set([i.url for i in cls.select().where((cls.store_id == 4) & (cls.visible))])
        except Products.IntegrityError:
            return None
However, calling the method takes around 0.13 s, which feels too long for what it does. I suspect the cause is the for loop and the conversion to a set(). Is there a way for peewee to do something like cls.select(cls.url).where((cls.store_id == 4) & (cls.visible)) and return the result as a set()?

How many products do you have? How big is this set? Why not use distinct() so that the database de-duplicates them for you? What indexes do you have? All of these questions are much more pertinent than "how do I make this python loop faster".
I'd suggest that you need an index on store_id, visible or store_id where visible.
create index "product_urls" on "products" ("store_id") where "visible"
You could even use a covering index but this may take up a lot of disk space:
create index "product_urls" on "products" ("store_id", "url") where visible
Once you've got the actual query sped up with an index, you can also use distinct() to make the db de-dupe the URLs before sending them to Python. Additionally, since you only need the URL, just select that column and use the tuples() method to avoid creating a class:
@classmethod
def get_urls(cls):
    query = cls.select(cls.url).where((cls.store_id == 4) & cls.visible)
    return set(url for url, in query.distinct().tuples())
Lastly please read the docs: http://docs.peewee-orm.com/en/latest/peewee/querying.html#iterating-over-large-result-sets
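To illustrate the index-plus-DISTINCT suggestion without a Postgres server, here is a minimal sketch using Python's standard-library sqlite3 module instead of peewee. The table layout, the sample rows and the integer store_id are assumptions made for the example. The point is only that the database de-duplicates the URLs before Python builds the set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (store_id INTEGER, url TEXT, visible INTEGER);
    -- Partial covering index: only visible rows are indexed.
    CREATE INDEX product_urls ON products (store_id, url) WHERE visible = 1;
""")

rows = [
    (4, "https://example.com/a", 1),
    (4, "https://example.com/a", 1),  # duplicate URL for the same store
    (4, "https://example.com/b", 0),  # hidden product
    (5, "https://example.com/c", 1),  # different store
]
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)

# Select only the url column and let the database remove duplicates.
query = "SELECT DISTINCT url FROM products WHERE store_id = ? AND visible = 1"
urls = {url for (url,) in conn.execute(query, (4,))}

print(urls)  # {'https://example.com/a'}
```

Whether the index is actually used depends on the database and on the data, so it is worth checking the query plan on the real table.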

Is this the best way to construct my query in SQLAlchemy

I am tracking laptimes for drivers that set fastest laps on a circuit over a period of a month. Every few days I update their latest laptimes so each month they might have many laps completed. I'm trying to build a SQL Alchemy model that returns a list of all laptimes for that month (called a challenge) but filters to the fastest times per driver.
I have this working, but it doesn't feel very pythonic. I don't know if the filtering should be part of a relationship in the model (e.g. a field that returns the list of fastest laps, per driver, per challenge) or if this is done outside of the model as I construct the view (using Flask)
Here is my model:
class Challenges(db.Model):
    __tablename__ = 'challenges'
    id = db.Column(db.Integer, primary_key=True)
    season_id = db.Column(db.Integer, db.ForeignKey('seasons.id'))
    type = db.Column(db.Text)
    track_id = db.Column(db.Integer, db.ForeignKey('tracks.id'))
    layout_id = db.Column(db.Integer, db.ForeignKey('layouts.id'))
    car_class_id = db.Column(db.Integer, db.ForeignKey('car_classes.id'))
    car_id = db.Column(db.Integer, db.ForeignKey('cars.id'))
    # one to many
    # laps = db.relationship('Laps', backref='challenge',lazy='dynamic')
    # one to one
    track = db.relationship('Tracks', backref='challenge',uselist=False)
    layout = db.relationship('Layouts', backref='challenge',uselist=False)
    car_class = db.relationship('Car_classes', backref='challenge',uselist=False)
    car = db.relationship('Cars', backref='challenge',uselist=False)
What I'd like to do is replace the 'Laps' relationship (commented out above which returns ALL laps) with one that returns only the fastest lap per driver for that challenge. I currently have this SQL Alchemy code that runs in the Flask blueprint to get the data ready to display, but it feels like this should be inside the model as a 'leaderboard' relationship. I hope this makes sense. Here is my query.
leaderboard = db.session.query(
    Laps.challenge_id,
    Drivers.id,
    Drivers.name,
    Drivers.country,
    func.min(Laps.laptime),
    Laps.datetime,
    Laps.verified
).join(Drivers, Laps.driver_id == Drivers.id
).filter(Laps.challenge_id == challenges[0].id
).group_by(
    Laps.driver_id
).order_by(
    Laps.laptime.asc()
).all()
So this DOES give me what I need, but it feels really clunky. I'm trying to learn Python the correct way and don't want to use what feels like a workaround. If I simply place this code in the model it fails (Laps is not defined), and it's not defined as a relationship like I think it should be.
Very new to SQLAlchemy, so apologies if this is a simple error on my part. It's more important to me to do this right than simply to get it working. Thanks in advance for any expertise shared.
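One thing worth checking in the query above is that it groups by driver but also selects columns such as datetime and verified that are neither grouped nor aggregated. Some databases reject that, and others do not guarantee which row those values come from. A common alternative is to compute the best time per driver in a subquery and join it back to the laps. The sketch below shows that shape in plain SQL with the standard-library sqlite3 module. The reduced table layout and the sample data are assumptions for illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drivers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE laps (
        id INTEGER PRIMARY KEY,
        challenge_id INTEGER,
        driver_id INTEGER REFERENCES drivers(id),
        laptime REAL
    );
    INSERT INTO drivers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO laps (challenge_id, driver_id, laptime) VALUES
        (1, 1, 92.4), (1, 1, 90.1), (1, 2, 91.0), (1, 2, 95.3), (2, 1, 80.0);
""")

# Subquery: best laptime per driver within one challenge.
# Outer query: join back to laps so every selected column belongs to that lap.
leaderboard = conn.execute("""
    SELECT d.name, l.laptime
    FROM laps l
    JOIN drivers d ON d.id = l.driver_id
    JOIN (
        SELECT driver_id, MIN(laptime) AS best
        FROM laps
        WHERE challenge_id = :challenge
        GROUP BY driver_id
    ) fastest ON fastest.driver_id = l.driver_id AND fastest.best = l.laptime
    WHERE l.challenge_id = :challenge
    ORDER BY l.laptime
""", {"challenge": 1}).fetchall()

print(leaderboard)  # [('Ann', 90.1), ('Bob', 91.0)]
```

If a driver sets the same best time twice, this shape returns both laps, so a tie-breaker may be needed.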

SQL alchemy query a single column

I am sure this has been answered before and I see a few related answers but none seem to be the issue I am facing. I am using a SQL Alchemy model that uses a SQL server DB underneath and I am using it to query the DB with a session. The normal queries etc work fine with no errors. However when I ask for only one field instead of all it gives me an error (see later).
Basically boiled down to the simplest I have a model like so:
class FactoryShop(Base):
    # case insensitive, refers to the actual table in the DB called factoryshop
    __tablename__ = 'factoryshop'
    ID = Column(Integer, primary_key=True, autoincrement=True)
    Name = Column(String(255))
    Parts = Column(Integer)
    Strength = Column(Integer)
    Average = Column(Float)
    ...
Using a session I can query all columns like so:
>>> session.query(FactoryShop).filter(FactoryShop.Parts==20000)
<sqlalchemy.orm.query.Query object at 0x10578c280>
However if I try to just ask for the Name like below I get a long error. I searched for that specific error which involves 'selectable' but I didn't come across a relevant answer.
>>> session.query(FactoryShop.Name).filter(FactoryShop.Parts==20000)
AttributeError: Neither 'AnnotatedColumn' object nor 'Comparator' object has an attribute 'selectable'
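If there is already an answer please point me to it and I will delete this one.

Selecting a single column with session.query(FactoryShop.Name) is normally valid SQLAlchemy, so the query itself is very close. As a workaround, query the whole entity and read the attribute from the result:
result = session.query(FactoryShop).filter(FactoryShop.Parts==20000).first()
Then you can call result.Name to get the name of that FactoryShop object (first() returns None when no row matches).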

How to query distinct values from a projection? - python - google app engine - ndb [duplicate]

I have a simple little "Observation" class:
from google.appengine.ext import ndb
class Observation(ndb.Model):
    remote_id = ndb.StringProperty()
    dimension_id = ndb.IntegerProperty()
    metric = ndb.StringProperty()
    timestamp_observed = ndb.StringProperty()
    timestamp_received = ndb.DateTimeProperty(auto_now_add=True)
    @classmethod
    def query_book(cls):
        return cls.query()
I can run projection queries against the Datastore to return only certain columns. E.g:
observations = Observation.query().fetch(projection=[Observation.dimension_id])
This works nicely, but I only want unique results. The documentation makes this sound easy:
# Functionally equivalent
Article.query(projection=[Article.author], group_by=[Article.author])
Article.query(projection=[Article.author], distinct=True)
But when I do this:
observations = Observation.query().fetch(projection=[Observation.dimension_id], group_by=[Observation.dimension_id])
observations = Observation.query().fetch(projection=[Observation.dimension_id], distinct=True)
I get errors for both variants.
TypeError: Unknown configuration option ('group_by')
TypeError: Unknown configuration option ('distinct')
This happens on localhost and in the prod environment too. What am I missing?

Silly me: all of these parameters need to go into the query() call, not fetch(). The projection argument does work in fetch(), but group_by and distinct do not, so move both projection and distinct into query(), e.g. Observation.query(projection=[Observation.dimension_id], distinct=True).fetch().
From Grouping:
Projection queries can use the distinct keyword to ensure that only
completely unique results will be returned in a result set. This will
only return the first result for entities which have the same values
for the properties that are being projected.
Article.query(projection=[Article.author], group_by=[Article.author])
Article.query(projection=[Article.author], distinct=True)
Both queries are equivalent and will produce each author's name only
once.
Hope this helps anyone else with a similar problem :)

Python sqlalchemy dynamic relationship

I'm trying to understand if it's possible to do something with Sqlalchemy, or if I'm thinking about it the wrong way. As an example, say I have two (these are just examples) classes:
class Customer(db.Model):
    __tablename__ = 'customer'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    addresses = relationship('Address')
class Address(db.Model):
    __tablename__ = 'address'
    id = Column(Integer, primary_key=True)
    address = Column(String)
    home = Column(Boolean)
    customer_id = Column(Integer, ForeignKey('customer.id'))
And later I want to perform a query that gets the customer and just their home address. Is it possible to do that with something like this:
db.session.query(Customer).join(Address, Address.home == True)
Would the above further refine/restrict the join so the results would only get the home address?

When in doubt whether a query construct is what you want, try printing it:
In [29]: db.session.query(Customer).join(Address, Address.home == True)
Out[29]: <sqlalchemy.orm.query.Query at 0x7f14fa651e80>
In [30]: print(_)
SELECT customer.id AS customer_id, customer.name AS customer_name
FROM customer JOIN address ON address.home = true
It is clear that this is not what you want. Every customer is joined with every address that is a home address. Due to how entities are handled this might not be obvious at first. The duplicate rows per customer are ignored and you get a result of distinct Customer entities, even though the underlying query was wrong. The query also effectively just ignores the joined Addresses when forming results.
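The effect can be reproduced at the SQL level. This is a small sketch with the standard-library sqlite3 module standing in for the real database. The sample customers and addresses are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address (
        id INTEGER PRIMARY KEY,
        address TEXT,
        home INTEGER,
        customer_id INTEGER REFERENCES customer(id)
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO address (address, home, customer_id) VALUES
        ('1 Ann Home St', 1, 1), ('2 Ann Work Rd', 0, 1), ('3 Bob Home Ave', 1, 2);
""")

# ON clause without the foreign key: every customer meets every home address.
uncorrelated = conn.execute("""
    SELECT customer.name, address.address
    FROM customer JOIN address ON address.home = 1
    ORDER BY customer.name, address.address
""").fetchall()

# ON clause with the foreign key and the home flag: one home address per customer.
correlated = conn.execute("""
    SELECT customer.name, address.address
    FROM customer
    JOIN address ON address.customer_id = customer.id AND address.home = 1
    ORDER BY customer.name
""").fetchall()

print(len(uncorrelated))  # 4
print(correlated)         # [('Ann', '1 Ann Home St'), ('Bob', '3 Bob Home Ave')]
```

Two customers and two home addresses give four rows in the first query, including Ann paired with Bob's address.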
The easiest solution would be to just query for customer and address tuples with required criteria:
db.session.query(Customer, Address).\
    join(Address).\
    filter(Address.home)
You could also do something like this
db.session.query(Customer).\
    join(Address, (Customer.id == Address.customer_id) & Address.home).\
    options(contains_eager(Customer.addresses))
but I'd highly recommend against it. You'd be lying to yourself about what the relationship collection contains and that might backfire at some point. Instead you should add a new one to one relationship to Customer with the custom join condition:
class Customer(db.Model):
    ...
    home_address = relationship(
        'Address', uselist=False,
        primaryjoin='and_(Customer.id == Address.customer_id, Address.home)')
and then you could use a joined eager load
db.session.query(Customer).options(joinedload(Customer.home_address))

Yeah, that's entirely possible, though you would probably want code like:
# if you know the customer's database id...
# get the first address in the database for the given id that says it's for home
home_address = db.session.query(Address).filter_by(customer_id=customer_id_here, home=True).first()
Instead of having a boolean for home, you might try a 'type' column instead, using an enum. This would let you easily pick an address for places like work, rather than just a binary choice of "either this address is for home or not".
Update: You might also consider using the back_populates keyword argument with the relationship call, so if you have an address instance (called a), you can get the customer it's for with something like a.customer (which is the instance of the Customer class this address is associated with).
