SQLAlchemy Left join WHERE clause being converted to zeros and ones - python

Howdie do,
I have the following SQL, that I'm converting to SQLAlchemy:
select t1.`order_id`, t1.`status_type`
from `tracking_update` AS t1 LEFT JOIN `tracking_update` AS t2
ON (t1.`order_id` = t2.`order_id` AND t1.`last_updated` < t2.`last_updated`)
where t1.`order_id` = '21757'and t2.`last_updated` IS NULL
The SQL is just returning the latest tracking update for order id 21757. I'm accomplishing this by doing a left join back to the same table. In order to do this, I'm aliasing the table first:
tUAlias1 = aliased(TrackingUpdate)
tUalias2 = aliased(TrackingUpdate)
So far, this is what I have for my conversion to SQLAlchemy:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
filter(and_(tUAlias1.order_id == '21757', tUalias2.last_updated is None))
And this is the result of the SQLAlchemy code that is executed on the server via log:
SELECT tracking_update_1.order_id AS tracking_update_1_order_id, tracking_update_1.status_type AS tracking_update_1_status_type
FROM tracking_update AS tracking_update_1 LEFT OUTER JOIN tracking_update AS tracking_update_2 ON tracking_update_1.order_id = tracking_update_2.order_id AND tracking_update_1.last_updated < tracking_update_2.last_updated
WHERE 0 = 1
As you can see, the filter(WHERE clause) is now 0 = 1.
Now, if I remove the and_ statement and try two filters like so:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
filter(tUAlias1.order_id == '21757').filter(tUalias2.last_updated is None)
I receive the same result. I know the SQL itself is fine as I can run it with no issue via MySQL workbench.
When SQL run directly, I will receive the following
order ID | Status
21757 D
Also, if I remove the tUalias2.last_updated is None, I actually receive some results, but they are not correct. This is the SQL Log for that:
Python code
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
filter(tUAlias1.order_id == '21757')
SQLAlchemy run:
SELECT tracking_update_1.order_id AS tracking_update_1_order_id, tracking_update_1.status_type AS tracking_update_1_status_type
FROM tracking_update AS tracking_update_1 LEFT OUTER JOIN tracking_update AS tracking_update_2 ON tracking_update_1.order_id = tracking_update_2.order_id AND tracking_update_1.last_updated < tracking_update_2.last_updated
WHERE tracking_update_1.order_id = '21757'
Any ideas?

Howdie do,
I figured it out
The Python 'is' operator doesn't play nice with SQLAlchemy
I found this out thanks to the following S/O question:
Selecting Null values SQLAlchemy
I've since updated my query to the following:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
filter(tUAlias1.order_id == '21757').filter(tUalias2.last_updated == None)

The problem is not in how SqlAlchemy processes null values, the problem is that you use an operator which is not supported for instrumented' columns and thus the expressiontUalias2.last_updated is Noneevaluates to a value (False), which is then translated to eitherand 0=1. You should writetUalias2.last_updated.is_(None)instead oftUalias2.last_updated is None` to make your code work.

Related

Python Peewee EXISTS Subquery not working as expected

I am using the peewee ORM for a python application and I am trying to write code to fetch batches of records from a SQLite database. I have a subquery that seems to work by itself but when added to an update query the fn.EXISTS(sub_query) seems to have no effect as every record in the database is updated.
Note: I am using the APSW extension for peewee.
def batch_logic(self, id_1, path_1, batch_size=1000, **kwargs):
sub_query = (self.select(ModelClass.granule_id).distinct().where(
(ModelClass.status == 'old_status') &
(ModelClass.collection_id == collection_id) &
(ModelClass.name.contains(provider_path))
).order_by(ModelClass.discovered_date.asc()).limit(batch_size)).limit(batch_size))
print(f'len(sub_query): {len(sub_query)}')
fb_st_2 = time.time()
updated_records= list(
(self.update(status='new_status').where(fn.EXISTS(sub_query)).returning(ModelClass))
)
print(f'update {len(updated_records)}: {time.time() - fb_st_2}')
db.close()
return updated_records
Below is output from testing locally:
id_1: id_1_1676475997_PQXYEQGJWR
len(sub_query): 2
update 20000: 1.0583274364471436
fetch_batch 20000: 1.1167597770690918
count_things 0: 0.02147078514099121
processed_things: 20000
The subquery is correctly returning 2 but the update query where(fn.EXISTS(sub_query)) seems to be ignored. Have I made a mistake in my understanding of how this works?
Edit 1: I believe GROUP BY is needed as rows can have the same granule_id and I need to fetch rows up to batch_size granule_ids
I think your use of UPDATE...WHERE EXISTS is incorrect or inappropriate here. This may work better for you:
# Unsure why you have a GROUP BY with no aggregation, that seems
# incorrect possibly, so I've removed it.
sub_query = (self.select(ModelClass.id)
.where(
(ModelClass.status == 'old_status') &
(ModelClass.collection_id == id_1) &
(ModelClass.name.contains(path_1)))
.order_by(ModelClass.discovered_date.asc())
.limit(batch_size))
update = (self.update(status='new_status')
.where(self.id.in_(sub_query))
.returning(ModelClass))
cursor = update.execute() # It's good to explicitly execute().
updated_records = list(cursor)
The key idea, at any rate, is I'm correlating the update with the subquery.

How to achieve simple subquery in peewee without join

I would like to have a simple subquery. One I would like to re-use in several places. Some of those would be joins, some would not.
SQL code would be like this
SELECT IF(x, y, z) as foo, table.*
FROM TABLE
WHERE condition
And then it's used in many places, joining and where'ing by foo.
Sometimes simply like this:
SELECT * FROM
(
SELECT IF(x, y, z) as foo, table.*
FROM TABLE
WHERE condition
) WHERE (foo > 100)
Sometimes more complex, like grouping, joining.
However, I find it quite hard to do in peewee.
I figured out I can do this if I use joins
query1 = table1.select(...).where(...)
query2 = table2.select(...).join(query1, on=(...))...
This would work
query1 = table1.select(...).where(...)
query2 = query1.select(...).join(table2, on=(...))...
This would also work
However, if I just select from query1, it doesn't work. Exact code that fails:
query = tables.Payments.select(fn.IF(tables.Payments.payment_status > 0, tables.Payments.payment_amount, -tables.Payments.payment_amount).alias("x")).where(tables.Payments.payment_amount > 200)
query2 = query.select().where(query.c.x < 0)
I expect query2 to just be a select from Payments where x, calculated according to condition before, is less than 0, but instead it produces bogus SQL code
SELECT FROM `payments` AS `t1` WHERE ((`t1`.`payment_amount` > 200) AND (`t2`.`x` < 0))
Which is obviously malformed and doesn't execute
How do I do this? Is this even possible in peewee?
I know I could write "where()" and replicate my condition there, but that's bad practice, because it's copypasting code, and what if I want to change that condition later? Do I re-do it in 10 places?... Surely there's a proper way to do this
PS: As advised, I have altered my code but it produces malformed SQL query again.
My code:
query = tables.Payments.select(fn.IF(tables.Payments.payment_status > 0, tables.Payments.payment_amount, -tables.Payments.payment_amount).alias("x")).where(tables.Payments.payment_amount > 200)
query2 = query.select_from(query.c.x).where(query.c.x < 0)
Resulting query:
SELECT `t1`.`x` FROM (SELECT IF((`t2`.`payment_status` > 0), `t2`.`payment_amount`, `t2`.`payment_amount` DESC) AS `x` FROM `payments` AS `t2` WHERE (`t2`.`payment_amount` > 200)) AS `t1` WHERE (`t1`.`x` < 0)
As you see, instead of doing a minus operation, it adds DESC which is obviously not right.
How to fix this?
Here is an example of wrapping a subquery using select_from():
db = SqliteDatabase(':memory:')
class Reg(db.Model):
key = TextField()
db.create_tables([Reg])
Reg.create(key='k1')
Reg.create(key='k2')
Reg.create(key='k3')
subq = Reg.select(Reg.key.alias('foo'), Reg)
query = subq.select_from(subq.c.foo).where(subq.c.foo.in_(['k1', 'k3']))
for row in query:
print(row.foo)
# k1
# k3
Another example, this is basically what select_from() does under-the-hood:
query = Select([subq], [subq.c.foo]).bind(db)
for row in query:
print(row)
# {'foo': 'k1'}
# {'foo': 'k2'}
# {'foo': 'k3'}
For your last/most-recent edit to your issue, replace unary minus with X * -1:
query = (Payments.select(fn.IF(
Payments.payment_status > 0,
Payments.payment_amount,
Payments.payment_amount * -1).alias("x")
).where(Payments.payment_amount > 200)
query2 = query.select_from(query.c.x).where(query.c.x < 0)

Hi I'm trying to write a sql alchemy query but it gives me the error "cant determine which from clause to join from"

So I have a sql query which joins table and fetches data
select "FileSets"."Id", "SetFile"."Alias" from "Feeds"
join "FeedSnapshots" on "Feeds"."ActiveSnapshotId"="FeedSnapshots"."Id"
join "Subscriptions" on "Feeds"."Id" = "Subscriptions"."FeedId"
join "SubscriptionSnapshots" on "Subscriptions"."ActiveSnapshotId"="SubscriptionSnapshots"."Id"
join "FileSets" on "SubscriptionSnapshots"."Id"="FileSets"."SubscriptionSnapshotId"
join "SetFile" on "FileSets"."Id"="SetFile"."FileSetId" where "Feeds"."Id"=398 and "Expected"=true
Now I'm trying to convert this to a sqlAlchemy query but it gives me the following error:
sqlalchemy.exc.InvalidRequestError: Can't determine which FROM clause to join from, there are multiple FROMS which can join to this entity. Please use the .select_from() method to establish an explicit left side, as well as providing an explcit ON clause if not present already to help resolve the ambiguity.
My sqlAlchemy query looks like this:
db.session.query(FileSet.id, SetFile.alias).join(FeedSnapshot, Feed.active_snapshot_id == FeedSnapshot.id) \
.join(Subscription, Feed.id == Subscription.feed_id).join(SubscriptionSnapshot, Subscription.active_snapshot_id == SubscriptionSnapshot.id) \
.join(FileSet, SubscriptionSnapshot.id == FileSet.subscription_snapshot_id).join(SetFile, FileSet.id == SetFile.file_set_id) \
.filter(and_(SetFile.expected, Feed.id == orig_feed_snapshot.feed_id)).all()
Can someone tell me what I'm doing wrong in my SqlAlchemy query?
Depending on whether you want to, you can also use one with flask_sqlalchemy instead of the pure sqlalchemy request.
The following query should work.
id_alias_pairs = Feed.query\
.join(FeedSnapshot, Feed.active_snapshot_id == FeedSnapshot.id)\
.join(Subscription, Feed.id == Subscription.feed_id)\
.join(SubscriptionSnapshot, Subscription.active_snapshot_id == SubscriptionSnapshot.id)\
.join(FileSet, SubscriptionSnapshot.id == FileSet.subscription_snapshot_id)\
.join(SetFile, FileSet.id == SetFile.file_set_id)\
.filter(Feed.id==1, SetFile.expected)\
.with_entities(FileSet.id, SetFile.alias)\
.all()
print(id_alias_pairs)

Peewee SQL query join where none of many match

The following SQL finds all posts which haven't any associated tags named 'BadTag'.
select * from post t1
where not exists
(select 1 from tag t2
where t1.id == t2.post_id and t2.name=='BadTag');
How can I write this functionality in Peewee ORM? If I write something along the lines of
Post.select().where(
~Tag.select()
.where(Post.id == Tag.post & Tag.name=='BadTag')
.exists()
)
it gets compiled to
SELECT "t1"."id", ... FROM "post" AS t1 WHERE ? [-1]
Something like
Post.select().join(Tag).where(Tag.name!='BadTag')
doesn't work since a Post can have many Tags.
I'm new to SQL/Peewee so if this is a bad way to go about things I'd welcome pointers.
Do not use manecosta's solution, it is inefficient.
Here is how to do a NOT EXISTS with a subquery:
(Post
.select()
.where(~fn.EXISTS(
Tag.select().where(
(Tag.post == Post.id) & (Tag.name == 'BadTag'))))
You can also do a join:
(Post
.select(Post, fn.COUNT(Tag.id))
.join(Tag, JOIN.LEFT_OUTER)
.where(Tag.name == 'BadTag')
.group_by(Post)
.having(fn.COUNT(Tag.id) == 0))

sqlalchemy query using joinedload exponentially slower with each new filter clause

I have this sqlalchemy query:
query = session.query(Store).options(joinedload('salesmen').
joinedload('comissions').
joinedload('orders')).\
filter(Store.store_code.in_(selected_stores))
stores = query.all()
for store in stores:
for salesman in store.salesmen:
for comission in salesman.comissions:
#generate html for comissions for each salesman in each store
#print html document using PySide
This was working perfectly, however I added two new filter queries:
filter(Comissions.payment_status == 0).\
filter(Order.order_date <= self.dateEdit.date().toPython())
If I add just the first filter the application hangs for a couple of seconds, if I add both the application hangs indefinitely
What am I doing wrong here? How do I make this query fast?
Thank you for your help
EDIT: This is the sql generated, unfortunately the class and variable names are in Portuguese, I just translated them to English so it would be easier to undertand,
so Loja = Store, Vendedores = Salesmen, Pedido = Order, Comission = Comissao
Query generated:
SELECT "Loja"."CodLoja", "Vendedores_1"."CodVendedor", "Vendedores_1"."NomeVendedor", "Vendedores_1"."CodLoja", "Vendedores_1"."PercentualComissao",
"Vendedores_1"."Ativo", "Comissao_1"."CodComissao", "Comissao_1"."CodVendedor", "Comissao_1"."CodPedido",
"Pedidos_1"."CodPedido", "Pedidos_1"."CodLoja", "Pedidos_1"."CodCliente", "Pedidos_1"."NomeCliente", "Pedidos_1"."EnderecoCliente", "Pedidos_1"."BairroCliente",
"Pedidos_1"."CidadeCliente", "Pedidos_1"."UFCliente", "Pedidos_1"."CEPCliente", "Pedidos_1"."FoneCliente", "Pedidos_1"."Fone2Cliente", "Pedidos_1"."PontoReferenciaCliente",
"Pedidos_1"."DataPedido", "Pedidos_1"."ValorProdutos", "Pedidos_1"."ValorCreditoTroca",
"Pedidos_1"."ValorTotalDoPedido", "Pedidos_1"."Situacao", "Pedidos_1"."Vendeu_Teflon", "Pedidos_1"."ValorTotalTeflon",
"Pedidos_1"."DataVenda", "Pedidos_1"."CodVendedor", "Pedidos_1"."TipoVenda", "Comissao_1"."Valor", "Comissao_1"."DataPagamento", "Comissao_1"."StatusPagamento"
FROM "Comissao", "Pedidos", "Loja" LEFT OUTER JOIN "Vendedores" AS "Vendedores_1" ON "Loja"."CodLoja" = "Vendedores_1"."CodLoja"
LEFT OUTER JOIN "Comissao" AS "Comissao_1" ON "Vendedores_1"."CodVendedor" = "Comissao_1"."CodVendedor" LEFT OUTER JOIN "Pedidos" AS "Pedidos_1" ON "Pedidos_1"."CodPedido" = "Comissao_1"."CodPedido"
WHERE "Loja"."CodLoja" IN (:CodLoja_1) AND "Comissao"."StatusPagamento" = :StatusPagamento_1 AND "Pedidos"."DataPedido" <= :DataPedido_1
Your FROM clause is producing a Cartesian product and includes each table twice, once for filtering the result and once for eagerly loading the relationship.
To stop this use contains_eager instead of joinedload in your options. This will look for the related attributes in the query's columns instead of constructing an extra join. You will also need to explicitly join to the other tables in your query, e.g.:
query = session.query(Store)\
.join(Store.salesmen)\
.join(Store.commissions)\
.join(Store.orders)\
.options(contains_eager('salesmen'),
contains_eager('comissions'),
contains_eager('orders'))\
.filter(Store.store_code.in_(selected_stores))\
.filter(Comissions.payment_status == 0)\
.filter(Order.order_date <= self.dateEdit.date().toPython())

Categories