The following SQL finds all posts that don't have an associated tag named 'BadTag'.
select * from post t1
where not exists
(select 1 from tag t2
where t1.id == t2.post_id and t2.name=='BadTag');
How can I write this functionality in Peewee ORM? If I write something along the lines of
Post.select().where(
~Tag.select()
.where(Post.id == Tag.post & Tag.name=='BadTag')
.exists()
)
it gets compiled to
SELECT "t1"."id", ... FROM "post" AS t1 WHERE ? [-1]
Something like
Post.select().join(Tag).where(Tag.name!='BadTag')
doesn't work since a Post can have many Tags.
I'm new to SQL/Peewee so if this is a bad way to go about things I'd welcome pointers.
Do not use manecosta's solution, it is inefficient.
Here is how to do a NOT EXISTS with a subquery:
(Post
.select()
.where(~fn.EXISTS(
Tag.select().where(
(Tag.post == Post.id) & (Tag.name == 'BadTag')))))
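You can also do a join. The tag-name condition has to go in the join's ON clause rather than in a WHERE clause, otherwise posts without a 'BadTag' row are filtered out before they can be counted:
(Post
.select(Post, fn.COUNT(Tag.id))
.join(Tag, JOIN.LEFT_OUTER, on=((Tag.post == Post.id) & (Tag.name == 'BadTag')))
.group_by(Post)
.having(fn.COUNT(Tag.id) == 0))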
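To check what the two approaches return, here is a minimal sketch that runs plain SQL against an in-memory SQLite database with Python's sqlite3 module. It shows the NOT EXISTS form next to a LEFT OUTER JOIN form that keeps the tag-name condition in the ON clause. The column layout and the three sample posts are assumptions made up for illustration; only the post and tag table names come from the question.

```python
import sqlite3

# Hypothetical schema and sample data, not taken from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, post_id INTEGER, name TEXT);
    INSERT INTO post (id, title) VALUES (1, 'clean'), (2, 'flagged'), (3, 'untagged');
    INSERT INTO tag (post_id, name) VALUES
        (1, 'GoodTag'), (2, 'GoodTag'), (2, 'BadTag');
""")

# Correlated subquery: the inner query refers to the outer row (t1.id).
not_exists_sql = """
    SELECT t1.id FROM post AS t1
    WHERE NOT EXISTS (
        SELECT 1 FROM tag AS t2
        WHERE t2.post_id = t1.id AND t2.name = 'BadTag')
    ORDER BY t1.id
"""

# Outer join: the name filter lives in ON, so posts with no 'BadTag'
# row are kept with NULLs and counted as zero.
left_join_sql = """
    SELECT t1.id FROM post AS t1
    LEFT OUTER JOIN tag AS t2
        ON t2.post_id = t1.id AND t2.name = 'BadTag'
    GROUP BY t1.id
    HAVING COUNT(t2.id) = 0
    ORDER BY t1.id
"""

not_exists_ids = [row[0] for row in conn.execute(not_exists_sql)]
left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
print(not_exists_ids, left_join_ids)
```

Both queries should pick the post with only 'GoodTag' and the post with no tags at all, and skip the post that carries 'BadTag'.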
I am using the peewee ORM for a python application and I am trying to write code to fetch batches of records from a SQLite database. I have a subquery that seems to work by itself but when added to an update query the fn.EXISTS(sub_query) seems to have no effect as every record in the database is updated.
Note: I am using the APSW extension for peewee.
def batch_logic(self, id_1, path_1, batch_size=1000, **kwargs):
sub_query = (self.select(ModelClass.granule_id).distinct().where(
(ModelClass.status == 'old_status') &
(ModelClass.collection_id == id_1) &
(ModelClass.name.contains(path_1))
).order_by(ModelClass.discovered_date.asc()).limit(batch_size))
print(f'len(sub_query): {len(sub_query)}')
fb_st_2 = time.time()
updated_records= list(
(self.update(status='new_status').where(fn.EXISTS(sub_query)).returning(ModelClass))
)
print(f'update {len(updated_records)}: {time.time() - fb_st_2}')
db.close()
return updated_records
Below is output from testing locally:
id_1: id_1_1676475997_PQXYEQGJWR
len(sub_query): 2
update 20000: 1.0583274364471436
fetch_batch 20000: 1.1167597770690918
count_things 0: 0.02147078514099121
processed_things: 20000
The subquery is correctly returning 2 but the update query where(fn.EXISTS(sub_query)) seems to be ignored. Have I made a mistake in my understanding of how this works?
Edit 1: I believe GROUP BY is needed as rows can have the same granule_id and I need to fetch rows up to batch_size granule_ids
I think your use of UPDATE...WHERE EXISTS is incorrect or inappropriate here. This may work better for you:
# Unsure why you have a GROUP BY with no aggregation, that seems
# incorrect possibly, so I've removed it.
sub_query = (self.select(ModelClass.id)
.where(
(ModelClass.status == 'old_status') &
(ModelClass.collection_id == id_1) &
(ModelClass.name.contains(path_1)))
.order_by(ModelClass.discovered_date.asc())
.limit(batch_size))
update = (self.update(status='new_status')
.where(self.id.in_(sub_query))
.returning(ModelClass))
cursor = update.execute() # It's good to explicitly execute().
updated_records = list(cursor)
The key idea, at any rate, is I'm correlating the update with the subquery.
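Why the correlation matters can be seen with a small sketch in plain SQL, run here through Python's sqlite3 module. The item table, its columns, and the ten sample rows are assumptions for illustration and are not the asker's schema. An uncorrelated EXISTS is either true or false for the whole statement, so it cannot restrict the update to the batch, while id IN (subquery) ties each updated row to the subquery result.

```python
import sqlite3

# Hypothetical table with ten rows that all start in 'old_status'.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, status TEXT, discovered INTEGER)"
)
conn.executemany(
    "INSERT INTO item (status, discovered) VALUES (?, ?)",
    [("old_status", n) for n in range(10)],
)

# The batch we actually want to touch: the two oldest rows.
batch = "SELECT id FROM item WHERE status = 'old_status' ORDER BY discovered LIMIT 2"

# Uncorrelated EXISTS: true as long as the subquery returns anything,
# so every row in the table qualifies.
cur = conn.execute(f"UPDATE item SET status = 'new_status' WHERE EXISTS ({batch})")
exists_updated = cur.rowcount

# Reset, then correlate the update with the subquery through the primary key.
conn.execute("UPDATE item SET status = 'old_status'")
cur = conn.execute(f"UPDATE item SET status = 'new_status' WHERE id IN ({batch})")
in_updated = cur.rowcount

print(exists_updated, in_updated)
```

The first update touches all ten rows, which matches the behaviour in the question; the second touches only the two rows in the batch.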
So I have a sql query which joins table and fetches data
select "FileSets"."Id", "SetFile"."Alias" from "Feeds"
join "FeedSnapshots" on "Feeds"."ActiveSnapshotId"="FeedSnapshots"."Id"
join "Subscriptions" on "Feeds"."Id" = "Subscriptions"."FeedId"
join "SubscriptionSnapshots" on "Subscriptions"."ActiveSnapshotId"="SubscriptionSnapshots"."Id"
join "FileSets" on "SubscriptionSnapshots"."Id"="FileSets"."SubscriptionSnapshotId"
join "SetFile" on "FileSets"."Id"="SetFile"."FileSetId" where "Feeds"."Id"=398 and "Expected"=true
Now I'm trying to convert this to a sqlAlchemy query but it gives me the following error:
sqlalchemy.exc.InvalidRequestError: Can't determine which FROM clause to join from, there are multiple FROMS which can join to this entity. Please use the .select_from() method to establish an explicit left side, as well as providing an explcit ON clause if not present already to help resolve the ambiguity.
My sqlAlchemy query looks like this:
db.session.query(FileSet.id, SetFile.alias).join(FeedSnapshot, Feed.active_snapshot_id == FeedSnapshot.id) \
.join(Subscription, Feed.id == Subscription.feed_id).join(SubscriptionSnapshot, Subscription.active_snapshot_id == SubscriptionSnapshot.id) \
.join(FileSet, SubscriptionSnapshot.id == FileSet.subscription_snapshot_id).join(SetFile, FileSet.id == SetFile.file_set_id) \
.filter(and_(SetFile.expected, Feed.id == orig_feed_snapshot.feed_id)).all()
Can someone tell me what I'm doing wrong in my SqlAlchemy query?
If you prefer, you can use Flask-SQLAlchemy's Model.query interface instead of a plain session.query call. Starting from Feed gives the chain of joins an explicit left side, which is what the error message asks for.
The following query should work.
id_alias_pairs = Feed.query\
.join(FeedSnapshot, Feed.active_snapshot_id == FeedSnapshot.id)\
.join(Subscription, Feed.id == Subscription.feed_id)\
.join(SubscriptionSnapshot, Subscription.active_snapshot_id == SubscriptionSnapshot.id)\
.join(FileSet, SubscriptionSnapshot.id == FileSet.subscription_snapshot_id)\
.join(SetFile, FileSet.id == SetFile.file_set_id)\
.filter(Feed.id==1, SetFile.expected)\
.with_entities(FileSet.id, SetFile.alias)\
.all()
print(id_alias_pairs)
Howdie do,
I have the following SQL, that I'm converting to SQLAlchemy:
select t1.`order_id`, t1.`status_type`
from `tracking_update` AS t1 LEFT JOIN `tracking_update` AS t2
ON (t1.`order_id` = t2.`order_id` AND t1.`last_updated` < t2.`last_updated`)
where t1.`order_id` = '21757' and t2.`last_updated` IS NULL
The SQL is just returning the latest tracking update for order id 21757. I'm accomplishing this by doing a left join back to the same table. In order to do this, I'm aliasing the table first:
tUAlias1 = aliased(TrackingUpdate)
tUalias2 = aliased(TrackingUpdate)
So far, this is what I have for my conversion to SQLAlchemy:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
filter(and_(tUAlias1.order_id == '21757', tUalias2.last_updated is None))
And this is the result of the SQLAlchemy code that is executed on the server via log:
SELECT tracking_update_1.order_id AS tracking_update_1_order_id, tracking_update_1.status_type AS tracking_update_1_status_type
FROM tracking_update AS tracking_update_1 LEFT OUTER JOIN tracking_update AS tracking_update_2 ON tracking_update_1.order_id = tracking_update_2.order_id AND tracking_update_1.last_updated < tracking_update_2.last_updated
WHERE 0 = 1
As you can see, the filter(WHERE clause) is now 0 = 1.
Now, if I remove the and_ statement and try two filters like so:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
filter(tUAlias1.order_id == '21757').filter(tUalias2.last_updated is None)
I receive the same result. I know the SQL itself is fine as I can run it with no issue via MySQL workbench.
When the SQL is run directly, I receive the following:
order ID | Status
21757 | D
Also, if I remove the tUalias2.last_updated is None, I actually receive some results, but they are not correct. This is the SQL Log for that:
Python code
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
filter(tUAlias1.order_id == '21757')
SQLAlchemy run:
SELECT tracking_update_1.order_id AS tracking_update_1_order_id, tracking_update_1.status_type AS tracking_update_1_status_type
FROM tracking_update AS tracking_update_1 LEFT OUTER JOIN tracking_update AS tracking_update_2 ON tracking_update_1.order_id = tracking_update_2.order_id AND tracking_update_1.last_updated < tracking_update_2.last_updated
WHERE tracking_update_1.order_id = '21757'
Any ideas?
Howdie do,
I figured it out
The Python 'is' operator doesn't play nice with SQLAlchemy
I found this out thanks to the following S/O question:
Selecting Null values SQLAlchemy
I've since updated my query to the following:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
filter(tUAlias1.order_id == '21757').filter(tUalias2.last_updated == None)
The problem is not in how SQLAlchemy processes null values. The problem is that you use an operator which is not supported for instrumented columns: Python's is cannot be overloaded, so the expression tUalias2.last_updated is None evaluates to a plain value (False), which is then translated to 0 = 1. You should write tUalias2.last_updated.is_(None) instead of tUalias2.last_updated is None to make your code work.
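The difference between is and == can be shown without any ORM. The sketch below uses a toy Column class, which is an assumption for illustration and not SQLAlchemy's real implementation. It overloads == to build a SQL fragment, the way an expression library can, while is always compares object identity and hands back a plain bool.

```python
class Column:
    """Toy stand-in for an ORM column (hypothetical, not SQLAlchemy's class)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # == can be overloaded, so it can return a SQL fragment
        # instead of a boolean.
        if other is None:
            return f"{self.name} IS NULL"
        return f"{self.name} = {other!r}"

    def is_(self, other):
        # Explicit method form; the name mirrors SQLAlchemy's is_().
        if other is None:
            return f"{self.name} IS NULL"
        return f"{self.name} IS {other!r}"


last_updated = Column("last_updated")

# 'is' checks identity and cannot be customised: this is just False.
identity_result = last_updated is None

# These two go through methods on the class and produce SQL text.
equality_result = last_updated == None  # noqa: E711
method_result = last_updated.is_(None)

print(identity_result, equality_result, method_result)
```

A filter that receives the constant False has nothing column-related left to render, which is consistent with the WHERE 0 = 1 seen in the log.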
I have this sqlalchemy query:
query = session.query(Store).options(joinedload('salesmen').
joinedload('comissions').
joinedload('orders')).\
filter(Store.store_code.in_(selected_stores))
stores = query.all()
for store in stores:
for salesman in store.salesmen:
for comission in salesman.comissions:
#generate html for comissions for each salesman in each store
#print html document using PySide
This was working perfectly, however I added two new filter queries:
filter(Comissions.payment_status == 0).\
filter(Order.order_date <= self.dateEdit.date().toPython())
If I add just the first filter, the application hangs for a couple of seconds. If I add both, the application hangs indefinitely.
What am I doing wrong here? How do I make this query fast?
Thank you for your help
EDIT: This is the SQL generated. Unfortunately the class and variable names are in Portuguese; I translated them to English above so it would be easier to understand,
so Loja = Store, Vendedores = Salesmen, Pedido = Order, Comissao = Comission
Query generated:
SELECT "Loja"."CodLoja", "Vendedores_1"."CodVendedor", "Vendedores_1"."NomeVendedor", "Vendedores_1"."CodLoja", "Vendedores_1"."PercentualComissao",
"Vendedores_1"."Ativo", "Comissao_1"."CodComissao", "Comissao_1"."CodVendedor", "Comissao_1"."CodPedido",
"Pedidos_1"."CodPedido", "Pedidos_1"."CodLoja", "Pedidos_1"."CodCliente", "Pedidos_1"."NomeCliente", "Pedidos_1"."EnderecoCliente", "Pedidos_1"."BairroCliente",
"Pedidos_1"."CidadeCliente", "Pedidos_1"."UFCliente", "Pedidos_1"."CEPCliente", "Pedidos_1"."FoneCliente", "Pedidos_1"."Fone2Cliente", "Pedidos_1"."PontoReferenciaCliente",
"Pedidos_1"."DataPedido", "Pedidos_1"."ValorProdutos", "Pedidos_1"."ValorCreditoTroca",
"Pedidos_1"."ValorTotalDoPedido", "Pedidos_1"."Situacao", "Pedidos_1"."Vendeu_Teflon", "Pedidos_1"."ValorTotalTeflon",
"Pedidos_1"."DataVenda", "Pedidos_1"."CodVendedor", "Pedidos_1"."TipoVenda", "Comissao_1"."Valor", "Comissao_1"."DataPagamento", "Comissao_1"."StatusPagamento"
FROM "Comissao", "Pedidos", "Loja" LEFT OUTER JOIN "Vendedores" AS "Vendedores_1" ON "Loja"."CodLoja" = "Vendedores_1"."CodLoja"
LEFT OUTER JOIN "Comissao" AS "Comissao_1" ON "Vendedores_1"."CodVendedor" = "Comissao_1"."CodVendedor" LEFT OUTER JOIN "Pedidos" AS "Pedidos_1" ON "Pedidos_1"."CodPedido" = "Comissao_1"."CodPedido"
WHERE "Loja"."CodLoja" IN (:CodLoja_1) AND "Comissao"."StatusPagamento" = :StatusPagamento_1 AND "Pedidos"."DataPedido" <= :DataPedido_1
Your FROM clause is producing a Cartesian product and includes each table twice, once for filtering the result and once for eagerly loading the relationship.
To stop this use contains_eager instead of joinedload in your options. This will look for the related attributes in the query's columns instead of constructing an extra join. You will also need to explicitly join to the other tables in your query, e.g.:
# Salesman is an assumed name for the model behind Store.salesmen.
query = session.query(Store)\
.join(Store.salesmen)\
.join(Salesman.comissions)\
.join(Comissions.orders)\
.options(contains_eager('salesmen').
contains_eager('comissions').
contains_eager('orders'))\
.filter(Store.store_code.in_(selected_stores))\
.filter(Comissions.payment_status == 0)\
.filter(Order.order_date <= self.dateEdit.date().toPython())
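The row explosion can be reproduced with a small sketch in plain SQL, run through Python's sqlite3 module. The schema and row counts below (one store, three salesmen, five unpaid commissions each) are assumptions for illustration, not the asker's data. The first query mimics the generated SQL, where the commission table appears once unaliased for the filter and once aliased for the eager load; the second filters on the joined table itself.

```python
import sqlite3

# Hypothetical schema: 1 store, 3 salesmen, 15 unpaid commissions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store (id INTEGER PRIMARY KEY);
    CREATE TABLE salesman (id INTEGER PRIMARY KEY, store_id INTEGER);
    CREATE TABLE commission (id INTEGER PRIMARY KEY, salesman_id INTEGER, paid INTEGER);
""")
conn.execute("INSERT INTO store (id) VALUES (1)")
conn.executemany(
    "INSERT INTO salesman (id, store_id) VALUES (?, 1)",
    [(i,) for i in range(1, 4)],
)
conn.executemany(
    "INSERT INTO commission (salesman_id, paid) VALUES (?, 0)",
    [(s,) for s in range(1, 4) for _ in range(5)],
)

# The filtered table is not connected to the join chain, so every
# matching commission row is paired with every eagerly loaded row.
cartesian_rows = conn.execute("""
    SELECT COUNT(*) FROM commission, store
    LEFT OUTER JOIN salesman AS s1 ON store.id = s1.store_id
    LEFT OUTER JOIN commission AS c1 ON s1.id = c1.salesman_id
    WHERE commission.paid = 0
""").fetchone()[0]

# Filtering on the joined table keeps one row per commission.
joined_rows = conn.execute("""
    SELECT COUNT(*) FROM store
    JOIN salesman AS s1 ON store.id = s1.store_id
    JOIN commission AS c1 ON s1.id = c1.salesman_id
    WHERE c1.paid = 0
""").fetchone()[0]

print(cartesian_rows, joined_rows)
```

With only 15 commissions the first query already produces 15 x 15 rows. Adding a second disconnected table multiplies the result again, which would explain why the query with both filters never seems to finish.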
I would like to insert records into a sqlite database with wildcard fields, so that a query specifying any value for such a field still matches the record.
Make   Model    Engine   Parameter
Ford   *        *        1
Ford   Taurus   *        2
Ford   Escape   *        3
So a query = (database.table.Make == 'Ford') & (database.table.Model == 'Taurus') would return the first two records
EDIT: thanks to woot, I decided to use the following: (database.table.Make.belongs('Ford','*')) & (database.table.Model.belongs('Taurus','*')) which is the syntax for the IN operator in web2py
Are you looking for something like this? It won't perform well due to the ORs if you have a lot of rows.
SELECT *
FROM Cars
WHERE ( Cars.Make = 'Ford' OR Cars.Make = '*' )
AND ( Cars.Model = 'Taurus' OR Cars.Model = '*' )
If you meant to use NULL, you can just replace that and replace the OR condition with OR Cars.Make IS NULL, etc.
Or to make it maybe a little less verbose:
SELECT *
FROM Cars
WHERE Cars.Make IN ('Ford','*')
AND Cars.Model IN ('Taurus','*')
But you wouldn't be able to use NULL in this case and would have to use the * token.
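As a quick check of the IN form, here is a minimal sketch using Python's sqlite3 module with the three sample rows from the question. The helper name matching_parameters is hypothetical, and '*' is treated purely as a literal token stored in the column, not as a SQL wildcard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cars (Make TEXT, Model TEXT, Engine TEXT, Parameter INTEGER)")
conn.executemany(
    "INSERT INTO Cars VALUES (?, ?, ?, ?)",
    [("Ford", "*", "*", 1), ("Ford", "Taurus", "*", 2), ("Ford", "Escape", "*", 3)],
)


def matching_parameters(make, model):
    """Return Parameter values whose Make/Model equal the request or '*'."""
    rows = conn.execute(
        "SELECT Parameter FROM Cars "
        "WHERE Make IN (?, '*') AND Model IN (?, '*') "
        "ORDER BY Parameter",
        (make, model),
    )
    return [row[0] for row in rows]


taurus = matching_parameters("Ford", "Taurus")
escape = matching_parameters("Ford", "Escape")
print(taurus, escape)
```

A Taurus lookup returns the generic Ford record plus the Taurus record, and an Escape lookup returns the generic record plus the Escape record.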