How to achieve simple subquery in peewee without join - python

I would like to have a simple subquery. One I would like to re-use in several places. Some of those would be joins, some would not.
SQL code would be like this
SELECT IF(x, y, z) as foo, table.*
FROM TABLE
WHERE condition
And then it's used in many places, joining and where'ing by foo.
Sometimes simply like this:
SELECT * FROM
(
SELECT IF(x, y, z) as foo, table.*
FROM TABLE
WHERE condition
) WHERE (foo > 100)
Sometimes more complex, like grouping, joining.
However, I find it quite hard to do in peewee.
I figured out I can do this if I use joins
query1 = table1.select(...).where(...)
query2 = table2.select(...).join(query1, on=(...))...
This would work
query1 = table1.select(...).where(...)
query2 = query1.select(...).join(table2, on=(...))...
This would also work
However, if I just select from query1, it doesn't work. Exact code that fails:
query = tables.Payments.select(fn.IF(tables.Payments.payment_status > 0, tables.Payments.payment_amount, -tables.Payments.payment_amount).alias("x")).where(tables.Payments.payment_amount > 200)
query2 = query.select().where(query.c.x < 0)
I expect query2 to just be a select from Payments where x, calculated according to condition before, is less than 0, but instead it produces bogus SQL code
SELECT FROM `payments` AS `t1` WHERE ((`t1`.`payment_amount` > 200) AND (`t2`.`x` < 0))
Which is obviously malformed and doesn't execute
How do I do this? Is this even possible in peewee?
I know I could write "where()" and replicate my condition there, but that's bad practice, because it's copypasting code, and what if I want to change that condition later? Do I re-do it in 10 places?... Surely there's a proper way to do this
PS: As advised, I have altered my code but it produces malformed SQL query again.
My code:
query = tables.Payments.select(fn.IF(tables.Payments.payment_status > 0, tables.Payments.payment_amount, -tables.Payments.payment_amount).alias("x")).where(tables.Payments.payment_amount > 200)
query2 = query.select_from(query.c.x).where(query.c.x < 0)
Resulting query:
SELECT `t1`.`x` FROM (SELECT IF((`t2`.`payment_status` > 0), `t2`.`payment_amount`, `t2`.`payment_amount` DESC) AS `x` FROM `payments` AS `t2` WHERE (`t2`.`payment_amount` > 200)) AS `t1` WHERE (`t1`.`x` < 0)
As you see, instead of doing a minus operation, it adds DESC which is obviously not right.
How to fix this?

Here is an example of wrapping a subquery using select_from():
from peewee import *

db = SqliteDatabase(':memory:')

class Reg(db.Model):
    key = TextField()

db.create_tables([Reg])
Reg.create(key='k1')
Reg.create(key='k2')
Reg.create(key='k3')

# The inner query exposes Reg.key under the alias "foo".
subq = Reg.select(Reg.key.alias('foo'), Reg)
# The outer query selects from the inner one and filters on the alias.
query = subq.select_from(subq.c.foo).where(subq.c.foo.in_(['k1', 'k3']))
for row in query:
    print(row.foo)
# k1
# k3
Another example, this is basically what select_from() does under-the-hood:
query = Select([subq], [subq.c.foo]).bind(db)
for row in query:
    print(row)
# {'foo': 'k1'}
# {'foo': 'k2'}
# {'foo': 'k3'}
For the problem in your most recent edit, replace the unary minus with X * -1:
query = (Payments.select(fn.IF(
    Payments.payment_status > 0,
    Payments.payment_amount,
    Payments.payment_amount * -1).alias("x"))
    .where(Payments.payment_amount > 200))
query2 = query.select_from(query.c.x).where(query.c.x < 0)
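For reference, the SQL shape that select_from() produces (an outer SELECT over a derived table) can be tried directly with the standard-library sqlite3 module. This is only a sketch under assumptions: the payments rows are made up, CASE stands in for MySQL's IF(), and the subquery is kept in a single string so it can be reused in several outer queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, "
    "payment_status INTEGER, payment_amount INTEGER)"
)
# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO payments (payment_status, payment_amount) VALUES (?, ?)",
    [(1, 300), (0, 250), (0, 100), (1, 150)],
)

# Defined once, reused wherever the computed column x is needed.
# payment_amount * -1 mirrors the fix above; CASE replaces MySQL's IF().
SUBQUERY = """
    SELECT CASE WHEN payment_status > 0
                THEN payment_amount
                ELSE payment_amount * -1 END AS x,
           payments.*
    FROM payments
    WHERE payment_amount > 200
"""

# Two different outer queries over the same derived table.
negative = conn.execute(
    f"SELECT x FROM ({SUBQUERY}) AS t1 WHERE x < 0"
).fetchall()
total = conn.execute(f"SELECT SUM(x) FROM ({SUBQUERY}) AS t1").fetchone()[0]

print(negative)  # [(-250,)]
print(total)     # 50
```

The condition and the computed column live in one place, so changing them later means editing only SUBQUERY.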

Related

Python Peewee EXISTS Subquery not working as expected

I am using the peewee ORM for a python application and I am trying to write code to fetch batches of records from a SQLite database. I have a subquery that seems to work by itself but when added to an update query the fn.EXISTS(sub_query) seems to have no effect as every record in the database is updated.
Note: I am using the APSW extension for peewee.
def batch_logic(self, id_1, path_1, batch_size=1000, **kwargs):
    sub_query = (self.select(ModelClass.granule_id).distinct().where(
        (ModelClass.status == 'old_status') &
        (ModelClass.collection_id == id_1) &
        (ModelClass.name.contains(path_1))
    ).order_by(ModelClass.discovered_date.asc()).limit(batch_size))
    print(f'len(sub_query): {len(sub_query)}')
    fb_st_2 = time.time()
    updated_records = list(
        (self.update(status='new_status').where(fn.EXISTS(sub_query)).returning(ModelClass))
    )
    print(f'update {len(updated_records)}: {time.time() - fb_st_2}')
    db.close()
    return updated_records
Below is output from testing locally:
id_1: id_1_1676475997_PQXYEQGJWR
len(sub_query): 2
update 20000: 1.0583274364471436
fetch_batch 20000: 1.1167597770690918
count_things 0: 0.02147078514099121
processed_things: 20000
The subquery is correctly returning 2 but the update query where(fn.EXISTS(sub_query)) seems to be ignored. Have I made a mistake in my understanding of how this works?
Edit 1: I believe GROUP BY is needed as rows can have the same granule_id and I need to fetch rows up to batch_size granule_ids
I think your use of UPDATE...WHERE EXISTS is incorrect or inappropriate here. This may work better for you:
# Unsure why you have a GROUP BY with no aggregation, that seems
# incorrect possibly, so I've removed it.
sub_query = (self.select(ModelClass.id)
             .where(
                 (ModelClass.status == 'old_status') &
                 (ModelClass.collection_id == id_1) &
                 (ModelClass.name.contains(path_1)))
             .order_by(ModelClass.discovered_date.asc())
             .limit(batch_size))
update = (self.update(status='new_status')
          .where(self.id.in_(sub_query))
          .returning(ModelClass))
cursor = update.execute()  # It's good to explicitly execute().
updated_records = list(cursor)
The key idea is that the update is now correlated with the subquery. An uncorrelated EXISTS is true for every row as long as the subquery returns anything, which is why every record was updated.
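The difference between the two WHERE clauses can be reproduced in plain SQL with the standard-library sqlite3 module. This is a minimal sketch with a made-up items table, not the asker's schema and not peewee code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (status) VALUES (?)", [("old",)] * 5)

# A batch of at most two ids, analogous to the LIMIT batch_size subquery.
SUBQUERY = "SELECT id FROM items WHERE status = 'old' ORDER BY id LIMIT 2"


def count_new():
    """Return how many rows currently have status 'new'."""
    row = conn.execute("SELECT COUNT(*) FROM items WHERE status = 'new'").fetchone()
    return row[0]


# Uncorrelated EXISTS: the subquery returns rows, so the condition holds
# for every row in the table and all of them are updated.
conn.execute(f"UPDATE items SET status = 'new' WHERE EXISTS ({SUBQUERY})")
updated_with_exists = count_new()

conn.execute("UPDATE items SET status = 'old'")  # reset for the second run

# IN ties each updated row to an id produced by the subquery.
conn.execute(f"UPDATE items SET status = 'new' WHERE id IN ({SUBQUERY})")
updated_with_in = count_new()

print(updated_with_exists, updated_with_in)  # 5 2
```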

Dynamically search for null in sqlite select query using python

I'm new to python and I want to do a similar query to this one:
_c.execute('select * from cases where bi = ? and age = ? and '
           'shape = ? and margin = ? and density = ?',
           (obj['bi'], obj['age'], obj['shape'], obj['margin'], obj['density']))
When some of the parameters are None, for example obj['bi'] = None, the query searches for the row when bi = 'None'. But I want it to search for the row when: 'bi is NULL'
A possible solution is to verify the values of the parameters one by one in a sequence of if-elses. For example:
query = 'select * from cases where'
if obj['bi'] is None:
    query += ' bi is null and '
else:
    query += ' bi = ' + str(obj['bi']) + ' and '
...
# do the same if-else for the other parameters
...
_c.execute(query)
But, it doesn't seem to me as the best solution.
The question is, what is the best solution to the given problem and how to avoid SQL injections.
Okay, after firing up a Python REPL and playing around with it a bit, it's simpler than I thought. The Python sqlite bindings turn a Python None into a SQL NULL, not into the string 'None' as your question suggested. In SQL, = doesn't match NULL values, but in SQLite IS will. So...
Given a table foo looking like:
a    | b
-----+---
NULL | 1
Dog  | 2
Doing:
c = conn.cursor()
c.execute('SELECT * FROM foo WHERE a IS ?', (None,))
print(c.fetchone())
will return the (NULL, 1) row, and
c.execute('SELECT * FROM foo WHERE a IS ?', ('Dog',))
print(c.fetchone())
will return the ('Dog', 2) row.
In other words, use IS, not =, in your query.
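A self-contained version of the experiment above, using the standard-library sqlite3 module and an in-memory copy of the foo table. The helper names find_is and find_eq are made up for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a TEXT, b INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?, ?)", [(None, 1), ("Dog", 2)])


def find_is(value):
    """Match column a with IS, so a bound None matches NULL rows."""
    return conn.execute("SELECT * FROM foo WHERE a IS ?", (value,)).fetchall()


def find_eq(value):
    """Match column a with =, which never matches NULL."""
    return conn.execute("SELECT * FROM foo WHERE a = ?", (value,)).fetchall()


print(find_is(None))   # [(None, 1)]
print(find_is("Dog"))  # [('Dog', 2)]
print(find_eq(None))   # []
```

Because the values are always passed as bound parameters, the query text never contains user data, which also addresses the SQL injection concern from the question.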

SQLAlchemy Left join WHERE clause being converted to zeros and ones

Howdie do,
I have the following SQL, that I'm converting to SQLAlchemy:
select t1.`order_id`, t1.`status_type`
from `tracking_update` AS t1 LEFT JOIN `tracking_update` AS t2
ON (t1.`order_id` = t2.`order_id` AND t1.`last_updated` < t2.`last_updated`)
where t1.`order_id` = '21757' and t2.`last_updated` IS NULL
The SQL is just returning the latest tracking update for order id 21757. I'm accomplishing this by doing a left join back to the same table. In order to do this, I'm aliasing the table first:
tUAlias1 = aliased(TrackingUpdate)
tUalias2 = aliased(TrackingUpdate)
So far, this is what I have for my conversion to SQLAlchemy:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
    outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
    filter(and_(tUAlias1.order_id == '21757', tUalias2.last_updated is None))
And this is the result of the SQLAlchemy code that is executed on the server via log:
SELECT tracking_update_1.order_id AS tracking_update_1_order_id, tracking_update_1.status_type AS tracking_update_1_status_type
FROM tracking_update AS tracking_update_1 LEFT OUTER JOIN tracking_update AS tracking_update_2 ON tracking_update_1.order_id = tracking_update_2.order_id AND tracking_update_1.last_updated < tracking_update_2.last_updated
WHERE 0 = 1
As you can see, the filter(WHERE clause) is now 0 = 1.
Now, if I remove the and_ statement and try two filters like so:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
    outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
    filter(tUAlias1.order_id == '21757').filter(tUalias2.last_updated is None)
I receive the same result. I know the SQL itself is fine, as I can run it with no issue via MySQL Workbench.
When the SQL is run directly, I receive the following:
Order ID | Status
---------+-------
21757    | D
Also, if I remove the tUalias2.last_updated is None, I actually receive some results, but they are not correct. This is the SQL Log for that:
Python code
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
    outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
    filter(tUAlias1.order_id == '21757')
SQLAlchemy run:
SELECT tracking_update_1.order_id AS tracking_update_1_order_id, tracking_update_1.status_type AS tracking_update_1_status_type
FROM tracking_update AS tracking_update_1 LEFT OUTER JOIN tracking_update AS tracking_update_2 ON tracking_update_1.order_id = tracking_update_2.order_id AND tracking_update_1.last_updated < tracking_update_2.last_updated
WHERE tracking_update_1.order_id = '21757'
Any ideas?
Howdie do,
I figured it out
The Python 'is' operator doesn't play nice with SQLAlchemy
I found this out thanks to the following S/O question:
Selecting Null values SQLAlchemy
I've since updated my query to the following:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
    outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
    filter(tUAlias1.order_id == '21757').filter(tUalias2.last_updated == None)
The problem is not in how SQLAlchemy processes null values. The problem is that you use an operator which is not supported for instrumented columns, so the expression tUalias2.last_updated is None evaluates to a plain Python value (False), which is then translated to 0 = 1. You should write tUalias2.last_updated.is_(None) instead of tUalias2.last_updated is None to make your code work.
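Why is behaves this way can be seen without SQLAlchemy at all: Python lets a class overload == through __eq__, but is always compares object identity and cannot be overloaded. The Column class below is a toy stand-in written for this illustration, not SQLAlchemy's implementation:

```python
class Column:
    """Toy stand-in for an ORM column; NOT SQLAlchemy's real class."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # == can be overloaded, so it can build a SQL fragment.
        if other is None:
            return f"{self.name} IS NULL"
        return f"{self.name} = {other!r}"

    def is_(self, other):
        # An explicit method is needed because `is` cannot be overloaded.
        return f"{self.name} IS {'NULL' if other is None else repr(other)}"


col = Column("last_updated")

print(col == None)    # last_updated IS NULL
print(col.is_(None))  # last_updated IS NULL
print(col is None)    # False
```

The last line is evaluated by Python before any query-building code sees it, so the library only ever receives the constant False.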

Knowing if the result of a SQL request must be a part of another SQL request result

Let's suppose I have the following table :
Id (int, Primary Key) | Value (varchar)
----------------------+----------------
1 | toto
2 | foo
3 | bar
Given two requests, I would like to know, without executing them, whether the result of the first must be contained in the result of the second.
Some examples :
# Obvious example
query_1 = "SELECT * FROM example;"
query_2 = "SELECT * FROM example WHERE id = 1;"
is_sub_part_of(query_2, query_1) # True
# An example we can't know before executing the two requests
query_1 = "SELECT * FROM example WHERE id < 2;"
query_2 = "SELECT * FROM example WHERE value = 'toto' or value = 'foo';"
is_sub_part_of(query_2, query_1) # False
# An example we can know before executing the two requests
query_1 = "SELECT * FROM example WHERE id < 2 OR value = 'bar';"
query_2 = "SELECT * FROM example WHERE id < 2 AND value = 'bar';"
is_sub_part_of(query_2, query_1) # True
# An example about columns
query_1 = "SELECT * FROM example;"
query_2 = "SELECT id FROM example;"
is_sub_part_of(query_2, query_1) # True
Do you know if there's a module in Python that is able to do that, or if it's even possible to do?
Interesting problem. I don't know of any library that will do this for you. My thoughts:
Parse the SQL, see this for example.
Define which filtering operations can be added to a query that can only result in the same or a narrower result set. "AND x" can always be added, I think, without losing the property of being a subset. "OR x" can not. Anything else you can do to the query? For example "SELECT *", vs "SELECT x", vs "SELECT x, y".
Except for that, I can only say it's an interesting idea. You might get some more input on DBA. Is this an idea you're researching or is it related to a real-world problem you are solving, like optimizing a DB query? Maybe your question could be updated with information about this, since this is not a common way to optimize queries (unless you're working on the DB engine itself, I guess).
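The "AND x can always be added" observation can be turned into a small sufficient check. The sketch below assumes the SQL has already been parsed into a hypothetical (table, set of AND-ed predicates) pair; it does no parsing itself, ignores column lists and OR, and a False result only means containment could not be proven:

```python
def is_sub_part_of(narrow, wide):
    """Sufficient (not necessary) test that narrow's rows are within wide's.

    Each query is a hypothetical pre-parsed pair (table, conditions), where
    conditions is a set of atomic predicates joined by AND.
    """
    narrow_table, narrow_conds = narrow
    wide_table, wide_conds = wide
    # AND-ed predicates can only shrink a result set, so narrow is a subset
    # whenever it keeps every predicate that wide has.
    return narrow_table == wide_table and wide_conds <= narrow_conds


all_rows = ("example", frozenset())
id_one = ("example", frozenset({"id = 1"}))
id_only = ("example", frozenset({"id < 2"}))
id_and_value = ("example", frozenset({"id < 2", "value = 'bar'"}))

print(is_sub_part_of(id_one, all_rows))       # True
print(is_sub_part_of(id_and_value, id_only))  # True
print(is_sub_part_of(all_rows, id_one))       # False
```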

database field value that matches to every query

I would like to insert records into a sqlite database with fields such that every query that specifies a value for that field does not disqualify the record.
Make | Model  | Engine | Parameter
-----+--------+--------+----------
Ford | *      | *      | 1
Ford | Taurus | *      | 2
Ford | Escape | *      | 3
So a query = (database.table.Make == 'Ford') & (database.table.Model == 'Taurus') would return the first two records.
EDIT: thanks to woot, I decided to use the following: (database.table.Make.belongs('Ford','*')) & (database.table.Model.belongs('Taurus','*')), which is the syntax for the IN operator in web2py.
Are you looking for something like this? It won't perform well due to the ORs if you have a lot of rows.
SELECT *
FROM Cars
WHERE ( Cars.Make = 'Ford' OR Cars.Make = '*' )
AND ( Cars.Model = 'Taurus' OR Cars.Model = '*' )
Here is a SQL Fiddle example.
If you meant to use NULL, you can just replace that and replace the OR condition with OR Cars.Make IS NULL, etc.
Or to make it maybe a little less verbose:
SELECT *
FROM Cars
WHERE Cars.Make IN ('Ford','*')
AND Cars.Model IN ('Taurus','*')
But you wouldn't be able to use NULL in this case and would have to use the * token.
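The IN variant can be checked against the sample rows from the question with the standard-library sqlite3 module. The match helper is a made-up name for this sketch, and the searched values are passed as bound parameters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Cars (Make TEXT, Model TEXT, Engine TEXT, Parameter INTEGER)"
)
conn.executemany(
    "INSERT INTO Cars VALUES (?, ?, ?, ?)",
    [
        ("Ford", "*", "*", 1),
        ("Ford", "Taurus", "*", 2),
        ("Ford", "Escape", "*", 3),
    ],
)


def match(make, model):
    """Return Parameter values whose Make/Model equal the input or are '*'."""
    rows = conn.execute(
        "SELECT Parameter FROM Cars "
        "WHERE Make IN (?, '*') AND Model IN (?, '*') "
        "ORDER BY Parameter",
        (make, model),
    )
    return [row[0] for row in rows]


print(match("Ford", "Taurus"))  # [1, 2]
print(match("Ford", "Escape"))  # [1, 3]
```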
