How to achieve simple subquery in peewee without join

I would like to have a simple subquery that I can reuse in several places. Some of those uses would be joins, some would not.
The SQL would look like this:
SELECT IF(x, y, z) as foo, table.*
FROM TABLE
WHERE condition
And then it's used in many places, joining and where'ing by foo.
Sometimes simply like this:
SELECT * FROM
(
SELECT IF(x, y, z) as foo, table.*
FROM TABLE
WHERE condition
) WHERE (foo > 100)
Sometimes it is more complex, with grouping or joining.
However, I find this quite hard to do in peewee.
I figured out I can do it if I use joins. This works:
query1 = table1.select(...).where(...)
query2 = table2.select(...).join(query1, on=(...))...
This also works:
query1 = table1.select(...).where(...)
query2 = query1.select(...).join(table2, on=(...))...
However, if I just select from query1, it doesn't work. Exact code that fails:
query = tables.Payments.select(fn.IF(tables.Payments.payment_status > 0, tables.Payments.payment_amount, -tables.Payments.payment_amount).alias("x")).where(tables.Payments.payment_amount > 200)
query2 = query.select().where(query.c.x < 0)
I expect query2 to be a plain select from Payments where x, calculated by the expression above, is less than 0. Instead it produces bogus SQL:
SELECT FROM `payments` AS `t1` WHERE ((`t1`.`payment_amount` > 200) AND (`t2`.`x` < 0))
This is obviously malformed and doesn't execute.
How do I do this? Is it even possible in peewee?
I know I could write where() and replicate my condition there, but that's bad practice because it duplicates code. What if I want to change the condition later? Would I have to redo it in 10 places? Surely there's a proper way to do this.
PS: As advised, I have altered my code, but it produces a malformed SQL query again.
My code:
query = tables.Payments.select(fn.IF(tables.Payments.payment_status > 0, tables.Payments.payment_amount, -tables.Payments.payment_amount).alias("x")).where(tables.Payments.payment_amount > 200)
query2 = query.select_from(query.c.x).where(query.c.x < 0)
Resulting query:
SELECT `t1`.`x` FROM (SELECT IF((`t2`.`payment_status` > 0), `t2`.`payment_amount`, `t2`.`payment_amount` DESC) AS `x` FROM `payments` AS `t2` WHERE (`t2`.`payment_amount` > 200)) AS `t1` WHERE (`t1`.`x` < 0)
As you can see, instead of negating the value, it appends DESC, which is obviously not right.
How to fix this?

Here is an example of wrapping a subquery using select_from():
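from peewee import SqliteDatabase, TextField

db = SqliteDatabase(':memory:')

class Reg(db.Model):
    key = TextField()

db.create_tables([Reg])
Reg.create(key='k1')
Reg.create(key='k2')
Reg.create(key='k3')

# The inner query exposes Reg.key under the alias "foo".
subq = Reg.select(Reg.key.alias('foo'), Reg)
# The outer query selects from the inner one and filters on the alias.
query = subq.select_from(subq.c.foo).where(subq.c.foo.in_(['k1', 'k3']))
for row in query:
    print(row.foo)
# k1
# k3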
Another example, this is basically what select_from() does under-the-hood:
from peewee import Select

query = Select([subq], [subq.c.foo]).bind(db)
for row in query:
    print(row)
# {'foo': 'k1'}
# {'foo': 'k2'}
# {'foo': 'k3'}
For the most recent edit to your question, replace the unary minus with X * -1:
query = (Payments
         .select(fn.IF(
             Payments.payment_status > 0,
             Payments.payment_amount,
             Payments.payment_amount * -1).alias("x"))
         .where(Payments.payment_amount > 200))
query2 = query.select_from(query.c.x).where(query.c.x < 0)
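To see the SQL shape that the select_from() query is aiming for, here is a minimal sketch using the standard-library sqlite3 module rather than peewee. The payments table and its sample rows are made up for illustration, and CASE WHEN stands in for MySQL's IF():

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (payment_status INTEGER, payment_amount INTEGER)")
conn.executemany(
    "INSERT INTO payments VALUES (?, ?)",
    [(1, 300), (0, 250), (0, 150), (1, 100)],  # hypothetical sample data
)

# Inner query: computes x once, so the expression lives in a single place.
inner = """
    SELECT CASE WHEN payment_status > 0
                THEN payment_amount
                ELSE payment_amount * -1
           END AS x
    FROM payments
    WHERE payment_amount > 200
"""

# Outer query: selects from the inner query as a derived table and filters on x.
rows = conn.execute(f"SELECT t1.x FROM ({inner}) AS t1 WHERE t1.x < 0").fetchall()
print(rows)  # [(-250,)]
```

The inner SELECT plays the role of query, and the outer SELECT ... FROM (...) AS t1 is the wrapping that select_from() is meant to generate.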

Related

Python Peewee EXISTS Subquery not working as expected

I am using the peewee ORM for a python application and I am trying to write code to fetch batches of records from a SQLite database. I have a subquery that seems to work by itself but when added to an update query the fn.EXISTS(sub_query) seems to have no effect as every record in the database is updated.
Note: I am using the APSW extension for peewee.
def batch_logic(self, id_1, path_1, batch_size=1000, **kwargs):
    sub_query = (self.select(ModelClass.granule_id).distinct().where(
        (ModelClass.status == 'old_status') &
        (ModelClass.collection_id == id_1) &
        (ModelClass.name.contains(path_1))
    ).order_by(ModelClass.discovered_date.asc()).limit(batch_size))
    print(f'len(sub_query): {len(sub_query)}')
    fb_st_2 = time.time()
    updated_records = list(
        (self.update(status='new_status').where(fn.EXISTS(sub_query)).returning(ModelClass))
    )
    print(f'update {len(updated_records)}: {time.time() - fb_st_2}')
    db.close()
    return updated_records
Below is output from testing locally:
id_1: id_1_1676475997_PQXYEQGJWR
len(sub_query): 2
update 20000: 1.0583274364471436
fetch_batch 20000: 1.1167597770690918
count_things 0: 0.02147078514099121
processed_things: 20000
The subquery is correctly returning 2 but the update query where(fn.EXISTS(sub_query)) seems to be ignored. Have I made a mistake in my understanding of how this works?
Edit 1: I believe GROUP BY is needed as rows can have the same granule_id and I need to fetch rows up to batch_size granule_ids

I think your use of UPDATE...WHERE EXISTS is incorrect or inappropriate here. This may work better for you:
# Unsure why you have a GROUP BY with no aggregation, that seems
# incorrect possibly, so I've removed it.
sub_query = (self.select(ModelClass.id)
             .where(
                 (ModelClass.status == 'old_status') &
                 (ModelClass.collection_id == id_1) &
                 (ModelClass.name.contains(path_1)))
             .order_by(ModelClass.discovered_date.asc())
             .limit(batch_size))
update = (self.update(status='new_status')
          .where(self.id.in_(sub_query))
          .returning(ModelClass))
cursor = update.execute()  # It's good to explicitly execute().
updated_records = list(cursor)
The key idea, at any rate, is that I'm correlating the update with the subquery: only rows whose id appears in the subquery are updated.
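The difference between the two WHERE clauses can be reproduced with plain SQL. The following is a minimal sketch using the standard-library sqlite3 module, not peewee or APSW; the items table, its columns, and the batch size of 2 are assumptions made for the demonstration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT, discovered INTEGER)")
conn.executemany(
    "INSERT INTO items (status, discovered) VALUES (?, ?)",
    [("old", i) for i in range(10)],  # ten hypothetical rows
)

# The batch subquery: the two oldest rows that still have status 'old'.
batch = "SELECT id FROM items WHERE status = 'old' ORDER BY discovered LIMIT 2"

# Uncorrelated EXISTS: the subquery never refers to the outer row, so the
# condition is simply true for every row as long as the subquery returns anything.
exists_count = conn.execute(
    f"SELECT COUNT(*) FROM items WHERE EXISTS ({batch})"
).fetchone()[0]

# IN compares each outer row's id against the subquery result,
# so only the rows in the batch are updated.
conn.execute(f"UPDATE items SET status = 'new' WHERE id IN ({batch})")
in_count = conn.execute("SELECT COUNT(*) FROM items WHERE status = 'new'").fetchone()[0]

print(exists_count, in_count)  # 10 2
```

This mirrors the output in the question: the subquery on its own returns the expected batch, but WHERE EXISTS (...) matches every row because nothing links it to the row being updated.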

Dynamically search for null in sqlite select query using python

I'm new to python and I want to do a similar query to this one:
_c.execute('select * from cases where bi = ? and age = ? and '
           'shape = ? and margin = ? and density = ?',
           (obj['bi'], obj['age'], obj['shape'], obj['margin'], obj['density']))
When some of the parameters are None, for example obj['bi'] = None, the query searches for rows where bi = 'None'. But I want it to search for rows where bi IS NULL.
A possible solution is to verify the values of the parameters one by one in a sequence of if-elses. For example:
query = 'select * from cases where'
if obj['bi'] is None:
    query += ' bi is null and '
else:
    query += ' bi = ' + str(obj['bi']) + ' and '
...
# do the same if-else for the other parameters
...
_c.execute(query)
But this doesn't seem like the best solution to me.
What is the best solution to this problem, and how do I avoid SQL injection?

Okay, after firing up a Python REPL and playing around with it a bit, it's simpler than I thought. The Python sqlite bindings turn a Python None into a SQL NULL, not into the string 'None' as your question suggested. In SQL, = never matches NULL values, but IS does. So...
Given a table foo looking like:
a    | b
-----+---
NULL | 1
Dog  | 2
Doing:
c = conn.cursor()
c.execute('SELECT * FROM foo WHERE a IS ?', (None,))
print(c.fetchone())
will return the (NULL, 1) row, and
c.execute('SELECT * FROM foo WHERE a IS ?', ('Dog',))
print(c.fetchone())
will return the ('Dog', 2) row.
In other words, use IS, not =, in your query.
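Applied to the multi-column query from the question, this could look like the sketch below. The cases table here has only three of the question's columns, the sample rows are made up, and find_cases is a hypothetical helper name. Note that using IS with arbitrary values is SQLite behaviour; other databases may only accept IS with NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (bi INTEGER, age INTEGER, shape TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [(None, 40, "round"), (3, 40, "round"), (None, 55, "oval")],  # sample data
)

def find_cases(conn, criteria):
    """Match rows on every given column; None matches NULL.

    Assumes criteria is non-empty. Column names are taken from a fixed
    whitelist and values are bound through ? placeholders, so no user
    input is ever pasted into the SQL text.
    """
    allowed = ("bi", "age", "shape")
    columns = [c for c in allowed if c in criteria]
    where = " AND ".join(f"{c} IS ?" for c in columns)
    params = [criteria[c] for c in columns]
    return conn.execute(f"SELECT * FROM cases WHERE {where}", params).fetchall()

null_rows = find_cases(conn, {"bi": None, "age": 40, "shape": "round"})
value_rows = find_cases(conn, {"bi": 3, "age": 40, "shape": "round"})
print(null_rows)   # [(None, 40, 'round')]
print(value_rows)  # [(3, 40, 'round')]
```

The same statement handles both cases, so no per-column if-else is needed.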

SQLAlchemy Left join WHERE clause being converted to zeros and ones

Howdie do,
I have the following SQL, that I'm converting to SQLAlchemy:
select t1.`order_id`, t1.`status_type`
from `tracking_update` AS t1 LEFT JOIN `tracking_update` AS t2
ON (t1.`order_id` = t2.`order_id` AND t1.`last_updated` < t2.`last_updated`)
where t1.`order_id` = '21757' and t2.`last_updated` IS NULL
The SQL is just returning the latest tracking update for order id 21757. I'm accomplishing this by doing a left join back to the same table. In order to do this, I'm aliasing the table first:
tUAlias1 = aliased(TrackingUpdate)
tUalias2 = aliased(TrackingUpdate)
So far, this is what I have for my conversion to SQLAlchemy:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
    outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
    filter(and_(tUAlias1.order_id == '21757', tUalias2.last_updated is None))
And this is the result of the SQLAlchemy code that is executed on the server via log:
SELECT tracking_update_1.order_id AS tracking_update_1_order_id, tracking_update_1.status_type AS tracking_update_1_status_type
FROM tracking_update AS tracking_update_1 LEFT OUTER JOIN tracking_update AS tracking_update_2 ON tracking_update_1.order_id = tracking_update_2.order_id AND tracking_update_1.last_updated < tracking_update_2.last_updated
WHERE 0 = 1
As you can see, the filter (the WHERE clause) is now 0 = 1.
If I remove the and_ call and chain two filters instead, like so:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
    outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
    filter(tUAlias1.order_id == '21757').filter(tUalias2.last_updated is None)
I receive the same result. I know the SQL itself is fine as I can run it with no issue via MySQL workbench.
When the SQL is run directly, I receive the following:
order ID | Status
---------+-------
21757    | D
Also, if I remove the tUalias2.last_updated is None filter, I do receive some results, but they are not correct. This is the SQL log for that:
Python code:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
    outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
    filter(tUAlias1.order_id == '21757')
SQLAlchemy run:
SELECT tracking_update_1.order_id AS tracking_update_1_order_id, tracking_update_1.status_type AS tracking_update_1_status_type
FROM tracking_update AS tracking_update_1 LEFT OUTER JOIN tracking_update AS tracking_update_2 ON tracking_update_1.order_id = tracking_update_2.order_id AND tracking_update_1.last_updated < tracking_update_2.last_updated
WHERE tracking_update_1.order_id = '21757'
Any ideas?

Howdie do,
I figured it out.
The Python 'is' operator doesn't play nicely with SQLAlchemy.
I found this out thanks to the following S/O question:
Selecting Null values SQLAlchemy
I've since updated my query to the following:
tracking_updates = db.session.query(tUAlias1.order_id, tUAlias1.status_type).\
    outerjoin(tUalias2, (tUAlias1.order_id == tUalias2.order_id) & (tUAlias1.last_updated < tUalias2.last_updated)).\
    filter(tUAlias1.order_id == '21757').filter(tUalias2.last_updated == None)

The problem is not in how SQLAlchemy processes null values. The problem is that you use an operator that is not supported for instrumented columns, so the expression tUalias2.last_updated is None evaluates to a plain Python value (False), which is then translated to 0 = 1. You should write tUalias2.last_updated.is_(None) instead of tUalias2.last_updated is None to make your code work.
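Why this happens can be shown without SQLAlchemy at all. The Column class below is a toy stand-in written for this illustration, not SQLAlchemy's actual implementation: Python lets a class overload == through __eq__, but the is operator always performs an identity check and cannot be overloaded.

```python
class Column:
    """Toy stand-in for an ORM column (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # == can be overloaded, so it can build SQL text instead of a bool.
        if other is None:
            return f"{self.name} IS NULL"
        return f"{self.name} = {other!r}"

    def is_(self, other):
        # Explicit method form of the same idea.
        if other is None:
            return f"{self.name} IS NULL"
        return f"{self.name} IS {other!r}"


last_updated = Column("last_updated")

by_identity = last_updated is None   # plain identity test, evaluated by Python itself
by_equality = last_updated == None   # calls Column.__eq__
by_method = last_updated.is_(None)   # calls Column.is_

print(by_identity)  # False
print(by_equality)  # last_updated IS NULL
print(by_method)    # last_updated IS NULL
```

The first expression is already False before any query-building code sees it, which is consistent with the constant 0 = 1 in the logged SQL.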

Knowing if the result of a SQL request must be a part of another SQL request result

Let's suppose I have the following table :
Id (int, Primary Key) | Value (varchar)
----------------------+----------------
1 | toto
2 | foo
3 | bar
I would like to know, given two requests, whether the result of the first must be contained in the result of the second, without executing them.
Some examples :
# Obvious example
query_1 = "SELECT * FROM example;"
query_2 = "SELECT * FROM example WHERE id = 1;"
is_sub_part_of(query_2, query_1) # True
# An example we can't know before executing the two requests
query_1 = "SELECT * FROM example WHERE id < 2;"
query_2 = "SELECT * FROM example WHERE value = 'toto' or value = 'foo';"
is_sub_part_of(query_2, query_1) # False
# An example we can know before executing the two requests
query_1 = "SELECT * FROM example WHERE id < 2 OR value = 'bar';"
query_2 = "SELECT * FROM example WHERE id < 2 AND value = 'bar';"
is_sub_part_of(query_2, query_1) # True
# An example about columns
query_1 = "SELECT * FROM example;"
query_2 = "SELECT id FROM example;"
is_sub_part_of(query_2, query_1) # True
Do you know if there's a Python module that can do this, or whether it's even possible?

Interesting problem. I don't know of any library that will do this for you. My thoughts:
Parse the SQL, see this for example.
Define which filtering operations can be added to a query so that it can only produce the same or a narrower result set. "AND x" can always be added, I think, without losing the subset property. "OR x" cannot. Is there anything else you can do to the query? For example, "SELECT *" vs "SELECT x" vs "SELECT x, y".
Except for that, I can only say it's an interesting idea. You might get some more input on DBA. Is this an idea you're researching or is it related to a real-world problem you are solving, like optimizing a DB query? Maybe your question could be updated with information about this, since this is not a common way to optimize queries (unless you're working on the DB engine itself, I guess).
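The "AND x can always be added" observation can be turned into a very conservative check. The sketch below is only an illustration under strong assumptions: both queries read the same table and columns, the WHERE clauses contain nothing but conditions joined by an uppercase AND (no OR, no parentheses, no string literal containing " AND "), and conditions are compared as text. The function names are made up.

```python
def conjuncts(where_clause):
    """Split a WHERE clause made only of AND-ed conditions into a set."""
    if not where_clause.strip():
        return frozenset()
    # Normalise whitespace so "id  <  2" and "id < 2" compare equal.
    return frozenset(" ".join(part.split()) for part in where_clause.split(" AND "))


def provably_subset(where_2, where_1):
    """Return True if every row matching where_2 must also match where_1.

    True is a guarantee; False only means "cannot tell from the text",
    not "is not a subset".
    """
    # If query 2 keeps all of query 1's conditions and only adds more,
    # it can only narrow the result.
    return conjuncts(where_1) <= conjuncts(where_2)


print(provably_subset("id = 1", ""))                         # True
print(provably_subset("id < 2 AND value = 'bar'", "id < 2")) # True
print(provably_subset("value = 'toto'", "id < 2"))           # False
```

Anything beyond this, such as reasoning about OR or about overlapping ranges, needs a real SQL parser and some form of logical implication checking.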

database field value that matches to every query

I would like to insert records into a sqlite database with fields such that every query that specifies a value for that field does not disqualify the record.
Make | Model  | Engine | Parameter
-----+--------+--------+----------
Ford | *      | *      | 1
Ford | Taurus | *      | 2
Ford | Escape | *      | 3
So a query = (database.table.Make == 'Ford') & (database.table.Model == 'Taurus') would return the first two records.
EDIT: thanks to woot, I decided to use the following: (database.table.Make.belongs('Ford','*')) & (database.table.Model.belongs('Taurus','*')), which is the syntax for the IN operator in web2py.

Are you looking for something like this? It won't perform well due to the ORs if you have a lot of rows.
SELECT *
FROM Cars
WHERE ( Cars.Make = 'Ford' OR Cars.Make = '*' )
AND ( Cars.Model = 'Taurus' OR Cars.Model = '*' )
Here is a SQL Fiddle example.
If you meant to use NULL, you can just replace that and replace the OR condition with OR Cars.Make IS NULL, etc.
Or to make it maybe a little less verbose:
SELECT *
FROM Cars
WHERE Cars.Make IN ('Ford','*')
AND Cars.Model IN ('Taurus','*')
But you wouldn't be able to use NULL in this case and would have to use the * token.
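As a quick check of the IN form against the data from the question, here is a small sketch using the standard-library sqlite3 module. The cars table definition is an assumption based on the columns shown in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (make TEXT, model TEXT, engine TEXT, parameter INTEGER)")
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?, ?)",
    [("Ford", "*", "*", 1), ("Ford", "Taurus", "*", 2), ("Ford", "Escape", "*", 3)],
)

# Each field matches either the requested value or the '*' wildcard token.
rows = conn.execute(
    "SELECT parameter FROM cars "
    "WHERE make IN (?, '*') AND model IN (?, '*') "
    "ORDER BY parameter",
    ("Ford", "Taurus"),
).fetchall()
matched = [row[0] for row in rows]
print(matched)  # [1, 2]
```

The Escape row is excluded because its model is neither 'Taurus' nor the wildcard.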
