Assume I have a SQLAlchemy table which looks like:
class Country:
    name = VARCHAR
    population = INTEGER
    continent = VARCHAR
    num_states = INTEGER
My application allows seeing name and population for all countries, so I have a TextClause which looks like
"select name, population from Country"
I allow raw queries in my application, so I don't have the option to change this to a selectable.
At runtime, I want to allow my users to choose a field name and a field value to filter on. For example, a user can say: I only want to see name and population for countries where continent is Asia. So I dynamically want to add the filter
.where(Country.c.continent == 'Asia')
But I can't add .where to a TextClause.
Similarly, my user may choose to see name and population for countries where num_states is greater than 10. So I dynamically want to add the filter
.where(Country.c.num_states > 10)
But again I can't add .where to a TextClause.
What are the options I have to solve this problem?
Could a subquery help here in any way?
You can add a filter based on the conditions; filter() is used for adding WHERE conditions in SQLAlchemy.
Country.query.filter(Country.num_states > 10).all()
You can also do this:
query = Country.query.filter(Country.continent == 'Asia')
if user_input == 'states':
    query = query.filter(Country.num_states > 10)
query = query.all()
This is not doable in a general sense without parsing the query. In relational algebra terms, the user applies projection and selection operations to a table, and you want to apply selection operations to it. Since the user can apply arbitrary projections (e.g. user supplies SELECT id FROM table), you are not guaranteed to be able to always apply your filters on top, so you have to apply your filters before the user does. That means you need to rewrite it to SELECT id FROM (some subquery), which requires parsing the user's query.
However, we can sort of cheat depending on the database that you are using, by having the database engine do the parsing for you. The way to do this is with CTEs, by basically shadowing the table name with a CTE.
Using your example, it looks like the following. The user supplies the query
SELECT name, population FROM country;
You shadow country with a CTE:
WITH country AS (
    SELECT * FROM country
    WHERE continent = 'Asia'
)
SELECT name, population FROM country;
Unfortunately, because of the way SQLAlchemy's CTE support works, it is tough to get it to generate a CTE for a TextClause. The solution is to basically generate the string yourself, using a custom compilation extension, something like this:
from sqlalchemy import select, text
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement, Executable

class WrappedQuery(Executable, ClauseElement):
    def __init__(self, name, outer, inner):
        self.name = name
        self.outer = outer
        self.inner = inner

@compiles(WrappedQuery)
def compile_wrapped_query(element, compiler, **kwargs):
    return "WITH {} AS ({}) {}".format(
        element.name,
        compiler.process(element.outer),
        compiler.process(element.inner))

c = Country.__table__
cte = select(["*"]).select_from(c).where(c.c.continent == "Asia")
query = WrappedQuery("country", cte, text("SELECT name, population FROM country"))
session.execute(query)
From my tests, this only works in PostgreSQL. SQLite and SQL Server both treat it as recursive instead of shadowing, and MySQL (before 8.0) does not support CTEs at all.
I couldn't find anything nice for this in the documentation. I ended up resorting to pretty much just string processing... but at least it works!
from sqlalchemy.sql import text

query = """select name, population from Country"""

if continent is not None:
    # Note the leading space and the quotes around the interpolated value
    additional_clause = """ WHERE continent = '{continent}';"""
    query = query + additional_clause
    text_clause = text(
        query.format(
            continent=continent,
        ),
    )
else:
    text_clause = text(query)

with sql_connection() as conn:
    results = conn.execute(text_clause)
You could also chain this logic with more clauses, although you'll have to create a boolean flag for the first WHERE clause and then use AND for the subsequent ones.
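For example, here is one possible sketch of that chaining, collecting the conditions in a list instead of tracking a boolean flag, and using bound parameters rather than format(). min_states is a hypothetical second user-supplied value used only for illustration, and sql_connection() is the same helper as above:
from sqlalchemy.sql import text

query = "select name, population from Country"
clauses = []
params = {}

if continent is not None:
    clauses.append("continent = :continent")
    params['continent'] = continent
if min_states is not None:
    clauses.append("num_states > :min_states")
    params['min_states'] = min_states

if clauses:
    # The first condition gets WHERE; the rest are joined with AND.
    query += " WHERE " + " AND ".join(clauses)

text_clause = text(query).bindparams(**params)

with sql_connection() as conn:
    results = conn.execute(text_clause)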
I have two tables, BloodBank(id, name, phone, address) and BloodStock(id, a_pos, b_pos, a_neg, b_neg, bloodbank_id). I want to fetch all the columns from both tables where the variable column name (say bloodgroup), which holds values like a_pos or a_neg... and so on, is greater than 0. How can I write the ORM query for the same?
The SQL query is written like this to get the required results:
sql="select * from public.bloodbank_bloodbank as bb, public.bloodbank_bloodstock as bs where bs."+blood+">0 and bb.id=bs.bloodbank_id order by bs."+blood+" desc;"
cursor = connection.cursor()
cursor.execute(sql)
bloodbanks = cursor.fetchall()
You could be more specific in your question, but I believe you have a variable called blood which contains the string name of the column, and that the columns a_pos, b_pos, etc. are numeric.
You can use a dictionary to create keyword arguments from strings:
filter_dict = {'bloodstock__' + blood + '__gt': 0}
bloodbanks = Bloodbank.objects.filter(**filter_dict)
This will get you Bloodbank objects that have a related bloodstock with a greater than zero value in the bloodgroup represented by the blood variable.
Note that the way I have written this, you don't get the bloodstock columns selected, and you may get duplicate bloodbanks. If you want to eliminate duplicate bloodbanks you can add .distinct() to your query. The bloodstocks are available for each bloodbank instance using .bloodstock_set.all().
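For instance, a minimal sketch of how this could be used end to end, assuming the Bloodbank and Bloodstock models from the question and an illustrative blood value:
blood = 'a_pos'  # column name chosen by the user at runtime
filter_dict = {'bloodstock__' + blood + '__gt': 0}

# Distinct blood banks that have stock of the chosen blood group
bloodbanks = Bloodbank.objects.filter(**filter_dict).distinct()

for bank in bloodbanks:
    # Related stock rows are reachable via the default reverse accessor
    for stock in bank.bloodstock_set.all():
        print(bank.name, getattr(stock, blood))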
The ORM will generate SQL using a join. Alternatively, you can use an EXISTS in the WHERE clause with no join:
from django.db.models import Exists, OuterRef

filter_dict = {blood + '__gt': 0}
exists = Exists(Bloodstock.objects.filter(
    bloodbank_id=OuterRef('id'),
    **filter_dict
))
bloodbanks = Bloodbank.objects.filter(exists)
There will be no need for a .distinct() in this case.
Hi guys:) I'm a newbie at programming and would like to ask for help in creating a function to help reduce redundancy in my code. I have successfully created a database holding 5 different tables for data of different countries. All tables have the same structure (see attached screenshots for reference). My objective is to calculate the summation of all rows within all the different tables for a particular parameter (type of pollution).
I have managed to write code to select only the particular data I need for one country. (I tried writing code to calculate the summation but couldn't figure it out, so I decided to just select the data and then calculate the values manually with a calculator. I know that sort of defeats the purpose of programming, but at my beginner level it feels like the only way I can do it.) My issue is that I have five countries, so I don't want to repeat the same block of code for the different countries. This is my code for one country:
import sqlite3

def read_MaltaData():
    conn = sqlite3.connect('FinalProjectDatabase.sqlite3')
    Malta = conn.cursor()
    Malta.execute("SELECT * FROM MaltaData WHERE AirPollutant = 'PM10'")
    result = Malta.fetchall()
    print(result)
my result is this:
[('Malta', 'Valletta', 'MT00005', 'Msida', 'PM10', 64.3, 'ug/m3', 'Traffic', 'urban', 14.489985999999998, 35.895835999489535, 2.0), ('Malta', None, etc.
(I am going to manually calculate the value I require, in this case 64.3 plus the value from the next row, as I don't know how to do it in Python.)
To clarify, my aim isn't to have a sum total across all the tables as one whole value (i.e. I don't want to add the values of all the countries all together). My desired output should look something like this:
Malta summation value
italy summation value
france summation value
and not like this
countries all together = one whole value (i.e. all summation values added together)
I would greatly appreciate any help I can get. Unfortunately I am not able to share the database with you, which is why I am sharing screenshots of it instead.
image of all 5 different tables in one database:
image of one table (all tables look the same, just with different values)
You can use UNION ALL to get a row for each country:
SELECT 'France' country, SUM(AirPollutionLevel) [summation value] FROM FranceData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Germany' country, SUM(AirPollutionLevel) [summation value] FROM GermanyData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Italy' country, SUM(AirPollutionLevel) [summation value] FROM ItalyData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Malta' country, SUM(AirPollutionLevel) [summation value] FROM MaltaData WHERE AirPollutant = 'PM10'
UNION ALL
SELECT 'Poland' country, SUM(AirPollutionLevel) [summation value] FROM PolandData WHERE AirPollutant = 'PM10'
If you pass the country name as argument to the data retrieval function, you can generate the table names dynamically (note the f-string arguments in execute and print):
First draft
def print_CountryData(country):
    conn = sqlite3.connect('FinalProjectDatabase.sqlite3')
    cur = conn.cursor()
    cur.execute(f"SELECT SUM(AirPollutionLevel) FROM {country}Data WHERE AirPollutant = 'PM10'")
    sumVal = cur.fetchone()[0]
    print(f"{country} {sumVal}")

# example call:
for country in ('France', 'Germany', 'Italy', 'Malta', 'Poland'):
    print_CountryData(country)
While building query strings yourself with plain string functions is discouraged in the sqlite3 documentation for security reasons, in your very case, where you have total control of the actual arguments, I'd consider it safe.
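If you want an extra safeguard anyway, a simple whitelist check before the table name is interpolated is enough. This is only a sketch, assuming the fixed set of countries from the question:
KNOWN_COUNTRIES = ('France', 'Germany', 'Italy', 'Malta', 'Poland')

def print_CountryData(country):
    # Refuse anything that is not a known table-name prefix before it
    # ever reaches the f-string that builds the SQL.
    if country not in KNOWN_COUNTRIES:
        raise ValueError(f"unknown country: {country}")
    conn = sqlite3.connect('FinalProjectDatabase.sqlite3')
    cur = conn.cursor()
    cur.execute(f"SELECT SUM(AirPollutionLevel) FROM {country}Data "
                "WHERE AirPollutant = 'PM10'")
    print(f"{country} {cur.fetchone()[0]}")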
This answer adapts the summation from the great answer given by forpas but declines to move the repetition to SQL. It also shows both integration with Python and output formatting.
MRE-style version
This is an improved version of my first answer, transformed into a Minimal, Reproducible Example and combined with output. Also, some performance improvements were made, for instance opening the database only once.
import sqlite3
import random  # to simulate actual pollution values

# Countries we have data for
countries = ('France', 'Germany', 'Italy', 'Malta', 'Poland')

# There is one table for each country
def tableName(country):
    return country + 'Data'

# Generate minimal versions of the tables, filled with random data
def setup_CountryData(cur):
    for country in countries:
        cur.execute(f'''CREATE TABLE {tableName(country)}
                        (AirPollutant text, AirPollutionLevel real)''')
        for i in range(5):
            cur.execute(f"""INSERT INTO {tableName(country)} VALUES
                            ('PM10', {100*random.random()})""")

# Print the summed-up pollution data for each country
def print_CountryData(cur):
    for country in countries:
        cur.execute(f"""SELECT SUM(AirPollutionLevel) FROM
                        {tableName(country)} WHERE AirPollutant = 'PM10'""")
        sumVal = cur.fetchone()[0]
        print(f"{country:10} {sumVal:9.5f}")

# For testing, we use an in-memory database
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
setup_CountryData(cur)

# The functionality actually required
print_CountryData(cur)
Sample output:
France 263.79430
Germany 245.20942
Italy 225.72068
Malta 167.72690
Poland 290.64190
It's often hard to evaluate a solution without actually trying it. That's the reason why questioners on Stack Overflow are constantly encouraged to ask in this style: it makes it much more likely that someone will understand and fix the problem quickly.
If the database is not too big you could use pandas.
This approach is less efficient than using SQL queries directly but can be used if you want to explore the data interactively in a notebook for example.
You can create a dataframe from your SQLite db using pandas.read_sql_query
and then perform your calculation using pandas.DataFrame methods, which are designed for this type of task.
For your specific case:
import sqlite3
import pandas as pd
conn = sqlite3.connect(db_file)
query = "SELECT * FROM MaltaData WHERE AirPollutant = 'PM10'"
df = pd.read_sql_query(query, conn)
# check dataframe content
print(df.head())
If I understood correctly, you then want to compute the sum of the values in a given column:
s = df['AirPollutionLevel'].sum()
If you have missing values you might want to fill them with 0s before summing:
s = df['AirPollutionLevel'].fillna(0).sum()
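To get the per-country output from the question with this approach, a sketch of the loop could look like this (it assumes tables named FranceData, MaltaData, etc., as in the other answers):
import sqlite3
import pandas as pd

conn = sqlite3.connect('FinalProjectDatabase.sqlite3')

for country in ('France', 'Germany', 'Italy', 'Malta', 'Poland'):
    # Load only the PM10 rows of this country's table into a DataFrame
    df = pd.read_sql_query(
        f"SELECT * FROM {country}Data WHERE AirPollutant = 'PM10'", conn)
    # Sum the pollution column, treating missing values as 0
    total = df['AirPollutionLevel'].fillna(0).sum()
    print(country, total)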
I need a little help with expressing code like this in SQLAlchemy:
SELECT
    s.agent_id,
    s.property_id,
    p.address_zip,
    (
        SELECT v.valuation
        FROM property_valuations v
        WHERE v.zip_code = p.address_zip
        ORDER BY ABS(DATEDIFF(v.as_of, s.date_sold))
        LIMIT 1
    ) AS back_valuation
FROM sales s
JOIN properties p ON s.property_id = p.id
The inner subquery is aimed at getting the property value from the table property_valuations, with columns (zip_code INT, valuation DECIMAL, as_of DATE), closest to the date of sale from the table sales. I know how to rewrite it, but I am completely stuck on the order_by expression - I cannot prepare the subquery so that I can pass the ordering member later.
Currently I have the following queries:
subquery = (
    session.query(PropertyValuation)
    .filter(PropertyValuation.zip_code == Property.address_zip)
    .order_by(func.abs(func.datediff(PropertyValuation.as_of, Sale.date_sold)))
    .limit(1)
)

query = session.query(Sale).join(Sale.property_)
How to combine these queries together?
How to combine these queries together?
Use as_scalar(), or label():
subquery = (
    session.query(PropertyValuation.valuation)
    .filter(PropertyValuation.zip_code == Property.address_zip)
    .order_by(func.abs(func.datediff(PropertyValuation.as_of, Sale.date_sold)))
    .limit(1)
)

query = session.query(Sale.agent_id,
                      Sale.property_id,
                      Property.address_zip,
                      # `subquery.as_scalar()` or
                      subquery.label('back_valuation'))\
    .join(Property)
Using as_scalar() limits returned columns and rows to 1, so you cannot get the whole model object using it (as query(PropertyValuation) is a select of all the attributes of PropertyValuation), but getting just the valuation attribute works.
but I completely stuck on order_by expression - I cannot prepare subquery to pass ordering member later.
There's no need to pass it later. Your current way of declaring the subquery is fine as it is, since SQLAlchemy can automatically correlate FROM objects to those of an enclosing query. I tried creating models that somewhat represent what you have, and here's how the query above works out (with added line-breaks and indentation for readability):
In [10]: print(query)
SELECT sale.agent_id AS sale_agent_id,
       sale.property_id AS sale_property_id,
       property.address_zip AS property_address_zip,
       (SELECT property_valuations.valuation
        FROM property_valuations
        WHERE property_valuations.zip_code = property.address_zip
        ORDER BY abs(datediff(property_valuations.as_of, sale.date_sold))
        LIMIT ? OFFSET ?) AS back_valuation
FROM sale
JOIN property ON property.id = sale.property_id
I have a model for device GPS telemetry (declared using the declarative base), referred to as DevicesGps below. I make a query like this:
models = session.query(
    DevicesGps.ReceivedDateUtc,
    DevicesGps.ReceivedTimeUtc,
    DevicesGps.Latitude,
    DevicesGps.Longitude)
And it renders as:
SELECT
    devices_gps."ReceivedDateUtc" AS "devices_gps_ReceivedDateUtc",
    devices_gps."ReceivedTimeUtc" AS "devices_gps_ReceivedTimeUtc",
    devices_gps."Latitude" AS "devices_gps_Latitude",
    devices_gps."Longitude" AS "devices_gps_Longitude"
FROM devices_gps
My question: how do I change the names which go after the AS keyword to something I want (like "gps_telemetry_ReceivedDateUtc")?
Background: these names are important for me because I do pandas.read_sql with this query, and the names become the DataFrame's column names.
Add .label('desired_name') after each column. In your case it would look like
models = session.query(
    DevicesGps.ReceivedDateUtc.label("gps_telemetry_ReceivedDateUtc"),
    DevicesGps.ReceivedTimeUtc.label("gps_telemetry_ReceivedTimeUtc"),
    DevicesGps.Latitude.label("gps_telemetry_Latitude"),
    DevicesGps.Longitude.label("gps_telemetry_Longitude")
)
I'd like to know if it's possible to generate a SELECT COUNT(*) FROM TABLE statement in SQLAlchemy without explicitly asking for it with execute().
If I use:
session.query(table).count()
then it generates something like:
SELECT count(*) AS count_1 FROM
(SELECT table.col1 as col1, table.col2 as col2, ... from table)
which is significantly slower in MySQL with InnoDB. I am looking for a solution that doesn't require the table to have a known primary key, as suggested in Get the number of rows in table using SQLAlchemy.
Query for just a single known column:
session.query(MyTable.col1).count()
I managed to render the following SELECT with SQLAlchemy on both layers.
SELECT count(*) AS count_1
FROM "table"
Usage from the SQL Expression layer
from sqlalchemy import select, func, Integer, Table, Column, MetaData

metadata = MetaData()

table = Table("table", metadata,
              Column('primary_key', Integer),
              Column('other_column', Integer)  # just to illustrate
              )

print(select([func.count()]).select_from(table))
Usage from the ORM layer
You just subclass Query (you probably have anyway) and provide a specialized count method, like this one:
from sqlalchemy.orm import Query
from sqlalchemy.sql.expression import func

class BaseQuery(Query):
    def count_star(self):
        count_query = (self.statement.with_only_columns([func.count()])
                       .order_by(None))
        return self.session.execute(count_query).scalar()
Please note that order_by(None) resets the ordering of the query, which is irrelevant to the counting.
Using this method you can have a count(*) on any ORM query that will honor all the filter and join conditions already specified.
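As a usage sketch (the engine URL and MyModel are placeholders): the subclass is wired in through sessionmaker's query_cls argument, after which count_star() is available on every query produced by that session:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# BaseQuery is the Query subclass defined above
engine = create_engine("sqlite:///example.db")
Session = sessionmaker(bind=engine, query_cls=BaseQuery)
session = Session()

# All filters and joins already applied to the query are kept in the
# generated SELECT count(*) statement.
n = session.query(MyModel).filter(MyModel.some_column == 'value').count_star()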
I needed to do a count of a very complex query with many joins. I was using the joins as filters, so I only wanted to know the count of the actual objects. count() was insufficient, but I found the answer in the docs here:
http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
The code would look something like this (to count user objects):
from sqlalchemy import func
session.query(func.count(User.id)).scalar()
As an addition to the "Usage from the ORM layer" part of the accepted answer: count(*) can be done for the ORM using query.with_entities(func.count()), like this:
session.query(MyModel).with_entities(func.count()).scalar()
It can also be used in more complex cases, when we have joins and filters - the important thing here is to place with_entities after the joins, otherwise SQLAlchemy could raise a "Don't know how to join" error.
For example:
we have a User model (id, name) and a Song model (id, title, genre)
we have user-song data in the UserSong model (user_id, song_id, is_liked), where user_id + song_id is the primary key
We want to get the number of a user's liked rock songs:
SELECT count(*)
FROM user_song
JOIN song ON user_song.song_id = song.id
WHERE user_song.user_id = %(user_id)s
  AND user_song.is_liked IS 1
  AND song.genre = 'rock'
This query can be generated in the following way:
from sqlalchemy import and_, func

user_id = 1

query = session.query(UserSong)
query = query.join(Song, Song.id == UserSong.song_id)
query = query.filter(
    and_(
        UserSong.user_id == user_id,
        UserSong.is_liked.is_(True),
        Song.genre == 'rock'
    )
)
# Note: important to place `with_entities` after the join
query = query.with_entities(func.count())
liked_count = query.scalar()
Complete example is here.
If you are using the SQL Expression style approach, there is another way to construct the count statement, provided you already have your table object.
Preparation to get the table object (there are also other ways to do this):
import sqlalchemy
database_engine = sqlalchemy.create_engine("connection string")
# Populate existing database via reflection into sqlalchemy objects
database_metadata = sqlalchemy.MetaData()
database_metadata.reflect(bind=database_engine)
table_object = database_metadata.tables.get("table_name")  # This is just to illustrate how to get the table_object
Issuing the count query on the table_object
query = table_object.count()
# This will produce something like, where id is a primary key column in "table_name" automatically selected by sqlalchemy
# 'SELECT count(table_name.id) AS tbl_row_count FROM table_name'
count_result = database_engine.scalar(query)
I'm not clear on what you mean by "without explicitly asking for it with execute()", so this might be exactly what you are not asking for.
OTOH, this might help others.
You can just run the textual SQL:
your_query="""
SELECT count(*) from table
"""
the_count = session.execute(text(your_query)).scalar()
def test_query(val: str):
    # `database_engine` is assumed here to expose a query() helper;
    # a plain SQLAlchemy Engine does not have such a method.
    query = f"select count(*) from table where col1='{val}'"
    rtn = database_engine.query(query)
    cnt = rtn.one().count
You can work out the details by inspecting the intermediate objects in a debug watch.
query = session.query(table.column).filter().with_entities(func.count(table.column.distinct()))
count = query.scalar()
this worked for me.
Gives the query:
SELECT count(DISTINCT table.column) AS count_1
FROM table where ...
Below is a way to find the count of any query:
aliased_query = alias(query)
db.session.query(func.count('*')).select_from(aliased_query).scalar()
Here is the link to the reference document if you want to explore more options or read details.