How to correctly use SQL joins/subqueries in Sqlalchemy - python

Say I have the following SQL code and I want to change it to Sqlalchemy:
SELECT amount FROM table1
JOIN table2
ON table2.id = table1.b_id
JOIN (SELECT id FROM table3 WHERE val1 = %s AND val2 = %s) inst
ON inst.id = table1.i_id
WHERE
val3 = %s
I've tried making a subquery for the SELECT id FROM table3 clause as follows:
subq = session.query(table3.id).filter(and_(table3.val1 == 'value', table3.val2 == 'value')).subquery()
And then putting everything together:
query = session.query(table1).join(table2).filter(table2.id == table1.b_id).\
    join(subq).filter(table1.val3 == 'value')
When I output query.first().amount, this works for a few examples, but for some queries I get no results when there should be something there, so I must be messing up somewhere. Any ideas where I'm going wrong? Thanks.

The query below should produce exactly the SQL you have. It is not much different from yours, but it removes a few unnecessary things.
So if it does not work, then your original SQL might not work either. Therefore, I assume that your issue is not the SQL but either the data or the parameters for that query. And you can always print out the generated SQL by setting engine.echo = True.
val1, val2, val3 = 'value', 'value', 'value'  # NOTE: specify filter values

subq = (session.query(table3.id)
        .filter(and_(table3.val1 == val1, table3.val2 == val2))
        ).subquery(name='inst')

quer = (session.query(table1.amount)  # NOTE: select only one column
        .join(table2)                 # NOTE: no need for filter(...)
        .join(subq)
        .filter(table1.val3 == val3)
        ).first()

print(quer and quer.amount)
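As a side note on the echo suggestion: a minimal sketch of what that looks like (the connection URL is just a placeholder), plus str() as a way to see the SQL for a query without running it:
from sqlalchemy import create_engine

# placeholder URL; echo=True makes the engine log every SQL statement it emits
engine = create_engine('sqlite:///example.db', echo=True)

# alternatively, str() renders the SQL of an ORM Query without executing it
print(str(session.query(table1.amount).join(table2).join(subq)))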

Related

SQL query f string formatting in Python script

I have been trying to apply f-string formatting to the colname parameter inside the SQL query for a script I am building, but I keep getting a parse exception error.
def expect_primary_key_have_relevant_foreign_key(spark_df1, spark_df2, colname):
    '''
    Check that all the primary keys have a relevant foreign key
    '''
    # Create Temporary View
    spark_df1.createOrReplaceTempView("spark_df1")
    spark_df2.createOrReplaceTempView("spark_df2")
    # Wrap Query in spark.sql
    result = spark.sql("""
    select df1.*
    from spark_df1 df1
    left join
    spark_df2 df2
    f"on trim(upper(df1.{colname})) = trim(upper(df2.{colname}))"
    f"where df2.{colname} is null"
    """)
    if result == 0:
        print("Validation Passed!")
    else:
        print("Validation Failed!")
    return result
I found the solution: the f goes before the triple quotes ("""), as follows:
# Wrap Query in spark.sql
result = spark.sql(f"""
select df1.*
from spark_df1 df1
left join
spark_df2 df2
on trim(upper(df1.{colname})) = trim(upper(df2.{colname}))
where df2.{colname} is null
""")

Can query name variable be parameterized in Python API?

The following API function works. But I would like to parameterize the query name so that I don't have to use if...else. Instead, I would like to take the parameter from the query URL, concatenate it onto the query name variable, and execute the query.
I would like to be able to stick "qry_" + <reportId> together and use it as the query name variable, like qry_R01, qry_R02, or qry_R03. Is that possible?
def get_report():
    reportId = request.args.get('reportId', '')
    qry_R01 = """
        SELECT
            column1,
            column2,
            column3
        FROM
            table1
        """
    qry_R02 = """
        SELECT
            column1,
            column2,
            column3
        FROM
            table2
        """
    qry_R03 = """
        SELECT
            column1,
            column2,
            column3
        FROM
            table3
        """
    db = OracleDB('DB_RPT')
    if reportId == 'R01':
        db.cursor.execute(qry_R01)
    elif reportId == 'R02':
        db.cursor.execute(qry_R02)
    elif reportId == 'R03':
        db.cursor.execute(qry_R03)
    json_data = db.render_json_data('json_arr')
    db.connection.close()
    return json_data
It seems what you need in this case is to map from reportId to the table; the rest of the query is identical. The solution below uses a dictionary and str.format():
def get_report():
    reportId = request.args.get('reportId', '')

    # this maps our possible report IDs to their relevant tables
    reportTableMap = {
        'R01': 'table1',
        'R02': 'table2',
        'R03': 'table3',
    }

    # ensure this report ID is valid, else we'll end up with a KeyError later on
    if reportId not in reportTableMap:
        return 'Error: invalid report'

    baseQuery = '''
        SELECT
            column1,
            column2,
            column3
        FROM {table}
        '''

    db = OracleDB('DB_RPT')
    db.cursor.execute(baseQuery.format(table=reportTableMap[reportId]))
    json_data = db.render_json_data('json_arr')
    db.connection.close()
    return json_data
This solution only works for fairly simple cases, though, and does risk leaving open a SQL injection attack. A better solution would be to use prepared statements but the exact code to do that depends on the database driver being used.
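For user-supplied values that belong in a WHERE clause (table names themselves cannot be bound), a rough sketch with cx_Oracle-style named bind parameters could look like this; the status column and the extra WHERE clause are made up for illustration, and OracleDB is the wrapper from the question:
db = OracleDB('DB_RPT')
# the table name is still whitelisted via the dict; :status is bound by the driver,
# so the value is never interpolated into the SQL text
db.cursor.execute(
    baseQuery.format(table=reportTableMap[reportId]) + " WHERE status = :status",
    {"status": request.args.get('status', 'ACTIVE')},
)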

Get primary keys after update in SQLAlchemy Core?

I've been using .inserted_primary_key in SQLAlchemy to get primary keys after inserts, for example:
my_id = sql_conn.execute(my_table.insert(my_data_dict)).inserted_primary_key
Is there a way to get the same thing after an update? Like:
my_id = sql_conn.execute(
    my_table.update().where(some_cond).values(my_data_dict)).updated_primary_key
I'm on MySQL and I could do this with actual SQL like:
SET @update_id := 0;
UPDATE some_table SET column_name = 'value', id = (SELECT @update_id := id)
WHERE some_other_column = 'blah' LIMIT 1;
SELECT @update_id;
Any way to mimic that, or something like it?
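One way to mimic that from SQLAlchemy is to push the same statements through text() on a single connection (MySQL user variables are per-connection); a rough sketch, assuming a plain Core engine and the table/columns from the snippet above:
from sqlalchemy import text

# begin() commits the UPDATE at the end; the user variable only lives on this connection
with engine.begin() as conn:
    conn.execute(text("SET @update_id := 0"))
    conn.execute(
        text("UPDATE some_table SET column_name = :val, id = (SELECT @update_id := id) "
             "WHERE some_other_column = :cond LIMIT 1"),
        {"val": "value", "cond": "blah"},
    )
    my_id = conn.execute(text("SELECT @update_id")).scalar()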

How can I get the youngest objects from SQLAlchemy?

Each row in my table has a date. The date is not unique. The same date is present more than one time.
I want to get all objects with the youngest date.
My solution works, but I am not sure if this is an elegant SQLAlchemy way.
query = _session.query(Table._date) \
    .order_by(Table._date.desc()) \
    .group_by(Table._date)

# this is the youngest date (type is datetime.date)
young = query.first()

query = _session.query(Table).filter(Table._date == young)
result = query.all()
Isn't there a way to put all this in one query object or something like that?
You need a HAVING clause, and you need to import the max function. Then your query will be:
from sqlalchemy import func

stmt = _session.query(Table) \
    .group_by(Table._date) \
    .having(Table._date == func.max(Table._date))
This produces a sql statement like the following.
SELECT my_table.*
FROM my_table
GROUP BY my_table._date
HAVING my_table._date = MAX(my_table._date)
If you construct your SQL statement with a select(), you can examine the SQL produced in your case using the following (I'm not sure whether this also works with Query objects):
str(stmt)
Two ways of doing this using a subquery:
from sqlalchemy import func
from sqlalchemy.orm import aliased

# note: no need to alias, but doing so in order to specify `name`
T1 = aliased(MyTable, name="T1")

# version-1:
subquery = (session.query(func.max(T1._date).label("max_date"))
            .as_scalar()
            )

# version-2:
subquery = (session.query(T1._date.label("max_date"))
            .order_by(T1._date.desc())
            .limit(1)
            .as_scalar()
            )

qry = session.query(MyTable).filter(MyTable._date == subquery)
results = qry.all()
The output should be similar to:
# version-1
SELECT my_table.id AS my_table_id, my_table.name AS my_table_name, my_table._date AS my_table__date
FROM my_table
WHERE my_table._date = (
SELECT max("T1"._date) AS max_date
FROM my_table AS "T1")
# version-2
SELECT my_table.id AS my_table_id, my_table.name AS my_table_name, my_table._date AS my_table__date
FROM my_table
WHERE my_table._date = (
SELECT "T1"._date AS max_date
FROM my_table AS "T1"
ORDER BY "T1"._date DESC LIMIT ? OFFSET ?
)
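As a side note, on SQLAlchemy 1.4 and later Query.as_scalar() is deprecated in favor of Query.scalar_subquery(), so version-1 would be spelled:
# same scalar subquery as version-1, written for SQLAlchemy 1.4+
subquery = (session.query(func.max(T1._date).label("max_date"))
            .scalar_subquery()
            )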

Annotate Sum does not work when I add a fourth values() parameter

I'm using Django 1.4 and Python 2.7.
I'm doing a Sum of some values... when I do this, it works perfectly:
CategoryAnswers.objects.using('mam').filter(category=cat["category"], brand=cat["brand"], category__segment_category=cat["category__segment_category"]).values('category__name', 'brand__name','brand__pk').annotate(total=Sum('answer'))
And generate a query:
SELECT `category`.`name`, `brand`.`name`, `category_answers`.`brand_id`, SUM(`category_answers`.`answer`) AS `total`
FROM `category_answers`
INNER JOIN `category`
ON (`category_answers`.`category_id` = `category`.`id`)
INNER JOIN `brand`
ON (`category_answers`.`brand_id` = `brand`.`id`)
WHERE (`category_answers`.`category_id` = 6 AND
`category_answers`.`brand_id` = 1 AND
`category`.`segment_category_id` = 1 )
GROUP BY `category`.`name`, `brand`.`name`, `category_answers`.`brand_id`
ORDER BY NULL
But when I add a new value, it does not work:
CategoryAnswers.objects.using('mam').order_by().filter(category=cat["category"], brand=cat["brand"], category__segment_category=cat["category__segment_category"]).values('category__name','category__pk','brand__name','brand__pk').annotate(total=Sum('answer'))
Looking at the generated query, the problem is that Django adds a wrong field (category_answers.id) to the GROUP BY:
SELECT `category`.`name`, `category_answers`.`category_id`, `brand`.`name`, `category_answers`.`brand_id`,
SUM(`category_answers`.`answer`) AS `total`
FROM `category_answers`
INNER JOIN `category`
ON (`category_answers`.`category_id` = `category`.`id`)
INNER JOIN `brand`
ON (`category_answers`.`brand_id` = `brand`.`id`)
WHERE (`category_answers`.`category_id` = 6 AND
`category_answers`.`brand_id` = 1 AND
`category`.`segment_category_id` = 1 )
GROUP BY `category_answers`.`id`, `category`.`name`, `category_answers`.`category_id`, `brand`.`name`, `category_answers`.`brand_id`
ORDER BY NULL
If I remove any one of the values it works, so I do not believe the problem is with a specific parameter... Am I doing something wrong?
I couldn't resolve this, so I did it with a raw SQL query:
cursor = connections["mam"].cursor()
cursor.execute("SELECT B.name, A.category_id, A.brand_id, SUM(A.answer) AS total, C.name FROM category_answers A INNER JOIN category B ON A.category_id = B.id INNER JOIN brand C ON A.brand_id = C.id WHERE A.brand_id = %s AND A.category_id = %s AND B.segment_category_id = %s", [cat["brand"],cat["category"],cat["category__segment_category"]])
c_answers = cursor.fetchone()
This is not the best way, but it works. :)
