Error with params in Pandas read_sql_query - python

I have a simple SQL query and I want to use params, but it returns a database error when I add the param. If I change the dynamic param to a fixed value, it works.
My code is quite simple:
sql = """ SELECT p.description, SUM(p.price) as price
FROM product p
WHERE p.external_id = ?
GROUP BY p.description """
result = pd.read_sql_query(sql, con=conn, params=[202101])
The error is:
SyntaxError: syntax error at or near "GROUP"
LINE 4: p.external_id = ? GROUP BY p.description...
The problem can't be the query, since if I replace the ? with 202101 and remove the params in read_sql_query it works.
What am I doing wrong?
I'm using Google Colab

# Using an f-string to build the query works:
params = [202101]
sql = f""" SELECT p.description, SUM(p.price) as price
FROM product p
WHERE p.external_id = {params[0]}
GROUP BY p.description """
result = pd.read_sql_query(sql, con=conn)
# Use a tuple when filtering on several values:
params = tuple([202101, 202102, 202103])
sql = f""" SELECT p.description, SUM(p.price) as price
FROM product p
WHERE p.external_id in {params}
GROUP BY p.description """
result = pd.read_sql_query(sql, con=conn)
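A note on the original parameterized attempt: the placeholder style has to match the driver behind conn, and pandas forwards params straight to it. The error message looks like PostgreSQL, and psycopg2 expects %s rather than ?, so the following sketch (assuming a psycopg2 connection; adjust the paramstyle for your driver) keeps the query parameterized:
# %s is the paramstyle psycopg2 uses; pandas passes params through to cursor.execute
sql = """ SELECT p.description, SUM(p.price) as price
FROM product p
WHERE p.external_id = %s
GROUP BY p.description """
result = pd.read_sql_query(sql, con=conn, params=[202101])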

Related

Prestodb + Python: Using a List as Query Argument

I'm trying to use prestodb in Python and pass a list of numbers as an argument in a query, but it gives this error:
PrestoUserError: PrestoUserError(type=USER_ERROR, name=TYPE_MISMATCH, message="line 208:33: IN value and list items must be the same type: bigint", query_id=20211122_175131_24052_rruhu)
The code is similar to this:
import prestodb
from prestodb import dbapi
import os
conn = prestodb.dbapi.connect(
    host=os.environ['aa'],
    port=os.environ['bb'],
    user=os.environ['cc'],
    password=os.environ['dd'],
    catalog='hive'
)
date_start = '2021-10-10'
date_end = '2021-10-15'
list_id = (1,2,3,4)
sql = '''
SELECT
*
FROM
table
WHERE
DATE BETWEEN '{date_start}'
AND '{date_end}'
AND ID in ({list_id})
'''.format(date_start=date_start,date_end=date_end,list_id=list_id)
cur = conn.cursor()
cur.execute(sql)
query_result = cur.fetchall()
format will not render list_id the way the IN clause needs it: the tuple keeps its own parentheses, so the clause ends up double-wrapped and Presto compares a bigint against a row value. Try joining the ids into a comma-separated string with ','.join(map(str, list_id)):
sql = '''
SELECT
*
FROM
table
WHERE
DATE BETWEEN '{date_start}'
AND '{date_end}'
AND ID in ({list_id})
'''.format(date_start=date_start,date_end=date_end,list_id=','.join(map(str, list_id)))
UPD
Or, as suggested by @Tomerikoo, just use str(list_id) and remove the extra parentheses from the query template:
sql = '''
SELECT
*
FROM
table
WHERE
DATE BETWEEN '{date_start}'
AND '{date_end}'
AND ID in {list_id}
'''.format(date_start=date_start,date_end=date_end,list_id=str(list_id))
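For illustration, this is roughly what each formatting approach renders for the IN clause (using the sample ids from the question), which is why the first version trips Presto's type check:
list_id = (1, 2, 3, 4)
print('({})'.format(list_id))                      # ((1, 2, 3, 4)) -> double-wrapped, TYPE_MISMATCH
print('({})'.format(','.join(map(str, list_id))))  # (1,2,3,4)      -> what the IN clause needs
print('{}'.format(str(list_id)))                   # (1, 2, 3, 4)   -> also fine, the parentheses come from the tuple itself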

Python Script to get multi table counts

I'm trying to write a Python script to get counts from some tables for monitoring; it looks a bit like the code below. I'm trying to get output such as the expected output shown below and have tried using Python multi-dimensional arrays, but I'm not having any luck.
Expected Output:
('oltptransactions:', [(12L,)])
('oltpcases:', [(24L,)])
Script:
import psycopg2
# Connection with the DataBase
conn = psycopg2.connect(user = "appuser", database = "onedb", host = "192.168.1.1", port = "5432")
cursor = conn.cursor()
sql = """SELECT COUNT(id) FROM appuser.oltptransactions"""
sql2 = """SELECT count(id) FROM appuser.oltpcases"""
sqls = [sql,sql2]
for i in sqls:
    cursor.execute(i)
    result = cursor.fetchall()
    print('Counts:', result)
conn.close()
Current output:
[root@pgenc python_scripts]# python multi_getrcount.py
('Counts:', [(12L,)])
('Counts:', [(24L,)])
Any help is appreciated.
Thanks!
I am a bit reluctant to show this approach, because best practice recommends never building a dynamic SQL string and always using a constant string plus parameters, but this is one use case where computing the string is legitimate:
a table name cannot be a parameter in SQL
the input only comes from the program itself and is fully under its control
Possible code:
sql = """SELECT count(*) from appuser.{}"""
tables = ['oltptransactions', 'oltpcases']
for t in tables:
    cursor.execute(sql.format(t))
    result = cursor.fetchall()
    print((t + ':', result))
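If you would rather not splice the table name in with format at all, psycopg2 also ships a sql module for composing identifiers safely. A minimal sketch with the same table names (the two-part Identifier needs psycopg2 2.8 or newer):
from psycopg2 import sql
count_query = sql.SQL("SELECT count(*) FROM {}")
tables = ['oltptransactions', 'oltpcases']
for t in tables:
    # Identifier quotes the schema and table name, so they are never treated as SQL text
    cursor.execute(count_query.format(sql.Identifier('appuser', t)))
    result = cursor.fetchall()
    print((t + ':', result))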
I believe something like the code below should work; I'm unable to test it because of a certificate issue.
sql = """SELECT 'oltptransactions', COUNT(id) FROM appuser.oltptransactions"""
sql2 = """SELECT 'oltpcases', COUNT(id) FROM appuser.oltpcases"""
sqls = [sql,sql2]
for i in sqls:
    cursor.execute(i)
    for name, count in cursor:
        print(name, count)
Or
sql = """SELECT 'oltptransactions :'||COUNT(id) FROM appuser.oltptransactions"""
sql2 = """SELECT 'oltpcases :'||COUNT(id) FROM appuser.oltpcases"""
sqls = [sql,sql2]
for i in sqls:
    cursor.execute(i)
    result = cursor.fetchall()
    print(result)

Bad Request error while querying data from bigquery in a loop

I am querying data from BigQuery in a loop, using the get_data_from_bq method shown below:
def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh +"\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()
While the data returned by the first query (iteration) is correct, all subsequent queries throw the exception below:
results = query_job.result()
File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2415, in result
super(QueryJob, self).result(timeout=timeout)
File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 660, in result
return super(_AsyncJob, self).result(timeout=timeout)
File "/home/ishank/.local/lib/python2.7/site-packages/google/api_core/future/polling.py", line 120, in result
raise self._exception
google.api_core.exceptions.BadRequest: 400 Cannot explicitly modify anonymous table xyz:_bf4dfedaed165b3ee62d8a9efa.anon1db6c519_b4ff_dbc67c17659f
Edit 1:
Below is a sample query that throws the above exception. Also, it runs smoothly in the BigQuery console.
select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in ("168561","175936","161684","161681","161686") and eventTime > CAST("2018-05-30 11:21:19" as DATETIME) group by eventType, productId order by productId;
I had the exact same issue. The problem is not the query itself; it's that you are most likely reusing the same QueryJobConfig. When you perform a query, unless you set a destination, BigQuery stores the result in an anonymous table that is recorded in the QueryJobConfig object. If you reuse that configuration, BigQuery tries to store the new result in the same anonymous table, hence the error.
I don't particularly like this behaviour, to be honest.
You should rewrite your code like this:
def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh +"\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=QueryJobConfig())
    return query_job.result()
Hope this helps!
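As a side note, separate from the fix above: if the query were written in standard SQL instead of the legacy [xyz:xyz.abc] syntax, the ids and the timestamp could be bound as query parameters rather than concatenated into the string. The table path and column names below are just the ones from the question, so treat this as a sketch:
from google.cloud import bigquery
def get_data_from_bq(product_ids, time_thresh):
    # fresh QueryJobConfig per call, with the values bound as query parameters
    # time_thresh is assumed to be a datetime.datetime; bigquery_client is the client from the question
    job_config = bigquery.QueryJobConfig()
    job_config.query_parameters = [
        bigquery.ArrayQueryParameter("ids", "STRING", [str(i) for i in product_ids]),
        bigquery.ScalarQueryParameter("time_thresh", "DATETIME", time_thresh),
    ]
    query = (
        "SELECT productId, eventType, COUNT(*) AS count "
        "FROM `xyz.xyz.abc` "
        "WHERE productId IN UNNEST(@ids) AND eventTime > @time_thresh "
        "GROUP BY eventType, productId ORDER BY productId"
    )
    return bigquery_client.query(query, job_config=job_config).result()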
Edited:
Federico Bertola is correct about the solution and the temporary table that BigQuery writes to; see this link.
I did not get an error with my sample code querying a public table last time, but I can reproduce the error today, so it is possible this symptom appears intermittently. I can confirm the error is resolved with Federico’s suggestion.
You can get the “super(QueryJob, self).result(timeout=timeout)” error when the query string lacks quotes around the parameters in the query. It seems you have made a similar mistake with the parameter format_strings in your query. You can fix this problem by making sure the quotes around the parameter are escaped:
(" + myparam + ")
, should be written as
(\"" + myparam + "\")
You should examine your query string where you use parameters, and start with a simpler query such as
select productId, eventType, count(*) as count from `xyz:xyz.abc`
, and grow your query as you go.
For the record, here is what worked for me:
from google.cloud import bigquery
client = bigquery.Client()
job_config = bigquery.QueryJobConfig()
def get_data_from_bq(myparam):
    query = "SELECT word, SUM(word_count) as count FROM `publicdata.samples.shakespeare` WHERE word IN (\""+myparam+"\") GROUP BY word;"
    query_job = client.query(query, job_config=job_config)
    return query_job.result()
mypar = "raisin"
x = 1
while (x<9):
    iterator = get_data_from_bq(mypar)
    print "==%d iteration==" % x
    x += 1

How to execute join queries of MySQL in Django?

My SQL query, which runs perfectly in the terminal, looks like this:
select t.txid, t.from_address, t.to_address, t.value, t.timestamp,
t.conformations, t.spent_flag,t.spent_txid from transaction_details t
where t.to_address =(select distinct a.address from address_master a
inner join panel_user p on a.user = p.user and a.user= "auxesis");
Now I tried using it in Django like this:
sql = """ select t.txid, t.from_address, t.to_address,t.value, t.timestamp, t.conformations, t.spent_flag,t.spent_txid from
transaction_details t where t.to_address =(select distinct a.address from
address_master a inner join panel_user p on a.user = p.user and a.user= "%s" """),%(user)
cursor.execute(sql)
res = cursor.fetchall()
But it's not working. Can anyone please help me with it?
You're trying to use string formatting to build an SQL query. Don't do that; use parameterized queries. When you do, you don't add quotes around the placeholders: the database connector handles escaping of the parameters for you. Just pass the arguments as a tuple:
sql = """ select t.txid, t.from_address, t.to_address, t.value, t.timestamp, t.conformations, t.spent_flag, t.spent_txid from
transaction_details t where t.to_address = (select distinct a.address from
address_master a inner join panel_user p on a.user = p.user and a.user = %s) """
cursor.execute(sql, (user,))
res = cursor.fetchall()

How can I get int result from web python sql query?

My code is as follows. My question is, how can I get the int result from the query?
import web
db = web.database(...)
rs = db.query('select max(id) from tablename')
The query method returns a list of rows. You can then access your calculated column by giving it an alias. If you don't want to use an alias, I think you can do rs[0]['max(id)'].
rs = db.query('select max(id) as max_value from tablename')
print rs[0].max_value
This is based on the example used in the web.py docs: http://webpy.org/cookbook/query
Here is the example from the link:
import web
db = web.database(dbn='postgres', db='mydata', user='dbuser', pw='')
results = db.query("SELECT COUNT(*) AS total_users FROM users")
print results[0].total_users # -> prints number of entries in 'users' table
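And if you specifically need a plain Python int (some drivers hand the aggregate back as a long or Decimal, which is an assumption about your setup), an explicit cast on the aliased column does it:
rs = db.query('select max(id) as max_value from tablename')
max_id = int(rs[0].max_value)  # cast in case the driver returns long/Decimal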
