I have the following Python code:
params = {}
query = 'SELECT * FROM LOGS'
if date_from and date_to:
    query += " WHERE LOG_DATE BETWEEN TO_DATE(:date_start, 'MM-DD-YYYY') AND TO_DATE(:date_end, 'MM-DD-YYYY')"
    params['date_start'] = date_from
    params['date_end'] = date_to
if structure:
    query += ' AND STRUCTURE = :structure_val'
    params['structure_val'] = structure
if status:
    query += ' AND STATUS = :status'
    params['status'] = status
cursor.execute(query, params)
Here I am conditionally adding the WHERE clause to the query. But there is an issue when I have no values for the dates: the WHERE is never added, so the later conditions start with a dangling AND. If I instead always put the WHERE in the base query, then a request with no filters produces an invalid query. Is there a better way to do this? I have been using Laravel for some time, and its query builder has a `when` method that helps add conditional where clauses. Is there anything like this in Python for cx_Oracle?
params = {}
query = 'SELECT * FROM LOGS'
query_conditions = []
if date_from and date_to:
    query_conditions.append("LOG_DATE BETWEEN TO_DATE(:date_start, 'MM-DD-YYYY') AND TO_DATE(:date_end, 'MM-DD-YYYY')")
    params['date_start'] = date_from
    params['date_end'] = date_to
if structure:
    query_conditions.append('STRUCTURE = :structure_val')
    params['structure_val'] = structure
if status:
    query_conditions.append('STATUS = :status')
    params['status'] = status
if query_conditions:
    query += ' WHERE ' + ' AND '.join(query_conditions)
cursor.execute(query, params)
Collect the conditions in a list, join them with AND, and prepend WHERE only when the list is non-empty.
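As a self-contained sketch of the same technique (driver-independent; the table and bind names follow the question, the helper name is mine), the condition-building can be wrapped in a small function:

```python
# Sketch: build a query plus bind-parameter dict from optional filters.
def build_query(date_from=None, date_to=None, structure=None, status=None):
    params = {}
    conditions = []
    if date_from and date_to:
        conditions.append("LOG_DATE BETWEEN TO_DATE(:date_start, 'MM-DD-YYYY')"
                          " AND TO_DATE(:date_end, 'MM-DD-YYYY')")
        params['date_start'] = date_from
        params['date_end'] = date_to
    if structure:
        conditions.append('STRUCTURE = :structure_val')
        params['structure_val'] = structure
    if status:
        conditions.append('STATUS = :status')
        params['status'] = status
    query = 'SELECT * FROM LOGS'
    if conditions:  # WHERE appears only when at least one filter is set
        query += ' WHERE ' + ' AND '.join(conditions)
    return query, params
```

With no filters this returns plain `SELECT * FROM LOGS`; with only `status` set it returns `SELECT * FROM LOGS WHERE STATUS = :status`, so neither the dangling-AND nor the empty-WHERE problem can occur.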
Here is the code:
# run Cassandra query to get the date of the last run
result = session.execute('select max(process_date) as process_date_max from keyspace1.job_run')
for row in result:
    date_last_run = row.process_date_max
date_last_run = str(date_last_run)
sql = """
select * from table where modifieddate > {last_run}
""".format(last_run=date_last_run)
df = mssql.get_pandas_df(sql)
I get the error: cannot compare value of datetime2 with int. Please help; I haven't found any solution for this so far.
The solution is very simple: put a single quote before and after {last_run} and your code will work.
See the example below:
result = session.execute('select max(process_date) as process_date_max from keyspace1.job_run')
for row in result:
    date_last_run = row.process_date_max
date_last_run = str(date_last_run)
sql = """
select * from table where modifieddate > '{last_run}'
""".format(last_run=date_last_run)
df = mssql.get_pandas_df(sql)
Method 2 (f-string):
sql = f" select * from table where modifieddate > '{last_run}'"
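To see why the quotes matter, here is a minimal illustration with a made-up date value: without them, SQL Server treats the date part as integer arithmetic (2021-06-01 evaluates to a number), which produces the datetime2-vs-int comparison error; with them, it receives a string literal it can convert to datetime2.

```python
date_last_run = "2021-06-01 00:00:00"  # made-up value for illustration

# Unquoted: the server sees 2021-06-01 ... and tries integer arithmetic.
bad = "select * from t where modifieddate > {}".format(date_last_run)

# Quoted: the server sees a string literal convertible to datetime2.
good = "select * from t where modifieddate > '{}'".format(date_last_run)
```

Note that interpolating values into SQL (including the f-string in Method 2) is open to SQL injection when the value is not fully under your control; where the driver supports it, binding the value as a parameter with the driver's placeholder style is the safer choice.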
The function below takes the parameters endTime, startTime, list1 and column_filter; I am trying to build a query whose WHERE clause conditions are parameterized.
endT = endTime
startT = startTime
myList = ",".join("'" + str(i) + "'" for i in list1)
queryArgs = {'db': devDB,
             'schema': dbo,
             'table': table_xyz,
             'columns': ','.join(column_filter)}
query = '''
    WITH TIME_SERIES AS
        (SELECT ROW_NUMBER() OVER (PARTITION BY LocId ORDER BY Created_Time DESC) RANK, {columns}
         FROM {schema}.{table}
         WHERE s_no in ? AND
               StartTime >= ? AND
               EndTime <= ?)
    SELECT {columns} FROM TIME_SERIES WHERE RANK = 1
'''.format(**queryArgs)
args = (myList, startT, endT)
return self.read(query, args)
Below is my read method, which connects to the DB to fetch records; it also checks whether the query is parameterized.
def read(self, query, parameterValues=None):
    cursor = self.connect(cursor=True)
    if parameterValues is not None:
        rows = cursor.execute(query, parameterValues)
    else:
        rows = cursor.execute(query)
    df = pd.DataFrame.from_records(rows.fetchall())
    if len(df.columns) > 0:
        df.columns = [x[0] for x in cursor.description]
    cursor.close()
    return df
The query args ({columns} and friends) are picked up, but the parameterized values are not. The read method is called with the parameter values (myList, startT, endT) as a tuple, yet the WHERE clause remains unchanged (the ? placeholders are never replaced), and as a result I cannot fetch any records. Can you point out where I am going wrong?
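For what it's worth, a common cause of exactly this symptom is binding the whole comma-joined string to a single ?: the driver then treats it as one value, not as a list of values for IN. A usual workaround (a sketch, not taken from this thread; the helper name and values are mine) is to generate one placeholder per element and flatten the list into the args tuple:

```python
def in_clause(values):
    # One '?' per element, so the driver binds each value separately
    # instead of receiving the whole list as a single quoted string.
    return '(' + ','.join('?' for _ in values) + ')'

list1 = ['a1', 'a2', 'a3']                     # illustrative values
startT, endT = '2020-01-01', '2020-12-31'      # illustrative times

where = ('WHERE s_no in ' + in_clause(list1) +
         ' AND StartTime >= ? AND EndTime <= ?')
args = tuple(list1) + (startT, endT)           # list elements first, then the times
```

The WHERE fragment becomes `WHERE s_no in (?,?,?) AND StartTime >= ? AND EndTime <= ?`, and each element of list1 is bound to its own placeholder.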
I'm using Python 2 and have the following code:
with conn.cursor() as cursor:
    info("Updating {} records".format(len(records_to_update)))
    for record in records_to_update:
        query = "UPDATE my_table SET "
        params_setters = []
        # Process all fields except wsid when updating
        for index, header in enumerate(DB_COLUMNS_IN_ORDER[1:]):
            if record[index] is not None:
                params_setters.append("{} = '{}' ".format(header, record[index]))
        query += " , ".join(params_setters)
        query += " WHERE id = '{}'".format(record[0])
        cursor.execute(query)
How can I use query parameters for escaping here, rather than doing it manually in places like:
params_setters.append("{} = '{}' ".format(header, record[index]))
If I understand your question, you want to use a prepared statement. If you are using a driver where %s is used to represent a query parameter (SQLite uses ?), then:
with conn.cursor() as cursor:
    info("Updating {} records".format(len(records_to_update)))
    for record in records_to_update:
        query = "UPDATE my_table SET "
        params = []  # reset per record, so values don't accumulate across rows
        params_setters = []
        # Process all fields except wsid when updating
        for index, header in enumerate(DB_COLUMNS_IN_ORDER[1:]):
            if record[index] is not None:
                params_setters.append("{} = %s ".format(header))
                params.append(record[index])
        query += " , ".join(params_setters)
        query += " WHERE id = %s"
        params.append(record[0])
        cursor.execute(query, params)
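To make the effect concrete, here is a self-contained run of the same setter-building loop with a hypothetical schema and record (the column names and values are invented for illustration; only non-None values end up in the SET list, and the driver receives the values separately and does the escaping itself):

```python
# Hypothetical schema: first column is the key, the rest are updatable.
DB_COLUMNS_IN_ORDER = ['id', 'name', 'email']
record = ('42', 'Alice', None)  # id, then one value per remaining column

params = []
params_setters = []
for header, value in zip(DB_COLUMNS_IN_ORDER[1:], record[1:]):
    if value is not None:  # skip columns with no new value
        params_setters.append("{} = %s".format(header))
        params.append(value)
query = "UPDATE my_table SET " + " , ".join(params_setters) + " WHERE id = %s"
params.append(record[0])
# query  -> "UPDATE my_table SET name = %s WHERE id = %s"
# params -> ['Alice', '42']
```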
I am querying data from bigquery using get_data_from_bq method mentioned below in a loop:
def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh + "\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()
While the first query (iteration) returns correct data, all subsequent queries throw the exception below:
results = query_job.result()
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2415, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 660, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/api_core/future/polling.py", line 120, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Cannot explicitly modify anonymous table xyz:_bf4dfedaed165b3ee62d8a9efa.anon1db6c519_b4ff_dbc67c17659f
Edit 1:
Below is a sample query that throws the above exception. It runs without problems in the BigQuery console.
select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in ("168561","175936","161684","161681","161686") and eventTime > CAST("2018-05-30 11:21:19" as DATETIME) group by eventType, productId order by productId;
I had the exact same issue. The problem is not the query itself, it's that you are most likely reusing the same QueryJobConfig. When you perform a query, unless you set a destination, BigQuery stores the result in an anonymous table which is stated in the QueryJobConfig object. If you reuse this configuration, BigQuery tries to store the new result in the same anonymous table, hence the error.
I don't particularly like this behaviour, to be honest.
You should rewrite your code like this:
def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh + "\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=QueryJobConfig())
    return query_job.result()
Hope this helps!
Edited:
Federico Bertola is correct about the solution and the temporary table that BigQuery writes to; see this link.
I did not get an error with my sample code querying a public table last time, but I can reproduce the error today, so this symptom can appear intermittently. I can confirm that the error is resolved with Federico's suggestion.
You can also get the "super(QueryJob, self).result(timeout=timeout)" error when the query string lacks quotes around the parameters in the query. It seems you have made a similar mistake with the format_strings parameter in your query. You can fix this by making sure the parameter is wrapped in escaped quotes:
(" + myparam + ")
, should be written as
(\"" + myparam + "\")
You should examine your query string wherever you use parameters, and start with a simpler query such as
select productId, eventType, count(*) as count from `xyz:xyz.abc`
then grow the query from there.
For the record, here is what worked for me:
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()

def get_data_from_bq(myparam):
    query = "SELECT word, SUM(word_count) as count FROM `publicdata.samples.shakespeare` WHERE word IN (\"" + myparam + "\") GROUP BY word;"
    query_job = client.query(query, job_config=job_config)
    return query_job.result()

mypar = "raisin"
x = 1
while x < 9:
    iterator = get_data_from_bq(mypar)
    print "==%d iteration==" % x
    x += 1
I am currently using sql_debug(True), which lets me see the queries but not the values.
How can I see how PonyORM translates the query, including the values?
Thank you very much.
This is an example of a query I'm using:
with db_session:
    access = select(p for p in Access if raw_sql(
        '( lower(first_name) = lower($first_name) and lower(last_name) = lower($last_name) ) '
        'or ( lower(first_name) = lower($last_name) and lower(last_name) = lower($first_name) ) '
        'or (lower(facebook_url) = lower($facebook_url)) '
        'or (lower(twitter_url) = lower($twitter_url)) '
        'or (lower(linkedin_url) = lower($linkedin_url)) ')
    ).order_by(desc(Access.twitter_url), desc(Access.facebook_url),
               desc(Access.linkedin_url), desc(Access.facebook_url))
    print(access.get_sql())
I use
logging.getLogger(__name__).debug('SQL:\n\n\t\t\t%s\n', '\n'.join(unicode(x) for x in request._construct_sql_and_arguments()[:2]).replace('\n', '\n\t\t\t'))
for that.
For example,
19:30:01.902 data.py:231 [DEBUG] SQL:
SELECT "x"."_id", "x"."filename", "x"."_created", "x"."_updated"
FROM "reports" "x"
WHERE "x"."_id" <= ?
AND "x"."_created" >= ?
(50, '2019-04-17 19:30:01.900028')
will be printed out.
You can use set_sql_debug(debug=True, show_values=True).
See the PonyORM API reference for details.
There is a method called get_sql()
query_obj = select(c for c in Category if c.name.startswith('v'))
sql = query_obj.get_sql()
print(sql)
output:
SELECT "c"."id", "c"."name"
FROM "category" "c"
WHERE "c"."name" LIKE 'v%%'
continuing the code:
for obj in query_obj:
print('id:', obj.id, 'name:', obj.name)
output:
id: 1 name: viki
here is a link to the docs https://docs.ponyorm.com/api_reference.html#Query.get_sql
You can log the sql or simply print it.
Update:
The OP updated the question: if the SQL query contains a variable like $name, it is passed as a SQL parameter.
first_name = 'viki'
query = select(c for c in Category if raw_sql('( lower(name) = lower($first_name))'))
query.get_sql()
so get_sql() will return the query with a placeholder, and the output will look like this:
'SELECT "c"."id", "c"."name", "c"."age"\nFROM "Category" "c"\nWHERE ( lower(name) = lower(?))'
If we don't want any placeholders in the query, we can avoid passing raw SQL and instead build the condition in Python, like this:
query = select(c for c in Category if c.name == 'viki')
query.get_sql()
output:
'SELECT "c"."id", "c"."name", "c"."age"\nFROM "Category" "c"\nWHERE "c"."name" = \'viki\''