Bad Request error while querying data from bigquery in a loop - python

I am querying data from BigQuery with the get_data_from_bq method below, called in a loop:
def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh + "\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=job_config)
    return query_job.result()
While the first query (iteration) returns correct data, every subsequent query throws the exception below:
results = query_job.result()
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 2415, in result
    super(QueryJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/cloud/bigquery/job.py", line 660, in result
    return super(_AsyncJob, self).result(timeout=timeout)
  File "/home/ishank/.local/lib/python2.7/site-packages/google/api_core/future/polling.py", line 120, in result
    raise self._exception
google.api_core.exceptions.BadRequest: 400 Cannot explicitly modify anonymous table xyz:_bf4dfedaed165b3ee62d8a9efa.anon1db6c519_b4ff_dbc67c17659f
Edit 1:
Below is a sample query that throws the above exception. It runs without problems in the BigQuery console.
select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in ("168561","175936","161684","161681","161686") and eventTime > CAST("2018-05-30 11:21:19" as DATETIME) group by eventType, productId order by productId;

I had the exact same issue. The problem is not the query itself; you are most likely reusing the same QueryJobConfig. When you perform a query, unless you set a destination, BigQuery stores the result in an anonymous table, which is recorded in the QueryJobConfig object. If you reuse that configuration, BigQuery tries to store the new result in the same anonymous table, hence the error.
I don't particularly like this behaviour, to be honest.
You should rewrite your code like this:
def get_data_from_bq(product_ids):
    format_strings = ','.join([("\"" + str(_id) + "\"") for _id in product_ids])
    query = "select productId, eventType, count(*) as count from [xyz:xyz.abc] where productId in (" + format_strings + ") and eventTime > CAST(\"" + time_thresh + "\" as DATETIME) group by eventType, productId order by productId;"
    query_job = bigquery_client.query(query, job_config=QueryJobConfig())
    return query_job.result()
Hope this helps!
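As an aside, the IN list does not have to be assembled by hand at all: BigQuery's standard SQL supports named query parameters. Below is a minimal sketch of assembling such a query; the table name follows the question, and the helper plus its tuple layout are only an illustration — with google-cloud-bigquery, each tuple would map onto an ArrayQueryParameter or ScalarQueryParameter passed via QueryJobConfig(query_parameters=[...]).

```python
# Sketch: build a parameterized query instead of concatenating IDs.
# The (name, type, value) tuples stand in for BigQuery parameter objects.
def build_product_query(product_ids, time_thresh):
    sql = (
        "SELECT productId, eventType, COUNT(*) AS count "
        "FROM `xyz.abc` "
        "WHERE productId IN UNNEST(@ids) "
        "AND eventTime > CAST(@threshold AS DATETIME) "
        "GROUP BY eventType, productId ORDER BY productId"
    )
    params = [("ids", "STRING", list(product_ids)),
              ("threshold", "STRING", time_thresh)]
    return sql, params

sql, params = build_product_query(["168561", "175936"], "2018-05-30 11:21:19")
print("@ids" in sql)   # True
print(params[0][2])    # ['168561', '175936']
```

With parameters, quoting and escaping are handled by the service, so the quoting mistakes discussed below cannot occur in the first place.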

Edited:
Federico Bertola is correct about the solution and about the temporary table that BigQuery writes to; see this link.
I did not get an error with my sample code querying a public table last time, but I can reproduce the error today, so this symptom may appear intermittently. I can confirm the error is resolved with Federico's suggestion.
You can get the "super(QueryJob, self).result(timeout=timeout)" error when the query string lacks quotes around the parameters in the query. It seems you have made a similar mistake with the parameter format_strings in your query. You can fix this problem by ensuring the parameter is wrapped in escaped quotes:
(" + myparam + ")
, should be written as
(\"" + myparam + "\")
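The effect of those escaped quotes can be sketched with plain string building (the ID values here are taken from the sample query above):

```python
ids = ["168561", "175936", "161684"]

# Without quotes, the IN list is a run of bare numbers:
unquoted = ",".join(str(i) for i in ids)
print(unquoted)  # 168561,175936,161684

# With escaped quotes, each value becomes a SQL string literal:
quoted = ",".join("\"" + str(i) + "\"" for i in ids)
print(quoted)    # "168561","175936","161684"
```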
You should examine your query string where you use parameters, and start with a simpler query such as
select productId, eventType, count(*) as count from `xyz:xyz.abc`
, and grow your query as you go.
For the record, here is what worked for me:
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig()

def get_data_from_bq(myparam):
    query = "SELECT word, SUM(word_count) as count FROM `publicdata.samples.shakespeare` WHERE word IN (\"" + myparam + "\") GROUP BY word;"
    query_job = client.query(query, job_config=job_config)
    return query_job.result()

mypar = "raisin"
x = 1
while x < 9:
    iterator = get_data_from_bq(mypar)
    print "==%d iteration==" % x
    x += 1

Related

Populating a QTableWidget in a PyQt5 GUI with a result returned from a stored procedure in MySQL

I have made a GUI in PyQt5 for dealing with a database. There is an Insert button that inserts data into the database; afterwards, a stored procedure, whose parameter is a MySQL query in string format, is called with a SELECT query whose WHERE clause is built from the values just entered.
def insert(self):
    try:
        self.table.setRowCount(0)
        QEmpID = self.lineEmpID.text() + "%"
        QFName = self.lineFName.text() + "%"
        QLName = self.lineLName.text() + "%"
        QSalary = self.lineSalary.text() + "%"
        QTask = self.lineTask.text() + "%"
        mydb = mc.connect(host="localhost", username="root", password="", database="Office")
        mycursor = mydb.cursor()
        selectQuery = "SELECT * From Employee WHERE EmpID like '{}' and FirstName like '{}' and LastName like '{}' and Salary like '{}' and Task like '{}'".format(QEmpID, QFName, QLName, QSalary, QTask)
        QEmpID = self.lineEmpID.text()
        QFName = self.lineFName.text()
        QLName = self.lineLName.text()
        QSalary = self.lineSalary.text()
        QTask = self.lineTask.text()
        insertQuery = "INSERT INTO Employee Values({},'{}','{}',{},'{}')".format(QEmpID, QFName, QLName, QSalary, QTask)
        mycursor.execute(insertQuery)
        mydb.commit()
        insertResult = mycursor.fetchall()
        mycursor.callProc('fetchData', [selectQuery])
        for result in mycursor.stored_results():
            selectResult = result.fetchall()
        for row_number, row_data in enumerate(selectResult):
            self.table.insertRow(row_number)
            for column_number, data in enumerate(row_data):
                self.table.setItem(row_number, column_number, QTableWidgetItem(str(data)))
    except mc.Error as e:
        print(e)
The above is my Python code for the insert function, which is connected to the Insert button.
DELIMITER $$
CREATE DEFINER=`root`#`localhost` PROCEDURE `fetchData`(in query1 varchar(1000))
begin
set #q = query1;
PREPARE stmt from #q;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
end$$
DELIMITER ;
The above is my stored procedure which executes a query passed to it in string format.
However, when I type in the record to be inserted into the fields and then press Insert, the following shows up without any tracebacks or error reports in the IDLE Shell:
The record does get inserted into the database, so I think the issue lies in the call to the stored procedure: a SELECT query is passed to it, and its result should then populate the QTableWidget. I can't think of anything right now; help is needed.
Thank you!

Why Sqlite3 Python library is giving different result of Group by Query than the original one?

I am having a weird issue. I am using SQLite for storing stock data and the native sqlite3 library to interact with it. When I run the query in TablePlus it returns the following:
But when I run the code it returns the following:
I am unable to figure out why this is happening. The count is correct, but it is pulling the wrong symbols.
The code that is doing all this is given below:
def fetch_data_by_date(day):
    results = None
    conn = get_connection()
    print(day)
    sql = "select DISTINCT(created_at), symbol, count(stock_id) as total from mentions" \
          " inner join symbols on symbols.id = mentions.stock_id WHERE DATE(created_at) = '{}'" \
          " group by symbol order by total DESC LIMIT 5".format(day)
    print(sql)
    cursor = conn.cursor()
    cursor.execute(sql)
    results = cursor.fetchall()
    return results

How to write a mysql prepared SELECT statement in python 3 using LIMIT?

idValue = 100
limitValue = 10000
query = "SELECT count(*) as count FROM oneTable WHERE id = (%s) LIMIT (%s)"
cursor.execute(query, (idValue, limitValue))
This doesn't seem to be working. It fetches only 1 record corresponding to the id.
I think this should work as you want. If you print result, you can see the result of your query.
idValue = 100
limitValue = 10000
query = "SELECT count(*) as count FROM oneTable WHERE id = {0} limit {1}".format(idValue,limitValue)
cursor.execute(query)
result = cursor.fetchall()
The original query in the question does work; there was some other bug in my code, which is why it failed at the time. Thanks @Baran.
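One thing worth noting about "it fetches only 1 record": count(*) is an aggregate, so without a GROUP BY it always collapses to a single row, and LIMIT has no visible effect on it. The sketch below illustrates both points using the stdlib sqlite3 driver as a stand-in (MySQL drivers use %s placeholders instead of ?; the table and values are made up).

```python
import sqlite3

# In-memory table mimicking the question's oneTable.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE oneTable (id INTEGER)")
cur.executemany("INSERT INTO oneTable VALUES (?)", [(100,)] * 5 + [(200,)] * 3)

# count(*) aggregates everything into one row, regardless of LIMIT:
cur.execute("SELECT count(*) FROM oneTable WHERE id = ? LIMIT ?", (100, 10000))
print(cur.fetchall())  # [(5,)]

# LIMIT only matters when individual rows are returned:
cur.execute("SELECT id FROM oneTable WHERE id = ? LIMIT ?", (100, 2))
print(cur.fetchall())  # [(100,), (100,)]
```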

Get MSSQL table column names using pyodbc in python

I am trying to get the mssql table column names using pyodbc, and getting an error saying
ProgrammingError: No results. Previous SQL was not a query.
Here is my code:
class get_Fields:
    def GET(self, r):
        web.header('Access-Control-Allow-Origin', '*')
        web.header('Access-Control-Allow-Credentials', 'true')
        fields = []
        datasetname = web.input().datasetName
        tablename = web.input().tableName
        cnxn = pyodbc.connect(connection_string)
        cursor = cnxn.cursor()
        query = "USE" + "[" + datasetname + "]" + "SELECT COLUMN_NAME,* FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = " + "'" + tablename + "'"
        cursor.execute(query)
        DF = DataFrame(cursor.fetchall())
        columns = [column[0] for column in cursor.description]
        return json.dumps(columns)
How can I solve this?
You can avoid this by using some of pyodbc's built-in methods. For example, instead of:
query = "USE" + "[" +datasetname+ "]" + "SELECT COLUMN_NAME,* FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = " + "'"+ tablename + "'"
cursor.execute(query)
DF = DataFrame(cursor.fetchall())
Try:
column_data = cursor.columns(table=tablename, catalog=datasetname, schema='dbo').fetchall()
print(column_data)
That will return the column names (and other column metadata). I believe the column name is the fourth element per row. This also relieves the very valid concerns about SQL injection. You can then figure out how to build your DataFrame from the resulting data.
Good luck!
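To pull just the names out of what cursor.columns() returns, you can index into each row: in the ODBC SQLColumns result layout, COLUMN_NAME is the fourth field (index 3). A sketch over sample rows shaped like that result (the table and column names here are made up):

```python
# Rows shaped like ODBC SQLColumns output:
# (table_cat, table_schem, table_name, column_name, ...)
sample_rows = [
    ("mydb", "dbo", "Employee", "EmpID"),
    ("mydb", "dbo", "Employee", "FirstName"),
    ("mydb", "dbo", "Employee", "Salary"),
]

column_names = [row[3] for row in sample_rows]
print(column_names)  # ['EmpID', 'FirstName', 'Salary']
```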
Your line
query = "USE" + "[" +datasetname+ "]" + "SELECT COLUMN_NAME,*...
Will produce something like
USE[databasename]SELECT ...
In SSMS this would work, but I'd suggest checking the spacing and separating the USE statement with a semicolon:
query = "USE " + "[" +datasetname+ "]; " + "SELECT COLUMN_NAME,*...
Set the database context using the Database attribute when building the connection string
Use parameters any time you are passing user input (especially from HTTP requests!) to a WHERE clause.
These changes eliminate the need for dynamic SQL, which can be insecure and difficult to maintain.
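Those two suggestions can be sketched as follows. The driver name, server, and authentication details here are hypothetical; with pyodbc, a ? placeholder lets the driver escape the value instead of concatenating it into the SQL.

```python
datasetname = "mydb"    # hypothetical, would come from the HTTP request
tablename = "Employee"  # hypothetical, would come from the HTTP request

# Database context set via the connection string, not via USE:
connection_string = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;"
    "DATABASE=" + datasetname + ";"
    "Trusted_Connection=yes;"
)

# Table name passed as a parameter instead of concatenated:
query = ("SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS "
         "WHERE TABLE_NAME = ?")
# cursor = pyodbc.connect(connection_string).cursor()
# cursor.execute(query, tablename)

print("DATABASE=mydb" in connection_string)  # True
```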

PyMYSQL - Select Where Value Can Be NULL

I'm using PyMYSQL to query data from a MySQL database. An example query that I want to use is:
SELECT count(*) AS count
FROM example
WHERE item = %s
LIMIT 1
which would be run using
query = "SELECT COUNT(*) as count FROM example WHERE item = %s LIMIT 1"
cur.execute(query, (None,))
or
cur.execute(query, (10,))
The issue is that item can have a value of NULL which results in
WHERE item = NULL
Which doesn't work in MySQL. Rather it should read
WHERE item IS NULL
However, the value can also be an integer. How can I make PyMySQL adjust the query to handle both situations?
You can make a simple function to catch the None case:
def null_for_sql(a):
    # None must map to "IS NULL"; anything else to "= <value>"
    return "IS NULL" if a is None else "= " + str(a)
and then build the query like this:
query = "SELECT COUNT(*) as count FROM example WHERE item {} LIMIT 1".format(null_for_sql(None))
cur.execute(query)
Note there is no "=" after "item" in the query.
Edit after comments:
def sql_none_safe_generator(query, parameter):
    # Swap the %param placeholder for either "= <value>" or "IS NULL"
    if parameter is not None:
        return query.replace("%param", "= " + str(parameter))
    else:
        return query.replace("%param", "IS NULL")

query = "SELECT COUNT(*) as count FROM example WHERE item %param LIMIT 1"
cur.execute(sql_none_safe_generator(query, a))
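A variant of the same idea that keeps the non-NULL case parameterized, so the driver still escapes the value (the example table and item column are from the question; the helper name is made up):

```python
def build_item_filter(value):
    # Pick the comparison clause based on whether the value is None,
    # and hand any non-NULL value back as a bind parameter.
    if value is None:
        return "SELECT COUNT(*) AS count FROM example WHERE item IS NULL LIMIT 1", ()
    return "SELECT COUNT(*) AS count FROM example WHERE item = %s LIMIT 1", (value,)

sql, args = build_item_filter(None)
print("IS NULL" in sql, args)  # True ()
sql, args = build_item_filter(10)
print(args)                    # (10,)
# cur.execute(sql, args)  # then run as usual with PyMySQL
```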
