How to retrieve numRows, resultSize from query in BigQuery - python

Is it possible to retrieve the total number of rows that a query has returned without downloading all the results? For example, here is what I'm currently doing:
client = bigquery.Client()
res = client.query("SELECT funding_round_type FROM `investments`")
results = res.result()
>>> results.num_results
0
>>> records = [_ for _ in results]
>>> results.num_results
168647
In other words, without downloading the results, I cannot get the numResults. Is there another way to get the total number of results / number of MB in the resultant query set without having to download all the data?

The result of any query is stored in a so-called anonymous table. You can retrieve a reference to this table using the jobs.get API, and then use the tables.get API to retrieve information about that table, in particular its row count and size. For example, in Python:
>>> table = client.get_table(res.destination)
>>> print (table.num_rows, table.num_bytes)
168647 1451831
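For completeness, a minimal end-to-end sketch of this approach, assuming the google-cloud-bigquery client and the same query as in the question:
from google.cloud import bigquery

client = bigquery.Client()
res = client.query("SELECT funding_round_type FROM `investments`")
res.result()  # wait for the job to finish; this does not iterate the rows

# The results live in an anonymous destination table; fetching its metadata
# gives the row count and size without downloading any data.
table = client.get_table(res.destination)
print(table.num_rows, table.num_bytes)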

Related

I made two functions in pymongo, but the output I want is different from what I get from them. Any ideas how I can fix it?

Function that saves Close, Symbol, TimeFrame:
def Save_(self, collection, symbol, price, TF):
    db = self.get_db('MTF')[collection]
    B = {'ts': time.time(), "Symbol": symbol,
         "Price": price, 'TimeFrame': TF}
    data = db.insert_one(B)
    return data
Function to get data from MongoDB:
def find_all(self, collection):
    db = self.get_db('MTF')[collection]
    Symbols = {}
    data = db.find({})
    for i in data:
        Symbols[i['Symbol']] = [i['Price'], i['TimeFrame']]
    return Symbols
Image of the documents in MongoDB: https://i.stack.imgur.com/RLtnz.png
Image of the output of the B function: https://i.stack.imgur.com/AtwSy.png
If you look at the images, the B function only gave me one timeframe, but the Save function stored four timeframes.
Looking at this loop:
for i in data:
    Symbols[i['Symbol']] = [i['Price'], i['TimeFrame']]
If the same Symbol comes back from MongoDB more than once, each occurrence overwrites the previous value, so you only get the final value for each Symbol, which is what you are seeing.
To fix it you have a few options: you could check the key and either create or append the values in Symbols, or you could use $push in an aggregation query. A sketch of the first option follows below.
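For illustration, a minimal sketch of the first option, using the method and field names from the question; setdefault keeps a list per Symbol, so later documents append instead of overwriting:
def find_all(self, collection):
    db = self.get_db('MTF')[collection]
    Symbols = {}
    for i in db.find({}):
        # create an empty list the first time a Symbol is seen, then append to it
        Symbols.setdefault(i['Symbol'], []).append([i['Price'], i['TimeFrame']])
    return Symbols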

Making lists store all data of the loop and not only the last one

I want to store the JSON I get from an API, but I only get the JSON of the last loop iteration. How can I make the lists dynamic? I also need to use the last query (Pandas), but it's not working.
Finally, how do I make an API to:
List the latest forecast for each location for every day.
List the average the_temp of the last 3 forecasts for each location for every day.
Get the top n locations based on each available metric, where n is a parameter given in the API call.
import requests
import json
import sqlite3
import pandas as pd  # library for data frames

print(sqlite3.sqlite_version)

for x in range(20, 28):  # I need the LONDjson/BERLjson/SANjson lists to be dynamic so they store all the JSONs from each URL
    r = requests.get('https://www.metaweather.com/api/location/44418/2021/4/'+str(x)+'/')  # GET request from the source url
    LONDjson = r.json()  # JSON object of the result
    r2 = requests.get('https://www.metaweather.com//api/location/2487956/2021/4/'+str(x)+'/')
    SANjson = r2.json()
    r3 = requests.get('https://www.metaweather.com//api/location/638242/2021/4/'+str(x)+'/')
    BERLjson = r3.json()

conn = sqlite3.connect('D:\weatherdb.db')  # create db in path
cursor = conn.cursor()

#import pprint
#pprint.pprint(LONDjson)

cursor.executescript('''
DROP TABLE IF EXISTS LONDjson;
DROP TABLE IF EXISTS SANjson;
DROP TABLE IF EXISTS BERLjson;
CREATE TABLE LONDjson (id int, data json);
''');

for LOND in LONDjson:
    cursor.execute("insert into LONDjson values (?, ?)",
                   [LOND['id'], json.dumps(LOND)])
conn.commit()

z = cursor.execute('''select json_extract(data, '$.id', '$.the_temp', '$.weather_state_name', '$.applicable_date' ) from LONDjson;
''').fetchall()  # query the data
Hint: in your initial for loop you are not storing the results of the API call. You are storing them in a variable, but that variable just gets re-written on each loop iteration.
A common solution is to start with an empty list that you append to. If you are storing multiple variables per iteration, you can store a dictionary as each element of the list.
Example:
results = []
for x in range(10):
    results.append(
        {
            'x': x,
            'x_squared': x*x,
            'abs_x': abs(x)
        }
    )
print(results)
It looks like there are at least two things that can be improved in the data manipulation part of your code.
Using an array to store the retrieved data
LONDjson = []
SANjson = []
BERLjson = []
for x in range(20, 28):
    r = requests.get('https://www.metaweather.com/api/location/44418/2021/4/'+str(x)+'/')
    LONDjson.append(r.json())
    r2 = requests.get('https://www.metaweather.com//api/location/2487956/2021/4/'+str(x)+'/')
    SANjson.append(r2.json())
    r3 = requests.get('https://www.metaweather.com//api/location/638242/2021/4/'+str(x)+'/')
    BERLjson.append(r3.json())
Retrieving the data from the array
# The retrieved data is a dictionary inside a list with only one entry
for LOND in LONDjson:
    print(LOND[0]['id'])
Hope this helps you out.
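If you also want to avoid the three near-identical request blocks, here is a sketch of the same idea using a dictionary of lists keyed by location (the WOE ids are the ones from the question's URLs; the location names are just labels I chose):
import requests

locations = {'london': 44418, 'san_francisco': 2487956, 'berlin': 638242}

forecasts = {name: [] for name in locations}
for day in range(20, 28):
    for name, woeid in locations.items():
        url = 'https://www.metaweather.com/api/location/{}/2021/4/{}/'.format(woeid, day)
        # append keeps one JSON response per day instead of overwriting it
        forecasts[name].append(requests.get(url).json())

print(len(forecasts['london']))  # one entry per day of the loop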

How to get column names in a SQLAlchemy query?

I have a function in a remote database (no models in my app) that I am querying. Is it possible to get the column names using query rather than execute?
session = Session(bind=engine)
data = session.query(func.schema.func_name())
I am getting an array of strings with the values; how do I get the keys? I want to generate a dict.
When I make the request with execute, the dictionary is generated fine:
data = session.execute("select * from schema.func_name()")
result = [dict(row) for row in data]
You can do something like:
keys = session.execute("select * from schema.func_name()").keys()
Or try accessing it after the query:
data = session.query(func.schema.func_name()).all()
data[0].keys()
You can also use: data.column_descriptions
Documentation:
https://docs.sqlalchemy.org/en/14/orm/query.html
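For reference, a small sketch of building a list of dicts from the keys, assuming SQLAlchemy 1.4 and the schema.func_name function from the question (the connection string is a placeholder):
from sqlalchemy import create_engine, text
from sqlalchemy.orm import Session

engine = create_engine("postgresql://user:password@host/dbname")  # hypothetical connection string
session = Session(bind=engine)

result = session.execute(text("select * from schema.func_name()"))
keys = result.keys()  # the column names the function returns
rows = [dict(zip(keys, row)) for row in result]
print(rows[:1])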

How to update a parameter in query (python + bigquery)

I am trying to make multiple calls to export a large data set from BigQuery into CSV via Python (e.g. rows 0-10000, rows 10001-20000, etc.), but I am not sure how to set a dynamic parameter correctly, i.e. how to keep updating a and b.
The reason why I need to put the query into a loop is that the dataset is too big for a one-time extraction.
a = 0
b = 10000
while a <= max(counts):  # i.e. counts = 7165920
    query = """
    SELECT *
    FROM `bigquery-public-data.ethereum_blockchain.blocks`
    limit #a, #b
    """
    params = [
        bigquery.ScalarQueryParameter('a', 'INT', a),
        bigquery.ScalarQueryParameter('b', 'INT', b)]
    query_job = client.query(query)
    export_file = open("output.csv", "a")
    output = csv.writer(export_file, lineterminator='\n')
    for rows in query_job:
        output.writerow(rows)
    export_file.close()
    a = b + 1
    b = b + b
For a small data set I am able to get the output without any params and without a loop (I just limit to 10, but that is for a single pull).
But when I tried the above method, I keep getting errors.
Suggestion of another approach
To export a table
As you want to export the whole content of the table as a CSV, I would advise you to use an ExtractJob. It is meant to send the content of a table to Google Cloud Storage, as a CSV or JSON. Here's a nice example from the docs:
destination_uri = 'gs://{}/{}'.format(bucket_name, 'shakespeare.csv')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  # Waits for job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))
For a query
Pandas has a read_gbq function to load the result of a query into a DataFrame. If the result of the query fits in memory, you could use this and then call to_csv() on the resulting DataFrame; a sketch follows below. Be sure to install the pandas-gbq package to do this.
If the query result is too big, add a destination to your QueryJobConfig, so it writes to Google Cloud Storage.
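A minimal sketch of the read_gbq route mentioned above, assuming pandas-gbq is installed; the project id and the row limit are placeholders:
import pandas as pd

# read_gbq runs the query and loads the full result into memory as a DataFrame
df = pd.read_gbq(
    "SELECT * FROM `bigquery-public-data.ethereum_blockchain.blocks` LIMIT 10000",
    project_id="my-project-id")  # hypothetical project id
df.to_csv("output.csv", index=False)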
Answer to your question
You could simply use string formatting:
query = """
SELECT *
FROM `bigquery-public-data.ethereum_blockchain.blocks`
WHERE some_column = {}
LIMIT {}
"""
query_job = client.query(query.format(desired_value, number_lines))
(This places desired_value in the WHERE and number_lines in the LIMIT)
If you want to use the scalar query parameters, you'll have to create a job config:
my_config = bigquery.job.QueryJobConfig()
my_config.query_parameters = params # this is the list of ScalarQueryParameter's
client.query(query, job_config=my_config)
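Putting the pieces together, here is a sketch of a paginated export with named parameters. Note that BigQuery standard SQL expects @name placeholders and LIMIT ... OFFSET ... rather than limit #a, #b; the page size is an assumption and the total row count is the one mentioned in the question:
import csv
from google.cloud import bigquery

client = bigquery.Client()
page_size = 10000
total_rows = 7165920  # the counts value from the question

query = """
SELECT *
FROM `bigquery-public-data.ethereum_blockchain.blocks`
LIMIT @page_size OFFSET @offset
"""

with open("output.csv", "a") as export_file:
    output = csv.writer(export_file, lineterminator='\n')
    for offset in range(0, total_rows, page_size):
        job_config = bigquery.QueryJobConfig(query_parameters=[
            bigquery.ScalarQueryParameter("page_size", "INT64", page_size),
            bigquery.ScalarQueryParameter("offset", "INT64", offset),
        ])
        # NOTE: without an ORDER BY, the row order between pages is not guaranteed
        for row in client.query(query, job_config=job_config):
            output.writerow(row)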

Get all rows in Json format with API-Rest Cassandra

I have the following code that allows me to retrieve the first keyspace:
def Query(str):
    auth_provider = PlainTextAuthProvider(username='admin', password='root')
    cluster = Cluster(['hostname'], auth_provider=auth_provider)
    session = cluster.connect('system')
    rows = session.execute(str)
    keyspaces = []
    row_list = list(rows)
    for x in range(len(row_list)):
        return row_list[0]

#app.route('/keyspaces')
def all():
    return Query('select json * from schema_keyspaces')
I would like to get not only all the keyspaces but also their attributes, as a JSON document. How can I proceed?
Thanks,
Instead of a loop that only runs once, you need to collect all the elements
rows = session.execute(str)
return jsonify(list(rows))
Note that you should ideally not be creating a new Cassandra connection for each query you need to make, but that's unrelated to the current problem; see the sketch below.
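A minimal sketch of the fixed route, assuming Flask and the cassandra-driver setup from the question (hostname, credentials and the schema_keyspaces table are taken from there; the route function is renamed only to avoid shadowing the built-in all):
from flask import Flask, jsonify
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
import json

app = Flask(__name__)

# Create the connection once at startup instead of inside every request.
auth_provider = PlainTextAuthProvider(username='admin', password='root')
cluster = Cluster(['hostname'], auth_provider=auth_provider)
session = cluster.connect('system')

@app.route('/keyspaces')
def all_keyspaces():
    # With "select json *", Cassandra returns each row as a single JSON string,
    # so every keyspace comes back with all of its attributes.
    rows = session.execute('select json * from schema_keyspaces')
    return jsonify([json.loads(row[0]) for row in rows])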
