I am trying to query Google BigQuery using the Pandas/Python client interface, following the tutorial here: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas. I was able to get it to work, but I want to retrieve the data in the JSON format that can be downloaded directly from the web UI (see screenshot). Is there a way to download the data as the JSON structure pictured instead of converting it to a DataFrame object?
I imagine the command would be somewhere around this part of the code from the tutorial:
dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
)
Just add a .to_json(orient='records') call after converting to a DataFrame:

json_data = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(bqstorage_client=bqstorageclient)
    .to_json(orient='records')
)
See the pandas DataFrame.to_json documentation for the available orient values.
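If the goal is an actual file on disk, like the web UI download, the string produced above can simply be written out. A minimal sketch, assuming the json_data variable from the snippet above; the output filename is made up, and lines=True is only needed if you want newline-delimited JSON (the format BigQuery load jobs accept) instead of a single JSON array:

# Write the JSON string produced above to a local file (hypothetical name).
with open('results.json', 'w', encoding='utf-8') as f:
    f.write(json_data)

# For newline-delimited JSON instead of a single array, pass lines=True:
# dataframe.to_json('results.jsonl', orient='records', lines=True)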
Related
Suppose I have a list of APIs like the following:
https://developer.genesys.cloud/devapps/api-explorer#get-api-v2-alerting-alerts-active
https://developer.genesys.cloud/devapps/api-explorer#get-api-v2-alerting-interactionstats-rules
https://developer.genesys.cloud/devapps/api-explorer#get-api-v2-analytics-conversations-details
I want to read these APIs one by one and store the data in Snowflake using Pandas and SQLAlchemy.
Do you have any ideas for reading the APIs one by one in my Python script? Concretely:
- Read the APIs one by one from a file.
- Load the data into a Snowflake table directly.
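One possible way to approach this, sketched below under assumptions: the endpoint URLs sit one per line in a file called urls.txt, an access token for the Genesys Cloud API is already available, and the Snowflake connection details are placeholders.

import pandas as pd
import requests
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

# Placeholder Snowflake connection details.
engine = create_engine(URL(
    account='my_account', user='my_user', password='my_password',
    database='my_db', schema='my_schema', warehouse='my_wh',
))

headers = {'Authorization': 'Bearer <access_token>'}  # placeholder token

# Read the endpoint URLs one by one from a file.
with open('urls.txt') as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    # Flatten the JSON payload; whether json_normalize is enough depends on
    # the structure each endpoint actually returns.
    df = pd.json_normalize(resp.json())
    # Derive a crude table name from the last path segment (adjust as needed).
    table_name = url.rstrip('/').split('/')[-1]
    df.to_sql(table_name, engine, if_exists='append', index=False)

For larger volumes, the write_pandas helper in snowflake-connector-python is usually faster than row-by-row to_sql inserts.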
I need to do an exploratory analysis using Python over two tables that are in a Google BigQuery database.
The only thing I was provided with is a JSON file containing some credentials.
How can I access the data using this JSON file?
This is the first time I've tried to do something like this, so I have no idea how to do it.
I tried reading different tutorials and documentation, but nothing worked.
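A minimal sketch, assuming the JSON file is a service-account key and that the project, dataset, and table names below are placeholders:

from google.cloud import bigquery

# Authenticate with the service-account key file you were given.
client = bigquery.Client.from_service_account_json('path/to/credentials.json')

# Placeholder table reference; to_dataframe requires pandas (and pyarrow).
sql = 'SELECT * FROM `my_project.my_dataset.my_table` LIMIT 1000'
df = client.query(sql).to_dataframe()
print(df.head())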
I'm a Ruby dev doing a lot of data work who's decided to switch to Python. I'm enjoying making the transition so far and have been blown away by Pandas, Jupyter Notebooks, etc.
My current task is to write a lightweight RESTful API that under the hood is running queries against Google BigQuery.
I have a really simple test running in Flask, this works fine, but I did have trouble rendering the BigQuery response as JSON. To get around this, I used Pandas and then converted the dataframe to JSON. While it works, this feels like an unnecessary step and I'm not even sure this is a legitimate use case for Pandas. I have also read that creating a dataframe can be slow as data volume increases.
Below is my little mock-up test in Flask. It would be really helpful to hear from experienced Python devs how you'd approach this and whether there are any other libraries I should be looking at here.
from flask import Flask
from google.cloud import bigquery
import pandas

app = Flask(__name__)

@app.route("/bq_test")
def bq_test():
    client = bigquery.Client.from_service_account_json('/my_creds.json')
    sql = """select * from `my_dataset.my_table` limit 1000"""
    query_job = client.query(sql).to_dataframe()
    return query_job.to_json(orient="records")

if __name__ == "__main__":
    app.run()
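For comparison, a minimal sketch of the same endpoint without pandas, iterating the query result directly and letting Flask serialize it; the table and route are the same placeholders as above, and columns with types such as NUMERIC or BYTES may still need explicit conversion before serialization:

from flask import Flask, jsonify
from google.cloud import bigquery

app = Flask(__name__)
client = bigquery.Client.from_service_account_json('/my_creds.json')

@app.route("/bq_test")
def bq_test():
    sql = "select * from `my_dataset.my_table` limit 1000"
    rows = client.query(sql).result()  # RowIterator of Row objects
    # Each Row behaves like a mapping, so dict(row) gives {column: value}.
    return jsonify([dict(row) for row in rows])

if __name__ == "__main__":
    app.run()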
From the BigQuery documentation:
BigQuery supports functions that help you retrieve data stored in JSON-formatted strings and functions that help you transform data into JSON-formatted strings:
JSON_EXTRACT or JSON_EXTRACT_SCALAR
JSON_EXTRACT(json_string_expr, json_path_string_literal), which returns JSON values as STRINGs.
JSON_EXTRACT_SCALAR(json_string_expr, json_path_string_literal), which returns scalar JSON values as STRINGs.
Description
The json_string_expr parameter must be a JSON-formatted string. ...
The json_path_string_literal parameter identifies the value or values you want to obtain from the JSON-formatted string. You construct this parameter using the JSONPath format.
https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions
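For example, a minimal sketch of using these functions from Python; the table my_dataset.events and its STRING column payload (holding JSON such as {"user": {"id": "123"}}) are made-up names for illustration:

from google.cloud import bigquery

client = bigquery.Client.from_service_account_json('/my_creds.json')

# JSON_EXTRACT returns the matched JSON fragment as a STRING;
# JSON_EXTRACT_SCALAR returns a scalar value as a STRING.
sql = """
    SELECT
      JSON_EXTRACT_SCALAR(payload, '$.user.id') AS user_id,
      JSON_EXTRACT(payload, '$.user') AS user_json
    FROM `my_dataset.events`
    LIMIT 10
"""
for row in client.query(sql).result():
    print(row.user_id, row.user_json)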
In my Python project, I need to fill a BigQuery table from a relational DataFrame. I'm having a lot of trouble creating a new table from scratch and making sure that the first data I upload to it is actually written into the table.
I've read the page https://cloud.google.com/bigquery/streaming-data-into-bigquery#dataconsistency and have seen that applying an insertId to the insert query would solve the problem, but since I use pandas DataFrames, the to_gbq function of the pandas-gbq package seems perfect for this task. Yet, when using to_gbq and a new table is created/replaced, sometimes (apparently randomly) the first data chunk is not written into the table.
Does anybody know how to ensure the complete insertion of a DataFrame into a newly created BigQuery table? Thanks
I believe you are encountering https://github.com/pydata/pandas-gbq/issues/75. Basically, Pandas uses the BigQuery streaming API to write data into tables, but the streaming API has a delay after table creation before it starts working.
Edit: Version 0.3.0 of pandas-gbq fixes this issue by using a load job to upload data frames to BigQuery instead of streaming.
In the meantime, I'd recommend using a "load job" to create the tables. For example, using the client.load_table_from_file method in the google-cloud-bigquery package.
from google.cloud.bigquery import LoadJobConfig
from six import StringIO

destination_table = client.dataset(dataset_id).table(table_id)
job_config = LoadJobConfig()
job_config.write_disposition = 'WRITE_APPEND'
job_config.source_format = 'NEWLINE_DELIMITED_JSON'

# Serialize each DataFrame row as one line of newline-delimited JSON.
rows = []
for _, row in maybe_a_dataframe.iterrows():
    row_json = row.to_json(force_ascii=False, date_unit='s', date_format='iso')
    rows.append(row_json)

body = StringIO('{}\n'.format('\n'.join(rows)))
client.load_table_from_file(
    body,
    destination_table,
    job_config=job_config).result()
Edit: This code sample fails for columns containing non-ASCII characters. See https://github.com/pydata/pandas-gbq/pull/108
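Newer versions of google-cloud-bigquery can also submit a load job directly from a DataFrame, which avoids the manual JSON serialization above. A minimal sketch, assuming client, dataset_id, table_id, and a DataFrame df as in the snippet above (load_table_from_dataframe requires pyarrow):

from google.cloud import bigquery

job_config = bigquery.LoadJobConfig(write_disposition='WRITE_APPEND')
table_ref = client.dataset(dataset_id).table(table_id)

# Submit a load job from the DataFrame and wait for it to finish before
# querying the newly created table.
load_job = client.load_table_from_dataframe(df, table_ref, job_config=job_config)
load_job.result()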
I have a function in Python, defined in an API from my broker, "getCalendar", which receives a list of news announcements and their expected impact on the market. How can I transform this list, which arrives as a JSON object, into a pandas DataFrame so I can analyze it?
P.S.: the API is a connection to a server, which is first established and only then can data be streamed from it, so fetching a URL and converting that to a pandas DataFrame is not possible.
Thanks in advance for your help.
It seems your getCalendar is writing its output to stdout, so you first need to capture that into a string variable. Use the solution from this post to capture it: Can I redirect the stdout in python into some sort of string buffer?
(e.g. write a wrapper like getCalendarStdout() used below)
Once you have the JSON output in a variable (say calendar), try this:
import json
import pandas as pd

calendar = getCalendarStdout()
data = json.loads(calendar)
dafr = pd.DataFrame(data, columns=['col1', 'col2'])
Here we are trying to get only certain fields from the JSON output into the dafr DataFrame. If you can paste (part of) the calendar JSON data, people can help you get the desired DataFrame.
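If the calendar JSON turns out to be nested (for example, each announcement carrying a sub-object with impact details), pandas.json_normalize can flatten it. A minimal sketch with invented field names, since the real keys depend on the broker's getCalendar output:

import json
import pandas as pd

# Invented example payload; the real keys depend on the broker's API.
calendar = '[{"title": "CPI release", "impact": {"currency": "USD", "level": "high"}}]'
data = json.loads(calendar)

# json_normalize (pandas >= 1.0) flattens nested objects into dotted columns:
# title, impact.currency, impact.level
dafr = pd.json_normalize(data)
print(dafr)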