How to run a BigQuery SQL query in a Python Jupyter notebook

I am trying to run SQL queries against Google BigQuery from a Jupyter notebook.
I followed the steps described here: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas#download_query_results_using_the_client_library.
I created a service account and downloaded its JSON key file.
Now I try to run this script:
from google.cloud import bigquery

bqclient = bigquery.Client('c://folder/client_account.json')

# Download query results.
query_string = """
SELECT * from `project.dataset.table`
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())
But I keep getting an error:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
I do not understand what I am doing wrong, because the JSON file looks fine and the path to the file is correct.

The error suggests that your GCP environment is not able to locate and configure the required application credentials. Note that the first positional argument of bigquery.Client() is the project ID, not a path to a key file, so the JSON file you pass there is never used for authentication.
To authenticate with a service account, follow the approach below:
from google.cloud import bigquery
from google.oauth2 import service_account

# TODO(developer): Set key_path to the path to the service account key file.
key_path = "path/to/service_account.json"

credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

bqclient = bigquery.Client(
    credentials=credentials,
    project=credentials.project_id,
)

query_string = """
SELECT * from `project.dataset.table`
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())

Related

403 Request Failure Despite working Service Account google.oauth2

I am consistently running into problems querying in Python using the following libraries. I get a 403 error saying the user does not have bigquery.readsessions.create permission for the project I am accessing.
# BQ libs
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file('path.json')

# BigQuery connection and query execution
bqProjectId = 'project_id'
project_id = bqProjectId
client = bigquery.Client(credentials=credentials, project=project_id)
query = client.query("SELECT * FROM `table`")
output = query.to_dataframe()
I am using the same service account JSON file and the same query in Java, R, and even in a BI tool. All three successfully retrieved the data, so this seems to be Python specific.
I have tried starting with a clean environment. I even reinstalled Anaconda. Nothing seems to work. What are some possible culprits here?
*Obviously my path, query, and creds are different in the actual script.
You can try the code below, which includes the access scope https://www.googleapis.com/auth/cloud-platform:
from google.cloud import bigquery
from google.oauth2 import service_account

key_path = "path/to/service_account.json"
credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

client = bigquery.Client(
    credentials=credentials,
    project=credentials.project_id,
)

sql_query = "SELECT * FROM table"
query_job = client.query(sql_query)
results = query_job.result()
df = results.to_dataframe()
print(df)
As per the error message you are getting, the service account is missing the BigQuery Admin role, which includes the bigquery.readsessions.create permission.
For more information regarding BigQuery IAM roles you can refer to this document.
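If granting a broader role is not possible right away, a minimal workaround sketch (assuming google-cloud-bigquery 1.26.0 or later, where the BigQuery Storage API is used by default) is to skip the Storage API read session, since creating a read session is what requires bigquery.readsessions.create:
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "path/to/service_account.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

query_job = client.query("SELECT * FROM `project.dataset.table`")
# Fall back to the REST API instead of the BigQuery Storage API, so no read
# session (and no bigquery.readsessions.create permission) is needed.
df = query_job.result().to_dataframe(create_bqstorage_client=False)
print(df.head())
Downloads of large results will be slower this way, but it removes the read-session permission requirement.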

`DefaultCredentialsError` when attempting to import google cloud libraries in python

I am attempting to import the google-cloud BigQuery libraries and am running into a default credentials error. I have attempted to set the credentials by downloading the JSON file from the Cloud Console and specifying the path to the file.
## Google Big Query
%reload_ext google.cloud.bigquery
from google.cloud import bigquery
bqclient = bigquery.Client(project = "dat-exp")
os.environ.setdefault("GCLOUD_PROJECT", "dat-exp")
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/xxxxx.json"
---------------------------------------------------------------------------
DefaultCredentialsError Traceback (most recent call last)
/tmp/ipykernel_2944/2163850103.py in <cell line: 81>()
79 get_ipython().run_line_magic('reload_ext', 'google.cloud.bigquery')
80 from google.cloud import bigquery
---> 81 bqclient = bigquery.Client(project = "dat-exp")
82 os.environ.setdefault("GCLOUD_PROJECT", "dat-exp")
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
Application Default Credentials (ADC) is a method of searching for credentials.
Your code sets the environment variables after the client has already attempted to locate credentials, so the lookup fails before the credentials are configured. A quick fix is to move the bigquery.Client(...) line after the os.environ(...) lines.
os.environ.setdefault("GCLOUD_PROJECT", "dat-exp")
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/xxxxx.json"
bqclient = bigquery.Client(project = "dat-exp")
I do not recommend the method you are using (modifying the environment inside the program). Either modify the environment before the program starts, or specify the credentials explicitly when creating the client with bigquery.Client():
from google.cloud import bigquery
from google.oauth2 import service_account

key_path = "path/to/service_account.json"
credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
client = bigquery.Client(credentials=credentials, project='dat-exp')
Provide credentials for Application Default Credentials
However, the correct method of specifying credentials depends on where you are deploying your code. For example, applications can fetch credentials from the compute metadata service when deployed in Google Cloud.
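For instance, a minimal sketch that relies on ADC itself (assuming credentials are already available in the environment, e.g. via the metadata service on Google Cloud or a GOOGLE_APPLICATION_CREDENTIALS variable set before the program starts):
import google.auth
from google.cloud import bigquery

# Let Application Default Credentials find whatever the environment provides:
# metadata service, GOOGLE_APPLICATION_CREDENTIALS, gcloud user credentials, ...
credentials, project = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
client = bigquery.Client(credentials=credentials, project=project)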

Vertex AI scheduled notebook doesn't work, though it works manually

There is a scheduled notebook that uses the BigQuery client and a service account with Owner rights. When I run the cells manually, it updates the BQ table. Both BQ and Vertex AI are in the same project.
I've found a similar question, but there is no output in the bucket folder:
Google Cloud Vertex AI Notebook Scheduled Runs Aren't Running Code?
In the Schedules section this notebook is stuck on "Initializing".
Update: I've tried scheduling the cells one by one, and all of the stuck attempts fail to get past the BigQuery step:
import os
from google.cloud import bigquery

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'dialogflow-293713-f89fd8f4ed2d.json'
bigquery_client = bigquery.Client()

QUERY = f"""
INSERT `dialogflow-293713.chats.Ежедневная сводка маркетплейса` (date, effectiveness, operatorWorkload)
VALUES({period}, {effectiveness}, {redirectedToSales}, {operatorWorkload})
"""
Query_Results = bigquery_client.query(QUERY)
This way of authorization worked!
from google.cloud import bigquery
from google.oauth2 import service_account
import json

raw_credential = { "dictionary. copy the dict elements of your credential.json file" }

service_account_info = json.loads(json.dumps(raw_credential))
credentials = service_account.Credentials.from_service_account_info(service_account_info)
client = bigquery.Client(credentials=credentials)

query = """ Your Query """
df = client.query(query).to_dataframe()

# See some results. Remove if not needed.
print(df.head())

# OPTIONAL: If you want to move the data to a Google Cloud Storage bucket
from google.cloud import storage

client = storage.Client()
bucket_name = 'my-bucket-id'
bucket = client.get_bucket(bucket_name)
# If the folder `output` does not exist, it will be created. You can use any name you want.
bucket.blob("output/output.csv").upload_from_string(df.to_csv(), 'text/csv')
Resolved on Issue Tracker in this thread.

GCP Python Compute Engine - list VMs

I have the following Python3 script:
import os, json
import googleapiclient.discovery
from google.oauth2 import service_account
from google.cloud import storage

storage_client = storage.Client.from_service_account_json('gcp-sa.json')
buckets = list(storage_client.list_buckets())
print(buckets)

compute = googleapiclient.discovery.build('compute', 'v1')

def list_instances(compute, project, zone):
    result = compute.instances().list(project=project, zone=zone).execute()
    return result['items'] if 'items' in result else None

list_instances(compute, "my-project", "my-zone")
Listing only the buckets (without the rest) works fine, which tells me that my service account (which has read access to the whole project) should work. How can I now list the VMs? Using the code above, I get
raise exceptions.DefaultCredentialsError(_HELP_MESSAGE)
google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
So that tells me that I somehow have to pass the service account JSON. How is that possible?
Thanks!!
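A minimal sketch of one way to do this, assuming the same gcp-sa.json key that already works for the storage client, is to build the Compute API client with explicit service-account credentials instead of relying on Application Default Credentials:
import googleapiclient.discovery
from google.oauth2 import service_account

# Assumption: reuse the key file that the storage client already accepts.
credentials = service_account.Credentials.from_service_account_file(
    'gcp-sa.json',
    scopes=['https://www.googleapis.com/auth/cloud-platform'],
)

# Pass the credentials explicitly so discovery.build() does not fall back to ADC.
compute = googleapiclient.discovery.build('compute', 'v1', credentials=credentials)

def list_instances(compute, project, zone):
    result = compute.instances().list(project=project, zone=zone).execute()
    return result.get('items')

print(list_instances(compute, "my-project", "my-zone"))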

Avoiding DefaultCredentialsError when using google bigquery API

I'm trying to execute an SQL query on a BigQuery table. I keep getting a DefaultCredentialsError when trying to instantiate a BigQuery client object, for example by doing this:
from google.cloud import bigquery
client = bigquery.Client.from_service_account_json('service_account_key.json')
Or by doing this:
from oauth2client.service_account import ServiceAccountCredentials

key = open('service_account_key.json', 'rb').read()
credentials = ServiceAccountCredentials(
    'my_email',
    key,
    scope='https://www.googleapis.com/auth/bigquery')
client = bigquery.Client(credentials=credentials)
Could there be a problem with my .json credentials file? I created a service account key.
Any other suggestions?
You are most likely hitting a bug when using the from_service_account_json method.
Instead, try using the recommended way of authenticating by exporting the GOOGLE_APPLICATION_CREDENTIALS environment variable, as described here.
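A rough sketch of that approach (the key file name is the one from the question; the variable must be set before any client is created, or exported in the shell before launching Python):
import os
from google.cloud import bigquery

# Point Application Default Credentials at the key file *before* creating a client.
# Alternatively, in the shell: export GOOGLE_APPLICATION_CREDENTIALS=service_account_key.json
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "service_account_key.json"

client = bigquery.Client()
print(list(client.query("SELECT 1").result()))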
