403 Request Failure Despite a Working Service Account (google.oauth2) - Python

I am consistently running into problems querying BigQuery in Python using the libraries below. I get a 403 error saying that the "user does not have 'bigquery.readsessions.create' permission" for the project I am accessing.
#BQ libs
from google.cloud import bigquery
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file('path.json')
#BigQuery Connection and query execution
bqProjectId = 'project_id'
project_id = bqProjectId
client = bigquery.Client(credentials=credentials, project=project_id)
query = client.query("SELECT * FROM `table`")
output = query.to_dataframe()
I am using the same service account JSON file and the same query in Java, R, and even in a BI tool. All three successfully retrieved the data, so this seems to be Python-specific.
I have tried starting with a clean environment and even reinstalled Anaconda. Nothing seems to work. What are some possible culprits here?
*Obviously my path, query, and creds are different in the actual script.

You can try the code below, which includes the access scope https://www.googleapis.com/auth/cloud-platform, for your requirement.
from google.cloud import bigquery
from google.oauth2 import service_account

key_path = "path/to/service_account.json"

credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

project_id = "project-id"

client = bigquery.Client(
    credentials=credentials,
    project=credentials.project_id,
)

sql_query = "SELECT * FROM table"
query_job = client.query(sql_query)
results = query_job.result()
df = results.to_dataframe()
print(df)
As per the error message, the service account is missing a role that includes the permission bigquery.readsessions.create, such as the BigQuery Admin role (BigQuery Read Session User also carries this permission).
For more information regarding BigQuery IAM roles you can refer to the BigQuery access control documentation.
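If granting an extra role is not an option, note that this particular permission is only needed by the BigQuery Storage Read API, which to_dataframe() uses by default in recent google-cloud-bigquery versions. A minimal sketch of falling back to the plain REST API instead (the key path and table name are placeholders):
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file("path/to/service_account.json")
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

query_job = client.query("SELECT * FROM `project.dataset.table`")
# Skip the BigQuery Storage read session and download rows over the REST API,
# so bigquery.readsessions.create is not required (slower for large result sets).
df = query_job.result().to_dataframe(create_bqstorage_client=False)
print(df.head())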

Related

Vertex AI scheduled notebook doesn't work, though it works manually

There is a scheduled notebook that uses the BigQuery client and a service account with Owner rights. When I run the cells manually, it updates the BQ table. One project is used for both BQ and Vertex AI.
I've found a similar question, but there is no output in the bucket folder:
Google Cloud Vertex AI Notebook Scheduled Runs Aren't Running Code?
In the Schedules section this notebook is stuck on Initializing (screenshots of the schedule and of the notebook omitted).
Update: I've tried scheduling the cells one by one, and all of the stuck attempts fail to get past the BigQuery call:
import os
from google.cloud import bigquery

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'dialogflow-293713-f89fd8f4ed2d.json'
bigquery_client = bigquery.Client()

QUERY = f"""
INSERT `dialogflow-293713.chats.Ежедневная сводка маркетплейса` (date, effectiveness, redirectedToSales, operatorWorkload)
VALUES({period}, {effectiveness}, {redirectedToSales}, {operatorWorkload})
"""
Query_Results = bigquery_client.query(QUERY)
This way of authorization worked!
from google.cloud import bigquery
from google.oauth2 import service_account
import json

# Paste the dict contents of your credential.json file here.
raw_credential = {
    # "type": "service_account", "project_id": "...", etc.
}
service_account_info = json.loads(json.dumps(raw_credential))
credentials = service_account.Credentials.from_service_account_info(service_account_info)
client = bigquery.Client(credentials=credentials)

query = """ Your Query """
df = client.query(query).to_dataframe()
# See some results. Remove if not needed.
print(df.head())

# OPTIONAL: If you want to move the data to a Google Cloud Storage bucket
from google.cloud import storage

storage_client = storage.Client()
bucket_name = 'my-bucket-id'
bucket = storage_client.get_bucket(bucket_name)
# If the folder `output` does not exist, it will be created. You can use any name you want.
bucket.blob("output/output.csv").upload_from_string(df.to_csv(), 'text/csv')
Resolved on Issue Tracker in this thread.
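As a side note (not part of the original answer), the INSERT in the question splices Python values directly into an f-string; the BigQuery client also supports query parameters, which avoids quoting and type issues. A rough sketch reusing the question's table and column names, with placeholder values and guessed parameter types:
from google.cloud import bigquery

bigquery_client = bigquery.Client()

# Placeholder values standing in for the ones computed in earlier notebook cells.
period = "2021-09-01"
effectiveness = 0.87
redirectedToSales = 0.12
operatorWorkload = 0.45

query = """
INSERT `dialogflow-293713.chats.Ежедневная сводка маркетплейса`
    (date, effectiveness, redirectedToSales, operatorWorkload)
VALUES (@period, @effectiveness, @redirectedToSales, @operatorWorkload)
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        # The parameter types are assumptions; adjust them to the actual column types.
        bigquery.ScalarQueryParameter("period", "DATE", period),
        bigquery.ScalarQueryParameter("effectiveness", "FLOAT64", effectiveness),
        bigquery.ScalarQueryParameter("redirectedToSales", "FLOAT64", redirectedToSales),
        bigquery.ScalarQueryParameter("operatorWorkload", "FLOAT64", operatorWorkload),
    ]
)
bigquery_client.query(query, job_config=job_config).result()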

How to run a BigQuery SQL query in a Python Jupyter notebook

I am trying to run SQL queries against Google BigQuery from a Jupyter notebook.
I do everything as written here: https://cloud.google.com/bigquery/docs/bigquery-storage-python-pandas#download_query_results_using_the_client_library.
I opened a client account and downloaded the JSON file.
Now I try to run this script:
from google.cloud import bigquery

bqclient = bigquery.Client('c://folder/client_account.json')

# Download query results.
query_string = """
SELECT * from `project.dataset.table`
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())
But I keep getting an error:
DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started
I do not understand what I am doing wrong, because the JSON file looks fine and the path to the file is correct.
The error indicates that the client could not find application default credentials: the first positional argument of bigquery.Client() is the project ID, not a key-file path, so the JSON file is never used as a credential and the client falls back to the (unset) default credentials.
To authenticate using the service account explicitly, follow the approach below:
from google.cloud import bigquery
from google.oauth2 import service_account

# TODO(developer): Set key_path to the path to the service account key file.
key_path = "path/to/service_account.json"

credentials = service_account.Credentials.from_service_account_file(
    key_path,
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)

bqclient = bigquery.Client(
    credentials=credentials,
    project=credentials.project_id,
)

query_string = """
SELECT * from `project.dataset.table`
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    .to_dataframe(
        # Optionally, explicitly request to use the BigQuery Storage API. As of
        # google-cloud-bigquery version 1.26.0 and above, the BigQuery Storage
        # API is used by default.
        create_bqstorage_client=True,
    )
)
print(dataframe.head())
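Alternatively (not part of the original answer), the client class has a from_service_account_json shortcut that builds the credentials from the key file directly; a minimal sketch, with the key path and table name as placeholders:
from google.cloud import bigquery

# Builds service-account credentials from the key file; recent client versions
# also pick the project up from the key.
bqclient = bigquery.Client.from_service_account_json("c://folder/client_account.json")

dataframe = bqclient.query("SELECT * FROM `project.dataset.table`").result().to_dataframe()
print(dataframe.head())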

BigQuery: Permission denied while getting Drive credentials - Unable to resolve the error

I was hoping to get some help with this error code I have been coming across.
Context:
The company I work for uses GSuite.
My team has its own Cloud project set up.
Google Drive isn't a "personal" drive.
We use Airflow to refresh our BigQuery tables on a daily/weekly/monthly basis.
I have followed these solutions
Access Denied: Permission denied while getting Drive credentials
"Encountered an error while globbing file pattern" error when using BigQuery API w/ Google Sheets
And also referenced
https://cloud.google.com/bigquery/external-data-drive#python_3
Problem
Cloud Composer: v1.12.0
I have recently set up an external BigQuery table that reads a tab within a Google Sheet. My Airflow DAG has been failing to complete due to the access restriction on Drive.
I have added the following to the Airflow connection scopes (screenshot of the connection scopes omitted).
I have also added the service account e-mail address, via Share, to the Google Sheet the table references, and updated the service account's IAM role to BigQuery Admin. After following these steps, I still receive the error BigQuery: Permission denied while getting Drive credentials.
Problem 2
Following the above, I found it easier to troubleshoot locally, so I created a venv on my machine, where I'm most comfortable troubleshooting. The goal is simply to query a BigQuery table that reads a Google Sheet. However, after following the same steps as above, I am still unable to get this to work.
My local code:
import dotenv
import pandas as pd
from google.cloud import bigquery
import google.auth

def run_BigQuery_table(sql):
    dotenv.load_dotenv()
    credentials, project = google.auth.default(
        scopes=[
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/bigquery",
        ]
    )
    bigquery.Client(project, credentials)
    output = pd.read_gbq(sql, project_id=project, dialect='standard')
    return output

script_variable = "SELECT * FROM `X` LIMIT 10"
bq_output = run_BigQuery_table(script_variable)
print(bq_output)
My error:
raise self._exception
google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
raise GenericGBQException("Reason: {0}".format(ex))
pandas_gbq.gbq.GenericGBQException: Reason: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
Is anyone able to help?
Cheers
So a colleague suggested that I explore the default pandas_gbq credentials, as pandas_gbq might have been using its own default credentials to access the data.
Turns out, it worked.
You can manually set the pandas-gbq credentials by following this:
https://pandas-gbq.readthedocs.io/en/latest/howto/authentication.html
https://pandas-gbq.readthedocs.io/en/latest/api.html#pandas_gbq.Context.credentials
I simply added the following to my code
pdgbq.context.credentials = credentials
The final output:
import dotenv
import pandas as pd
from google.cloud import bigquery
import google.auth
import pandas_gbq as pdgbq

def run_BigQuery_table(sql):
    dotenv.load_dotenv()
    credentials, project = google.auth.default(
        scopes=[
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/bigquery",
        ]
    )
    pdgbq.context.credentials = credentials
    bigquery.Client(project, credentials)
    output = pd.read_gbq(sql, project_id=project, dialect='standard')
    return output

script_variable4 = "SELECT * FROM `X` LIMIT 10"
bq_output = run_BigQuery_table(script_variable4)
print(bq_output)
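For what it's worth (not part of the original answer), newer pandas / pandas-gbq releases also accept the credentials object directly as a read_gbq argument, so assigning pdgbq.context.credentials is optional; a small sketch under that assumption:
import google.auth
import pandas as pd

# Drive-scoped application default credentials, as in the code above.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/cloud-platform",
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]
)

# Pass the credentials straight to read_gbq instead of setting the pandas_gbq context.
df = pd.read_gbq("SELECT * FROM `X` LIMIT 10", project_id=project,
                 credentials=credentials, dialect='standard')
print(df.head())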
I often get these errors, and the vast majority were solved by creating and sharing service accounts. However, I recently had a case where our GSuite administrator updated security settings so that only our employees could access GSuite-related things (spreadsheets, storage, etc.). It was an attempt to plug a security gap, but in doing so, any e-mail address or service account that did not end in @ourcompany.com was blocked from using BigQuery.
I recommend you explore your company's GSuite settings and see if external access is blocked. I cannot say this is the fix for your case, but it was for mine, so it could be worth trying.

Connecting an AWS backend to a Firebase database

I'm currently running Python code on my AWS server and trying to connect to my friend's Firebase database. I read the documentation provided by Firebase on connecting from a server:
https://firebase.google.com/docs/admin/setup
I have followed every step, but I'm getting an error when I try to connect from my server. I have added google-services.json as the credential.
Error that I get :
ValueError: Invalid service account certificate. Certificate must contain a "type" field set to "service_account".
Do I need to modify the google-services.json ?
My code:
import firebase_admin
from firebase_admin import credentials
cred = credentials.Certificate('/home/ec2-user/google-services.json')
#default_app = firebase_admin.initialize_app(cred)
other_app = firebase_admin.initialize_app(cred, name='other')
default_app = firebase_admin.initialize_app()
google-services.json is typically the name of an Android app configuration file. That's not the same as a service account. To get a hold of the credentials for a service account for your project, you'll need to generate one from the Firebase console from Project Settings -> Service Accounts. The documentation is here. Once you have this file, you can initialize the Admin SDK with it to begin accessing the data in your project.
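A minimal sketch of that initialization, assuming the generated key was saved as /home/ec2-user/serviceAccountKey.json and that the Realtime Database is being accessed (the file name and databaseURL are placeholders):
import firebase_admin
from firebase_admin import credentials, db

# Service account key generated from Project Settings -> Service Accounts.
cred = credentials.Certificate('/home/ec2-user/serviceAccountKey.json')

# databaseURL is only needed for the Realtime Database.
firebase_admin.initialize_app(cred, {
    'databaseURL': 'https://<your-project-id>.firebaseio.com'
})

# Example read from the database root.
ref = db.reference('/')
print(ref.get())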
A better way would be to store the credentials on S3 (encrypted), with an IAM role attached to the Lambda function.
import os
import firebase_admin
from firebase_admin import credentials
import boto3
from settings.local_settings import AWS_REGION, ENVIRONMENT
import json

firebase_config_file = 'app-admin-config-{}.json'.format(ENVIRONMENT)
firebase_admin_creds_file = 'app-admin-sdk-{}.json'.format(ENVIRONMENT)

current_dir = os.path.abspath(os.path.dirname(__file__))
files = [f for f in os.listdir(current_dir) if os.path.isfile(f)]

if firebase_config_file not in files and firebase_admin_creds_file not in files:
    s3 = boto3.resource('s3', region_name=AWS_REGION)
    bucket = s3.Bucket('app-s3-secrets')
    firebase_config = json.loads(
        bucket.Object('app-admin-config-{}.json'.format(ENVIRONMENT)).get()['Body'].read())
    firebase_admin_creds = json.loads(
        bucket.Object('app-admin-sdk-{}.json'.format(ENVIRONMENT)).get()['Body'].read().decode())


class Firebase:
    @staticmethod
    def get_connection():
        cred = credentials.Certificate(firebase_admin_creds)
        return firebase_admin.initialize_app(cred, firebase_config)


app = Firebase.get_connection()

How to create a Google BigQuery service account with access to a single dataset?

Is there any way of granting read-only access to a specific BigQuery dataset to a given client ID?
I've tried using a service account, but this gives full access to all datasets.
I also tried creating a service account from a different application and adding the e-mail address generated together with the certificate under BigQuery > Some Dataset > Share Dataset > Can view, but this always results in a 403 "Access Not Configured" error.
I'm using the server-to-server flow described in the documentation:
import httplib2
from apiclient.discovery import build
from oauth2client.client import SignedJwtAssertionCredentials

# REPLACE WITH YOUR Project ID
PROJECT_NUMBER = 'XXXXXXXXXXX'
# REPLACE WITH THE SERVICE ACCOUNT EMAIL FROM GOOGLE DEV CONSOLE
SERVICE_ACCOUNT_EMAIL = 'XXXXX@developer.gserviceaccount.com'

f = file('key.p12', 'rb')
key = f.read()
f.close()

credentials = SignedJwtAssertionCredentials(
    SERVICE_ACCOUNT_EMAIL,
    key,
    scope='https://www.googleapis.com/auth/bigquery.readonly')

http = httplib2.Http()
http = credentials.authorize(http)

service = build('bigquery', 'v2')
tables = service.tables()
response = tables.list(projectId=PROJECT_NUMBER, datasetId='SomeDataset').execute(http)
print(response)
I'm basically trying to provide readonly access to an external server based application to a single dataset.
As pointed out by Fh, the BigQuery API must be activated in the Google account where the service account is created, regardless of the fact that it will be querying a BigQuery endpoint bound to a different application ID.
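For reference (this is not from the original answer), with the current google-cloud-bigquery client the same dataset-level, read-only grant can also be made programmatically; a sketch, with the project, dataset, and service-account e-mail as placeholders:
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.SomeDataset")

# Append a READER entry for the service account to the dataset's access list.
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",
        entity_id="my-service-account@my-project.iam.gserviceaccount.com",
    )
)
dataset.access_entries = entries
dataset = client.update_dataset(dataset, ["access_entries"])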
