Query shared BigQuery dataset with Python

I want to extract data from a shared BigQuery dataset. The dataset was shared with my Gmail account, but I created credentials for my own Google Cloud project, so I am getting an error because those credentials are not associated with my Gmail account. I need credentials for my Gmail account so I can query the shared dataset with the Python BigQuery client.
I created the client for my Project (but the shared dataset is in another Project):
client = bigquery.Client(credentials=credentials, project=project_id)
Then I run a query and get this error:
"Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied for table:..."

Related

Python connection to Google BigQuery using ADC

I am trying to get data from a Google BigQuery table using Python. I don't have service account access, but I have individual access to BigQuery using gcloud, and I have an application default credentials JSON file. I need to know how to make a connection to BigQuery using ADC.
code snippet:
from google.cloud import bigquery
conn = bigquery.Client()
query = "select * from my_data.test1"
conn.query(query)
When I run the above code snippet I get this error:
NewConnectionError: <urllib3.connection.HttpsConnection object at 0x83dh46bdu640>: Failed to establish a new connection:[Error -2] Name or Service not known
Note: the environment variable GOOGLE_APPLICATION_CREDENTIALS is not set and is empty.
Your script works for me because I authenticated using end-user credentials from the Google Cloud SDK. Once you have the SDK installed, you can simply run:
gcloud auth application-default login
The credentials from your JSON file are not being passed to the BigQuery client, e.g.:
client = bigquery.Client(project=project, credentials=credentials)
To set that up you can follow these steps: https://cloud.google.com/bigquery/docs/authentication/end-user-installed
Or this thread has some good details on setting the credentials environment variable: Setting GOOGLE_APPLICATION_CREDENTIALS for BigQuery Python CLI
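As a minimal sketch of passing credentials from a JSON file explicitly (the file path and project ID below are placeholders; google.auth.load_credentials_from_file handles both authorized-user and service-account files):

import google.auth
from google.cloud import bigquery

# Load the credentials that `gcloud auth application-default login` wrote to disk;
# the path and project ID are placeholders.
credentials, _ = google.auth.load_credentials_from_file(
    "/home/me/.config/gcloud/application_default_credentials.json"
)

client = bigquery.Client(project="my-project-id", credentials=credentials)
for row in client.query("select * from my_data.test1").result():
    print(row)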

google.cloud.bigquery.Client() ignoring provided scopes, resulting in Permission denied while getting Drive credentials

I am trying to query data stored in Drive via the google.cloud.bigquery Python library.
I've followed Google's guide for Querying Drive Data.
Thus, my code looks like this:
import google.auth
from google.cloud import bigquery
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]
)
client = bigquery.Client(project, credentials)
query = client.query("""MY SQL HERE""")
query_results = query.result()
The issue: the credentials object and the BigQuery client ignore the provided scopes, resulting in google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials. To clarify, neither credentials nor client includes the Drive scope I provided.
What can I do to properly pass the Drive scope to my BigQuery client?
My application default credentials for my local environment are my authorized user, which is the owner of the project.
Once you've set your application default credentials as an authorized user, you cannot request additional scopes.
To request additional scopes, do so during activation of your authorized user.
More plainly, when you run gcloud auth application-default login, provide the --scopes option, followed by your desired scopes. For me, that was gcloud auth application-default login --scopes=openid,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/drive,https://www.googleapis.com/auth/bigquery
Alternatively, you should share the spreadsheet with the service account email address that you are using to access it.
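If you go the service-account route, a minimal sketch of building a Drive-scoped client from a key file might look like this (the key path and project ID are placeholders):

from google.oauth2 import service_account
from google.cloud import bigquery

# Load a service account key and request the Drive scope explicitly;
# the spreadsheet must also be shared with this account's client_email.
credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json",
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",
    ],
)
client = bigquery.Client(project="my-project-id", credentials=credentials)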

BigQuery: Permission denied while getting Drive credentials - Unable to resolve the error

I was hoping to get some help with this error code I have been coming across.
Context:
The company I work for uses the G Suite product.
My team has its own Cloud Project set up.
Google Drive isn't a "personal" drive.
We utilise Airflow to refresh our BigQuery tables on a daily/weekly/monthly basis.
I have followed these solutions
Access Denied: Permission denied while getting Drive credentials
"Encountered an error while globbing file pattern" error when using BigQuery API w/ Google Sheets
And also referenced
https://cloud.google.com/bigquery/external-data-drive#python_3
Problem
Cloud Composer: v1.12.0
I have recently set up an external BigQuery table that reads a tab within a Google Sheet. My Airflow DAG has been failing to complete due to the access restriction to Drive.
I have added the following to the Airflow connection scopes:
[screenshot of the Airflow connection scopes]
And also added the service account e-mail address to the Google Sheet the table is referencing via Share. I have also updated the Service account IAM roles to BigQuery admin. After following these steps, I still receive the error BigQuery: Permission denied while getting Drive credentials.
Problem 2
Following the above, I found it easier to troubleshoot locally, so I created a venv on my machine, because that's where I'm most comfortable troubleshooting. The goal is simply to query a BigQuery table that reads a Google Sheet. However, after following the same steps as above, I am still unable to get this to work.
My local code:
import dotenv
import pandas as pd
from google.cloud import bigquery
import google.auth
def run_BigQuery_table(sql):
    dotenv.load_dotenv()
    credentials, project = google.auth.default(
        scopes=[
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/bigquery",
        ]
    )
    bigquery.Client(project, credentials)
    output = pd.read_gbq(sql, project_id=project, dialect='standard')
    return output
script_variable = "SELECT * FROM `X` LIMIT 10"
bq_output = run_BigQuery_table(script_variable)
print(bq_output)
My error:
raise self._exception
google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
raise GenericGBQException("Reason: {0}".format(ex))
pandas_gbq.gbq.GenericGBQException: Reason: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
Is anyone able to help?
Cheers
So a colleague suggested that I explore the default pandas_gbq credentials, as this might be using default credentials to access the data.
Turns out, it worked.
You can manually set the pandas-gbq credentials by following this:
https://pandas-gbq.readthedocs.io/en/latest/howto/authentication.html
https://pandas-gbq.readthedocs.io/en/latest/api.html#pandas_gbq.Context.credentials
I simply added the following to my code
pdgbq.context.credentials = credentials
The final output:
import dotenv
import pandas as pd
from google.cloud import bigquery
import google.auth
import pandas_gbq as pdgbq
def run_BigQuery_table(sql):
    dotenv.load_dotenv()
    credentials, project = google.auth.default(
        scopes=[
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/bigquery",
        ]
    )
    pdgbq.context.credentials = credentials
    bigquery.Client(project, credentials)
    output = pd.read_gbq(sql, project_id=project, dialect='standard')
    return output
script_variable = "SELECT * FROM `X` LIMIT 10"
bq_output = run_BigQuery_table(script_variable)
print(bq_output)
I often get these errors, and the vast majority were solved by creating and sharing service accounts. However, I recently had a case where our G Suite administrator updated security settings so that only our employees could access G Suite-related things (spreadsheets, storage, etc.). It was an attempt to plug a security gap, but in doing so, any email address or service account that did not end in @ourcompany.com was blocked from using BigQuery.
I recommend you explore your company's G Suite settings and see whether external access is blocked. I cannot say this is the fix for your case, but it was for me, so it could be worth trying.

How to connect AMLS to ADLS Gen 2?

I would like to register a dataset from ADLS Gen2 in my Azure Machine Learning workspace (azureml-core==1.12.0). Given that service principal information is not required in the Python SDK documentation for .register_azure_data_lake_gen2(), I successfully used the following code to register ADLS gen2 as a datastore:
import os
from azureml.core import Datastore

adlsgen2_datastore_name = os.environ['adlsgen2_datastore_name']
account_name = os.environ['account_name']  # ADLS Gen2 account name
file_system = os.environ['filesystem']

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system
)
However, when I try to register a dataset, using
from azureml.core import Dataset
adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
data = Dataset.Tabular.from_delimited_files((adls_ds, 'folder/data.csv'))
I get an error
Cannot load any data from the specified path. Make sure the path is accessible and contains data.
ScriptExecutionException was caused by StreamAccessException.
StreamAccessException was caused by AuthenticationException.
'AdlsGen2-ReadHeaders' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID <CLIENT_REQUEST_ID>, request ID <REQUEST_ID>. Error message: [REDACTED]
| session_id=<SESSION_ID>
Do I need to enable the service principal to get this to work? Using the ML Studio UI, it appears that the service principal is required even to register the datastore.
Another issue I noticed is that AMLS is trying to access the dataset at:
https://adls_gen2_account_name.dfs.core.windows.net/container/folder/data.csv (note dfs), whereas the actual URI in ADLS Gen2 is: https://adls_gen2_account_name.blob.core.windows.net/container/folder/data.csv (note blob).
According to this documentation, you need to enable the service principal.
1. Register your application and grant the service principal Storage Blob Data Reader access.
2. Try this code:
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system,
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)

adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
dataset = Dataset.Tabular.from_delimited_files((adls_ds, 'sample.csv'))
print(dataset.to_pandas_dataframe())
Result: (screenshot of the returned DataFrame omitted)
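If the end goal is registering the dataset in the workspace, as in the original question, a short follow-up along these lines should work once the service-principal datastore is in place (the dataset name below is a placeholder):

# Register the tabular dataset in the AML workspace under a placeholder name.
dataset = Dataset.Tabular.from_delimited_files((adls_ds, 'folder/data.csv'))
registered = dataset.register(workspace=ws, name='my_adls_dataset', create_new_version=True)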

I cannot query a BigQuery table in Datalab

I'm working in Datalab, but when I try to query a table in BigQuery I get the following error:
Exception: invalid: Error while reading table: .... error message: Failed to read the spreadsheet. Errors: No OAuth token with Google Drive scope was found.
This only happens with tables that are linked to a Google Drive sheet.
I have now enabled the Google Drive app in GCP:
from google.cloud import bigquery
client = bigquery.Client()
sql = """
SELECT * FROM `proyect-xxxx.set_xxx.table_x` LIMIT 1000
"""
df = client.query(sql).to_dataframe()
project_id = 'proyect-xxxx'
df = client.query(sql, project=project_id).to_dataframe()
df.head(3)
Exception: invalid: Error while reading table: .... error message: Failed to read the spreadsheet. Errors: No OAuth token with Google Drive scope was found.
As stated by the error, you are trying to access Google Drive, which stores your BigQuery external table, without granting the Drive permission to your OAuth token.
You will need to go to the Google Console and enable this access to solve your problem.
You can use this link, which provides a how-to explanation on this subject:
Visit the Google API Console to obtain OAuth 2.0 credentials such as a client ID and client secret that are known to both Google and your application. The set of values varies based on what type of application you are building. For example, a JavaScript application does not require a secret, but a web server application does.
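As a rough sketch of what that looks like in code, mirroring the scoped-credential approach from the earlier answers (this only helps if the underlying application default credentials can actually grant the Drive scope, e.g. a service account key or gcloud auth application-default login run with --scopes; the table name is a placeholder):

import google.auth
from google.cloud import bigquery

# Request both the BigQuery and Drive scopes so the OAuth token can read
# the Google Sheet backing the external table.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",
    ]
)

client = bigquery.Client(project=project, credentials=credentials)
df = client.query("SELECT * FROM `proyect-xxxx.set_xxx.table_x` LIMIT 1000").to_dataframe()
df.head(3)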
