Cannot query a BigQuery table in Datalab - python

I'm working in Datalab, but when I try to query a table in BigQuery I get the following error:
Exception: invalid: Error while reading table: .... error message: Failed to read the spreadsheet. Errors: No OAuth token with Google Drive scope was found.
This only happens with tables that are linked to a Google Drive sheet.
I have now enabled the Google Drive API in GCP.
from google.cloud import bigquery
client = bigquery.Client()
sql = """
SELECT * FROM `proyect-xxxx.set_xxx.table_x` LIMIT 1000
"""
df = client.query(sql).to_dataframe()
project_id = 'proyect-xxxx'
df = client.query(sql, project=project_id).to_dataframe()
df.head(3)

Exception: invalid: Error while reading table: .... error message: Failed to read the spreadsheet. Errors: No OAuth token with Google Drive scope was found.
As the error states, you are trying to access Google Drive, which stores the data behind your BigQuery external table, without your OAuth token having been granted the Google Drive scope.
You will need to go to the Google Cloud Console and enable this access to solve your problem.
You can use this link, which provides a how-to explanation on this subject:
Visit the Google API Console to obtain OAuth 2.0 credentials such as a client ID and client secret that are known to both Google and your application. The set of values varies based on what type of application you are building. For example, a JavaScript application does not require a secret, but a web server application does.
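As a minimal sketch of what the error message asks for, assuming Application Default Credentials are available in the notebook environment and that the identity behind them has been given access to the linked sheet, the BigQuery client can be built from credentials that request the Drive scope. The helper name `make_drive_scoped_client` is hypothetical; `google.auth.default` and `bigquery.Client` are the library calls.

```python
# Scopes requested for the credentials. The Drive scope is the one the
# error message reports as missing for Sheets-backed tables.
SCOPES = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/bigquery",
]


def make_drive_scoped_client(project_id=None):
    """Build a BigQuery client whose credentials request the Drive scope.

    Hypothetical helper. The Google libraries are imported lazily so this
    module can be loaded even where they are not installed.
    """
    import google.auth
    from google.cloud import bigquery

    # google.auth.default returns (credentials, project detected from the environment).
    credentials, default_project = google.auth.default(scopes=SCOPES)
    return bigquery.Client(
        credentials=credentials,
        project=project_id or default_project,
    )
```

With this in place, `client = make_drive_scoped_client('proyect-xxxx')` followed by `client.query(sql).to_dataframe()` would replace the plain `bigquery.Client()` call. On a notebook backed by a Compute Engine VM, the VM's own access scopes may also need to include Drive; that part is outside this sketch.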

Related

Python: AWS Aurora Serverless Data API: password authentication failed for user

I am running out of ideas.
I have created an Aurora Serverless RDS cluster (Version 1) with the Data API enabled. I now wish to execute SQL statements against it using the Data API (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html).
I have made a small test script using the provided guidelines (https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/data-api.html#data-api.calling:~:text=Calling%20the%20Data%20API%20from%20a%20Python%20application)
import boto3
session = boto3.Session(region_name="eu-central-1")
rds = session.client("rds-data")
secret = session.client("secretsmanager")
cluster_arn = "arn:aws:rds:eu-central-1:<accountID>:cluster:aurorapostgres"
secret_arn = "arn:aws:secretsmanager:eu-central-1:<accountID>:secret:dbsecret-xNMeQc"
secretvalue = secret.get_secret_value(
    SecretId=secret_arn
)
print(secretvalue)
SQL = "SELECT * FROM pipelinedb.dataset"
res = rds.execute_statement(
    resourceArn=cluster_arn,
    secretArn=secret_arn,
    database="pipelinedb",
    sql=SQL
)
print(res)
However I get the error message:
BadRequestException: An error occurred (BadRequestException) when calling the ExecuteStatement operation: FATAL: password authentication failed for user "bjarki"; SQLState: 28P01
I have verified the following:
Secret value is correct
Secret JSON structure is correctly following recommended structure (https://docs.aws.amazon.com/secretsmanager/latest/userguide/reference_secret_json_structure.html)
IAM user running the python script has Admin access to the account, and thus is privileged enough
Cluster is running in Public Subnets (internet gateways attached to route tables) and ACL and security groups are fully open.
The user "bjarki" is the master user and thus should have the required DB privileges to run the query
I am out of ideas on why this error is appearing - any good ideas?
Try this AWS tutorial, which is located in the AWS Examples Code Library. It shows how to use the AWS SDK for Python (Boto3) to create a web application that tracks work items in an Amazon Aurora database and emails reports by using Amazon Simple Email Service (Amazon SES). This example uses a front end built with React.js to interact with a Flask-RESTful Python backend.
Integrate a React.js web application with AWS services.
List, add, and update items in an Aurora table.
Send an email report of filtered work items by using Amazon SES.
Deploy and manage example resources with the included AWS CloudFormation script.
https://docs.aws.amazon.com/code-library/latest/ug/cross_RDSDataTracker_python_3_topic.html
Try running the CDK to properly set up the database too.
Once you have successfully implemented this example, you will get this front end with a Python backend.
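One of the checks listed in the question, that the secret follows the expected JSON structure, can be scripted rather than inspected by eye. The sketch below assumes that the Data API needs at least the `username` and `password` keys from the secret; the helper names and the `REQUIRED_KEYS` tuple are hypothetical, and the function avoids printing the password.

```python
import json

# Keys assumed to be required in the secret for Data API authentication.
REQUIRED_KEYS = ("username", "password")


def missing_secret_keys(secret_string):
    """Return the required keys that are absent or empty in the secret JSON."""
    payload = json.loads(secret_string)
    return [key for key in REQUIRED_KEYS if not payload.get(key)]


def check_secret(secret_arn, region_name="eu-central-1"):
    """Fetch the secret from Secrets Manager and report any missing keys."""
    import boto3  # imported lazily; only needed for the live call

    client = boto3.Session(region_name=region_name).client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_arn)
    return missing_secret_keys(response["SecretString"])
```

An empty list from `check_secret(secret_arn)` only shows that the keys are present. It does not show that the stored password matches the one currently set on the cluster, which would still have to be verified separately.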

Python connection to google big query using ADC

I am trying to get data from a Google BigQuery table using Python. I don't have service account access, but I have individual access to BigQuery through gcloud, and I have an Application Default Credentials (ADC) JSON file. I need to know how to make a connection to BigQuery using ADC.
Code snippet:
from google.cloud import bigquery
conn=bigquery.Client()
query="select * from my_data.test1"
conn.query(query)
When I run the above code snippet I get an error saying:
NewConnectionError: <urllib3.connection.HttpsConnection object at 0x83dh46bdu640>: Failed to establish a new connection:[Error -2] Name or Service not known
Note: the environment variable GOOGLE_APPLICATION_CREDENTIALS is not set (it is empty).
Your script works for me because I authenticated using end-user credentials from the Google Cloud SDK. Once you have the SDK installed, you can simply run:
gcloud auth application-default login
The credentials from your JSON file are not being passed to the BigQuery client. You can pass them explicitly, e.g.:
client = bigquery.Client(project=project, credentials=credentials)
To set that up, you can follow these steps: https://cloud.google.com/bigquery/docs/authentication/end-user-installed
Alternatively, this thread has some good details on setting the credentials environment variable: Setting GOOGLE_APPLICATION_CREDENTIALS for BigQuery Python CLI
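Before passing credentials around, it can help to confirm which ADC file, if any, would be picked up. The sketch below assumes the default locations used by the Google Cloud SDK (`~/.config/gcloud/application_default_credentials.json` on Linux and macOS, `%APPDATA%\gcloud\...` on Windows) and does not handle a custom `CLOUDSDK_CONFIG` directory. The function name `find_adc_file` is hypothetical.

```python
import os
from pathlib import Path


def find_adc_file(environ=None, home=None):
    """Return the ADC JSON path that would likely be used, or None.

    Checks GOOGLE_APPLICATION_CREDENTIALS first, then the default file
    written by `gcloud auth application-default login`.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # An explicitly configured path wins, whether or not the file exists.
    explicit = environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)

    # Default gcloud configuration directory (assumed layout, see lead-in).
    if os.name == "nt":
        base = Path(environ.get("APPDATA", str(home)))
    else:
        base = home / ".config"
    candidate = base / "gcloud" / "application_default_credentials.json"
    return candidate if candidate.is_file() else None
```

If `find_adc_file()` returns a path, `google.auth.default()` should be able to load it, and the returned credentials can be passed as `bigquery.Client(project=project, credentials=credentials)`. Note that the error quoted in the question is a name-resolution failure, so network or proxy settings may also be worth checking.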

BigQuery: Permission denied while getting Drive credentials - Unable to resolve the error

I was hoping to get some help with this error code I have been coming across.
Context:
The company I work for uses the G Suite product.
My team has its own Cloud Project set up.
Google Drive isn't a "personal" drive.
We utilise Airflow to refresh our BigQuery tables on a daily/weekly/monthly basis.
I have followed these solutions
Access Denied: Permission denied while getting Drive credentials
"Encountered an error while globbing file pattern" error when using BigQuery API w/ Google Sheets
And also referenced
https://cloud.google.com/bigquery/external-data-drive#python_3
Problem
Cloud Composer : v 1.12.0
I have recently set up an external BigQuery table that reads a tab within a Google Sheet. My Airflow DAG has been failing to complete due to the access restriction to Drive.
I have added the following to the Airflow connection scopes:
airflow scopes
And also added the service account e-mail address to the Google Sheet the table is referencing via Share. I have also updated the Service account IAM roles to BigQuery admin. After following these steps, I still receive the error BigQuery: Permission denied while getting Drive credentials.
Problem 2
Following the above, I found it easier to troubleshoot locally, so I created a venv on my machine because that is where I'm most comfortable troubleshooting. The goal is simply to query a BigQuery table that reads a Google Sheet. However, after following the same steps as above, I am still unable to get this to work.
My local code:
import dotenv
import pandas as pd
from google.cloud import bigquery
import google.auth
def run_BigQuery_table(sql):
    dotenv.load_dotenv()
    credentials, project = google.auth.default(
        scopes=[
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/bigquery",
        ]
    )
    bigquery.Client(project, credentials)
    output = pd.read_gbq(sql, project_id=project, dialect='standard')
    return output
script_variable = "SELECT * FROM `X` LIMIT 10"
bq_output = run_BigQuery_table(script_variable)
print(bq_output)
My error:
raise self._exception
google.api_core.exceptions.Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
raise GenericGBQException("Reason: {0}".format(ex))
pandas_gbq.gbq.GenericGBQException: Reason: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
Is anyone able to help?
Cheers
So a colleague suggested that I explore the default pandas_gbq credentials, as this might be using default credentials to access the data.
Turns out, it worked.
You can manually set the pandas-gbq credentials by following this:
https://pandas-gbq.readthedocs.io/en/latest/howto/authentication.html
https://pandas-gbq.readthedocs.io/en/latest/api.html#pandas_gbq.Context.credentials
I simply added the following to my code
pdgbq.context.credentials = credentials
The final output:
import dotenv
import pandas as pd
from google.cloud import bigquery
import google.auth
import pandas_gbq as pdgbq
def run_BigQuery_table(sql):
    dotenv.load_dotenv()
    credentials, project = google.auth.default(
        scopes=[
            "https://www.googleapis.com/auth/cloud-platform",
            "https://www.googleapis.com/auth/drive",
            "https://www.googleapis.com/auth/bigquery",
        ]
    )
    # Make pandas-gbq use the Drive-scoped credentials instead of its own defaults.
    pdgbq.context.credentials = credentials
    bigquery.Client(project, credentials)
    output = pd.read_gbq(sql, project_id=project, dialect='standard')
    return output
script_variable = "SELECT * FROM `X` LIMIT 10"
bq_output = run_BigQuery_table(script_variable)
print(bq_output)
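An alternative to setting the module-wide context is to hand the credentials to the read call itself, which keeps the scope choice local to one query. This is a sketch under the assumption that Application Default Credentials are configured; the function name `read_gbq_with_drive_scope` is hypothetical, while the `credentials` parameter of `pandas_gbq.read_gbq` is part of the library.

```python
# Scopes requested for the credentials; Drive is needed for Sheets-backed tables.
SCOPES = (
    "https://www.googleapis.com/auth/cloud-platform",
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/bigquery",
)


def read_gbq_with_drive_scope(sql):
    """Run a query through pandas-gbq using Drive-scoped default credentials.

    Hypothetical helper; the Google libraries are imported lazily so the
    module loads even where they are not installed.
    """
    import google.auth
    import pandas_gbq

    credentials, project = google.auth.default(scopes=list(SCOPES))
    # Passing credentials explicitly avoids relying on pandas_gbq.context.
    return pandas_gbq.read_gbq(
        sql,
        project_id=project,
        credentials=credentials,
        dialect="standard",
    )
```

Calling `read_gbq_with_drive_scope("SELECT * FROM `X` LIMIT 10")` would then return the DataFrame directly, without creating a separate `bigquery.Client`.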
I often get these errors, and the vast majority were solved by creating and sharing service accounts. However, I recently had a case where our G Suite administrator updated the security settings so that only our employees could access G Suite resources (spreadsheets, storage, etc.). It was an attempt to plug a security gap, but in doing so, any email address or service account that did not end in @ourcompany.com was blocked from using BigQuery.
I recommend you explore your company's G Suite settings and see if external access is blocked. I cannot say this is the fix for your case, but it was for me, so it could be worth trying.

How to connect AMLS to ADLS Gen 2?

I would like to register a dataset from ADLS Gen2 in my Azure Machine Learning workspace (azureml-core==1.12.0). Given that service principal information is not required in the Python SDK documentation for .register_azure_data_lake_gen2(), I successfully used the following code to register ADLS gen2 as a datastore:
import os
from azureml.core import Datastore
# ws is an existing azureml.core.Workspace object
adlsgen2_datastore_name = os.environ['adlsgen2_datastore_name']
account_name = os.environ['account_name']  # ADLS Gen2 account name
file_system = os.environ['filesystem']
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system
)
However, when I try to register a dataset, using
from azureml.core import Dataset
adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
data = Dataset.Tabular.from_delimited_files((adls_ds, 'folder/data.csv'))
I get an error
Cannot load any data from the specified path. Make sure the path is accessible and contains data.
ScriptExecutionException was caused by StreamAccessException.
StreamAccessException was caused by AuthenticationException.
'AdlsGen2-ReadHeaders' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID <CLIENT_REQUEST_ID>, request ID <REQUEST_ID>. Error message: [REDACTED]
| session_id=<SESSION_ID>
Do I need to enable the service principal to get this to work? Using the ML Studio UI, it appears that the service principal is required even to register the datastore.
Another issue I noticed is that AMLS is trying to access the dataset here:
https://adls_gen2_account_name.**dfs**.core.windows.net/container/folder/data.csv whereas the actual URI in ADLS Gen2 is: https://adls_gen2_account_name.**blob**.core.windows.net/container/folder/data.csv
According to this documentation, you need to enable the service principal.
1. Register your application and grant the service principal Storage Blob Data Reader access.
2. Try this code:
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system,
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)
adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
dataset = Dataset.Tabular.from_delimited_files((adls_ds,'sample.csv'))
print(dataset.to_pandas_dataframe())
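The code above leaves `tenant_id`, `client_id` and `client_secret` undefined. One way to supply them without hard-coding the secret is to read them from environment variables, in the same style the question uses for the account name. This is a sketch only: the environment variable names and the helper `load_service_principal` are assumptions, not something the SDK requires.

```python
import os

# Hypothetical mapping from register_azure_data_lake_gen2 keyword
# arguments to the environment variables that hold their values.
SP_ENV_VARS = {
    "tenant_id": "AZURE_TENANT_ID",
    "client_id": "AZURE_CLIENT_ID",
    "client_secret": "AZURE_CLIENT_SECRET",
}


def load_service_principal(environ=None):
    """Return the service principal settings as keyword arguments.

    Raises KeyError naming every variable that is unset or empty, so a
    misconfigured environment fails before any Azure call is made.
    """
    environ = os.environ if environ is None else environ
    missing = [var for var in SP_ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: environ[var] for arg, var in SP_ENV_VARS.items()}
```

The result can be unpacked into the registration call, for example `Datastore.register_azure_data_lake_gen2(workspace=ws, datastore_name=adlsgen2_datastore_name, account_name=account_name, filesystem=file_system, **load_service_principal())`.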

Google AppEngine to Fusion Tables with Service Accounts

Late to the game on migrating to the /v1 Fusion Table API but no holding off any longer.
I'm using Python on AppEngine and trying to connect to Google Fusion Tables with Google Service Accounts (the more complicated cousin of OAuth2 for server side apps that uses JSON Web Tokens)
I found another question that pointed me to some documentation for using Service Accounts with Google Prediction API.
Fusion Table and Google Service Accounts
So far I've got
import httplib2
from google.appengine.api import memcache
from oauth2client.appengine import AppAssertionCredentials
from apiclient.discovery import build
credentials = AppAssertionCredentials(scope='https://www.googleapis.com/auth/fusiontables')
http = credentials.authorize(httplib2.Http(memcache)) #Http(memcache)
service = build("fusiontables", "v1", http=http)
# list the tables
tables = service.table().list().execute() # <-- ERROR 401 invalid credentials here
Does anyone have an example of connecting to Fusion Tables on AppEngine using Service Accounts they might be able to share? Or something nice online?
Thanks
This actually does work. The important part is that you have to give the App Engine service account access to your Fusion Table. If you are writing, then the account needs write access. For help see: https://developers.google.com/api-client-library/python/start/installation (look for Getting started: Quickstart)
Your App Engine service account will be something like your-app-id@appspot.gserviceaccount.com
You must also make the app engine service account a team member in the api console and give it "can edit" privilege.
import logging
from time import gmtime, strftime
import httplib2
from apiclient.discovery import build
from google.appengine.api import app_identity
from oauth2client.appengine import AppAssertionCredentials
SCOPE = 'https://www.googleapis.com/auth/fusiontables'
PROJECT_NUMBER = 'XXXXXXXX' # REPLACE WITH YOUR Project ID
# Create a new API service for interacting with Fusion Tables
credentials = AppAssertionCredentials(scope=SCOPE)
http = credentials.authorize(httplib2.Http())
logging.info('QQQ: accountname: %s' % app_identity.get_service_account_name())
service = build('fusiontables', 'v1', http=http, developerKey='YOUR KEY HERE FROM API CONSOLE')
def log(value1, value2=None):
    tableid = 'YOUR TABLE ID FROM FUSION TABLES'
    now = strftime("%Y-%m-%d %H:%M:%S", gmtime())
    service.query().sql(sql="INSERT INTO %s (Temperature,Date) values(%s,'%s')" % (tableid, value1, now)).execute()
To clarify Ralph Yozzo's answer: you need to add the value of 'client_email' from the JSON file you downloaded when you created your service account credentials (the same file you load when using ServiceAccountCredentials.from_json_keyfile_name('service_acct.json') with the new oauth2client library) to your table's sharing dialog (click 1, then enter the email address in 2).
Since Fusion Tables' tables are owned by individual Gmail accounts rather than the service account associated with an API console project, the AppAssertionCredentials probably won't work. It would make for an interesting feature request, though:
http://code.google.com/p/fusion-tables/issues/list
The best online resource I have found for help connecting Python AppEngine to Fusion Tables API with Oauth2 is
Google APIs Client Library for Python
The slide presentation is helpful for understanding the online samples and why decorators are used.
Also useful for understanding whether to use the app's Service Account or User Accounts to authenticate is:
Using OAuth 2.0 to Access Google APIs
Consider installing the Google APIs Client Library for Python
Apart from the scope, the Oauth2 is more or less common to all Google APIs not just fusion tables.
Once oauth2 is working, see the Google Fusion Tables API
In case you want it to work from another host than Google App Engine or Google Compute Engine (e.g. from localhost for testing) then you should use ServiceAccountCredentials created from a json key file that you can generate and download from your service account page.
from apiclient.discovery import build
from httplib2 import Http
from oauth2client.service_account import ServiceAccountCredentials
scopes = ['https://www.googleapis.com/auth/fusiontables']
keyfile = 'PATH TO YOUR SERVICE ACCOUNT KEY FILE'
FTID = 'FUSION TABLE ID'
credentials = ServiceAccountCredentials.from_json_keyfile_name(keyfile, scopes)
# memcache is a cache object that must be defined elsewhere
http_auth = credentials.authorize(Http(memcache))
service = build('fusiontables', 'v2', http=http_auth)
def insert(title, description):
    sqlInsert = "INSERT INTO {0} (Title,Description) values('{1}','{2}')".format(FTID, title, description)
    service.query().sql(sql=sqlInsert).execute()
Refer to Google's page on service accounts for explanations.
