Permissions issues when querying a table created from Google Sheets - python

I have created a Big Query table using a Google Sheet as a source.
I am trying to query the table with some Python script.
from google.cloud import bigquery
from google.oauth2 import service_account
from google.auth.transport import requests

credentials = service_account.Credentials.from_service_account_file(
    r"[key location]")
project_id = '[PROJECT]'
client = bigquery.Client(credentials=credentials, project=project_id)
query_job = client.query("""
    SELECT *
    FROM [TABLENAME]
    LIMIT 10""")
results = query_job.result()
However, I am receiving the following error.
Forbidden: 403 Access Denied: BigQuery BigQuery: Permission denied while getting Drive credentials.
I have successfully used the above code to query another table (not from a Sheet source), so the issue is specifically to do with the table sourced from Sheets. I have tried running the code both on a cloud resource (using a service account) and locally.
Does anyone know the fix?

You must enable Drive access for BigQuery by creating credentials that are scoped to both BigQuery and Google Drive, as shown in the documentation. Please note that both the BigQuery API and the Google Drive API must be enabled before running the code.
I updated the code from your question and used it in my testing, as shown below:
from google.cloud import bigquery
import google.auth

# Create credentials with Drive & BigQuery API scopes.
# Both APIs must be enabled for your project before running this code.
credentials, project = google.auth.default(
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]
)

project_id = '[PROJECT]'
client = bigquery.Client(credentials=credentials, project=project_id)
query_job = client.query("""
    SELECT *
    FROM [TABLENAME]
    LIMIT 10""")
results = query_job.result()

# Printing results for my testing
for row in results:
    row1 = row['string_field_0']
    row2 = row['string_field_1']
    print(f'{row1} | {row2}')
Output: (screenshot of the query results omitted)
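If you prefer to keep authenticating with the service account key file from your question instead of google.auth.default(), the same two scopes can be passed directly when loading the credentials. A minimal sketch, reusing the placeholders from the question:

from google.cloud import bigquery
from google.oauth2 import service_account

# Sketch: Drive + BigQuery scopes applied to service-account credentials.
credentials = service_account.Credentials.from_service_account_file(
    r"[key location]",
    scopes=[
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ],
)
client = bigquery.Client(credentials=credentials, project='[PROJECT]')

Note that the service account's email also needs at least viewer access to the source Sheet in Drive.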

Related

Creating table in BigQuery from csv file located in Google Storage

I need to create a Google Cloud Function that automatically creates a table from a simple .csv file located in a bucket on Google Cloud Storage. I created a new function and wrote a Python script following the schema below. It seems to be correct, but when I try to implement the function I see an error (the error screenshot is not reproduced here). I really don't know what is wrong with my code. Please help.
from google.cloud import bigquery

client = bigquery.Client()
table_id = 'myprojectname.newdatasetname.newtablename'

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('A', 'INTEGER'),
        bigquery.SchemaField('B', 'INTEGER'),
        bigquery.SchemaField('C', 'INTEGER')
    ],
    skip_leading_rows=0,
)
uri = 'gs://my-bucket-name/*.csv'

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)
load_job.result()

destination_table = client.get_table(table_id)
print('Loaded {} rows.'.format(destination_table.num_rows))
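The error screenshot is not reproduced above, so the actual failure is unknown, but for reference, a minimal sketch of wrapping this load job in a background-triggered Cloud Function (the entry point name and the storage trigger are assumptions, not from the original post) might look like this:

from google.cloud import bigquery

def load_csv_to_bigquery(event, context):
    """Sketch: triggered on google.storage.object.finalize for the bucket."""
    client = bigquery.Client()
    table_id = 'myprojectname.newdatasetname.newtablename'  # placeholder from the question
    uri = f"gs://{event['bucket']}/{event['name']}"  # the file that triggered the function
    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField('A', 'INTEGER'),
            bigquery.SchemaField('B', 'INTEGER'),
            bigquery.SchemaField('C', 'INTEGER'),
        ],
        skip_leading_rows=0,
        source_format=bigquery.SourceFormat.CSV,
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # wait for the load to finish
    print('Loaded {} rows.'.format(client.get_table(table_id).num_rows))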

Cannot connect to query from python using Service account with BigQuery Job User role (in IAM)

The cloud team granted me a service account (SA) with the BigQuery Job User role (IAM) for querying from Python, but I got this issue:
403 request failed: the user does not have 'bigquery.readsessions.create' permission to project
The same SA works in Java. Looking for ideas to solve this, thanks ^^
My simple code here:
import pandas as pd
import google.cloud.bigquery as gbq

gbq_client = gbq.Client.from_service_account_json('credentials/credentials_bigquery.json')

# function query
def query_job(client, query):
    query_job = client.query(query)  # Make an API request.
    df_from_bq = query_job.to_dataframe()
    return df_from_bq

# query
qr_user = """
SELECT user_id FROM `CUSTOMER_DATA_PLATFORM.CDP_TBL` LIMIT 1000
"""
user = query_job(gbq_client, qr_user)
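For context, bigquery.readsessions.create belongs to the BigQuery Storage Read API, which newer versions of the Python client try to use when calling to_dataframe(). The direct fix is to also grant the SA the BigQuery Read Session User role; if that is not possible, a hedged workaround (not from the original post, and dependent on the google-cloud-bigquery version) is to fall back to the plain REST API:

import google.cloud.bigquery as gbq

gbq_client = gbq.Client.from_service_account_json('credentials/credentials_bigquery.json')

def query_job(client, query):
    job = client.query(query)  # Make an API request.
    # create_bqstorage_client=False skips the BigQuery Storage Read API,
    # so bigquery.readsessions.create is not required.
    return job.to_dataframe(create_bqstorage_client=False)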

Use Managed Identity to authenticate Azure App Service to SQL Database

I am trying to connect a Python Flask app running in Azure App Service Web App to an Azure SQL Database.
This works just fine when I use SQL authentication with a username and password.
Now I want to move to using the Web App's managed identity.
I have activated the system-assigned managed identity, created a user for it in SQL and added it to the db_datareader role.
I am connecting with SQLAlchemy using a connection string like this:
params = urllib.parse.quote_plus(os.environ['SQL_CONNECTION_STRING'])
conn_str = 'mssql+pyodbc:///?odbc_connect={}'.format(params)
engine_azure = db.create_engine(conn_str, echo=True)
The connection string is stored as an application setting, and its value is
"Driver={ODBC Driver 17 for SQL Server};Server=tcp:<server>.database.windows.net,1433;Database=<database>;Authentication=ActiveDirectoryMsi;"
I expected this to be all I need to do, but now my app is not starting.
The logs report a timeout when connecting to the database.
How can I fix this?
I know this is quite an old post, but it may help people like me who are looking for a solution.
You can simply add the "Authentication=ActiveDirectoryMsi" parameter to the connection string; there is no need to use the identity endpoint and headers.
(This works with Azure SQL; for other databases such as Postgres you may need to use the token struct approach shown below.)
import pyodbc

pyodbc.connect(
    "Driver=" + driver
    + ";Server=" + server
    + ";PORT=1433;Database=" + database
    + ";Authentication=ActiveDirectoryMsi")
I wrote a quick article for those who are interested in Azure MSI:
https://hedihargam.medium.com/python-sql-database-access-with-managed-identity-from-azure-web-app-functions-14566e5a0f1a
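Since the question connects through SQLAlchemy rather than raw pyodbc, the same Authentication=ActiveDirectoryMsi keyword can also be passed through the odbc_connect URL. A minimal sketch, with <server> and <database> as placeholders like in the question:

import urllib.parse
import sqlalchemy as db

# Sketch: the ActiveDirectoryMsi connection string passed through SQLAlchemy.
odbc_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<server>.database.windows.net,1433;"
    "Database=<database>;"
    "Authentication=ActiveDirectoryMsi;"
)
params = urllib.parse.quote_plus(odbc_str)
engine_azure = db.create_engine(f"mssql+pyodbc:///?odbc_connect={params}", echo=True)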
If you want to connect to an Azure SQL database with Azure MSI in a Python application, you can use pyodbc to implement it.
For example:
1. Enable the system-assigned identity for your Azure App Service.
2. Add the MSI as a contained database user in your database:
   a. Connect to your SQL database as the Azure SQL AD admin (I use SSMS to do it).
   b. Run the following script in your database:
CREATE USER <your app service name> FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER <your app service name>;
ALTER ROLE db_datawriter ADD MEMBER <your app service name>;
ALTER ROLE db_ddladmin ADD MEMBER <your app service name>;
Code
import os
import struct

import pyodbc
import requests

# Get an access token from the App Service managed identity endpoint.
identity_endpoint = os.environ["IDENTITY_ENDPOINT"]
identity_header = os.environ["IDENTITY_HEADER"]
resource_uri = "https://database.windows.net/"
token_auth_uri = f"{identity_endpoint}?resource={resource_uri}&api-version=2019-08-01"
head_msi = {'X-IDENTITY-HEADER': identity_header}
resp = requests.get(token_auth_uri, headers=head_msi)
access_token = resp.json()['access_token']

# Pack the token into the structure the ODBC driver expects:
# each byte interleaved with a zero byte, prefixed with the total length.
accessToken = bytes(access_token, 'utf-8')
exptoken = b""
for i in accessToken:
    exptoken += bytes({i})
    exptoken += bytes(1)
tokenstruct = struct.pack("=i", len(exptoken)) + exptoken

# 1256 is SQL_COPT_SS_ACCESS_TOKEN, the pre-connect attribute for the access token.
conn = pyodbc.connect(
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:andyserver.database.windows.net,1433;"
    "Database=database2",
    attrs_before={1256: bytearray(tokenstruct)})
cursor = conn.cursor()
cursor.execute("select @@version")
row = cursor.fetchall()
For more details, please refer to:
https://github.com/AzureAD/azure-activedirectory-library-for-python/wiki/Connect-to-Azure-SQL-Database
https://learn.microsoft.com/en-us/azure/app-service/overview-managed-identity
https://learn.microsoft.com/en-us/azure/sql-database/sql-database-aad-authentication-configure

Error: Trying to use PyAthena to access an Athena database

I'm currently trying to build a data pipeline from an AWS Athena database so my team can query information using Python. However, I'm running into an issue with insufficient permissions.
We are able to query the data in Tableau, but we wanted to integrate it into an app we are developing.
Here is the code we followed from PyAthena's documentation.
from pyathena import connect
import pandas as pd

conn = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
               aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
               s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
               region_name='us-west-2')
df = pd.read_sql("SELECT * FROM many_rows", conn)
print(df.head())
Here is the resulting error.
OperationalError: Insufficient permissions to execute the query. User: arn:aws:iam::OUR_ADDRESS:user/USER is not authorized to perform: glue:GetTable on resource: arn:aws:glue:us-west-2:OUR_ADDRESS:table/default/OUR_DATABASE
I'm guessing this is an issue with IAM permissions on the AWS side with respect to AWS Glue, but I'm not sure how to resolve it.
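For anyone hitting the same message: the denied action is glue:GetTable, so the fix is to grant the querying IAM user the Glue Data Catalog read actions that Athena relies on. A hedged sketch using boto3 follows; the action list, user name, and policy name are assumptions/placeholders, not from the original post:

import json
import boto3

# Sketch: attach an inline policy with the Glue read actions Athena needs.
iam = boto3.client('iam')
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetTable",
                "glue:GetTables",
                "glue:GetPartition",
                "glue:GetPartitions",
            ],
            "Resource": "*",
        }
    ],
}
iam.put_user_policy(
    UserName="USER",                # the IAM user from the error message
    PolicyName="athena-glue-read",  # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)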

Access data via BigQuery in python

I am trying to access data in Python using the BigQuery API; here is my code.
I have placed the PEM file inside the same folder, but the script returns the error: googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/digin-1086/queries?alt=json returned "Not found: Table digin-1086:dataset.my_table"
from bigquery import get_client

# BigQuery project id as listed in the Google Developers Console.
project_id = 'digin-1086'
# Service account email address as listed in the Google Developers Console.
service_account = '77441948210-4fhu1kc1driicjecriqupndkr60npnh#developer.gserviceaccount.com'
# PKCS12 or PEM key provided by Google.
key = 'Digin-d6387c00c5a'

client = get_client(project_id, service_account=service_account,
                    private_key_file=key, readonly=True)

# Submit an async query.
job_id, _results = client.query('SELECT * FROM dataset.my_table LIMIT 1000')

# Check if the query has finished running.
complete, row_count = client.check_job(job_id)

# Retrieve the results.
results = client.get_query_rows(job_id)
The error says it can't find your table; it has nothing to do with the PEM file. You need to make sure the table exists in the dataset.
To access data via BigQuery in Python you can do the following:
from google.cloud import bigquery
from google.oauth2 import service_account
from google.auth.transport import requests

credentials = service_account.Credentials.from_service_account_file(
    r'filelocation\xyz.json')
project_id = 'abc'
client = bigquery.Client(credentials=credentials, project=project_id)
query_job = client.query("""
    SELECT *
    FROM tabename
    LIMIT 10""")
results = query_job.result()

for row in results:
    print(row)
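Since the underlying problem in the question is a missing table, a quick existence check (not part of the original answer) with the same client can confirm whether the table is really there before querying; the table id below is a placeholder:

from google.api_core.exceptions import NotFound

table_id = 'abc.dataset.my_table'  # placeholder: project.dataset.table
try:
    client.get_table(table_id)
    print('Table found, safe to query.')
except NotFound:
    print('Table not found: check the dataset and table name, or create the table first.')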
