Error loading local CSV data to BigQuery using Python

I'm new to the data engineering field and want to create a table and insert data into BigQuery using Python, but in the process I get an error message.
Even though I already set google_application_credential through the shell, the error message still appears.
Here is my code:
from google.cloud import bigquery
from google.cloud import language
from google.oauth2 import service_account
import os

os.environ["GOOGLE_APPLICATION_CREDENTIAL"] = r"C:/Users/Pamungkas/Downloads/testing-353407-a3c774efeb5a.json"

client = bigquery.Client()

table_id = "testing-353407.testing_field.sales"
file_path = r"C:\Users\Pamungkas\Downloads\sales.csv"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE  # added to have truncate and insert load
)

with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

job.result()  # Waits for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)

As @p13rr0m suggested, you should use the environment variable GOOGLE_APPLICATION_CREDENTIALS instead of GOOGLE_APPLICATION_CREDENTIAL to resolve your issue.
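If you prefer not to rely on the environment variable at all, an alternative (a minimal sketch, reusing the same key file path from the question) is to build the credentials explicitly and pass them to the client:

from google.cloud import bigquery
from google.oauth2 import service_account

# Load the service account key directly instead of going through GOOGLE_APPLICATION_CREDENTIALS.
credentials = service_account.Credentials.from_service_account_file(
    r"C:/Users/Pamungkas/Downloads/testing-353407-a3c774efeb5a.json"
)

# Pass the credentials and project explicitly when constructing the client.
client = bigquery.Client(credentials=credentials, project=credentials.project_id)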

Related

Error while implementing BigQuery function

I created a function that creates a new table in BigQuery from a .csv file located in a bucket on Google Storage, following the schema below. When I tested this function, it creates the new table, but I also see the following error in the logs:
line 720, in create_table
    dataset_id = table.dataset_id
AttributeError: 'LoadJob' object has no attribute 'dataset_id'
I spent a lot of time trying to find out what is going on, but I have no idea what could be causing this error. Could someone help me, please?
def hello_gcs(table_id, uri):
    from google.cloud import bigquery

    client = bigquery.Client()
    uri = 'gs://bucket_name/file_name.csv'
    table_id = 'project_name.dataset_name.new_table_name'

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE
    )

    table = client.load_table_from_uri(uri, table_id, job_config=job_config)
    table = client.create_table(table)
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )
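
The traceback points at create_table: load_table_from_uri returns a LoadJob, while create_table expects a Table, and the load job already creates the destination table on its own. A minimal sketch of the end of hello_gcs under that reading (same uri, table_id, and job_config as above):

    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # Wait for the load to finish; it creates the table if it does not exist.

    table = client.get_table(table_id)  # Fetch the destination table's metadata.
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )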

Creating a table in BigQuery from a csv file located in Google Storage

I need to create a Google Cloud function that automatically creates a table from a simple .csv file located in a bucket on Google Storage. I created a new function and wrote a Python script following the schema below. It seems to be correct, but when I try to deploy the function, I see an error (here is the error screenshot). I really don't know what is wrong with my code. Please help.
from google.cloud import bigquery

client = bigquery.Client()
table_id = 'myprojectname.newdatasetname.newtablename'

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('A', 'INTEGER'),
        bigquery.SchemaField('B', 'INTEGER'),
        bigquery.SchemaField('C', 'INTEGER')
    ],
    skip_leading_rows=0,
)

uri = 'gs://my-bucket-name/*.csv'

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)
load_job.result()

destination_table = client.get_table(table_id)
print('Loaded {} rows.'.format(destination_table.num_rows))
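
If the error shows up at deployment time, one common cause is that Cloud Functions expects the logic to live inside an entry-point function. A rough sketch of a GCS-triggered background function built around the same load code (the entry-point name load_csv_to_bq and the use of the event payload to pick the file are assumptions, not something stated in the question):

from google.cloud import bigquery

def load_csv_to_bq(event, context):
    # Background Cloud Function triggered when a file lands in the bucket.
    client = bigquery.Client()
    table_id = 'myprojectname.newdatasetname.newtablename'

    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField('A', 'INTEGER'),
            bigquery.SchemaField('B', 'INTEGER'),
            bigquery.SchemaField('C', 'INTEGER'),
        ],
        skip_leading_rows=0,
        source_format=bigquery.SourceFormat.CSV,
    )

    # Load only the file that triggered the function rather than every csv in the bucket.
    uri = 'gs://{}/{}'.format(event['bucket'], event['name'])

    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()

    destination_table = client.get_table(table_id)
    print('Loaded {} rows.'.format(destination_table.num_rows))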

Insert CSV data into a BigQuery table using Python

I'm trying to insert a csv file into BigQuery using Python, but I think I have missed something, since the result is a replace (the existing data is overwritten) instead of an append.
from google.cloud import bigquery
from google.oauth2 import service_account
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:/Users/Pamungkas/Documents/Dump_Data/testing-353407-a3c774efeb5a.json"

client = bigquery.Client()

table_id = "testing-353407.testing_coba.Sales_Menu_COGS_Detail_Report"
file_path = r"C:\Users\Pamungkas\Downloads\Sales_Menu_COGS_Detail_Report_Jan.csv"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE
)

with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

job.result()  # Waits for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
I guess the problem is in job_config, but I still don't see it.
Can anyone help me with this?
As mentioned by @RiccoD, since you're appending the data from the csv into BigQuery, you'll have to change the write disposition in the job config to WRITE_APPEND. WRITE_TRUNCATE replaces the existing rows on every load, while WRITE_APPEND adds the new rows to the table.
So change the job config part to:
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND
)

Cloud Storage to BigQuery via Python

I am receiving a data drop into my GCS bucket daily and have a cloud function that moves said csv data to a BigQuery Table (see code below).
import datetime

def load_table_uri_csv(table_id):
    # [START bigquery_load_table_gcs_csv]
    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    # TODO(developer): Set table_id to the ID of the table to create.
    table_id = "dataSet.dataTable"

    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )

    uri = "gs://client-data/team/looker-client-" + str(datetime.date.today()) + ".csv"

    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )  # Make an API request.
    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)  # Make an API request.
    print("Loaded {} rows.".format(destination_table.num_rows))
    # [END bigquery_load_table_gcs_csv]
However, the data comes with a 2 day look back resulting in repeated data in the BigQuery table.
Is there a way for me to update this cloud function to only pull in the most recent date from the csv once it is dropped off? This way I can easily avoid duplicative data within the reporting.
Or maybe there's a way for me to run a scheduled query via BigQuery to resolve this?
For reference, the date column within the CSV comes in a TIMESTAMP schema.
Any and all help is appreciated!
Unfortunately, there seems to be no way to do this directly from Google Cloud Platform; you will need to filter your information somehow before loading it.
You could filter the data from the CSV in your code or through another medium before loading it.
It's also possible to submit a feature request for Google to consider this functionality.
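
One way to work around the two-day look-back is to load each drop into a staging table first and then merge only the rows that are not already in the reporting table. A rough sketch, assuming a staging table named dataSet.dataTable_staging and join columns id and report_date (all three names are assumptions about your schema):

from google.cloud import bigquery

client = bigquery.Client()

# Insert only rows from the staging table that the reporting table does not have yet.
merge_sql = """
    MERGE `dataSet.dataTable` AS target
    USING `dataSet.dataTable_staging` AS source
    ON target.id = source.id AND target.report_date = source.report_date
    WHEN NOT MATCHED THEN
      INSERT ROW
"""

client.query(merge_sql).result()  # Run the MERGE and wait for it to finish.

The same statement could also run as a BigQuery scheduled query if you would rather keep the deduplication outside the cloud function.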

Access data via BigQuery in Python

I am trying to access data in Python using the BigQuery API; here is my code.
I have placed the pem file inside the same folder, but the script returns an error: googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/digin-1086/queries?alt=json returned "Not found: Table digin-1086:dataset.my_table"
from bigquery import get_client

# BigQuery project id as listed in the Google Developers Console.
project_id = 'digin-1086'

# Service account email address as listed in the Google Developers Console.
service_account = '77441948210-4fhu1kc1driicjecriqupndkr60npnh@developer.gserviceaccount.com'

# PKCS12 or PEM key provided by Google.
key = 'Digin-d6387c00c5a'

client = get_client(project_id, service_account=service_account,
                    private_key_file=key, readonly=True)

# Submit an async query.
job_id, _results = client.query('SELECT * FROM dataset.my_table LIMIT 1000')

# Check if the query has finished running.
complete, row_count = client.check_job(job_id)

# Retrieve the results.
results = client.get_query_rows(job_id)
The error says it can't find your table; it has nothing to do with the pem file. You need to make sure the table exists in the dataset.
To access data via BigQuery in Python, you can do the following:
from google.cloud import bigquery
from google.oauth2 import service_account
from google.auth.transport import requests

credentials = service_account.Credentials.from_service_account_file(
    r'filelocation\xyz.json')
project_id = 'abc'
client = bigquery.Client(credentials=credentials, project=project_id)

query_job = client.query("""
    SELECT *
    FROM tablename
    LIMIT 10""")

results = query_job.result()
for row in results:
    print(row)
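
If the rows are easier to work with as a DataFrame (this assumes pandas, and on recent client versions the db-dtypes package, is installed), the query job can be converted directly:

df = query_job.to_dataframe()  # Materialize the query results as a pandas DataFrame.
print(df.head())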
