I'm trying to insert a CSV file into BigQuery using Python, but I think I have missed something, since the result is a replace (the existing rows are overwritten) instead of an append:
from google.cloud import bigquery
from google.oauth2 import service_account
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]=r"C:/Users/Pamungkas/Documents/Dump_Data/testing-353407-a3c774efeb5a.json"
client = bigquery.Client()
table_id="testing-353407.testing_coba.Sales_Menu_COGS_Detail_Report"
file_path=r"C:\Users\Pamungkas\Downloads\Sales_Menu_COGS_Detail_Report_Jan.csv"
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)
job.result()  # Waits for the job to complete.
table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
I guess the problem is in job_config, but I still haven't figured it out.
Can anyone help me with this?
As mentioned by @RiccoD, since you want to append the data from the CSV to the BigQuery table, you'll have to change the write disposition in the job config to WRITE_APPEND.
So change the job config part to:
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
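For reference, a quick sketch of how the three write dispositions behave when the destination table already exists (WRITE_APPEND is the one that preserves previously loaded rows):

from google.cloud import bigquery

# WRITE_APPEND   - add the newly loaded rows to whatever the table already contains.
# WRITE_TRUNCATE - replace the table contents on every load (the behaviour described in the question).
# WRITE_EMPTY    - fail the load job if the table already contains data.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)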
Related
I created a function that creates a new table in BigQuery from a .csv file located in a bucket on Google Storage, following the code below. When I tested this function, it did create the new table, but I also see the following error in the logs:
line 720, in create_table
    dataset_id = table.dataset_id
AttributeError: 'LoadJob' object has no attribute 'dataset_id'
I spent a lot of time trying to find out what's going on, but I have no idea what the reason for this error could be. Could someone help me, please?
def hello_gcs(table_id, uri):
    from google.cloud import bigquery

    client = bigquery.Client()
    uri = 'gs://bucket_name/file_name.csv'
    table_id = 'project_name.dataset_name.new_table_name'
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE
    )
    table = client.load_table_from_uri(uri, table_id, job_config=job_config)
    table = client.create_table(table)
    print(
        "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
    )
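For context, a minimal sketch (with the same placeholder bucket and table names as above) of how the two calls differ: load_table_from_uri returns a LoadJob, and the load job itself creates the destination table once it completes, so its result is not a Table object with a dataset_id attribute:

from google.cloud import bigquery

client = bigquery.Client()
# Placeholder names for illustration only.
uri = 'gs://bucket_name/file_name.csv'
table_id = 'project_name.dataset_name.new_table_name'

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)  # returns a LoadJob, not a Table
load_job.result()  # the load job creates (or overwrites) the destination table itself

table = client.get_table(table_id)  # a Table object does have project, dataset_id and table_id
print("Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id))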
I need to create a Google Cloud Function that automatically creates a table from a simple .csv file located in a bucket on Google Storage. I created a new function and wrote the Python script below. It seems to be correct, but when I try to implement the function, I see an error; here is the error screenshot. I really don't know what is wrong with my code. Please help.
from google.cloud import bigquery
client = bigquery.Client()
table_id = 'myprojectname.newdatasetname.newtablename'
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField('A', 'INTEGER'),
        bigquery.SchemaField('B', 'INTEGER'),
        bigquery.SchemaField('C', 'INTEGER')
    ],
    skip_leading_rows=0,
)
uri = 'gs://my-bucket-name/*.csv'
load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)
load_job.result()
destination_table = client.get_table(table_id)
print('Loaded {} rows.'.format(destination_table.num_rows))
I'm new to the data engineering field and want to create a table and insert data into BigQuery using Python, but in the process I get an error message.
Even though I already set the google_application_credential through the shell, the error message still appears.
Here is my code:
from google.cloud import bigquery
from google.cloud import language
from google.oauth2 import service_account
import os
os.environ["GOOGLE_APPLICATION_CREDENTIAL"]=r"C:/Users/Pamungkas/Downloads/testing-353407-a3c774efeb5a.json"
client = bigquery.Client()
table_id="testing-353407.testing_field.sales"
file_path=r"C:\Users\Pamungkas\Downloads\sales.csv"
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE  # added to have truncate-and-insert load
)
with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)
job.result()  # Waits for the job to complete.
table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
As @p13rr0m suggested, you have to use the environment variable GOOGLE_APPLICATION_CREDENTIALS instead of GOOGLE_APPLICATION_CREDENTIAL to resolve your issue.
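In other words, only the variable name in the question's code changes (the credentials path is reused from the question):

import os

# The client library only reads GOOGLE_APPLICATION_CREDENTIALS (note the trailing S).
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:/Users/Pamungkas/Downloads/testing-353407-a3c774efeb5a.json"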
I am receiving a data drop into my GCS bucket daily and have a Cloud Function that moves that CSV data into a BigQuery table (see code below).
import datetime
def load_table_uri_csv(table_id):
    # [START bigquery_load_table_gcs_csv]
    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    # TODO(developer): Set table_id to the ID of the table to create.
    table_id = "dataSet.dataTable"

    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )

    uri = "gs://client-data/team/looker-client-" + str(datetime.date.today()) + ".csv"

    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )  # Make an API request.

    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)  # Make an API request.
    print("Loaded {} rows.".format(destination_table.num_rows))
    # [END bigquery_load_table_gcs_csv]
However, the data comes with a 2-day look-back, which results in repeated data in the BigQuery table.
Is there a way for me to update this Cloud Function to only pull in the most recent date from the CSV once it is dropped off? That way I can easily avoid duplicate data in the reporting.
Or maybe there's a way for me to run a scheduled query via BigQuery to resolve this?
For reference, the date column within the CSV comes in a TIMESTAMP schema.
Any and all help is appreciated!
There seems to be no way to do this directly from Google Cloud Platform, unfortunately. You will need to filter your information somehow before loading it.
You could review the information from the CSV in your code or through another medium.
It's also possible to submit a feature request for Google to consider this functionality.
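One possible shape for that filtering, sketched with hypothetical table names and a hypothetical TIMESTAMP column called event_ts: load the CSV into a staging table first, then insert into the reporting table only the rows that are newer than anything it already contains (this could also run as a BigQuery scheduled query):

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table names for illustration only.
staging_table = "my-project.reporting.looker_staging"
final_table = "my-project.reporting.looker_final"

# Assumes the CSV's TIMESTAMP column is called `event_ts`.
dedup_sql = """
INSERT INTO `{final}`
SELECT *
FROM `{staging}`
WHERE event_ts > (
    SELECT IFNULL(MAX(event_ts), TIMESTAMP('1970-01-01')) FROM `{final}`
)
""".format(final=final_table, staging=staging_table)

query_job = client.query(dedup_sql)  # Make an API request.
query_job.result()  # Waits for the query to complete.
print("Copied only the not-yet-loaded rows into {}".format(final_table))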
This is simple code to export from BigQuery to Google Storage in CSV format:
from google.cloud import bigquery

def export_data():
    client = bigquery.Client()
    project = 'xxxxx'
    dataset_id = 'xxx'
    table_id = 'xxx'
    bucket_name = 'xxx'

    destination_uri = 'gs://{}/{}'.format(bucket_name, 'EXPORT_FILE.csv')
    dataset_ref = client.dataset(dataset_id, project=project)
    table_ref = dataset_ref.table(table_id)

    extract_job = client.extract_table(
        table_ref,
        destination_uri,
        # Location must match that of the source table.
        location='EU')  # API request
    extract_job.result()  # Waits for job to complete.

    print('Exported {}:{}.{} to {}'.format(
        project, dataset_id, table_id, destination_uri))
It works perfectly for regular tables, BUT when I try to export data from a saved table VIEW, it fails with this error:
BadRequest: 400 Using table xxx:xxx.xxx#123456 is not allowed for this operation because of its type. Try using a different table that is of type TABLE.
Is there any way to export data from a table view?
What I'm trying to achieve is to get the data from BigQuery in CSV format and upload it to Google Analytics Product Data.
BigQuery views are subject to a few limitations:
You cannot run a BigQuery job that exports data from a view.
There are more than 10 other limitations, which I didn't post in the answer as they might change. Follow the link to read all of them.
You need to query your view and write the results to a destination table, and then issue an export job on the destination table.
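A minimal sketch of that two-step approach, using placeholder project, dataset, view and bucket names, and assuming the destination table lives in the same EU location used in the code above:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder names for illustration only.
view_id = 'my-project.my_dataset.my_view'
materialized_table_id = 'my-project.my_dataset.my_view_export'
destination_uri = 'gs://my-bucket/EXPORT_FILE.csv'

# Step 1: query the view and write the results to a regular destination table.
query_config = bigquery.QueryJobConfig(
    destination=materialized_table_id,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
query_job = client.query('SELECT * FROM `{}`'.format(view_id), job_config=query_config)
query_job.result()  # Waits for the query to complete.

# Step 2: export the destination table to Cloud Storage in CSV format.
extract_job = client.extract_table(
    materialized_table_id,
    destination_uri,
    location='EU',  # Location must match that of the source table.
)
extract_job.result()  # Waits for the export to complete.
print('Exported {} to {}'.format(materialized_table_id, destination_uri))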