I know that the result of a query is saved in a temporary table, but how do I get its name so I can use it in another query?
from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.client import GoogleCredentials

def test():
    project_id = "598330041668"

    credentials = GoogleCredentials.get_application_default()
    bigquery_service = build('bigquery', 'v2', credentials=credentials)

    # [START run_query]
    query_request = bigquery_service.jobs()
    query_data = {
        'query': (
            'SELECT * '
            'FROM [test.names];')
    }

    query_response = query_request.query(
        projectId=project_id,
        body=query_data).execute()
    # [END run_query]
Find the job ID of the query you ran. This is present in the response from jobs.query in the jobReference field.
Look up the query configuration for the job you ran using the jobs.get method. Inspect the field configuration.query.destinationTable.
However, note that if you didn't specify your own destination table, BigQuery assigns a temporary one that should only be used with certain APIs, and other queries are not among them:
The query results from this method are saved to a temporary table that is deleted approximately 24 hours after the query is run. You can read this results table by calling either bigquery.tabledata.list(table_id) or bigquery.jobs.getQueryResults(job_reference). The table and dataset names are non-standard and cannot be used in any other APIs, as the behavior may be unpredictable.
If you want to use the results of a query in a subsequent query, I suggest setting the configuration.query.destinationTable yourself to write to a table of your choice.
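For example, here is a minimal sketch using the same googleapiclient setup as in the question; the dataset and table names in the destination are placeholders you would replace with your own:

# Look up the temporary destination table of the jobs.query call above.
job_ref = query_response['jobReference']
job = bigquery_service.jobs().get(
    projectId=job_ref['projectId'],
    jobId=job_ref['jobId']).execute()
print(job['configuration']['query']['destinationTable'])

# Or run the query via jobs.insert with an explicit destination table,
# so the results can safely be referenced from later queries.
insert_body = {
    'configuration': {
        'query': {
            'query': 'SELECT * FROM [test.names];',
            'destinationTable': {
                'projectId': project_id,
                'datasetId': 'my_dataset',  # placeholder
                'tableId': 'names_copy'     # placeholder
            },
            'createDisposition': 'CREATE_IF_NEEDED',
            'writeDisposition': 'WRITE_TRUNCATE'
        }
    }
}
insert_response = bigquery_service.jobs().insert(
    projectId=project_id, body=insert_body).execute()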
Related
I'm using the azure python api (https://github.com/microsoft/azure-devops-python-api) and I need to be able to query & find a specific work item based on a custom field value.
The closest thing I can find is the function create_query, but I'm hoping to be able to run a query such as
queryRsp = wit_5_1_client.run_query(
posted_query='',
project=project.id,
query='Custom.RTCID=282739'
)
I just need to find my Azure DevOps work item where the custom field RTCID has a specific unique value.
Do I need to create a query with the API, run it, get the results, and then delete the query? Or is there a way I can run this simple query and get the results using the Azure DevOps API?
Your requirement can be achieved.
For example, on my side there are two work items that have the custom field 'RTCID'.
Below is how to implement this in Python (on my side, both the organization and the project are named 'BowmanCP'):
#query workitems from azure devops
from azure.devops.connection import Connection
from msrest.authentication import BasicAuthentication
from azure.devops.v5_1.work_item_tracking.models import Wiql
import pprint
# Fill in with your personal access token and org URL
personal_access_token = '<Your Personal Access Token>'
organization_url = 'https://dev.azure.com/BowmanCP'
# Create a connection to the org
credentials = BasicAuthentication('', personal_access_token)
connection = Connection(base_url=organization_url, creds=credentials)
# Get a client (the "core" client provides access to projects, teams, etc)
core_client = connection.clients.get_core_client()
#query workitems, custom field 'RTCID' has a certain specific unique value
work_item_tracking_client = connection.clients.get_work_item_tracking_client()
query = "SELECT [System.Id], [System.WorkItemType], [System.Title], [System.AssignedTo], [System.State], [System.Tags] FROM workitems WHERE [System.TeamProject] = 'BowmanCP' AND [Custom.RTCID] = 'xxx'"
#convert query str to wiql
wiql = Wiql(query=query)
query_results = work_item_tracking_client.query_by_wiql(wiql).work_items
#get the results via title
for item in query_results:
    work_item = work_item_tracking_client.get_work_item(item.id)
    pprint.pprint(work_item.fields['System.Title'])
I successfully got them on my side.
SDK source code is here:
https://github.com/microsoft/azure-devops-python-api/blob/451cade4c475482792cbe9e522c1fee32393139e/azure-devops/azure/devops/released/work_item_tracking/work_item_tracking_client.py#L704
You can refer to the source code above.
I'm new to the data engineering field and want to create a table and insert data into BigQuery using Python, but in the process I get an error message.
Even though I already set google_application_credential through the shell, the error message still appears.
Here is my code:
from google.cloud import bigquery
from google.cloud import language
from google.oauth2 import service_account
import os

os.environ["GOOGLE_APPLICATION_CREDENTIAL"] = r"C:/Users/Pamungkas/Downloads/testing-353407-a3c774efeb5a.json"

client = bigquery.Client()

table_id = "testing-353407.testing_field.sales"
file_path = r"C:\Users\Pamungkas\Downloads\sales.csv"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV, skip_leading_rows=1, autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE  # added to have truncate and insert load
)

with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

job.result()  # Waits for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
As @p13rr0m suggested, you should use the environment variable GOOGLE_APPLICATION_CREDENTIALS instead of GOOGLE_APPLICATION_CREDENTIAL to resolve your issue.
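For reference, a minimal sketch of the corrected line, using the same path as in the question; the variable must be set before the client is constructed:

import os
from google.cloud import bigquery

# Note the trailing "S" in the variable name.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"C:/Users/Pamungkas/Downloads/testing-353407-a3c774efeb5a.json"

client = bigquery.Client()  # now picks up the credentials set above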
I am receiving a data drop into my GCS bucket daily and have a Cloud Function that moves said CSV data to a BigQuery table (see the code below).
import datetime

def load_table_uri_csv(table_id):
    # [START bigquery_load_table_gcs_csv]
    from google.cloud import bigquery

    # Construct a BigQuery client object.
    client = bigquery.Client()

    # TODO(developer): Set table_id to the ID of the table to create.
    table_id = "dataSet.dataTable"

    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        source_format=bigquery.SourceFormat.CSV, skip_leading_rows=1, autodetect=True,
    )

    uri = "gs://client-data/team/looker-client-" + str(datetime.date.today()) + ".csv"

    load_job = client.load_table_from_uri(
        uri, table_id, job_config=job_config
    )  # Make an API request.

    load_job.result()  # Waits for the job to complete.

    destination_table = client.get_table(table_id)  # Make an API request.
    print("Loaded {} rows.".format(destination_table.num_rows))
    # [END bigquery_load_table_gcs_csv]
However, the data comes with a two-day look-back, resulting in repeated data in the BigQuery table.
Is there a way for me to update this Cloud Function to only pull in the most recent date from the CSV once it is dropped off? This way I can easily avoid duplicate data in the reporting.
Or maybe there's a way for me to run a scheduled query via BigQuery to resolve this?
For reference, the date column within the CSV comes in a TIMESTAMP schema.
Any and all help is appreciated!
There seems to be no way to do this directly from Google Cloud Platform, unfortunately. You will need to filter your information somehow before loading it.
You could filter the information from the CSV in your code or through another medium; one possible approach is sketched below.
It's also possible to submit a feature request for Google to consider this functionality.
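For instance, here is a rough sketch of one way to do the filtering from the Cloud Function itself: delete the dates that the new file will re-deliver, then append. The column name date_column is a placeholder for the TIMESTAMP column mentioned in the question, and the look-back window should match what the file actually contains:

import datetime
from google.cloud import bigquery

def load_table_uri_csv_dedup(event=None, context=None):
    client = bigquery.Client()
    table_id = "dataSet.dataTable"

    # Remove the rows the new file re-delivers (two-day look-back), then append.
    cutoff = datetime.date.today() - datetime.timedelta(days=2)
    client.query(
        "DELETE FROM `{}` WHERE DATE(date_column) >= DATE('{}')".format(table_id, cutoff)
    ).result()

    job_config = bigquery.LoadJobConfig(
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    )
    uri = "gs://client-data/team/looker-client-" + str(datetime.date.today()) + ".csv"
    client.load_table_from_uri(uri, table_id, job_config=job_config).result()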
I am trying to access data in Python using the BigQuery API; here is my code.
I have placed the PEM file inside the same folder, but the script returns the error "googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/digin-1086/queries?alt=json returned "Not found: Table digin-1086:dataset.my_table">".
from bigquery import get_client
# BigQuery project id as listed in the Google Developers Console.
project_id = 'digin-1086'
# Service account email address as listed in the Google Developers Console.
service_account = '77441948210-4fhu1kc1driicjecriqupndkr60npnh@developer.gserviceaccount.com'
# PKCS12 or PEM key provided by Google.
key = 'Digin-d6387c00c5a'
client = get_client(project_id, service_account=service_account,
private_key_file=key, readonly=True)
# Submit an async query.
job_id, _results = client.query('SELECT * FROM dataset.my_table LIMIT 1000')
# Check if the query has finished running.
complete, row_count = client.check_job(job_id)
# Retrieve the results.
results = client.get_query_rows(job_id)
The error says it can't find your table; it has nothing to do with the PEM file. You need to make sure the table exists in the dataset.
To access data via BigQuery in Python, you can do the following:
from google.cloud import bigquery
from google.oauth2 import service_account
from google.auth.transport import requests
credentials = service_account.Credentials.from_service_account_file(
r'filelocation\xyz.json')
project_id = 'abc'
client = bigquery.Client(credentials= credentials,project=project_id)
query_job = client.query("""
    SELECT *
    FROM tablename
    LIMIT 10""")

results = query_job.result()
for row in results:
    print(row)
We're using Google BigQuery via the Python API. How would I create a table (new one or overwrite old one) from query results? I reviewed the query documentation, but I didn't find it useful.
We want to simulate:
"SELEC ... INTO ..." from ANSI SQL.
You can do this by specifying a destination table in the query. You would need to use the Jobs.insert API rather than the Jobs.query call, and you should specify writeDisposition=WRITE_APPEND and fill out the destination table.
Here is what the configuration would look like, if you were using the raw API. If you're using Python, the Python client should give accessors to these same fields:
"configuration": {
"query": {
"query": "select count(*) from foo.bar",
"destinationTable": {
"projectId": "my_project",
"datasetId": "my_dataset",
"tableId": "my_table"
},
"createDisposition": "CREATE_IF_NEEDED",
"writeDisposition": "WRITE_APPEND",
}
}
The accepted answer is correct, but it does not provide Python code to perform the task. Here is an example, refactored out of a small custom client class I just wrote. It does not handle exceptions, and the hard-coded query should be customised to do something more interesting than just SELECT * ...
import time

from google.cloud import bigquery
from google.cloud.bigquery.table import Table
from google.cloud.bigquery.dataset import Dataset


class Client(object):

    def __init__(self, origin_project, origin_dataset, origin_table,
                 destination_dataset, destination_table):
        """
        A Client that performs a hardcoded SELECT and INSERTS the results in a
        user-specified location.

        All init args are strings. Note that the destination project is the
        default project from your Google Cloud configuration.
        """
        self.project = origin_project
        self.dataset = origin_dataset
        self.table = origin_table
        self.dest_dataset = destination_dataset
        self.dest_table_name = destination_table
        self.client = bigquery.Client()

    def run(self):
        query = ("SELECT * FROM `{project}.{dataset}.{table}`;".format(
            project=self.project, dataset=self.dataset, table=self.table))

        job_config = bigquery.QueryJobConfig()

        # Set configuration.query.destinationTable
        destination_dataset = self.client.dataset(self.dest_dataset)
        destination_table = destination_dataset.table(self.dest_table_name)
        job_config.destination = destination_table

        # Set configuration.query.createDisposition
        job_config.create_disposition = 'CREATE_IF_NEEDED'

        # Set configuration.query.writeDisposition
        job_config.write_disposition = 'WRITE_APPEND'

        # Start the query
        job = self.client.query(query, job_config=job_config)

        # Wait for the query to finish
        job.result()
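A hypothetical usage of the class above, with placeholder names:

copier = Client(
    origin_project='my_project',
    origin_dataset='my_dataset',
    origin_table='my_table',
    destination_dataset='my_dest_dataset',
    destination_table='my_dest_table')
copier.run()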
Create a table from query results in Google BigQuery. Assuming you are using a Jupyter Notebook with Python 3, I am going to explain the following steps:
How to create a new dataset on BQ (to save the results)
How to run a query and save the results in a new dataset in table format on BQ
Create a new dataset on BQ: my_dataset
from google.cloud import bigquery

bigquery_client = bigquery.Client()  # Create a BigQuery service object
dataset_id = 'my_dataset'
dataset_ref = bigquery_client.dataset(dataset_id) # Create a DatasetReference using a chosen dataset ID.
dataset = bigquery.Dataset(dataset_ref) # Construct a full Dataset object to send to the API.
dataset.location = 'US' # Specify the geographic location where the new dataset will reside. Remember this should be same location as that of source data set from where we are getting data to run a query
# Send the dataset to the API for creation. Raises google.api_core.exceptions.AlreadyExists if the Dataset already exists within the project.
dataset = bigquery_client.create_dataset(dataset) # API request
print('Dataset {} created.'.format(dataset.dataset_id))
Run a query on BQ using Python:
There are 2 types here:
Allowing Large Results
Query without mentioning large result etc.
I am using the public dataset bigquery-public-data:hacker_news and the table comments to run a query.
Allowing Large Results
DestinationTableName='table_id1' #Enter new table name you want to give
!bq query --allow_large_results --destination_table=project_id:my_dataset.$DestinationTableName 'SELECT * FROM [bigquery-public-data:hacker_news.comments]'
This query will allow large query results if required.
Without mentioning --allow_large_results:
DestinationTableName='table_id2' #Enter new table name you want to give
!bq query --destination_table=project_id:my_dataset.$DestinationTableName 'SELECT * FROM [bigquery-public-data:hacker_news.comments] LIMIT 100'
This will work for queries whose results do not exceed the limit mentioned in the Google BigQuery documentation.
Output:
A new dataset on BQ with the name my_dataset
Results of the queries saved as tables in my_dataset
Note:
These queries are commands that you can run in a terminal (without the ! at the beginning). Since we are running them from Python, we prefix them with !, which lets us run shell commands from within the Python program as well.
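For completeness, the same result can also be obtained without shelling out, via the google-cloud-bigquery client. This is only a sketch, reusing my_dataset and table_id1 from above:

from google.cloud import bigquery

bigquery_client = bigquery.Client()

job_config = bigquery.QueryJobConfig()
job_config.destination = bigquery_client.dataset('my_dataset').table('table_id1')
job_config.allow_large_results = True  # only meaningful together with legacy SQL
job_config.use_legacy_sql = True       # the queries above use legacy SQL syntax

query_job = bigquery_client.query(
    'SELECT * FROM [bigquery-public-data:hacker_news.comments]',
    job_config=job_config)
query_job.result()  # waits for the query to finish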