How do I update a column description in a BigQuery table using a Python script?

I can use
SchemaField(f"{field_name}", f"{field_type}", mode="NULLABLE", description=...)
while creating a new table, but I want to update the description of a column in a table that has already been uploaded.

Unfortunately, there is no mechanism available yet to update a column description of an existing table through the client library. As a workaround, you can try the following options to update your table's column-level description:
Option 1: Using the following ALTER TABLE ALTER COLUMN SET OPTIONS data definition language (DDL) statement:
ALTER TABLE `projectID.datasetID.tableID`
ALTER COLUMN Name
SET OPTIONS (
  description = "Country Name"
);
Refer to this doc for more information about the ALTER COLUMN SET OPTIONS statement.
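Since the question asks for a Python script, the same DDL statement can also be run from Python with the google-cloud-bigquery client library. Here is a minimal sketch, assuming the placeholder projectID.datasetID.tableID and the Name column from the statement above, and that application default credentials are configured:
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder identifiers taken from the DDL example above.
ddl = """
ALTER TABLE `projectID.datasetID.tableID`
ALTER COLUMN Name
SET OPTIONS (
  description = "Country Name"
);
"""

# Run the DDL statement and wait for it to complete.
client.query(ddl).result()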
Option 2: Using the bq command-line tool's bq update command:
Step 1: Get the JSON schema by running the following bq show command:
bq show --format=prettyjson projectID:datasetID.tableID > table.json
Step 2: Copy the schema from table.json into a schema.json file.
Note: Don't copy the entire contents of the 'table.json' file; copy only the schema array, which will look something like this:
[
  {
    "description": "Country Name",
    "mode": "NULLABLE",
    "name": "Name",
    "type": "STRING"
  }
]
Step 3: In the 'schema.json' file, modify the description value as you like. Then run the following bq update command to update the table column description.
bq update projectID:datasetID.tableID schema.json
Refer to this doc for more information about bq update command.
Option 3: Calling the tables.patch API method:
Refer to this doc for more information about tables.patch API method.
As per your requirement, the following Python code is taken from this Medium article rather than from the official Google Cloud docs, so Google Cloud does not provide support for it.
Step 1: Add the schema to a 'schema.py' file and modify the column description as per your requirement:
# Add field schema
TableObject = {
    "tableReference": {
        "projectId": "projectID",
        "datasetId": "datasetID",
        "tableId": "tableID",
    },
    "schema": {
        "fields": [
            {
                "description": "Country Name",
                "mode": "NULLABLE",
                "name": "Name",
                "type": "STRING"
            }
        ],
    },
}
Step 2: Run the following code to get the expected result:
Note: keep schema.py and the following code file in the same directory.
#!/usr/bin/env python
# https://developers.google.com/api-client-library/python/
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

from schema import TableObject


# [START Table Creator]
def PatchTable(bigquery):
    tables = bigquery.tables()
    tables.patch(
        projectId=TableObject['tableReference']['projectId'],
        datasetId=TableObject['tableReference']['datasetId'],
        tableId=TableObject['tableReference']['tableId'],
        body=TableObject).execute()
    print("Table Patched")
# [END]


def main():
    # Get application default credentials
    credentials = GoogleCredentials.get_application_default()
    # Construct the service object for interacting with the BigQuery API.
    bigquery = discovery.build('bigquery', 'v2', credentials=credentials)
    PatchTable(bigquery)


if __name__ == '__main__':
    main()
    print("BigQuery Table Patch")

Related

How to search for a value in two different fields (keys) of MongoDB using Python?

I am new to any kind of programming. This is an issue I encountered when using MongoDB. Below is the structure of the documents in a collection I imported from two different csv files.
{
  "_id": { "$oid": "61bc4217ed94f9d5fe6a350c" },
  "Telephone Number": "8429950810",
  "Date of Birth": "01/01/1945"
}
{
  "_id": { "$oid": "61bc4217ed94f9d5fe6a350c" },
  "Telephone Number": "8129437810",
  "Date of Birth": "01/01/1998"
}
{
  "_id": { "$oid": "61bd98d36cc90a9109ab253c" },
  "TELEPHONE_NUMBER": "9767022829",
  "DATE_OF_BIRTH": "16-Jun-98"
}
{
  "_id": { "$oid": "61bd98d36cc9090109ab253c" },
  "TELEPHONE_NUMBER": "9567085829",
  "DATE_OF_BIRTH": "16-Jan-91"
}
The first two entries are from one csv file and the next two are from another. Now I am creating a user interface where users can search for a telephone number. How do I write a query that searches for the telephone number value in both fields (Telephone Number and TELEPHONE_NUMBER) using find()? If that is not possible, is there a way to rename the fields to a consistent format while importing the csv into the db? Or should I create two different collections, import each csv into its own collection, and then perform a combined search across both? Or can I create a compound index and search that instead? I am using pymongo for all the operations.
Thank you.
You can use an $or query when different keys are used to store the same type of data:
yourmongocoll.find({"$or": [{"Telephone Number": "8429950810"}, {"TELEPHONE_NUMBER": "8429950810"}]})
Assuming you have a connection string to connect via pymongo, the following is an example of how to query for the telephone number "8429950810":
from pymongo import MongoClient
client = MongoClient("connection_string")
db = client["db"]
collection = db["collection"]
results = collection.find({"Telephone Number":"8429950810"})
Please note this returns a cursor; if you would like your documents in a list, consider wrapping the query in list() like so:
results = list(collection.find({"Telephone Number":"8429950810"}))
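Combining both answers, here is a minimal sketch (reusing the placeholder connection string, database, and collection names from above) that checks either field name in a single find() call and returns the matches as a list:
from pymongo import MongoClient

client = MongoClient("connection_string")  # placeholder connection string
collection = client["db"]["collection"]

number = "8429950810"
# Match documents regardless of which field-name variant holds the number.
results = list(collection.find({"$or": [
    {"Telephone Number": number},
    {"TELEPHONE_NUMBER": number},
]}))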

How to set a schema from a JSON file in Python on BigQuery?

I am looking for a way to set the schema from a JSON file in Python on BigQuery. The following document says I can set it with SchemaField one by one, but I want to find a more efficient way.
https://cloud.google.com/bigquery/docs/schemas
I am skeptical that autodetect would get it right in this case.
I would appreciate any help.
You can create a JSON file with the columns and data types and use the code below to build the BigQuery schema.
JSON File (schema.json):
[
  {
    "name": "emp_id",
    "type": "INTEGER"
  },
  {
    "name": "emp_name",
    "type": "STRING"
  }
]
Python Code:
import json
from google.cloud import bigquery

bigquerySchema = []
with open('schema.json') as f:
    bigqueryColumns = json.load(f)
for col in bigqueryColumns:
    bigquerySchema.append(bigquery.SchemaField(col['name'], col['type']))
print(bigquerySchema)
Soumendra Mishra's answer is already helpful, but here is a slightly more general version that can optionally accept additional fields such as mode or description:
JSON File (schema.json):
[
  {
    "name": "emp_id",
    "type": "INTEGER",
    "mode": "REQUIRED"
  },
  {
    "name": "emp_name",
    "type": "STRING",
    "description": "Description of this field"
  }
]
Python Code:
import json
from google.cloud import bigquery

table_schema = []
# open JSON file read-only
with open('schema.json', 'r') as f:
    schema_entries = json.load(f)
for entry in schema_entries:
    # rename key; bigquery.SchemaField expects `type` to be called `field_type`
    entry["field_type"] = entry.pop("type")
    # ** unpacks the dict into keyword arguments (e.g. name="emp_id")
    table_schema.append(bigquery.SchemaField(**entry))
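For completeness, here is a minimal sketch of how the resulting table_schema list might then be used, for example to create a new table (the table ID below is a placeholder; substitute your own project, dataset, and table names):
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table ID.
table = bigquery.Table("projectID.datasetID.tableID", schema=table_schema)
client.create_table(table)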

Google Sheets API - Get Data Validation

I'm trying to set data validation rules for my current spreadsheet. One thing that would help me would be to view, as JSON, the data validation rules I have already set (in the spreadsheet UI or via an API call).
Example.
request = {
    "requests": [
        {
            "setDataValidation": {
                "range": {
                    "sheetId": SHEET_ID,
                    "startRowIndex": 1,
                    "startColumnIndex": 0,
                    "endColumnIndex": 1
                },
                "rule": {
                    "condition": {
                        "type": "BOOLEAN"
                    },
                    "inputMessage": "Value MUST BE BOOLEAN",
                    "strict": True
                }
            }
        }
    ]
}
service.spreadsheets().batchUpdate(spreadsheetId=SPREADSHEET_ID, body=request).execute()
But what API calls do I use to see the data validation on this range of cells? This is useful if I set the data validation rules in the spreadsheet and want to see how Google interprets them. I'm having a lot of trouble setting complex data validations through the API.
Thank you
To obtain only the "Data Validation" components of a given spreadsheet, you simply request the appropriate field in the call to spreadsheets.get:
service = get_authed_sheets_service_somehow()
params = {
    'spreadsheetId': 'your ssid',
    # 'range': 'some range',
    'fields': 'sheets(data/rowData/values/dataValidation,properties(sheetId,title))'
}
request = service.spreadsheets().get(**params)
response = request.execute()

# Example print code (not tested :p )
for sheet in response['sheets']:
    for grid in sheet['data']:
        for r, row in enumerate(grid['rowData']):
            for c, col in enumerate(row['values']):
                if 'dataValidation' in col:
                    # Print "'Sheet1'!R1C1" & the associated data validation object.
                    # Assumes the whole grid was requested (add offsets if not).
                    print(f'\'{sheet["properties"]["title"]}\'!R{r + 1}C{c + 1}', col['dataValidation'])
By specifying fields, includeGridData is not required to obtain data on a per-cell basis from the range you requested. By not supplying a range, we target the entire file. This particular fields specification requests the rowData.values.dataValidation object and the sheetId and title of the properties object, for every sheet in the spreadsheet.
You can use the Google APIs Explorer to interactively determine the appropriate valid "fields" specification, and additionally examine the response:
https://developers.google.com/apis-explorer/#p/sheets/v4/sheets.spreadsheets.get
For more about how "fields" specifiers work, read the documentation: https://developers.google.com/sheets/api/guides/concepts#partial_responses
(For certain write requests, field specifications are not optional so it is in your best interest to determine how to use them effectively.)
I think I found the answer: pass includeGridData=True to your spreadsheets().get call.
from pprint import pprint

response = service.spreadsheets().get(
    spreadsheetId=SPREADSHEETID, fields='*',
    ranges='InputWorking!A2:A', includeGridData=True).execute()
You get a monster data structure back, so to look at the first cell in your range you could do:
pprint(response['sheets'][0]['data'][0]['rowData'][0]['values'][0]['dataValidation'])
{'condition': {'type': 'BOOLEAN'},
'inputMessage': 'Value MUST BE BOOLEAN',
'strict': True}

How do I create a partitioned collection in Cosmos DB with pydocumentdb?

The pydocumentdb.document_client.DocumentClient object has a CreateCollection() method, defined here.
When creating a collection with this method, one needs to specify the database link (already known), the collection (I don't know how to reference it if it hasn't been made) and options.
Parameters that I would like to control when creating the collection are:
name of collection
type of collection (fixed size vs. partitioned)
partition keys
RU value
Indexing policy (or at least be able to create a default template somewhere and automatically copy it to the newly created one)
Enums for some of these parameters seem to be defined here, but I don't see any potentially useful HTTP headers in http_constants.py, and I don't see where RUs come in to play or where a cohesive "Collection" object would be passed as a parameter.
You could refer to the sample source code here and the REST API here.
import pydocumentdb
import pydocumentdb.errors as errors
import pydocumentdb.document_client as document_client

config = {
    'ENDPOINT': 'https://***.documents.azure.com:443/',
    'MASTERKEY': '***'
}

# Initialize the Python DocumentDB client
client = document_client.DocumentClient(config['ENDPOINT'], {'masterKey': config['MASTERKEY']})

databaseLink = "dbs/db"

coll = {
    "id": "testCreate",
    "indexingPolicy": {
        "indexingMode": "lazy",
        "automatic": False
    },
    "partitionKey": {
        "paths": [
            "/AccountNumber"
        ],
        "kind": "Hash"
    }
}

# offerThroughput sets the provisioned throughput (RUs) for the collection
collection_options = {'offerThroughput': 400}

client.CreateCollection(databaseLink, coll, collection_options)
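As a quick sanity check, you can read the collection back after creating it. This is a hedged sketch; it assumes the collection link follows the usual dbs/<database-id>/colls/<collection-id> format for the database and id used above:
# Read the collection back to confirm the partition key and indexing policy.
# "dbs/db/colls/testCreate" is assumed to match the databaseLink and coll id above.
created = client.ReadCollection("dbs/db/colls/testCreate")
print(created["partitionKey"], created["indexingPolicy"])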
Hope it helps you.

Access GlobalParameters in Azure ML Python script

How can one access the global parameters ("GlobalParameters") sent from a web service in a Python script on Azure ML?
I tried:
if 'GlobalParameters' in globals():
    myparam = GlobalParameters['myparam']
but with no success.
EDIT: Example
In my case, I'm sending a sound file over the web service (as a list of samples). I would also like to send a sample rate and the number of bits per sample. I've successfully configured the web service (I think) to take these parameters, so the GlobalParameters now look like:
"GlobalParameters": {
"sampleRate": "44100",
"bitsPerSample": "16",
}
However, I cannot access these variables from the Python script, neither as GlobalParameters["sampleRate"] nor as sampleRate. Is it possible? Where are they stored?
Based on our understanding of your question, there may be a misconception here: Azure ML parameters are not "global parameters"; as a matter of fact, they are just parameter substitutions tied to a particular module. So in effect there are no global parameters that are accessible throughout the experiment you mentioned. That being the case, we think the experiment below accomplishes what you are asking for:
Please add an "Enter Data" module to the experiment and add data in CSV format. Then, for the data, click the parameter option to create a web service parameter, and add the CSV data that will be substituted by the data passed from the client application.
Please add an "Execute Python" module and hook up the "Enter Data" output to the "Execute Python" input1. Add Python code to take dataframe1 and add it to a Python list. Once you have it in a list, you can use it anywhere in your Python code.
Python code snippet
def azureml_main(dataframe1=None, dataframe2=None):
    import pandas as pd
    global_list = []
    for g in dataframe1["Col3"]:
        global_list.append(g)
    df_global = pd.DataFrame(global_list)
    print('Input pandas.DataFrame:\r\n\r\n{0}'.format(df_global))
    return [df_global]
Once you publish your experiment, you can supply new values in the "Data" entry of "GlobalParameters" below; these values will be substituted for the "Enter Data" values in the experiment.
data = {
    "Inputs": {
        "input1": {
            "ColumnNames": ["Col1", "Col2", "Col3"],
            "Values": [["0", "value", "0"], ["0", "value", "0"]]
        }
    },
    "GlobalParameters": {
        "Data": "1,sampleRate,44500\\n2,bitsPerSample,20"
    }
}
Please feel free to let us know if this makes sense.
The GlobalParameters parameter cannot be used in a Python script. It is used to override certain parameters in other modules.
If you, for example, take the 'Split Data' module, you'll find an option to turn a parameter into a web service parameter:
Once you click that, a new section appears titled "Web Service Parameters". There you can change the default parameter name to one of your choosing.
If you deploy your project as a web service, you can override that parameter by putting it in the GlobalParameters parameter:
"GlobalParameters": {
"myFraction": 0.7
}
I hope that clears things up a bit.
Although it is not possible to use GlobalParameters in the Python script (see my previous answer), you can however hack/abuse the second input of the Python script to pass in other parameters. In my example I call them metadata parameters.
To start, I added:
a Web service input module with name: "realdata" (for your real data off course)
a Web service input module with name: "metadata" (we will abuse this one to pass parameters to our Python).
a Web service output module with name: "computedMetadata"
Connect the modules as follows:
As you can see, I also added a real data set (Restaurant ratings) as well as a dummy metadata csv (the "Enter Data Manually" module).
In this manual data you will have to predefine your metadata parameters as if they were a csv with a header and only a single row to hold the data:
In the example both sampleRate and bitsPerSample are set to 0.
My Python script then takes in that fake csv as metadata, does a dummy calculation with it, and returns the result as a column name:
import pandas as pd

def azureml_main(realdata=None, metadata=None):
    theSum = metadata["sampleRate"][0] + metadata["bitsPerSample"][0]
    outputString = "The sum of the sampleRate and the bitsPerSecond is " + str(theSum)
    print(outputString)
    return pd.DataFrame([outputString])
I then published this as a web service and called it using Node.js like this:
httpreq.post('https://ussouthcentral.services.azureml.net/workspaces/xxx/services/xxx', {
    headers: {
        Authorization: 'Bearer xxx'
    },
    json: {
        "Inputs": {
            "realdata": {
                "ColumnNames": ["userID", "placeID", "rating"],
                "Values": [
                    ["100", "101", "102"],
                    ["200", "201", "202"]
                ]
            },
            "metadata": {
                "ColumnNames": ["sampleRate", "bitsPerSample"],
                "Values": [
                    [44100, 16]
                ]
            }
        },
        "GlobalParameters": {}
    }
}, (err, res) => {
    if (err) return console.log(err);
    console.log(JSON.parse(res.body));
});
The output was as expected:
{ Results:
  { computedMetadata:
    { type: 'table',
      value:
      { ColumnNames: [ '0' ],
        ColumnTypes: [ 'String' ],
        Values: [ [ 'The sum of the sampleRate and the bitsPerSecond is 44116' ] ] } } } }
Good luck!
