AutoML Vision metadata issue - python

I'm trying to use Raspi 3B+ and AutoML Vision to train a model for classification. However, when I try to create a dataset on Google Cloud Platform, it runs into a problem as follows:
Traceback (most recent call last):
File "/home/pi/.local/lib/python3.7/site-packages/google/api core/grpc helpers.py", line 57, in error remapped callable
return callable (*args, **kwargs)
File "/home/pi/.local/lib/python3.7/site-packages/grpc/ channel.py", line 826, in call __
return end_unary_response blocking({state, call, False, None)
File "/home/pi/.local/lib/python3.7/site-packages/grpc/ channel.py", line 729, in end unary response blocking
raise InactiveRpcError(state)
grpc. channel. InactiveRpcError: < InactiveRpcError of RPC that terminated with:
status = StatusCode. INVALID ARGUMENT
details = "List of found errors: 1.Field: parent; Message: Required field is invalid "
debug error string = "{"created":"G1604833054.567218256", "description":"Error received from peer ipv6: [2a00:1450:400a: 801: :200a] :443","file":"src/core/lib/surface/call.cc","file line":1056,"grpc_message":"List of found
errors:\tl.Field: parent; Nessage: Required field is invalid\t","grpc_ status":3}"
>
The creating-dataset code is
automl_client = automl.AutoMlClient()
project_location = automl_client.location_path(project_id, region_name)
bucket = storage_client.bucket(bucket_name)
# upload the images to google cloud bucket
upload_image_excel(bucket, bucket_name, dataset_name, status_list, csv_name)
# Create a new automl dataset programatically
classification_type = 'MULTICLASS'
dataset_metadata = {'classification_type': classification_type}
dataset_config = {
'display_name': dataset_name,
'image_classification_dataset_metadata': dataset_metadata
}
dataset = automl_client.create_dataset(project_location, dataset_config)
dataset_id = dataset.name.split('/')[-1]
dataset_full_id = automl_client.dataset_path(
project_id, region_name, dataset_id
)
# Read the *.csv file on Google Cloud
remote_csv_path = 'gs://{0}/{1}'.format(bucket_name, csv_name)
input_uris = remote_csv_path.split(',')
input_config = {'gcs_source': {'input_uris': input_uris}}
response = automl_client.import_data(dataset_full_id, input_config)
Does anyone know what's happening here?

Which region are you using? Be aware that for this feature, currently project resources must be in the us-central1 region to use this API [1].
The error promting is an INVALID ARGUMENT therefore I do not think the above mentioned is the issue. Looking at the GCP documentation on Creating a dataset [1] I see your code differs from what is done on that sample. The metadata and the configuration is set in a different way. Could you please try to recreate it using the same format as in the sample shared? I believe this should resolve the issue being experienced.
Here you have a code example:
from google.cloud import automl
# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "your_datasets_display_name"
client = automl.AutoMlClient()
# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
# Specify the classification type
# Types:
# MultiLabel: Multiple labels are allowed for one example.
# MultiClass: At most one label is allowed per example.
# https://cloud.google.com/automl/docs/reference/rpc/google.cloud.automl.v1#classificationtype
metadata = automl.ImageClassificationDatasetMetadata(
classification_type=automl.ClassificationType.MULTILABEL
)
dataset = automl.Dataset(
display_name=display_name,
image_classification_dataset_metadata=metadata,
)
# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)
created_dataset = response.result()
# Display the dataset information
print("Dataset name: {}".format(created_dataset.name))
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))
[1] https://cloud.google.com/vision/automl/docs/create-datasets#automl_vision_classification_create_dataset-python

Related

Google Cloud Analyze Sentiment in JupyterLab with Python

I am using Google Cloud / JupyterLab /Python
I'm trying to run a sample sentiment analysis, following the guide here
However, on running the example, I get this error:
AttributeError: 'SpeechClient' object has no attribute
'analyze_sentiment'
Below is the code I'm trying:
def sample_analyze_sentiment (gcs_content_uri):
gcs_content_uri = 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt'
client = language_v1.LanguageServiceClient()
type_ = enums.Document.Type.PLAIN_TEXT
language = "en" document = {
"gcs_content_uri":'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt',
"type": 'enums.Document.Type.PLAIN_TEXT', "language": 'en'
}
response = client.analyze_sentiment(document,
encoding_type=encoding_type)
I had no problem generating the transcript using Speech to Text but no success getting a document sentiment analysis!?
I had no problem to perform analyze_sentiment following the documentation example.
I have some issues about your code. To me it should be
from google.cloud import language_v1
from google.cloud.language import enums
from google.cloud.language import types
def sample_analyze_sentiment(path):
#path = 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt'
# if path is sent through the function it does not need to be specified inside it
# you can always set path = "default-path" when defining the function
client = language_v1.LanguageServiceClient()
document = types.Document(
gcs_content_uri = path,
type = enums.Document.Type.PLAIN_TEXT,
language = 'en',
)
response = client.analyze_sentiment(document)
return response
Therefore, I have tried the previous code with a path of my own to a text file inside a bucket in Google Cloud Storage.
response = sample_analyze_sentiment("<my-path>")
sentiment = response.document_sentiment
print(sentiment.score)
print(sentiment.magnitude)
I've got a successful run with sentiment score -0.5 and magnitude 1.5. I performed the run in JupyterLab with python3 which I assume is the set up you have.

How to call Azure Cognitive Services API?

I've created and trained an image classification model using Azure Custom Vision (Cognitive Services) and published the model with API.
Now, I've written a simple code in Python which takes an image from given URL and calls the API. However, I'm still getting this error even though the image surely exists:
with open(URL, "rb") as image_contents: FileNotFoundError: [Errno 2]
No such file or directory:
'https://upload.wikimedia.org/wikipedia/commons/5/55/Dalailama1_20121014_4639.jpg'
The code is as below:
from azure.cognitiveservices.vision.customvision.prediction import CustomVisionPredictionClient
ENDPOINT = "https://westeurope.api.cognitive.microsoft.com/"
PROJECT_ID = "bbed3f99-4199-4a17-81f2-df83f0659be3"
# Replace with a valid key
prediction_key = "<my prediction key>"
prediction_resource_id = "/subscriptions/97c4e143-9c0c-4f1e-b880-15492e327dd1/resourceGroups/WestEurope/providers/Microsoft.CognitiveServices/accounts/HappyAI"
publish_iteration_name = "Iteration5"
# Classify image
URL = "https://upload.wikimedia.org/wikipedia/commons/5/55/Dalailama1_20121014_4639.jpg"
# Now there is a trained endpoint that can be used to make a prediction
predictor = CustomVisionPredictionClient(prediction_key, endpoint=ENDPOINT)
with open(URL, "rb") as image_contents:
results = predictor.classify_image(
PROJECT_ID, publish_iteration_name, image_contents.read())
# Display the results.
for prediction in results.predictions:
print("\t" + prediction.tag_name +
": {0:.2f}%".format(prediction.probability * 100))
Help would be appreciated!
Thanks in advance!
There are two ways to give an image to the Cognitive Service. You are mixing both ;)
1) Provide a URL to an image that is accessible over the internet. You do this by sending a JSON to the service:
{"url":"https://sample.com/myimage.png"}
2) Upload the image as binary in the POST request.
Source: https://learn.microsoft.com/en-us/azure/cognitive-services/custom-vision-service/use-prediction-api#get-the-url-and-prediction-key
Your issue is that you are trying to use open() for method 2. However, this does not work with remote files in Python. If you want to do this (instead of method 1), use for example urllib2.urlopen like this.

Disabling Billing on Google Cloud Using Google Cloud Function (Keyerror: 'data')

I am attempting to write a Google Cloud Function to set caps to disable usage above a certain limit. I followed the instructions here: https://cloud.google.com/billing/docs/how-to/notify#cap_disable_billing_to_stop_usage.
This is what my cloud function looks like (I am just copying and pasting from the Google Cloud docs page linked above):
import base64
import json
import os
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
PROJECT_ID = os.getenv('GCP_PROJECT')
PROJECT_NAME = f'projects/{PROJECT_ID}'
def stop_billing(data, context):
pubsub_data = base64.b64decode(data['data']).decode('utf-8')
pubsub_json = json.loads(pubsub_data)
cost_amount = pubsub_json['costAmount']
budget_amount = pubsub_json['budgetAmount']
if cost_amount <= budget_amount:
print(f'No action necessary. (Current cost: {cost_amount})')
return
billing = discovery.build(
'cloudbilling',
'v1',
cache_discovery=False,
credentials=GoogleCredentials.get_application_default()
)
projects = billing.projects()
if __is_billing_enabled(PROJECT_NAME, projects):
print(__disable_billing_for_project(PROJECT_NAME, projects))
else:
print('Billing already disabled')
def __is_billing_enabled(project_name, projects):
"""
Determine whether billing is enabled for a project
#param {string} project_name Name of project to check if billing is enabled
#return {bool} Whether project has billing enabled or not
"""
res = projects.getBillingInfo(name=project_name).execute()
return res['billingEnabled']
def __disable_billing_for_project(project_name, projects):
"""
Disable billing for a project by removing its billing account
#param {string} project_name Name of project disable billing on
#return {string} Text containing response from disabling billing
"""
body = {'billingAccountName': ''} # Disable billing
res = projects.updateBillingInfo(name=project_name, body=body).execute()
print(f'Billing disabled: {json.dumps(res)}')
Also attaching screenshot of what it looks like on Google Cloud Function UI:
I'm also attaching a screenshot to show that I copied and pasted the relevant things to the requirements.txt file as well.
But when I go to test the code, it gives me an error:
Expand all | Collapse all{
insertId: "000000-69dce50a-e079-45ed-b949-a241c97fdfe4"
labels: {…}
logName: "projects/stanford-cs-231n/logs/cloudfunctions.googleapis.com%2Fcloud-functions"
receiveTimestamp: "2020-02-06T16:24:26.800908134Z"
resource: {…}
severity: "ERROR"
textPayload: "Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 383, in run_background_function
_function_handler.invoke_user_function(event_object)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 217, in invoke_user_function
return call_user_function(request_or_event)
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 214, in call_user_function
event_context.Context(**request_or_event.context))
File "/user_code/main.py", line 9, in stop_billing
pubsub_data = base64.b64decode(data['data']).decode('utf-8')
KeyError: 'data'
"
timestamp: "2020-02-06T16:24:25.411Z"
trace: "projects/stanford-cs-231n/traces/8e106d5ab629141d5d91b6b68fb30c82"
}
Any idea why?
Relevant Stack Overflow Post: https://stackoverflow.com/a/58673874/3507127
There seems to be an error in the code Google provided. I got it working when I changed the stop_billing function:
def stop_billing(data, context):
if 'data' in data.keys():
pubsub_data = base64.b64decode(data['data']).decode('utf-8')
pubsub_json = json.loads(pubsub_data)
cost_amount = pubsub_json['costAmount']
budget_amount = pubsub_json['budgetAmount']
else:
cost_amount = data['costAmount']
budget_amount = data['budgetAmount']
if cost_amount <= budget_amount:
print(f'No action necessary. (Current cost: {cost_amount})')
return
if PROJECT_ID is None:
print('No project specified with environment variable')
return
billing = discovery.build('cloudbilling', 'v1', cache_discovery=False, )
projects = billing.projects()
billing_enabled = __is_billing_enabled(PROJECT_NAME, projects)
if billing_enabled:
__disable_billing_for_project(PROJECT_NAME, projects)
else:
print('Billing already disabled')
The problem is that the pub/sub message provides input as a json message with a 'data' entry that is base64 encoded. In the testing functionality you provide the json entry without a 'data' key and without encoding it. This is checked for in the function that I rewrote above.

Uploading a Video to Azure Media Services with Python SDKs

I am currently looking for a way to upload a video to Azure Media Services (AMS v3) via Python SDKs. I have followed its instruction, and am able to connect to AMS successfully.
Example
credentials = AdalAuthentication(
context.acquire_token_with_client_credentials,
RESOURCE,
CLIENT,
KEY)
client = AzureMediaServices(credentials, SUBSCRIPTION_ID) # Successful
I also successfully get all the videos' details uploaded via its portal
for data in client.assets.list(RESOUCE_GROUP_NAME, ACCOUNT_NAME).get(0):
print(f'Asset_name: {data.name}, file_name: {data.description}')
# Asset_name: 4f904060-d15c-4880-8c5a-xxxxxxxx, file_name: 夢想全紀錄.mp4
# Asset_name: 8f2e5e36-d043-4182-9634-xxxxxxxx, file_name: an552Qb_460svvp9.webm
# Asset_name: aef495c1-a3dd-49bb-8e3e-xxxxxxxx, file_name: world_war_2.webm
# Asset_name: b53d8152-6ecd-41a2-a59e-xxxxxxxx, file_name: an552Qb_460svvp9.webm - Media Encoder Standard encoded
However, when I tried to use the following method; it failed. Since I have no idea what to parse as parameters - Link to Python SDKs
create_or_update(resource_group_name, account_name, asset_name,
parameters, custom_headers=None, raw=False, **operation_config)
Therefore, I would like to ask questions as follows (everything is done via Python SDKs):
What kind of parameters does it expect?
Can a video be uploaded directly to AMS or it should be uploaded to Blob Storage first?
Should an Asset contain only one video or multiple files are fine?
The documentation for the REST version of that method is at https://learn.microsoft.com/en-us/rest/api/media/assets/createorupdate. This is effectively the same as the Python parameters.
Videos are stored in Azure Storage for Media Services. This is true for input assets, the assets that are encoded, and any streamed content. It all is in Storage but accessed by Media Services. You do need to create an asset in Media Services which creates the Storage container. Once the Storage container exists you upload via the Storage APIs to that Media Services created container.
Technically multiple files are fine, but there are a number of issues with doing that that you may not expect. I'd recommend using 1 input video = 1 Media Services asset. On the encoding output side there will be more than one file in the asset. Encoding output contains one or more videos, manifests, and metadata files.
I have found my method to work around using Python SDKs and REST; however, I am not quite sure it's proper.
Log-In to Azure Media Services and Blob Storage via Python packages
import adal
from msrestazure.azure_active_directory import AdalAuthentication
from msrestazure.azure_cloud import AZURE_PUBLIC_CLOUD
from azure.mgmt.media import AzureMediaServices
from azure.mgmt.media.models import MediaService
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
Create Assets for an original file and an encoded one by parsing these parameters. Example of the original file Asset creation.
asset_name = 'asset-myvideo'
asset_properties = {
'properties': {
'description': 'Original File Description',
'storageAccountName': "storage-account-name"
}
}
client.assets.create_or_update(RESOUCE_GROUP_NAME, ACCOUNT_NAME, asset_name, asset_properties)
Upload a video to the Blob Storage derived from the created original asset
current_container = [data.container for data in client.assets.list(RESOUCE_GROUP_NAME, ACCOUNT_NAME).get(0) if data.name == asset_name][0] # Get Blob Storage location
file_name = "myvideo.mp4"
blob_client = blob_service_client.get_blob_client(container=current_container, blob=file_name)
with open('original_video.mp4', 'rb') as data:
blob_client.upload_blob(data)
print(f'Video uploaded to {current_container}')
And after that, I do Transform, Job, and Streaming Locator to get the video Streaming Link successfully.
I was able to get this to work with the newer python SDK. The python documentation is mostly missing, so I constructed this mainly from the python SDK source code and the C# examples.
azure-storage-blob==12.3.1
azure-mgmt-media==2.1.0
azure-mgmt-resource==9.0.0
adal~=1.2.2
msrestazure~=0.6.3
0) Import a lot of stuff
from azure.mgmt.media.models import Asset, Transform, Job,
BuiltInStandardEncoderPreset, TransformOutput, \
JobInputAsset, JobOutputAsset, AssetContainerSas, AssetContainerPermission
import adal
from msrestazure.azure_active_directory import AdalAuthentication
from msrestazure.azure_cloud import AZURE_PUBLIC_CLOUD
from azure.mgmt.media import AzureMediaServices
from azure.storage.blob import BlobServiceClient, ContainerClient
import datetime as dt
import time
LOGIN_ENDPOINT = AZURE_PUBLIC_CLOUD.endpoints.active_directory
RESOURCE = AZURE_PUBLIC_CLOUD.endpoints.active_directory_resource_id
# AzureSettings is a custom NamedTuple
1) Log in to AMS:
def get_ams_client(settings: AzureSettings) -> AzureMediaServices:
context = adal.AuthenticationContext(LOGIN_ENDPOINT + '/' +
settings.AZURE_MEDIA_TENANT_ID)
credentials = AdalAuthentication(
context.acquire_token_with_client_credentials,
RESOURCE,
settings.AZURE_MEDIA_CLIENT_ID,
settings.AZURE_MEDIA_SECRET
)
return AzureMediaServices(credentials, settings.AZURE_SUBSCRIPTION_ID)
2) Create an input and output asset
input_asset = create_or_update_asset(
input_asset_name, "My Input Asset", client, azure_settings)
input_asset = create_or_update_asset(
output_asset_name, "My Output Asset", client, azure_settings)
3) Get the Container Name. (most documentation refers to BlockBlobService, which is seems to have been removed from the SDK)
def get_container_name(client: AzureMediaServices, asset_name: str, settings: AzureSettings):
expiry_time = dt.datetime.now(dt.timezone.utc) + dt.timedelta(hours=4)
container_list: AssetContainerSas = client.assets.list_container_sas(
resource_group_name=settings.AZURE_MEDIA_RESOURCE_GROUP_NAME,
account_name=settings.AZURE_MEDIA_ACCOUNT_NAME,
asset_name=asset_name,
permissions = AssetContainerPermission.read_write,
expiry_time=expiry_time
)
sas_uri: str = container_list.asset_container_sas_urls[0]
container_client: ContainerClient = ContainerClient.from_container_url(sas_uri)
return container_client.container_name
4) Upload a file the the input asset container:
def upload_file_to_asset_container(
container: str, local_file, uploaded_file_name, settings: AzureSettings):
blob_service_client = BlobServiceClient.from_connection_string(settings.AZURE_MEDIA_STORAGE_CONNECTION_STRING))
blob_client = blob_service_client.get_blob_client(container=container, blob=uploaded_file_name)
with open(local_file, 'rb') as data:
blob_client.upload_blob(data)
5) Create a transform (in my case, using the adaptive streaming preset):
def get_or_create_transform(
client: AzureMediaServices,
transform_name: str,
settings: AzureSettings):
transform_output = TransformOutput(preset=BuiltInStandardEncoderPreset(preset_name="AdaptiveStreaming"))
transform: Transform = client.transforms.create_or_update(
resource_group_name=settings.AZURE_MEDIA_RESOURCE_GROUP_NAME,
account_name=settings.AZURE_MEDIA_ACCOUNT_NAME,
transform_name=transform_name,
outputs=[transform_output]
)
return transform
5) Submit the Job
def submit_job(
client: AzureMediaServices,
settings: AzureSettings,
input_asset: Asset,
output_asset: Asset,
transform_name: str,
correlation_data: dict) -> Job:
job_input = JobInputAsset(asset_name=input_asset.name)
job_outputs = [JobOutputAsset(asset_name=output_asset.name)]
job: Job = client.jobs.create(
resource_group_name=settings.AZURE_MEDIA_RESOURCE_GROUP_NAME,
account_name=settings.AZURE_MEDIA_ACCOUNT_NAME,
job_name=f"test_job_{UNIQUENESS}",
transform_name=transform_name,
parameters=Job(input=job_input,
outputs=job_outputs,
correlation_data=correlation_data)
)
return job
6) Then I get the URLs after the Event Grid has told me the job is done:
# side-effect warning: this starts the streaming endpoint $$$
def get_urls(client: AzureMediaServices, output_asset_name: str
locator_name: str):
try:
locator: StreamingLocator = client.streaming_locators.create(
resource_group_name=settings.AZURE_MEDIA_RESOURCE_GROUP_NAME,
account_name=settings.AZURE_MEDIA_ACCOUNT_NAME,
streaming_locator_name=locator_name,
parameters=StreamingLocator(
asset_name=output_asset_name,
streaming_policy_name="Predefined_ClearStreamingOnly"
)
)
except Exception as ex:
print("ignoring existing")
streaming_endpoint: StreamingEndpoint = client.streaming_endpoints.get(
resource_group_name=settings.AZURE_MEDIA_RESOURCE_GROUP_NAME,
account_name=settings.AZURE_MEDIA_ACCOUNT_NAME,
streaming_endpoint_name="default")
if streaming_endpoint:
if streaming_endpoint.resource_state != "Running":
client.streaming_endpoints.start(
resource_group_name=settings.AZURE_MEDIA_RESOURCE_GROUP_NAME,
account_name=settings.AZURE_MEDIA_ACCOUNT_NAME,
streaming_endpoint_name="default"
)
paths = client.streaming_locators.list_paths(
resource_group_name=settings.AZURE_MEDIA_RESOURCE_GROUP_NAME,
account_name=settings.AZURE_MEDIA_ACCOUNT_NAME,
streaming_locator_name=locator_name
)
return [f"https://{streaming_endpoint.host_name}{path.paths[0]}" for path in paths.streaming_paths]

Cloud Natural Language API Python script error (Client object has no attribute create_rows)

I was trying to create a script that feeds articles through the classification tool of the Natural Language API and I found a tutorial that does exactly that. I was following this simple tutorial to get an intro into Google Cloud and the Natural Language API.
The end result is supposed to be a script that sends a bunch of new articles from the Google Cloud Storage to the Natural Language API to classify the articles and then save the whole thing into a table created in BigQuery.
I was following the example fine, but when running the final script I get the following error:
Traceback (most recent call last):
File "classify-text.py", line 39, in <module>
errors = bq_client.create_rows(table, rows_for_bq)
AttributeError: 'Client' object has no attribute 'create_rows'
The full script is:
from google.cloud import storage, language, bigquery
# Set up our GCS, NL, and BigQuery clients
storage_client = storage.Client()
nl_client = language.LanguageServiceClient()
# TODO: replace YOUR_PROJECT with your project name below
bq_client = bigquery.Client(project='Your_Project')
dataset_ref = bq_client.dataset('news_classification')
dataset = bigquery.Dataset(dataset_ref)
table_ref = dataset.table('article_data')
table = bq_client.get_table(table_ref)
# Send article text to the NL API's classifyText method
def classify_text(article):
response = nl_client.classify_text(
document=language.types.Document(
content=article,
type=language.enums.Document.Type.PLAIN_TEXT
)
)
return response
rows_for_bq = []
files = storage_client.bucket('text-classification-codelab').list_blobs()
print("Got article files from GCS, sending them to the NL API (this will take ~2 minutes)...")
# Send files to the NL API and save the result to send to BigQuery
for file in files:
if file.name.endswith('txt'):
article_text = file.download_as_string()
nl_response = classify_text(article_text)
if len(nl_response.categories) > 0:
rows_for_bq.append((article_text, nl_response.categories[0].name, nl_response.categories[0].confidence))
print("Writing NL API article data to BigQuery...")
# Write article text + category data to BQ
errors = bq_client.create_rows(table, rows_for_bq)
assert errors == []
You are using deprecated methods; these methods were marked as obsolete in version 0.29, and removed altogether in version 1.0.0.
You should use client.insert_rows() instead; the method accepts the same arguments:
errors = bq_client.insert_rows(table, rows_for_bq)

Categories