I am using Google Cloud / JupyterLab / Python.
I'm trying to run a sample sentiment analysis, following the guide here.
However, on running the example, I get this error:
AttributeError: 'SpeechClient' object has no attribute 'analyze_sentiment'
Below is the code I'm trying:
def sample_analyze_sentiment(gcs_content_uri):
    gcs_content_uri = 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt'
    client = language_v1.LanguageServiceClient()
    type_ = enums.Document.Type.PLAIN_TEXT
    language = "en"
    document = {
        "gcs_content_uri": 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt',
        "type": 'enums.Document.Type.PLAIN_TEXT',
        "language": 'en'
    }
    response = client.analyze_sentiment(document, encoding_type=encoding_type)
I had no problem generating the transcript using Speech-to-Text, but I have had no success getting a document sentiment analysis.
I had no problem performing analyze_sentiment following the documentation example.
I see a few issues with your code. To me it should be:
from google.cloud import language_v1
from google.cloud.language import enums
from google.cloud.language import types

def sample_analyze_sentiment(path):
    # path = 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt'
    # If path is passed through the function, it does not need to be specified inside it.
    # You can always set path = "default-path" when defining the function.
    client = language_v1.LanguageServiceClient()
    document = types.Document(
        gcs_content_uri = path,
        type = enums.Document.Type.PLAIN_TEXT,
        language = 'en',
    )
    response = client.analyze_sentiment(document)
    return response
I have tried the previous code with a path of my own pointing to a text file inside a bucket in Google Cloud Storage:
response = sample_analyze_sentiment("<my-path>")
sentiment = response.document_sentiment
print(sentiment.score)
print(sentiment.magnitude)
I got a successful run with a sentiment score of -0.5 and a magnitude of 1.5. I performed the run in JupyterLab with Python 3, which I assume is the setup you have.
I am trying to pull Twitter streaming data in a Cloud Function and essentially export the streamed data into BigQuery.
Currently, I have this code. The entry point is set to stream_twitter.
main.txt:
import os
import tweepy
import pandas as pd
import datalab.bigquery as bq
from google.cloud import bigquery
#access key
api_key = os.environ['API_KEY']
secret_key = os.environ['SECRET_KEY']
bearer_token = os.environ['BEARER_TOKEN']
def stream_twitter(event, context):
    # authentication
    auth = tweepy.Client(bearer_token = bearer_token)
    api = tweepy.API(auth)

    # create Stream Listener
    class Listener(tweepy.StreamingClient):
        # save list to dataframe
        tweets = []

        def on_tweet(self, tweet):
            if tweet.referenced_tweets == None:  # original tweet, not a reply or retweet
                self.tweets.append(tweet)

        def on_error(self, status_code):
            if status_code == 420:
                # returning False in on_data disconnects the stream
                return False

    stream_tweet = Listener(bearer_token)

    # filtered Stream using rules
    rule = tweepy.StreamRule("(covid OR covid19 OR coronavirus OR pandemic OR #covid19 OR #covid) lang:en")
    stream_tweet.add_rules(rule, dry_run = True)
    stream_tweet.filter(tweet_fields=["referenced_tweets"])

    # insert into dataframe
    columns = ["UserID", "Tweets"]
    data = []
    for tweet in stream_tweet.tweets:
        data.append([tweet.id, tweet.text, ])
    stream_df = pd.DataFrame(data, columns=columns)

    ## Insert time col - TimeStamp to give the time that data is pulled from API
    stream_df.insert(0, 'TimeStamp', pd.to_datetime('now').replace(microsecond=0))

    ## Converting UTC Time to SGT (UTC+8 hours)
    stream_df.insert(1, 'SGT_TimeStamp', '')
    stream_df['SGT_TimeStamp'] = stream_df['TimeStamp'] + pd.Timedelta(hours=8)

    ## Define BQ dataset & table names
    bigquery_dataset_name = 'streaming_dataset'
    bigquery_table_name = 'streaming-table'

    ## Define BigQuery dataset & table
    dataset = bq.Dataset(bigquery_dataset_name)
    table = bq.Table(bigquery_dataset_name + '.' + bigquery_table_name)

    if not table.exists():
        # Create or overwrite the existing table if it exists
        table_schema = bq.Schema.from_dataframe(stream_df)
        table.create(schema = table_schema, overwrite = False)

    # Write the DataFrame to a BigQuery table
    table.insert_data(stream_df)
requirements.txt:
tweepy
pandas
google-cloud-bigquery
However, I keep getting a
"Deployment failure: Function deployment failed due to a health check failure. This usually indicates that your code was built successfully but failed during a test execution. Examine the logs to determine the cause. Try deploying again in a few minutes if it appears to be transient."
I can't seem to figure out how to solve this error. Is there something wrong with my code? Or is there something I should have done? I tested the streaming code in PyCharm and was able to pull the data.
I would appreciate any help I can get. Thank you.
The logs for the function are below. (I am unfamiliar with logs, hence the screenshot.) Essentially, these were the two info and error entries I've been getting.
I managed to replicate your error message. All I did was add datalab==1.2.0 inside requirements.txt. Since you are importing the datalab library, you need to include the support package for it, which is the latest version of datalab.
Here's the reference that I used: Migrating from the datalab Python package.
See the requirements.txt file to view the versions of the libraries used for these code snippets.
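For illustration, a requirements.txt for the function above might then look like the following sketch (the datalab pin is the one from the fix; the other dependencies are kept unpinned, as in the original file):

# requirements.txt (sketch): the original three dependencies plus the datalab pin that fixed the deployment
tweepy
pandas
google-cloud-bigquery
datalab==1.2.0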
Here's the screenshot of the logs:
I have the following Python code to label images using Clarifai. It was working code and had been usable for the past 6-8 months. However, for the last few days, I have been getting the error mentioned below. Note that I have not made any changes to the working version of the code that could let the error creep in.
# python program to analyse an image and label it
'''
Dependencies:
pip install flask
pip install clarifai-grpc
pip install logging
'''
import json
from flask import Flask, render_template, request
import logging
import os
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
from clarifai_grpc.grpc.api import resources_pb2, service_pb2, service_pb2_grpc
from clarifai_grpc.grpc.api.status import status_pb2, status_code_pb2

channel = ClarifaiChannel.get_json_channel()
stub = service_pb2_grpc.V2Stub(channel)

# This will be used by every Clarifai endpoint call.
# The word 'Key' is required to precede the authentication key.
metadata = (('authorization', 'Key API_KEY_HERE'),)

webapp = Flask(__name__)  # creating a web application for the current python file

# decorators
@webapp.route('/')
def index():
    return render_template('index.html', len=0)

@webapp.route('/', methods=['POST'])
def search():
    if request.form['searchByURL']:
        url = request.form['searchByURL']
        my_request = service_pb2.PostModelOutputsRequest(
            # This is the model ID of a publicly available General model.
            # You may use any other public or custom model ID.
            model_id='aaa03c23b3724a16a56b629203edc62c',
            inputs=[
                resources_pb2.Input(data=resources_pb2.Data(image=resources_pb2.Image(url=url)))
            ])
        response = stub.PostModelOutputs(my_request, metadata=metadata)
        if response.status.code != status_code_pb2.SUCCESS:
            message = ["You have reached the limit for today!"]
            return render_template('/index.html', len=1, searchResults=message)
        concepts = []
        for concept in response.outputs[0].data.concepts:
            concepts.append(concept.name)
        concepts = concepts[0:10]
        return render_template('/index.html', len=len(concepts), searchResults=concepts)
    elif request.files['searchByImage']:
        file = request.files['searchByImage']
        file.save(file.filename)
        # IMAGE INPUT:
        with open(file.filename, "rb") as f:
            file_bytes = f.read()
        post_model_outputs_response = stub.PostModelOutputs(
            service_pb2.PostModelOutputsRequest(
                model_id="aaa03c23b3724a16a56b629203edc62c",
                version_id="aa7f35c01e0642fda5cf400f543e7c40",  # This is optional. Defaults to the latest model version.
                inputs=[
                    resources_pb2.Input(
                        data=resources_pb2.Data(
                            image=resources_pb2.Image(
                                base64=file_bytes
                            )
                        )
                    )
                ]
            ),
            metadata=metadata
        )
        if post_model_outputs_response.status.code != status_code_pb2.SUCCESS:
            message = ["You have reached the limit for today!"]
            return render_template('/index.html', len=1, searchResults=message)
        # Since we have one input, one output will exist here.
        output = post_model_outputs_response.outputs[0]
        os.remove(file.filename)
        concepts = []
        # Predicted concepts:
        for concept in output.data.concepts:
            concepts.append(concept.name)
        concepts = concepts[0:10]
        return render_template('/index.html', len=len(concepts), searchResults=concepts)
    else:
        return render_template('/index.html', len=1, searchResults=["No Image entered!"])

# run the server
if __name__ == "__main__":
    logging.basicConfig(filename='error.log', level=logging.DEBUG)
    webapp.run(debug=True)
Error:
Exception has occurred: ImportError
dlopen(/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-darwin.so, 0x0002): tried: '/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/grpc/_cython/cygrpc.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
File "/Users/eshaangupta/Desktop/Python-Level-4/Image Analyser.py", line 15, in <module>
from clarifai_grpc.channel.clarifai_channel import ClarifaiChannel
It looks like you are trying to run this on a different architecture than you have in the past. You had been running on x86 (likely on macOS) and have now moved to an ARM architecture. I'm guessing you've upgraded to an M1 MacBook, although you may have moved to a different ARM-based chip.
(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e'))
This is a problem with a file in grpc, specifically the library file cygrpc.cpython-310-darwin.so. I'd recommend removing gRPC and reinstalling it to see if that helps resolve it.
Something like this might work:
python -m pip install --force-reinstall grpcio
(This is assuming python points to python3.10 on your system, although I'm not sure how you've installed it, so that is just a guess.)
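As a quick sanity check before reinstalling, you can confirm which architecture the interpreter itself reports; this is just a sketch, not something from the original setup:

# arch_check.py - minimal sketch to see which architecture your Python build targets
import platform

# A native Apple Silicon build prints 'arm64'; an Intel build (or one running
# under Rosetta) prints 'x86_64', which would explain the incompatible .so file.
print(platform.machine())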
I am trying to register a data set via the Azure Machine Learning Studio designer but keep getting an error. Here is my code, used in an "Execute Python Script" module:
import pandas as pd
from azureml.core.dataset import Dataset
from azureml.core import Workspace

def azureml_main(dataframe1 = None, dataframe2 = None):
    ws = Workspace.get(name = <my_workspace_name>, subscription_id = <my_id>, resource_group = <my_RG>)
    ds = Dataset.from_pandas_dataframe(dataframe1)
    ds.register(workspace = ws,
                name = "data set name",
                description = "example description",
                create_new_version = True)
    return dataframe1,
But I get the following error in the Workspace.get line:
Authentication Exception: Unknown error occurred during authentication. Error detail: Unexpected polling state code_expired.
Since I am inside the workspace and in the designer, I do not usually need to do any kind of authentication (or even reference the workspace). Can anybody offer some direction? Thanks!
When you're inside an "Execute Python Script" module or a PythonScriptStep, the authentication for fetching the workspace is already done for you (unless you're trying to authenticate to a different Azure ML workspace):
from azureml.core import Run
run = Run.get_context()
ws = run.experiment.workspace
You should be able to use that ws object to register a Dataset.
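Putting that together with your original module, a minimal sketch might look like the following (it keeps your Dataset.from_pandas_dataframe and register calls; the dataset name and description are placeholders):

import pandas as pd
from azureml.core import Run
from azureml.core.dataset import Dataset

def azureml_main(dataframe1 = None, dataframe2 = None):
    # The run context is already authenticated inside the designer module,
    # so there is no need to call Workspace.get() here.
    run = Run.get_context()
    ws = run.experiment.workspace

    # Register the incoming DataFrame as a dataset, as in your original code.
    ds = Dataset.from_pandas_dataframe(dataframe1)
    ds.register(workspace=ws,
                name="data set name",
                description="example description",
                create_new_version=True)

    return dataframe1,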
I'm trying to use Raspi 3B+ and AutoML Vision to train a model for classification. However, when I try to create a dataset on Google Cloud Platform, it runs into a problem as follows:
Traceback (most recent call last):
  File "/home/pi/.local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/home/pi/.local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/pi/.local/lib/python3.7/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.INVALID_ARGUMENT
    details = "List of found errors: 1.Field: parent; Message: Required field is invalid"
    debug_error_string = "{"created":"@1604833054.567218256","description":"Error received from peer ipv6:[2a00:1450:400a:801::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"List of found errors:\t1.Field: parent; Message: Required field is invalid\t","grpc_status":3}"
>
The dataset-creation code is:
automl_client = automl.AutoMlClient()
project_location = automl_client.location_path(project_id, region_name)
bucket = storage_client.bucket(bucket_name)

# upload the images to google cloud bucket
upload_image_excel(bucket, bucket_name, dataset_name, status_list, csv_name)

# Create a new automl dataset programmatically
classification_type = 'MULTICLASS'
dataset_metadata = {'classification_type': classification_type}
dataset_config = {
    'display_name': dataset_name,
    'image_classification_dataset_metadata': dataset_metadata
}
dataset = automl_client.create_dataset(project_location, dataset_config)
dataset_id = dataset.name.split('/')[-1]
dataset_full_id = automl_client.dataset_path(
    project_id, region_name, dataset_id
)

# Read the *.csv file on Google Cloud
remote_csv_path = 'gs://{0}/{1}'.format(bucket_name, csv_name)
input_uris = remote_csv_path.split(',')
input_config = {'gcs_source': {'input_uris': input_uris}}
response = automl_client.import_data(dataset_full_id, input_config)
Does anyone know what's happening here?
Which region are you using? Be aware that for this feature, currently project resources must be in the us-central1 region to use this API [1].
The error prompted is INVALID_ARGUMENT, therefore I do not think the above-mentioned is the issue. Looking at the GCP documentation on creating a dataset [1], I see your code differs from what is done in that sample: the metadata and the configuration are set in a different way. Could you please try to recreate it using the same format as in the shared sample? I believe this should resolve the issue being experienced.
Here you have a code example:
from google.cloud import automl
# TODO(developer): Uncomment and set the following variables
# project_id = "YOUR_PROJECT_ID"
# display_name = "your_datasets_display_name"
client = automl.AutoMlClient()
# A resource that represents Google Cloud Platform location.
project_location = f"projects/{project_id}/locations/us-central1"
# Specify the classification type
# Types:
# MultiLabel: Multiple labels are allowed for one example.
# MultiClass: At most one label is allowed per example.
# https://cloud.google.com/automl/docs/reference/rpc/google.cloud.automl.v1#classificationtype
metadata = automl.ImageClassificationDatasetMetadata(
    classification_type=automl.ClassificationType.MULTILABEL
)
dataset = automl.Dataset(
    display_name=display_name,
    image_classification_dataset_metadata=metadata,
)
# Create a dataset with the dataset metadata in the region.
response = client.create_dataset(parent=project_location, dataset=dataset)
created_dataset = response.result()
# Display the dataset information
print("Dataset name: {}".format(created_dataset.name))
print("Dataset id: {}".format(created_dataset.name.split("/")[-1]))
[1] https://cloud.google.com/vision/automl/docs/create-datasets#automl_vision_classification_create_dataset-python
I was trying to create a script that feeds articles through the classification tool of the Natural Language API, and I found a tutorial that does exactly that. I was following this simple tutorial to get an intro to Google Cloud and the Natural Language API.
The end result is supposed to be a script that sends a bunch of new articles from the Google Cloud Storage to the Natural Language API to classify the articles and then save the whole thing into a table created in BigQuery.
I was following the example fine, but when running the final script I get the following error:
Traceback (most recent call last):
File "classify-text.py", line 39, in <module>
errors = bq_client.create_rows(table, rows_for_bq)
AttributeError: 'Client' object has no attribute 'create_rows'
The full script is:
from google.cloud import storage, language, bigquery
# Set up our GCS, NL, and BigQuery clients
storage_client = storage.Client()
nl_client = language.LanguageServiceClient()
# TODO: replace YOUR_PROJECT with your project name below
bq_client = bigquery.Client(project='Your_Project')
dataset_ref = bq_client.dataset('news_classification')
dataset = bigquery.Dataset(dataset_ref)
table_ref = dataset.table('article_data')
table = bq_client.get_table(table_ref)
# Send article text to the NL API's classifyText method
def classify_text(article):
    response = nl_client.classify_text(
        document=language.types.Document(
            content=article,
            type=language.enums.Document.Type.PLAIN_TEXT
        )
    )
    return response

rows_for_bq = []
files = storage_client.bucket('text-classification-codelab').list_blobs()
print("Got article files from GCS, sending them to the NL API (this will take ~2 minutes)...")

# Send files to the NL API and save the result to send to BigQuery
for file in files:
    if file.name.endswith('txt'):
        article_text = file.download_as_string()
        nl_response = classify_text(article_text)
        if len(nl_response.categories) > 0:
            rows_for_bq.append((article_text, nl_response.categories[0].name, nl_response.categories[0].confidence))
print("Writing NL API article data to BigQuery...")
# Write article text + category data to BQ
errors = bq_client.create_rows(table, rows_for_bq)
assert errors == []
You are using deprecated methods; these methods were marked as obsolete in version 0.29, and removed altogether in version 1.0.0.
You should use client.insert_rows() instead; the method accepts the same arguments:
errors = bq_client.insert_rows(table, rows_for_bq)
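As a side note (not part of the original answer): insert_rows maps each row tuple to the table's schema, which is why the script fetches the table object with bq_client.get_table(table_ref) before inserting. With only that one method name changed, the end of the script becomes:

# Write article text + category data to BQ; the table object fetched via
# get_table() carries the schema that insert_rows uses to map the tuples.
errors = bq_client.insert_rows(table, rows_for_bq)
assert errors == []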