I am trying to read mp3/wav data from google cloud and trying to implement audio diarization technique. Issue is that I am not able to read the result which has passed by google api in variable response.
below is my python code
speech_file = r'gs://pp003231/a4a.wav'
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
language_code='en-US',
enable_speaker_diarization=True,
diarization_speaker_count=2)
audio = speech.types.RecognitionAudio(uri=speech_file)
response = client.long_running_recognize(config, audio)
print response
result = response.results[-1]
print result
Output displayed on console is
Traceback (most recent call last):
File "a1.py", line 131, in
print response.results
AttributeError: 'Operation' object has no attribute 'results'
Can you please share your expert advice about what I am doing wrong?
Thanks for your help.
Its too late for the author of this thread. However, posting the solution for someone in future as I too had similar issue.
Change
result = response.results[-1]
to
result = response.result().results[-1]
and it will work fine
Do you have access to the wav file in your bucket? also, this is the entire code? It seems that the sample_rate_hertz and the imports are missing. Here you have the code copy/pasted from the google docs samples, but I edited it to have just the diarization function.
#!/usr/bin/env python
"""Google Cloud Speech API sample that demonstrates enhanced models
and recognition metadata.
Example usage:
python diarization.py
"""
import argparse
import io
def transcribe_file_with_diarization():
"""Transcribe the given audio file synchronously with diarization."""
# [START speech_transcribe_diarization_beta]
from google.cloud import speech_v1p1beta1 as speech
client = speech.SpeechClient()
audio = speech.types.RecognitionAudio(uri="gs://<YOUR_BUCKET/<YOUR_WAV_FILE>")
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=8000,
language_code='en-US',
enable_speaker_diarization=True,
diarization_speaker_count=2)
print('Waiting for operation to complete...')
response = client.recognize(config, audio)
# The transcript within each result is separate and sequential per result.
# However, the words list within an alternative includes all the words
# from all the results thus far. Thus, to get all the words with speaker
# tags, you only have to take the words list from the last result:
result = response.results[-1]
words_info = result.alternatives[0].words
# Printing out the output:
for word_info in words_info:
print("word: '{}', speaker_tag: {}".format(word_info.word,
word_info.speaker_tag))
# [END speech_transcribe_diarization_beta]
if __name__ == '__main__':
transcribe_file_with_diarization()
To run the code just name it diarization.py and use the command:
python diarization.py
Also, you have to install the latest google-cloud-speech library:
pip install --upgrade google-cloud-speech
And you need to have the credentials of your service account in a json file, you can check more info here
Related
As I said already sorry for the title. I have never worked with Azure API and have no idea what is wrong with the code, as I just copied from the documentation and put in my information.
Here is the code:
from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"
audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")
Well as I run this I get no error and in fact, get in my desired folder a .wav file, but this file has 0 bytes and it looks corrupted.
Now here is why I have no idea of what's wrong because if I remove this
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"
So it becomes this
from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")
It now works all of the sudden, but with what I assume to be the basic/common voice.
So here is my question: how do I choose a voice that I want(btw is this one "en-US-JennyNeural" style="customerservice" or something among these lines)
Thank You in advance!
ChristopherNeural is not a valid voice name. The actual name of the voice is en-US-ChristopherNeural.
speech_config.speech_synthesis_voice_name = "en-US-ChristopherNeural"
This is well-documented on the Language support page of the Speech services documentation.
For other, more fine-grained control over voice characteristics, you'll require the use of SSML as outlined in text-to-speech-basics.py.
I'm working for the first time with IBM Watson Visual Recognition. My Python app needs to pass images that it's managing in memory to the service. However, the rather limited documentation and sample code I've been able to find from IBM shows calls to the API as referencing saved files. The file is passed to the call as an io.BufferedReader.
with open(car_path, 'rb') as images_file:
car_results = service.classify(
images_file=images_file,
threshold='0.1',
classifier_ids=['default']
).get_result()
My application will be working with images from memory and I don't want to have to save every image to file before I can make a call. I tried replacing the BufferedReader with an io.BytesIO stream, and I got back an error saying I was missing an images_filename param. When I added a mock filename (e.g. 'xyz123.jpg') I get back the following error:
TypeError: a bytes-like object is required, not 'float'
Can I make calls to the analysis API using an image from memory? If so, how?
EDIT:
This is essentially what I'm trying to do:
def analyze_image(pillow_img: PIL.Image):
byte_stream = io.BytesIO()
pillow_img.save(byte_stream, format='JPEG')
bytes_img = byte_stream.getvalue()
watson_vr = VisualRecognitionV3(
'2019-04-30',
url='https://gateway.watsonplatform.net/visual-recognition/api',
iam_apikey='<API KEY>'
)
result_json = watson_vr.classify(
images_file=bytes_img,
threshold=0.1,
classifier_ids=['default']
).get_result()
Thanks
How about
bytes_img = byte_stream.getbuffer()
...
result_json = watson_vr.classify(
images_file=bytes_img,
threshold=0.1,
classifier_ids=['default']
).get_result()
or
with byte_stream as images_file:
result_json = watson_vr.classify(
images_file=images_file,
threshold='0.1',
classifier_ids=['default']
).get_result()
I am using the code below to transcribe an audio file. When the process is completed, I only get the last word.
I have tried both flac and wav files and made sure the files are in my bucket.
Also verified service account is google is working fine. But can't figure out why I am only getting the last word.
#!/usr/bin/env python
"""Google Cloud Speech API sample that demonstrates enhanced models
and recognition metadata.
Example usage:
python diarization.py
"""
import argparse
import io
def transcribe_file_with_diarization():
"""Transcribe the given audio file synchronously with diarization."""
# [START speech_transcribe_diarization_beta]
from google.cloud import speech_v1p1beta1 as speech
client = speech.SpeechClient()
audio = speech.types.RecognitionAudio(uri="gs://MYBUCKET/MYAudiofile")
config = speech.types.RecognitionConfig(
encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=8000,
language_code='en-US',
enable_speaker_diarization=True,
diarization_speaker_count=2)
print('Waiting for operation to complete...')
response = client.recognize(config, audio)
# The transcript within each result is separate and sequential per result.
# However, the words list within an alternative includes all the words
# from all the results thus far. Thus, to get all the words with speaker
# tags, you only have to take the words list from the last result:
result = response.results[-1]
words_info = result.alternatives[0].words
# Printing out the output:
for word_info in words_info:
print("word: '{}', speaker_tag: {}".format(word_info.word,
word_info.speaker_tag))
# [END speech_transcribe_diarization_beta]
if __name__ == '__main__':
transcribe_file_with_diarization()
RESULTS is shown here after running the code:
python diarazation.py
Waiting for operation to complete...
word: 'bye', speaker_tag: 0
I'm facing the following error while trying to persist log files to Azure Blob storage from Azure Batch execution - "FileUploadMiscError - A miscellaneous error was encountered while uploading one of the output files". This error doesn't give a lot of information as to what might be going wrong. I tried checking the Microsoft Documentation for this error code, but it doesn't mention this particular error code.
Below is the relevant code for adding the task to Azure Batch that I have ported from C# to Python for persisting the log files.
Note: The container that I have configured gets created when the task is added, but there's no blob inside.
import datetime
import logging
import os
import azure.storage.blob.models as blob_model
import yaml
from azure.batch import models
from azure.storage.blob.baseblobservice import BaseBlobService
from azure.storage.common.cloudstorageaccount import CloudStorageAccount
from dotenv import load_dotenv
LOG = logging.getLogger(__name__)
def add_tasks(batch_client, job_id, task_id, io_details, blob_details):
task_commands = "This is a placeholder. Actual code has an actual task. This gets completed successfully."
LOG.info("Configuring the blob storage details")
base_blob_service = BaseBlobService(
account_name=blob_details['account_name'],
account_key=blob_details['account_key'])
LOG.info("Base blob service created")
base_blob_service.create_container(
container_name=blob_details['container_name'], fail_on_exist=False)
LOG.info("Container present")
container_sas = base_blob_service.generate_container_shared_access_signature(
container_name=blob_details['container_name'],
permission=blob_model.ContainerPermissions(write=True),
expiry=datetime.datetime.now() + datetime.timedelta(days=1))
LOG.info(f"Container SAS created: {container_sas}")
container_url = base_blob_service.make_container_url(
container_name=blob_details['container_name'], sas_token=container_sas)
LOG.info(f"Container URL created: {container_url}")
# fpath = task_id + '/output.txt'
fpath = task_id
LOG.info(f"Creating output file object:")
out_files_list = list()
out_files = models.OutputFile(
file_pattern=r"../stderr.txt",
destination=models.OutputFileDestination(
container=models.OutputFileBlobContainerDestination(
container_url=container_url, path=fpath)),
upload_options=models.OutputFileUploadOptions(
upload_condition=models.OutputFileUploadCondition.task_completion))
out_files_list.append(out_files)
LOG.info(f"Output files: {out_files_list}")
LOG.info(f"Creating the task now: {task_id}")
task = models.TaskAddParameter(
id=task_id, command_line=task_commands, output_files=out_files_list)
batch_client.task.add(job_id=job_id, task=task)
LOG.info(f"Added task: {task_id}")
There is a bug in Batch's OutputFile handling which causes it to fail to upload to containers if the full container URL includes any query-string parameters other than the ones included in the SAS token. Unfortunately, the azure-storage-blob Python module includes an extra query string parameter when generating the URL via make_container_url.
This issue was just raised to us, and a fix will be released in the coming weeks, but an easy workaround is instead of using make_container_url to craft the URL, craft it yourself like so: container_url = 'https://{}/{}?{}'.format(blob_service.primary_endpoint, blob_details['container_name'], container_sas).
The resulting URL should look something like this: https://<account>.blob.core.windows.net/<container>?se=2019-01-12T01%3A34%3A05Z&sp=w&sv=2018-03-28&sr=c&sig=<sig> - specifically it shouldn't have restype=container in it (which is what the azure-storage-blob package is including)
I was trying to create a script that feeds articles through the classification tool of the Natural Language API and I found a tutorial that does exactly that. I was following this simple tutorial to get an intro into Google Cloud and the Natural Language API.
The end result is supposed to be a script that sends a bunch of new articles from the Google Cloud Storage to the Natural Language API to classify the articles and then save the whole thing into a table created in BigQuery.
I was following the example fine, but when running the final script I get the following error:
Traceback (most recent call last):
File "classify-text.py", line 39, in <module>
errors = bq_client.create_rows(table, rows_for_bq)
AttributeError: 'Client' object has no attribute 'create_rows'
The full script is:
from google.cloud import storage, language, bigquery
# Set up our GCS, NL, and BigQuery clients
storage_client = storage.Client()
nl_client = language.LanguageServiceClient()
# TODO: replace YOUR_PROJECT with your project name below
bq_client = bigquery.Client(project='Your_Project')
dataset_ref = bq_client.dataset('news_classification')
dataset = bigquery.Dataset(dataset_ref)
table_ref = dataset.table('article_data')
table = bq_client.get_table(table_ref)
# Send article text to the NL API's classifyText method
def classify_text(article):
response = nl_client.classify_text(
document=language.types.Document(
content=article,
type=language.enums.Document.Type.PLAIN_TEXT
)
)
return response
rows_for_bq = []
files = storage_client.bucket('text-classification-codelab').list_blobs()
print("Got article files from GCS, sending them to the NL API (this will take ~2 minutes)...")
# Send files to the NL API and save the result to send to BigQuery
for file in files:
if file.name.endswith('txt'):
article_text = file.download_as_string()
nl_response = classify_text(article_text)
if len(nl_response.categories) > 0:
rows_for_bq.append((article_text, nl_response.categories[0].name, nl_response.categories[0].confidence))
print("Writing NL API article data to BigQuery...")
# Write article text + category data to BQ
errors = bq_client.create_rows(table, rows_for_bq)
assert errors == []
You are using deprecated methods; these methods were marked as obsolete in version 0.29, and removed altogether in version 1.0.0.
You should use client.insert_rows() instead; the method accepts the same arguments:
errors = bq_client.insert_rows(table, rows_for_bq)