I am trying to transcribe an audio file with Google Cloud. Here is my code:
from google.cloud.speech_v1 import enums
from google.cloud import speech_v1p1beta1
import os
import io


def sample_long_running_recognize(local_file_path):
    client = speech_v1p1beta1.SpeechClient()

    # local_file_path = 'resources/commercial_mono.wav'

    # If enabled, each word in the first alternative of each result will be
    # tagged with a speaker tag to identify the speaker.
    enable_speaker_diarization = True

    # Optional. Specifies the estimated number of speakers in the conversation.
    diarization_speaker_count = 2

    # The language of the supplied audio
    language_code = "en-US"
    config = {
        "enable_speaker_diarization": enable_speaker_diarization,
        "diarization_speaker_count": diarization_speaker_count,
        "language_code": language_code,
        "encoding": enums.RecognitionConfig.AudioEncoding.FLAC
    }
    with io.open(local_file_path, "rb") as f:
        content = f.read()
    audio = {"content": content}
    # audio = {"uri": storage_uri}

    operation = client.long_running_recognize(config, audio)

    print(u"Waiting for operation to complete...")
    response = operation.result()

    for result in response.results:
        # First alternative has words tagged with speakers
        alternative = result.alternatives[0]
        print(u"Transcript: {}".format(alternative.transcript))
        # Print the speaker_tag of each word
        for word in alternative.words:
            print(u"Word: {}".format(word.word))
            print(u"Speaker tag: {}".format(word.speaker_tag))


sample_long_running_recognize('/Users/asi/Downloads/trimmed_3.flac')
I keep getting this error:
google.api_core.exceptions.InvalidArgument: 400 audio_channel_count `1` in RecognitionConfig must either be unspecified or match the value in the FLAC header `2`.
I cannot figure out what I am doing wrong. I have pretty much copied and pasted this from the Google Cloud Speech API docs. Any advice?
This attribute (audio_channel_count) is the number of channels in the input audio data, and you only need to set it for multi-channel recognition. I assume that is your case, so, as the message suggests, you need to set 'audio_channel_count': 2 in your config so that it exactly matches your audio file.
Please take a look at the source code for more information about the attributes of the RecognitionConfig object.
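As a minimal sketch (assuming your FLAC file is stereo, as the error message indicates), the config dictionary from your snippet would become:

config = {
    "enable_speaker_diarization": enable_speaker_diarization,
    "diarization_speaker_count": diarization_speaker_count,
    "language_code": language_code,
    "encoding": enums.RecognitionConfig.AudioEncoding.FLAC,
    # Must match the channel count in the FLAC header (2 in your file).
    "audio_channel_count": 2,
}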
Apologies for the title. I have never worked with the Azure API and have no idea what is wrong with the code, as I just copied it from the documentation and put in my own information.
Here is the code:
from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"
audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")
Well, when I run this I get no error, and I do in fact get a .wav file in my desired folder, but the file has 0 bytes and looks corrupted.
Now here is why I have no idea what's wrong: if I remove these two lines
speech_config.speech_synthesis_language = "en-US"
speech_config.speech_synthesis_voice_name = "ChristopherNeural"
so that it becomes this:
from azure.cognitiveservices.speech import AudioDataStream, SpeechConfig, SpeechSynthesizer, SpeechSynthesisOutputFormat
from azure.cognitiveservices.speech.audio import AudioOutputConfig
speech_config = SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
audio_config = AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
synthesizer.speak_text_async("A simple test to write to a file.")
It now works all of a sudden, but with what I assume to be the basic/default voice.
So here is my question: how do I choose the voice that I want (by the way, is it "en-US-JennyNeural" with style="customerservice", or something along those lines)?
Thank you in advance!
ChristopherNeural is not a valid voice name. The actual name of the voice is en-US-ChristopherNeural.
speech_config.speech_synthesis_voice_name = "en-US-ChristopherNeural"
This is well-documented on the Language support page of the Speech services documentation.
For other, more fine-grained control over voice characteristics, you'll need to use SSML, as outlined in text-to-speech-basics.py.
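As a rough sketch (assuming the en-US-JennyNeural voice and the customerservice style are available in your region), the SSML route could look something like this:

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(subscription="ImagineHereAreNumbers", region="westeurope")
audio_config = speechsdk.audio.AudioOutputConfig(filename=r'C:\Users\TheD4\OneDrive\Desktop\SpeechFolder\Azure.wav')
synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)

# SSML selects both the voice and an expressive style in a single request.
ssml = """
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <mstts:express-as style="customerservice">
      A simple test to write to a file.
    </mstts:express-as>
  </voice>
</speak>
"""

# Block until synthesis finishes so the file is fully written before the script exits.
result = synthesizer.speak_ssml_async(ssml).get()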
I am using Google Cloud / JupyterLab / Python.
I'm trying to run a sample sentiment analysis, following the guide here
However, on running the example, I get this error:
AttributeError: 'SpeechClient' object has no attribute 'analyze_sentiment'
Below is the code I'm trying:
def sample_analyze_sentiment(gcs_content_uri):
    gcs_content_uri = 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt'
    client = language_v1.LanguageServiceClient()
    type_ = enums.Document.Type.PLAIN_TEXT
    language = "en"
    document = {
        "gcs_content_uri": 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt',
        "type": 'enums.Document.Type.PLAIN_TEXT',
        "language": 'en'
    }
    response = client.analyze_sentiment(document, encoding_type=encoding_type)
I had no problem generating the transcript using Speech-to-Text, but I have had no success getting a document sentiment analysis.
I had no problem performing analyze_sentiment following the documentation example.
I see some issues with your code. To me it should be:
from google.cloud import language_v1
from google.cloud.language import enums
from google.cloud.language import types


def sample_analyze_sentiment(path):
    # path = 'gs://converted_audiofiles/Converted_Audio/200315_1633 1.txt'
    # If path is passed in as the function argument, it does not need to be
    # specified inside the function. You can always set path = "default-path"
    # when defining the function.
    client = language_v1.LanguageServiceClient()
    document = types.Document(
        gcs_content_uri=path,
        type=enums.Document.Type.PLAIN_TEXT,
        language='en',
    )
    response = client.analyze_sentiment(document)
    return response
I tried the previous code with a path of my own, pointing to a text file inside a bucket in Google Cloud Storage:
response = sample_analyze_sentiment("<my-path>")
sentiment = response.document_sentiment
print(sentiment.score)
print(sentiment.magnitude)
I got a successful run with sentiment score -0.5 and magnitude 1.5. I performed the run in JupyterLab with Python 3, which I assume is the setup you have.
I am using the code below to transcribe an audio file. When the process is completed, I only get the last word.
I have tried both FLAC and WAV files and made sure the files are in my bucket.
I also verified that the Google service account is working fine, but I can't figure out why I am only getting the last word.
#!/usr/bin/env python

"""Google Cloud Speech API sample that demonstrates enhanced models
and recognition metadata.

Example usage:
    python diarization.py
"""

import argparse
import io


def transcribe_file_with_diarization():
    """Transcribe the given audio file synchronously with diarization."""
    # [START speech_transcribe_diarization_beta]
    from google.cloud import speech_v1p1beta1 as speech
    client = speech.SpeechClient()

    audio = speech.types.RecognitionAudio(uri="gs://MYBUCKET/MYAudiofile")

    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)

    print('Waiting for operation to complete...')
    response = client.recognize(config, audio)

    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:
    result = response.results[-1]

    words_info = result.alternatives[0].words

    # Printing out the output:
    for word_info in words_info:
        print("word: '{}', speaker_tag: {}".format(word_info.word,
                                                   word_info.speaker_tag))
    # [END speech_transcribe_diarization_beta]


if __name__ == '__main__':
    transcribe_file_with_diarization()
Here are the results after running the code:
python diarazation.py
Waiting for operation to complete...
word: 'bye', speaker_tag: 0
I am trying to read MP3/WAV data from Google Cloud Storage and implement speaker diarization. The issue is that I am not able to read the result that the Google API returns in the variable response.
Below is my Python code:
speech_file = r'gs://pp003231/a4a.wav'

config = speech.types.RecognitionConfig(
    encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
    language_code='en-US',
    enable_speaker_diarization=True,
    diarization_speaker_count=2)

audio = speech.types.RecognitionAudio(uri=speech_file)
response = client.long_running_recognize(config, audio)
print response

result = response.results[-1]
print result
The output displayed on the console is:
Traceback (most recent call last):
File "a1.py", line 131, in
print response.results
AttributeError: 'Operation' object has no attribute 'results'
Can you please share your expert advice about what I am doing wrong?
Thanks for your help.
It's too late for the author of this thread; however, I am posting the solution for anyone in the future, as I too had a similar issue.
Change
result = response.results[-1]
to
result = response.result().results[-1]
and it will work fine
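The reason, as a short sketch (reusing the names from the question's snippet), is that long_running_recognize returns an Operation future rather than the final response, so you have to wait for it to complete first:

operation = client.long_running_recognize(config, audio)

# long_running_recognize returns a long-running Operation future;
# result() blocks until the job finishes and returns the actual response.
response = operation.result()

result = response.results[-1]
for word_info in result.alternatives[0].words:
    print("word: '{}', speaker_tag: {}".format(word_info.word, word_info.speaker_tag))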
Do you have access to the WAV file in your bucket? Also, is this the entire code? It seems that sample_rate_hertz and the imports are missing. Here is the code copied from the Google docs samples, edited to contain just the diarization function.
#!/usr/bin/env python

"""Google Cloud Speech API sample that demonstrates enhanced models
and recognition metadata.

Example usage:
    python diarization.py
"""

import argparse
import io


def transcribe_file_with_diarization():
    """Transcribe the given audio file synchronously with diarization."""
    # [START speech_transcribe_diarization_beta]
    from google.cloud import speech_v1p1beta1 as speech
    client = speech.SpeechClient()

    audio = speech.types.RecognitionAudio(uri="gs://<YOUR_BUCKET>/<YOUR_WAV_FILE>")

    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code='en-US',
        enable_speaker_diarization=True,
        diarization_speaker_count=2)

    print('Waiting for operation to complete...')
    response = client.recognize(config, audio)

    # The transcript within each result is separate and sequential per result.
    # However, the words list within an alternative includes all the words
    # from all the results thus far. Thus, to get all the words with speaker
    # tags, you only have to take the words list from the last result:
    result = response.results[-1]

    words_info = result.alternatives[0].words

    # Printing out the output:
    for word_info in words_info:
        print("word: '{}', speaker_tag: {}".format(word_info.word,
                                                   word_info.speaker_tag))
    # [END speech_transcribe_diarization_beta]


if __name__ == '__main__':
    transcribe_file_with_diarization()
To run the code just name it diarization.py and use the command:
python diarization.py
Also, you have to install the latest google-cloud-speech library:
pip install --upgrade google-cloud-speech
And you need to have the credentials of your service account in a JSON file; you can check more info here.
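A common way to point the client library at those credentials (the key file path below is just a placeholder) is the GOOGLE_APPLICATION_CREDENTIALS environment variable:

export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"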
I am using the sample code provided here and have implemented it as follows:
# [START import_libraries]
import argparse
import base64
import json
import time

from oauth2client.service_account import ServiceAccountCredentials
import googleapiclient.discovery
import googleapiclient as gac
# [END import_libraries]


# [START authenticating]
# Application default credentials provided by env variable
# GOOGLE_APPLICATION_CREDENTIALS
def get_speech_service(credentials):
    return googleapiclient.discovery.build('speech', 'v1beta1', credentials=credentials)


def main(speech_file):
    """Transcribe the given audio file asynchronously.

    Args:
        speech_file: the name of the audio file.
    """
    # [START construct_request]
    with open(speech_file, 'rb') as speech:
        # Base64 encode the binary audio file for inclusion in the request.
        speech_content = base64.b64encode(speech.read())
        # print speech_content

    scopes = ['https://www.googleapis.com/auth/cloud-platform']
    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        '/Users/user/Documents/google_cloud/myjson.json', scopes)
    service = get_speech_service(credentials)

    service_request = service.speech().asyncrecognize(
        body={
            'config': {
                # There are a bunch of config options you can specify. See
                # https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig for the full list.
                'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
                'sampleRate': 16000,  # 16 khz
                # See http://g.co/cloud/speech/docs/languages for a list of
                # supported languages.
                'languageCode': 'en-US',  # a BCP-47 language tag
            },
            'audio': {
                'content': speech_content.decode('UTF-8')
            }
        })
    # [END construct_request]
    # [START send_request]
    response = service_request.execute()
    print(json.dumps(response))
    # [END send_request]

    name = response['name']
    # Construct a GetOperation request.
    service_request = service.operations().get(name=name)

    while True:
        # Give the server a few seconds to process.
        print('Waiting for server processing...')
        time.sleep(1)
        # Get the long running operation with response.
        response = service_request.execute()

        if 'done' in response and response['done']:
            break

    # First print the raw json response
    print(json.dumps(response['response'], indent=2))

    # Now print the actual transcriptions
    out = []
    for result in response['response'].get('results', []):
        print 'poo'
        print('Result:')
        for alternative in result['alternatives']:
            print(u'  Alternative: {}'.format(alternative['transcript']))
        out.append(result)
    return response


r = main("/Users/user/Downloads/brooklyn.flac")
Yet my printed output is the following:
{"name": "3202776140236290963"}
Waiting for server processing...
Waiting for server processing...
{
"#type": "type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse"
}
And my returned object is:
{u'done': True,
u'metadata': {u'#type': u'type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeMetadata',
u'lastUpdateTime': u'2017-03-25T15:54:46.136925Z',
u'progressPercent': 100,
u'startTime': u'2017-03-25T15:54:44.514614Z'},
u'name': u'2024312474309214820',
u'response': {u'#type': u'type.googleapis.com/google.cloud.speech.v1beta1.AsyncRecognizeResponse'}}
On my console screen I can see the requests coming through.
I am unsure why I am not getting the proper transcription back from the sample file.
Any input is appreciated!
Well, your config options specify the following:
'config': {
    # There are a bunch of config options you can specify. See
    # https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig for the full list.
    'encoding': 'LINEAR16',  # raw 16-bit signed LE samples
    'sampleRate': 16000,  # 16 khz
    # See http://g.co/cloud/speech/docs/languages for a list of
    # supported languages.
    'languageCode': 'en-US',  # a BCP-47 language tag
},
However, you are using a FLAC file:
r = main("/Users/user/Downloads/brooklyn.flac")
Quoting https://cloud.google.com/speech/reference/rest/v1beta1/RecognitionConfig:
LINEAR16
Uncompressed 16-bit signed little-endian samples (Linear PCM). This is the only encoding that may be used by speech.asyncrecognize.
FLAC
This is the recommended encoding for speech.syncrecognize and StreamingRecognize because it uses lossless compression; therefore recognition accuracy is not compromised by a lossy codec.
In other words, you can't use FLAC with speech.asyncrecognize; you either need to transcode your sample to Linear PCM first, or use speech.syncrecognize with the FLAC encoding option.
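As a rough sketch of the second option (reusing the names from your snippet and assuming brooklyn.flac is a 16 kHz file, as your original config implies), the synchronous request would look something like this:

# Same request body, but through the synchronous endpoint, which accepts FLAC.
service_request = service.speech().syncrecognize(
    body={
        'config': {
            'encoding': 'FLAC',
            'sampleRate': 16000,
            'languageCode': 'en-US',
        },
        'audio': {
            'content': speech_content.decode('UTF-8')
        }
    })
response = service_request.execute()

# syncrecognize returns results directly, so no operation polling is needed.
for result in response.get('results', []):
    for alternative in result['alternatives']:
        print(u'Alternative: {}'.format(alternative['transcript']))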