I am using the Cloud Speech-to-Text API to convert an audio file to a text file. I am executing it with Python; below is the code.
import io
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="D:\\Sentiment_Analysis\\My Project 59503-717155d6fb4a.json"
# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
# Instantiates a client
client = speech.SpeechClient()
# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname('D:\CallADoc_VoiceImplementation\audioclip154173607416598.amr'),
    'CallADoc_VoiceImplementation',
    'audioclip154173607416598.amr')
# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-IN')
# Detects speech in the audio file
response = client.recognize(config, audio)
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
When I run the code against the sample/test audio file named "audio.raw", the audio is converted and the result looks like this:
runfile('C:/Users/sandesh.p/CallADoc/GoogleSpeechtoText.py', wdir='C:/Users/sandesh.p/CallADoc')
Transcript: how old is the Brooklyn Bridge
But with the same code, when I record an audio file myself and try to convert it, I get an empty result, like below:
runfile('C:/Users/sandesh.p/CallADoc/GoogleSpeechtoText.py', wdir='C:/Users/sandesh.p/CallADoc')
I have been trying to fix this for the past 2 days; please help me resolve it.
Try following the troubleshooting steps so that your audio has the appropriate settings. For instance, your audio file should have the following settings, which are recommended for better results:
Encoding: FLAC
Channels: 1 (16-bit samples)
Sample rate: 16000 Hz
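If the recording really is AMR (as the .amr extension suggests), one option is to convert it to those settings before calling the API. Below is a minimal sketch, not a tested solution: it assumes pydub plus an ffmpeg build with AMR support are installed, and it reuses the io, types, enums and client names from your code above.

from pydub import AudioSegment

# Convert the AMR recording to 16 kHz, mono FLAC
sound = AudioSegment.from_file('audioclip154173607416598.amr')
sound = sound.set_frame_rate(16000).set_channels(1)
sound.export('audioclip154173607416598.flac', format='flac')

# Recognize the converted file with a matching config
with io.open('audioclip154173607416598.flac', 'rb') as audio_file:
    content = audio_file.read()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
    sample_rate_hertz=16000,
    language_code='en-IN')
response = client.recognize(config, audio)
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))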
I would like to know if it is possible to get all of the possible transcripts that Google can generate from a given audio file; as you can see, it currently only gives the transcript with the highest matching score.
from google.cloud import speech
import os
import io

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = ''

# Creates google client
client = speech.SpeechClient()

# Full path of the audio file, replace with your file name
file_name = os.path.join(os.path.dirname(__file__), "test2.wav")

# Loads the audio file into memory
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    audio_channel_count=1,
    language_code="en-gb"
)

# Sends the request to google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})
print(response.results)

# Reads the response
for result in response.results:
    print("Transcript: {}".format(result.alternatives[0].transcript))
On your RecognitionConfig(), set a value for max_alternatives. When this is set to a value greater than 1, it will return the other possible transcriptions as well.
max_alternatives (int)
Maximum number of recognition hypotheses to be returned. Specifically, the maximum number of SpeechRecognitionAlternative messages within each SpeechRecognitionResult. The server may return fewer than max_alternatives. Valid values are 0-30. A value of 0 or 1 will return a maximum of one. If omitted, will return a maximum of one.
Update your RecognitionConfig() to the code below:
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    audio_channel_count=1,
    language_code="en-gb",
    max_alternatives=10  # place a value between 0 - 30
)
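With that set, each SpeechRecognitionResult can carry several alternatives. A small sketch (reusing the response object from your code above) for reading every hypothesis and its confidence:

for result in response.results:
    for alternative in result.alternatives:
        print("Transcript: {}".format(alternative.transcript))
        print("Confidence: {}".format(alternative.confidence))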
I tested this using the sample audio from the GitHub repo of the Speech API. I used the code below for testing:
from google.cloud import speech
import os
import io

# Creates google client
client = speech.SpeechClient()

# Full path of the audio file, replace with your file name
file_name = os.path.join(os.path.dirname(__file__), "audio.raw")

# Loads the audio file into memory
with io.open(file_name, "rb") as audio_file:
    content = audio_file.read()

audio = speech.RecognitionAudio(content=content)
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    audio_channel_count=1,
    language_code="en-us",
    max_alternatives=10  # used 10 for testing
)

# Sends the request to google to transcribe the audio
response = client.recognize(request={"config": config, "audio": audio})
for result in response.results:
    print(result.alternatives)
Output: (screenshot of the returned alternatives not included here)
Apologies for the English....
I am building a chatbot application where voice is recorded on the client side through HTML5's MediaRecorder API and sent as FormData to a Python Falcon web service.
On the Python side I need to convert this audio blob directly to text.
Currently I am writing the audio blob to a WAV file and then reading from that file. However, this takes a long time because file I/O is involved. I need to somehow consume the audio blob directly as the input source for speech recognition.
This is what I have tried:
def on_post(self, req, resp):
    open("backend.wav", 'wb').write(req.get_param('audio_data').file.read())
    mic = sr.AudioFile('backend.wav')
    with mic as source:
        print("Speak !!")
        audio = r.record(source)
        #audio = req
    results = r.recognize_google(audio_data=audio, language="en-US", show_all=True)
    return results
I am not an experienced Python developer, so please pardon me if this is a stupid question. Any help is highly appreciated.
I can't test it but it could work.
It seems that AudioFile can take a file-like object, so this code uses io.BytesIO to create a file-like object in memory and write the data into it. This way it doesn't have to touch the disk at all.
import io

def on_post(self, req, resp):
    f = req.get_param('audio_data').file
    file_obj = io.BytesIO()       # create a file-like object in memory
    file_obj.write(f.read())      # write the uploaded data into it
    file_obj.seek(0)              # move to the beginning so it is read from the start
    mic = sr.AudioFile(file_obj)  # AudioFile accepts the file-like object
    with mic as source:
        audio = r.record(source)
    result = r.recognize_google(audio_data=audio, language="en-US", show_all=True)
    return result
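As a side note (equally untested), io.BytesIO can also be constructed directly from the uploaded bytes, which skips the explicit write() and seek() calls:

file_obj = io.BytesIO(req.get_param('audio_data').file.read())
mic = sr.AudioFile(file_obj)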
I'm trying to implement Google's Speech API, but every time I run the program the terminal goes unresponsive. The program seems to run until the line "response = client.recognize(config, audio)" and just gets stuck at that point. Here's my code; I pulled most of it straight from Google's Cloud Platform documentation.
def transcribe_file(speech_file):
    """Transcribe the given audio file."""
    import io
    import os
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "api-key.json"
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()

    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()

    audio = types.RecognitionAudio(content=content)
    config = types.RecognitionConfig(
        encoding='FLAC',
        sample_rate_hertz=16000,
        language_code='en-US')
    print(config)

    response = client.recognize(config, audio)

    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))

transcribe_file('audio/file/name.wav')
I had the same problem when building my app with PyInstaller. Make sure you have the certifi folder with the file cacert.pem in it. Those should come with the library when you run pip install google-cloud-speech. If not, reinstall it.
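A quick way to see where that cacert.pem is expected to live (assuming the certifi package is the one your build bundles):

import certifi
print(certifi.where())  # prints the path to the cacert.pem that certifi ships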
I was facing the same issue and tried changing
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "api-key.json"
to
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = r"full/path/api-key.json"
on Windows.
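A variant that avoids hard-coding the full path is to build it from the script location. A small sketch, assuming the key file (here still called api-key.json) sits next to the script:

import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), "api-key.json")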
The following is my code (I made some slight changes to the original example code):
import io
import os

# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types

# Instantiates a client
client = speech.SpeechClient()

# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'C:\\Users\\louie\\Desktop',
    'TOEFL2.mp3')

# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()

audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')

# Detects speech in the audio file
response = client.recognize(config, audio)

for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))

text_file = open("C:\\Users\\louie\\Desktop\\Output.txt", "w")
text_file.write('Transcript: {}'.format(result.alternatives[0].transcript))
text_file.close()
I can only run this code directly from the Windows command prompt, since otherwise the system does not know the GOOGLE_APPLICATION_CREDENTIALS. However, when I run the code, nothing happens. I followed all the steps and I can see the request traffic change on my console, but I cannot see any transcript. Could someone help me out?
You are trying to decode the file TOEFL2.mp3, which is encoded as MP3, while you specify LINEAR16 audio encoding with
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16
You have to convert the MP3 to WAV first; see the information about AudioEncoding.
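One possible way to do the conversion in Python (a sketch assuming pydub and ffmpeg are available; any converter that produces 16 kHz, 16-bit mono WAV works just as well):

from pydub import AudioSegment

sound = AudioSegment.from_mp3('C:\\Users\\louie\\Desktop\\TOEFL2.mp3')
sound = sound.set_frame_rate(16000).set_channels(1).set_sample_width(2)  # 16 kHz, mono, 16-bit
sound.export('C:\\Users\\louie\\Desktop\\TOEFL2.wav', format='wav')

After that, point file_name at TOEFL2.wav; the LINEAR16 / 16000 Hz config above then matches the file.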
I'm using the Google Speech API, and since I'm using the long-running functions for WAV files, all of them in the pt-BR language, they're returning content such as "voc\303\252 hoje boa noite cart\303\243o".
How can I convert this back to UTF-8?
I already tried the .encode function and checked whether there is any parameter to send, but I cannot find anything.
# [START def_transcribe_gcs]
def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()

    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code='pt-BR')

    operation = client.long_running_recognize(config, audio)

    print('Waiting for operation to complete...')
    response = operation.result(timeout=300)

    # Print the first alternative of all the consecutive results.
    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))

    ## This part is mine, the rest of the code belongs to Google
    file = open("Test.txt", "wb")
    file.write(str(response.results))
    file.close()
# [END def_transcribe_gcs]
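For what it's worth, the escaped sequences most likely come from writing the repr of response.results as bytes. Writing the transcript strings themselves in text mode with an explicit encoding keeps the accented characters. A minimal sketch of that last block under this assumption:

import io

with io.open("Test.txt", "w", encoding="utf-8") as out:
    for result in response.results:
        out.write(result.alternatives[0].transcript + u"\n")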