Converting Audio Blob to text in Python using Speech recognition - python

Apologies for the English....
I am building a chatbot application where voice is recorded on the client side through HTML5's MediaRecorder API and sent as FormData to a Python Falcon web service.
On the Python side I need to convert this audio blob directly to text.
Currently I write the audio blob to a WAV file and then read from that file. However, this takes a long time because file I/O is involved. I need some way to consume the audio blob directly as the input source for speech recognition.
This is what I have tried:
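import speech_recognition as sr

r = sr.Recognizer()  # assumed: the recognizer instance the snippet refers to as r

def on_post(self, req, resp):
    # Save the uploaded blob to disk, then read it back (the slow step)
    with open("backend.wav", "wb") as f:
        f.write(req.get_param('audio_data').file.read())
    mic = sr.AudioFile('backend.wav')
    with mic as source:
        print("Speak !!")
        audio = r.record(source)
    # audio = req
    results = r.recognize_google(audio_data=audio, language="en-US", show_all=True)
    return results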
I am not an experienced Python developer, so please pardon me if this is a stupid question. Any help is highly appreciated.

I can't test it, but this could work.
It seems that AudioFile accepts a file-like object, so this code uses io.BytesIO to create a file object in memory and writes the data to it. This way it doesn't have to touch the disk.
import io

def on_post(self, req, resp):
    f = req.get_param('audio_data').file
    file_obj = io.BytesIO()      # create file-object
    file_obj.write(f.read())     # write in file-object
    file_obj.seek(0)             # move to beginning so it will read from beginning
    mic = sr.AudioFile(file_obj) # use file-object
    with mic as source:
        audio = r.record(source)
    result = r.recognize_google(audio_data=audio, language="en-US", show_all=True)
    return result
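Since the answer above is untested, here is a small sketch of the in-memory idea that runs with the standard library alone. It assumes the uploaded blob is already WAV data; the helper names make_wav_bytes and blob_to_file_obj are made up for this illustration, and the wave module stands in for sr.AudioFile. Passing the bytes straight to io.BytesIO gives a file object positioned at the start, so the separate write and seek(0) steps are not needed.

```python
import io
import wave

def make_wav_bytes(n_frames=1600, rate=16000):
    """Build a short silent mono 16-bit WAV entirely in memory (test data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes per sample = 16-bit
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()

def blob_to_file_obj(blob):
    """Wrap raw bytes in a file-like object; the position starts at 0."""
    return io.BytesIO(blob)

blob = make_wav_bytes()
with wave.open(blob_to_file_obj(blob), "rb") as w:
    info = (w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())
print(info)
```

In the handler above, the same wrapper would replace the three file_obj lines, for example file_obj = io.BytesIO(f.read()).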

Related

How to save file as mp3 from Amazon Polly using Python

I am using Amazon Polly for TTS, but I can't figure out how to save the converted speech to an .mp3 file on my computer.
I have tried gTTS, but I require Amazon Polly for my task.
import boto3
client = boto3.client('polly')
response = client.synthesize_speech(
    Text="Hello my name is Shubham", OutputFormat="mp3", VoiceId='Aditi')
Now, what should I do to play this converted speech or save it to my PC as an .mp3 file?
This code sample is taken straight from the documentation: https://docs.aws.amazon.com/polly/latest/dg/SynthesizeSpeechSamplePython.html
import boto3

# The credential values are left blank in the documentation sample; supply your own
polly_client = boto3.Session(
    aws_access_key_id=,
    aws_secret_access_key=,
    region_name='us-west-2').client('polly')

response = polly_client.synthesize_speech(VoiceId='Joanna',
                                          OutputFormat='mp3',
                                          Text='This is a sample text to be synthesized.')

# Write the returned audio stream to an MP3 file
with open('speech.mp3', 'wb') as file:
    file.write(response['AudioStream'].read())
While not directly related to the original question, I responded to one of the comments about how to get to the audio stream without saving the audio to a file.
You might also check out the documentation for this example:
https://docs.aws.amazon.com/polly/latest/dg/example-Python-server-code.html
This shows getting the response back from Polly:
response = polly.synthesize_speech(Text=text, VoiceId=voiceId, OutputFormat=outputFormat)
data_stream=response.get("AudioStream")
The first line makes the request to Polly and stores the response in the response object, while the second line gets the audio stream from the response object.
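To make the "without saving to a file" part concrete, here is a minimal sketch that copies the audio stream into an in-memory buffer. The function name audio_stream_to_buffer is made up for this example, and a plain io.BytesIO stands in for the real Polly response so the snippet runs without AWS credentials; it only assumes the stream object offers read() and close().

```python
import io

def audio_stream_to_buffer(response):
    """Copy the 'AudioStream' of a synthesize_speech-style response into memory."""
    stream = response.get("AudioStream")
    if stream is None:
        raise ValueError("response has no AudioStream")
    try:
        data = stream.read()       # read all audio bytes
    finally:
        stream.close()             # release the underlying connection
    return io.BytesIO(data)        # file-like object positioned at the start

# Stand-in for a real Polly response (assumption for the demo)
fake_response = {"AudioStream": io.BytesIO(b"ID3 fake mp3 bytes")}
buffer = audio_stream_to_buffer(fake_response)
```

The returned buffer can then be handed to anything that accepts a file-like object, such as an audio player library or an HTTP response body.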

Google Speech API Python not responding

I'm trying to implement Google's Speech API, but every time I try to run the program, the terminal goes unresponsive. It seems that the program runs until the line "response = client.recognize(config, audio)" and just gets stuck at that point. Here's my code; I pulled most of it straight from Google's Cloud Platform documentation.
def transcribe_file(speech_file):
    """Transcribe the given audio file."""
    import io
    import os
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "api-key.json"
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types

    client = speech.SpeechClient()
    with io.open(speech_file, 'rb') as audio_file:
        content = audio_file.read()
    audio = types.RecognitionAudio(content=content)
    config = types.RecognitionConfig(
        encoding='FLAC',
        sample_rate_hertz=16000,
        language_code='en-US')
    print(config)
    response = client.recognize(config, audio)
    # Each result is for a consecutive portion of the audio. Iterate through
    # them to get the transcripts for the entire audio file.
    for result in response.results:
        # The first alternative is the most likely one for this portion.
        print(u'Transcript: {}'.format(result.alternatives[0].transcript))


transcribe_file('audio/file/name.wav')
I had the same problem when building my app with PyInstaller. Make sure you have the certifi folder with the file cacert.pem in it. Those should come with the library when you run pip install google-cloud-speech. If not, reinstall it.
I was facing the same issue on Windows and tried changing
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="api-key.json"
to
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]= r"full/path/api-key.json"
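A relative path like "api-key.json" is resolved against the current working directory, which may differ from the script's folder. As a rough sketch, the helper below (set_credentials is a made-up name) turns the path into an absolute one and fails early if the file is missing, instead of leaving the client to fail later on.

```python
import os

def set_credentials(path):
    """Point GOOGLE_APPLICATION_CREDENTIALS at an existing key file."""
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError("Credentials file not found: " + full_path)
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = full_path
    return full_path
```

Calling set_credentials("api-key.json") before creating the client would then either set the variable to the full path or raise immediately with the path it tried.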

Google speech to text API result is empty

I am using the Cloud Speech-to-Text API to convert an audio file to text. I am running it from Python; the code is below.
import io
import os
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "D:\\Sentiment_Analysis\\My Project 59503-717155d6fb4a.json"
# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
# Instantiates a client
client = speech.SpeechClient()
# The name of the audio file to transcribe
# (raw string, so that "\a" is not read as an escape sequence)
file_name = os.path.join(
    os.path.dirname(r'D:\CallADoc_VoiceImplementation\audioclip154173607416598.amr'),
    'CallADoc_VoiceImplementation',
    'audioclip154173607416598.amr')
# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-IN')
# Detects speech in the audio file
response = client.recognize(config, audio)
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
When I run it on the tested sample audio file named "audio.raw", the audio is converted and the result looks like this:
runfile('C:/Users/sandesh.p/CallADoc/GoogleSpeechtoText.py', wdir='C:/Users/sandesh.p/CallADoc')
Transcript: how old is the Brooklyn Bridge
But when I record my own audio and try to convert it with the same code, the result is empty:
runfile('C:/Users/sandesh.p/CallADoc/GoogleSpeechtoText.py', wdir='C:/Users/sandesh.p/CallADoc')
I have been trying to fix this for the past 2 days. Please help me resolve this.
Try following the troubleshooting steps to make sure your audio has the appropriate settings.
For instance, your audio file should have the following settings, which are needed for better results:
Encoding: FLAC
Channels: 1
Bit depth: 16-bit
Sample rate: 16000 Hz
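Before sending a recording, it can help to check its settings locally. The sketch below uses the standard wave module, so it only applies if the recording is an uncompressed WAV file (it will not open FLAC or AMR); check_wav is a made-up helper name and the expected values are the ones listed above.

```python
import wave

def check_wav(source, expected_rate=16000):
    """Return a list of mismatches against mono, 16-bit, expected_rate Hz."""
    problems = []
    with wave.open(source, "rb") as w:
        if w.getnchannels() != 1:
            problems.append("channels: %d (expected 1)" % w.getnchannels())
        if w.getsampwidth() != 2:
            problems.append("sample width: %d bytes (expected 2)" % w.getsampwidth())
        if w.getframerate() != expected_rate:
            problems.append("sample rate: %d Hz (expected %d)"
                            % (w.getframerate(), expected_rate))
    return problems
```

The source argument can be a file path or an open binary file object. An empty list means the header matches; anything else shows which values differ from what the RecognitionConfig declares.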

Google speech-to-text Python example code doesn't work

The following is my code (I made some slight changes to the original example code):
import io
import os
# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
# Instantiates a client
client = speech.SpeechClient()
# The name of the audio file to transcribe
file_name = os.path.join(
    os.path.dirname(__file__),
    'C:\\Users\\louie\\Desktop',
    'TOEFL2.mp3')
# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')
# Detects speech in the audio file
response = client.recognize(config, audio)
for result in response.results:
    print('Transcript: {}'.format(result.alternatives[0].transcript))
    text_file = open("C:\\Users\\louie\\Desktop\\Output.txt", "w")
    text_file.write('Transcript: {}'.format(result.alternatives[0].transcript))
    text_file.close()
I can only run this code directly from the Windows command prompt, since otherwise the system doesn't know GOOGLE_APPLICATION_CREDENTIALS. However, when I run the code, nothing happens. I followed all the steps and I can see the request traffic change on my console, but I cannot see any transcript. Could someone help me out?
You are trying to decode TOEFL2.mp3, a file encoded as MP3, while specifying LINEAR16 audio encoding with
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16
You have to convert the MP3 to WAV first; see the documentation on AudioEncoding.
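One possible way to do the conversion is to call ffmpeg from Python. This is only a sketch and assumes ffmpeg is installed and on the PATH; the function names are made up for the example. The flags request mono (-ac 1), a 16000 Hz sample rate (-ar), and 16-bit PCM samples (pcm_s16le), which matches the LINEAR16 config used in the question.

```python
import subprocess

def build_ffmpeg_command(src, dst, rate=16000):
    """Build the ffmpeg argument list for a mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y",            # overwrite dst if it exists
            "-i", src,                 # input file (e.g. the MP3)
            "-ac", "1",                # one audio channel
            "-ar", str(rate),          # sample rate in Hz
            "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
            dst]

def convert_to_linear16_wav(src, dst, rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst, rate), check=True)
    return dst
```

For example, convert_to_linear16_wav('TOEFL2.mp3', 'TOEFL2.wav') would produce a file to read in place of the MP3.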

How to convert \303\255 to "í" in Python?

I'm using the Google Speech API with the LongRunning functions for WAV files. The audio is all in pt-BR, and the results come back with content such as "voc\303\252 hoje boa noite cart\303\243o".
How can I convert this back to UTF-8?
I already tried the .encode function and checked whether there is any parameter to send, but I couldn't find anything.
# [START def_transcribe_gcs]
def transcribe_gcs(gcs_uri):
    """Asynchronously transcribes the audio file specified by the gcs_uri."""
    from google.cloud import speech
    from google.cloud.speech import enums
    from google.cloud.speech import types
    client = speech.SpeechClient()
    audio = types.RecognitionAudio(uri=gcs_uri)
    config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=8000,
        language_code='pt-BR')
    operation = client.long_running_recognize(config, audio)
    print('Waiting for operation to complete...')
    response = operation.result(timeout=300)
    # Print the first alternative of all the consecutive results.
    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        print('Confidence: {}'.format(result.alternatives[0].confidence))
    ##This part is mine, the rest of the code belongs to Google
    file = open("Test.txt", "wb")
    file.write(str(response.results))
    file.close()
# [END def_transcribe_gcs]
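The sequences \303\255 are octal escapes for the individual UTF-8 bytes of the character (0o303 0o255 is the byte pair C3 AD, which is "í"). They appear to come from calling str() on the whole results object rather than from the transcript itself, so writing result.alternatives[0].transcript to a file opened with a UTF-8 encoding may avoid the problem altogether. If the text is already escaped, here is a minimal sketch for turning it back; decode_octal_escapes is a made-up name, and it assumes the text otherwise contains only ASCII.

```python
import re

def decode_octal_escapes(text):
    """Turn \\ooo octal byte escapes back into the UTF-8 characters they encode."""
    # Replace each escape with the character whose code point equals the byte value
    as_latin1 = re.sub(r"\\([0-3][0-7]{2})",
                       lambda m: chr(int(m.group(1), 8)),
                       text)
    # Reinterpret those byte values as UTF-8
    return as_latin1.encode("latin-1").decode("utf-8")

escaped = r"voc\303\252 hoje boa noite cart\303\243o"
print(decode_octal_escapes(escaped))  # você hoje boa noite cartão
```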
