I want to make a Python program where microphone audio input is received.
I already tried pyaudio but I can't understand how it works.
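There is a module called SpeechRecognition (imported as speech_recognition) that you can use instead. It still needs PyAudio installed for microphone access, but you don't have to work with PyAudio directly.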
The get_audio function detects the user's voice, transcribes the audio to text and returns it. It even waits until the user starts speaking before it starts recording.
Here's a complete example of getting user input with the get_audio function:
import speech_recognition as sr

def get_audio():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)  # waits for speech, stops after a pause
        said = ""
        try:
            said = r.recognize_google(audio)  # Google Web Speech API
            print(said)
        except Exception as e:
            print("Exception: " + str(e))
    return said
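Because listen() waits until the sound level rises above the recognizer's energy threshold, get_audio() can block for a long time in a quiet room. Below is a minimal sketch of a variant that gives up instead. It uses SpeechRecognition's listen() parameters timeout (seconds to wait for speech to start; sr.WaitTimeoutError is raised when it expires) and phrase_time_limit (maximum seconds of speech to record). The 5 s and 10 s defaults and the 0.5 s noise calibration are arbitrary values for illustration. Returning "" on failure is an assumption about what the caller wants.

```python
def get_audio(timeout=5, phrase_time_limit=10):
    """Variant of get_audio that gives up instead of waiting forever."""
    import speech_recognition as sr  # pip install SpeechRecognition

    r = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample half a second of background noise to set r.energy_threshold,
        # which decides when "speech" has started.
        r.adjust_for_ambient_noise(source, duration=0.5)
        try:
            # timeout: max seconds to wait for speech to begin.
            # phrase_time_limit: max seconds of speech to record.
            audio = r.listen(source, timeout=timeout, phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return ""  # nobody started speaking within `timeout` seconds
    try:
        return r.recognize_google(audio)
    except sr.UnknownValueError:
        return ""  # audio was captured but could not be transcribed
```

Call it like the original, e.g. `text = get_audio(timeout=3)`, and treat an empty string as "nothing heard".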
My problem is really simple, but it has been bugging me for quite a while now. I just want my program to print what I said, but all that happens is that it shows Listening..., and when I say something, it shows Recognizing... and then ends.
Here's my code:
import speech_recognition as sr

def listen():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening..")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing..")
        query = r.recognize_google(audio, language='en-in')
        print(f'You said: {query}')
    except:
        return ""
    query = str(query)
    return query.lower()

listen()
Please tell me where I am wrong or how I can fix this.
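The bare `except:` in listen() swallows whatever went wrong, so the function silently returns "" right after printing Recognizing... Below is a minimal sketch that instead catches the two exceptions SpeechRecognition documents for recognize_google(). sr.UnknownValueError means the speech was unintelligible. sr.RequestError means the request failed, for example because there is no internet connection or the key is invalid. Catching them separately shows which one you are hitting. The helper name recognize_text is hypothetical.

```python
def recognize_text(recognizer, audio, language="en-in"):
    """Transcribe `audio`, printing the reason instead of failing silently."""
    import speech_recognition as sr  # pip install SpeechRecognition

    try:
        return recognizer.recognize_google(audio, language=language).lower()
    except sr.UnknownValueError:
        # Google received the audio but found no recognizable speech in it.
        print("Could not understand the audio")
    except sr.RequestError as e:
        # The request itself failed (no connection, invalid key, ...).
        print(f"Could not reach the Google Web Speech API: {e}")
    return ""


def listen():
    import speech_recognition as sr

    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening..")
        r.pause_threshold = 1
        audio = r.listen(source)
    print("Recognizing..")
    query = recognize_text(r, audio)
    print(f"You said: {query}")
    return query
```

Run it with `print(listen())`. The message printed after Recognizing.. then tells you whether the problem is the audio itself or the connection to Google.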
I have speech audio files in WAV format that are 60 seconds long each. However, the output gets truncated and only captures about 15% of the length. I have tried this both in a local Jupyter Notebook and in Google Colab. According to the documentation, this request is below the API's threshold. What am I doing wrong, or how can I get around this limitation?
import speech_recognition as sr

# select a recognizer session
# recognize_google() : Google Web Speech API
r = sr.Recognizer()
interview = sr.AudioFile('sample.wav')
with interview as source:
    print('Ready...')
    r.pause_threshold = 2
    audio = r.record(source, duration=60)
type(audio)
transcription = r.recognize_google(audio, language='en_CA')
print(transcription)
Try this code. If the output is still the same as before, check the indentation of the try/except block or change the pause_threshold value.
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:
    print("Ready")
    r.pause_threshold = 0.6
    audio = r.record(source)
try:
    s = r.recognize_google(audio)
    print("Text: " + s)
except sr.UnknownValueError:
    print("Speech Recognition could not understand audio")
except sr.RequestError as e:
    print("Error {0}".format(e))
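pause_threshold is only used by listen(), which splits live input at pauses. record() simply reads the requested number of seconds, so changing pause_threshold should not affect a file read with record(). Another workaround is to read the file in fixed-length chunks with repeated record() calls and transcribe each chunk separately. This relies on an assumption the thread does not confirm: that the free web API behind recognize_google() handles short clips more reliably than a full minute. The sketch depends on two behaviors of sr.AudioFile: it exposes DURATION (seconds) once opened, and consecutive record() calls continue from where the previous one stopped. The 15 s chunk length and the helper names chunk_durations and transcribe_in_chunks are hypothetical.

```python
def chunk_durations(total_seconds, chunk_seconds):
    """Split a clip length into consecutive chunk lengths, e.g. 50 -> 15, 15, 15, 5."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    durations = []
    remaining = total_seconds
    while remaining > 0:
        durations.append(min(chunk_seconds, remaining))
        remaining -= chunk_seconds
    return durations


def transcribe_in_chunks(path, chunk_seconds=15, language="en-CA"):
    """Transcribe a WAV file piece by piece with recognize_google()."""
    import speech_recognition as sr  # pip install SpeechRecognition

    r = sr.Recognizer()
    pieces = []
    with sr.AudioFile(path) as source:
        # Each record() call reads the next `duration` seconds of the file.
        for duration in chunk_durations(source.DURATION, chunk_seconds):
            audio = r.record(source, duration=duration)
            try:
                pieces.append(r.recognize_google(audio, language=language))
            except sr.UnknownValueError:
                pass  # silent or unintelligible chunk: skip it and keep going
    return " ".join(pieces)
```

Usage: `print(transcribe_in_chunks("sample.wav"))`. Fixed boundaries can cut a word in half. Splitting at silences instead (for example with a separate audio library) avoids that, at the cost of more code.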
import speech_recognition as sr

def takecommand():
    r = sr.Recognizer()
    with sr.Microphone(device_index=1) as source:
        print("Listening...")
        r.pause_threshold = 1
        r.energy_threshold = 200
        audio = r.listen(source)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')
        print(f"User Said: {query}\n")
    except Exception as e:
        # print(e)
        speak("Please Say that Again")  # assumed: a text-to-speech helper defined elsewhere (not shown)
        print("Say that Again...")
        return "None"
    return query
When I try to use the above code, my speech recognizer takes the audio playing through my speakers as input instead of the microphone.
According to the Python SpeechRecognition documentation:
A device index is an integer between 0 and pyaudio.get_device_count() - 1 (assume we have used import pyaudio beforehand) inclusive. It represents an audio device such as a microphone or speaker. See the PyAudio documentation for more details.
If device_index is unspecified or None, the default microphone is used as the audio source. Otherwise, device_index should be the index of the device to use for audio input.
You are using the device with index 1:
with sr.Microphone(device_index=1) as source:
I suggest querying the available audio devices and changing the index accordingly:
List the available devices using the pyaudio package (pip install pyaudio):
import pyaudio

pa = pyaudio.PyAudio()
pa.get_device_count()  # number of available audio devices
Change your device index accordingly (most probably your other device is index 0 or 2):
with sr.Microphone(device_index=0) as source: #replace 0 with the index of the device you want to use
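get_device_count() only tells you how many devices exist, not which index belongs to which device. Below is a minimal sketch that lists the index and name of every device with at least one input channel, using PyAudio's get_device_info_by_index(). Speakers typically report 0 input channels and are filtered out. The helper names input_devices and list_input_devices are hypothetical.

```python
def input_devices(device_infos):
    """Keep only devices that can record; speakers typically report 0 input channels."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def list_input_devices():
    """Ask PyAudio for every device and return (index, name) pairs for inputs."""
    import pyaudio  # pip install pyaudio

    pa = pyaudio.PyAudio()
    try:
        infos = [pa.get_device_info_by_index(i) for i in range(pa.get_device_count())]
    finally:
        pa.terminate()  # release PortAudio resources
    return input_devices(infos)
```

Print the pairs with `for index, name in list_input_devices(): print(index, name)` and pass the microphone's index as device_index. A loopback input such as Windows' "Stereo Mix" also appears in this list and records whatever the speakers play, which may explain the symptom in the question. sr.Microphone.list_microphone_names() returns names in the same index order, but it may include output devices as well.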
The first except block runs every time I speak into the microphone. Please help!
import speech_recognition as sr

# get audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Speak:")
    audio = r.listen(source)
try:
    print("You said " + r.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print("Could not request results; {0}".format(e))
I think your RequestError is a result of the Google API reaching its limits. Google says:
Audio longer than ~1 minute must use the uri field to reference an audio file in Google Cloud Storage.
See Google's Cloud Speech-to-Text documentation for details.
So you need to create a Google Cloud account and use the API key it gives you. Then upload the audio to Google Cloud Storage and pass its gs:// URI as a parameter in your program.
This is the only solution Google gives. Hope it helps :)
This should help:
Instead of '+', I used ',' in the print statement inside the try block.
import speech_recognition as sr

# get audio from the microphone
r = sr.Recognizer()
with sr.Microphone() as source:
    print("Speak:")
    audio = r.listen(source)
try:
    print("You said ", r.recognize_google(audio))
except sr.UnknownValueError:
    print("Could not understand audio")
except sr.RequestError as e:
    print("Could not request results; {0}".format(e))
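If the first except block (UnknownValueError) fires every time, it is worth checking whether the microphone is capturing speech at all. Below is a minimal sketch that writes the captured AudioData to disk with its get_wav_data() method so you can play it back. The helper save_wav and the default filename captured.wav are hypothetical.

```python
from pathlib import Path


def save_wav(audio, path="captured.wav"):
    """Write a SpeechRecognition AudioData object to disk as a WAV file."""
    data = audio.get_wav_data()  # bytes of a complete WAV file, header included
    Path(path).write_bytes(data)
    return len(data)  # number of bytes written
```

Call `save_wav(audio)` right after `audio = r.listen(source)` and open the file in a media player. Silence or a very faint recording suggests a wrong input device or a low input level rather than a problem with the recognizer.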
Is there a way to call the Bing Text To Speech API or the IBM Text To Speech API through Python?
Maybe in the fashion that Python's SpeechRecognition library works?
For Bing speech recognition, set BING_KEY = "YOUR_KEY" (your Bing Speech API key).
You can then transcribe with bing_en_US = recognizer.recognize_bing(audio, key=BING_KEY, language="en-US").
Ref: https://pypi.python.org/pypi/SpeechRecognition/
Get your key here: https://azure.microsoft.com/en-us/try/cognitive-services/?api=speech-api
I believe you can add:
return recognizer.recognize_ibm(audio)
to the code, after downloading everything you need (including the IBM zip file) from:
https://github.com/watson-developer-cloud/speech-to-text-websockets-python
Here's the entire code:
import speech_recognition

recognizer = speech_recognition.Recognizer()

def listen():
    with speech_recognition.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        # return recognizer.recognize_sphinx(audio)
        # return recognizer.recognize_google(audio)
        # recognize_ibm() also needs your IBM credentials as extra arguments;
        # their names depend on your SpeechRecognition version, so check its docs.
        return recognizer.recognize_ibm(audio)
    except speech_recognition.UnknownValueError:
        print("Could not understand audio")
    except speech_recognition.RequestError as e:
        print("Recog Error; {0}".format(e))
    return ""

# listen forever, printing each transcription once (Ctrl+C to stop)
while True:
    print(listen())