Convert voice to text while talking in Python

I made a program that lets me speak and converts my speech to text, but it only converts my voice after I have stopped talking. What I want is to convert my voice to text while I am still talking.
https://www.youtube.com/watch?v=96AO6L9qp2U&t=2s&ab_channel=StormHack at min 2:31.
Pay attention to the top right corner of Tony's monitor: it converts his voice to text while he is talking. I want to do the same thing. Can it be done?
This is my whole program:
import speech_recognition as sr
import pyaudio

r = sr.Recognizer()
with sr.Microphone() as source:
    print("Listening...")
    audio = r.listen(source)

try:
    text = r.recognize_google(audio)
    print("You said : {}".format(text))
except:
    print("Sorry could not recognize what you said")
Solutions, tips, hints, or anything else would be greatly appreciated. Thank you in advance.

In order to do this you will have to do what's called VAD: Voice Activity Detection. A simple way to do this is to take a set of samples from the audio and measure their intensity; if they are above a certain threshold, you begin recording, and once the intensity falls below a certain threshold for a given period of time, you conclude the recording and send it off to the service. You can find an example of this here.
More complex systems use better heuristics to decide whether or not the user is speaking, such as the frequency content, and apply techniques like noise reduction. Other systems, such as DeepSpeech 2, can perform live speech-to-text while the user is still speaking.
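As a minimal sketch of the threshold approach described above, using pyaudio and audioop (the same modules this thread already uses, though audioop is deprecated in newer Python versions): the 16 kHz rate, chunk size, RMS threshold of 500, and silence length are assumptions you would tune for your microphone.

import audioop
import pyaudio

RATE = 16000          # assumed sample rate
CHUNK = 1024          # frames per buffer
THRESHOLD = 500       # assumed RMS level that counts as "speech"; tune for your mic
SILENCE_CHUNKS = 30   # roughly 2 seconds of quiet before we stop recording

pa = pyaudio.PyAudio()
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
                 input=True, frames_per_buffer=CHUNK)

frames = []
recording = False
quiet_count = 0

while True:
    data = stream.read(CHUNK)
    rms = audioop.rms(data, 2)      # intensity of this chunk (16-bit samples)
    if not recording:
        if rms > THRESHOLD:         # speech started
            recording = True
            frames.append(data)
    else:
        frames.append(data)
        if rms < THRESHOLD:
            quiet_count += 1
            if quiet_count > SILENCE_CHUNKS:  # long enough silence: stop
                break
        else:
            quiet_count = 0

stream.stop_stream()
stream.close()
pa.terminate()
# b"".join(frames) now holds the utterance, ready to hand to a recognizer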

To do what you want, you need to listen not for a complete sentence but for just a few words at a time. You then have to process the audio data and finally print the result. Here is a very basic implementation of it:
import speech_recognition as sr
import threading
import time
from queue import Queue

listen_recognizer = sr.Recognizer()
process_recognizer = sr.Recognizer()

audios_to_process = Queue()

def callback(recognizer, audio_data):
    if audio_data:
        audios_to_process.put(audio_data)

def listen():
    source = sr.Microphone()
    stop_listening = listen_recognizer.listen_in_background(source, callback, 3)
    return stop_listening

def process_thread_func():
    while True:
        if audios_to_process.empty():
            time.sleep(2)
            continue
        audio = audios_to_process.get()
        if audio:
            try:
                text = process_recognizer.recognize_google(audio)
            except:
                pass
            else:
                print(text)

stop_listening = listen()
process_thread = threading.Thread(target=process_thread_func)
process_thread.start()

input()
stop_listening()
As you can see, I use two recognizers, so one is always listening while the other processes the audio data.
The first one listens for data, adds the audio data to a queue, and listens again. At the same time, the other recognizer checks whether there is audio data in the queue to turn into text, and then prints it.

Related

How do I make speech recognition understand specific words I choose? (Python)

The title is probably not clear enough to describe my issue, sorry for that.
I'm working on playing chess with speech recognition.
The issue:
Example: the user says "Rook A1 to A4", but the speech recognition thinks "rook" is "rogue", or hears "Brooklyn A1", etc.
How do I choose specific words like rook, pawn, queen, etc. and make the speech recognition understand only those words?
The current code I started with:
import pyttsx3
import speech_recognition as sr

recognizer = sr.Recognizer()

while True:
    try:
        with sr.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic, duration=0.2)
            audio = recognizer.listen(mic)
            text = recognizer.recognize_google(audio)
            text = text.lower()
            print(f"{text}")
    except sr.UnknownValueError:
        recognizer = sr.Recognizer()
        continue
You should use keyword_entries. Provide them as a list of (keyword, sensitivity) tuples, like so: keyword_only_text = r.recognize_sphinx(audio, keyword_entries=[("rook", 0.1), ("knight", 0.1), ...])
Unfortunately, this feature is not available for recognize_google.
The scalar values set the sensitivity for each word: at 1 the recognizer will map almost anything it hears onto your keywords, while at 0.001 you only get a slight bias towards them.
See the workings of the function at speech_recognition/recognize_sphinx.
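As a rough sketch of how this could fit the chess example, assuming PocketSphinx is installed; the chess vocabulary and sensitivities below are placeholders you would extend with the squares and words you actually need:

import speech_recognition as sr

# Hypothetical chess vocabulary; extend with the words/squares you need.
CHESS_KEYWORDS = [("rook", 0.1), ("knight", 0.1), ("bishop", 0.1),
                  ("queen", 0.1), ("king", 0.1), ("pawn", 0.1)]

recognizer = sr.Recognizer()
with sr.Microphone() as mic:
    recognizer.adjust_for_ambient_noise(mic, duration=0.2)
    audio = recognizer.listen(mic)

try:
    # recognize_sphinx runs offline and supports keyword_entries,
    # so only the listed words can appear in the result.
    text = recognizer.recognize_sphinx(audio, keyword_entries=CHESS_KEYWORDS)
    print(text)
except sr.UnknownValueError:
    print("Could not match any of the keywords")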

Recognizing numbers (integers) in python using voice

I am looking for code that can recognize a number by using a voice command. Is there any code on this? I would like to be able to recognize any number from 0 to 100.
I searched for a library and found this one, but there is no documentation on how to use it.
The library that I am currently using is speech_recognition as you can see in the code below.
I am trying to write some code where the user can say any screen-brightness value from 0 to 100 and the program will change the brightness. I would like to learn how to do this by voice, not with int(input()).
My code:
import speech_recognition as sr
import time

t = {'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
     'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10}

r = sr.Recognizer()
with sr.Microphone() as source:
    print('Speak anything: ')
    audio = r.listen(source)
    try:
        text = r.recognize_google(audio)
        print('You said : {0} {1} '.format(text, t[text]))
        time.sleep(1)
    except:
        print('Sorry could not recognize your voice')
My brightness code:
import wmi

brightness = amount  # percentage [0-100]
c = wmi.WMI(namespace='wmi')
methods = c.WmiMonitorBrightnessMethods()[0]
methods.WmiSetBrightness(brightness, 0)
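A rough sketch of one way to do this (not from the original thread): try int() on the recognized string first, since recognize_google usually returns spoken numbers as numerals like "57", and fall back to a small word map for spelled-out numbers. The parse_number helper below is a hypothetical name.

import speech_recognition as sr

WORDS = {'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
         'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10}

def parse_number(text):
    """Hypothetical helper: turn a recognized phrase into an int, or None."""
    text = text.strip().lower()
    try:
        return int(text)          # "57" -> 57
    except ValueError:
        return WORDS.get(text)    # "seven" -> 7, unknown -> None

r = sr.Recognizer()
with sr.Microphone() as source:
    print('Say a brightness level (0-100): ')
    audio = r.listen(source)

try:
    text = r.recognize_google(audio)
    value = parse_number(text)
    if value is not None and 0 <= value <= 100:
        print('Setting brightness to', value)
        # methods.WmiSetBrightness(value, 0)  # plug into the WMI snippet above
    else:
        print('Heard "{}", which is not a number from 0 to 100'.format(text))
except sr.UnknownValueError:
    print('Sorry, could not recognize your voice')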

Python: Get system audio in speech recognition instead of microphone

I am working on speech recognition in Python, but it is only getting input from the microphone. How is it possible to feed the audio coming out of the speakers into the speech recognition library instead?
The piece of code is given below:
import speech_recognition as sr
import time

r = sr.Recognizer()
with sr.Microphone() as source:  # using microphone here, would like to use speaker instead
    print("Talk Something...")
    audio = r.listen(source)
    print("Time Over...")

try:
    t1 = time.time()
    print("Text: " + r.recognize_google(audio))  # prints after converting speech to text
    t2 = time.time()
    print("T2-T1: ", t2 - t1)
except:
    print("Didn't understand the audio")
I have been struggling here for so long and any help will be much appreciated. Thanks!
You can configure the device index as described in the docs:
import speech_recognition as sr

for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))
If LINEIN is not available as a separate input, you might just need to enable it as a recording source in your system's audio properties.
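As a sketch, once you know the index of a loopback or "Stereo Mix"-style device from the listing above, you can pass it to sr.Microphone and use it exactly like a physical microphone; the device_index=2 below is an assumption for illustration.

import speech_recognition as sr

r = sr.Recognizer()

# Assumed: device_index=2 is whatever loopback / "Stereo Mix" device
# showed up in the list_microphone_names() output on your machine.
loopback = sr.Microphone(device_index=2)

with loopback as source:
    print("Capturing system audio...")
    audio = r.listen(source)

try:
    print("Text:", r.recognize_google(audio))
except sr.UnknownValueError:
    print("Didn't understand the audio")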

Enabling Audio Input for Speech Recognition Library

How do I turn on audio input for all device indexes using the SpeechRecognition library? I want to pass in audio for testing, and there is a possibility that the library uses a different audio input device. How do I let it take audio input from all the indexes?
You can use your microphone as the default audio input device; below is a code snippet:
import pyttsx3
import speech_recognition as sr

speak = pyttsx3.init()  # text-to-speech engine used to prompt the user
r = sr.Recognizer()  # this recognizer will recognize our voice

with sr.Microphone() as source:  # use the microphone to record the voice command
    speak.say("What can I do for you!")  # ask the user something out loud
    speak.runAndWait()
    print("Ask me something!")  # printed on the console as well
    audio = r.listen(source, timeout=60, phrase_time_limit=3)

data = ""
try:
    # the try block recognizes the voice and repeats what was said
    data = r.recognize_google(audio, language="en-US")
    print("dynamo thinks you said! " + data)  # print what the Google API recognized
except:
    # if recognition fails, report it instead of crashing
    print("not able to listen to you or your microphone is not good")
    exit()
First, you need the following things installed on your system:
1. Python
2. The SpeechRecognition package
3. PyAudio
Now you can run this code to check your version:
import speech_recognition as s_r
print(s_r.__version__)
Output
3.8.1
It will print the current version of your speech recognition package.
Then set the microphone to accept sound:
my_mic = s_r.Microphone()
Here you have to pass the parameter device_index=?
To recognize input from the microphone you have to use a recognizer class. Let’s just create one.
r = s_r.Recognizer()
Now, convert the speech into text in Python.
To convert using Google speech recognition we can use the following line:
r.recognize_google(audio)
It will return a string with the recognized text (it converts your voice to text and returns that as a string).
You can simply print it using the below line:
print(r.recognize_google(audio))
Now the full program will look like this:
import speech_recognition as s_r

print(s_r.__version__)  # just to print the version, not required
r = s_r.Recognizer()
my_mic = s_r.Microphone(device_index=1)  # my device index is 1, you have to put your device index

with my_mic as source:
    print("Say now!!!!")
    audio = r.listen(source)  # take voice input from the microphone

print(r.recognize_google(audio))  # to print voice as text
If you run this, you should get an output.
But if you don't get any output after waiting a few moments, check your internet connection.

Speech or no speech detection in Python

I am writing a program that recognizes speech. What it does is it records audio from the microphone and converts it to text using Sphinx. My problem is I want to start recording audio only when something is spoken by the user.
I experimented by reading the audio levels from the microphone and recording only when the level is above a particular value, but it isn't very effective: the program starts recording whenever it detects anything loud. This is the code I used:
import audioop
import pyaudio as pa
import wave

class speech():
    def __init__(self):
        # soundtrack properties
        self.format = pa.paInt16
        self.rate = 16000
        self.channel = 1
        self.chunk = 1024
        self.threshold = 150
        self.file = 'audio.wav'
        # initialise microphone stream
        self.audio = pa.PyAudio()
        self.stream = self.audio.open(format=self.format,
                                      channels=self.channel,
                                      rate=self.rate,
                                      input=True,
                                      frames_per_buffer=self.chunk)

    def record(self):
        while True:
            data = self.stream.read(self.chunk)
            rms = audioop.rms(data, 2)  # get input volume
            if rms > self.threshold:  # if input volume greater than threshold
                break
        # array to store frames
        frames = []
        # record up to silence only
        while rms > self.threshold:
            data = self.stream.read(self.chunk)
            rms = audioop.rms(data, 2)
            frames.append(data)
        print('finished recording.... writing file....')
        write_frames = wave.open(self.file, 'wb')
        write_frames.setnchannels(self.channel)
        write_frames.setsampwidth(self.audio.get_sample_size(self.format))
        write_frames.setframerate(self.rate)
        write_frames.writeframes(b''.join(frames))
        write_frames.close()
Is there a way I can differentiate between human voice and other noise in Python ? Hope somebody can find me a solution.
I think your issue is that at the moment you are trying to record without recognizing the speech, so it is not discriminating: recognisable speech is anything that gives meaningful results after recognition, which is a catch-22. You could simplify matters by looking for an opening keyword. You can also filter on the human voice frequency range, as the human ear and the telephone companies both do, and you can look at the mark-space ratio. I believe there were some publications on that a while back, but be careful: it varies from language to language. A quick Google search can be very informative. You may also find this article interesting.
I think what you are looking for is VAD (voice activity detection). VAD can be used to preprocess speech for ASR. Here is an open-source project with implementations of VAD: link. I hope it helps you.
This is an example script using a VAD library.
https://github.com/wiseman/py-webrtcvad/blob/master/example.py
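For reference, a minimal sketch of how py-webrtcvad classifies audio frames might look like this. The 16 kHz rate, 30 ms frame size, and aggressiveness level 2 are choices you would tune, and read_audio_frames is a hypothetical helper standing in for your microphone or file input.

import webrtcvad

RATE = 16000        # webrtcvad supports 8000/16000/32000/48000 Hz
FRAME_MS = 30       # frames must be 10, 20, or 30 ms of 16-bit mono PCM
FRAME_BYTES = RATE * FRAME_MS // 1000 * 2

vad = webrtcvad.Vad(2)  # aggressiveness 0-3; higher filters out more non-speech

def is_speech_frame(frame_bytes):
    """Return True if this 30 ms PCM frame contains speech."""
    return vad.is_speech(frame_bytes, RATE)

# Hypothetical usage: read_audio_frames() would yield FRAME_BYTES-sized chunks
# from the microphone (e.g. via pyaudio) or from a WAV file.
# for frame in read_audio_frames():
#     if is_speech_frame(frame):
#         ...start or continue recording...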
