How to get word timestamps using speech_recognition in Python? - python

I want to determine exactly when the speech in an audio file starts and ends. First, I am using the speech_recognition library to determine the speech content of the audio file:
import speech_recognition as sr
filename = './dir_name/1.wav'
r = sr.Recognizer()
with sr.AudioFile(filename) as source:
    audio_data = r.record(source)
text = r.recognize_google(audio_data, language = "en-US")
print(text)
Running this code, I get the correct output (the recognized speech content of the audio file). How can I determine exactly where the speech starts and ends? Let's assume the file contains ONLY speech. I tried different methods, e.g. analysing the amplitude of the signal, its envelope, etc., but the results are not accurate enough. So I started using the speech_recognition library in the hope that it could be useful here.
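recognize_google does not appear to return timestamps, so one fallback is to estimate the speech boundaries directly from the samples. The sketch below is a crude energy-based detector, not a feature of speech_recognition: it splits the signal into short frames and reports the span between the first and last frame whose RMS exceeds a threshold. The function name find_speech_bounds, the 20 ms frame length, and the threshold of 500 are illustrative assumptions and would need tuning for real recordings.

```python
import math

def find_speech_bounds(samples, sample_rate, frame_ms=20, threshold=500.0):
    """Return (start_seconds, end_seconds) of the region whose frame RMS
    exceeds `threshold`, or None if no frame does."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms > threshold:
            active.append((start, start + len(frame)))
    if not active:
        return None
    # First active frame marks the start, last active frame marks the end.
    return active[0][0] / sample_rate, active[-1][1] / sample_rate

# Synthetic signal: 0.1 s silence, 0.2 s of "speech", 0.1 s silence at 1 kHz.
signal = [0] * 100 + [1000] * 200 + [0] * 100
print(find_speech_bounds(signal, sample_rate=1000))
```

For a real file, the integer samples could be unpacked from audio_data.get_raw_data() together with audio_data.sample_rate. A dedicated voice-activity detector would likely be more robust than a fixed threshold in noisy audio.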

Related

How do I make speech recognition understand specific words I choose? Python

The title is probably not clear enough to describe my issue, sorry for that.
I'm working on playing chess with speech recognition.
The issue:
Example: The user is going to say "Rook A1 to A4", but the speech recognizer thinks "rook" is "rogue" or hears "Brooklyn A1", etc.
How do I choose specific words like rook, pawn, queen, etc. and make the speech recognizer understand only those words?
The code I started with:
import pyttsx3
import speech_recognition as sr
recognizer = sr.Recognizer()
while True:
    try:
        with sr.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic, duration=0.2)
            audio = recognizer.listen(mic)
            text = recognizer.recognize_google(audio)
            text = text.lower()
            print(f"{text}")
    except sr.UnknownValueError:  # catch the exception class; do not call it
        recognizer = sr.Recognizer()
        continue
You should use keyword_entries. Provide them as a list of (keyword, sensitivity) tuples, like so: keyword_only_text = r.recognize_sphinx(audio, keyword_entries=[("rook",0.1),("knight",0.1),...])
Unfortunately, this feature is not available for recognize_google.
The scalar values decide the sensitivity to each word, on a scale from 0 to 1. If you set them to 1, any recorded word will be mapped only to your keywords. If you set them to 0.001, you will only get a slight bias towards them.
See the workings of the function at speech_recognition/recognize_sphinx.
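Since keyword_entries is not available for recognize_google, one possible workaround (a different technique from the one in the answer above) is to post-process the recognized text and snap each word to the closest entry in a small vocabulary. The sketch below uses difflib from the standard library; the function name snap_to_vocabulary, the word list, and the 0.6 cutoff are illustrative assumptions.

```python
import difflib

CHESS_WORDS = ["rook", "pawn", "queen", "king", "bishop", "knight"]  # assumed vocabulary

def snap_to_vocabulary(text, vocabulary=CHESS_WORDS, cutoff=0.6):
    """Replace each word with its closest vocabulary entry, if similar enough."""
    fixed = []
    for word in text.lower().split():
        if word in vocabulary:
            fixed.append(word)
            continue
        # get_close_matches returns at most n entries scoring >= cutoff.
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        fixed.append(match[0] if match else word)
    return " ".join(fixed)

print(snap_to_vocabulary("Brooklyn A1 to A4"))
```

With the default cutoff, "Brooklyn A1 to A4" becomes "rook a1 to a4", while "rogue" is left unchanged because its similarity to "rook" is below 0.6. Lowering the cutoff catches more misrecognitions at the cost of more false corrections.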

Python: Get system audio in speech recognition instead of microphone

I am working on speech recognition in Python, but it only takes input from the microphone. How can I give the audio from the speakers as input to the speech recognition library?
The code is given below:
import time
import speech_recognition as sr
r = sr.Recognizer()  # the recognizer was missing in the original snippet
with sr.Microphone() as source: # using microphone here, would like to use speaker instead
    print("Talk Something...")
    audio = r.listen(source)
    print("Time Over...")
try:
    t1 = time.time()
    print("Text: " + r.recognize_google(audio)) # prints after converting speech to text
    t2 = time.time()
    print("T2-T1: ", t2 - t1)
except sr.UnknownValueError:  # raised when the speech is unintelligible
    print("Didn't understand the audio")
I have been struggling here for so long and any help will be much appreciated. Thanks!
You can configure the device index as described in the docs:
import speech_recognition as sr
for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))
If LINEIN is not available as a separate input, you might just configure it as a recording source in audio properties.
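Once the device names are listed, picking the right index can be automated with a small name search. The sketch below is a plain helper, not part of speech_recognition; the function name find_device_index and the sample device names (including "Stereo Mix") are hypothetical and will differ between systems.

```python
def find_device_index(names, keyword):
    """Return the index of the first device whose name contains `keyword`
    (case-insensitive), or None if there is no such device."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name and keyword in name.lower():
            return index
    return None

# Hypothetical device list; on a real system it would come from
# sr.Microphone.list_microphone_names().
names = ["Built-in Microphone", "Stereo Mix (Realtek Audio)", "Line In"]
index = find_device_index(names, "stereo mix")
print(index)
```

The resulting index could then be passed as sr.Microphone(device_index=index), provided the system actually exposes the speaker output as a recording device.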

'Audio data must be audio data' error with google speech recognition in python

I am trying to load an audio file in Python and process it with Google speech recognition.
The problem is that, unlike C++, Python doesn't show data types or classes, or give you access to memory so that you can convert between one data type and another by creating a new object and repacking the data.
I don't understand how it's possible to convert from one data type to another in Python.
The code in question is below:
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3')
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio)
The error is:
audio_data must be audio data
How do I convert the audio object so it can be used with Google speech recognition?
@Mich, I hope you have found a solution by now. If not, please try the following.
First, convert the .mp3 file to .wav format with another tool as a pre-processing step.
import speech_recognition as sr
# Create an instance of the Recognizer class
recognizer = sr.Recognizer()
# Create audio file instance from the original file
audio_ex = sr.AudioFile('sample_data/metal.wav')
type(audio_ex)
# Create audio data
with audio_ex as source:
    audiodata = recognizer.record(source)
type(audiodata)
# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')
print(text)
You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages
Additionally, you can set the minimum loudness threshold of the audio using the command below.
recognizer.energy_threshold = 300 # minimum energy threshold set to 300
Librosa returns a NumPy array; you need to convert it back to 16-bit PCM data, as in a WAV file. Something like this:
import numpy as np
raw_audio = np.int16(audio/np.max(np.abs(audio)) * 32767).tobytes()
You are probably better off loading the mp3 with an ffmpeg wrapper instead of librosa; librosa does strange things with the audio (normalizes it, etc.). It's better to work with raw data.
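The conversion described above can also be sketched without NumPy, which makes each step explicit: normalize by the peak, scale to the 16-bit range, and pack the integers as bytes. The helper name float_to_pcm16 is an assumption for illustration, and the input is assumed to be a mono sequence of floats such as the array librosa returns.

```python
import struct

def float_to_pcm16(samples):
    """Peak-normalize float samples and pack them as little-endian 16-bit PCM bytes."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # All-silent (or empty) input: emit zeros instead of dividing by zero.
        return struct.pack("<%dh" % len(samples), *([0] * len(samples)))
    ints = [int(s / peak * 32767) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)

raw = float_to_pcm16([0.0, 0.5, -1.0])
print(len(raw))  # 2 bytes per sample

# With speech_recognition installed, the bytes could be wrapped as
# spr.AudioData(raw, sample_rate, 2), where 2 is the sample width in bytes
# and sample_rate is the rate returned by the loader.
```

The resulting AudioData object is the type that recognize_google expects, which is what the "audio_data must be audio data" error is complaining about.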
Try this with the speech recognizer (AudioFile reads WAV, AIFF, or FLAC, so convert the mp3 first):
import speech_recognition as spr
r = spr.Recognizer()
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)
r.recognize_google(audio)

Enabling Audio Input for Speech Recognition Library

How do I turn on audio input for all device indexes using a speech recognition library? I want to pass in audio for testing, and the library might use a different audio input device. How do I make it take the audio input from all the indexes?
You can use your microphone as the default audio input device. Below is a code snippet:
import speech_recognition as sr
r = sr.Recognizer()  # recognizer that converts the recorded voice to text
with sr.Microphone() as source:  # record from the default microphone
    speak.speak("What can i do for you!")  # `speak` is a text-to-speech helper that is not shown here
    print("Ask me Something!")  # prompt on the console
    audio = r.listen(source, timeout=60, phrase_time_limit=3)
data = ""
try:
    # Recognize the recorded audio with the Google API
    data = r.recognize_google(audio, language="en-US")
    print("dynamo think you said!" + " " + data)  # print what Google recognized
except Exception:
    # Recognition failed (unintelligible audio or a request error)
    print("not able to listen you or your microphone is not good")
    exit()
First, you need the following installed on your system:
1. Python
2. The SpeechRecognition package
3. PyAudio
Now you can run this code to check your version:
import speech_recognition as s_r
print(s_r.__version__)
Output
3.8.1
It will print the current version of your speech recognition package.
Then, set up the microphone to accept sound:
my_mic = s_r.Microphone()
Here you have to pass the parameter device_index=?
To recognize input from the microphone, you have to use the Recognizer class. Let's just create one.
r = s_r.Recognizer()
Now, convert the speech into text.
To convert using Google speech recognition, we can use the following line:
r.recognize_google(audio)
It will return a string containing the recognized text (it converts your voice to text and returns that as a string).
You can simply print it using the line below:
print(r.recognize_google(audio))
Now the full program will look like this:
import speech_recognition as s_r
print(s_r.__version__) # just to print the version not required
r = s_r.Recognizer()
my_mic = s_r.Microphone(device_index=1) #my device index is 1, you have to put your device index
with my_mic as source:
    print("Say now!!!!")
    audio = r.listen(source) #take voice input from the microphone
print(r.recognize_google(audio)) #to print voice into text
If you run this, you should get an output.
But if you don't get any output after waiting a few moments, check your internet connection.
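To cover several device indexes rather than a single hard-coded one, the capture step can be tried on each index in turn. The sketch below keeps the audio part behind a callable so the loop itself is easy to test; the name first_responsive_device is hypothetical, and catching OSError is an assumption about how a failing device reports errors on your system.

```python
def first_responsive_device(device_indexes, capture):
    """Return (index, result) for the first device on which `capture`
    succeeds, or None if it raises OSError for every device."""
    for index in device_indexes:
        try:
            return index, capture(index)
        except OSError:
            continue  # device missing or not usable as input; try the next one
    return None

# Stand-in capture function for illustration: only device 2 "works".
def fake_capture(index):
    if index != 2:
        raise OSError("device %d is not an input" % index)
    return "audio from device 2"

print(first_responsive_device(range(4), fake_capture))
```

In real use, capture could open s_r.Microphone(device_index=index) and return the result of r.listen(source), with device_indexes taken from the length of s_r.Microphone.list_microphone_names().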

Transcribe an Audio File in Python

I'm trying to transcribe an audio file which is a bit large. Its properties are as follows.
Size : 278.3 MB
Duration : 52 minutes
Format : WAV
The following is the code I used to transcribe the first 60 seconds of it. Could you please advise how to transcribe the whole file at once?
import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile('sampleMp3.WAV') as source:
    audio = r.record(source, duration=60)  # reads only the first 60 seconds
command = r.recognize_google(audio)
with open("Output.txt", "w") as text_file:
    text_file.write(command)
The speech_recognition Python package is just a wrapper; it does not provide even basic functions.
If you want to use the Google Speech API (paid), you can do something like this:
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/cloud-client/transcribe_async.py
If you want to consider Bing, it also provides a similar API; see "How can I transcribe a speech file with the Bing Speech API in Python?"
For a free alternative, consider https://github.com/alumae/kaldi-offline-transcriber
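If you want to stay with the recognizer used in the question, another option is to walk through the file in fixed-length chunks and transcribe each one separately. The sketch below only computes the (offset, duration) pairs; the function name chunk_spans and the 60-second chunk length are illustrative assumptions, and the commented loop shows how the pairs might be fed to r.record.

```python
def chunk_spans(total_seconds, chunk_seconds=60):
    """Return (offset, duration) pairs that cover the file in order."""
    spans = []
    offset = 0
    while offset < total_seconds:
        duration = min(chunk_seconds, total_seconds - offset)
        spans.append((offset, duration))
        offset += duration
    return spans

print(chunk_spans(150, 60))

# Sketch of how the spans could drive speech_recognition (not run here):
# for offset, duration in chunk_spans(52 * 60):
#     with sr.AudioFile('sampleMp3.WAV') as source:
#         audio = r.record(source, offset=offset, duration=duration)
#     text = r.recognize_google(audio)
```

Fixed boundaries can cut a word in half, so the joined transcript may contain errors at chunk edges; splitting on pauses would avoid that at the cost of extra work.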
Instead of transcribing using Python, use Nuance Dragon.
https://www.nuance.com/en-nz/dragon/dragon-anywhere/free-trial.html
It is the best transcribing software.
