I am trying to convert audio to text. The audio is in Dutch, not English, and I am unable to convert it. The code only works for English audio. I am not sure whether I need to add some function or option for the code to recognise other languages as well. Below is the code:
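import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile('audio.wav') as source:
    audio = r.listen(source)
try:
    # No language argument is passed, so the recognizer assumes English
    text = r.recognize_google(audio)
    print(text)
except sr.UnknownValueError:
    print("Could not understand the audio")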
Use:
text = r.recognize_google(audio, language="nl-NL")
You can use any supported language; you just need to know its language tag (a language code plus a region code). For instance, German would be de-DE.
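To make the answer above concrete, here is a minimal sketch of a file transcription helper that passes the language tag through. The LANGUAGE_TAGS table and the function names language_tag and transcribe_file are hypothetical, introduced only for illustration; the sketch assumes the SpeechRecognition package is installed and that audio.wav exists.

```python
# Hypothetical lookup table for illustration; the values are ordinary
# language-region tags of the kind passed to the `language` parameter.
LANGUAGE_TAGS = {
    "dutch": "nl-NL",
    "german": "de-DE",
    "french": "fr-FR",
    "english": "en-US",
}


def language_tag(name):
    """Return the language tag for a language name, e.g. 'Dutch' -> 'nl-NL'."""
    key = name.strip().lower()
    if key not in LANGUAGE_TAGS:
        raise ValueError(f"no language tag known for {name!r}")
    return LANGUAGE_TAGS[key]


def transcribe_file(path, language_name="dutch"):
    """Transcribe a WAV/AIFF/FLAC file in the given language."""
    # Imported lazily so the lookup helper works without the package installed.
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file
    return recognizer.recognize_google(audio, language=language_tag(language_name))
```

Calling transcribe_file("audio.wav", "dutch") would then send the audio with language="nl-NL".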
The title is probably not clear enough to describe my issue, sorry for that.
I'm working on playing chess with speech recognition.
The issue:
Example: the user says "Rook A1 to A4", but the speech recognizer hears "rook" as "rogue", or returns "Brooklyn A1", etc.
How do I choose specific words like rook, pawn, queen, etc. and make the speech recognizer understand only those words?
The code I started with:
import pyttsx3
import speech_recognition as sr

recognizer = sr.Recognizer()
while True:
    try:
        with sr.Microphone() as mic:
            recognizer.adjust_for_ambient_noise(mic, duration=0.2)
            audio = recognizer.listen(mic)
            text = recognizer.recognize_google(audio)
            text = text.lower()
            print(f"{text}")
    except sr.UnknownValueError:
        # Nothing intelligible was heard; reset the recognizer and listen again
        recognizer = sr.Recognizer()
        continue
You should use keyword_entries. Provide them as a list of (keyword, sensitivity) tuples, like so:
keyword_only_text = r.recognize_sphinx(audio, keyword_entries=[("rook",0.1),("knight",0.1),...])
Unfortunately, this feature is not available for recognize_google.
The scalar values set the sensitivity to each word, on a scale from 0 to 1. If you set them to 1, every recorded word is mapped onto one of your keywords. If you set them to 0.001, you only get a slight bias towards them.
See the workings of the function at speech_recognition/recognize_sphinx.
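As a rough sketch of the keyword_entries approach for the chess case, the helper below builds the list of (keyword, sensitivity) tuples from a word list. CHESS_WORDS, build_keyword_entries and recognize_chess_words are hypothetical names chosen for illustration, and the recognize_sphinx call assumes that both SpeechRecognition and PocketSphinx are installed.

```python
# Hypothetical vocabulary for illustration; extend it with files, ranks, etc.
CHESS_WORDS = ["rook", "knight", "bishop", "queen", "king", "pawn"]


def build_keyword_entries(words, sensitivity=0.1):
    """Return (keyword, sensitivity) tuples in the shape keyword_entries takes."""
    if not 0 <= sensitivity <= 1:
        raise ValueError("sensitivity must be between 0 and 1")
    return [(word.lower(), sensitivity) for word in words]


def recognize_chess_words(recognizer, audio, words=CHESS_WORDS, sensitivity=0.1):
    """Run the offline Sphinx recognizer restricted to the given keywords.

    `recognizer` is expected to be an sr.Recognizer and `audio` an sr.AudioData.
    """
    entries = build_keyword_entries(words, sensitivity)
    return recognizer.recognize_sphinx(audio, keyword_entries=entries)
```

Raising the sensitivity makes the recognizer more eager to report a keyword, so it is worth trying a few values against recordings of real moves.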
I want to determine exactly when speech starts and ends in an audio file. First, I am using the speech_recognition library to get the speech content of the audio file:
import speech_recognition as sr

filename = './dir_name/1.wav'
r = sr.Recognizer()
with sr.AudioFile(filename) as source:
    audio_data = r.record(source)
text = r.recognize_google(audio_data, language="en-US")
print(text)
Running this code, I get the correct output (the recognized speech content of the audio file). How can I determine exactly where the speech starts and ends? Let's assume the file contains ONLY speech. I tried different methods, e.g. analysing the amplitude of the signal, its envelope, etc., but the results are not accurate enough. So I started using the speech_recognition library in the hope that it could be useful here.
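For reference, the amplitude-based approach mentioned above can be sketched with the standard library alone. This is only a baseline under stated assumptions: the file is 16-bit mono PCM, and the frame length and the threshold of 500 are arbitrary illustrative values that would need tuning. The function names read_mono_16bit and speech_bounds are hypothetical.

```python
import array
import math
import sys
import wave


def read_mono_16bit(path):
    """Read a 16-bit mono PCM WAV file and return (samples, sample_rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono PCM")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples, rate


def speech_bounds(samples, rate, frame_ms=20, threshold=500.0):
    """Return (start_s, end_s) spanning the frames whose RMS reaches threshold.

    Returns None when no frame is loud enough.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    loud_starts = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms >= threshold:
            loud_starts.append(i)
    if not loud_starts:
        return None
    start = loud_starts[0] / rate
    end = min(loud_starts[-1] + frame_len, len(samples)) / rate
    return start, end
```

With a real file this would be called as speech_bounds(*read_mono_16bit('./dir_name/1.wav')). A fixed threshold is sensitive to background noise, which may be why this kind of method gave poor results.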
I am working on speech recognition in Python, but it only takes its input from the microphone. How can I give the audio from the speakers as input to the speech recognition library?
The piece of code is given below:
import time
import speech_recognition as sr

r = sr.Recognizer()
with sr.Microphone() as source:  # using microphone here, would like to use speaker instead
    print("Talk Something...")
    audio = r.listen(source)
    print("Time Over...")

try:
    t1 = time.time()
    print("Text: " + r.recognize_google(audio))  # prints after converting speech to text
    t2 = time.time()
    print("T2-T1: ", t2 - t1)
except (sr.UnknownValueError, sr.RequestError):
    print("Didn't understand the audio")
I have been struggling here for so long and any help will be much appreciated. Thanks!
You can select the input device by its index, as shown in the docs:
import speech_recognition as sr

for index, name in enumerate(sr.Microphone.list_microphone_names()):
    print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))
If LINEIN is not available as a separate input, you might just configure it as a recording source in audio properties.
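Once the device names are listed, picking the right index can be automated. The sketch below is one way to do it; pick_device_index is a hypothetical helper, and "stereo mix" is only an assumed example of what a loopback or line-in device might be called on a given machine.

```python
def pick_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword.

    The match is case-insensitive; returns None if nothing matches.
    """
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name and keyword in name.lower():
            return index
    return None


# Usage (needs SpeechRecognition and PyAudio installed):
#   import speech_recognition as sr
#   names = sr.Microphone.list_microphone_names()
#   index = pick_device_index(names, "stereo mix")  # assumed device name
#   with sr.Microphone(device_index=index) as source:
#       audio = sr.Recognizer().listen(source)
```

If the helper returns None, the device is not exposed as a separate input and has to be enabled as a recording source in the audio settings first.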
I am trying to load an audio file in Python and process it with Google speech recognition.
The problem is that, unlike C++, Python doesn't show data types or classes, or give you access to memory so that you can convert between data types by creating a new object and repacking the data.
I don't understand how to convert from one data type to another in Python.
The code in question is below:
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3')
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio)
The error is:
audio_data must be audio data
How do I convert the audio object so that it can be used with Google speech recognition?
@Mich, I hope you have found a solution by now. If not, please try the below.
First, convert the .mp3 file to .wav format with another tool as a pre-processing step.
import speech_recognition as sr

# Create an instance of the Recognizer class
recognizer = sr.Recognizer()

# Create audio file instance from the original file
audio_ex = sr.AudioFile('sample_data/metal.wav')
print(type(audio_ex))

# Create audio data
with audio_ex as source:
    audiodata = recognizer.record(source)
print(type(audiodata))

# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')
print(text)
You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages
Additionally, you can set the minimum loudness threshold for the audio using the command below.
recognizer.energy_threshold = 300 # minimum energy threshold set to 300
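The mp3-to-wav pre-processing step at the start of this answer is left open. One possible way, sketched here, is to call the ffmpeg command-line tool from Python. This assumes ffmpeg is installed and on the PATH; the helper names ffmpeg_command and convert_to_wav, and the choice of 16 kHz mono output, are illustrative assumptions rather than requirements of the library.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst=None, rate=16000):
    """Build the ffmpeg argument list that converts src to a mono WAV file."""
    src = Path(src)
    dst = Path(dst) if dst else src.with_suffix(".wav")
    # -y overwrites the output, -ac 1 downmixes to mono, -ar sets the sample rate
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", str(rate), str(dst)]


def convert_to_wav(src, dst=None, rate=16000):
    """Run ffmpeg and return the path of the WAV file it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    command = ffmpeg_command(src, dst, rate)
    subprocess.run(command, check=True)
    return command[-1]
```

For example, convert_to_wav('sample_data/metal.mp3') would write sample_data/metal.wav, which sr.AudioFile can then open.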
Librosa returns a NumPy array of floating-point samples, so you need to convert it back to 16-bit PCM data. Something like this:
raw_audio = np.int16(audio/np.max(np.abs(audio)) * 32767).tobytes()
You are probably better off loading the mp3 with an ffmpeg wrapper instead of librosa, because librosa does strange things with the audio (normalizes it, etc.). It's better to work with the raw data.
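The conversion formula above yields raw bytes, which still have to be wrapped in an AudioData object before recognize_google accepts them. The sketch below reproduces the same peak normalisation with the standard library only; float_to_pcm16 is a hypothetical helper name, and the AudioData usage in the comments assumes the variable names from the question (spr for the module, sr for the sample rate).

```python
import array
import sys


def float_to_pcm16(samples):
    """Convert float samples to peak-normalised little-endian 16-bit PCM bytes."""
    peak = max((abs(s) for s in samples), default=0.0)
    scale = 32767 / peak if peak else 0.0
    pcm = array.array("h", (int(s * scale) for s in samples))
    if sys.byteorder == "big":
        pcm.byteswap()  # PCM in WAV-style data is little-endian
    return pcm.tobytes()


# Usage with the names from the question (needs librosa and SpeechRecognition):
#   audio, sr = librosa.load('sample_data/metal.mp3')
#   audio_data = spr.AudioData(float_to_pcm16(audio), sr, 2)  # 2 bytes per sample
#   print(spr.Recognizer().recognize_google(audio_data))
```

The last argument of AudioData is the sample width in bytes, which is 2 for 16-bit audio.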
Try this with speech recognizer:
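import speech_recognition as spr

r = spr.Recognizer()
# AudioFile (formerly WavFile) reads WAV, AIFF and FLAC, not MP3, so convert the file first
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)
print(r.recognize_google(audio))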
I want to convert an audio file (e.g. ".mp3") to a text file. I have tried different approaches, such as pyspeech and speech recognition, but I didn't get any result. Is there any other way to do this? Any help would be appreciated!
Did you try https://pypi.python.org/pypi/SpeechRecognition/ ? That sounds like exactly what you want.
I also found the CMU Sphinx project via this blog. It has Python bindings too (as mentioned in the article).
The other item I found was Google's Speech to Text API. You might want to check that out too. Here's a decent tutorial on this subject:
http://codeabitwiser.com/2014/09/python-google-speech-api/
import speech_recognition as sr

print(sr.__version__)
r = sr.Recognizer()
file_audio = sr.AudioFile('file_audio.wav')
with file_audio as source:
    audio_text = r.record(source)
print(type(audio_text))
print(r.recognize_google(audio_text))
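Since the question asks for a text file rather than printed output, the recognized string still has to be written to disk. A minimal sketch, assuming the transcript should sit next to the audio file with a .txt suffix; save_transcript is a hypothetical helper name.

```python
from pathlib import Path


def save_transcript(text, audio_path):
    """Write text to a .txt file next to audio_path and return the new path."""
    out_path = Path(audio_path).with_suffix(".txt")
    out_path.write_text(text + "\n", encoding="utf-8")
    return out_path
```

With the code above, save_transcript(r.recognize_google(audio_text), 'file_audio.wav') would create file_audio.txt.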
Way 1: convert the audio file to raw bytes with https://github.com/jiaaro/pydub, or read them directly: f = open("test.mp3", "rb"); first16bytes = f.read(16)
Way 2: use a speech-to-text converter, e.g. transcribe to English or another language with pip libraries such as SpeechRecognition and pydub (though I don't think this is what you asked for).
Way 3: convert the mp3 to JSON. If anyone has done this, please share.