I recently tried to learn how to transcribe an audio file, but I am not very familiar with Python.
I have read the SpeechRecognition example at the following page:
https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py
I tried to adapt it using the code below.
However, it looks like I cannot load my file on my Windows computer.
I have a wav file on my computer at the path
"C:\Users\Chen\Downloads\english.wav"
and I tried to replace the file location with "C:\Users\Chen\Downloads" in my Python code.
But it shows me:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Chen\english.wav'
Please help me fix the problem.
import speech_recognition as sr
# obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source) # read the entire audio file
print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
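For the path problem in the question, one option is to build the full path to the file explicitly instead of deriving it from the script's folder. This is a minimal sketch, assuming the file really is at C:\Users\Chen\Downloads\english.wav; the helper names build_audio_path and transcribe_file are made up for this example and are not part of SpeechRecognition.

```python
from pathlib import PureWindowsPath


def build_audio_path(folder, filename):
    """Join a Windows folder and a file name into one full path string."""
    # The raw-string prefix (r"...") at the call site keeps backslash
    # sequences such as "\U" from being read as escape sequences.
    return str(PureWindowsPath(folder) / filename)


def transcribe_file(audio_path):
    """Transcribe a WAV file with the Google Web Speech API (needs internet)."""
    import speech_recognition as sr  # third-party package

    r = sr.Recognizer()
    with sr.AudioFile(audio_path) as source:
        audio = r.record(source)  # read the entire audio file
    return r.recognize_google(audio)


# Folder and file name taken from the question
AUDIO_FILE = build_audio_path(r"C:\Users\Chen\Downloads", "english.wav")
```

Calling transcribe_file(AUDIO_FILE) would then open the file in Downloads regardless of where the script itself is saved.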
Use the listen() function if you need to recognize text:
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.listen(source) # read a phrase from the audio file (stops at a pause)
text = r.recognize_google(audio)
print("Google Speech Recognition thinks you said " + text)
# The code below is for an audio file in Hindi
file = "hindi.wav"
with sr.AudioFile(file) as source:
    audio = r.listen(source)
text = r.recognize_google(audio, language='hi-IN')
print("Text : " + text)
Related
So, I wanted to write a program that recognizes the audio from the Microphone and prints it as a string. But Python didn't print the recorded audio as a string. I watched some tutorials and read some articles but nothing helped.
Can you help me?
This is my code:
import speech_recognition as sr
r = sr.Recognizer()
with sr.Microphone() as source:
    audio = r.listen(source)
try:
    print("You said " + r.recognize_sphinx(audio))
except LookupError:
    print("Could not understand audio")
Thanks!
I want to convert an mp4 file on my system into text using Python. Converting a wav file into text is easy, but converting an mp4 file causes many problems, especially with ffmpeg, I think. My code always reports that no such file or directory is found.
import speech_recognition as sr
import os
import pyaudio
command2mp3 = 'ffmpeg -i nanavi.mp4 nanavi.mp3'
command2wav = 'ffmpeg -i nanavi.mp3 nanavi.wav'
os.system(command2mp3)
os.system(command2wav)
r = sr.Recognizer()
with sr.AudioFile("nanavi.wav") as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source, duration=10)
print(r.recognize_google(audio))
Error:
FileNotFoundError: [Errno 2] No such file or directory: 'nanavi.wav'
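One likely cause is that os.system does not raise when ffmpeg fails (for example when ffmpeg is not on PATH or nanavi.mp4 is not in the working directory), so the script carries on and only fails later when the wav file is missing. Below is a minimal sketch that converts the mp4 directly to wav and stops on failure. The helper names and the 16 kHz mono settings are assumptions for this example, not something the question requires.

```python
import shutil
import subprocess


def build_ffmpeg_command(source_file, wav_file):
    """Return the ffmpeg argument list that extracts audio as 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",             # overwrite wav_file if it already exists
        "-i", source_file,
        "-vn",            # drop any video stream
        "-ac", "1",       # one audio channel (mono)
        "-ar", "16000",   # 16 kHz sample rate
        wav_file,
    ]


def convert_to_wav(source_file, wav_file):
    """Run ffmpeg and fail loudly instead of continuing without a WAV file."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(source_file, wav_file), check=True)
    return wav_file
```

After convert_to_wav("nanavi.mp4", "nanavi.wav") succeeds, the file can be opened with sr.AudioFile. Note also that duration is a parameter of r.record(source, duration=10) rather than of r.listen.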
Here is my Python code:
import pyttsx3
engine = pyttsx3.init(driverName='sapi5')
infile = "tanjil.txt"
f = open(infile, 'r')
theText = f.read()
f.close()
engine.say(theText)
engine.runAndWait()
I couldn't save the speech to an audio file.
As of July 14 2019, I'm able to save to file with the pyttsx3 library (without using another library or internet connection).
It doesn't appear to be documented, but looking at the source code in github for the Engine class in "engine.py" (https://github.com/nateshmbhat/pyttsx3/blob/master/pyttsx3/engine.py), I was able to find a "save_to_file" function:
def save_to_file(self, text, filename, name=None):
    '''
    Adds an utterance to speak to the event queue.
    @param text: Text to sepak
    @type text: unicode
    @param filename: the name of file to save.
    @param name: Name to associate with this utterance. Included in
        notifications about this utterance.
    @type name: str
    '''
    self.proxy.save_to_file(text, filename, name)
I can use it like this:
engine.save_to_file('the text I want to save as audio', path_to_save)
I'm not sure of the format; it's some raw audio format (I guess something like AIFF), but I can play it in an audio player.
If you install pydub:
https://pypi.org/project/pydub/
then you can easily convert this to mp3, e.g.:
from pydub import AudioSegment
AudioSegment.from_file(path_to_save).export('converted.mp3', format="mp3")
I've tried @Brian's solution but it didn't work for me.
I searched around a bit and couldn't figure out how to save the speech to mp3 with pyttsx3, but I found another solution that does not use pyttsx3.
It can take a .txt file and directly output a .wav file:
def txt_zu_wav(eingabe, ausgabe, text_aus_datei = True, geschwindigkeit = 2, Stimmenname = "Zira"):
    from comtypes.client import CreateObject
    engine = CreateObject("SAPI.SpVoice")
    engine.rate = geschwindigkeit # from -10 to 10
    for stimme in engine.GetVoices():
        if stimme.GetDescription().find(Stimmenname) >= 0:
            engine.Voice = stimme
            break
    else:
        print("Error: voice not found -> default voice is used")
    if text_aus_datei:
        datei = open(eingabe, 'r')
        text = datei.read()
        datei.close()
    else:
        text = eingabe
    stream = CreateObject("SAPI.SpFileStream")
    from comtypes.gen import SpeechLib
    stream.Open(ausgabe, SpeechLib.SSFMCreateForWrite)
    engine.AudioOutputStream = stream
    engine.speak(text)
    stream.Close()
txt_zu_wav("test.txt", "test_1.wav")
txt_zu_wav("It also works with a string instead of a file path", "test_2.wav", False)
This was tested with Python 3.7.4 on Windows 10.
import pyttsx3
engine = pyttsx3.init("sapi5")
voice = engine.getProperty("voices")[0] # first installed voice
engine.setProperty('voice', voice.id) # the 'voice' property expects the voice id
text = 'Your Text'
engine.save_to_file(text, 'name.mp3')
engine.runAndWait() # don't forget to use this line
Try the following code snippet to convert text to audio and save it as an mp3 file.
import pyttsx3
from pydub import AudioSegment
engine = pyttsx3.init('sapi5')
engine.save_to_file('This is a test phrase.', 'test.mp3') # raw audio file
engine.runAndWait()
AudioSegment.from_file('test.mp3').export('test.mp3', format="mp3") # audio file in mp3 format
NB: The pyttsx3 save_to_file() method creates a raw audio file that other applications may not be able to use, even if a media player can play it. pydub is a useful package for converting the raw audio into a specific format.
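Putting the pieces from this thread together, here is a minimal sketch that reads a text file, saves the speech to an intermediate file, and converts that to mp3 under a different name. It assumes pyttsx3 and pydub are installed, that ffmpeg is available for the mp3 export, and that the driver writes something pydub can read; the helper names are made up for this example.

```python
from pathlib import Path


def audio_name_for(text_file, extension=".wav"):
    """Derive an output file name from the input name, e.g. tanjil.txt -> tanjil.wav."""
    return str(Path(text_file).with_suffix(extension))


def text_file_to_mp3(text_file):
    """Speak the contents of text_file into an audio file, then convert it to MP3."""
    import pyttsx3                   # third-party package
    from pydub import AudioSegment   # third-party package, uses ffmpeg for MP3

    with open(text_file, "r") as f:
        text = f.read()

    wav_file = audio_name_for(text_file, ".wav")
    mp3_file = audio_name_for(text_file, ".mp3")

    engine = pyttsx3.init()
    engine.save_to_file(text, wav_file)
    engine.runAndWait()  # the file is only written once the queue is processed

    # Write the MP3 under a separate name so the intermediate file is not overwritten
    AudioSegment.from_file(wav_file).export(mp3_file, format="mp3")
    return mp3_file
```

For the file in the question, text_file_to_mp3("tanjil.txt") would be expected to produce tanjil.wav and tanjil.mp3 next to it.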
The Google Cloud Speech quotas page
https://cloud.google.com/speech/quotas
says that if the audio file is longer than one minute, the uri field should be used.
I uploaded a file to Google Cloud Storage, with the bucket named speechrecognitionelong
and the file named test.wav.
So I think the URI for this file should be gs://speechrecognitionelong/test.wav
I use the following code, but it does not work:
import speech_recognition as sr
AUDIO_FILE = "gs://speechrecognitionelong/test.wav"
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source) # read the entire audio file
r.recognize_google(audio, language='en-US')
Did I misunderstand the uri field, or is my code wrong?
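sr.AudioFile opens local files, so it cannot read a gs:// URI. The uri field belongs to the Google Cloud Speech API itself. Below is a minimal sketch using the separate google-cloud-speech client library, assuming version 2.x of that library, configured credentials, and a WAV file whose header supplies the encoding and sample rate. The helper names gcs_uri and transcribe_gcs are made up for this example.

```python
def gcs_uri(bucket, blob_name):
    """Build the gs:// URI for an object in a Cloud Storage bucket."""
    return "gs://{}/{}".format(bucket, blob_name)


def transcribe_gcs(uri, language_code="en-US", timeout=300):
    """Transcribe a long audio file stored in Cloud Storage."""
    from google.cloud import speech  # third-party: google-cloud-speech

    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(uri=uri)
    # Assumption: encoding and sample rate are read from the WAV header
    config = speech.RecognitionConfig(language_code=language_code)

    # Audio longer than about one minute needs the asynchronous call
    operation = client.long_running_recognize(config=config, audio=audio)
    response = operation.result(timeout=timeout)

    # Each result covers a consecutive portion of the audio
    return " ".join(r.alternatives[0].transcript for r in response.results)
```

With the bucket and file from the question, the call would be transcribe_gcs(gcs_uri("speechrecognitionelong", "test.wav")).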
I want to convert a sound recording from Facebook Messenger to text.
Here is an example of an .mp4 file sent using Facebook's API:
https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833
So this file includes only audio (not video) and I want to convert it to text.
Moreover, I want to do it as fast as possible since I'll use the generated text in an almost real-time application (i.e. user sends the .mp4 file, the script translates it to text and shows it back).
I've found this example https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py
and here is the code I use:
import requests
import speech_recognition as sr
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
r = requests.get(url)
with open("test.mp4", "wb") as handle:
    for data in r.iter_content():
        handle.write(data)
r = sr.Recognizer()
with sr.AudioFile('test.mp4') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print command
But I'm getting this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Asterios\Anaconda2\lib\site-packages\speech_recognition\__init__.py", line 200, in __enter__
    self.audio_reader = aifc.open(aiff_file, "rb")
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 952, in open
    return Aifc_read(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 347, in __init__
    self.initfp(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 298, in initfp
    chunk = Chunk(file)
  File "C:\Users\Asterios\Anaconda2\lib\chunk.py", line 63, in __init__
    raise EOFError
EOFError
Any ideas?
EDIT: I want to run the script on the free-plan of pythonanywhere.com, so I'm not sure how I can install tools like ffmpeg there.
EDIT 2: If you run the above script, substituting the url with this one "http://www.wavsource.com/snds_2017-01-08_2348563217987237/people/men/about_time.wav" and changing 'mp4' to 'wav', then it works fine. So it is definitely something to do with the file format.
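The traceback is consistent with that: sr.AudioFile accepts WAV, AIFF and FLAC data, and the error is raised while it tries to parse the file as AIFF. A small check of the file's first bytes can confirm the container before handing it to the recognizer. This is a minimal sketch; the function names are made up for this example and it only recognizes the few signatures listed.

```python
def sniff_audio_container(header):
    """Guess the container format from the first 12 bytes of a file."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    if header[4:8] == b"ftyp":  # MP4/M4A files start with an 'ftyp' box
        return "mp4"
    return "unknown"


def sniff_file(path):
    """Read the first 12 bytes of a file and guess its container format."""
    with open(path, "rb") as handle:
        return sniff_audio_container(handle.read(12))
```

If sniff_file("test.mp4") reports "mp4", the file needs to be converted (for example to WAV) before sr.AudioFile can read it.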
I finally found a solution. I'm posting it here in case it helps someone in the future.
Fortunately, pythonanywhere.com comes with avconv pre-installed (avconv is similar to ffmpeg).
So here is some code that works:
import urllib2
import speech_recognition as sr
import subprocess
import os
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
mp4file = urllib2.urlopen(url)
with open("test.mp4", "wb") as handle:
    handle.write(mp4file.read())
cmdline = ['avconv',
'-i',
'test.mp4',
'-vn',
'-f',
'wav',
'test.wav']
subprocess.call(cmdline)
r = sr.Recognizer()
with sr.AudioFile('test.wav') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print command
os.remove("test.mp4")
os.remove("test.wav")
In the free plan, cdn.fbsbx.com was not on the white list of sites on pythonanywhere so I could not download the content with urllib2. I contacted them and they added the domain to the white list within 1-2 hours!
So a huge thanks and congrats to them for the excellent service even though I'm using the free tier.
Use Python Video Converter
https://github.com/senko/python-video-converter
import requests
import speech_recognition as sr
from converter import Converter
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
r = requests.get(url)
c = Converter()
with open("/tmp/test.mp4", "wb") as handle:
    for data in r.iter_content():
        handle.write(data)
conv = c.convert('/tmp/test.mp4', '/tmp/test.wav', {
    'format': 'wav',
    'audio': {
        'codec': 'pcm',
        'samplerate': 44100,
        'channels': 2
    },
})
for timecode in conv:
    pass
r = sr.Recognizer()
with sr.AudioFile('/tmp/test.wav') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print command
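Since the question asks for speed: r.iter_content() defaults to a chunk size of one byte, so the download loops above write the file one byte at a time. Below is a minimal Python 3 sketch that streams the response in larger chunks. The helper names and the 8192-byte chunk size are choices made for this example.

```python
import io


def save_stream(chunks, handle):
    """Write an iterable of byte chunks to an open binary file; return bytes written."""
    total = 0
    for chunk in chunks:
        if chunk:  # skip empty chunks
            handle.write(chunk)
            total += len(chunk)
    return total


def download(url, filename, chunk_size=8192):
    """Stream a URL to disk in chunk_size pieces instead of byte by byte."""
    import requests  # third-party package

    response = requests.get(url, stream=True)
    response.raise_for_status()  # stop early on HTTP errors
    with open(filename, "wb") as handle:
        return save_stream(response.iter_content(chunk_size=chunk_size), handle)


# Small self-check with an in-memory file instead of a real download
buffer = io.BytesIO()
written = save_stream([b"ab", b"", b"cd"], buffer)
```

download(url, "/tmp/test.mp4") could then replace the requests.get call and the write loop in the snippet above.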