conversion of mp4 file into text using python in windows - python

I want to convert an MP4 file on my system into text using Python. Converting a WAV file to text is easy, but MP4 conversion has many issues, especially with ffmpeg, I think. My code always fails with "No such file or directory".
import speech_recognition as sr
import os
import pyaudio
command2mp3 = 'ffmpeg -i nanavi.mp4 nanavi.mp3'
command2wav = 'ffmpeg -i nanavi.mp3 nanavi.wav'
os.system(command2mp3)
os.system(command2wav)
r = sr.Recognizer()
with sr.AudioFile("nanavi.wav") as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source, duration=10)
print(r.recognize_google(audio))
Error:
FileNotFoundError: [Errno 2] No such file or directory: 'nanavi.wav'
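One possible cause, which the question does not confirm, is that the ffmpeg step failed without any visible error: os.system does not raise when the command fails, so nanavi.wav is never written and sr.AudioFile is the first call to complain. Below is a minimal sketch, assuming ffmpeg is installed and on PATH. The helper names build_ffmpeg_command, mp4_to_wav and transcribe_mp4 are hypothetical, and mono 16 kHz is just a common choice for speech input. Note also that Recognizer.record(), not listen(), is the method that takes a duration argument.

```python
import os
import shutil
import subprocess


def build_ffmpeg_command(src, dst, channels=1, rate=16000):
    """Return the ffmpeg argument list that extracts PCM WAV audio from src."""
    return [
        "ffmpeg", "-y",           # overwrite dst if it already exists
        "-i", src,                # input file (any container ffmpeg can read)
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM WAV
        "-ac", str(channels),
        "-ar", str(rate),
        dst,
    ]


def mp4_to_wav(src, dst):
    """Convert src to WAV and fail loudly instead of silently."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    if not os.path.isfile(src):
        raise FileNotFoundError("input not found: " + os.path.abspath(src))
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst


def transcribe_mp4(src, seconds=10):
    """Convert src to WAV, then transcribe the first `seconds` seconds."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    wav_path = mp4_to_wav(src, os.path.splitext(src)[0] + ".wav")
    r = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = r.record(source, duration=seconds)
    return r.recognize_google(audio)
```

Calling print(transcribe_mp4("nanavi.mp4")) from the folder that contains the video would then either print the text or stop with an error that names the step that failed.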

Related

How to solve "Audio file is corrupted or in another format" while I can listen the file, and the file is in the right format?

I am working on a speech-to-text assignment. I have it working with an example audio file, but when I try my own audio file I receive this error:
Traceback (most recent call last):
  File "<ipython-input-27-43c56c192b14>", line 1, in <module>
    with input_audio as source:
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\speech_recognition\__init__.py", line 236, in __enter__
    raise ValueError("Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format")
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
My question: what can I do to analyze my audio file? It looks like it is in the right format.
I googled and found another question on Stack Overflow. The author mentioned that the type of WAV file is probably wrong. However, when I check the type of my audio, it looks right:
import fleep
with open("my_own_audio.wav", "rb") as file:
    info = fleep.get(file.read(128))
print(info.extension)
['wav']
My code so far (it is the same as the Ultimate Guide To Speech Recognition)
import os
import speech_recognition as sr
os.chdir(r'C:\Desktop\Speech_to_Text')
r = sr.Recognizer()
input_audio = sr.AudioFile('harvard.wav') # The example works!
input_audio = sr.AudioFile('my_own_audio.wav') # Will throw the error!
type(input_audio) # For both, it will print Out[29]: speech_recognition.AudioFile
# This chunk will throw the error!
with input_audio as source:
    # If the data has a lot of noise.
    r.adjust_for_ambient_noise(source)
    audio = r.record(source)
r.recognize_google(audio, show_all = True)
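A ".wav" extension only identifies the RIFF/WAVE container, while the error message asks specifically for PCM WAV, so the file may hold a different encoding inside a valid container. The sketch below reads the format tag from the "fmt " chunk using only the standard library. The function name wav_format_tag and the FORMAT_NAMES table are hypothetical helpers, and the table lists only a few well-known tags.

```python
import struct

# A few well-known WAVE format tags; 1 is plain PCM.
FORMAT_NAMES = {1: "PCM", 3: "IEEE float", 6: "A-law", 7: "mu-law", 0xFFFE: "extensible"}


def wav_format_tag(path):
    """Return the format tag from the 'fmt ' chunk, or None if not RIFF/WAVE."""
    with open(path, "rb") as f:
        header = f.read(12)
        if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
            return None
        # Walk the chunks until 'fmt ' is found.
        while True:
            chunk = f.read(8)
            if len(chunk) < 8:
                return None
            chunk_id, size = struct.unpack("<4sI", chunk)
            if chunk_id == b"fmt ":
                tag_bytes = f.read(2)
                if len(tag_bytes) < 2:
                    return None
                return struct.unpack("<H", tag_bytes)[0]
            f.seek(size + (size & 1), 1)  # chunks are padded to an even size
```

If wav_format_tag("my_own_audio.wav") returns something other than 1, re-encoding the file to 16-bit PCM (for example with ffmpeg -i my_own_audio.wav -acodec pcm_s16le fixed.wav, where fixed.wav is a placeholder name) is a reasonable next thing to try.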

Extract WAV audio file from video after uploading on web, using FFMPEG in django

Problem
I am trying to extract a WAV audio file from an MP4 video that a web user uploads, using ffmpeg from Django.
Once I manage to extract the audio, where should I save it in my project?
I tried "Django-ffmpeg", but it did not convert the file and stayed stuck at the 'pending conversion' message.
Then I tried with:
import subprocess
subprocess.call('ffmpeg -i filename.mp4 filename.wav')
Error
Script
def validate_file_extension(value):
    import os
    from django.core.exceptions import ValidationError
    ext = os.path.splitext(value.name)[1]  # [0] returns path+filename
    filename = os.path.splitext(value.name)[0]  # path+filename without the extension
    valid_extensions = ['.mp4']
    if not ext.lower() in valid_extensions:
        raise ValidationError(u'Unsupported file extension.')
    else:
        import subprocess
        subprocess.call('ffmpeg -i filename.mp4 filename.wav')
Solution:
- You can extract the WAV audio file in Django this way:
import os
import subprocess
import tempfile
from django.conf import settings
def extract_audio(videofile, channels=1, rate=16000):
    # Full path of the uploaded video under MEDIA_ROOT.
    inFile = os.path.join(settings.MEDIA_ROOT, videofile.name)
    temp = tempfile.NamedTemporaryFile(suffix='.wav', delete=False)
    temp.close()  # ffmpeg writes to this path; -y lets it overwrite the empty file
    command = ["ffmpeg", "-y", "-i", inFile,
               "-ac", str(channels), "-ar", str(rate),
               "-loglevel", "error", temp.name]
    subprocess.check_output(command)
    print(temp.name)
    return temp.name
- Second, use the tempfile library ("import tempfile") to store the extracted file temporarily.

how can i convert a text file to mp3 file using python pyttsx3 and sapi5?

Here is my python code..
import pyttsx3
engine = pyttsx3.init(driverName='sapi5')
infile = "tanjil.txt"
f = open(infile, 'r')
theText = f.read()
f.close()
engine.say(theText)
engine.runAndWait()
I couldn't save the speech to an audio file.
As of July 14 2019, I'm able to save to file with the pyttsx3 library (without using another library or internet connection).
It doesn't appear to be documented, but looking at the source code in github for the Engine class in "engine.py" (https://github.com/nateshmbhat/pyttsx3/blob/master/pyttsx3/engine.py), I was able to find a "save_to_file" function:
def save_to_file(self, text, filename, name=None):
    '''
    Adds an utterance to speak to the event queue.
    @param text: Text to sepak
    @type text: unicode
    @param filename: the name of file to save.
    @param name: Name to associate with this utterance. Included in
        notifications about this utterance.
    @type name: str
    '''
    self.proxy.save_to_file(text, filename, name)
I am able to use this like:
engine.save_to_file('the text I want to save as audio', path_to_save)
I'm not sure of the format: it's some raw audio format (I guess something like AIFF), but I can play it in an audio player.
If you install pydub:
https://pypi.org/project/pydub/
then you can easily convert this to mp3, e.g.:
from pydub import AudioSegment
AudioSegment.from_file(path_to_save).export('converted.mp3', format="mp3")
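To find out which container save_to_file actually wrote, rather than guessing, the first bytes of the file can be compared against a few well-known signatures. This is a minimal sketch using only the standard library. The function name sniff_audio_format is hypothetical, and the MPEG frame-sync check is a heuristic, since other MPEG audio streams start the same way.

```python
def sniff_audio_format(path):
    """Guess the container of an audio file from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(12)
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    if head[:4] == b"FORM" and head[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if head[:4] == b"fLaC":
        return "flac"
    if head[:3] == b"ID3":
        return "mp3"      # MP3 with an ID3v2 tag in front
    if len(head) >= 2 and head[0] == 0xFF and (head[1] & 0xE0) == 0xE0:
        return "mp3"      # MPEG audio frame sync; usually MP3
    return "unknown"
```

For example, print(sniff_audio_format(path_to_save)) shows whether the file is WAV, AIFF or something else, regardless of the extension it was saved with.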
I've tried @Brian's solution but it didn't work for me.
I searched around a bit and couldn't figure out how to save the speech to mp3 with pyttsx3, but I found another solution without pyttsx3.
It can take a .txt file and directly output a .wav file:
def txt_zu_wav(eingabe, ausgabe, text_aus_datei = True, geschwindigkeit = 2, Stimmenname = "Zira"):
    from comtypes.client import CreateObject
    engine = CreateObject("SAPI.SpVoice")
    engine.rate = geschwindigkeit # from -10 to 10
    for stimme in engine.GetVoices():
        if stimme.GetDescription().find(Stimmenname) >= 0:
            engine.Voice = stimme
            break
    else:
        print("Error: voice not found -> using the default")
    if text_aus_datei:
        datei = open(eingabe, 'r')
        text = datei.read()
        datei.close()
    else:
        text = eingabe
    stream = CreateObject("SAPI.SpFileStream")
    from comtypes.gen import SpeechLib
    stream.Open(ausgabe, SpeechLib.SSFMCreateForWrite)
    engine.AudioOutputStream = stream
    engine.speak(text)
    stream.Close()
txt_zu_wav("test.txt", "test_1.wav")
txt_zu_wav("It also works with a string instead of a file path", "test_2.wav", False)
This was tested with Python 3.7.4 on Windows 10.
import pyttsx3
engine = pyttsx3.init("sapi5")
voice = engine.getProperty("voices")[0]
engine.setProperty('voice', voice.id) # setProperty expects the voice id, not the Voice object
text = 'Your Text'
engine.save_to_file(text, 'name.mp3')
engine.runAndWait() # don't forget to use this line
Try the following code snippet to convert text to audio and save it as an mp3 file.
import pyttsx3
from pydub import AudioSegment
engine = pyttsx3.init('sapi5')
engine.save_to_file('This is a test phrase.', 'test.mp3') # raw audio file
engine.runAndWait()
AudioSegment.from_file('test.mp3').export('test.mp3', format="mp3") # audio file in mp3 format
NB: the pyttsx3 save_to_file() method creates a raw audio file that other applications may not be able to use, even if a media player can play it. pydub is a useful package for converting raw audio into a specific format.

demo how to Transcribe an audio file using the SpeechRecognition

I recently tried to learn how to transcribe an audio file, but I am not very familiar with Python.
I have read the SpeechRecognition example at the following website:
https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py
I tried to use it with the code below.
However, it looks like I cannot load my file on my Windows computer.
I have a WAV file on my computer at the path
"C:\Users\Chen\Downloads\english.wav"
and I tried to replace the file location with "C:\Users\Chen\Downloads" in my Python code.
But it shows me
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Chen\english.wav'
Please help me fix the problem.
import speech_recognition as sr
# obtain path to "english.wav" in the same folder as this script
from os import path
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source) # read the entire audio file
print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
Use the listen() function if you need to recognize text:
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.listen(source) # reads one phrase, stopping at a pause
text = r.recognize_google(audio)
print("Google Speech Recognition thinks you said " + text)
# Below code is for audio file in hindi
file = "hindi.wav"
with sr.AudioFile(file) as source:
    audio = r.listen(source)
text = r.recognize_google(audio, language='hi-IN')
print("Text : " + text)
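The FileNotFoundError in the question shows that the example script joins its own folder with the file name, so the Downloads folder is never used. One way around this is to build the path from the folder that really holds the file and to check it before handing it to sr.AudioFile. This is a minimal sketch; resolve_audio_path is a hypothetical helper name, and the folder is the one quoted in the question.

```python
from pathlib import Path


def resolve_audio_path(folder, filename="english.wav"):
    """Join folder and filename and confirm the file exists before use."""
    audio_path = Path(folder) / filename
    if not audio_path.is_file():
        raise FileNotFoundError("no such audio file: " + str(audio_path))
    return str(audio_path)  # a plain string path works with sr.AudioFile


# A raw string (r"...") keeps the backslashes literal on Windows:
# AUDIO_FILE = resolve_audio_path(r"C:\Users\Chen\Downloads")
```

The raw-string prefix matters here: in Python 3 a normal string literal containing \U (as in "C:\Users") is rejected as an invalid Unicode escape.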

Convert mp4 sound to text in python

I want to convert a sound recording from Facebook Messenger to text.
Here is an example of an .mp4 file sent using Facebook's API:
https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833
So this file includes only audio (not video) and I want to convert it to text.
Moreover, I want to do it as fast as possible since I'll use the generated text in an almost real-time application (i.e. user sends the .mp4 file, the script translates it to text and shows it back).
I've found this example https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py
and here is the code I use:
import requests
import speech_recognition as sr
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
r = requests.get(url)
with open("test.mp4", "wb") as handle:
    for data in r.iter_content():
        handle.write(data)
r = sr.Recognizer()
with sr.AudioFile('test.mp4') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print(command)
But I'm getting this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Asterios\Anaconda2\lib\site-packages\speech_recognition\__init__.py", line 200, in __enter__
    self.audio_reader = aifc.open(aiff_file, "rb")
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 952, in open
    return Aifc_read(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 347, in __init__
    self.initfp(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 298, in initfp
    chunk = Chunk(file)
  File "C:\Users\Asterios\Anaconda2\lib\chunk.py", line 63, in __init__
    raise EOFError
EOFError
Any ideas?
EDIT: I want to run the script on the free-plan of pythonanywhere.com, so I'm not sure how I can install tools like ffmpeg there.
EDIT 2: If you run the above script, substituting the url with this one "http://www.wavsource.com/snds_2017-01-08_2348563217987237/people/men/about_time.wav" and changing 'mp4' to 'wav', then it works fine. So it is for sure something with the file format.
Finally I found a solution. I'm posting it here in case it helps someone in the future.
Fortunately, pythonanywhere.com comes with avconv pre-installed (avconv is similar to ffmpeg).
So here is some code that works:
import urllib2 # Python 2 only; on Python 3 use urllib.request instead
import speech_recognition as sr
import subprocess
import os
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
mp4file = urllib2.urlopen(url)
with open("test.mp4", "wb") as handle:
    handle.write(mp4file.read())
cmdline = ['avconv',
'-i',
'test.mp4',
'-vn',
'-f',
'wav',
'test.wav']
subprocess.call(cmdline)
r = sr.Recognizer()
with sr.AudioFile('test.wav') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print(command)
os.remove("test.mp4")
os.remove("test.wav")
In the free plan, cdn.fbsbx.com was not on the white list of sites on pythonanywhere so I could not download the content with urllib2. I contacted them and they added the domain to the white list within 1-2 hours!
So a huge thanks and congrats to them for the excellent service even though I'm using the free tier.
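Since avconv and ffmpeg both accept the -i, -vn and -f options used above, a script can pick whichever one is installed instead of hard-coding the name. This is a minimal Python 3 sketch (shutil.which does not exist on Python 2); pick_converter and wav_command are hypothetical helper names.

```python
import shutil


def pick_converter(candidates=("ffmpeg", "avconv")):
    """Return the first converter executable found on PATH, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def wav_command(tool, src, dst):
    """Build the same -vn / -f wav command line that the answer uses."""
    return [tool, "-i", src, "-vn", "-f", "wav", dst]
```

The result of wav_command(pick_converter(), 'test.mp4', 'test.wav') can be passed to subprocess.call in place of the hard-coded cmdline, after checking that pick_converter() did not return None.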
Use Python Video Converter
https://github.com/senko/python-video-converter
import requests
import speech_recognition as sr
from converter import Converter
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
r = requests.get(url)
c = Converter()
with open("/tmp/test.mp4", "wb") as handle:
    for data in r.iter_content():
        handle.write(data)
conv = c.convert('/tmp/test.mp4', '/tmp/test.wav', {
    'format': 'wav',
    'audio': {
        'codec': 'pcm',
        'samplerate': 44100,
        'channels': 2
    },
})
for timecode in conv:
    pass
r = sr.Recognizer()
with sr.AudioFile('/tmp/test.wav') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print(command)
