Convert mp4 sound to text in Python

I want to convert a sound recording from Facebook Messenger to text.
Here is an example of an .mp4 file sent using Facebook's API:
https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833
So this file includes only audio (not video) and I want to convert it to text.
Moreover, I want to do it as fast as possible, since I'll use the generated text in an almost real-time application (i.e. the user sends the .mp4 file, the script transcribes it to text and shows it back).
I've found this example https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py
and here is the code I use:
import requests
import speech_recognition as sr
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
r = requests.get(url)
with open("test.mp4", "wb") as handle:
    for data in r.iter_content():
        handle.write(data)
r = sr.Recognizer()
with sr.AudioFile('test.mp4') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print(command)
But I'm getting this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Asterios\Anaconda2\lib\site-packages\speech_recognition\__init__.py", line 200, in __enter__
    self.audio_reader = aifc.open(aiff_file, "rb")
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 952, in open
    return Aifc_read(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 347, in __init__
    self.initfp(f)
  File "C:\Users\Asterios\Anaconda2\lib\aifc.py", line 298, in initfp
    chunk = Chunk(file)
  File "C:\Users\Asterios\Anaconda2\lib\chunk.py", line 63, in __init__
    raise EOFError
EOFError
Any ideas?
EDIT: I want to run the script on the free plan of pythonanywhere.com, so I'm not sure how I can install tools like ffmpeg there.
EDIT 2: If you run the above script with the URL replaced by "http://www.wavsource.com/snds_2017-01-08_2348563217987237/people/men/about_time.wav" and 'mp4' changed to 'wav', then it works fine. So it is definitely something to do with the file format.
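EDIT 2 fits what SpeechRecognition documents for `sr.AudioFile`: it reads PCM WAV, AIFF/AIFF-C and native FLAC, and an .mp4 audio clip is none of these. A cheap way to confirm the container before handing a download to `AudioFile` is to look at the first few bytes of the file. The helper below is a hypothetical sketch (the name `sniff_audio_container` is not part of any library); it only identifies the container, not the codec inside it.

```python
def sniff_audio_container(header):
    """Hypothetical helper: guess the container from a file's first 12 bytes."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    # MP4/M4A files start with a box whose type "ftyp" sits at byte offset 4
    if header[4:8] == b"ftyp":
        return "mp4"  # sr.AudioFile cannot read this directly
    return "unknown"
```

For example, `with open("test.mp4", "rb") as f: print(sniff_audio_container(f.read(12)))` would be expected to report "mp4" for the Messenger clip, meaning it has to be converted before `sr.AudioFile` can open it.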

I finally found a solution. I'm posting it here in case it helps someone in the future.
Fortunately, pythonanywhere.com comes with avconv pre-installed (avconv is similar to ffmpeg).
So here is some code that works:
import urllib2
import speech_recognition as sr
import subprocess
import os
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
mp4file = urllib2.urlopen(url)
with open("test.mp4", "wb") as handle:
    handle.write(mp4file.read())
# Convert the audio track to WAV with avconv (-y: overwrite without prompting, -vn: drop video)
cmdline = ['avconv', '-y', '-i', 'test.mp4', '-vn', '-f', 'wav', 'test.wav']
subprocess.check_call(cmdline)  # raises CalledProcessError if avconv fails
r = sr.Recognizer()
with sr.AudioFile('test.wav') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print(command)
os.remove("test.mp4")
os.remove("test.wav")
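On Python 3 the same idea could look roughly like the sketch below. It assumes `ffmpeg` (or `avconv`, passed as `tool`) is on the PATH; the helper names, the use of `tempfile`, and the 16 kHz mono output are illustrative choices, not something the answer above requires.

```python
import os
import subprocess
import tempfile


def build_convert_command(src, dst, tool="ffmpeg", sample_rate=16000):
    """Command line that decodes src into a 16-bit mono PCM WAV file at dst."""
    return [
        tool,
        "-y",                    # overwrite dst without asking
        "-i", src,
        "-vn",                   # ignore any video stream
        "-acodec", "pcm_s16le",  # plain 16-bit PCM, which sr.AudioFile can read
        "-ac", "1",              # mono (assumed to be enough for speech)
        "-ar", str(sample_rate),
        "-f", "wav",
        dst,
    ]


def mp4_bytes_to_wav(mp4_bytes, tool="ffmpeg"):
    """Save downloaded bytes to a temp file, convert them, and return the WAV path.

    The caller is responsible for deleting the returned WAV file.
    """
    fd, src = tempfile.mkstemp(suffix=".mp4")
    with os.fdopen(fd, "wb") as handle:
        handle.write(mp4_bytes)
    dst = src[:-len(".mp4")] + ".wav"
    try:
        subprocess.check_call(build_convert_command(src, dst, tool))
    finally:
        os.remove(src)  # the intermediate .mp4 is no longer needed
    return dst
```

With `requests`, `wav_path = mp4_bytes_to_wav(requests.get(url).content)` would give a path to pass to `sr.AudioFile(...)`; remove it with `os.remove(wav_path)` once `r.record(source)` has run. Using unique temp files also avoids clashes if two users send clips at the same time.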
On the free plan, cdn.fbsbx.com was not on PythonAnywhere's whitelist of sites, so I could not download the content with urllib2. I contacted them and they added the domain to the whitelist within 1-2 hours!
So a huge thanks and congrats to them for the excellent service even though I'm using the free tier.

Use Python Video Converter
https://github.com/senko/python-video-converter
import requests
import speech_recognition as sr
from converter import Converter
url = 'https://cdn.fbsbx.com/v/t59.3654-21/15720510_10211855778255994_5430581267814940672_n.mp4/audioclip-1484407992000-3392.mp4?oh=a78286aa96c9dea29e5d07854194801c&oe=587C3833'
response = requests.get(url)
c = Converter()
with open("/tmp/test.mp4", "wb") as handle:
    for data in response.iter_content():
        handle.write(data)
conv = c.convert('/tmp/test.mp4', '/tmp/test.wav', {
    'format': 'wav',
    'audio': {
        'codec': 'pcm',
        'samplerate': 44100,
        'channels': 2
    },
})
# convert() returns a generator of progress timecodes; iterating it runs the conversion
for timecode in conv:
    pass
r = sr.Recognizer()
with sr.AudioFile('/tmp/test.wav') as source:
    audio = r.record(source)
command = r.recognize_google(audio)
print(command)

Related

How to solve "Audio file is corrupted or in another format" when I can listen to the file and the file is in the right format?

I'm working on a speech-to-text assignment. I have an example working with an example audio file, but when I try my own audio file I receive this error:
Traceback (most recent call last):
  File "<ipython-input-27-43c56c192b14>", line 1, in <module>
    with input_audio as source:
  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\speech_recognition\__init__.py", line 236, in __enter__
    raise ValueError("Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format")
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC; check if file is corrupted or in another format
My question: what can I do to analyze my audio file, since it looks like it is in the right format?
I searched and found another question on Stack Overflow whose author suggested that the type of WAV file was probably wrong. However, when I check the type of my audio, it looks right:
import fleep
with open("my_own_audio.wav", "rb") as file:
    info = fleep.get(file.read(128))
print(info.extension)
# Output: ['wav']
My code so far (it is the same as in the "Ultimate Guide To Speech Recognition" tutorial):
import os
import speech_recognition as sr
os.chdir(r'C:\Desktop\Speech_to_Text')
r = sr.Recognizer()
input_audio = sr.AudioFile('harvard.wav') # The example works!
input_audio = sr.AudioFile('my_own_audio.wav') # Will throw the error!
type(input_audio) # For both, it will print Out[29]: speech_recognition.AudioFile
# This chunk will throw the error!
with input_audio as source:
    # If the data has a lot of noise.
    r.adjust_for_ambient_noise(source)
    audio = r.record(source)
r.recognize_google(audio, show_all=True)
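A `wav` result from fleep only says the file is a RIFF/WAVE container; it does not say how the samples inside are encoded. For WAV input, SpeechRecognition relies on Python's `wave` module, which (as far as I know) only reads uncompressed integer PCM, i.e. format tag 1 in the file's `fmt ` chunk; recent Python versions also accept the "extensible" variant with PCM data. Files exported as 32-bit float, A-law, mu-law and so on can therefore trigger this error despite the `.wav` extension. The sketch below, with the hypothetical name `wav_format_tag`, reads that tag with only `struct`:

```python
import struct

# Common WAVE format tags; only PCM (1) is readable by Python's wave module on most versions
FORMAT_NAMES = {1: "PCM", 3: "IEEE float", 6: "A-law", 7: "mu-law", 0xFFFE: "extensible"}


def wav_format_tag(data):
    """Return (format_tag, channels, sample_rate, bits_per_sample) from WAV file bytes."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    # Walk the RIFF chunks until the "fmt " chunk is found
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        if chunk_id == b"fmt ":
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", data[pos + 8:pos + 24])
            return tag, channels, rate, bits
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    raise ValueError("no fmt chunk found")
```

Calling it on the first few kilobytes of `my_own_audio.wav` and looking the tag up in `FORMAT_NAMES` should show whether the file is PCM. If it is not, re-exporting it as 16-bit PCM (for example `ffmpeg -i my_own_audio.wav -acodec pcm_s16le fixed.wav`) is one way around the error.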

How to change a wave.open file to a normal file object without saving the file

I am trying to use the Python speech_recognition library to get text from a set of frames in a WAV file received from a client.
I have tried to use speech_recognition on a wave object, but this doesn't work; it only seems to work with files (or the path to a file).
I tried:
import speech_recognition as sr
import wave
r = sr.Recognizer()
# code to get frames
waveFile = wave.open(file, 'wb')
waveFile.setnchannels(1)
waveFile.setsampwidth(2)
waveFile.setframerate(44100)
waveFile.writeframes(frames) # from client
f = sr.AudioFile(waveFile)
with f as source:
    audio_file = r.record(source)
text = r.recognize_google(audio_data=audio_file, language="en")
print(text)
Then I get the error:
AssertionError: Given audio file must be a filename string or a file-like object
So I am wondering if there is a way to convert a wave object to a normal file-like object.
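The `waveFile` object above is a write-mode `Wave_write`, not something `sr.AudioFile` can read a complete WAV file from. One way around this, sketched below under the assumption that `frames` holds raw 16-bit mono PCM at 44100 Hz as in the question, is to write the WAV into an in-memory `io.BytesIO` buffer and pass that buffer instead, since the error message says a file-like object is accepted. `frames_to_wav_file` is a hypothetical helper name.

```python
import io
import wave


def frames_to_wav_file(frames, channels=1, sampwidth=2, framerate=44100):
    """Hypothetical helper: wrap raw PCM frames in an in-memory WAV file."""
    buffer = io.BytesIO()
    # wave.open accepts a file-like object; closing the writer finalizes the header
    # but leaves the BytesIO itself open
    with wave.open(buffer, "wb") as wav_out:
        wav_out.setnchannels(channels)
        wav_out.setsampwidth(sampwidth)
        wav_out.setframerate(framerate)
        wav_out.writeframes(frames)
    buffer.seek(0)  # rewind so the reader starts at the RIFF header
    return buffer
```

Then `with sr.AudioFile(frames_to_wav_file(frames)) as source: audio_file = r.record(source)` should behave as if the frames had been saved to disk first, without touching the file system.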

Google speech-to-text Python example code doesn't work

The following is my code (I made some slight changes to the original example code):
import io
import os
# Imports the Google Cloud client library (older google-cloud-speech releases that still ship enums/types)
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
# Instantiates a client
client = speech.SpeechClient()
# The name of the audio file to transcribe
file_name = os.path.join('C:\\Users\\louie\\Desktop', 'TOEFL2.mp3')
# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
    content = audio_file.read()
    audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
    encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code='en-US')
# Detects speech in the audio file
response = client.recognize(config, audio)
# Open the output file once, so each result is kept instead of overwriting the previous one
with open("C:\\Users\\louie\\Desktop\\Output.txt", "w") as text_file:
    for result in response.results:
        print('Transcript: {}'.format(result.alternatives[0].transcript))
        text_file.write('Transcript: {}\n'.format(result.alternatives[0].transcript))
I can only run this code directly from the Windows command prompt, since otherwise the system doesn't know GOOGLE_APPLICATION_CREDENTIALS. However, when I run the code, nothing happens. I followed all the steps, and I can see the request traffic change in my console, but I cannot see any transcript. Could someone help me out?
You are trying to decode TOEFL2.mp3, which is encoded as MP3, while you specify LINEAR16 audio encoding with
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16
You have to convert the MP3 to WAV first; see the documentation for AudioEncoding.
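After converting, the other `RecognitionConfig` values should describe the WAV file actually produced; in particular, if `sample_rate_hertz` is given it should match the file's real rate rather than a hard-coded 16000. A small sketch with the standard `wave` module (the helper name `linear16_params` is hypothetical) reads those values from the file:

```python
import wave


def linear16_params(path_or_file):
    """Hypothetical helper: read RecognitionConfig values from a 16-bit PCM WAV file."""
    with wave.open(path_or_file, "rb") as wav_in:
        if wav_in.getsampwidth() != 2:
            raise ValueError("LINEAR16 expects 16-bit samples")
        return {
            "sample_rate_hertz": wav_in.getframerate(),
            "audio_channel_count": wav_in.getnchannels(),
        }
```

With the older client from the question this could be used as `types.RecognitionConfig(encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16, language_code='en-US', **linear16_params('TOEFL2.wav'))`, where `TOEFL2.wav` is the converted file.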

demo how to Transcribe an audio file using the SpeechRecognition

I recently tried to learn how to transcribe an audio file, but I am not very familiar with Python.
I have read the SpeechRecognition example at the following link:
https://github.com/Uberi/speech_recognition/blob/master/examples/audio_transcribe.py
I tried to use it with the following code:
import speech_recognition as sr
from os import path
# obtain path to "english.wav" in the same folder as this script
AUDIO_FILE = path.join(path.dirname(path.realpath(__file__)), "english.wav")
# use the audio file as the audio source
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file
print("Google Speech Recognition thinks you said " + r.recognize_google(audio))
However, it looks like I cannot load my file on my Windows computer. I have a WAV file at the path
"C:\Users\Chen\Downloads\english.wav"
and I tried replacing the file with "C:\Users\Chen\Downloads" in my Python code.
But it shows me:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Chen\english.wav'
Please help me fix the problem.
You can also use the listen() function to recognize the text:
r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.listen(source)  # records one phrase: stops at the first long enough pause, not necessarily at the end of the file
text = r.recognize_google(audio)
print("Google Speech Recognition thinks you said " + text)
# The code below is for an audio file in Hindi
file = "hindi.wav"
with sr.AudioFile(file) as source:
    audio = r.listen(source)
text = r.recognize_google(audio, language='hi-IN')
print("Text : " + text)
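The FileNotFoundError in the question comes from the path rather than from SpeechRecognition: `path.dirname(path.realpath(__file__))` is the folder of the script (presumably `C:\Users\Chen`), so joining it with "english.wav" never points into Downloads. Passing the full path directly avoids this. Note that in a normal Python 3 string "C:\Users..." does not even compile, because "\U" starts a unicode escape, so a raw string is needed. The sketch below uses `pathlib`; `resolve_audio_path` and `AUDIO_FILE_EXAMPLE` are hypothetical names.

```python
from pathlib import Path


def resolve_audio_path(raw_path):
    """Hypothetical helper: absolute Path to an existing file, with a clearer error if missing."""
    path = Path(raw_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError("audio file not found: {}".format(path))
    return path


# A raw string keeps the Windows backslashes literal
AUDIO_FILE_EXAMPLE = r"C:\Users\Chen\Downloads\english.wav"
```

Since `sr.AudioFile` expects a filename string or a file-like object, convert the result with `str()`: `with sr.AudioFile(str(resolve_audio_path(AUDIO_FILE_EXAMPLE))) as source:`.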

Playing live audio from a website using PyAudio

I want to build a small application capable of playing audio streams from internet radio services.
I already found this code, which saves the stream as an MP3 file, but I'd like to play the sound immediately.
import requests
stream_url = "http://uk5.internet-radio.com:8097/;stream"
r = requests.get(stream_url, stream=True)
# The stream carries MP3 data, so save it with an .mp3 extension
with open('stream.mp3', 'wb') as f:
    try:
        for block in r.iter_content(1024):
            f.write(block)
    except KeyboardInterrupt:
        pass
This code doesn't work: it produces no usable sound, and a lot of underflow errors occur:
import array
import pyaudio
import requests
stream_url = 'http://149.56.147.197:8064/;stream/1'
r = requests.get(stream_url, stream=True)
pya = pyaudio.PyAudio()
stream = pya.open(format=pyaudio.paInt16, frames_per_buffer=2048, channels=2, rate=44100, output=True)
for block in r.iter_content(2048):
    data = array.array("h", block)
    stream.write(data, exception_on_underflow=True)
stream.stop_stream()
stream.close()
The error message:
Traceback (most recent call last):
  File "/Users/bahe007/Desktop/pythonRadio.py", line 15, in <module>
    stream.write(data,exception_on_underflow=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pyaudio.py", line 586, in write
    exception_on_underflow)
IOError: [Errno -9980] Output underflowed
Does anybody have ideas how I could achieve my live-streaming feature for internet radio in Python?
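The second snippet writes the raw HTTP bytes straight to PyAudio, but the first snippet shows that this station sends compressed MP3 data, while a `paInt16` output stream expects raw PCM samples, so the bytes have to be decoded first. (Also, `exception_on_underflow=True` makes PyAudio raise on underflow; its default is `False`.) One possible approach, assuming `ffmpeg` is installed and on the PATH, is to let ffmpeg fetch and decode the stream to 16-bit PCM on stdout and feed that to PyAudio. The function names and the 44100 Hz stereo output format below are illustrative assumptions.

```python
import subprocess

RATE, CHANNELS, SAMPLE_BYTES = 44100, 2, 2
FRAME_BYTES = CHANNELS * SAMPLE_BYTES  # bytes per stereo 16-bit frame


def ffmpeg_decode_command(url):
    """ffmpeg reads the stream and writes raw 16-bit little-endian PCM to stdout."""
    return ["ffmpeg", "-loglevel", "error", "-i", url,
            "-f", "s16le", "-acodec", "pcm_s16le",
            "-ac", str(CHANNELS), "-ar", str(RATE), "pipe:1"]


def frame_aligned(chunks, frame_bytes=FRAME_BYTES):
    """Re-slice byte chunks so every yielded piece holds whole frames; a trailing partial frame is dropped."""
    pending = b""
    for chunk in chunks:
        pending += chunk
        usable = len(pending) - len(pending) % frame_bytes
        if usable:
            yield pending[:usable]
            pending = pending[usable:]


def play(url, chunk_bytes=4096):
    import pyaudio  # imported here so the helpers above work without PyAudio installed
    proc = subprocess.Popen(ffmpeg_decode_command(url), stdout=subprocess.PIPE)
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=CHANNELS, rate=RATE, output=True)
    try:
        # read() returns b"" once ffmpeg exits, which ends the iterator
        chunks = iter(lambda: proc.stdout.read(chunk_bytes), b"")
        for block in frame_aligned(chunks):
            stream.write(block)
    except KeyboardInterrupt:
        pass
    finally:
        stream.stop_stream()
        stream.close()
        pa.terminate()
        proc.kill()
```

Calling `play("http://uk5.internet-radio.com:8097/;stream")` should then start playback until Ctrl+C. Letting ffmpeg handle both the HTTP request and the MP3 decoding keeps the Python side down to copying PCM bytes into the output stream.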
