Audio file to text file python - python

I want to convert an audio file (e.g. ".mp3") to a text file. I have tried different approaches, such as pyspeech and speech recognition, but I didn't get any result. Is there any other way to do this? Any help would be appreciated!

Did you try https://pypi.python.org/pypi/SpeechRecognition/ ? That sounds like exactly what you want.
I also found the CMU Sphinx project via this blog. It has Python bindings too (as mentioned in the article).
The other item I found was Google's Speech to Text API. You might want to check that out too. Here's a decent tutorial on this subject:
http://codeabitwiser.com/2014/09/python-google-speech-api/

import speech_recognition as sr
print(sr.__version__)
r = sr.Recognizer()
file_audio = sr.AudioFile('file_audio.wav')
with file_audio as source:
    audio_text = r.record(source)
print(type(audio_text))
print(r.recognize_google(audio_text))
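Since the question asks for a text file rather than printed output, here is a minimal sketch of that last step. The function names transcribe_wav and save_transcript are made up for illustration; the sketch assumes the input is already a WAV file and that the SpeechRecognition package is installed.

```python
def transcribe_wav(wav_path, language="en-US"):
    """Return the recognized text for a WAV file, or "" if nothing was understood."""
    # Imported here so save_transcript below works even without the package.
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service could not make out any speech.
        return ""
    except sr.RequestError as exc:
        # Network problem or the service rejected the request.
        raise RuntimeError(f"speech service request failed: {exc}") from exc


def save_transcript(text, txt_path):
    """Write the transcript to a UTF-8 text file and return its path."""
    with open(txt_path, "w", encoding="utf-8") as fh:
        fh.write(text + "\n")
    return txt_path


# Usage (needs a network connection):
# save_transcript(transcribe_wav("file_audio.wav"), "file_audio.txt")
```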

Way 1: read the audio file as raw bytes, either with https://github.com/jiaaro/pydub or directly: f = open("test.mp3", "rb"); first16bytes = f.read(16)
Way 2: use a speech-to-text converter (for English or another language) built on pip-installable libraries such as SpeechRecognition and pydub (though I don't think this is what you asked for).
Way 3: convert the MP3 to JSON. If anyone has done this, please share.

Related

How to get words timestamps using speech_recognition in Python?

I want to determine exactly when the speech in an audio file starts and ends. First, I use the speech_recognition library to get the speech content of the audio file:
import speech_recognition as sr
filename = './dir_name/1.wav'
r = sr.Recognizer()
with sr.AudioFile(filename) as source:
    audio_data = r.record(source)
text = r.recognize_google(audio_data, language="en-US")
print(text)
Running this code gives the correct output (the recognized speech content of the audio file). How can I determine exactly where the speech starts and ends? Assume the file contains ONLY speech. I tried different methods, e.g. analysing the amplitude of the signal, its envelope, etc., but the results were not accurate enough. So I started using the speech_recognition library in the hope that it could be useful here.
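For reference, the amplitude-based approach mentioned above can be sketched with the standard library alone. This is only a rough baseline: it assumes a 16-bit PCM WAV file, and the function name speech_bounds, the 20 ms window and the threshold of 500 are arbitrary choices for illustration that would need tuning for real recordings. It does not give per-word timestamps.

```python
import math
import struct
import wave


def speech_bounds(path, frame_ms=20, threshold=500):
    """Return (start_s, end_s) of the span whose window RMS reaches threshold, or None."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wf.getframerate()
        total_s = wf.getnframes() / rate
        frames_per_window = max(1, rate * frame_ms // 1000)
        first = last = None
        index = 0
        while True:
            raw = wf.readframes(frames_per_window)
            if not raw:
                break
            # Interleaved samples of all channels are treated as one stream.
            samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
            rms = math.sqrt(sum(s * s for s in samples) / len(samples))
            if rms >= threshold:
                if first is None:
                    first = index
                last = index
            index += 1
    if first is None:
        return None  # no window was loud enough
    window_s = frames_per_window / rate
    return first * window_s, min((last + 1) * window_s, total_s)
```

The resolution is one window (20 ms here), and any background noise above the threshold counts as speech, which is probably why this kind of method struggles on noisy files.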

reading .opus audio files in python

I'm trying to use librosa to read an .opus file but it runs forever and doesn't load anything (I've waited for around 30 minutes for a 51MB file and still nothing).
Here is the code I am using
import librosa
path_to_opus = '/my/path/to/file.opus'
y, sr = librosa.load(path_to_opus, sr=16000)
Is there a good way of reading .opus audio files in python fast?
Thanks!
According to the librosa documentation, you can specify a res_type parameter, which looks useful in your case.
This is a quote from the docs:
res_type : str
By default, this uses resampy's high-quality mode ('kaiser_best').
To use a faster method, set res_type='kaiser_fast'.
To use scipy.signal.resample, set res_type='scipy'.
You could try something like:
X, sr = librosa.load('myfile.opus', res_type='kaiser_fast', ...)

'Audio data must be audio data' error with google speech recognition in python

I am trying to load an audio file in Python and process it with Google speech recognition.
The problem is that, unlike C++, Python doesn't show data types or classes, or give you access to memory so that you can convert one data type to another by creating a new object and repacking the data.
I don't understand how to convert from one data type to another in Python.
The code in question is below,
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3')
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio)
The error is:
audio_data must be audio data
How do I convert the audio object so that it can be used with Google speech recognition?
@Mich, I hope you have found a solution by now. If not, please try the code below.
First, convert the .mp3 file to .wav format with another tool as a pre-processing step.
import speech_recognition as sr
# Create an instance of the Recognizer class
recognizer = sr.Recognizer()
# Create an audio file instance from the converted file
audio_ex = sr.AudioFile('sample_data/metal.wav')
print(type(audio_ex))
# Read the file contents into an AudioData object
with audio_ex as source:
    audiodata = recognizer.record(source)
print(type(audiodata))
# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')
print(text)
You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages
Additionally, you can set the minimum loudness threshold for the audio with the attribute below.
recognizer.energy_threshold = 300 # min threshold set to 300
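The pre-processing step (MP3 to WAV) is not shown above. One possible way, assuming the ffmpeg command-line tool is installed and on PATH, is to call it through subprocess. The helper names ffmpeg_to_wav_command and convert_to_wav are made up for this sketch, and the 16 kHz mono output is an arbitrary choice.

```python
import shutil
import subprocess


def ffmpeg_to_wav_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that decodes src to mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", src,                # input file (MP3, Opus, ...)
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src, dst, sample_rate=16000):
    """Run ffmpeg and return dst; raises if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(ffmpeg_to_wav_command(src, dst, sample_rate), check=True)
    return dst


# Usage:
# convert_to_wav("sample_data/metal.mp3", "sample_data/metal.wav")
```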
Librosa returns a NumPy array of floating-point samples, so you need to convert it back to 16-bit PCM bytes, as in a WAV file. Something like this:
raw_audio = np.int16(audio/np.max(np.abs(audio)) * 32767).tobytes()
It is probably better to load the MP3 with an ffmpeg wrapper instead of librosa, because librosa changes the audio on load (by default it resamples to 22050 Hz and mixes down to mono floating-point samples). It's better to work with the raw data.
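To make the conversion concrete, here is a hedged sketch that wraps the bytes in an sr.AudioData object, which is what recognize_google expects. The helper names float_to_pcm16 and to_audio_data are made up; the sketch assumes mono float samples such as those librosa.load returns by default, and it keeps the peak normalization of the one-liner above.

```python
import numpy as np


def float_to_pcm16(samples):
    """Peak-normalize float samples and return 16-bit little-endian PCM bytes."""
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples)) if samples.size else 0.0
    if peak > 0:  # avoid dividing by zero on silent input
        samples = samples / peak
    return (samples * 32767).astype("<i2").tobytes()


def to_audio_data(samples, sample_rate):
    """Wrap mono float samples in a speech_recognition AudioData object."""
    import speech_recognition as sr  # third-party, imported only when needed

    # The last argument is the sample width in bytes (2 for 16-bit audio).
    return sr.AudioData(float_to_pcm16(samples), sample_rate, 2)


# Usage, with audio and rate taken from librosa.load(...):
# text = sr.Recognizer().recognize_google(to_audio_data(audio, rate))
```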
Try this with the speech recognizer (AudioFile reads WAV, AIFF and FLAC, so convert the MP3 first):
import speech_recognition as spr
r = spr.Recognizer()
with spr.AudioFile('sample_data/metal.wav') as source:
    audio = r.record(source)
print(r.recognize_google(audio))

Speech recognition for languages other than English in Python

I am trying to convert audio to text. The audio is not in English but in Dutch, and I am unable to convert it to text. The code works only for English audio. I am not sure whether I need to add some function or option so that the code recognises other languages as well. Below is the code:
import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile('audio.wav') as source:
    audio = r.listen(source)
try:
    text = r.recognize_google(audio)
    print(text)
except sr.UnknownValueError:
    print("Could not understand the audio")
Use:
text = r.recognize_google(audio, language="nl-NL")
You can use any supported language; you just need to know its language tag, which combines a language code and a region code. For instance, German would be de-DE.

Transcribe an Audio File in Python

I'm trying to transcribe an audio file which is fairly large. Its properties are as follows.
Size : 278.3 MB
Duration : 52 minutes
Format : WAV
Following is the code I used; it converts only the first 60 seconds. Could you please advise how to transcribe the whole file at once?
import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile('sampleMp3.WAV') as source:
    # Only the first 60 seconds of the file are read
    audio = r.record(source, duration=60)
command = r.recognize_google(audio)
with open("Output.txt", "w") as text_file:
    text_file.write(command)
The speech_recognition Python package is just a wrapper; it does not even provide basic functions itself.
If you want to use the Google Speech API (paid), you can do something like this:
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/cloud-client/transcribe_async.py
If you want to consider Bing, it provides a similar API; see How can I transcribe a speech file with the Bing Speech API in Python?
For a free alternative, consider https://github.com/alumae/kaldi-offline-transcriber
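If you would rather stay with speech_recognition, one workaround is to send the file in 60-second pieces and join the results. The sketch below is only an illustration under stated assumptions: the WAV file is 16-bit mono PCM, the names iter_wav_chunks and transcribe_long_wav are made up, and cutting at fixed boundaries can split a word in half.

```python
import wave


def iter_wav_chunks(path, chunk_seconds=60):
    """Yield (pcm_bytes, sample_rate, sample_width) for consecutive chunks."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch assumes a 16-bit mono WAV file")
        rate, width = wf.getframerate(), wf.getsampwidth()
        frames_per_chunk = int(rate * chunk_seconds)
        while True:
            pcm = wf.readframes(frames_per_chunk)
            if not pcm:
                break  # end of file
            yield pcm, rate, width


def transcribe_long_wav(path, chunk_seconds=60, language="en-US"):
    """Transcribe each chunk separately and join the pieces with spaces."""
    import speech_recognition as sr  # third-party, imported only when needed

    recognizer = sr.Recognizer()
    pieces = []
    for pcm, rate, width in iter_wav_chunks(path, chunk_seconds):
        audio = sr.AudioData(pcm, rate, width)
        try:
            pieces.append(recognizer.recognize_google(audio, language=language))
        except sr.UnknownValueError:
            continue  # nothing intelligible in this chunk
    return " ".join(pieces)


# Usage (one request per chunk, so 52 minutes means about 52 requests):
# text = transcribe_long_wav("sampleMp3.WAV")
```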
Instead of transcribing with Python, use Nuance Dragon.
https://www.nuance.com/en-nz/dragon/dragon-anywhere/free-trial.html
It is the best transcription software.
