Transcribe an Audio File in Python - python

I'm trying to transcribe an audio file which is bit large. It's properties are as follows.
Size : 278.3 MB
Duration : 52 minutes
Format : WAV
Follwoing is my code which I used to convert it having 60 second durations. Could you please advice to transcribe this file at once?
import speech_recognition as sr
r = sr.Recognizer()
with sr.AudioFile('sampleMp3.WAV') as source:
audio = r.record(source, duration=60)
command = r.recognize_google(audio)
text_file = open("Output.txt", "w")
text_file.write(command)
text_file.close()

speech_recognition python package is just a wrapper, it does not provide even basic functions.
If you want to use Google Speech API (paid), you can do something like this:
https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/speech/cloud-client/transcribe_async.py
If you want to consider Bing, it also provides similar API, see How can I transcribe a speech file with the Bing Speech API in Python?
For the free alternative consider https://github.com/alumae/kaldi-offline-transcriber

Instead Of Transcirbing Using Python, Use Nuance Dragon Instead.
https://www.nuance.com/en-nz/dragon/dragon-anywhere/free-trial.html
The Best Transcribing Software.

Related

How to get words timestamps using speech_recognition in Python?

I want to determine, when exactly a speech in the audio file starts and ends. Firstly, I am using speech_recognition library to determine speech content of the audio file:
import speech_recognition as sr
filename = './dir_name/1.wav'
r = sr.Recognizer()
with sr.AudioFile(filename) as source:
audio_data = r.record(source)
text = r.recognize_google(audio_data, language = "en-US")
print(text)
Running this code I get a correct output (speech recognized content of the audio file). How can I determine, where exactly the speech starts and ends? Let's assume the file contains ONLY speech. I tried different methods, e. g. by analysing amplitude of the signal, its envelope etc. but I do not get high efficiency. So I started using speech_recognition library with a hope it could be useful here.

Python: Get system audio in speech recognition instead of microphone

I am working on speech recognition in python, but it is only getting the input from Micropohone. How is it possible to give the audio from speakers as input to the speech recognition library?
The piece of code is given below:
import speech_recognition as sr
with sr.Microphone() as source: # using microphone here, would like to use speaker instead
print("Talk Something...")
audio=r.listen(source)
print("Time Over...")
import time
try:
t1=time.time()
print("Text: "+r.recognize_google(audio)) # prints after converting speech to text
t2=time.time()
print("T2-T1: ", t2-t1)
except:
print("Didn't understand the audio")
I have been struggling here for so long and any help will be much appreciated. Thanks!
You can configure device index as in docs:
import speech_recognition as sr
for index, name in enumerate(sr.Microphone.list_microphone_names()):
print("Microphone with name \"{1}\" found for `Microphone(device_index={0})`".format(index, name))
If LINEIN is not available as a separate input, you might just configure it as a recording source in audio properties.

'Audio data must be audio data' error with google speech recognition in python

I am trying to load an audio file in python and process it with google speech recognition
The problem is that unlike in C++, python doesn't show data types, classes, or give you access to memory to convert between one data type and another by creating a new object and repacking data
I dont understand how it's possible to convert from one data type to another in python
The code in question is below,
import speech_recognition as spr
import librosa
audio, sr = librosa.load('sample_data/metal.mp3')
# create a speech recognition object
r = spr.Recognizer()
r.recognize_google(audio)
The error is:
audio_data must be audio data
How do I convert the audio object to be used in google speech recognition
#Mich, I hope you have found a solution by now. If not, please try the below.
First, convert the .mp3 format to .wav format using other methods as a pre-process step.
import speech_recognition as sr
# Create an instance of the Recognizer class
recognizer = sr.Recognizer()
# Create audio file instance from the original file
audio_ex = sr.AudioFile('sample_data/metal.wav')
type(audio_ex)
# Create audio data
with audio_ex as source:
audiodata = recognizer.record(audio_ex)
type(audiodata)
# Extract text
text = recognizer.recognize_google(audio_data=audiodata, language='en-US')
print(text)
You can select the speech language from https://cloud.google.com/speech-to-text/docs/languages
Additionally you can set the minimum threshold for the loudness of the audio using below command.
recognizer.set_threshold = 300 # min threshold set to 300
Librosa returns numpy array, you need to convert it back to wav. Something like this:
raw_audio = np.int16(audio/np.max(np.abs(audio)) * 32767).tobytes()
You probably better load mp3 with ffmpeg wrapper without librosa things, librosa does strange things with the audio (normalizes, etc). Its better to work with raw data.
Try this with speech recognizer:
import speech_recognition as spr
with spr.WavFile('sample_data/metal.mp3') as source:
audio = r.record(source)
r = spr.Recognizer()
r.recognize_google(audio)

Google Cloud Speech-to-Text API - Waiting infinitely

I`m trying to use Google Cloud Speech-to-Text API.
I converted mp3 audio file format to .raw as I understood from API documentation, and uploaded to bucket storage.
Here is my code:
def transcribe_gcs(gcs_uri):
"""Asynchronously transcribes the audio file specified by the gcs_uri."""
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
client = speech.SpeechClient()
audio = types.RecognitionAudio(uri=gcs_uri)
config = types.RecognitionConfig(
encoding=enums.RecognitionConfig.AudioEncoding.FLAC,
sample_rate_hertz=16000,
language_code='en-US')
operation = client.long_running_recognize(config, audio)
print('Waiting for operation to complete...')
response = operation.result()
# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
for result in response.results:
# The first alternative is the most likely one for this portion.
print(u'Transcript: {}'.format(result.alternatives[0].transcript))
print('Confidence: {}'.format(result.alternatives[0].confidence))
transcribe_gcs("gs://cloudh3-200314.appspot.com/cs.raw")
What I`m doing wrong?
I faced a similar issue, this is something to do with the format that is acceptable. Even though you may have converted into RAW, there still could be something wrong with the format, it wouldn't give you output if it can't read the file.
I recently processed a 56 min audio that took 17 mins so that should give you an idea of how long it should be.
Process your file using sox, I found the conversion parameters that work using the command -
sox basefile.mp3 -r 16000 -c 1 newfile.flac

Audio file to text file python

I want to convert an audio(ex: ".mp3") file to text file. I have tried different approaches like pyspeech and speech recognition, But i didn't get any answer. Is there any other way to do this..? Any help would be appreciated !
Did you try https://pypi.python.org/pypi/SpeechRecognition/ ? That sounds like exactly what you want.
I also found the CMU Sphinx project via this blog. It has Python bindings too (as mentioned in the article).
The other item I found was Google's Speech to Text API. You might want to check that out too. Here's a decent tutorial on this subject:
http://codeabitwiser.com/2014/09/python-google-speech-api/
import speech_recognition as sr
print(sr.__version__)
r = sr.Recognizer()
file_audio = sr.AudioFile('file_audio.wav')
with file_audio as source:
audio_text = r.record(source)
print(type(audio_text))
print(r.recognize_google(audio_text))
way 1: convet audio file to bytes (0,1) with https://github.com/jiaaro/pydub or by f = open("test.mp3", "rb") first16bytes = f.read(16)
way 2: audio to speech convertors.eg.-convert to english or other language with pip libraries like SpeechRecognition pydub. (but i think you don't asked for this)
way 3: convert mp3 to Json. If anyone did this, then please share.

Categories