I'm currently working on a project where I request a phone call (Mp3) and have to make an automatic transcript through a python script.
I'm using the Azure Speech to text services and got that all working, but that service only supports a Wav. file and I am still stuck at that part.
import azure.cognitiveservices.speech as speechsdk
import time
from os import path
from pydub import AudioSegment
import requests
import hashlib
OID = ***
string = f"***"
encoded = string.encode()
result = hashlib.sha256(encoded)
resultHash = (result.hexdigest())
r = requests.get(f"***", headers={f"***":f"{***}"})
Telefoongesprek = r
# converts audio file (mp3 to Wav.)
#src = Telefoongesprek
#dst = "Telefoongesprek #****.wav"
#sound = AudioSegment.from_mp3(src)
#sound.export(dst, format="wav")
def speech_recognize_continuous_from_file():
speech_config = speechsdk.SpeechConfig(subscription="***", region="***")
speech_config.speech_recognition_language = "nl-NL"
audio_config = speechsdk.audio.AudioConfig(filename="Telefoongesprek #****.wav")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
done = False
def stop_cb(evt):
print('CLOSING on {}'.format(evt))
nonlocal done
done = True
all_results = []
def handle_final_result(evt):
all_results.append(evt.result.text)
#speech_recognizer.recognizing.connect(handle_final_result)
speech_recognizer.recognized.connect(handle_final_result)
speech_recognizer.session_started.connect(handle_final_result)
speech_recognizer.session_stopped.connect(handle_final_result)
speech_recognizer.canceled.connect(handle_final_result)
speech_recognizer.session_stopped.connect(stop_cb)
speech_recognizer.canceled.connect(stop_cb)
speech_recognizer.start_continuous_recognition()
while not done:
time.sleep(.5)
speech_recognizer.stop_continuous_recognition()
print(all_results)
speech_recognize_continuous_from_file()
Thats the code im using without all the keys and encryption, and everthing works apart from the convert from MP3 to Wav.
is there any way I can save the requested file locally in this script and pass it through in:
audio_config = speechsdk.audio.AudioConfig(filename="Telefoongesprek #****.wav"). or do I have to save it to the pc and do it another way.
I have been stuck on this problem for over a week and have tried many different ways.
Thanks in advance!
Beau van der Meer
You should be able to save the response data ( you can access the raw bytes with r.content) to a .mp3 file locally and then pass that file path to pydub.
with open('path/to/local/file.mp3', 'wb') as f:
f.write(r.content)
Another option is to use the module io.BytesIO from the standard library.
If you pass it raw bytes, e g import io; f = io.BytesIO(r.content), it will give you a object that behaves like an open filehandle back, which you can pass to functions accepting files. I didn't check that pydub method you are trying to use accepts filehandles or only paths, so you have to check that first.
Related
I'm trying to implement google's speech api but every time I try to run the program, the terminal goes unresponsive. It seems that the program runs until the line "response = client.recognize(config, audio)" and just gets stuck at that point. Here's a picture of my code, I pulled most of it straight from google's cloud platform documentation.
def transcribe_file(speech_file):
import os
import io
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="api-key.json"
"""Transcribe the given audio file."""
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
import io
client = speech.SpeechClient()
with io.open(speech_file, 'rb') as audio_file:
content = audio_file.read()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
encoding='FLAC',
sample_rate_hertz=16000,
language_code='en-US')
print(config)
response = client.recognize(config, audio)
# Each result is for a consecutive portion of the audio. Iterate through
# them to get the transcripts for the entire audio file.
for result in response.results:
# The first alternative is the most likely one for this portion.
print(u'Transcript: {}'.format(result.alternatives[0].transcript))
transcribe_file(audio/file/name.wav)
I had the same problem when building my app with PyInstaller. Make sure you have
the certifi folder with the file cacert.pem in it. Those should come with the library when you run pip install google-cloud-speech. If not reinstall it.
I was facing the same issue, tried changing
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="api-key.json"
to
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]= r"full/path/api-key.json"
on window.
Here is my python code..
import pyttsx3;
engine = pyttsx3.init(driverName='sapi5')
infile = "tanjil.txt"
f = open(infile, 'r')
theText = f.read()
f.close()
engine.say(theText)
engine.runAndWait()
I couldn't save the file to audio file
As of July 14 2019, I'm able to save to file with the pyttsx3 library (without using another library or internet connection).
It doesn't appear to be documented, but looking at the source code in github for the Engine class in "engine.py" (https://github.com/nateshmbhat/pyttsx3/blob/master/pyttsx3/engine.py), I was able to find a "save_to_file" function:
def save_to_file(self, text, filename, name=None):
'''
Adds an utterance to speak to the event queue.
#param text: Text to sepak
#type text: unicode
#param filename: the name of file to save.
#param name: Name to associate with this utterance. Included in
notifications about this utterance.
#type name: str
'''
self.proxy.save_to_file(text, filename, name)
I am able to use this like:
engine.save_to_file('the text I want to save as audio', path_to_save)
Not sure the format - it's some raw audio format (I guess it's maybe something like aiff) - but I can play it in an audio player.
If you install pydub:
https://pypi.org/project/pydub/
then you can easily convert this to mp3, e.g.:
from pydub import AudioSegment
AudioSegment.from_file(path_to_save).export('converted.mp3', format="mp3")
I've tried #Brian's solution but it didn't work for me.
I searched around a bit and I couldn't figure out how to save the speech to mp3 in pyttx3 but I found another solution without pyttx3.
It can take a .txt file and directly output a .wav file,
def txt_zu_wav(eingabe, ausgabe, text_aus_datei = True, geschwindigkeit = 2, Stimmenname = "Zira"):
from comtypes.client import CreateObject
engine = CreateObject("SAPI.SpVoice")
engine.rate = geschwindigkeit # von -10 bis 10
for stimme in engine.GetVoices():
if stimme.GetDescription().find(Stimmenname) >= 0:
engine.Voice = stimme
break
else:
print("Fehler Stimme nicht gefunden -> Standard wird benutzt")
if text_aus_datei:
datei = open(eingabe, 'r')
text = datei.read()
datei.close()
else:
text = eingabe
stream = CreateObject("SAPI.SpFileStream")
from comtypes.gen import SpeechLib
stream.Open(ausgabe, SpeechLib.SSFMCreateForWrite)
engine.AudioOutputStream = stream
engine.speak(text)
stream.Close()
txt_zu_wav("test.txt", "test_1.wav")
txt_zu_wav("It also works with a string instead of a file path", "test_2.wav", False)
This was tested with Python 3.7.4 on Windows 10.
import pyttsx3
engine = pyttsx3.init("sapi5")
voices = engine.getProperty("voices")[0]
engine.setProperty('voice', voices)
text = 'Your Text'
engine.save_to_file(text, 'name.mp3')
engine.runAndWait() # don't forget to use this line
Try the following code snippet to convert text to audio and save it as an mp3 file.
import pyttsx3
from pydub import AudioSegment
engine = pyttsx3.init('sapi5')
engine.save_to_file('This is a test phrase.', 'test.mp3') # raw audio file
engine.runAndWait()
AudioSegment.from_file('test.mp3').export('test.mp3', format="mp3") # audio file in mp3 format
NB: pyttsx3 save_to_file() method creates a raw audio file and it won't be useful for other applications to use even if we are able to play it in the media player. pydub is a useful package to convert raw audio into a specific format.
The following is my code (I made some slight changes to the original example code):
import io
import os
# Imports the Google Cloud client library
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
# Instantiates a client
client = speech.SpeechClient()
# The name of the audio file to transcribe
file_name = os.path.join(
os.path.dirname(__file__),
'C:\\Users\\louie\\Desktop',
'TOEFL2.mp3')
# Loads the audio into memory
with io.open(file_name, 'rb') as audio_file:
content = audio_file.read()
audio = types.RecognitionAudio(content=content)
config = types.RecognitionConfig(
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=16000,
language_code='en-US')
# Detects speech in the audio file
response = client.recognize(config, audio)
for result in response.results:
print('Transcript: {}'.format(result.alternatives[0].transcript))
text_file = open("C:\\Users\\louie\\Desktop\\Output.txt", "w")
text_file.write('Transcript: {}'.format(result.alternatives[0].transcript))
text_file.close()
I can only directly run this code in my windows prompt command since otherwise, the system cannot know the GOOGLE_APPLICATION_CREDENTIALS. However, when I run the code, nothing happened. I followed all the steps and I could see the request traffic changed on my console. But I cannot see any transcript. Could someone help me out?
You are trying to decode TOEFL2.mp3 file encoded as MP3 while you specify LINEAR audio encoding with
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16
You have to convert mp3 to wav first, see information about AudioEncoding
I'm using Google Speech API and since i'm using LongRunning functions for wav files, and they're all in pt-BR language, they're returning with content such as "voc\303\252 hoje boa noite cart\303\243o".
How can I convert this back to UTF-8?
I already tried .encode function, and already tried to check if there's any parameter to send, but I cannot find anything.
# [START def_transcribe_gcs]
def transcribe_gcs(gcs_uri):
"""Asynchronously transcribes the audio file specified by the gcs_uri."""
from google.cloud import speech
from google.cloud.speech import enums
from google.cloud.speech import types
client = speech.SpeechClient()
audio = types.RecognitionAudio(uri=gcs_uri)
config = types.RecognitionConfig(
encoding=enums.RecognitionConfig.AudioEncoding.LINEAR16,
sample_rate_hertz=8000,
language_code='pt-BR')
operation = client.long_running_recognize(config, audio)
print('Waiting for operation to complete...')
response = operation.result(timeout=300)
# Print the first alternative of all the consecutive results.
for result in response.results:
print('Transcript: {}'.format(result.alternatives[0].transcript))
print('Confidence: {}'.format(result.alternatives[0].confidence))
##This part is mine, the rest of the code belongs to Google
file = open("Test.txt", "wb")
file.write(str(response.results))
file.close()
# [END def_transcribe_gcs]
I'm trying to write a python script that will play mp3 from Soundcloud URL
This is what I've already done:
from urllib.request import urlopen
url = "soundcloud.com/artist/song.mp3"
u = urlopen(url)
data = u.read(1024)
while data:
player.play(data)
data = u.read(1024)
I tried pyaudio with many options like changing formats, channels, rate.
and I just get strange sound from the speakers, I searched Google for pyaudio playing mp3 and didn't found any information.
I tried pygame by creating Sound object by passing the bytes from the mp3 and then just by executing the play function. I am not getting any errors: the script runs but nothing is playing.
I'm working with Python 3 and Ubuntu.
If you happen to have VLC installed (or are willing to install it), then this should work:
import vlc
p = vlc.MediaPlayer("http://your_mp3_url")
p.play()
This has the advantage that it works with everything VLC works with, not just MP3. It can also be paused if you want to.
You can install vlc for python using
pip install python-vlc
Check if you can download file manually using that URL. If its protected site with username/passwd, you may need to take care of that first.
If not, here is a working code that downloads file from url using urllib2 and then plays it using pydub.
Its a two step process where first mp3 file is downloaded and saved to file and then played using external player.
import urllib2
from pydub import AudioSegment
from pydub.playback import play
mp3file = urllib2.urlopen("http://www.bensound.org/bensound-music/bensound-dubstep.mp3")
with open('./test.mp3','wb') as output:
output.write(mp3file.read())
song = AudioSegment.from_mp3("./test.mp3")
play(song)
** Update **
You did mention that you need streaming from web. In that case you may want to look at GStreamer with Python Bindings
Here is a SO link for that.
Sorry but I do not have Python3 to test here, to stream mp3 using pyaudio you will need decode it in PCM data, I know that pymedia can do it, but it is too old and just support python27.
To do this the right way you will need to know some attributes of your audio, things like samplerate, number of channels, bit resolution, to set it in the pyaudio.
I can show how I do it using python27 + pyaudio, first I will show how it is done to stream .wav
from urllib2 import urlopen
#to python3.x
#from urllib.request import urlopen
import pyaudio
pyaud = pyaudio.PyAudio()
srate=44100
stream = pyaud.open(format = pyaud.get_format_from_width(1),
channels = 1,
rate = srate,
output = True)
url = "http://download.wavetlan.com/SVV/Media/HTTP/WAV/NeroSoundTrax/NeroSoundTrax_test4_PCM_Mono_VBR_8SS_44100Hz.wav"
u = urlopen(url)
data = u.read(8192)
while data:
stream.write(data)
data = u.read(8192)
choose large buffer, python is slow in while loop, i did it using chunks of size 8192, note that format, channels and rate are the rigth attributes for this wav file, so for .wav we not need decode, it is a PCM data, now for mp3 we will need decode and put in PCM format to stream.
Lets try using pymedia
from urllib2 import urlopen
import pyaudio
import pymedia.audio.acodec as acodec
import pymedia.muxer as muxer
dm= muxer.Demuxer( 'mp3' )
pyaud = pyaudio.PyAudio()
srate=44100
stream = pyaud.open(format = pyaud.get_format_from_width(2),
channels = 1,
rate = srate,
output = True)
url = "http://www.bensound.org/bensound-music/bensound-dubstep.mp3"
u = urlopen(url)
data = u.read(8192)
while data:
#Start Decode using pymedia
dec= None
s= " "
sinal=[]
while len( s ):
s= data
if len( s ):
frames= dm.parse( s )
for fr in frames:
if dec== None:
# Open decoder
dec= acodec.Decoder( dm.streams[ 0 ] )
r= dec.decode( fr[ 1 ] )
if r and r.data:
din = r.data;
s=""
#decode ended
stream.write(din)
data = u.read(8192)
This may be secret, because I never saw anyone showing how this can be done in python, for python3 I not know anything that can decode .mp3 into pieces like pymedia do.
Here these two codes are streming and working for .wav and .mp3