pyttsx3 prints the current word being uttered - python

I basically want the TTS engine to speak while printing out what it is saying.
I've pretty much copied and pasted the example from the pyttsx3 documentation to do this, but it just would not work.
import pyttsx3
def onStart(name):
    print('starting', name)
def onWord(name, location, length):
    print('word', name, location, length)
def onEnd(name, completed):
    print('finishing', name, completed)
engine = pyttsx3.init()
engine.connect('started-utterance', onStart)
engine.connect('started-word', onWord)
engine.connect('finished-utterance', onEnd)
engine.say('The quick brown fox jumped over the lazy dog.')
engine.runAndWait()
And the result is this: the word event only fires after the speaking is complete, and none of the words are actually printed.
starting None
word None 1 0
finishing None True
I've been working on this for days. I've tried other libraries like win32com.client.Dispatch('SAPI.Spvoice') and gTTS, but none seems to be able to do what I want. SAPI.Spvoice seems to have an event that would do what I want, but I can't seem to get that to work either, though I'm not sure I'm doing it correctly. https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms723593(v=vs.85)
from win32com.client import Dispatch
import win32com.client
class ContextEvents():
    def onWord():
        print("the word event occured")
        # Work with Result
s = Dispatch('SAPI.Spvoice')
e = win32com.client.WithEvents(s, ContextEvents)
s.Speak('The quick brown fox jumped over the lazy dog.')
From what I understood, there needs to be a class for the events, and each event handler must be named in the form On(event) in that class, or something like that.
I tried installing espeak, but that did not work out either.
Keep in mind I'm kind of a newbie in Python, so if anyone would be willing to give a thorough explanation, that would be really great.

So I'm not familiar with that library, but most likely what's happening is the stream is getting generated and played before the events are able to be passed off to the wrapper library. I can say that AWS's Polly will output word-level timing information if you want to use that - you'd need two calls - one to get the audio stream and the other to get the ssml metadata.
The Windows .net System.Speech.Synthesis library does have progress events that you could listen for, but I don't know if there's a python library to wrap that.
However, if you're willing to run a powershell command from python then you can try using this gist I wrote, which wraps the Windows synthesis functionality and outputs the word timings. Here's an example that should get you what you want:
$text = "hello world! this is a long sentence with many words";
$sampleRate = 24000;
# generate tts and save bytes to memory (powershell variable)
# events holds event timings
# NOTE: assumes out-ssml-winrt.ps1 is in current directory, change as needed...
$events = .\out-ssml-winrt.ps1 $text -Variable 'soundstream' -SampleRate $sampleRate -Channels 1 -SpeechMarkTypes 'words';
# estimate duration based on samplerate (rough)
$estimatedDurationMilliseconds = $global:soundstream.Length / $sampleRate * 1000;
$global:e = $events;
# add a final event at the end of the loop to wait for audio to complete
$events += @([pscustomobject]@{ type = 'end'; time = $estimatedDurationMilliseconds; value = '' });
# create background player
$memstream = [System.IO.MemoryStream]::new($global:soundstream);
$player = [System.Media.SoundPlayer]::new($memstream)
$player.Play();
# loop through word events
$now = 0;
$events | % {
    $word = $_;
    # milliseconds into wav file event happens
    $when = $word.time;
    # distance from last timestamp to this event
    $delta = $when - $now;
    # wait until right time to display
    if ($delta -gt 0) {
        Start-sleep -Milliseconds $delta;
    }
    $now = $when;
    # output word
    Write-Output $word.value;
}
# just to let you know - audio should be finished
Write-Output "Playback Complete";
$player.Stop(); $player.Dispose(); $memstream.Dispose();
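If you would rather stay in Python, the wait-then-print loop from the script above is easy to port. This is only a minimal sketch, assuming you already have a list of (time in milliseconds, word) pairs from some synthesizer. The name `play_word_events` and the sample timings are made up for illustration, and starting the audio itself is left out.

```python
import time


def play_word_events(events, emit=print, sleep=time.sleep):
    """Emit each word once its timestamp (ms from the start of the audio) is reached."""
    now_ms = 0
    for when_ms, word in events:
        # distance from the previous timestamp to this event
        delta_ms = when_ms - now_ms
        if delta_ms > 0:
            sleep(delta_ms / 1000.0)
        now_ms = when_ms
        emit(word)


# Hypothetical timings; real ones would come from the synthesizer's word events.
sample_events = [(0, "hello"), (40, "world"), (90, "again")]
play_word_events(sample_events)
```

Like the PowerShell version, this sleeps for the gap between events rather than checking a clock, so small delays can add up over a long sentence.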


is it possible to pass data from one python program to another python program? [duplicate]

Is it possible -- other than by using something like a .txt/dummy file -- to pass a value from one program to another?
I have a program that uses a .txt file to pass a starting value to another program. I update the value in the file in between starting the program each time I run it (ten times, essentially simultaneously). Doing this is fine, but I would like to have the 'child' program report back to the 'mother' program when it is finished, and also report back what files it found to download.
Is it possible to do this without using eleven files to do it (that's one for each instance of the 'child' to 'mother' reporting, and one file for the 'mother' to 'child')? I am talking about completely separate programs, not classes or functions or anything like that.
To operate efficiently, and not wait around for hours for everything to complete, I need the 'child' program to run ten times and get things done MUCH faster. Thus I run the child program ten times and give each instance a separate range to check through.
Both programs run fine, but I would like to get them to report back and forth with each other, hopefully without using file 'transmission' to accomplish the task, especially on the child-to-mother side of the data transfer.
'Mother' program...currently
import os
import sys
import subprocess
import time
os.chdir ('/media/')
#find highest download video
Hival = open("Highest.txt", "r")
Histr = Hival.read()
Hival.close()
HiNext = str(int(Histr)+1)
#setup download #1
NextVal = open("NextVal.txt","w")
NextVal.write(HiNext)
NextVal.close()
#call download #1
procs=[]
proc=subprocess.Popen(['python','test.py'])
procs.append(proc)
time.sleep(2)
#setup download #2-11
Histr2 = int(Histr)/10000
Histr2 = Histr2 + 1
for i in range(10):
    Hiint = str(Histr2)+"0000"
    NextVal = open("NextVal.txt","w")
    NextVal.write(Hiint)
    NextVal.close()
    proc=subprocess.Popen(['python','test.py'])
    procs.append(proc)
    time.sleep(2)
    Histr2 = Histr2 + 1
for proc in procs:
    proc.wait()
'Child' program
import urllib
import os
from Tkinter import *
import time
root = Tk()
root.title("Audiodownloader")
root.geometry("200x200")
app = Frame(root)
app.grid()
os.chdir('/media/')
Fileval = open('NextVal.txt','r')
Fileupdate = Fileval.read()
Fileval.close()
Fileupdate = int(Fileupdate)
Filect = Fileupdate/10000
Filect2 = str(Filect)+"0009"
Filecount = int(Filect2)
while Fileupdate <= Filecount:
    root.title(Fileupdate)
    url = 'http://www.yourfavoritewebsite.com/audio/encoded/'+str(Fileupdate)+'.mp3'
    urllib.urlretrieve(url,str(Fileupdate)+'.mp3')
    statinfo = os.stat(str(Fileupdate)+'.mp3')
    if statinfo.st_size<10000L:
        os.remove(str(Fileupdate)+'.mp3')
    time.sleep(.01)
    Fileupdate = Fileupdate+1
    root.update_idletasks()
I'm trying to convert the original VB6 program over to Linux and make it much easier to use at the same time, hence the missing .mainloop. This was my first real attempt at anything in Python, hence the lack of def or classes. I'm trying to come back and finish this up after 1.5 months of doing nothing with it, mostly because I didn't know how to. While researching a little while ago I found this is WAY over my head. I have never done anything with threads, sockets, or client/server interaction, so I'm purely an idiot in this case. Googling anything on it just brings me right back here to Stack Overflow.
Yes, I want 10 copies of the program running at the same time, to save time. I could do without the GUI if the program could report back to 'mother', so the mother could print the value currently being searched. It would also help if the child could report back when it's finished and whether it downloaded any file successfully (versus downloaded and then erased for being too small). I would use the successful download information to update Highest.txt for the next time the program is run.
I think this may clarify things MUCH better... that, or I don't understand the nature of server/client interaction :) The only reason time.sleep is in the program is to make sure the files get written before the next instance of the child program is run. I didn't know for sure what kind of timing issue I might run into, so I included those lines for safety.
This can be implemented using a simple client/server topology using the multiprocessing library. Using your mother/child terminology:
server.py
from multiprocessing.connection import Listener
# handles one connected client
def child(conn):
    try:
        while True:
            msg = conn.recv()
            # this just echoes the value back, replace with your custom logic
            conn.send(msg)
    except EOFError:
        # the client closed its end of the connection
        conn.close()
# server
def mother(address):
    serv = Listener(address)
    while True:
        client = serv.accept()
        child(client)
mother(('', 5000))
client.py
from multiprocessing.connection import Client
c = Client(('localhost', 5000))
c.send('hello')
print('Got:', c.recv())
c.send({'a': 123})
print('Got:', c.recv())
Run with
$ python server.py
$ python client.py
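To get closer to the "child reports back to mother" part of the question, here is a rough sketch of the same Listener/Client idea with several children each sending one result. The names `collect_reports` and `send_report` and the shape of the report dictionary are assumptions for illustration. Threads stand in for the separate child programs so the example fits in one file; in real use each child would be its own script calling something like `send_report`.

```python
import threading
from multiprocessing.connection import Client, Listener


def collect_reports(listener, expected):
    """Mother side: accept `expected` connections and read one report from each."""
    reports = []
    for _ in range(expected):
        with listener.accept() as conn:
            reports.append(conn.recv())
    return reports


def send_report(address, start, found):
    """Child side: connect to the mother and send one result dictionary."""
    with Client(address) as conn:
        conn.send({"start": start, "found": found})


def run_demo():
    # Port 0 lets the OS pick a free port; listener.address holds the real one.
    with Listener(("localhost", 0), backlog=8) as listener:
        address = listener.address
        children = [
            threading.Thread(target=send_report, args=(address, start, [start + 3]))
            for start in (10000, 20000, 30000)
        ]
        for child in children:
            child.start()
        reports = collect_reports(listener, expected=len(children))
        for child in children:
            child.join()
    return sorted(reports, key=lambda report: report["start"])


reports = run_demo()
print(reports)
```

The mother could use the collected `found` lists to decide what to write to Highest.txt, which would remove the need for one file per child.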
When you talk about using txt to pass information between programs, we first need to know what language you're using.
From what I know of Java and Python, I think it is viable, though laborious, depending on the amount of information you want to work with.
In Python, you can use the built-in file functions for reading and writing txt files, and to schedule execution you can use APScheduler.

Get the duration of a URL-based Media object with python-vlc - Cannot parse

I'm trying to use the python 2.7 python-vlc to parse then get the duration of a music track from a URL. Parsing doesn't work and playing then pausing the media returns -1 for the duration occasionally.
There are two ways I know of to parse media, which has to be done before using media.get_duration(). I can parse it, or I can play it.
No matter what, I cannot parse the media. Using parse_with_options() gives me parsed status MediaParsedStatus.skipped for everything except parse_with_options(1, 0), which gives me parsed status MediaParsedStatus.FIXME_(0L)
p = vlc.MediaPlayer(songurl)
media = p.get_media()
media.parse_with_options(1, 0)
print media.get_parsed_status()
print media.get_duration()
The string "songurl" is the actual streaming URL of a song from Youtube or Google Play Music, which works perfectly fine with the MediaPlayer.
I have also tried playing the media for short 0.01 to 0.5 second periods then attempting to get the time, which works MOST OF THE TIME but randomly returns a duration of -1 about 1 in 10 times. Using media.get_duration() again returns the same result.
I would prefer to just parse the song rather than worry about playing it, but I can't figure out any way to parse it.
I already submitted a bug report to the python-vlc github since I figure MediaParsedStatus.FIXME_(0L) is some sort of bug.
UPDATE: I GOT IT! This was possibly the biggest pain in all my programming career (which isn't much). Here's the code used to get the duration of a URL track:
instance = vlc.Instance()
media = instance.media_new(songurl)
player = instance.media_player_new()
player.set_media(media)
#Start the parser
media.parse_with_options(1,0)
while True:
    if str(media.get_parsed_status()) == 'MediaParsedStatus.done':
        break #Might be a good idea to add a failsafe in here.
print media.get_duration()
media.parse_with_options is asynchronous. So your code isn't waiting for a response from URL, it's just immediately moving on. As with all asynchronous methods, you need to receive a notification that the data has been received and then you can move on. In this case it looks like it is the MediaParsedChanged event.
https://www.videolan.org/developers/vlc/doc/doxygen/html/group__libvlc__media.html#ga55f5a33e22aa32e17a9bb75decd1497b
Alternatively, you should be able to use the parse() method which is synchronous and will block until the meta data is received. This isn't recommended (and it's deprecated) because it could block indefinitely and lock up. But it is an option depending on what you are using the code for.
https://www.videolan.org/developers/vlc/doc/doxygen/html/group__libvlc__media.html#ga4b71084fb35b3dd8cc6457a4d27baf0c
EDIT:
If you need an example of using the event manager with the python bindings, here is a great example:
VLC Python EventManager callback type?
Particularly, look at Rolf's answer as the way he is using it might be a good starting point for you.
import time
import vlc
parseReady = 0
def ParseReceived(event):
    global parseReady
    #set a flag that your data is ready
    parseReady = 1
...
#MediaParsedChanged is a media event, so attach to the media's event manager
events = media.event_manager()
events.event_attach(vlc.EventType.MediaParsedChanged, ParseReceived)
...
parseReady = 0
media.parse_with_options(1, 0)
while parseReady == 0:
    #sleep briefly instead of spinning at full speed
    time.sleep(0.1)
#Once the flag is set, your data is ready
print media.get_parsed_status()
print media.get_duration()
There are undoubtedly better ways to do it, but that's a basic example. Note that, according to the documentation, you cannot call vlc methods from within an event callback, hence the use of a simple flag rather than calling the media methods directly in the callback.
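One way to tidy up the flag and add the failsafe mentioned in the question is a threading.Event with a timeout. This is a sketch that does not touch vlc at all: `ParseWaiter` is a made-up helper, and a timer stands in for the library firing the callback from another thread. With python-vlc you would presumably pass `waiter.on_parsed` to `event_attach` instead.

```python
import threading


class ParseWaiter:
    """Hypothetical helper: a callback sets an Event, the caller waits with a timeout."""

    def __init__(self):
        self._done = threading.Event()

    def on_parsed(self, event=None):
        # Only set a flag here; no library calls inside the callback.
        self._done.set()

    def wait(self, timeout):
        # True if the callback fired, False if the timeout expired first.
        return self._done.wait(timeout)


waiter = ParseWaiter()
# Stand-in for the library invoking the callback after 50 ms.
threading.Timer(0.05, waiter.on_parsed).start()
fired = waiter.wait(timeout=2.0)
print("parsed" if fired else "timed out")
```

Because `wait` returns False on timeout, the caller can give up cleanly instead of looping forever when parsing never completes.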
libvlc will not parse network resources by default. You need to call parse with options with libvlc_media_parse_network.

Python MP3Play with Threading

I am trying to create an MP3Player with python (not using any fancy GUIs, just basic command-line). You are able to input commands like "playlist" which prints all songs or "play [num]" which plays the specified song in your playlist. I can do this all in the one thread, but what I want is to create another thread so you can do more commands like "add song" or "delete song" while the actual music is playing (instead of the command line waiting for the music to finish). Here is what I have with one thread:
import mp3play, time
clip = mp3play.load(song)
clip.play()
time.sleep(clip.seconds()) #Waits the duration of the song
clip.stop()
#Command line asks for input after this
This works all fine and dandy, but when I try to implement threading into this, like this:
import mp3play, time
def play(songname):
    clip = mp3play.load(song)
    clip.play()
    time.sleep(clip.seconds()) #Waits the duration of the song
    clip.stop()
#Get user input here
#Check if user inputted command
#User inputed "play"
thread.start_new_thread(play,("some_random_song.mp3",))
It glitches out. It all seems fine until you close the application half way through the song and the music still keeps running. To stop the music, I have to open Task Manager and end the task. So I thought about having a "stop" command as well, which wouldn't close the thread, but it would use
clip.stop()
I don't know what happens if you try to stop() a clip that isn't running, so I implemented a prevention system (boolean running that checks if it is or not). But now nothing works, so far here is my code:
def play(filename):
    global clip
    clip.play()
    time.sleep(clip.seconds())
    clip.stop()
playing = False
clip = ""
#Get user input
#blah blah blah command stuff
#User inputted "play"
thread.start_new_thread(play,("some_random_song.mp3",))
playing = True
#Goes back to command line
#User inputted 'stop' this time
if playing:
    clip.stop()
    playing = False
But when I try to run this, it gets to clip.play() in the thread but doesn't start it. I'm not sure how I can get around this, or whether it's possible to do this without threading. Thanks in advance.
It would be better to play MP3s in a different process, using multiprocessing.Process.
Write a function that takes the path to an MP3 file, and start that as a Process.
For technical reasons, the dominant Python implementation (CPython, from python.org) restricts threading so that only one thread at a time can execute Python bytecode. Playback from a thread will probably never be glitch free.
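Here is a rough sketch of the separate-process idea with a working "stop" command. Note that it uses the standard subprocess module rather than multiprocessing.Process, simply so the example is self-contained. `PlayerProcess` and the placeholder command are assumptions for illustration: the child just sleeps, where a real player would load and play the MP3.

```python
import subprocess
import sys

# Placeholder for real playback: a child interpreter that just sleeps.
# In a real player this command would load and play the MP3 instead.
PLACEHOLDER_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]


class PlayerProcess:
    """Hypothetical wrapper around one background playback process."""

    def __init__(self):
        self.proc = None

    def is_playing(self):
        # poll() returns None while the child is still running
        return self.proc is not None and self.proc.poll() is None

    def play(self, cmd=PLACEHOLDER_CMD):
        self.stop()  # stop whatever was playing before
        self.proc = subprocess.Popen(cmd)

    def stop(self):
        if self.is_playing():
            self.proc.terminate()
            self.proc.wait()
        self.proc = None


player = PlayerProcess()
player.play()       # returns immediately, so the command loop stays responsive
print(player.is_playing())
player.stop()       # safe to call even if nothing is playing
print(player.is_playing())
```

Since `stop` checks `is_playing` first, there is no need for a separate boolean to guard against stopping a clip that isn't running.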

How can I get (and set) current bash cursor position when using python readline?

I have a Python script that manages the stdin, stdout, and stderr of any application and allows readline to be inserted gracefully. Think of any application that has lots of console output, but also accepts commands from stdin.
In any case, my script uses these two functions:
def blank_current_readline():
    # Next line said to be reasonably portable for various Unixes
    (rows,cols) = struct.unpack('hh', fcntl.ioctl(sys.stdout, termios.TIOCGWINSZ,'1234'))
    text_len = len(readline.get_line_buffer())+2
    # ANSI escape sequences (All VT100 except ESC[0G)
    sys.stdout.write('\x1b[2K') # Clear current line
    sys.stdout.write('\x1b[1A\x1b[2K'*(text_len/cols)) # Move cursor up and clear line
    sys.stdout.write('\x1b[0G') # Move to start of line
def print_line(line):
    global cmd_state
    blank_current_readline()
    print line,
    sys.stdout.write(cmd_state["prompt"] + readline.get_line_buffer())
    sys.stdout.flush()
When handling stdout, I call print_line(). This blanks whatever the user might be typing, prints the line, then restores the user's input text. This all happens without the user noticing a thing.
The problem occurs when the cursor is not at the end of whatever the user is typing. When the cursor is in the middle of the text and a line is printed, the cursor is automatically placed at the end of the input. To solve this, I want to do something like this in print_line:
def print_line(line):
    global cmd_state
    cursorPos = getCurrentCursorPos() #Doesn't exist
    blank_current_readline()
    print line,
    sys.stdout.write(cmd_state["prompt"] + readline.get_line_buffer())
    sys.stdout.setCurrentCursorPos(cursorPos) #Doesn't exist
    sys.stdout.flush()
Edit: To try and visualize what I have written:
The terminal looks like this:
----------------------------------------------
| |
| |
| <scrolling command output here> |
| |
| <scrolling command output here> |
| |
|: <user inputted text here> |
----------------------------------------------
So the output text is constantly scrolling as new logs come through. At the same time, the user is editing and writing a new command that will be submitted once they hit Enter. So it looks like the Python console, but with output always being appended.
Might I suggest Python curses?
Here is the Basic how-to
The curses module provides an interface to the curses library, the de-facto standard for portable advanced terminal handling.
While curses is most widely used in the Unix environment, versions are available for DOS, OS/2, and possibly other systems as well. This extension module is designed to match the API of ncurses, an open-source curses library hosted on Linux and the BSD variants of Unix.
Alternatively
I found Terminal Controller here: Using terminfo for portable color output & cursor control. It looks to be more portable than the sitename would suggest (MacOS mentioned in the comments - though with changes).
Here is a usage example, displaying a progress bar:
import sys

class ProgressBar:
    """
    A 3-line progress bar, which looks like::

                                Header
        20% [===========----------------------------------]
                           progress message

    The progress bar is colored, if the terminal supports color
    output; and adjusts to the width of the terminal.
    """
    BAR = '%3d%% ${GREEN}[${BOLD}%s%s${NORMAL}${GREEN}]${NORMAL}\n'
    HEADER = '${BOLD}${CYAN}%s${NORMAL}\n\n'

    def __init__(self, term, header):
        self.term = term
        if not (self.term.CLEAR_EOL and self.term.UP and self.term.BOL):
            raise ValueError("Terminal isn't capable enough -- you "
                             "should use a simpler progress display.")
        self.width = self.term.COLS or 75
        self.bar = term.render(self.BAR)
        self.header = self.term.render(self.HEADER % header.center(self.width))
        self.cleared = 1 #: true if we haven't drawn the bar yet.
        self.update(0, '')

    def update(self, percent, message):
        if self.cleared:
            sys.stdout.write(self.header)
            self.cleared = 0
        n = int((self.width-10)*percent)
        sys.stdout.write(
            self.term.BOL + self.term.UP + self.term.CLEAR_EOL +
            (self.bar % (100*percent, '='*n, '-'*(self.width-10-n))) +
            self.term.CLEAR_EOL + message.center(self.width))

    def clear(self):
        if not self.cleared:
            sys.stdout.write(self.term.BOL + self.term.CLEAR_EOL +
                             self.term.UP + self.term.CLEAR_EOL +
                             self.term.UP + self.term.CLEAR_EOL)
            self.cleared = 1
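Coming back to the cursor itself: if you can get the cursor index from somewhere, putting it back after a redraw only takes one more escape sequence, ESC[nD (cursor backward n columns). The sketch below is a guess at how that could look. `redraw_with_cursor` is a made-up name, it assumes the input fits on one terminal line, and it assumes you have the cursor index available, which as far as I know the standard readline module does not expose.

```python
def redraw_with_cursor(prompt, buffer, cursor_index):
    """Build the string that redraws prompt+buffer and restores the cursor column.

    cursor_index is the 0-based position inside buffer (assumed to be known).
    """
    # Clamp the index so a stale value cannot move the cursor into the prompt.
    cursor_index = max(0, min(cursor_index, len(buffer)))
    back = len(buffer) - cursor_index
    # ESC[2K clears the current line, \r returns to column 0.
    out = "\x1b[2K\r" + prompt + buffer
    if back:
        # ESC[<n>D moves the cursor n columns to the left.
        out += "\x1b[%dD" % back
    return out


# Cursor sits after "he" in "hello", so it must move 3 columns back.
sequence = redraw_with_cursor(": ", "hello", 2)
print(repr(sequence))
```

Writing the returned string to sys.stdout and flushing would take the place of the prompt-plus-buffer write in print_line.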

Dragon NaturallySpeaking Programmers

Is there any way to incorporate Dragon NaturallySpeaking into an event-driven program? My boss would really like it if I used DNS to record user voice input without writing it to the screen, saving it directly to XML instead. I've been doing research for several days now and I cannot see a way to do this without the (really expensive) SDK, and I don't even know whether it would work then.
Microsoft Speech lets you write a (Python) program in which its speech recognizer waits until it detects a speech event and then processes it. It can also suggest alternative phrases to the one it thinks is the best guess, and it can record the .wav file for later use. Sample code:
spEngine = MsSpeech()
spEngine.setEventHandler(RecoEventHandler(spEngine.context))
class RecoEventHandler(SpRecoContext):
    def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
        res = win32com.client.Dispatch(Result)
        phrase = res.PhraseInfo.GetText()
        #from here I would save it as XML
        # write reco phrases
        altPhrases = reco.Alternates(NBEST)
        for phrase in altPhrases:
            nodePhrase = self.doc.createElement(TAG_PHRASE)
I can not seem to make DNS do this. The closest I can do-hickey it to is:
while keepGoing == True:
    yourWords = raw_input("Your input: ")
    transcript_el = createTranscript(doc, "user", yourWords)
    speech_el.appendChild(transcript_el)
    if yourWords == 'bye':
        break
It even has the horrible side effect of making the user say "new-line" after every sentence! Not the preferred solution at all! Is there any way to make DNS do what Microsoft Speech does?
FYI: I know the logical solution would be to simply switch to Microsoft Speech but let's assume, just for grins and giggles, that that is not an option.
UPDATE - Has anyone bought the SDK? Did you find it useful?
Solution: download Natlink - http://qh.antenna.nl/unimacro/installation/installation.html
It's not quite as flexible to use as SAPI, but it covers the basics and I got almost everything that I needed out of it. Also, heads up: both it and Python need to be installed for all users on your machine or it won't work properly, and it works with every version of Python BUT 2.4.
Documentation for all supported commands is found under C:\NatLink\NatLink\MiscScripts\natlink.txt after you download it. It's under all the updates at the top of the file.
Example code:
#make sure DNS is running before you start
if not natlink.isNatSpeakRunning():
    raiseError('must start up Dragon NaturallySpeaking first!')
    shutdownServer()
    return
#connect to natlink and load the grammar it's supposed to recognize
natlink.natConnect()
loggerGrammar = LoggerGrammar()
loggerGrammar.initialize()
if natlink.getMicState() == 'off':
    natlink.setMicState('on')
userName = 'Danni'
natlink.openUser(userName)
#natlink.waitForSpeech() continuous loop waiting for input.
#Results are sent to gotResultsObject method of the logger grammar
natlink.waitForSpeech()
natlink.natDisconnect()
The code is severely abbreviated from my production version, but I hope you get the idea. The only problem now is that I still have to return to the mini-window that natlink.waitForSpeech() creates and click 'close' before I can exit the program safely. A way to signal the window to close from Python without using the timeout parameter would be fantastic.
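For the "save it directly to XML" half, once the grammar's gotResultsObject method hands you the recognized words, the standard library can build the document. This is only a sketch: the element names `speech` and `transcript`, the `speaker` attribute, and the `create_transcript` helper are assumptions for illustration, not what my production code or natlink uses.

```python
from xml.dom.minidom import Document


def create_transcript(doc, speaker, words):
    """Build <transcript speaker="...">words</transcript> (hypothetical layout)."""
    el = doc.createElement("transcript")
    el.setAttribute("speaker", speaker)
    el.appendChild(doc.createTextNode(words))
    return el


doc = Document()
speech_el = doc.createElement("speech")
doc.appendChild(speech_el)

# Stand-in for phrases that would arrive from the recognizer callback.
for words in ["hello world", "bye"]:
    speech_el.appendChild(create_transcript(doc, "user", words))

xml_text = speech_el.toxml()
print(xml_text)
```

Calling something like `create_transcript` from inside the results callback would avoid routing the text through raw_input, so nothing has to be written to the screen first.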
