pyttsx3 forced alignment during text-to-speech generation

I'm trying to build a video creation bot using Python, pyttsx3 and moviepy.
I'm having some trouble with forced alignment (matching the audio with the text of the audio shown in the video). I would like to do the forced alignment during text-to-speech generation. One possible solution would be to split the input string and generate several audio files, but I'd rather find a way to know the length of the generated audio up to a certain word/location.
Thank you ;)
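The splitting idea from the question can be sketched roughly as follows. This is only a minimal sketch, not a built-in pyttsx3 feature for word timing: each chunk of text is rendered to its own file with `save_to_file`, each file is measured with the standard `wave` module, and the durations are accumulated into caption start times. The helper names (`synthesize_chunks`, `wav_duration`, `chunk_start_times`) and the `chunk_N.wav` file names are made up for illustration, and the sketch assumes the installed speech driver writes PCM WAV files, which may not hold on every platform.

```python
import wave


def synthesize_chunks(chunks, prefix="chunk"):
    """Render each text chunk to its own audio file (hypothetical helper)."""
    import pyttsx3  # imported lazily so the timing helpers work without it

    engine = pyttsx3.init()
    paths = []
    for index, text in enumerate(chunks):
        path = f"{prefix}_{index}.wav"  # assumed naming scheme
        engine.save_to_file(text, path)
        paths.append(path)
    engine.runAndWait()  # the queued files are written here
    return paths


def wav_duration(path):
    """Length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def chunk_start_times(durations):
    """Start time of each chunk when the clips are played back to back."""
    starts = []
    elapsed = 0.0
    for duration in durations:
        starts.append(elapsed)
        elapsed += duration
    return starts
```

With the paths returned by `synthesize_chunks`, `chunk_start_times([wav_duration(p) for p in paths])` gives the moment at which each piece of text should appear in the video. Splitting per sentence rather than per word is likely to keep the speech sounding more natural.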

Related

How to use Transcribe() in Python Colab to convert runtime audio to text?

I have implemented speech recognition to record runtime audio from a microphone using JavaScript code in Colab, because PyAudio is causing issues. Now I have to transcribe the audio to text at runtime (i.e. without saving it to a file). I didn't find any appropriate function for this except Transcribe(), but I do not know how to use this function, passing audio and returning text.
How do I use the Transcribe function? Or is there any other method to transcribe runtime audio to text?

Speech to text conversion

I am trying to convert an audio file to text. I have seen many posts on it, but nothing is working for me. Could you please suggest a Pythonic way to convert speech to text? Thank you.

That's a pretty big question, but my best suggestion is to go with something like https://pypi.python.org/pypi/pygsr
from pygsr import Pygsr
speech = Pygsr()
speech.record(3)  # duration in seconds (3)
phrase, complete_response = speech.speech_to_text('en_US')  # select the language
print(phrase)
Edit1: If it's an audio file, you could open a Google Voice account, call yourself, and play the audio file to leave it as a voicemail. Google converts all your messages to text, so it's essentially a free transcription service.

Python get Audio/Video frames separately from video file

What I'm trying to do:
Hi!
I'm trying to store the video and audio information from a video file. I would like to store video frames and audio frames separately in different variables.
My intention is to manage video files and perform some actions on the audio and video frame lists, but to do what I'm planning I need to store the audio and video frames separately. I've read a lot of questions on Stack Overflow about audio/video handling in Python.
Most people recommend using OpenCV or ffmpeg to manage videos. I saw some scripts using these libraries to get video (only video) frames, but none of them get the audio; most of them just extract the video frames and save them as RGB images. I also checked some scripts where people get audio frames from an MP3 file, but I'm not sure whether you can do that with a video file.
The most important thing to me is to know the best way to manage video and audio separately. I'm not looking for people to write my code, just asking to be pointed in a good direction.
One of the things I'm trying to do is to send this information over a socket, but as I said I need the audio and video frames to be in separate variables (yes, I'm thinking about a streaming app, but that's not the only thing I'm trying to do).
I know I should give more information, and maybe show some code, but I don't have any concrete code. I tried some things, but I've never been able to separate audio and video. I know that each format has its own encoding, and in the end I decided to use "mp4" as the video format, but I don't know whether this is the best format for what I'm trying to do.
Summary:
Is OpenCV the best way to manage video and audio separately?
Which is the easiest way to separate video and audio frames? Is it possible?
Which is the best documentation I should read to learn about video/audio management?
I would like to do things with my own code and use OpenCV or other libraries as little as possible.
My "basic" idea is to get a "list" of audio and video frames and then do some operations on them, but right now I can't find the best way to manage a video using Python. I even wonder whether it would be possible to manage a video as raw data.
I need to know which is the best library to manage videos using Python; for me, the best library will be the one that lets me manage the videos most "freely".
I've already checked:
I've read many questions on this topic; the most recent are:
How to extract audio from video file
Split audio video separately from given video using MLT
Embed audio video in python gui
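Since ffmpeg is one of the tools mentioned above, one possible direction is to let it split the streams and then read the results from Python. The following is a rough sketch under assumptions: it requires the `ffmpeg` command-line tool to be installed and on the PATH, and the function names, the 44,100 Hz default and the output paths are hypothetical choices for illustration. It writes the audio as 16-bit PCM WAV and the video as one image per frame.

```python
import subprocess


def audio_extract_cmd(video_path, wav_path, sample_rate=44100):
    """ffmpeg arguments that drop the video stream and write 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y", "-i", video_path,
        "-vn",                    # ignore the video stream
        "-acodec", "pcm_s16le",   # uncompressed signed 16-bit samples
        "-ar", str(sample_rate),  # output sample rate in Hz
        wav_path,
    ]


def frame_extract_cmd(video_path, frame_pattern):
    """ffmpeg arguments that drop the audio stream and write one image per frame."""
    return [
        "ffmpeg", "-y", "-i", video_path,
        "-an",                    # ignore the audio stream
        frame_pattern,            # e.g. "frames/frame_%05d.png"
    ]


def split_video(video_path, wav_path, frame_pattern):
    """Run both extractions; raises CalledProcessError if ffmpeg fails."""
    for cmd in (audio_extract_cmd(video_path, wav_path),
                frame_extract_cmd(video_path, frame_pattern)):
        subprocess.run(cmd, check=True)
```

After `split_video("input.mp4", "audio.wav", "frames/frame_%05d.png")` (the `frames` directory must already exist), the WAV file can be read with the standard `wave` module and the images with any image library, which keeps the audio and video data in separate variables.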

Python Creating raw audio

I use Windows 7. All I want to do is create raw audio and stream it to a speaker. After that, I want to create classes that can generate sine progressions (basically, a tone that slowly gets more and more shrill). After that, I want to put my raw audio into audio codecs and containers like .WAV and .MP3 without going insane. How might I be able to achieve this in Python without using dependencies that don't come with a standard install?
I looked up a great deal of files, descriptions, and related questions from here and all over the internet. I read about PCM and ADPCM, as well as A/D Converters. Where I get lost is somewhere between the ratio of byte input --> Kbps output, and all that stuff.
Really, all I want is for somebody to please be able to point me in the right direction to learn the audio formats precisely, and how to use them in Python (but first I want to start with raw audio).

This question really has two parts:
How do I generate audio signals?
How do I play audio signals through the speakers?
I wrote a simple wrapper around the Python standard library's wave module, called pydub, which you can look at (on GitHub) as a point of reference for how to manipulate raw audio data.
I generally just export the audio data to a file and then play it using VLC player. IMHO there's no reason to write a bunch of code to playback audio unless you're making a synthesizer or a game or some other realtime app.
Anyway, I hope that helps you get started :)
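For the "generate audio signals" part, here is a minimal sketch that uses only the standard library (`math`, `struct`, `wave`). It is an illustration rather than how pydub does it: the function names and default values are made up, and it produces 16-bit mono PCM for a tone whose pitch rises linearly.

```python
import math
import struct
import wave


def sine_sweep(start_hz, end_hz, seconds, sample_rate=44100, amplitude=0.5):
    """Return 16-bit mono PCM bytes for a tone rising linearly in pitch."""
    n_samples = int(seconds * sample_rate)
    phase = 0.0
    frames = bytearray()
    for i in range(n_samples):
        # Interpolate the instantaneous frequency, then advance the phase.
        freq = start_hz + (end_hz - start_hz) * i / max(n_samples - 1, 1)
        phase += 2 * math.pi * freq / sample_rate
        sample = int(amplitude * 32767 * math.sin(phase))
        frames += struct.pack("<h", sample)  # little-endian signed 16-bit
    return bytes(frames)


def write_wav(path, pcm_bytes, sample_rate=44100):
    """Wrap raw 16-bit mono PCM bytes in a WAV container."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
```

For example, `write_wav("sweep.wav", sine_sweep(220, 880, 3.0))` writes a three-second sweep that can be opened in any player. For uncompressed PCM the bit rate is sample rate × bits per sample × channels, so 44,100 Hz 16-bit mono comes to 705.6 kbps. The standard library has no MP3 encoder, so that format needs an external tool.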

Python Speech Compare

I have two .wav files that I need to compare and decide if they contain the same words (same order too).
I have been searching for the best method for a while now. I can't figure out how to have pyspeech use a file as input. I've tried getting the CMU Sphinx project working, but I can't seem to get GStreamer to work with Python 2.7, let alone their project. I've messed around with Dragonfly as well, with no luck.
I am using Win7 64-bit with Python 2.7. Does anyone have any ideas?
Any help is greatly appreciated.

You could try PySpeech. For more info, see "pyspeech (python) - Transcribe mp3 files?". I have never used it, but I believe it leverages the built-in speech recognition engine of Windows. This will let you convert the WAV files to text, and then you can do a text comparison.
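Once both files have been transcribed, the text comparison itself can be sketched in plain Python. This is an assumption about what "same words, same order" should mean: case and punctuation are ignored, and the function names are hypothetical. `word_similarity` is an optional extra that gives a score between 0 and 1 when the recognizer output is not expected to match exactly.

```python
import re
from difflib import SequenceMatcher


def normalize_words(text):
    """Lowercase the transcript and keep only the words, in order."""
    return re.findall(r"[a-z0-9']+", text.lower())


def same_words(transcript_a, transcript_b):
    """True if both transcripts contain the same words in the same order."""
    return normalize_words(transcript_a) == normalize_words(transcript_b)


def word_similarity(transcript_a, transcript_b):
    """Similarity of the two word sequences, from 0.0 to 1.0."""
    return SequenceMatcher(
        None, normalize_words(transcript_a), normalize_words(transcript_b)
    ).ratio()
```

Because recognition is rarely perfect, comparing `word_similarity` against a threshold may be more practical than requiring `same_words` to be true.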
To use the Windows speech engine with a WAV file as input, there are two requirements:
Use an inproc recognizer (SpeechRecognitionEngine). Shared recognizers cannot use WAV files as input.
On the recognizer object, call SetInputToWaveFile to specify your input WAV file.
You may have to resample the WAV files because the speech recognition engines only support certain sample rates. The following format works well on Windows:
8 bits per sample
single channel mono
22,050 samples per second
PCM encoding
See https://stackoverflow.com/a/6203533/90236 for some more info.
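Whether a file already has the format listed above can be checked with the standard `wave` module before deciding to resample. This is a small sketch with made-up names (`TARGET_FORMAT`, `wav_format`, `needs_conversion`). The `wave` module only reads uncompressed PCM and raises `wave.Error` for other encodings, so a file that opens at all already satisfies the PCM requirement.

```python
import wave

# 8 bits per sample, mono, 22,050 samples per second
TARGET_FORMAT = {"channels": 1, "sample_width_bytes": 1, "sample_rate": 22050}


def wav_format(path):
    """Read the channel count, sample width and sample rate of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": wav.getframerate(),
        }


def needs_conversion(path, target=TARGET_FORMAT):
    """True if the file differs from the target format and should be resampled."""
    return wav_format(path) != target
```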
For more background on the Windows speech engines, you might take a look at "SAPI and Windows 7 Problem" and "What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?"
