I have two .wav files that I need to compare to decide whether they contain the same words (in the same order, too).
I have been searching for the best method for a while now. I can't figure out how to have pyspeech use a file as input. I've tried getting the CMU Sphinx project working, but I can't seem to get GStreamer to work with Python 2.7, let alone their project. I've messed around with Dragonfly as well, with no luck.
I am using Windows 7 64-bit with Python 2.7. Does anyone have any ideas?
Any help is greatly appreciated.
You could try PySpeech. For some more info see pyspeech (python) - Transcribe mp3 files?. I have never used this, but I believe it leverages the built-in speech recognition engine of Windows. This will let you convert the WAV files to text, and then you can do a text compare.
To use the Windows speech engine with a WAV file as input, there are two requirements.
Use an inproc recognizer (SpeechRecognitionEngine). Shared recognizers cannot use WAV files as input.
On the recognizer object, call SetInputToWaveFile to specify your input WAV file.
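If you want to drive that engine from Python, one route is pythonnet (the clr module), which exposes the .NET System.Speech API. This is a minimal sketch under that assumption; the file names are placeholders:

import clr
clr.AddReference("System.Speech")
from System.Speech.Recognition import SpeechRecognitionEngine, DictationGrammar

def transcribe(wav_path):
    # SpeechRecognitionEngine is the inproc recognizer; a shared
    # recognizer cannot take a WAV file as input.
    engine = SpeechRecognitionEngine()
    engine.LoadGrammar(DictationGrammar())  # free-form dictation
    engine.SetInputToWaveFile(wav_path)
    words = []
    while True:
        result = engine.Recognize()  # returns None at end of stream
        if result is None:
            break
        words.append(result.Text)
    return " ".join(words)

# Crude "same words, same order" check from the question.
print(transcribe("first.wav").lower() == transcribe("second.wav").lower())

You may want to strip punctuation before comparing, since recognition output is not always stable between runs.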
You may have to resample the WAV files, because the speech recognition engines only support certain formats. The following format works well on Windows:
8 bits per sample
single channel (mono)
22,050 samples per second
PCM encoding
See https://stackoverflow.com/a/6203533/90236 for some more info.
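For the resampling itself, pydub (a third-party wrapper around the standard wave module, mentioned in another answer below) can do the conversion in a few lines; the file names here are placeholders:

from pydub import AudioSegment

# Convert to the 8-bit / mono / 22,050 Hz PCM format listed above.
seg = AudioSegment.from_wav("input.wav")
seg = seg.set_channels(1)         # single channel (mono)
seg = seg.set_frame_rate(22050)   # 22,050 samples per second
seg = seg.set_sample_width(1)     # 1 byte = 8 bits per sample
seg.export("resampled.wav", format="wav")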
For some more background on the Windows speech engines, you might take a look at SAPI and Windows 7 Problem and What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?
I'm trying to create a video creation bot using Python, pyttsx3, and moviepy.
I'm having some trouble with forced alignment (matching the audio to the text of the audio shown on the video). I would like to do the forced alignment during text-to-speech generation. I thought a possible solution could be to split the input string and generate several audio files (sketched below), but I'd rather find a way to know the length of the generated audio up to a certain word/location.
Thank you ;)
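A rough sketch of the splitting idea mentioned in the question: synthesize each chunk to its own file with pyttsx3's save_to_file and measure the durations with the standard wave module. This assumes the TTS backend writes WAV (true for SAPI5 on Windows):

import wave
import pyttsx3

engine = pyttsx3.init()
chunks = ["This is the first sentence.", "And this is the second."]
offsets = []  # cumulative start time of each chunk, in seconds
total = 0.0
for i, chunk in enumerate(chunks):
    filename = "chunk_%d.wav" % i
    engine.save_to_file(chunk, filename)
    engine.runAndWait()  # flush the queued synthesis to disk
    w = wave.open(filename, "rb")
    duration = w.getnframes() / float(w.getframerate())
    w.close()
    offsets.append(total)
    total += duration
print(offsets)  # where each chunk starts on the combined timeline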
I recently copied a bunch of audio files, which are feedback messages left during phone calls.
The vast majority of them are MP3, but a small percentage are files ending in a .ul extension, which I believe is u-law.
I have tried to play them in Audacity and VLC, but get garbled sound. I suspect they are corrupted, but I'd like to confirm that by attempting to convert them to another audio format.
Would anyone be able to recommend a library to do that?
I know Python has the audioop module, but I don't know enough to start messing with the audio data.
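Since you mention audioop: if the .ul files really are headerless u-law, the standard library can decode them without any third-party packages. A minimal sketch, assuming 8 kHz mono telephony audio (adjust if your files differ; the file names are placeholders):

import audioop
import wave

with open("feedback.ul", "rb") as f:
    ulaw_data = f.read()

# Decode u-law bytes to 16-bit linear PCM.
pcm_data = audioop.ulaw2lin(ulaw_data, 2)

out = wave.open("feedback.wav", "wb")
out.setnchannels(1)       # mono
out.setsampwidth(2)       # 16-bit samples
out.setframerate(8000)    # assumed telephony rate
out.writeframes(pcm_data)
out.close()

If the result still sounds garbled at 8000 Hz, the files may indeed be corrupted rather than merely undecoded.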
I use Windows 7. All I want to do is create raw audio and stream it to a speaker. After that, I want to create classes that can generate sine progressions (basically, a tone that slowly gets more and more shrill). After that, I want to put my raw audio into audio codecs and containers like WAV and MP3 without going insane. How might I achieve this in Python without using dependencies that don't come with a standard install?
I looked up a great many files, descriptions, and related questions from here and all over the internet. I read about PCM and ADPCM, as well as A/D converters. Where I get lost is somewhere between the ratio of byte input --> Kbps output, and all that stuff.
Really, all I want is for somebody to point me in the right direction to learn the audio formats precisely, and how to use them in Python (but first I want to start with raw audio).
This question really has two parts:
How do I generate audio signals?
How do I play audio signals through the speakers?
I wrote a simple wrapper around the Python standard library's wave module, called pydub, which you can look at (on GitHub) as a point of reference for how to manipulate raw audio data.
I generally just export the audio data to a file and then play it using VLC. IMHO there's no reason to write a bunch of code to play back audio unless you're making a synthesizer, a game, or some other realtime app.
Anyway, I hope that helps you get started :)
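To make that concrete, here is a minimal sketch of part 1 plus simple playback, using only modules that ship with Python on Windows (wave, struct, math, winsound). It generates a rising sine sweep, i.e. a tone that slowly gets more shrill, writes it to a WAV file, and plays it:

import math
import struct
import wave
import winsound

rate = 44100        # samples per second
seconds = 3
f_start, f_end = 220.0, 1760.0   # sweep from A3 up to A6

frames = []
phase = 0.0
for n in range(rate * seconds):
    t = n / float(rate)
    freq = f_start + (f_end - f_start) * t / seconds  # linear sweep
    phase += 2 * math.pi * freq / rate   # accumulate phase to avoid clicks
    sample = int(32767 * 0.5 * math.sin(phase))       # 16-bit, half volume
    frames.append(struct.pack("<h", sample))

w = wave.open("sweep.wav", "wb")
w.setnchannels(1)      # mono
w.setsampwidth(2)      # 16-bit PCM
w.setframerate(rate)
w.writeframes(b"".join(frames))
w.close()

winsound.PlaySound("sweep.wav", winsound.SND_FILENAME)

Raw audio here is just a stream of signed 16-bit integers; the wave module only adds the WAV header around those bytes.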
I am trying to use the OpenCV VideoWriter object with MPEG-1 encoding to create videos. I am aiming at writing only two images to that video, and I would like to know how much the first image helps in compressing the second one. In other words, I want to find the file size before writing the 2nd image and after. My questions are:
Is there any way to perform this process using OpenCV?
Is there a way to avoid writing to disk and just get the size of the compressed video (after adding the second image)?
Are there any other good alternatives to reach my goals?
I suggest you learn the GStreamer framework, which has Python bindings available.
http://gstreamer.freedesktop.org/modules/gst-python.html
It works best on Linux platforms; some OS X support is available.
GStreamer provides "sane", but very powerful and very complex, APIs for procedural video and audio generation.
See also:
GStreamer: status of Python bindings and encoding video with mixed audio
Alternatively, you can write out frames to raw image files and assemble them into a video with the ffmpeg command. This might work on Microsoft Windows platforms too.
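For the first question, OpenCV itself can give you a disk-based measurement: write one file containing only the first image and a second file containing both, then compare sizes. A sketch, assuming your OpenCV build exposes an MPEG-1 encoder behind the 'PIM1' fourcc (codec availability varies) and placeholder image names:

import os
import cv2

img1 = cv2.imread("frame1.png")
img2 = cv2.imread("frame2.png")
h, w = img1.shape[:2]
fourcc = cv2.VideoWriter_fourcc(*"PIM1")  # MPEG-1

writer = cv2.VideoWriter("one_frame.mpg", fourcc, 25, (w, h))
writer.write(img1)
writer.release()   # flush so the size on disk is final

writer = cv2.VideoWriter("two_frames.mpg", fourcc, 25, (w, h))
writer.write(img1)
writer.write(img2)
writer.release()

size1 = os.path.getsize("one_frame.mpg")
size2 = os.path.getsize("two_frames.mpg")
print(size1, size2, size2 - size1)  # bytes the second image cost

This still writes to disk; I don't know of a way to get the compressed size out of VideoWriter without producing a file.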
I am looking for a high-level audio library for Python that supports crossfading (and that works on Linux). In fact, crossfading songs and saving the result is about the only thing I need.
I tried pyechonest, but I find it really slow. Working with multiple songs at the same time is hard on memory too (I tried to crossfade about 10 songs into one, but I got out-of-memory errors and my script was using 1.4 GB of memory). So now I'm looking for something else that works with Python.
I have no idea if anything like that exists. If not, are there good command-line tools for this? I could write a wrapper for the tool.
A list of Python sound libraries.
Play a Sound with Python
PyGame or Snack would work, but for this, I'd use something like audioop.
Basic first steps here: merge background audio file
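If pydub (mentioned in an earlier answer here) fits your setup, it has crossfading built into append(). A minimal sketch, assuming ffmpeg is installed for MP3 decoding and placeholder file names:

from pydub import AudioSegment

song1 = AudioSegment.from_file("song1.mp3")
song2 = AudioSegment.from_file("song2.mp3")

# append() overlaps the end of song1 with the start of song2;
# crossfade is given in milliseconds.
combined = song1.append(song2, crossfade=5000)
combined.export("combined.mp3", format="mp3")

Note that pydub decodes entire files into memory, so chaining many songs at once will still be memory-hungry; appending one song at a time may help.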
A scriptable solution using the external tools AviSynth and avs2wav or WAVI:
Create an AviSynth script file:
test.avs
v=ColorBars()
a1=WAVSource("audio1.wav").FadeOut(50)
a2=WAVSource("audio2.wav").Reverse.FadeOut(50).Reverse
AudioDub(v,a1+a2)
The script fades out audio1 and stores that in a1, then fades in audio2 (Reverse + FadeOut + Reverse produces a fade-in) and stores that in a2.
a1 and a2 are concatenated and then dubbed over a ColorBars screen pattern to make a video.
You can't work with audio alone - a valid video must be generated.
I kept the script as simple as possible for demonstration purposes. Google for more details on audio processing via AviSynth.
Now, using avs2wav (or WAVI), you can render the audio:
avs2wav.exe test.avs combined.wav
or
wavi.exe test.avs combined.wav
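And since the question mentions writing a wrapper around a command-line tool, a minimal Python sketch that generates the script above and renders it could look like this (tool and file names as used in this answer; avs2wav.exe must be on your PATH):

import subprocess

AVS_TEMPLATE = """v=ColorBars()
a1=WAVSource("%(first)s").FadeOut(%(frames)d)
a2=WAVSource("%(second)s").Reverse.FadeOut(%(frames)d).Reverse
AudioDub(v,a1+a2)
"""

def render_fade(first, second, output, frames=50):
    # Write the AviSynth script, then let avs2wav render it to a WAV.
    script = AVS_TEMPLATE % {"first": first, "second": second,
                             "frames": frames}
    with open("fade.avs", "w") as f:
        f.write(script)
    subprocess.check_call(["avs2wav.exe", "fade.avs", output])

render_fade("audio1.wav", "audio2.wav", "combined.wav")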
Good luck!
Some references:
How to edit with Avisynth
AviSynth filters reference