I recently copied a bunch of audio files, which are feedback left during a phone call.
The vast majority of them are mp3, but a small percentage are files ending in a .ul extension, which I believe is ULAW.
I have tried to play them in Audacity and VLC, but get garbled sounds. I suspect they are corrupted, but I'd like to confirm that by attempting to convert them to another audio format.
Would anyone be able to recommend a library to do that?
I know Python has the audioop module but I do not know enough to start messing with the audio data.
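One way to test whether the files are really corrupted is to decode them by hand and listen to the result. The sketch below assumes the .ul files are raw, headerless 8-bit mu-law at 8000 Hz mono (common for telephony, but an assumption here). The function names are made up for illustration, and it uses only `struct` and `wave` rather than audioop.

```python
import struct
import wave


def ulaw_to_linear(code):
    """Expand one 8-bit G.711 mu-law code to a signed 16-bit sample."""
    code = ~code & 0xFF              # mu-law bytes are stored bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(raw):
    """Convert mu-law bytes to little-endian 16-bit PCM bytes."""
    return struct.pack("<%dh" % len(raw), *(ulaw_to_linear(b) for b in raw))


def ulaw_to_wav(raw, target, rate=8000):
    """Write mu-law bytes to `target` (a path or file object) as a mono WAV.

    The 8000 Hz default is an assumption; try other rates if the result
    plays too fast or too slow.
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(decode_ulaw(raw))


# Example usage (hypothetical file names):
# with open("feedback.ul", "rb") as f:
#     ulaw_to_wav(f.read(), "feedback.wav")
```

If the resulting WAV still sounds garbled at 8000 Hz, the files may use a different encoding (for example A-law) or may indeed be damaged.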
I have some audio samples (from SampleSwap) which I am working with in pydub. Most of them have a sample depth (bits per sample) of 16, while others are 24 or 32. It looks something like this:
import pydub
a = pydub.AudioSegment.from_file('16bit_file.wav')
b = pydub.AudioSegment.from_file('24bit_file.wav')
The problem I am running into is when I try to get them to play back:
from pydub.playback import play
play(a)
play(b)
While the 16-bit files play normally, the 24-bit files are all Earth-shatteringly loud, like seriously to the point of potential speaker damage. With my computer set to minimum volume, the 24-bit play back is about as loud as regular music would play back on maximum volume. It's super distorted, sharp, and clipped.
I'm pretty sure I've isolated it to be a problem of bit-depth. The sounds all play normally when played in other software. I can convert the problem sounds to be 16-bit either using sox or using pydub.AudioSegment.set_sample_width(2) and the issue goes away. I have also gone directly through simpleaudio to do the playback (copying the code from pydub, here) and get the same issue.
The main problem is that I am writing some code for working with audio which I would like to share, but I do not want users to experience the physical or mental damage from hearing one of these busted sounds. My only idea for a workaround is to immediately convert the bit depth of any user-loaded sounds, i.e. lock audio playback to 16-bit files only; this works for the files I am testing, but a) I don't know if it holds true for all sounds/computers, and b) I thought this shouldn't be an issue in pydub anyway. I also thought of somehow checking the volume of the sound before playing (using e.g. a.dBFS or a.max), but I haven't found anything that seems reliable (either the metric isn't really correlated with the volume, or the value seems to be more of an indication of the dynamic range provided by the extra bits).
So my questions are:
Why do I get this alarmingly loud, distorted playback in pydub when playing non-16-bit files?
What can I do to prevent it?
Am I missing something obvious here about audio playback?
I understand this is (hopefully) not so reproducible; I could try to record it and post if that would be helpful. I can also point out the sounds I am using on SampleSwap, but the problem really seems to be caused by any file that is not 16-bit (i.e. I can convert a sound to be 32-bit and generate the issue).
Here's some version info:
ffmpeg 4.4
PyAudio 0.2.11
pydub 0.25.1
simpleaudio 1.0.4
And the issue is on a 2019 MacBook Pro, Catalina 10.15.7. I've also tested my Windows 10 Desktop (with similar versions as above), but rather than the issue above, I just get silence.
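The workaround described above (forcing everything to 16-bit before playback) can be wrapped in a small guard. This is only a sketch: `ensure_16_bit` is a made-up helper name, and it relies solely on the `sample_width` attribute and `set_sample_width` method that pydub's AudioSegment provides. It does not explain why the wider samples play back incorrectly.

```python
def ensure_16_bit(segment):
    """Return a segment with 2-byte (16-bit) samples.

    `segment` is expected to behave like a pydub AudioSegment. Segments
    that are already 16-bit are returned unchanged.
    """
    if segment.sample_width != 2:
        return segment.set_sample_width(2)
    return segment


# Example usage (requires pydub and a playback backend):
# from pydub import AudioSegment
# from pydub.playback import play
# play(ensure_16_bit(AudioSegment.from_file('24bit_file.wav')))
```

Routing every call to play() through a guard like this keeps the conversion in one place, so it can be removed later if the underlying cause is found.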
What I'm trying to do:
Hi!
I'm trying to store the video and audio information from a video file. I would like to store video frames and audio frames separately in different variables.
My intention is to manage video files and perform some actions on the audio and video frame lists, but to do what I'm planning I need to store the audio and video frames separately. I've read a lot of questions on StackOverflow about managing audio and video in Python.
Most people recommend using OpenCV or ffmpeg to manage videos. I saw some scripts using these libraries to get video frames, but none of them get the audio; most of them just get video frames and save them as RGB images. I also checked some scripts where people get audio frames from an mp3 file, but I'm not sure if you can do that with a video file.
The most important thing to me is to know the best way to manage video and audio separately. I'm not looking for people to write my code, just asking to be pointed in a good direction.
One of the things I'm trying to do is to send this information via a socket, but as I said I need the audio and video frames to be in separate variables (yes, I'm thinking about a streaming app, but that's not the only thing I'm trying to do).
I know I should give more information, and maybe show some code, but I don't have any concrete code. I tried some things, but I've never been able to separate audio and video. I know that each format has its own encoding, and in the end I decided to use "mp4" as the video format, but I don't know whether this is the best format for what I'm trying to do.
Summary:
Is OpenCV the best way to manage video and audio separately?
Which is the easiest way to separate video and audio frames? Is it possible?
Which is the best documentation I should read to learn about video/audio management?
I would like to do things with my own code, and use OpenCV or other libraries as little as possible.
My "basic" idea is to get a "list" of audio and video frames, and then I would like to do some operations, but right now I can't find the best way for me to manage a video using Python. I even wonder if it could be possible to manage a video as raw data.
I need to know which is the best library to manage videos using Python; for me the best library will be the one that allows me to manage the videos more "freely".
I've already checked:
I've read many questions on this topic; the most recent are:
How to extract audio from video file
Split audio video separately from given video using MLT
Embed audio video in python gui
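One possible starting point for the separation step is to let the ffmpeg command-line tool demux the file into a video-only file and an audio-only WAV, which can then be read independently. The sketch below assumes ffmpeg is installed and on the PATH. The function names and file names are made up for illustration; the flags used are standard ffmpeg options (`-an` drops audio, `-vn` drops video).

```python
import subprocess


def split_commands(source, video_out, audio_out):
    """Build the two ffmpeg command lines without running them."""
    # Copy the video stream as-is and drop the audio stream.
    video_cmd = ["ffmpeg", "-y", "-i", source, "-an", "-c:v", "copy", video_out]
    # Drop the video stream and decode the audio to 16-bit PCM.
    audio_cmd = ["ffmpeg", "-y", "-i", source, "-vn", "-acodec", "pcm_s16le", audio_out]
    return video_cmd, audio_cmd


def split_audio_video(source, video_out, audio_out):
    """Run both commands; raises CalledProcessError if ffmpeg fails."""
    for cmd in split_commands(source, video_out, audio_out):
        subprocess.run(cmd, check=True)


# Example usage (hypothetical file names):
# split_audio_video("input.mp4", "video_only.mp4", "audio_only.wav")
```

After this step the WAV file can be read frame by frame with the standard wave module, while the video-only file can be handed to whichever frame reader is preferred.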
I have an audio track in AIFF format. I would like to open this audio file with Python, and import the amplitudes of the sound and perform some mathematical analysis such as Fourier Transform, etc.
Is this possible in Python?
Are there libraries or modules, which allow me to acquire an audio file?
Throughout my search, I have found scipy.io.wavfile, which works for WAV audio files.
Are there other libraries to import audio files in Python?
Is there something similar for AIFF files?
Obviously, I can convert the AIFF into a WAV file, but I would like to import the AIFF file directly, if possible.
As a side question: are there some more specific (by specific, I mean better than Python) programming languages to perform such kind of analysis and acquisition of audio files?
Python comes with AIFF support as part of the standard library -- see the aifc module.
This module provides support for reading and writing AIFF and AIFF-C
files. AIFF is Audio Interchange File Format, a format for storing
digital audio samples in a file. AIFF-C is a newer version of the
format that includes the ability to compress the audio data.
Depending on what your end goals are, you may be more productive using a tool like PureData that's designed just for working with audio and has things like reading audio files and performing ffts as primitives.
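For reference, here is a minimal sketch of getting the sample amplitudes out of an uncompressed AIFF file. The aifc module was deprecated in Python 3.11 and removed in 3.13, so this sketch does not use it. It swaps in a hand-written chunk parser built on `struct`, which is a different technique from the one the answer recommends. `read_aiff` is a made-up name, and the parser assumes plain AIFF (not AIFF-C) with a sample size that is a whole number of bytes.

```python
import struct


def _read_extended(data):
    """Decode the 80-bit extended float AIFF uses for the sample rate."""
    exponent, mantissa = struct.unpack(">HQ", data)
    sign = -1.0 if exponent & 0x8000 else 1.0
    exponent &= 0x7FFF
    if exponent == 0 and mantissa == 0:
        return 0.0
    return sign * mantissa * 2.0 ** (exponent - 16383 - 63)


def read_aiff(raw):
    """Return (channels, sample_rate, samples) from AIFF file bytes.

    `samples` is a flat list of signed integers, interleaved by channel.
    """
    if raw[:4] != b"FORM" or raw[8:12] != b"AIFF":
        raise ValueError("not an uncompressed AIFF file")
    channels = rate = width = None
    sound = b""
    pos = 12
    while pos + 8 <= len(raw):
        chunk_id, size = struct.unpack(">4sI", raw[pos:pos + 8])
        body = raw[pos + 8:pos + 8 + size]
        if chunk_id == b"COMM":
            channels, _frames, bits = struct.unpack(">hIh", body[:8])
            rate = _read_extended(body[8:18])
            width = (bits + 7) // 8
        elif chunk_id == b"SSND":
            offset = struct.unpack(">I", body[:4])[0]
            sound = body[8 + offset:]
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    if width is None:
        raise ValueError("missing COMM chunk")
    samples = [int.from_bytes(sound[i:i + width], "big", signed=True)
               for i in range(0, len(sound) - width + 1, width)]
    return channels, rate, samples


# Example usage (hypothetical file name):
# with open("track.aiff", "rb") as f:
#     channels, rate, samples = read_aiff(f.read())
```

The resulting list of integers can be passed to whatever analysis code is used for the Fourier transform.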
Yes, I also came across this problem using scipy.io.wavfile. I looked into it and found that Scikits might be an interesting way to get around this WAV-only solution.
https://sites.google.com/site/ldpyproject/scikits-audiolab
As for Pure Data, I use it a lot, but of course it depends on what you wish to do with your sound file...?
I use Windows 7. All I want to do is create raw audio and stream it to a speaker. After that, I want to create classes that can generate sine progressions (basically, a tone that slowly gets more and more shrill). After that, I want to put my raw audio into audio codecs and containers like .WAV and .MP3 without going insane. How might I be able to achieve this in Python without using dependencies that don't come with a standard install?
I looked up a great deal of files, descriptions, and related questions from here and all over the internet. I read about PCM and ADPCM, as well as A/D Converters. Where I get lost is somewhere between the ratio of byte input --> Kbps output, and all that stuff.
Really, all I want is for somebody to please be able to point me in the right direction to learn the audio formats precisely, and how to use them in Python (but first I want to start with raw audio).
This question really has 2 parts:
How do I generate audio signals?
How do I play audio signals through the speakers?
I wrote a simple wrapper around the python std lib's wave module, called pydub, which you can look at (on github) as a point of reference for how to manipulate raw audio data.
I generally just export the audio data to a file and then play it using VLC player. IMHO there's no reason to write a bunch of code to playback audio unless you're making a synthesizer or a game or some other realtime app.
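As a rough illustration of the "generate, then export to a file" approach using only the standard library, the sketch below builds a tone whose pitch rises over time and writes it as a 16-bit mono WAV. The function names, default sample rate, and amplitude are illustrative choices, not something taken from pydub.

```python
import math
import struct
import wave


def sine_sweep(start_hz, end_hz, seconds, rate=44100, amplitude=0.5):
    """Return 16-bit mono PCM bytes for a tone gliding from start_hz to end_hz."""
    total = int(seconds * rate)
    phase = 0.0
    frames = bytearray()
    for n in range(total):
        # Interpolate the instantaneous frequency linearly, then accumulate
        # it into the phase so the pitch changes without discontinuities.
        freq = start_hz + (end_hz - start_hz) * n / total
        phase += 2 * math.pi * freq / rate
        frames += struct.pack("<h", int(amplitude * 32767 * math.sin(phase)))
    return bytes(frames)


def write_wav(target, pcm, rate=44100):
    """Write 16-bit mono PCM bytes to `target` (a path or file object)."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(pcm)


# Example usage (hypothetical file name):
# write_wav("sweep.wav", sine_sweep(220, 1760, seconds=3))
```

Each sample here takes 2 bytes, so the raw data rate is rate * 2 bytes per second per channel (44100 * 2 * 8 = 705,600 bits per second for mono), which is the byte-to-Kbps relationship the question asks about for uncompressed PCM.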
Anyway, I hope that helps you get started :)
I have two .wav files that I need to compare and decide if they contain the same words (same order too).
I have been searching for the best method for a while now. I can't figure out how to have pyspeech use a file as input. I've tried getting the CMU Sphinx project working, but I can't seem to get GStreamer to work with Python 2.7, let alone their project. I've messed around with DragonFly as well, with no luck.
I am using Win7 64-bit with Python 2.7. Does anyone have any ideas?
Any help is greatly appreciated.
You could try PySpeech. For some more info see pyspeech (python) - Transcribe mp3 files?. I have never used it, but I believe it leverages the built-in speech recognition engine of Windows. This will let you convert the WAV files to text, and then you can do a text comparison.
To use the Windows speech engine and use a wav file for input there are two requirements.
Use an inproc recognizer (SpeechRecognitionEngine). Shared recognizers cannot use Wav files as input.
On the recognizer object call SetInputToWaveFile to specify your input wav file.
You may have to resample the wav files because the speech recognition engines only support certain sample rates.
8 bits per sample
single channel mono
22,050 samples per second
PCM encoding
This format works well on Windows. See https://stackoverflow.com/a/6203533/90236 for some more info.
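Before feeding a file to the recognizer, it may help to check whether it already matches the format listed above. The sketch below does that with the standard wave module. The function name and the `RECOMMENDED` table are made up for illustration, and the check only reports mismatches; it does not resample.

```python
import wave

# Target format taken from the list above (8-bit, mono, 22,050 Hz).
RECOMMENDED = {"channels": 1, "sample_width": 1, "rate": 22050}


def wav_format_problems(source):
    """Return a list of human-readable mismatches for a WAV path or file object."""
    with wave.open(source, "rb") as wav:
        actual = {
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),  # in bytes, so 1 means 8-bit
            "rate": wav.getframerate(),
        }
    return [
        "%s is %d, expected %d" % (key, actual[key], wanted)
        for key, wanted in RECOMMENDED.items()
        if actual[key] != wanted
    ]


# Example usage (hypothetical file name):
# for problem in wav_format_problems("first.wav"):
#     print(problem)
```

An empty list means the file already has the listed channel count, sample width, and sample rate. The wave module itself raises an error for encodings it cannot read, which for most files means non-PCM data.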
For some more background on the windows speech engines, you might take a look at SAPI and Windows 7 Problem and What is the difference between System.Speech.Recognition and Microsoft.Speech.Recognition?