I'm trying to build something in python that can analyze an uploaded mp3 and generate the necessary data to build a waveform graphic. Everything I've found is much more complex than I need. Ultimately, I'm trying to build something like you'd see on SoundCloud.
I've been looking into NumPy and FFTs, but it all seems more complicated than I need. What's the best approach to this? I'll build the actual graphic using canvas, so don't worry about that part of it; I just need the data to plot.
An MP3 file is an encoded version of a waveform. Before you can work with the waveform, you must first decode the MP3 data into PCM. Once you have PCM data, each sample represents the waveform's amplitude at that point in time. If we assume the MP3 decoder outputs signed 16-bit values, your amplitudes will range from -32768 to +32767. If you normalize the samples by dividing each by 32768, the waveform samples will then range between +/- 1.0.
The issue really is one of decoding the MP3 to PCM. As far as I know, there is no native Python decoder. You can, however, call LAME from Python as a subprocess or, with a bit more work, interface the LAME library directly to Python with something like SWIG. Not a trivial task.
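For example, here's a minimal sketch of the subprocess route (it assumes the lame binary is on your PATH and a 16-bit decoded file; the file names are placeholders):

import subprocess
import struct
import wave

# Decode the MP3 to a temporary WAV with the LAME command-line tool.
subprocess.check_call(['lame', '--decode', 'input.mp3', 'decoded.wav'])

with wave.open('decoded.wav', 'rb') as w:
    raw = w.readframes(w.getnframes())

# Unpack the signed 16-bit little-endian samples and normalize to +/- 1.0.
# (For stereo files the samples alternate left/right.)
samples = struct.unpack('<%dh' % (len(raw) // 2), raw)
normalized = [s / 32768.0 for s in samples]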
Plotting this data then becomes an exercise for the reader.
I suggest using Pygame if you don't want to deal with the inner workings of the MP3 file format.
Pygame is a multimedia library that can open common audio file formats, including .mp3 and .ogg, as Sound objects. If you have NumPy installed underneath, you can browse the uncompressed (i.e. fully decoded) sound using the pygame.sndarray.array call, which returns a NumPy array of the sound samples.
One trick I've found: be sure to call pygame.mixer.init with the same parameters (frequency, sample size and number of channels) as your .mp3 file has, or the call to sndarray.array may raise an exception.
Check the documentation at http://www.pygame.org/docs/
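A minimal sketch (the init parameters here - 44100 Hz, 16-bit signed, stereo - are common defaults and 'input.mp3' is a placeholder; whether .mp3 actually loads as a Sound depends on your SDL_mixer build):

import pygame
import pygame.sndarray

# These must match the file's sample rate, sample size and channel count.
pygame.mixer.init(frequency=44100, size=-16, channels=2)

sound = pygame.mixer.Sound('input.mp3')
samples = pygame.sndarray.array(sound)  # NumPy array, shape (n_frames, 2) for stereo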
I'm working on a project where I'm trying to compare one audio file with another.
I want to know the pitch (in Hz) of each note in the audio and then check whether or not it is an accurate note.
For this I have tried the librosa library, but with it you need to create a mel spectrogram and then work on the spectrogram in order to find notes...
Is there a simpler way? I'd prefer to avoid CNNs.
Thanks
Edit: maybe converting the spectrogram is not as complicated as I thought... waiting for your help.
I am working with wav files analysis using the librosa library in python. I used librosa.load() to load the audio file. Apparently this function loads the wav file into a numpy array with normalised amplitude values in the range -1 to 1. But I need to get the actual amplitude values for processing. How can I find that?
Thanks in advance!
You observed correctly that librosa always normalizes the samples to mono in [-1, 1] (and also resamples to 22050 Hz). That said, it's digital audio, so you could multiply by whatever you want to get a different scale. If you insist that your samples be on a scale of -2^15 to 2^15, simply multiply by 2^15. It means pretty much the same thing.
You won't gain anything, except dragging a peculiarity of the encoding audio format into your data.
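If you do want the integer scale anyway, the multiplication is a one-liner (a sketch, using the file name from the question):

import librosa

# librosa.load returns float samples in [-1, 1], resampled to 22050 Hz by default.
y, sr = librosa.load('existing_file.wav')

# Rescale to the signed 16-bit integer range.
y_int16 = (y * 2**15).astype('int16')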
That said, if that's what you want, you could use PySoundFile like this:
import soundfile as sf
y, sr = sf.read('existing_file.wav', dtype='int16')
The parameter dtype='int16' tells the library to return the samples as signed 16-bit integers instead of floats.
You can't. As Hendrik mentioned, the signal is digital, and the amplitudes in the WAV file won't tell you anything about the actual sound wave amplitude / sound power. That was lost the moment the sound was digitised to WAV.
That being said, you can compute e.g. loudness, a relative perception of the sound power. If you are dealing with the human auditory system, one of the recommended approaches (sketched below the list) is to:
Use the Bark scale (the Bark scale better reflects how we hear).
Compute energy in each bin.
(Optional) Normalise by the overall sum.
If you don't want to compute it yourself, check out e.g. YAAFE.
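For reference, here is a rough NumPy sketch of those three steps, using Zwicker's approximation of the Bark scale and assuming mono float samples:

import numpy as np

def bark(f):
    # Zwicker's approximation: frequency in Hz -> Bark (roughly 0 to 24).
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_band_energies(samples, sr):
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), 1.0 / sr)
    energy = np.abs(spectrum) ** 2
    bands = bark(freqs).astype(int)      # map each FFT bin to a Bark band
    totals = np.bincount(bands, weights=energy)
    return totals / totals.sum()         # the optional normalisation step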
I'm an experienced Python programmer with plenty of image manipulation and computer vision experience. I'm very familiar with all of the standard tools like PIL, Pillow, opencv, numpy, and scikit-image.
How would I go about reading an image into a Python data format like a nested list, bytearray, or similar, if I only had the standard library to work with?
I realize that different image formats have different specifications. My question is how I would even begin to build a function that reads any given format.
NOTE Python 2.6 had a jpeg module in the standard library that has since been deprecated. Let's not discuss that since it is unsupported.
If you're asking how to implement these formats "from scratch" (since the standard libraries don't do this), then a good starting point would be the format specification.
For PNG, this is https://www.w3.org/TR/2003/REC-PNG-20031110/. It defines the makeup of a PNG stream, consisting of the signature (eight bytes, 8950 4e47 0d0a 1a0a, which identifies the file as a PNG image) and a number of data chunks that contain metadata, palette information and the image itself. (It's certainly a substantial project to take on, if you really don't want to use the existing libraries, but not overly so.)
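To give a flavour of what's involved, here is a minimal sketch that walks a PNG's chunk structure with only the standard library and pulls out the image dimensions (decoding the IDAT pixel data - zlib decompression plus the scan-line filters - is where the real work starts):

import struct

with open('image.png', 'rb') as f:
    assert f.read(8) == b'\x89PNG\r\n\x1a\n'    # the eight-byte signature
    while True:
        length, ctype = struct.unpack('>I4s', f.read(8))
        data = f.read(length)
        f.read(4)                               # CRC, not verified here
        if ctype == b'IHDR':
            width, height = struct.unpack('>II', data[:8])
            print(width, height)
        elif ctype == b'IEND':
            break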
For BMP, it's a bit easier since the file already contains the uncompressed pixel data and you only need to know how to find the size and offset; some of the format definition is on Wikipedia (https://en.wikipedia.org/wiki/BMP_file_format) and here: http://www.digicamsoft.com/bmp/bmp.html
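As a sketch of how little the BMP case needs (assuming an uncompressed 24-bit file with the common BITMAPINFOHEADER, stored bottom-up):

import struct

with open('image.bmp', 'rb') as f:
    data = f.read()

pixel_offset = struct.unpack_from('<I', data, 10)[0]
width, height = struct.unpack_from('<ii', data, 18)
row_size = (width * 3 + 3) & ~3          # rows are padded to 4-byte multiples

pixels = []                              # pixels[y][x] == (r, g, b)
for row in range(height):
    base = pixel_offset + row * row_size
    line = [tuple(data[base + 3 * x : base + 3 * x + 3][::-1])   # BMP stores BGR
            for x in range(width)]
    pixels.append(line)
pixels.reverse()                         # bottom-up -> top-down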
JPG is much trickier. The file doesn't store pixels, but rather quantized frequency coefficients (from a discrete cosine transform, not wavelets, which belong to JPEG 2000) that are transformed back into the pixel map you see on the screen. To read this format, you'll need to implement that inverse transformation.
I have one source .wav file in which I have recorded 5 tones separated by silence.
I have to make 5 different .wav files containing only these tones (without any silence).
I am using scipy.
I was trying to do something similar to this post: constructing a wav file and writing it to disk using scipy
but it does not work for me.
Can you please advise me how to do it?
Thanks in advance
You need to perform silence detection to identify where each tone starts and ends. You can use a measurement of magnitude, such as the root mean square (RMS).
Here's how to do an RMS calculation using scipy
Depending on your noise levels, this may be hard to do programmatically. You may want to consider simply using an audio editor such as Audacity.
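A rough sketch of the RMS idea with scipy and NumPy (the window size and threshold are guesses you'll have to tune, mono 16-bit input is assumed, and the file names are placeholders):

import numpy as np
from scipy.io import wavfile

sr, samples = wavfile.read('five_tones.wav')
samples = samples.astype(np.float64)

window = 1024
rms = np.array([np.sqrt(np.mean(samples[i:i + window] ** 2))
                for i in range(0, len(samples) - window, window)])
loud = rms > 0.05 * rms.max()            # True where a tone is playing

# Each run of consecutive True windows is one tone. Assuming the recording
# starts with silence, even edges are tone starts and odd edges are ends.
edges = np.flatnonzero(np.diff(loud.astype(int))) + 1
for n, (a, b) in enumerate(zip(edges[::2], edges[1::2])):
    segment = samples[a * window : b * window].astype(np.int16)
    wavfile.write('tone_%d.wav' % n, sr, segment)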
I'm having a little bit of programming and conversion trouble. I'm designing an AI to recognize notes played by instruments and need to extract the raw sound data from a wave file. My objective is to perform an FFT over chunks of time in the file for use by the AI. For this I need an amplitude list for the audio file, but I can't seem to find a conversion technique that will work. The files start as MP3s and I then convert them to WAV files, but I always end up with a compressed file that spits out gibberish when I try to read it. Does anyone know how I might convert the WAV file to something compatible with Python's wave module, or even something that would directly convert the data into an amplitude list?
The default Python wave module isn't very thorough. You might try the one included in scipy as an alternative.
Check out: Reading *.wav files in Python
If you're going to do any numerical heavy lifting with the audio, scipy might be your best option anyway.
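A minimal sketch of that route - read the WAV with scipy, then FFT one chunk (the file name and chunk size are placeholders):

import numpy as np
from scipy.io import wavfile

sr, samples = wavfile.read('note.wav')   # samples is an int16 array for a 16-bit file
chunk = samples[:4096]                   # one analysis window
spectrum = np.abs(np.fft.rfft(chunk))
freqs = np.fft.rfftfreq(len(chunk), 1.0 / sr)
peak_hz = freqs[spectrum.argmax()]       # strongest frequency in this chunk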
Python can read .dat files without trouble. You can use SoX to turn MP3s or WAVs (or whatever) into .dat files, which are simply plain-text lists of "time - left amplitude - right amplitude" lines.
The command is simply
sox soundfile.mp3 soundfile.dat
http://sox.sourceforge.net/
SoX is command-line - I run it from Terminal on my Mac, but anything that understands Bash or Linux commands should work, depending on your operating system.
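Reading the resulting .dat back into Python is then just a few lines (a sketch, assuming a stereo file; SoX writes header lines that begin with ';'):

samples = []
with open('soundfile.dat') as f:
    for line in f:
        if line.startswith(';'):         # e.g. "; Sample Rate 44100"
            continue
        t, left, right = map(float, line.split())
        samples.append((t, left, right))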
Hope that helps!
You might want to look at Pure Data too; it has some nice FFT transforms built into an intuitive graphical programming language.