Read amplitude data from mp3 - python

I am trying to write some code that will extract the amplitude data from an mp3 as a function of time. I wrote up a rough version in MATLAB a while back using this function: http://labrosa.ee.columbia.edu/matlab/mp3read.html However, I am having trouble finding a Python equivalent.
I've done a lot of research, and so far I've gathered that I need to use something like mpg321 to convert the .mp3 into a .wav. I haven't been able to figure out how to get that to work.
The next step will be reading the data from the .wav file, which I also haven't had any success with. Has anyone done anything similar or could recommend some libraries to help with this? Thanks!

You can use the subprocess module to call mpg123:
import subprocess
import sys

inname = 'foo.mp3'
outname = 'out.wav'
try:
    # mpg123 -w decodes the MP3 and writes the result to a WAV file
    subprocess.check_call(['mpg123', '-w', outname, inname])
except subprocess.CalledProcessError as e:
    print(e)
    sys.exit(1)
For reading wav files you should use the wave module, like this:
import wave
import numpy as np

wr = wave.open('input.wav', 'rb')
sz = 44100  # frames per read: 1 second of a 44.1 kHz file
da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
wr.close()
# 16-bit stereo: samples are interleaved as L R L R ...
left, right = da[0::2], da[1::2]
After that, left and right contain the samples of the left and right channels (this assumes a 16-bit stereo file).
You can find a more elaborate example here.
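To get the amplitude as a function of time for the whole file rather than one second at a time, you can wrap the same wave + NumPy approach in a function that also builds a time axis from the frame rate. This is a sketch: read_wav_amplitude is a hypothetical helper name. It handles any channel count but rejects sample widths other than 16 bits rather than guessing.

```python
import wave

import numpy as np


def read_wav_amplitude(path):
    """Return (times_in_seconds, samples) for a 16-bit PCM WAV file.

    path may be a filename or a binary file object. samples has shape
    (n_frames, n_channels); column 0 is the left (or only) channel.
    """
    with wave.open(path, 'rb') as wr:
        n_channels = wr.getnchannels()
        if wr.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit PCM')
        rate = wr.getframerate()
        raw = wr.readframes(wr.getnframes())
    # WAV stores samples little-endian and interleaved by channel: L R L R ...
    samples = np.frombuffer(raw, dtype='<i2').reshape(-1, n_channels)
    # One timestamp per frame, in seconds
    times = np.arange(samples.shape[0]) / rate
    return times, samples
```

For example, after the mpg123 conversion, t, s = read_wav_amplitude('out.wav') gives s[:, 0] as the left channel's amplitude at the times in t.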

Here is a project in pure python where you can decode an MP3 file about 10x slower than realtime: http://portalfire.wordpress.com/category/pymp3/
The rest is done with Fourier mathematics etc.; see:
How to analyse frequency of wave file
and have a look at the Python wave module:
http://docs.python.org/2/library/wave.html
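As a concrete starting point for the Fourier part, NumPy's real FFT can find the strongest frequency in a block of samples. This is a minimal sketch: dominant_frequency is a hypothetical helper. The Hann window and the mean removal are my own choices, added so that spectral leakage and a DC offset do not dominate the peak. The frequency resolution is rate / len(samples), so one second of audio gives 1 Hz bins.

```python
import numpy as np


def dominant_frequency(samples, rate):
    """Return the frequency (Hz) with the largest magnitude in a real signal."""
    samples = np.asarray(samples, dtype=np.float64)
    # Remove the DC offset so a non-zero mean does not win at 0 Hz
    samples = samples - samples.mean()
    # A Hann window reduces leakage caused by cutting the signal to a finite block
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    # Frequency (Hz) of each rfft bin
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    return freqs[np.argmax(spectrum)]
```

For example, dominant_frequency(left, 44100) would analyse the left array from the wave snippet above, assuming a 44.1 kHz file.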

The Pymedia library seems to be stable and to deal with what you need.

Related

How can I fix this ValueError: Incomplete wav chunk? [duplicate]

My problem
I'm trying to fit a (machine-learning) model that takes in an audio file (.wav) and predicts the emotion from it (multi-label classification).
I'm trying to read the sample rate and signal from the file, but when calling read(filename) from scipy.io.wavfile, I'm getting ValueError: Incomplete wav chunk.
What I've tried
I've tried switching from scipy.io.wavfile.read() to librosa.load().
Both return the signal and sample rate, but for some reason librosa takes far longer than scipy, which makes it impractical for my task.
I've tried sr, y = scipy.io.wavfile.read(open(filename, 'r')) as suggested here, to no avail.
I've tried looking into my files to see what might cause it:
Out of all 2084 WAV files, 1057 were good (scipy managed to read them) and 1027 were bad (they raised the error).
I couldn't find anything indicating what makes a file pass or fail, which is odd, since all files come from the same dataset and the same origin.
I've heard people say I could just re-export the files as WAV using some software, and that this should work.
I didn't try this because a) I don't have any audio-processing software and it seems like overkill, and b) I want to understand the actual problem rather than put a band-aid on it.
Minimal, reproducible example
Assume filenames is a subset of all my audio files, containing fn_good and fn_bad, where fn_good is an actual file that gets processed, and fn_bad is an actual file that raises an error.
import scipy.io.wavfile

def extract_features(filenames):
    for fn in filenames:
        sr, y = scipy.io.wavfile.read(fn)
        print('Signal is: ', y)
        print('Sample rate is: ', sr)
Additional info
Judging by VLC, the codecs seem to be supported by scipy.io.wavfile. In any case, both files have the same codec, so it's odd that they behave differently...
Codec of the GOOD file:
Codec of the BAD file:
I don't know why scipy.io.wavfile can't read the file; there might be an invalid chunk in there that other readers simply ignore. Note that even when I read a "good" file with scipy.io.wavfile, a warning (WavFileWarning: Chunk (non-data) not understood, skipping it.) is generated:
In [22]: rate, data = wavfile.read('fearful_song_strong_dogs_act10_f_1.wav')
/Users/warren/mc37/lib/python3.7/site-packages/scipy/io/wavfile.py:273: WavFileWarning: Chunk (non-data) not understood, skipping it.
WavFileWarning)
I can read 'fearful_song_strong_dogs_act06_f_0.wav' using wavio (source code on github: wavio), a package I created that wraps Python's standard wave library with functions that understand NumPy arrays:
In [13]: import wavio
In [14]: wav = wavio.read('fearful_song_strong_dogs_act06_f_0.wav')
In [15]: wav
Out[15]: Wav(data.shape=(198598, 1), data.dtype=int16, rate=48000, sampwidth=2)
In [16]: plot(np.arange(wav.data.shape[0])/wav.rate, wav.data[:,0])
Out[16]: [<matplotlib.lines.Line2D at 0x117cd9390>]
I solved the problem by changing the number 4 to 1 in this condition in scipy's wavfile.py:
if not chunk_id:
    raise ValueError("Unexpected end of file.")
elif len(chunk_id) < 1:  # originally: len(chunk_id) < 4
    raise ValueError("Incomplete wav chunk.")
I did it purely on intuition and luck, though. Now I wonder why this works and what the possible reasons are.
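With the check relaxed to < 1, a non-empty fragment of a chunk ID (1 to 3 bytes) no longer raises. That hints that the failing files end with a few stray bytes after their last complete chunk. To test this hypothesis without patching scipy, here is a stdlib-only sketch that walks the top-level RIFF chunks. It reports trailing fragments and chunks that claim more bytes than the file contains. list_riff_chunks is a hypothetical helper. It checks only the RIFF layout (4-byte ID, 4-byte little-endian size, one pad byte after odd-sized chunks), not the contents of fmt or data.

```python
import io
import struct
import wave


def list_riff_chunks(data: bytes):
    """Walk the top-level chunks of a RIFF/WAVE byte string.

    Returns (chunks, problems): chunks is a list of (chunk_id, declared_size),
    problems is a list of human-readable descriptions of layout errors.
    """
    if len(data) < 12 or data[:4] != b'RIFF' or data[8:12] != b'WAVE':
        raise ValueError('not a RIFF/WAVE file')
    chunks, problems = [], []
    pos = 12  # skip the 'RIFF' <size> 'WAVE' header
    while pos < len(data):
        header = data[pos:pos + 8]
        if len(header) < 8:
            # Too few bytes left for an ID + size: a truncated chunk header
            problems.append(f'{len(header)} stray byte(s) at offset {pos}: {header!r}')
            break
        chunk_id, size = struct.unpack('<4sI', header)
        chunks.append((chunk_id, size))
        body_end = pos + 8 + size
        if body_end > len(data):
            problems.append(f'chunk {chunk_id!r} claims {size} bytes but the file ends early')
            break
        # RIFF chunks are word-aligned: odd-sized chunks are followed by a pad byte
        pos = body_end + (size & 1)
    return chunks, problems
```

For example, compare list_riff_chunks(open(fn_bad, 'rb').read()) with the same call on fn_good.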

Increasing the playback speed of combined wav file in python?

I was trying to combine multiple WAV files in Python using pydub, but the output's playback speed was somewhat slower than I wanted. So I referred to this question and tried the same approach.
import os
import glob
import random
from pydub import AudioSegment

FRAMERATE = 44100  # default frame rate of the WAV files
OUTPUT_FILE = 'MySong/random.wav'
audio_data = [AudioSegment.from_wav(wavfile)
              for wavfile in glob.glob(os.path.join('wav_files/', '*.wav'))]
my_music = sum([random.choice(audio_data) for i in range(100)])
my_music = my_music.set_frame_rate(FRAMERATE * 4)
my_music.export(OUTPUT_FILE, format='wav')
But this isn't working. Is there any technical reason I'm unaware of, or is there any better way of doing it?
To increase the pace without changing the pitch, you’ll need to do something a little fancier than changing the frame rate (which will give you a “chipmunk” effect).
If you’re dealing with spoken word, you can try stripping out silence with the (unfortunately undocumented) functions in pydub.silence.
You can also look at AudioSegment().speedup() which is a naive attempt at resampling. You can also make a copy of that function and try to improve it (and contribute back to pydub?)
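Two things may be going on. As far as I know, pydub's set_frame_rate() resamples the audio to the new rate, which keeps both duration and pitch unchanged, so the question's code would not speed anything up. Relabelling the same samples with a higher rate instead gives the chipmunk effect. The sketch below is NumPy on a plain 1-D sample array, not pydub. It illustrates the naive chunk-dropping idea that speedup() is roughly based on: cut the signal into short chunks and keep only about 1/speed of them, so each surviving chunk plays at its original pitch. The function name and the 441-sample (10 ms at 44.1 kHz) default chunk are my own choices for illustration. Unlike pydub, it does no crossfading, so expect clicks at chunk boundaries.

```python
import numpy as np


def speedup_by_dropping_chunks(samples, speed, chunk=441):
    """Shorten a 1-D signal by roughly `speed`x without changing its pitch.

    The signal is cut into `chunk`-sample pieces and only evenly spaced
    pieces are kept. Each kept piece is unchanged, so pitch is preserved
    and only the duration shrinks. No crossfading is applied.
    """
    if speed < 1:
        raise ValueError('speed must be >= 1')
    samples = np.asarray(samples)
    n_chunks = int(np.ceil(len(samples) / chunk))
    n_keep = max(1, int(round(n_chunks / speed)))
    # Indices of the chunks to keep, spread evenly across the signal
    keep = np.linspace(0, n_chunks - 1, n_keep).round().astype(int)
    pieces = [samples[i * chunk:(i + 1) * chunk] for i in keep]
    return np.concatenate(pieces)
```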

Convolving Room Impulse Response with a Wav File (python)

I have written the following code, which is supposed to add echo to an existing sound file. Unfortunately, the output is very noisy, and I don't understand why. Can anybody help me with this? Am I missing a step?
# Convolve a room impulse response with a sound sample (both stereo)
import sys
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

inp = wavfile.read(sound_path + sound_file_name)
IR = wavfile.read(IR_path + IR_file_name)
if inp[0] != IR[0]:
    print("Sample rate mismatch")
    sys.exit(-1)
else:
    rate = inp[0]
    print(sound_file_name)
    out_0 = fftconvolve(inp[1][:, 1], IR[1][:, 0])
    out_1 = fftconvolve(inp[1][:, 1], IR[1][:, 1])
    out = np.vstack((out_0, out_1)).T
    out[:inp[1].shape[0]] = out[:inp[1].shape[0]] + inp[1]
    wavfile.write(sound_path + sound_file_name + '_echoed.wav', rate, out)
Adding echo to a sound file is just that... adding echo. Your code doesn't look like it's adding two sounds together; it looks like it's transforming the input sound into something else.
Your data flow should look something like this:
source sound --+----------------------------+
               |                            |
               |                           (+) ---> target sound
               |                            |
               +---> convolution echo ------+
Note that your echo sound is going to be longer than your original sound (i.e. it has a "tail.")
Adding two sounds together is simply a matter of adding each of the individual samples together from both sounds to produce a new output wave. I don't think vstack does that.
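That data flow can be sketched in NumPy for a mono signal. This is a minimal illustration, not the poster's code. add_echo and wet_gain are hypothetical names, and both inputs are assumed to be float arrays in [-1, 1] at the same sample rate. np.convolve stands in for scipy's fftconvolve: the result is the same, but it is slower for long impulse responses. For stereo, apply the function once per channel.

```python
import numpy as np


def add_echo(dry, ir, wet_gain=0.5):
    """Mix a dry mono signal with its convolution by an impulse response.

    dry, ir: 1-D float arrays in [-1, 1] at the same sample rate.
    Returns dry + wet_gain * (dry convolved with ir), rescaled to peak 1.0
    only if the mix would clip. The result is len(dry) + len(ir) - 1
    samples long: the extra samples are the echo "tail".
    """
    wet = np.convolve(dry, ir)      # full convolution, including the tail
    out = wet_gain * wet            # new float array holding the echo
    out[:len(dry)] += dry           # add the two sounds sample by sample
    peak = np.max(np.abs(out))
    if peak > 1.0:
        out /= peak                 # avoid clipping before converting back to PCM
    return out
```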
Apparently WAV files are read as int16 arrays, and any processing should be done after converting them to floats:
http://nbviewer.ipython.org/github/mgeier/python-audio/blob/master/audio-files/audio-files-with-pysoundfile.ipynb
After convolution, one needs to renormalize again, and that's it.
Hope this helps others too.
import sys
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve
from utility import pcm2float, float2pcm  # helper module from the notebook linked above

input_rate, input_sig = wavfile.read(sound_path + sound_file_name)
input_sig = pcm2float(input_sig, 'float32')
IR_rate, IR_sig = wavfile.read(IR_path + IR_file_name)
IR_sig = pcm2float(IR_sig, 'float32')
if input_rate != IR_rate:
    print("Sample rate mismatch")
    sys.exit(-1)
else:
    rate = input_rate
    print(sound_file_name)
    con_len = -1
    out_0 = fftconvolve(input_sig[:con_len, 0], IR_sig[:con_len, 0])
    out_0 = out_0 / np.max(np.abs(out_0))
    out_1 = fftconvolve(input_sig[:con_len, 1], IR_sig[:con_len, 1])
    out_1 = out_1 / np.max(np.abs(out_1))  # normalise channel 1 by its own peak
    out = np.vstack((out_0, out_1)).T
    wavfile.write(sound_path + sound_file_name + '_' + IR_file_name + '_echoed.wav',
                  rate, float2pcm(out, 'int16'))
One can download utility from the above link.
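If you would rather not depend on the downloaded utility module, a minimal int16/float round trip can be sketched as follows. The helper names are hypothetical, and this is not that module's API. The sketch assumes 16-bit PCM and scales by 32768, so every int16 value survives the round trip exactly. Anything outside [-1, 1] is clipped on the way back instead of wrapping around.

```python
import numpy as np


def int16_to_float(sig):
    """Map int16 PCM samples to float32 in [-1.0, 1.0)."""
    return np.asarray(sig, dtype=np.int16).astype(np.float32) / 32768.0


def float_to_int16(sig):
    """Map floats (nominally in [-1.0, 1.0]) back to int16, clipping overflow."""
    scaled = np.round(np.asarray(sig) * 32768.0)
    # 1.0 maps to 32768, which does not fit in int16, so clip to the valid range
    return np.clip(scaled, -32768, 32767).astype(np.int16)
```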
UPDATE: Although it generates a working output, it's still not as good as the result from convolving on the original Openair website.

When using the Python Image Library, does open() immediately decompress the image file?

Short question
When using the Python Image Library, does open() immediately decompress the image file?
Details
I would like to measure the decompression time of compressed images (jpeg, png...), as I read that it's supposed to be a good measure of an image's "complexity" (a blank image will be decompressed quickly, and so will a purely random image, since it will not have been compressed at all, so the most "interesting" images are supposed to have the longest decompression time). So I wrote the following python program:
# complexity.py
import sys
import time
from io import BytesIO

from PIL import Image


def measure_complexity(image_path, iterations=10000):
    with open(image_path, "rb") as f:
        data = f.read()
    data_io = BytesIO(data)
    t1 = time.time()
    for _ in range(iterations):
        data_io.seek(0)
        Image.open(data_io, "r")
    t2 = time.time()
    return t2 - t1


def main():
    for filepath in sys.argv[1:]:
        print(filepath, measure_complexity(filepath))


if __name__ == '__main__':
    main()
It can be used like this:
$ python complexity.py blank.jpg blackandwhitelogo.jpg trees.jpg random.jpg
blank.jpg 1.66653203964
blackandwhitelogo.jpg 1.33399987221
trees.jpg 1.62251782417
random.jpg 0.967066049576
As you can see, I'm not getting the expected results at all, especially for the blank.jpg file: it should be the one with the lowest "complexity" (quickest decompression time). So either the article I read is utterly wrong (I really doubt it, it was a serious scientific article), or PIL is not doing what I think it's doing. Maybe the actual conversion to a bitmap is done lazily, when it's actually needed? But then why would the open delays differ? The smallest jpg file is of course the blank image, and the largest is the random image. This really does not make sense.
Note 1: when running the program multiple times, I get roughly the same results: the results are absurd, but stable. ;-)
Note 2: all images have the same size (width x height).
Edit
I just tried with png images instead of jpeg, and now everything behaves as expected. Cool! I just sorted about 50 images by complexity, and they do look more and more "complex". I checked the article (BTW, it's an article by Jean-Paul Delahaye in 'Pour la Science', April 2013): the author actually mentions that he used only loss-less compression algorithms. So I guess the answer is that open does decompress the image, but my program did not work because I should have used images compressed with loss-less algorithms only (png, but not jpeg).
Glad you got it sorted out. Anyway, the open() method is indeed a lazy operation – as stated in the documentation, to ensure that the image will be loaded, use image.load(), as this will actually force PIL / Pillow to interpret the image data (which is also stated in the linked documentation).
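You can try the Edit's conclusion (the measure only makes sense with lossless compression) without image files at all. PNG stores pixel data with DEFLATE, the algorithm behind Python's zlib module, so this sketch compresses raw greyscale bytes with zlib and times the decompression. lossless_complexity is a hypothetical helper. Compressing raw bytes skips PNG's per-row filters, so the numbers will not match real PNG files. Timings are noisy, hence the repeats. The compressed size is returned too, as a steadier side signal.

```python
import os
import time
import zlib


def lossless_complexity(raw: bytes, repeats: int = 200):
    """Return (compressed_size, seconds_to_decompress_repeats_times) for raw bytes.

    Uses zlib (DEFLATE), the lossless scheme used inside PNG.
    """
    packed = zlib.compress(raw, 9)
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.decompress(packed)
    elapsed = time.perf_counter() - start
    return len(packed), elapsed


if __name__ == '__main__':
    size = 256 * 256  # one byte per pixel, greyscale
    blank = bytes(size)
    stripes = bytes((i // 16) % 2 * 255 for i in range(size))
    noise = os.urandom(size)
    for name, raw in [('blank', blank), ('stripes', stripes), ('noise', noise)]:
        print(name, lossless_complexity(raw))
```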

Playing a sound from a wave form stored in an array

I'm currently experimenting with generating sounds in Python, and I'm curious how I can take an array representing a waveform (with a sample rate of 44100 Hz) and play it. I'm looking for pure Python here, rather than relying on a library that supports more than just the .wav format.
Or use the sounddevice module. Install it with pip install sounddevice; on Debian/Ubuntu you first need sudo apt-get install libportaudio2.
Absolute basics:
import numpy as np
import sounddevice as sd

# myarray must be a NumPy array (convert with np.array(myarray) if needed)
# and may need normalising, as in the example below
sd.play(myarray, 44100)
sd.wait()  # block until playback finishes
A few more options:
import numpy as np
import sounddevice as sd

# variables
samplfreq = 100  # sampling frequency of your data (mine = 100 Hz, yours = 44100 Hz)
factor = 10      # playback speed factor (1 = normal speed; >1 = faster and higher-pitched)
# data
arr = np.asarray(myarray, dtype=np.float64)
# normalise to [-1, 1]; unnormalised data will sound very noisy (clipped) when played
sd.play(arr / np.max(np.abs(arr)), samplfreq * factor)
sd.wait()
You should use a library. Writing it all in pure Python could take many thousands of lines of code just to interface with the audio hardware!
With a library, e.g. audiere, it will be as simple as this:
import audiere
ds = audiere.open_device()
stream = ds.open_array(input_array, 44100)
stream.play()
There's also pyglet, pygame, and many others.
Edit: the audiere module mentioned above appears to be no longer maintained, but my advice to rely on a library stays the same. Take your pick of a current project here:
https://wiki.python.org/moin/Audio/
https://pythonbasics.org/python-play-sound/
The reason there's not many high-level stdlib "batteries included" here is because interactions with the audio hardware can be very platform-dependent.
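Given that, about the most the standard library can do portably is encode the array as a WAV file, which any player or playback library can then open. Below is a minimal sketch that uses only the array and wave modules and assumes mono float samples in [-1, 1]. to_wav_bytes and sine_wave are hypothetical helper names.

```python
import array
import io
import math
import sys
import wave


def sine_wave(freq, seconds, rate=44100, amplitude=0.5):
    """Return a list of float samples of a sine tone."""
    return [amplitude * math.sin(2 * math.pi * freq * n / rate)
            for n in range(int(seconds * rate))]


def to_wav_bytes(samples, rate=44100):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes.

    Values outside the range are clipped rather than wrapped.
    """
    pcm = array.array('h', (round(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    if sys.byteorder == 'big':
        pcm.byteswap()  # WAV sample data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, 'wb') as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(pcm.tobytes())
    return buf.getvalue()
```

For example, open('tone.wav', 'wb').write(to_wav_bytes(sine_wave(440, 1.0))) writes a one-second 440 Hz tone that any player can open.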
You might want to look at this list:
http://wiki.python.org/moin/PythonInMusic
It lists many useful tools for working with sound.
To play a sound given an array input_array of 16-bit samples (a modified example from the PyAudio documentation page):
import numpy as np
import pyaudio

# input_array: 1-D NumPy array of 16-bit samples, mono, 44100 Hz

# instantiate PyAudio (1)
p = pyaudio.PyAudio()
# open stream (2); 2 is the size in bytes of int16
stream = p.open(format=p.get_format_from_width(2),
                channels=1,
                rate=44100,
                output=True)
# play stream (3), blocking call; the stream expects raw bytes
stream.write(input_array.astype(np.int16).tobytes())
# stop stream (4)
stream.stop_stream()
stream.close()
# close PyAudio (5)
p.terminate()
Here's a snippet of code taken from this Stack Overflow answer, with an added example that plays a NumPy array (a sound file loaded with scipy):
from wave import open as waveOpen
from ossaudiodev import open as ossOpen  # Linux/OSS only; removed from the stdlib in Python 3.13
from ossaudiodev import AFMT_S16_NE
import numpy as np
from scipy.io import wavfile
# from https://stackoverflow.com/questions/307305/play-a-sound-with-python/311634#311634
# run this first: sudo modprobe snd-pcm-oss
s = waveOpen('example.wav', 'rb')
(nc, sw, fr, nf, comptype, compname) = s.getparams()
dsp = ossOpen('/dev/dsp', 'w')
print(nc, sw, fr, nf, comptype, compname)
_, snp = wavfile.read('example.wav')
print(snp)
dsp.setparameters(AFMT_S16_NE, nc, fr)  # 16-bit native-endian samples
data = s.readframes(nf)
s.close()
dsp.write(snp.tobytes())  # play the NumPy array loaded by scipy
dsp.write(data)           # then play the raw frames read by the wave module
dsp.close()
Basically, you can just call the tobytes() method; the returned bytes object can then be played.
P.S. This method is super fast.
