I am working with WAV file analysis using the librosa library in Python. I used librosa.load() to load the audio file. Apparently this function loads the WAV file into a numpy array with normalised amplitude values in the range -1 to 1. But I need the actual amplitude values for processing. How can I get those?
Thanks in advance!
You observed correctly that librosa.load() always normalizes the samples to mono floats in [-1, 1] (and also resamples to 22050 Hz by default). That said, it's digital audio, so you can multiply by whatever you want to get a different scale. If you insist that your samples be on the signed 16-bit scale of -2^15 to 2^15 - 1, simply multiply by 2^15. It carries exactly the same information.
You won't gain anything, except dragging a peculiarity of the encoding format into your data.
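For instance, here is a minimal sketch of that scaling (the file path is a placeholder; sr=None keeps the native sample rate instead of resampling to 22050 Hz):

import numpy as np
import librosa

y, sr = librosa.load('existing_file.wav', sr=None)  # floats in [-1, 1]
# clip before casting: 1.0 * 2**15 would overflow a signed 16-bit integer
y_int16 = np.clip(y * 2**15, -2**15, 2**15 - 1).astype(np.int16)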
That said, if that's what you want, you could use PySoundFile like this:
import soundfile as sf
y, sr = sf.read('existing_file.wav', dtype='int16')
The parameter dtype='int16' tells the library to return the samples as signed 16-bit integers, i.e. on the raw scale stored in the file.
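Note that, unlike librosa.load(), sf.read() does not resample or downmix: sr will be the file's native sample rate, and the channel layout is preserved.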
You can't. As Hendrik mentioned, the signal is digital, and the amplitude values in the WAV file won't tell you anything about the actual sound wave amplitude / sound power. That information was lost the moment the signal was digitised to WAV.
That being said, you can compute e.g. loudness, a relative perception of the sound power. If you are dealing with the human auditory system, one of the recommended approaches is to:
Use the Bark scale (the Bark scale better reflects how we hear).
Compute energy in each bin.
(Optional) Normalise by the overall sum.
If you don't want to compute it yourself, check out e.g. YAAFE.
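If you do want to compute it yourself, here is a rough sketch of the steps above. Librosa has no built-in Bark filterbank, so this sketch substitutes its mel filterbank (another perceptual scale) as a stand-in; the file path is a placeholder:

import numpy as np
import librosa

y, sr = librosa.load('existing_file.wav')
S = np.abs(librosa.stft(y)) ** 2                               # power spectrogram
bands = librosa.feature.melspectrogram(S=S, sr=sr, n_mels=24)  # energy per perceptual band
band_energy = bands.sum(axis=1)                                # energy in each bin
band_energy /= band_energy.sum()                               # optional: normalise by the overall sum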
Related
I'm working on a project where I try to compare one audio file with another.
I want to know the pitch (in Hz) of each note in the audio and then check whether it is an accurate note or not.
For this I have tried the librosa library, but with it you need to create a mel spectrogram and then work on the spectrogram in order to find notes...
Is there a simpler way? I'd prefer something without a CNN.
Thanks
*Maybe converting the spectrogram is not as complicated as I thought... waiting for your help.
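For what it's worth, one simpler route is a dedicated pitch tracker instead of working on the spectrogram yourself. A sketch using librosa's built-in pYIN implementation (assuming librosa >= 0.8; the file name is a placeholder):

import numpy as np
import librosa

y, sr = librosa.load('recording.wav')
# one f0 estimate (in Hz) per frame; NaN where no pitch was detected
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz('C2'), fmax=librosa.note_to_hz('C7'), sr=sr)
voiced = f0[~np.isnan(f0)]
notes = librosa.hz_to_note(voiced)  # nearest note names, e.g. 'A4'

Comparing each f0 value against the frequency of its nearest note then tells you how accurate the note is.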
I am trying to use pitch_shift from librosa.
I recorded some of my voice and used this code:
import librosa

sampling_rate = 44100
y, sr = librosa.load(directory, sr=sampling_rate)  # y is a numpy array of the wav file, sr = sample rate
# bins_per_octave=24 makes each step a quarter-tone, so n_steps=4 shifts by 2 semitones
y_shifted = librosa.effects.pitch_shift(y, sr, n_steps=4, bins_per_octave=24)
librosa.output.write_wav(directory, y_shifted, sr=sampling_rate, norm=False)  # removed in librosa >= 0.8; use soundfile.write there
It works fine - almost.
I hear some noise in my new voice (after pitch_shifting)
Is there something what I need to use?
Without shift:
https://vocaroo.com/i/s1qEEDvzcUHN
With shift (n_steps = 4):
https://vocaroo.com/i/s0cOiC0cFJSB
Pitch-shifting typically involves an STFT, a shift of the magnitude spectrum along the frequency axis, and then signal reconstruction via the Griffin-Lim algorithm (there are good explanations online of how Griffin-Lim works, e.g. on Quora).
The problem is that when we shift the magnitude spectrum, we do just that, and ignore the phase! Griffin-Lim tries to find a reasonable estimate of the correct phase when reconstructing the time-domain signal, but it's often just that: a reasonable solution, not a perfect one. And that is why you hear this metallic twang. That's the phases of your signal not being quite right (also called "phasiness").
I believe your function call to librosa is perfectly alright. It may just not be the greatest implementation on earth. Give PyRubberband a try. It's based on Rubber Band (a C++ library) and has a good reputation.
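A sketch of that swap (assuming pyrubberband is installed and the rubberband command-line tool is on your PATH, since pyrubberband shells out to it; file names are placeholders):

import librosa
import soundfile as sf
import pyrubberband as pyrb

y, sr = librosa.load('voice.wav', sr=44100)
y_shifted = pyrb.pitch_shift(y, sr, n_steps=4)  # shift up by 4 semitones
sf.write('voice_shifted.wav', y_shifted, sr)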
I have a numpy array that represents audio data (dtype is np.int16). Here is a plot of the audio data (me saying "one, two"):
The sampling rate is 100 Hz. I saved this array into a WAV file. However, the WAV file is not audible in music players (iTunes, VLC, Audacity, etc.). It's just complete silence.
Here is how I saved the array:
scipy.io.wavfile.write('output.wav', 100, waveform)  # 'waveform' is the numpy array
I am wondering what the cause could be?
Sampling rate too low?
Amplitude not enough? I tried to normalise to the range -32767 to 32767, but still no sound.
Any help is appreciated
PS:
This is how the file looks in Audacity (I'm not very familiar with this software):
With a sampling frequency of 100 Hz, the highest frequency you can represent is 50 Hz (the Nyquist limit), so nothing in the normal audible range survives.
The range of human hearing is from about 20 Hz to about 20000 Hz.
For "telephone quality" you need 8000 Hz, and for "CD quality" you need 44100 Hz (the standard sampling frequency for consumer audio).
I'm trying to build something in python that can analyze an uploaded mp3 and generate the necessary data to build a waveform graphic. Everything I've found is much more complex than I need. Ultimately, I'm trying to build something like you'd see on SoundCloud.
I've been looking into numpy and FFTs, but it all seems more complicated than I need. What's the best approach to this? I'll build the actual graphic using canvas, so don't worry about that part of it, I just need the data to plot.
An MP3 file is an encoded version of a waveform. Before you can work with the waveform, you must first decode the MP3 data into a PCM waveform. Once you have PCM data, each sample represents the waveform's amplitude at that point in time. If we assume an MP3 decoder outputs signed, 16-bit values, your amplitudes will range from -32768 to +32767. If you normalize the samples by dividing each by 32768, the waveform samples will then range between +/- 1.0.
The issue really is one of decoding the MP3 to PCM. As far as I know, there is no native Python MP3 decoder. You can, however, use LAME, called from Python as a subprocess, or, with a bit more work, interface the LAME library directly to Python with something like SWIG. Not a trivial task.
Plotting this data then becomes an exercise for the reader.
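A sketch of that pipeline (assuming the lame binary is on your PATH; file names are placeholders):

import subprocess
import scipy.io.wavfile

# decode the MP3 to a PCM WAV file with LAME, then read it back
subprocess.run(['lame', '--decode', 'input.mp3', 'decoded.wav'], check=True)
rate, samples = scipy.io.wavfile.read('decoded.wav')  # int16 PCM samples
normalised = samples / 32768.0                        # +/- 1.0, ready to plot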
I suggest using Pygame if you don't want to deal with the inner workings of the MP3 file format.
Pygame is a multimedia library which can open common audio file formats, including .mp3 and .ogg, as "Sound" objects. If you have numpy installed underneath, you can browse the uncompressed (i.e. already decoded) sound using the pygame.sndarray.array call, which returns a numpy array object with the sound samples.
I've found a little trick: be sure to call pygame.mixer.init with the same parameters (frequency, bit sample size and number of channels) as your .mp3 file has, or the call to sndarray.array may raise an exception.
Check the documentation at http://www.pygame.org/docs/
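A minimal sketch of that approach (file name and mixer settings are placeholders; match them to your file as noted above):

import pygame
import pygame.sndarray

pygame.mixer.init(frequency=44100, size=-16, channels=2)
sound = pygame.mixer.Sound('song.mp3')
samples = pygame.sndarray.array(sound)  # numpy array of the decoded samples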
I'm working on a project where I need to know the amplitude of sound coming in from a microphone on a computer.
I'm currently using Python with the Snack Sound Toolkit and I can record audio coming in from the microphone, but I need to know how loud that audio is. I could save the recording to a file and use another toolkit to read in the amplitude at given points in time from the audio file, or try and get the amplitude while the audio is coming in (which could be more error prone).
Are there any libraries or sample code that can help me out with this? I've been looking and so far the Snack Sound Toolkit seems to be my best hope, yet there doesn't seem to be a way to get direct access to amplitude.
Looking at the Snack Sound Toolkit examples, there seems to be a dBPowerSpectrum function.
From the reference:
dBPowerSpectrum ( )
Computes the log FFT power spectrum of the sound (at the sample number given in the start option) and returns a list of dB values. See the section item for a description of the rest of the options. Optionally an ending point can be given, using the end option. In this case the result is the average of consecutive FFTs in the specified range. Their default spacing is taken from the fftlength but this can be changed using the skip option, which tells how many points to move the FFT window each step. Options:
EDIT: I am assuming that by amplitude you mean how "loud" the sound appears to a human, and not the time-domain voltage, which would average out to roughly 0 over the whole recording since the integral of a sine wave over full periods is 0. For example, 10*sin(t) is louder than 5*sin(t), but both have an average value of 0 over time. (You do not want to send non-AC voltage to a speaker anyway.)
To get how loud the sound is, you will need to determine the amplitudes of the frequency components. This is done with a Fourier transform (FFT), which breaks the sound down into its frequency components. The dBPowerSpectrum function seems to give you a list of the magnitudes (forgive me if this differs from the exact definition of a power spectrum) of each frequency. To get the total volume, you can just sum the entire list (which will be close, except it might still differ from perceived loudness, since the human ear has a frequency response of its own).
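A numpy stand-in for what that description amounts to ('frame' is a hypothetical block of samples from the microphone):

import numpy as np

frame = np.random.randn(2048)            # placeholder for a real frame of audio
magnitudes = np.abs(np.fft.rfft(frame))  # magnitude of each frequency component
volume = magnitudes.sum()                # crude overall volume estimate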
I disagree completely with this "answer" from CookieOfFortune.
Granted, the question is poorly phrased... but this answer makes things much more complex than necessary. I am assuming that by 'amplitude' you mean perceived loudness. Technically, each sample in the (PCM) audio stream already represents an amplitude of the signal at a given time slice. To get a loudness representation, try a simple RMS calculation:
RMS = sqrt((x1^2 + x2^2 + ... + xn^2) / n)
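A minimal sketch of that calculation ('samples' is a hypothetical block of PCM samples):

import numpy as np

samples = np.random.randn(4096)  # placeholder for real audio samples
rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
db = 20 * np.log10(rms + 1e-12)  # optional: express the level in dB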
I'm not sure if this will help, but skimpygimpy provides facilities for parsing WAVE files into Python sequences and back -- you could potentially use this to examine the waveform samples directly and do what you like. You will have to read some source code; these subcomponents are not documented.