Get VU meter value from audio using Python

I have found several different meanings for the Volume Unit meter (VU meter): the average loudness of the sound, the average of its frequencies, and the average power on a dB scale.
I read the audio with AudioSegment and split it into small windows. I then get an array of values for each window (I assume these values are amplitudes).
from pydub import AudioSegment
from pydub.utils import get_array_type
import numpy as np
#def vu(arr):
#
# return vu_value
sound = AudioSegment.from_file(fullfilename)  # read the audio file
# stereo signal to two mono signal for left and right channel
split_sound = sound.split_to_mono()
left_channel = split_sound[0]
right_channel = split_sound[1]
left_channel = np.array(left_channel.get_array_of_samples())
right_channel = np.array(right_channel.get_array_of_samples())
# print(vu(left_channel))
I would like to know the exact meaning of the VU meter and how to compute a VU value for each window (e.g., a formula). I am also confused about the difference between a VU meter, a Peak Programme Meter (PPM), and RMS. If anyone knows the answer, please help me.
Thank you
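For reference: a true VU meter averages the rectified signal with roughly 300 ms integration-time ballistics, so it tracks perceived loudness; a PPM follows peaks, with a fast attack and a slow decay; and RMS is the root mean square of the samples. A common software stand-in for a VU reading is the per-window RMS level in dB. Here is a minimal sketch of the vu function stubbed out above, assuming 16-bit samples (full scale 32768); treat it as an approximation, not a ballistically correct VU meter:
import numpy as np

def vu(arr, full_scale=32768.0):
    # RMS level of one window, in dB relative to full scale (dBFS).
    # A true VU meter instead averages the rectified signal with
    # ~300 ms ballistics, so this is only an approximation.
    rms = np.sqrt(np.mean(arr.astype(np.float64) ** 2))
    return -np.inf if rms == 0 else 20 * np.log10(rms / full_scale)

# One reading per 300 ms window of the left channel.
win = int(0.3 * sound.frame_rate)
for i in range(0, len(left_channel) - win + 1, win):
    print(vu(left_channel[i:i + win]))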

Related

Can I change the amplitude of wav audio file in python from the numpy array?

I'm just getting started with audio processing in Python and am still trying to understand the nature of the data and how to process it. I've been stuck trying to increase/decrease the amplitude (volume/loudness, if I can call it that) of a wav file without using pydub's AudioSegment, and I haven't seen this answered anywhere from what I saw.
I extract the audio data using the following code, but I don't know what to do next:
import numpy as np
import wave

filename = 'violin.wav'
audiofile = wave.open(filename, 'rb')
nch = audiofile.getnchannels()
if nch == 2:
    print('Stereo audio file')
elif nch == 1:
    print('Mono audio file')
sw = audiofile.getsampwidth()
n_frames = audiofile.getnframes()
fr = audiofile.getframerate()
frames = audiofile.readframes(-1)
typ = {1: np.int8, 2: np.int16, 4: np.int32}.get(sw)
data = np.frombuffer(frames, dtype=typ)
I have tried increasing the values of the data array by a certain amount, but it seems that's not how it works. I also tried using the Fourier transform, but I got stuck reversing the process.
How can I change the amplitude from the numpy array? Is it necessary to go through the Fourier transform for that?
Thank you!
Well, that works fine for me.
The missing piece might be how to convert back to bytes without changing the sample type (since you read with frombuffer rather than unpack, you can also write raw bytes back rather than packing values one by one).
import numpy as np
import wave

filename = 'violin.wav'
audiofile = wave.open(filename, 'rb')
nch = audiofile.getnchannels()
if nch == 2:
    print('Stereo audio file')
elif nch == 1:
    print('Mono audio file')
sw = audiofile.getsampwidth()
n_frames = audiofile.getnframes()
fr = audiofile.getframerate()
frames = audiofile.readframes(-1)
typ = {1: np.int8, 2: np.int16, 4: np.int32}.get(sw)
data = np.frombuffer(frames, dtype=typ)
# Your code, so far

p = audiofile.getparams()  # just to get all the params
outfile = wave.open("out.wav", 'wb')
outfile.setparams(p)  # same params
outfile.writeframes((data * 0.5).astype(typ).tobytes())
outfile.close()
Note that data*0.5 is a float array (data//2 would have kept the correct integer type, but I assume you may want to scale by an arbitrary scalar value).
So we need to convert back to the correct int type. And you already computed it, so it's easy: I just reuse your typ variable.
(data*0.5).astype(typ) is the data, scaled by 0.5, with the correct type.
So (data*0.5).astype(typ).tobytes() gives the bytes.
A note on adding a value
Note that adding a constant to the samples does nothing (except, if it is too big, saturating the file). Sound is a wave, so what counts is the frequency and the amplitude of the oscillation, not the value it oscillates around.
This remark is not computer science, just basic physics: sound is a variation of atmospheric pressure. When you hit an A (La) on a piano, the string vibrates at 440 Hz, which creates a variation of air pressure at 440 Hz. Whether the weather is good and the pressure oscillates between 101999.9 Pa and 102000.1 Pa because of the piano, or the weather is bad and the pressure oscillates between 98999.9 Pa and 99000.1 Pa, or you are in a plane listening to a piano with the pressure in your ear oscillating between 69999.9 Pa and 70000.1 Pa, you won't really think the sound is different. It is the oscillation (its frequency, its amplitude) that matters, not the mean value it oscillates around.
Another way to put it: the direct way to play a sound is to move a membrane following the values in data.
Imagine you have, connected to your computer (which is more or less what you actually have), a motor able to displace a metal plate. If you send 0 to that motor, the plate is positioned at 0 mm; if you send 1000, it is positioned at 1 mm; 2000 -> 2 mm, etc.
Now, if at a rate of 44100 samples per second (assuming 44100 Hz sampling) you send the values in data so that the plate's motion follows your data, you'll hear the sound.
Now add 1000 to all your data and play it again.
The plate will follow the exact same movement, only 1 mm further to the right. But it is the same movement; it is exactly as if you had moved yourself 1 mm to the left before playing it again. Of course, you don't expect any change from that.
On the other hand, if you multiply all the data by 2 and play it again, the motion of the plate will be the same but twice as big (a displacement of 1 mm becomes one of 2 mm). So, not surprisingly, you hear the same sound, but louder.
So, long story short: adding a value to sound samples does nothing, except maybe pushing the values out of the representable range.
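To make this concrete in code, here is a small sketch reusing the data array from above and assuming 16-bit samples (the variable names shifted and louder are mine):
import numpy as np

# Adding a constant only adds a DC offset: the oscillation (the sound)
# is unchanged, but large offsets overflow the int16 range and saturate.
shifted = data.astype(np.int32) + 1000  # same sound as data

# Multiplying scales the oscillation: the same sound, but louder.
# Clip to the int16 range to avoid wrap-around distortion.
louder = np.clip(data.astype(np.float64) * 2.0, -32768, 32767).astype(np.int16)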

Add random noise to tone using python

I am trying to detect sudden loud noises in audio recordings. One way I have found to do this is by creating a spectrogram of the audio and adding the values of each column. By graphing the sum of the values in each column, one can see a spike every time there is a sudden loud noise. The problem is that, in my use case, I need to play a beep tone (with a frequency of 2350 Hz), while the audio is being recorded. The spectrogram of the beep looks like this:
As you can see, at the beginning and the end of this beep (which is a simple tone with a frequency of 2350 Hz), there are other frequencies present, which I have been unsuccessful in removing. These unwanted frequencies cause a spike when summing up the columns of the spectrogram, at the beginning and at the end of the beep. I want to avoid this because I don't want my beep to be detected as a sudden loud noise. See the spectrogram below for reference:
Here is the graph of the sum of each column in the spectrogram:
Obviously, I want to avoid false positives in my algorithm, so I need some way of getting rid of the spikes caused by the beginning and end of the beep. One idea I have had so far is to add random noise with a low decibel value above and/or below the 2350 Hz line in the beep spectrogram above. Ideally, this would create a tone that sounds very similar to the original, but instead of creating a spike when I add up all the values in a column, it would create more of a plateau. Is this idea a feasible solution to my problem? If so, how would I go about creating a beep sound with random noise like I described using Python? Is there another, easier solution that I am overlooking?
Currently, I am using the following code to generate my beep sound:
import math
import wave
import struct

audio = []
sample_rate = 44100.0

def append_sinewave(
        freq=440.0,
        duration_milliseconds=500,
        volume=1.0):
    """
    The sine wave generated here is the standard beep. If you want something
    more aggressive you could try a square or sawtooth waveform. Though there
    are some rather complicated issues with making high-quality square and
    sawtooth waves... which we won't address here :)
    """
    global audio  # using global variables isn't cool.

    num_samples = duration_milliseconds * (sample_rate / 1000.0)
    for x in range(int(num_samples)):
        audio.append(volume * math.sin(2 * math.pi * freq * (x / sample_rate)))
    return

def save_wav(file_name):
    # Open up a wav file
    wav_file = wave.open(file_name, "w")

    # wav params
    nchannels = 1
    sampwidth = 2

    # 44100 is the industry-standard sample rate - CD quality. If you need to
    # save on file size you can adjust it downwards. The standard for low
    # quality is 8000 or 8 kHz.
    nframes = len(audio)
    comptype = "NONE"
    compname = "not compressed"
    wav_file.setparams((nchannels, sampwidth, sample_rate, nframes, comptype, compname))

    # WAV files here are using short, 16-bit, signed integers for the
    # sample size. So we multiply the floating point data we have by 32767,
    # the maximum value for a short integer. NOTE: it is theoretically
    # possible to use the floating point -1.0 to 1.0 data directly in a WAV
    # file, but it is not obvious how to do that using the wave module.
    for sample in audio:
        wav_file.writeframes(struct.pack('h', int(sample * 32767.0)))

    wav_file.close()
    return

append_sinewave(volume=1, freq=2350)
save_wav("output.wav")
Not really an answer - more of a question.
You're asking the speaker to go from stationary to a sine wave instantaneously - that is quite hard to do (though the frequencies aren't that high). If it does manage it, then the received signal should be the convolution of a top hat with the sine wave (sort of like what you are seeing, but without the data and the details of how you computed the spectrogram, it's hard to tell).
In either case you could check this by smoothing the start and end of your tone. Something like this for your tone generation:
tr = 0.05  # rise time, in seconds
tf = duration_milliseconds / 1000  # finish time of tone, in seconds

for x in range(int(num_samples)):
    t = x / sample_rate  # time of sample in seconds

    # Calculate a bump function
    bump_function = 1
    if 0 < t < tr:  # go smoothly from 0 to 1 at the start of the tone
        tp = 1 - t / tr
        bump_function = math.e * math.exp(1 / (tp**2 - 1))
    elif tf - tr < t < tf:  # go smoothly from 1 to 0 at the end of the tone
        tp = 1 + (t - tf) / tr
        bump_function = math.e * math.exp(1 / (tp**2 - 1))

    audio.append(volume * bump_function * math.sin(2 * math.pi * freq * t))
You might need to tune the rise time a bit. With this form of bump function you know you have a full-volume tone from tr after the start until tr before the end. Lots of other functions exist, but if this smooths the start/stop effects in your spectrogram, then you at least know why they were there. And prevention is generally better than trying to remove the effect in post-processing.
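For instance, a raised-cosine (Hann-style) ramp is another common choice. Here is a minimal sketch under the same assumptions, reusing num_samples, sample_rate, volume, freq, audio, and tf from the snippets above (the envelope function name is mine):
import math

def envelope(t, tf, tr=0.05):
    # Raised-cosine fade: 0 -> 1 over the first tr seconds,
    # 1 -> 0 over the last tr seconds, 1.0 in between.
    if t < tr:
        return 0.5 - 0.5 * math.cos(math.pi * t / tr)
    if t > tf - tr:
        return 0.5 - 0.5 * math.cos(math.pi * (tf - t) / tr)
    return 1.0

for x in range(int(num_samples)):
    t = x / sample_rate
    audio.append(volume * envelope(t, tf) * math.sin(2 * math.pi * freq * t))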

Python implementation of MFCC algorithm

I have a database which contains streaming videos. I want to calculate LBP features from the images and MFCCs from the audio, and for every frame in the video I have an annotation. The annotations are aligned with the video frames and the video timeline, so I want to map the annotation times onto the MFCC results. I know that sample_rate = 44100.
from python_speech_features import mfcc
from python_speech_features import logfbank
import scipy.io.wavfile as wav

audio_file = "sample.wav"
(rate, sig) = wav.read(audio_file)
mfcc_feat = mfcc(sig, rate)
print(len(sig))        # 2130912
print(len(mfcc_feat))  # 4831
Firstly, why is the length of the MFCC result 4831, and how do I map it to my annotations, which are in seconds? The total duration of the video is 48 seconds, and the annotation is 0 everywhere except for the 19-29 s window, where it is 1. How can I locate the samples within that window (19-29 s) in the MFCC results?
Run
mfcc_feat.shape
You should get (4831, 13). 13 is your MFCC length (the default numcep is 13). 4831 is the number of windows: the default winstep is 10 ms, and 2130912 samples / 44100 Hz ≈ 48.3 s, which at 100 windows per second gives the ≈ 4831 windows matching your sound file's duration. To get the windows corresponding to 19-29 s, just slice
mfcc_feat[1900:2900,:]
Remember that you cannot listen to the MFCCs. Each one just represents a slice of audio 0.025 s long (the default value of the winlen parameter).
If you want to get to the audio itself, it is
sig[time_beg_in_sec*rate:time_end_in_sec*rate]
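To put the mapping in one place, here is a small sketch assuming the defaults above (the variable names are mine):
winstep = 0.01  # seconds per MFCC frame (python_speech_features default)
start_sec, end_sec = 19, 29

frames_19_29 = mfcc_feat[int(start_sec / winstep):int(end_sec / winstep), :]
samples_19_29 = sig[start_sec * rate:end_sec * rate]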

Python change pitch of wav file [closed]

I need a Python library to change the pitch of my wav file without any raw audio data processing.
I spent a couple of hours looking for one, but only found some strange raw-data processing code snippets and a video showing real-time pitch shift, without source code.
Since a wav file basically is raw audio data, you won't be able to change the pitch without "raw audio processing".
Here is what you could do.
You will need the wave (standard library) and numpy modules.
import wave
import numpy as np
Open the files.
wr = wave.open('input.wav', 'r')
# Set the parameters for the output file.
par = list(wr.getparams())
par[3] = 0 # The number of samples will be set by writeframes.
par = tuple(par)
ww = wave.open('pitch1.wav', 'w')
ww.setparams(par)
The sound should be processed in small fractions of a second. This cuts down on reverb. Try setting fr to 1; you'll hear annoying echoes.
fr = 20
sz = wr.getframerate()//fr # Read and process 1/fr second at a time.
# A larger number for fr means less reverb.
c = int(wr.getnframes()/sz) # count of the whole file
shift = 100//fr  # shift by 100 Hz: each rFFT bin of a 1/fr-second chunk spans fr Hz
for num in range(c):
Read the data and split it into left and right channels (assuming a stereo WAV file).
    da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
    left, right = da[0::2], da[1::2]  # left and right channel
Extract the frequencies using the Fast Fourier Transform built into numpy.
    lf, rf = np.fft.rfft(left), np.fft.rfft(right)
Roll the array to increase the pitch.
    lf, rf = np.roll(lf, shift), np.roll(rf, shift)
The highest frequencies roll over to the lowest ones. That's not what we want, so zero them.
    lf[0:shift], rf[0:shift] = 0, 0
Now use the inverse Fourier transform to convert the signal back into amplitude.
    nl, nr = np.fft.irfft(lf), np.fft.irfft(rf)
Combine the two channels.
    ns = np.column_stack((nl, nr)).ravel().astype(np.int16)
Write the output data.
    ww.writeframes(ns.tobytes())
Close the files when all frames are processed.
wr.close()
ww.close()
I recommend trying Librosa's pitch shift function:
https://librosa.github.io/librosa/generated/librosa.effects.pitch_shift.html
import librosa
y, sr = librosa.load('your_file.wav', sr=16000) # y is a numpy array of the wav file, sr = sample rate
y_shifted = librosa.effects.pitch_shift(y, sr, n_steps=4) # shifted by 4 half steps
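Note that in recent librosa releases (0.10 and later) sr and n_steps are keyword-only, so if the call above raises a TypeError, write it as librosa.effects.pitch_shift(y, sr=sr, n_steps=4).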
You can try pydub for quick and easy pitch change across entire audio file and for different formats (wav, mp3 etc).
Here is a working code. Inspiration from here and refer here for more details on pitch change.
from pydub import AudioSegment
from pydub.playback import play
sound = AudioSegment.from_file('in.wav', format="wav")
# shift the pitch up by half an octave (speed will increase proportionally)
octaves = 0.5
new_sample_rate = int(sound.frame_rate * (2.0 ** octaves))
# keep the same samples but tell the computer they ought to be played at the
# new, higher sample rate. This file sounds like a chipmunk but has a weird sample rate.
hipitch_sound = sound._spawn(sound.raw_data, overrides={'frame_rate': new_sample_rate})
# now we just convert it to a common sample rate (44.1k - standard audio CD) to
# make sure it works in regular audio players. Other than potentially losing audio quality (if
# you set it too low - 44.1k is plenty) this shouldn't noticeably change how the audio sounds.
hipitch_sound = hipitch_sound.set_frame_rate(44100)
# Play the pitch-changed sound
play(hipitch_sound)

# Export / save the pitch-changed sound
hipitch_sound.export("out.wav", format="wav")

Transforming small WAV files in a single value of frequency (PYTHON)

I need a program that transforms tones recorded by the microphone into keyboard presses. Example: if somebody sings into the microphone at a frequency between 400 Hz and 600 Hz and the average tone is 550 Hz, I store the average frequency in the variable 'tom', and the key "G" on my keyboard is pressed.
Even though I'm a newbie at programming, I searched and figured out a way to do this using PyAudio in Python: by recording small WAV files, I could read them back, get a number as an average frequency, and with this number and some ifs and elifs press keys (it's not that hard to find code to press keys), all inside an enormous while loop that repeats the process while the program runs. So I would have the process of talking, reading the small files the talking produces, and transforming them into key presses according to the tone.
The main problem is that I have no idea how to reduce the WAV files I've been recording to a single average frequency. Can somebody help me with this, or with the big picture? Because I know this method is not a really good one. Thanks! I was using this code to record, which I found on the PyAudio website:
import wave
import numpy as np
import pyaudio

CHUNK = 2048
FORMAT = pyaudio.paInt16
CHANNELS = 2
RATE = 44100
RECORD_SECONDS = 1
WAVE_OUTPUT_FILENAME = "output1.wav"

p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

print("* recording")
frames = []
for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
    data = stream.read(CHUNK)
    frames.append(data)
    print("* done chunk")

stream.stop_stream()
stream.close()
p.terminate()
To press the keys, I use this other code:
import win32com.client

shell = win32com.client.Dispatch("WScript.Shell")
if 400 <= tom <= 500:
    shell.SendKeys("G")
You can use the Fourier transform to convert sound into frequencies.
More specifically, use the one-dimensional discrete Fourier Transform provided by numpy.fft.rfft.
Here is an example that reads a single second from a stereo WAV file and extracts the frequencies.
import wave
import numpy as np

with wave.open('input.wav', 'r') as wr:
    sz = wr.getframerate()  # read and process 1 second
    da = np.frombuffer(wr.readframes(sz), dtype=np.int16)
    left, right = da[0::2], da[1::2]  # separate into left and right channel
    lf, rf = np.absolute(np.fft.rfft(left)), np.absolute(np.fft.rfft(right))
The lf and rf are numpy arrays containing the intensity of each frequency. Using numpy.argmax you can get the index (frequency) with the highest strength.
But try it and graph the result using e.g. matplotlib. You'll see that there are probably multiple peaks in the data. For example you might find a peak at 50 Hz or 60 Hz. This is most probably interference from mains electricity and should be ignored by zero-ing out the data.
Example for 60 Hz:
lf[55:65], rf[55:65] = 0, 0
Below is an example plot made with matplotlib from a one-second sound clip. The top graph shows the samples from the WAV file while the bottom one shows the same data converted to frequencies. This is a graph of a person speaking, so there are many peaks. The highest is around 200 Hz.
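As a minimal sketch of that last step, reusing lf from the snippet above: since exactly one second of samples goes into the rFFT, bin index k corresponds to k Hz, so the strongest bin gives the dominant frequency directly (the variable tom matches the question's naming).
import numpy as np

# With a one-second window, the rFFT bin index equals the frequency in Hz.
tom = int(np.argmax(lf))  # dominant frequency of the left channel, in Hz
print("dominant tone: %d Hz" % tom)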
