I am using WAV files to build a deep learning model. They are of different lengths, so I want to pad all of them to 16 seconds using Python.
If I understood correctly, the question asks to fix all files to a given length, so the solution will be slightly different:
from pydub import AudioSegment

pad_ms = 1000  # Target fixed length in milliseconds
audio = AudioSegment.from_wav('your-wav-file.wav')

assert len(audio) <= pad_ms, "Audio is already longer than the target length"

silence = AudioSegment.silent(duration=pad_ms - len(audio))
padded = audio + silence  # Adding silence after the audio
padded.export('padded-file.wav', format='wav')
This answer differs from the one below in that this one makes all audio files the same total length, whereas the other adds the same amount of silence to the end of each file.
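Applied to the original question (padding every file to 16 seconds), a minimal batch sketch, assuming the WAV files sit in a single flat directory (the wav_dir path is hypothetical):

import os
from pydub import AudioSegment

target_ms = 16 * 1000  # 16 seconds, as asked in the question
wav_dir = 'wavs'       # hypothetical input directory

for name in os.listdir(wav_dir):
    if not name.endswith('.wav'):
        continue
    audio = AudioSegment.from_wav(os.path.join(wav_dir, name))
    assert len(audio) <= target_ms, name + " is longer than 16 s"
    padded = audio + AudioSegment.silent(duration=target_ms - len(audio))
    padded.export(os.path.join(wav_dir, 'padded-' + name), format='wav')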
Using pydub:
from pydub import AudioSegment
pad_ms = 1000 # milliseconds of silence needed
silence = AudioSegment.silent(duration=pad_ms)
audio = AudioSegment.from_wav('your-wav-file.wav')
padded = audio + silence # Adding silence after the audio
padded.export('padded-file.wav', format='wav')
Note that AudioSegment objects are immutable, so audio + silence returns a new segment rather than modifying audio in place.
You can use librosa. The librosa.util.fix_length function adds a silent patch to the audio by appending zeros to the end of the NumPy array containing the audio data:
import numpy as np
from librosa import load
from librosa.util import fix_length

file_path = 'dir/audio.wav'
sf = 44100  # sampling frequency of the wav file
required_audio_size = 5  # e.g. audio of 2 seconds needs to be padded to 5 seconds

audio, sf = load(file_path, sr=sf, mono=True)  # mono=True converts stereo audio to mono
padded_audio = fix_length(audio, size=required_audio_size * sf)  # array size is required_audio_size * sampling frequency

print('Array length before padding', np.shape(audio))
print('Audio length before padding in seconds', np.shape(audio)[0] / sf)
print('Array length after padding', np.shape(padded_audio))
print('Audio length after padding in seconds', np.shape(padded_audio)[0] / sf)
Output:
Array length before padding (88200,)
Audio length before padding in seconds 2.0
Array length after padding (220500,)
Audio length after padding in seconds 5.0
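Note that fix_length works in both directions: if the input array is longer than size, it is truncated to exactly size samples, so the same call normalizes every file to the target length.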
That said, after looking through a number of similar questions, pydub.AudioSegment seems to be the go-to solution.
Related
I am using pydub to do some experiments with an audio file. After I load it, I want to take some parts and analyze them further with numpy, so I extract the raw data as explained here: Pydub raw audio data.
import array
import numpy as np
from pydub import AudioSegment
from pydub.utils import get_array_type

song = AudioSegment.from_file("dodos_delight.m4a")

# Take the first 17.5 seconds
start = 0
end = 17.5 * 1000  # pydub works in milliseconds
guitar = song[start:end]

# Now get the raw data as an array
bit_depth = song.sample_width * 8
array_type = get_array_type(bit_depth)
fs = song.frame_rate
guitar_np = array.array(array_type, guitar.raw_data)
guitar_t = np.arange(0, len(guitar_np) / fs, 1 / fs)
However, len(guitar_np)/fs = 35, which does not make sense: it is exactly double what it should be. The only way it could be 17.5 would be if fs were doubled, but the samples are spaced 1/fs apart.
If I try to save the data like this
from scipy.io.wavfile import write
rate = fs
scaled = np.int16(guitar_np / np.max(np.abs(guitar_np)) * 32767)
write('test.wav', rate, scaled)
I get a half-speed version of it, and the only way to make it sound like the original is to save it with rate = fs*2.
Any thoughts?
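One plausible cause worth checking (an assumption, since the post does not say whether the file is stereo): raw_data interleaves all channels, so a 2-channel segment yields twice as many array entries as frames.

print(song.channels)         # 2 here would explain the factor of two
print(guitar.frame_count())  # frames = samples per channel
# Accounting for the channel count restores the expected duration:
print(len(guitar_np) / song.channels / fs)  # ~17.5 if the file is stereo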
Background: I am writing a Python script that will take in an audio file and modify it using pydub. Pydub seems to require converting the audio input to WAV format, though, which has a 4 GB limit. So I put a 400 MB .m4a file into pydub and get an error that the file is too large.
Instead of having pydub run for a couple of minutes and then throw an error if the decompressed size is too large, I would like to quickly calculate ahead of time what the decompressed file size would be. If it is over 4 GB, my script will chop the original audio before running it through pydub.
Thanks.
It's simple arithmetic to calculate the size of a theoretical .WAV file. The size, in bytes, is the bit depth divided by 8, multiplied by the sample rate, multiplied by the duration, multiplied by the number of channels.
So if you had an audio clip that was 3:20 long, 44100Hz, 16-bit and stereo, the calculation would be:
sample_rate = 44100 # Hz/Samples per second - CD Quality
bit_depth = 16 # 2 bytes; CD quality
channels = 2 # stereo
duration = 200.0 # seconds
file_size = sample_rate * (bit_depth / 8) * channels * duration
# = 44100 * (2) * 2 * 200
# = 35280000 bytes
# = 35.28 MB (megabytes)
I found this online audio file size calculator which you can also use to confirm your math: https://www.colincrawley.com/audio-file-size-calculator/
If instead you wanted to figure out the other direction, i.e. the size of a theoretical compressed file, it depends on how the compression is done. Thankfully, typical compression uses a constant bitrate, which makes the math to figure out the resulting compressed file size really simple.
So, if you had a 3:20 audio clip you wanted to convert to MP3, at a bitrate of 128kbps (kilobits per second, 128 is a common mid-range quality setting), the calculation would just be the bit rate, divided by 8 (bits per byte) multiplied by the duration:
bits_per_kb = 1000
bitrate_kbps = 128
bits_per_byte = 8
duration_seconds = 200
filesize_bytes = (bitrate_kbps * bits_per_kb / bits_per_byte) * duration_seconds
# = (128000 / 8) * 200
# = 16000 * 200
# = 3200000 bytes
# = 3.2 MB
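For convenience, here are the two formulas wrapped as small helper functions (a sketch; the function names are my own):

def wav_size_bytes(sample_rate, bit_depth, channels, duration_s):
    # Uncompressed PCM/WAV payload size in bytes
    return sample_rate * (bit_depth // 8) * channels * duration_s

def cbr_size_bytes(bitrate_kbps, duration_s):
    # Size of a constant-bitrate file (e.g. MP3) in bytes
    return (bitrate_kbps * 1000 // 8) * duration_s

print(wav_size_bytes(44100, 16, 2, 200))  # 35280000
print(cbr_size_bytes(128, 200))           # 3200000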
I have a database which contains streaming videos. I want to compute LBP features from the images and MFCC features from the audio, and for every frame in the video I have an annotation. The annotations are aligned with the video frames and the video time. Thus, I want to map the annotation times to the MFCC results. I know that the sample_rate = 44100.
from python_speech_features import mfcc
from python_speech_features import logfbank
import scipy.io.wavfile as wav

audio_file = "sample.wav"
(rate, sig) = wav.read(audio_file)
mfcc_feat = mfcc(sig, rate)

print(len(sig))        # 2130912
print(len(mfcc_feat))  # 4831
Firstly, why is the length of the MFCC result 4831, and how do I map that to my annotations, which are in seconds? The total duration of the video is 48 seconds, and the annotation is 0 everywhere except the 19-29 sec window, where it is 1. How can I locate the samples within the 19-29 window in the MFCC results?
Run
mfcc_feat.shape
You should get (4831, 13). 13 is your MFCC length (the default numcep is 13). 4831 is the number of windows: the default winstep is 10 msec, so 4831 windows × 0.01 s matches your sound file duration of about 48 seconds. To get the windows corresponding to 19-29 sec, just slice
mfcc_feat[1900:2900,:]
Remember that you cannot listen to the MFCCs. Each row just represents a slice of audio of 0.025 sec (the default value of the winlen parameter).
If you want to get to the audio itself, it is
sig[time_beg_in_sec*rate:time_end_in_sec*rate]
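Putting the two together, a minimal sketch of the mapping (assuming the python_speech_features defaults of winstep=0.01 s and winlen=0.025 s):

import scipy.io.wavfile as wav
from python_speech_features import mfcc

rate, sig = wav.read("sample.wav")
mfcc_feat = mfcc(sig, rate)

winstep = 0.01  # default frame step in seconds
start_sec, end_sec = 19, 29

# MFCC frames covering the annotated 19-29 s window
window_feat = mfcc_feat[int(start_sec / winstep):int(end_sec / winstep), :]

# The corresponding raw audio samples
window_sig = sig[start_sec * rate:end_sec * rate]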
I need to extract exactly 8 seconds from the middle of the audio data of a .wav file whose length is 0:27 sec.
What I have done so far: I took the middle 9 seconds by dividing the .wav file into 3 parts and taking the middle one, but that gives 9 s and I need 8 s.
Also, how do I find the number of bits in that numpy array?
import scipy.io.wavfile
import pyaudio
import numpy as np

(samplRate, data) = scipy.io.wavfile.read('Track48.wav')

CHANNELS = 2
p = pyaudio.PyAudio()

# Boundaries of the middle third (integer division so they can be used as indices)
one_third = len(data) // 3
two_thirds = 2 * len(data) // 3

stream = p.open(format=pyaudio.paInt16,
                channels=CHANNELS,
                rate=44100,
                output=True)

cut_data = data[one_third:two_thirds]  # middle third, i.e. 9 s of a 27 s file
newdata = cut_data.astype(np.int16).tobytes()
stream.write(newdata)
print(cut_data)
Thank you for your help.
You can use pydub to slice the middle 8 seconds very easily.
Details on pydub are here
You can install it with pip install pydub
I had a wav file of 348 seconds duration whose middle 8 seconds are sliced below.
>>> song.duration_seconds
348.05551020408166
You can also use different file formats such as wav, mp3, m4a, ogg, etc. for import (converting them to AudioSegments) and export.
Source Code
from pydub import AudioSegment
from pydub.playback import play

song = AudioSegment.from_wav("music.wav")

# Slice the middle eight seconds of audio
midpoint = song.duration_seconds // 2
left_four_seconds = (midpoint - 4) * 1000   # pydub works in milliseconds
right_four_seconds = (midpoint + 4) * 1000  # pydub works in milliseconds
eight_sec_slice = song[left_four_seconds:right_four_seconds]

# Play the slice
play(eight_sec_slice)

# Or save it to file
eight_sec_slice.export("eight_sec_slice.wav", format="wav")
As you can see length of middle 8 seconds slice is exactly as desired.
>>> eight_sec_slice.duration_seconds
8.0
I know this question is old, but someone may like the solution I want to offer, which uses NumPy/SciPy only. No need for pydub.
import scipy.io.wavfile as wavfile

fs, data = wavfile.read("Track48.wav")

# Total number of samples N
N = data.shape[0]

# Convert seconds to samples: number_of_samples = time * rate
eight_secs_in_samples = int(8 * fs)

midpoint_sample = N // 2  # midpoint of the samples

# Subtract 4 seconds' worth of samples from the midpoint
left_side = midpoint_sample - eight_secs_in_samples // 2
# Add 4 seconds' worth of samples to the midpoint
right_side = midpoint_sample + eight_secs_in_samples // 2

# This range contains the required middle 8 seconds
mid8secs = data[left_side:right_side]

# Save the file
wavfile.write("eightSecSlice.wav", fs, mid8secs)
I am analyzing the spectrograms of .wav files. But after getting the code to finally work, I've run into a small issue: after saving the plots of 700+ .wav files, I realized that they all essentially look the same! This is not because they are the same audio file, but because I don't know how to change the scale of the plot to be smaller (so I can make out the differences).
I've already tried to fix this issue by looking at this StackOverflow post
Changing plot scale by a factor in matplotlib
I'll show the graph of two different .wav files below
This is .wav #1
This is .wav #2
Believe it or not, these are two different .wav files, but they look super similar. And a computer especially won't be able to pick up the differences in these two .wav files if the scale is this broad.
My code is below
import numpy
import matplotlib.pyplot as plt
from scipy.io import wavfile

def individualWavToSpectrogram(myAudio, fileNameToSaveTo):
    print(myAudio)

    # Read the file and get the sampling freq [usually 44100 Hz] and the sound object
    samplingFreq, mySound = wavfile.read(myAudio)

    # Check if the wave file is 16 bit or 32 bit. 24 bit is not supported
    mySoundDataType = mySound.dtype

    # Convert the sound array to floating point values ranging from -1 to 1 (assumes 16-bit data)
    mySound = mySound / (2.**15)

    # Check sample points and sound channel: (5060, 2) for dual channel, (5060,) for mono
    mySoundShape = mySound.shape
    samplePoints = float(mySound.shape[0])

    # Get the duration of the sound file
    signalDuration = mySound.shape[0] / samplingFreq

    # If two channels, select only one channel;
    # if one channel, index it like a 1-d array
    if len(mySound.shape) > 1:
        mySoundOneChannel = mySound[:, 0]
    else:
        mySoundOneChannel = mySound

    # Plotting the tone: represent the sound by plotting pressure values against the time axis.
    # Create an array of sample points in one dimension
    timeArray = numpy.arange(0, samplePoints, 1)
    timeArray = timeArray / samplingFreq
    # Scale to milliseconds
    timeArray = timeArray * 1000

    plt.rcParams['agg.path.chunksize'] = 100000

    # Plot the tone
    plt.plot(timeArray, mySoundOneChannel, color='Black')
    #plt.xlabel('Time (ms)')
    #plt.ylabel('Amplitude')

    print("trying to save")
    plt.savefig('/Users/BillyBobJoe/Desktop/' + fileNameToSaveTo + '.jpg')
    print("saved")
    #plt.show()
    #plt.close()
How can I modify this code to increase the sensitivity of the graphing so that the differences between two .wav files are made more distinct?
Thanks!
[UPDATE]
I have tried using
plt.xlim((0, 16000))
But this just adds whitespace to the right of the graph. I need a way to change the scale of each unit so that the graph is filled out when I change the x-axis from 0 to 16000.
If the question is how to limit the scale on the x-axis, say to between 0 and 1000, you can do the following:
plt.xlim((0, 1000))
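For completeness, a minimal self-contained sketch of the effect (the sine wave is just a stand-in for the waveform data):

import numpy as np
import matplotlib.pyplot as plt

t = np.arange(0, 2000)          # time axis in ms
y = np.sin(2 * np.pi * t / 50)  # stand-in waveform

plt.plot(t, y, color='Black')
plt.xlim((0, 1000))  # show only the first 1000 ms
plt.savefig('zoomed.jpg')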