I am new to Python, and I am trying to train my audio voice recognition model. I want to read a .wav file and get its contents into NumPy arrays. How can I do that?
In keeping with @Marco's comment, you can have a look at the SciPy library and, in particular, at scipy.io.
from scipy.io import wavfile
To read your file ('filename.wav'), simply do
output = wavfile.read('filename.wav')
This will output a tuple (which I named 'output'):
output[0], the sampling rate
output[1], the sample array you want to analyze
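For example, you can unpack the tuple directly into two names ('filename.wav' is a placeholder):
samplerate, data = wavfile.read('filename.wav')
print(samplerate)   # sampling rate in Hz, e.g. 44100
print(data.shape)   # (n_samples,) for mono, (n_samples, n_channels) for stereo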
This is possible in a few lines with wave (built in) and numpy (obviously). You don't need librosa, scipy, or soundfile. The last of those gave me problems reading wav files, and it's the whole reason I'm writing here now.
import numpy as np
import wave
# Start by opening the file with wave
with wave.open('filename.wav') as f:
    # Read the whole file into a buffer. If you are dealing with a large file
    # then you should read it in blocks and process them separately.
    buffer = f.readframes(f.getnframes())
    # Convert the buffer to a numpy array by checking the size of the sample
    # width in bytes. The output will be a 1D array with interleaved channels.
    interleaved = np.frombuffer(buffer, dtype=f'int{f.getsampwidth()*8}')
    # Reshape it into a 2D array separating the channels in columns.
    data = np.reshape(interleaved, (-1, f.getnchannels()))
I like to pack it into a function that returns the sampling frequency and works with pathlib.Path objects. This way the file can be played using sounddevice:
# play_wav.py
import sounddevice as sd
import numpy as np
import wave
from typing import Tuple
from pathlib import Path
# Utility function that reads the whole `wav` file content into a numpy array
def wave_read(filename: Path) -> Tuple[np.ndarray, int]:
    with wave.open(str(filename), 'rb') as f:
        buffer = f.readframes(f.getnframes())
        inter = np.frombuffer(buffer, dtype=f'int{f.getsampwidth()*8}')
        return np.reshape(inter, (-1, f.getnchannels())), f.getframerate()

if __name__ == '__main__':
    # Play all files in the current directory
    for wav_file in Path().glob('*.wav'):
        print(f"Playing {wav_file}")
        data, fs = wave_read(wav_file)
        sd.play(data, samplerate=fs, blocking=True)
I've developed a script that, given an input file, extracts the voice signal and outputs the signal WITHOUT the voice (i.e., the signal that contains the noise):
!pip install pydub
from pydub import AudioSegment
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
audio = AudioSegment.from_file('fileInput.mp3')
Download fileInput.mp3
samples = audio.get_array_of_samples()
plt.plot(list(samples))
from scipy import signal
sos = signal.butter(10, [100, 4000], 'bandstop', fs=44100, output='sos')
filtered = signal.sosfilt(sos, np.array(samples))
plt.figure(figsize=(10,10))
plt.plot(np.array(samples))
plt.plot(filtered)
plt.title('After 100-4000 Hz band-stop filter')
plt.tight_layout()
plt.show()
To export the filtered file (i.e., the file that contains the noise), I wrote the following lines:
from scipy.io.wavfile import write
write('./test.wav', 44100, filtered.astype(np.int16))
That code saves a file, but the file doesn't have the same length as the original (input) one.
As you can see, the input file is 36 seconds long, while the output is 1:12...
Download Output file
The input file is stereo. The pydub documentation states that:
AudioSegment(…).get_array_of_samples()
Returns the raw audio data as an array of (numeric) samples. Note: if the audio has multiple channels, the samples for each channel will be serialized – for example, stereo audio would look like [sample_1_L, sample_1_R, sample_2_L, sample_2_R, …]
For scipy this is just one "long" channel; it cannot know that the samples are interleaved like this. That is also why the output is twice as long: the interleaved stereo stream contains twice as many samples as a single channel, so written out at the same sample rate it plays for twice the duration (36 s becomes 1:12). A filter also has state, so it cannot process data that is shuffled like this and produce the desired output.
Either reshape the data from AudioSegment into two mono channels, for example:
[sample1L, sample2L, ...]
and
[sample1R, sample2R, ...]
and process these individually (see the sketch below).
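A minimal sketch of that first option, assuming samples comes from get_array_of_samples() on the stereo file and reusing the band-stop filter from above:
import numpy as np
from scipy import signal

# De-interleave [L, R, L, R, ...] into a 2D array with one channel per column.
stereo = np.array(samples).reshape(-1, 2)

sos = signal.butter(10, [100, 4000], 'bandstop', fs=audio.frame_rate, output='sos')
# axis=0 runs the filter down each column, keeping separate state per channel.
filtered = signal.sosfilt(sos, stereo, axis=0)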
OR
you simply convert the AudioSegment to mono, like so:
audio = AudioSegment.from_file('fileInput.mp3')
audio = audio.set_channels(1)
Either way, I highly recommend using the sample rate of the input file wherever a sample rate is required; otherwise loading a file with a different sample rate will shift the filter frequencies and change the length and playback speed of the output file. E.g.:
sos = signal.butter(10, [100, 4000], 'bandstop', fs=audio.frame_rate, output='sos')
I would like to convert a 32-bit wav numpy array to a 24-bit wav numpy array using python3 and the numpy library.
I am reading the file like this:
import numpy as np
from scipy.io.wavfile import read

sample_rate, file_info = read(filepath)
np_array = np.array(file_info)
Now, based on the dtype, which can be checked via something like this:
if str(np_array.dtype) == 'int32':
I would like to transcode np_array into a 24-bit array.
I need to do this for analysis purposes. The goal is not to generate a new file.
Any hints on how to do this effectively?
Thank you
import sox
transformer = sox.Transformer()
transformer.convert(samplerate=sample_rate, bitdepth=16)
sox_array = transformer.build_array(sample_rate_in=sample_rate, input_array=np_array)
This will convert the array to 16-bit, which allows for analysis workflows that will always behave the same.
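If you want to stay in pure numpy instead, one common approach (my suggestion, not part of pysox) is an arithmetic right shift: 24-bit samples are normally carried in an int32 container, so shifting by 8 bits rescales full-scale int32 values down to the 24-bit range while keeping the int32 dtype (numpy has no int24):
import numpy as np

if str(np_array.dtype) == 'int32':
    # Drop the lowest 8 bits: int32 full scale -> 24-bit range,
    # still stored in an int32 container.
    np_array_24 = np_array >> 8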
I have a problem importing a wave file into a Jupyter Notebook. I want to take an audio file from my desktop and perform an FFT on it. Does anyone know how to do this?
You can follow the examples at http://people.csail.mit.edu/hubert/pyaudio/docs/.
I also want to do FFT analyses on WAV files and use this approach (only the essential bits shown).
NOTE: this is for a 16-bit stereo WAV file; the "unpack" doesn't work with 24-bit.
import pyaudio
import wave
import numpy as np
import struct
wf = wave.open(sound_file_name, 'r')
n_frames = wf.getnframes()
all_frames = wf.readframes(n_frames)
wf.close()

value_list = []
for x in range(0, len(all_frames), 2):
    value_list += struct.unpack('<h', all_frames[x:x+2])

two_channel_values = np.transpose(np.reshape(np.asanyarray(value_list), (int(len(value_list)/2), 2)))
Now you have an array of two vectors, each containing the amplitude values of one stereo channel.
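From here, an FFT of one channel is straightforward; a minimal sketch using numpy (the sample rate is an assumption here; in practice grab it with wf.getframerate() before closing the file):
fs = 44100  # assumption; use wf.getframerate() before wf.close() in practice
left_channel = two_channel_values[0]

# Real-input FFT and the matching frequency axis
spectrum = np.fft.rfft(left_channel)
freqs = np.fft.rfftfreq(len(left_channel), d=1.0/fs)
magnitude = np.abs(spectrum)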
I would like to use sounddevice's playrec feature. To start, I would like to just get sd.play() to work. I am new to Python and have never worked with NumPy. I have gotten audio to play using pyaudio, but I need the simultaneous play/record feature in sounddevice. When I try to play an audio .wav file I get: TypeError: Unsupported data type: 'string288'. I think it has something to do with having to store the .wav file in a numpy array, but I have no idea how to do that. Here is what I have:
import sounddevice as sd
import numpy as np
sd.default.samplerate = 44100
sd.play('test.wav')
sd.wait()
The documentation of sounddevice.play() says:
sounddevice.play(data, samplerate=None, mapping=None, blocking=False, loop=False, **kwargs)
where data is an "array-like".
It can't work with an audio file name, as you tried. The audio file first has to be read, for example with the soundfile module, and interpreted as a numpy array.
This code should work:
import soundfile as sf
import sounddevice as sd

data, fs = sf.read(filename, dtype='float32')
sd.play(data, fs)
You'll find more examples here.
The application I am working on converts a proprietary TIFF file format (Nikon ND2 files) containing multiple images (fields of view), planes (Z), and fluorescence channels to numpy arrays that are then saved in an HDF5 file. A typical dataset has 50 fields of view (fov), each of which has 5 channels, and each channel has 40 z-planes. The overall file is around 6 GB.
This is the code I wrote:
Steps:
0) Import of all the required libraries
import nd2reader as nd2
from matplotlib import pyplot as plt
import numpy as np
import h5py as h5
import itertools
import ast
import glob as glob
from joblib import Parallel, delayed
import time
1) Function for running the conversion of the nd2 file.
The conversion to numpy arrays is done using nd2reader, a Python package, and is quick.
To reduce the number of loops and use a list comprehension, I make a list of tuples, each containing the channel and the fov.
Example:
[('DAPI', 0),
 ('DAPI', 1)]
where 'DAPI' is the channel and the number is the fov.
NB: The experiment channel list is a file containing a dictionary that matches the channel (key) with the gene of interest (value).
def ConvertND2File(ND2file):
    ChannelFileName = ND2file.replace('.nd2', 'ChannelsInfo.txt')
    # Read the file with the channels and raise an error if the file is missing
    try:
        ExperimentChannelList = ast.literal_eval(open(ChannelFileName).read())
    except IOError:
        print("The file:", ChannelFileName, "with the channels dictionary is missing")
        raise
    DataFileName = ND2file.replace('.nd2', '.h5')
    with h5.File(DataFileName, 'w') as DataFile:
        ImgRef = nd2.Nd2(ND2file)
        Channels_Fields = itertools.product(ImgRef.channels, ImgRef.fields_of_view)
        # Create the empty array that will contain the 3D image
        ImgStack = np.empty([len(ImgRef.z_levels), ImgRef.height, ImgRef.width])
        # Use a list comprehension to save the 3D arrays of the fov for each channel
        _ = [SaveImg(DataFile, ImgRef, ExperimentChannelList, ImgStack, *x) for x in Channels_Fields]
2) Function to combine the images into a 3D array that is then written to an HDF5 file. I use h5py. I write each 3D numpy array to disk right after it is generated.
def SaveImg(DataFile, ImgRef, ExperimentChannelList, ImgStack, *args):
    channel = args[0]
    fov = args[1]
    for idx, image in enumerate(ImgRef.select(channels=channel, z_levels=ImgRef.z_levels, fields_of_view=fov)):
        ImgStack[idx, :, :] = image
    gene = ExperimentChannelList[channel]
    ChannelGroup = DataFile.require_group(gene)
    FovDataSet = ChannelGroup.create_dataset(str(fov), data=ImgStack, dtype=np.float64, compression="gzip")
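If memory becomes the bottleneck when scaling up, one option (a sketch on my part, not part of the original code) is to preallocate the HDF5 dataset and write each z-plane into it as it is read, so the full 3D stack never has to sit in memory:
def SaveImgStreaming(DataFile, ImgRef, ExperimentChannelList, channel, fov):
    gene = ExperimentChannelList[channel]
    ChannelGroup = DataFile.require_group(gene)
    # Preallocate the dataset on disk; gzip compression implies chunked storage.
    FovDataSet = ChannelGroup.create_dataset(
        str(fov),
        shape=(len(ImgRef.z_levels), ImgRef.height, ImgRef.width),
        dtype=np.float64, compression="gzip")
    # Write each z-plane as soon as it is read, instead of buffering the whole stack.
    for idx, image in enumerate(ImgRef.select(channels=channel, z_levels=ImgRef.z_levels, fields_of_view=fov)):
        FovDataSet[idx, :, :] = image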
3) Body of the script and joblib call for parallel processing of all the files in a directory.
if __name__ == '__main__':
    # Directory where the ND2 files are stored (Ex. User/Data/)
    WorkingDirectory = input('Enter the directory with the files to process (ex. /User/): ')
    # WorkingDirectory = '/Users/simone/Box Sync/test/ND2conversion/'
    NumberOfProcesses = int(input('Enter the number of processes to use: '))
    # NumberOfProcesses = 2
    FileExt = 'nd2'
    # Iterator with the names of the files to process
    FilesIter = glob.iglob(WorkingDirectory + '*.' + FileExt)
    now = time.time()
    Parallel(n_jobs=NumberOfProcesses, verbose=5)(delayed(ConvertND2File)(ND2file) for ND2file in FilesIter)
    print("Finished in", time.time() - now, "sec")
Running time
Total time for the conversion of two files of 5.9 GB:
[Parallel(n_jobs=2)]: Done 1 out of 2 | elapsed: 7.4min remaining: 7.4min
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 7.4min finished
Finished in 444.8717038631439 sec
Question:
I was just wondering if there is a better way to handle the I/O to the HDF5 files in order to speed up the conversion. If I scale the process up, I will not be able to keep all the 3D numpy arrays (fov) in memory and write them only after each channel is processed.
Thanks!