I am new to Python and want to train an audio model. I converted my audio file to .wav format.
How can I parse those .wav audio files into TensorFlow?
You can use Librosa; it is a sound processing library. You can install it with
pip install librosa
Then,
import librosa
import tensorflow as tf
data, sampling_rate = librosa.load('data/sound.wav')
# for use in tensorflow
data_tensor = tf.convert_to_tensor(data)
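Note that librosa.load resamples to 22050 Hz by default. A minimal sketch if you want to keep the file's native sampling rate instead (same file path as above, assumed to exist):
import librosa
import tensorflow as tf

# sr=None keeps the file's native sampling rate instead of resampling to 22050 Hz
data, sampling_rate = librosa.load('data/sound.wav', sr=None)
data_tensor = tf.convert_to_tensor(data, dtype=tf.float32)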
What you need is documented in the link below:
https://www.tensorflow.org/api_docs/python/tf/audio/decode_wav
With
tf.audio.decode_wav(
    contents, desired_channels=-1, desired_samples=-1, name=None
)
you can decode a 16-bit PCM WAV file to a float tensor. In return, you get a tuple of Tensor objects (audio, sample_rate), where audio is a tensor of type float32 and sample_rate is a tensor of type int32.
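For example, a minimal sketch (assuming a 16-bit PCM WAV at the hypothetical path 'data/sound.wav'):
import tensorflow as tf

# read the raw bytes of the WAV file, then decode them to a float32 tensor in [-1.0, 1.0]
contents = tf.io.read_file('data/sound.wav')
audio, sample_rate = tf.audio.decode_wav(contents, desired_channels=1)
print(audio.shape, sample_rate)  # audio has shape [samples, channels]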
I've developed a script that, given an input file, extracts the voice signal and outputs the signal WITHOUT the voice (i.e. the signal that contains the noise):
!pip install pydub
from pydub import AudioSegment
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
audio = AudioSegment.from_file('fileInput.mp3')
Download fileInput.mp3
samples = audio.get_array_of_samples()
plt.plot(list(samples))
from scipy import signal
sos = signal.butter(10, [100, 4000], 'bandstop', fs=44100, output='sos')
filtered = signal.sosfilt(sos, np.array(samples))
plt.figure(figsize=(10,10))
plt.plot(np.array(samples))
plt.plot(filtered)
plt.title('After 100 - 4000 Hz band-stop filter')
plt.tight_layout()
plt.show()
To export the filtered file (i.e. the file that contains the noise) I wrote the following lines:
from scipy.io.wavfile import write
write('./test.wav', 44100, filtered.astype(np.int16))
That code saves a file, but the file doesn't have the same length as the original (input) one.
As you can notice, the input file is 36 seconds long, whereas the output is 1:12 ...
Download Output file
The input file is stereo. The pydub documentation states that:
AudioSegment(…).get_array_of_samples()
Returns the raw audio data as an array of (numeric) samples. Note: if the audio has multiple channels, the samples for each channel will be serialized – for example, stereo audio would look like [sample_1_L, sample_1_R, sample_2_L, sample_2_R, …]
For scipy this is just one "long" channel; it cannot know that the samples are interleaved like this. A filter also has state, meaning it cannot process data that is shuffled like this and produce the desired output.
Either you reshape the data from AudioSegment, for example into two mono channels like:
[sample1L, sample2L, ...]
and
[sample1R, sample2R, ...]
and process these individually,
OR
you simply convert the AudioSegment to mono, like so:
audio = AudioSegment.from_file('fileInput.mp3')
audio = audio.set_channels(1)
Either way, I highly recommend that you use the sample rate of the input file wherever a sample rate is required; otherwise, loading a file with another sample rate will shift the filter frequencies and change the length and playback speed of the output file. E.g.
sos = signal.butter(10, [100, 4000], 'bandstop', fs=audio.frame_rate, output='sos')
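Putting the mono variant together, a minimal sketch (keeping the band-stop parameters from the question and using the input file's own sample rate throughout):
from pydub import AudioSegment
from scipy import signal
from scipy.io.wavfile import write
import numpy as np

audio = AudioSegment.from_file('fileInput.mp3')
audio = audio.set_channels(1)  # down-mix to mono so the samples are no longer interleaved
samples = np.array(audio.get_array_of_samples())

sos = signal.butter(10, [100, 4000], 'bandstop', fs=audio.frame_rate, output='sos')
filtered = signal.sosfilt(sos, samples)

# write with the input's sample rate so the output keeps the input's length
write('./test.wav', audio.frame_rate, filtered.astype(np.int16))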
My intention is to process an MP3 file using the Librosa library (normalize volume, trim silences, etc.). However, as Librosa doesn't support the MP3 format, I use the audioread library to load the audio. I could not find a function in audioread that writes the file back, so for that purpose I loaded soundfile and saved the processed file as WAV. Unfortunately, I am able to save only one channel (mono), not stereo.
Kindly advise: what library can I use to load and write MP3 files and process them using Librosa? Or how can I write both channels into WAV or MP3 using soundfile?
import audioread, librosa
import soundfile as sf
filename="../sounds/music.mp3"
audio_file = audioread.audio_open(filename)
audio, sr = librosa.load(audio_file, sr= 44100)
clip = librosa.effects.trim(audio, top_db= 10)
sf.write('../sounds/output/out.wav', clip[0], sr, 'PCM_24')
Soundfile supports multichannel saving just fine. However, Librosa works with audio arrays where the dimensions are: (N_channels, N_samples). Soundfile on the other hand works with: (N_samples, N_channels). You can use numpy to transpose from one format to the other:
sf.write('../sounds/output/out.wav', np.transpose(clip[0]), sr, 'PCM_24')
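Also note that librosa.load down-mixes to mono by default, so to actually have two channels to transpose you need mono=False. A minimal sketch using the same paths as the question (and assuming an MP3-capable backend such as ffmpeg is available):
import librosa
import numpy as np
import soundfile as sf

# mono=False keeps both channels; the result has shape (N_channels, N_samples)
audio, sr = librosa.load('../sounds/music.mp3', sr=44100, mono=False)
# soundfile expects (N_samples, N_channels), hence the transpose
sf.write('../sounds/output/out.wav', np.transpose(audio), sr, 'PCM_24')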
I have a numpy array from a some.npy file that contains data of an audio file that is encoded in the .wav format.
The some.npy was created with sig = librosa.load(some_wav_file, sr=22050) and np.save('some.npy', sig).
I want to convert this numpy array as if its content was encoded with .mp3 instead.
Unfortunately, I am restricted to the use of in-memory file objects for two reasons:
1. I have many .npy files. They are cached in advance, and it would be highly inefficient to have that much "real" I/O when actually running the application.
2. Conflicting access rights of the people who are executing the application on a server.
First, I was looking for a way to convert the data in the numpy array directly, but there seems to be no library function. So is there a simple way to achieve this with in-memory file objects?
NOTE: I found this question How to convert MP3 to WAV in Python and its solution could be theoretically adapted but this is not in-memory.
You can read and write memory using BytesIO, like this:
import io
# Create "in-memory" buffer
memoryBuff = io.BytesIO()
And you can read and write MP3 using pydub module:
from pydub import AudioSegment
# Read a file in
sound = AudioSegment.from_wav('stereo_file.wav')
# Write to memory buffer as MP3
sound.export(memoryBuff, format='mp3')
Your MP3 data is now available at memoryBuff.getvalue()
You can convert between AudioSegments and Numpy arrays using this answer.
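If you also want to read the compressed data back from the buffer rather than only calling getvalue(), remember to rewind it first; a small sketch:
memoryBuff.seek(0)  # rewind to the start of the in-memory MP3
roundtrip = AudioSegment.from_file(memoryBuff, format='mp3')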
I finally found a working solution. This is what I wanted.
import io
import numpy as np
from pydub import AudioSegment

my_sample_rate = 22050  # the sample rate the .npy data was created with (see above)
wav = np.load('some.npy')
with io.BytesIO() as inmemoryfile:
    compression_format = 'mp3'
    n_channels = 2 if wav.shape[0] == 2 else 1  # stereo and mono files
    AudioSegment(wav.tobytes(), frame_rate=my_sample_rate,
                 sample_width=wav.dtype.itemsize,
                 channels=n_channels).export(inmemoryfile, format=compression_format)
    wav = np.array(AudioSegment.from_file_using_temporary_files(inmemoryfile)
                   .get_array_of_samples())
There exists a wrapper package (audiosegment) with which one could convert the last line to:
wav = audiosegment.AudioSegment.to_numpy_array(AudioSegment.from_file_using_temporary_files(inmemoryfile))
I would like to use sounddevice's playrec feature. To start, I would like to just get sd.play() to work. I am new to Python and have never worked with NumPy. I have gotten audio to play using pyaudio, but I need the simultaneous play/record feature in sounddevice. When I try to play an audio .wav file I get: TypeError: Unsupported data type: 'string288'. I think it has something to do with having to store the .wav file in a numpy array, but I have no idea how to do that. Here is what I have:
import sounddevice as sd
import numpy as np
sd.default.samplerate = 44100
sd.play('test.wav')
sd.wait
The documentation of sounddevice.play() says:
sounddevice.play(data, samplerate=None, mapping=None, blocking=False, loop=False, **kwargs)
where data is an "array-like".
It can't work with an audio file name, as you tried. The audio file first has to be read and interpreted as a numpy array.
This code should work:
import soundfile as sf

data, fs = sf.read(filename, dtype='float32')
sd.play(data, fs)
You'll find more examples here.
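Since the question mentions wanting simultaneous play and record, here is a minimal sketch of sd.playrec() under the same assumptions (file names are placeholders):
import soundfile as sf
import sounddevice as sd

data, fs = sf.read('test.wav', dtype='float32')
recording = sd.playrec(data, fs, channels=1)  # play 'data' and record one input channel at the same time
sd.wait()  # block until playback/recording has finished
sf.write('recorded.wav', recording, fs)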
Just found out this interesting python package pydub which converts any audio file to mp3, wav, etc.
As far as I have read in its documentation, the process is as follows:
1. read the mp3 audio file using from_mp3()
2. create a wav file using export()
Just curious if there is a way to access the sampling rate and the audio signal (a 1-dimensional array, supposing it is mono) directly from the mp3 file, without converting it to a wav file. I am working on thousands of audio files and it might be expensive to convert all of them to wav files.
If you aren't interested in the actual audio content of the file, you may be able to use pydub.utils.mediainfo():
>>> from pydub.utils import mediainfo
>>> info = mediainfo("/path/to/file.mp3")
>>> print(info['sample_rate'])
44100
>>> print(info['channels'])
1
This uses avlib's avprobe utility, and returns all kinds of info. I suggest giving it a try :)
Should be much faster than opening each mp3 using AudioSegment.from_mp3(…)
frame_rate corresponds to sample_rate, so you can get it like below:
from pydub import AudioSegment
filename = "hoge.wav"
myaudio = AudioSegment.from_file(filename)
print(myaudio.frame_rate)
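To also get the raw signal without exporting to WAV first, you can pull the samples straight from the AudioSegment; a short sketch (converted to a NumPy array, assuming a mono file as in the question):
import numpy as np

signal = np.array(myaudio.get_array_of_samples())  # 1-D array of samples for mono audio
print(myaudio.frame_rate, signal.shape)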