Isolating audio foreground and converting back to audio stream using librosa - python

I'm trying to isolate the foreground of an audio stream and then save it as a standalone audio stream using librosa.
I started with this seemingly relevant example.
I have the full, foreground and background data isolated as the example does in S_full, S_foreground and S_background but I'm unsure as to what to do to use those as audio.
I attempted to use librosa.istft(...) to convert those and then save that as a .wav file using soundfile.write(...) but I'm left with a file of roughly the right size but unusable(?) data.
Can anyone describe or point me at an example?
Thanks.

In putting together the minimal example, I found that istft() with the original sampling rate does in fact work.
I'll find my bug somewhere.
FWIW, here's the working code:
import numpy as np
import librosa
from librosa import display
import soundfile
import matplotlib.pyplot as plt
y, sr = librosa.load('audio/rb-testspeech.mp3', duration=5)
S_full, phase = librosa.magphase(librosa.stft(y))
S_filter = librosa.decompose.nn_filter(S_full,
                                       aggregate=np.median,
                                       metric='cosine',
                                       width=int(librosa.time_to_frames(2, sr=sr)))
S_filter = np.minimum(S_full, S_filter)
margin_i, margin_v = 2, 10
power = 2
mask_v = librosa.util.softmask(S_full - S_filter,
                               margin_v * S_filter,
                               power=power)
S_foreground = mask_v * S_full
full = librosa.amplitude_to_db(S_full, ref=np.max)
librosa.display.specshow(full, y_axis='log', sr=sr)
plt.title('Full spectrum')
plt.colorbar()
plt.tight_layout()
plt.show()
print("y({}): {}".format(len(y),y))
print("sr: {}".format(sr))
# Recombine each magnitude with the phase returned by magphase() before inverting
full_audio = librosa.istft(S_full * phase)
foreground_audio = librosa.istft(S_foreground * phase)
print("full({}): {}".format(len(full_audio), full_audio))
soundfile.write('orig.WAV', y, sr)
soundfile.write('full.WAV', full_audio, sr)
soundfile.write('foreground.WAV', foreground_audio, sr)

Related

How to struct pack an array of frequencies and get a wave in a wav file? Python [duplicate]

I want to create "heart rate monitor" effect from a 2D array in numpy and want the tone to reflect the values in the array.
You can use the write function from scipy.io.wavfile to create a wav file, which you can then play however you wish. Note that for 16-bit PCM output the array must hold integers, so if you have floats, scale them appropriately first:
import numpy as np
from scipy.io.wavfile import write
rate = 44100
data = np.random.uniform(-1, 1, rate) # 1 second worth of random samples between -1 and 1
scaled = np.int16(data / np.max(np.abs(data)) * 32767)
write('test.wav', rate, scaled)
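The scaling step divides by the peak value, which is zero for a silent array and would then produce NaNs. Here is a minimal sketch of a guarded version; the helper name float_to_int16 is made up for this example and is not part of NumPy or SciPy.

```python
import numpy as np

def float_to_int16(data):
    """Peak-normalise a float array and convert it to 16-bit PCM samples."""
    data = np.asarray(data, dtype=np.float64)
    peak = np.max(np.abs(data)) if data.size else 0.0
    if peak == 0.0:
        # Silent (or empty) input: nothing to scale, avoid dividing by zero
        return np.zeros(data.shape, dtype=np.int16)
    return np.round(data / peak * 32767).astype(np.int16)

quiet = float_to_int16(np.array([0.0, 0.25, -0.5]))
silent = float_to_int16(np.zeros(4))
print(quiet.dtype, quiet.min(), silent)
```

The result can be passed to scipy.io.wavfile.write in place of the scaled array above.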
If you want Python to actually play audio, then this page provides an overview of some of the packages/modules.
For the people coming here in 2016 scikits.audiolab doesn't really seem to work anymore. I was able to get a solution using sounddevice.
import numpy as np
import sounddevice as sd
fs = 44100
data = np.random.uniform(-1, 1, fs)
sd.play(data, fs)
In Jupyter, the best option is:
import numpy as np
from IPython.display import Audio
wave_audio = np.sin(np.linspace(0, 3000, 20000))
Audio(wave_audio, rate=20000)
In addition, you could try scikits.audiolab. It features file IO and the ability to 'play' arrays. Arrays don't have to be integers. To mimic dbaupp's example:
import numpy as np
import scikits.audiolab
data = np.random.uniform(-1,1,44100)
# write array to file:
scikits.audiolab.wavwrite(data, 'test.wav', fs=44100, enc='pcm16')
# play the array:
scikits.audiolab.play(data, fs=44100)
I had some problems using scikits.audiolab, so I looked for some other options for this task. I came up with sounddevice, which seems a lot more up-to-date. I have not checked if it works with Python 3.
A simple way to perform what you want is this:
import numpy as np
import sounddevice as sd
sd.default.samplerate = 44100
time = 2.0
frequency = 440
# Generate time of samples between 0 and two seconds
samples = np.arange(44100 * time) / 44100.0
# Recall that a sinusoidal wave of frequency f has formula w(t) = A*sin(2*pi*f*t)
wave = 10000 * np.sin(2 * np.pi * frequency * samples)
# Convert it to wav format (16 bits)
wav_wave = np.array(wave, dtype=np.int16)
sd.play(wav_wave, blocking=True)
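Since the question title asks about struct packing, here is a minimal sketch of the same sine formula using only the standard library (wave and struct) instead of NumPy. The helper name tones_to_wav_bytes, the list of frequencies, and the 0.1 s tone length are assumptions for illustration; how array values should map to frequencies is not specified in the question. Each tone restarts at phase zero, so there may be small clicks at the boundaries.

```python
import io
import math
import struct
import wave

def tones_to_wav_bytes(frequencies, seconds_per_tone=0.1, rate=44100, amplitude=10000):
    """Render one sine tone per frequency and return a mono 16-bit WAV as bytes."""
    n = int(rate * seconds_per_tone)          # samples per tone
    samples = []
    for f in frequencies:
        for i in range(n):
            # w(t) = A * sin(2*pi*f*t) with t = i / rate
            samples.append(int(amplitude * math.sin(2 * math.pi * f * i / rate)))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)                    # mono
        wf.setsampwidth(2)                    # 2 bytes = 16-bit samples
        wf.setframerate(rate)
        # "<h" is a little-endian signed 16-bit integer, one per sample
        wf.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

wav_bytes = tones_to_wav_bytes([440, 660, 880])
print(len(wav_bytes), "bytes")
```

Writing wav_bytes to a file opened in binary mode gives a playable .wav file.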
PyGame has the module pygame.sndarray which can play numpy data as audio. The other answers are probably better, as PyGame can be difficult to get up and running. Then again, scipy and numpy come with their own difficulties, so maybe it isn't a large step to add PyGame into the mix.
http://www.pygame.org/docs/ref/sndarray.html
Another modern and convenient solution is to use pysoundfile, which can read and write a wide range of audio file formats:
import numpy as np
import soundfile as sf
data = np.random.uniform(-1, 1, 44100)
sf.write('new_file.wav', data, 44100)
Not sure of the particulars of how you would produce the audio from the array, but I have found mpg321 to be a great command-line audio player, and could potentially work for you.
I use it as my player of choice for Anki, which is written in python and has libraries that could be a great starting place for interfacing your code/arrays with audio.
Check out:
anki.sound.py
customPlayer.py

How to write a video file using Moviepy

I am trying to insert a startle burst into the audio of an .mp4 file. I have managed to modify the audio, but the way I altered it leaves me with a plain array. The trouble is that the write_videofile function seems to accept only a specific kind of audio object. This is what I have so far; any help would be great.
#Necessary imports
import os as os
import moviepy.editor as mp
from playsound import playsound
import librosa as librosa
import librosa.display
import sounddevice as sd
import numpy as np
from moviepy.audio.AudioClip import AudioArrayClip
#Set working directory
os.chdir('/Users/matthew/Desktop/Python_startle')
#Upload the video and indicate where in the clip the burst should be inserted
vid = mp.VideoFileClip("37vids_balanced/Baby_Babble.mp4")
audio, sr = librosa.load('37vids_balanced/Baby_Babble.mp4')
librosa.display.waveplot(audio, sr = sr)
sd.play(audio, sr)
audio_duration = librosa.get_duration(filename = '37vids_balanced/Baby_Babble.mp4')
start_point_sec = 6
start_point_samp = start_point_sec * sr
burst_samp = int(sr * .05) #how many samples are in the noise burst
burst_window = audio[start_point_samp : start_point_samp + burst_samp].astype(np.int32) #Isolating part of the audio clip to add burst to
#Creating startle burst
noise = np.random.normal(0,1,len(burst_window))
#Inserting the noise created into the portion of the audio that is wanted
audio[start_point_samp : start_point_samp + burst_samp] = audio[start_point_samp : start_point_samp + burst_samp].astype(np.int64) + noise #Puts the modified segment into the original audio
librosa.display.waveplot(audio, sr = sr)
sd.play(audio, sr)
new_vid = vid.set_audio(audio)
sd.play(new_vid.audio, sr)
#Trying to get the audio into the proper form to convert into a mp4
new_vid.audio = [[i] for i in new_vid.audio] #This makes the data into a list because the 'AudioArrayClip' could not take in an array because an array is not 'iterable' this might not be the best solution but it has worked so far
new_vid.audio = AudioArrayClip(new_vid.audio, fps = new_vid.fps)
new_vid.write_videofile('37vids_balanced_noise/Baby_Babble.mp4')
Maybe there is an easier way to take the array into an iterable form that would work.
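A possible way to get the array into shape, sketched with NumPy only. As far as I know, MoviePy's AudioArrayClip is documented to take an N x 1 (mono) or N x 2 (stereo) array plus the audio sample rate as fps, so the list comprehension can be replaced by adding a channel axis. Note also that librosa.load returns floats in [-1, 1], so casting the window to an integer type truncates it to zeros. The helper name to_audio_array, the silent stand-in signal, and the 0.5 noise level are assumptions for this example.

```python
import numpy as np

def to_audio_array(samples, n_channels=2):
    """Turn a 1-D float signal into an (N, n_channels) array clipped to [-1, 1]."""
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim == 1:
        # Add a channel axis and copy the mono signal into each channel
        samples = np.repeat(samples[:, np.newaxis], n_channels, axis=1)
    return np.clip(samples, -1.0, 1.0)

sr = 22050
audio = np.zeros(sr * 7)                  # silent stand-in for the loaded track
start = 6 * sr                            # burst starts at 6 seconds
burst = int(sr * 0.05)                    # 50 ms of samples
rng = np.random.default_rng(0)
# Add the noise in floating point; no integer cast is needed
audio[start:start + burst] += 0.5 * rng.standard_normal(burst)

sound_array = to_audio_array(audio)
print(sound_array.shape)
# With MoviePy installed, this array could then be wrapped as
#   AudioArrayClip(sound_array, fps=sr)   # fps is the audio sample rate here
# and attached with vid.set_audio(...) before calling write_videofile.
```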

how to repeat the audio wav file such that it becomes at least 6-seconds long in python

I am working on a sound classification problem and I want each audio file to be at least 6 seconds long. If it is shorter, I want to extend it by looping it, and then compute its STFT with the librosa library in Python.
This might help a bit
import math
import soundfile as sf
import numpy as np
import librosa
data, samplerate = sf.read('test1.wav')
# sf.read returns a 1-D array for mono and a 2-D (frames, channels) array otherwise
length_s = len(data)/float(samplerate)
if length_s < 6.0:
    n = math.ceil(6*samplerate/len(data))
    if data.ndim == 2:
        data = np.tile(data, (n, 1))  # repeat along the time axis only
    else:
        data = np.tile(data, n)
sf.write('new.wav', data, samplerate)
# now calculate stft for each channel if stereo
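The looping step can also be wrapped in a small function that works on in-memory arrays. This is a sketch with a made-up helper name (loop_to_min_length); trimming the result to exactly the minimum length is my assumption, since the question only asks for "at least" 6 seconds.

```python
import math
import numpy as np

def loop_to_min_length(data, samplerate, min_seconds=6.0):
    """Repeat audio along the time axis until it lasts at least min_seconds."""
    min_samples = int(math.ceil(min_seconds * samplerate))
    if len(data) == 0 or len(data) >= min_samples:
        return data
    n = math.ceil(min_samples / len(data))
    # Tile only axis 0 (time); leave a channel axis, if present, untouched
    reps = (n,) + (1,) * (data.ndim - 1)
    return np.tile(data, reps)[:min_samples]

mono = np.arange(5, dtype=np.float32)        # 5 samples at 2 Hz = 2.5 s
stereo = np.ones((5, 2), dtype=np.float32)
looped_mono = loop_to_min_length(mono, samplerate=2)
looped_stereo = loop_to_min_length(stereo, samplerate=2)
print(looped_mono.shape, looped_stereo.shape)
```

For stereo input, each column of the result can then be passed to librosa.stft separately.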

How to add matplotlib output to a video

I'm trying to create a video with annotation (currently) formed from matplotlib, with the original video on the left and a FFT of some parameters on the right.
The only way I've gotten it to work is by saving a .png file for each frame, which seems tedious. I was hoping someone could point out the 'correct' method.
import cv2
import numpy as np
import matplotlib.pyplot
import scipy.fftpack
from moviepy.editor import VideoFileClip
from moviepy.editor import AudioFileClip
vid = VideoFileClip('VID.mp4')
aud = AudioFileClip('VID.mp4')
out = cv2.VideoWriter('1234.avi',cv2.VideoWriter_fourcc('M','J','P','G'), vid.fps, (vid.w*2, vid.h))
audIndex = 0
vidIndex = 0
numberOfSamples = 600
sampleRate = 800;
T = 1.0 / sampleRate;
x = np.linspace(0.0, numberOfSamples*T, numberOfSamples)
for frame in vid.iter_frames():
    # Put the recorded movie on the left side of the video frame
    frame2 = np.zeros((frame.shape[0], 2*frame.shape[1], 3)).astype('uint8')
    frame2[:720, :1280,:] = frame
    # Put, say, a graph of the FFT on the right side of the video frame
    y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
    yf = scipy.fftpack.fft(y)
    xf = np.linspace(0.0, 1.0/(2.0*T), numberOfSamples//2)
    fig, ax = matplotlib.pyplot.subplots()
    ax.plot(xf, 2.0/numberOfSamples * np.abs(yf[:numberOfSamples//2]))
    matFigureForThisFrame = ????????
    # Put the FFT graph on the right side of this video frame
    frame2[:720, 1280:, :] = matFigureForThisFrame
    out.write(frame2)
    vidIndex = vidIndex+1;
out.release()
#cv2.destroyAllWindows()
You could try writing to a video file directly, but I wouldn't recommend it (see here why). Writing to a video file is more complicated than just changing the frames: you need to get the right codecs and deal with other painful issues. Personally, I would settle. Some options:
1) Generate the pngs, and afterwards concatenate them into a video file using ffmpeg
2) Save each frame to a buffer, and generate a .gif file afterwards, directly in python (so you don't have to run multiple things). See this stackoverflow question on how to do that.
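A third possibility, not taken from the answer above, is to render the figure in memory and read its pixels as a NumPy array, which avoids the intermediate .png files. This is a minimal sketch assuming the Agg backend; the helper name figure_to_rgb and the 4 x 3 inch, 100 dpi figure size are choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")                     # off-screen backend, no window needed
import matplotlib.pyplot as plt
import numpy as np

def figure_to_rgb(fig):
    """Render a Matplotlib figure and return it as an (H, W, 3) uint8 array."""
    fig.canvas.draw()                                 # rasterise the figure
    rgba = np.asarray(fig.canvas.buffer_rgba())       # (H, W, 4) uint8 view
    return rgba[..., :3].copy()                       # drop the alpha channel

# figsize * dpi sets the pixel size: 4 x 3 inches at 100 dpi gives 400 x 300
fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
x = np.linspace(0.0, 1.0, 200)
ax.plot(x, np.sin(2 * np.pi * 5 * x))
plot_rgb = figure_to_rgb(fig)
plt.close(fig)                            # free the figure; matters inside a per-frame loop
print(plot_rgb.shape, plot_rgb.dtype)
```

To paste the result into a frame, the figure size has to be chosen so that the array matches the target region. OpenCV generally expects BGR channel order, so the array may need reversing with plot_rgb[..., ::-1] before writing.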

