How to write a video file using Moviepy - python

I am trying to insert a startle burst into the audio of an .mp4 file. I have been able to do this successfully, but the way I have altered the original audio leaves me with a NumPy array. The trouble I am having is that the 'write_videofile' function seems to accept only a specific kind of audio structure when writing the file. This is what I have so far; any help would be great.
#Necessary imports
import os
import moviepy.editor as mp
from playsound import playsound
import librosa
import librosa.display
import sounddevice as sd
import numpy as np
from moviepy.audio.AudioClip import AudioArrayClip
#Set working directory
os.chdir('/Users/matthew/Desktop/Python_startle')
#Upload the video and indicate where in the clip the burst should be inserted
vid = mp.VideoFileClip("37vids_balanced/Baby_Babble.mp4")
audio, sr = librosa.load('37vids_balanced/Baby_Babble.mp4')
librosa.display.waveplot(audio, sr = sr)
sd.play(audio, sr)
audio_duration = librosa.get_duration(filename = '37vids_balanced/Baby_Babble.mp4')
start_point_sec = 6
start_point_samp = start_point_sec * sr
burst_samp = int(sr * .05) #how many samples are in the noise burst
burst_window = audio[start_point_samp : start_point_samp + burst_samp] #Isolate the part of the audio clip to add the burst to (librosa returns float data, so no integer cast is needed)
#Creating startle burst
noise = np.random.normal(0,1,len(burst_window))
#Inserting the noise created into the portion of the audio that is wanted
audio[start_point_samp : start_point_samp + burst_samp] = burst_window + noise #Puts the modified segment back into the original audio
librosa.display.waveplot(audio, sr = sr)
sd.play(audio, sr)
new_vid = vid.set_audio(audio)
sd.play(new_vid.audio, sr)
#Trying to get the audio into the proper form to convert into a mp4
new_vid.audio = [[i] for i in new_vid.audio] #Wrap each sample in a list because 'AudioArrayClip' would not accept the bare 1-D array; this may not be the best solution but it has worked so far
new_vid.audio = AudioArrayClip(new_vid.audio, fps = new_vid.fps)
new_vid.write_videofile('37vids_balanced_noise/Baby_Babble.mp4')
Maybe there is an easier way to take the array into an iterable form that would work.
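One possibly simpler route, sketched here but untested: AudioArrayClip accepts a 2-D NumPy array of shape (n_samples, n_channels) directly, and its fps argument should be the audio sample rate returned by librosa.load, not the video frame rate:
import numpy as np
from moviepy.audio.AudioClip import AudioArrayClip

# Reshape the 1-D mono signal to (n_samples, 1) so AudioArrayClip can read it
audio_clip = AudioArrayClip(audio.reshape(-1, 1), fps = sr)  # fps = audio sample rate, not vid.fps
new_vid = vid.set_audio(audio_clip)
new_vid.write_videofile('37vids_balanced_noise/Baby_Babble.mp4')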

Related

How to save with .mp3 instead of .wav?

I'm trying to make a voice recorder. In this portion of code the file is saved in the .wav format, but I prefer the .mp3 format. How can I save the file with the .mp3 extension? I think that SoundFile can't use mp3, because if I write .mp3 instead of .wav the program gives me an error. Thanks.
import sounddevice as sd
import soundfile as sf

# 'callback', 'recording' and the queue 'q' are defined elsewhere in the program
with sf.SoundFile("output.wav", mode='w', samplerate=44100, channels=2) as file:
    # Create an input stream to record audio without a preset time
    with sd.InputStream(samplerate=44100, channels=2, callback=callback):
        while recording == True:
            # Set the variable to True to allow playing the audio later
            file_exists = True
            # Write into file
            file.write(q.get())
Soundfile, or, more precisely, libsndfile, does not support mp3. Consider using a different library (e.g. pydub). Here's an example where I am writing a sine wave to mp3:
import numpy as np
from pydub import AudioSegment

sample_rate = 48000
frequency = 440  # Hz
length = 3  # in seconds

t = np.linspace(0, length, sample_rate * length)
# pydub interprets raw data as integer PCM, so scale the sine wave to int16
y = (np.sin(frequency * 2 * np.pi * t) * 32767).astype(np.int16)  # 440 Hz sine wave

audio_segment = AudioSegment(
    y.tobytes(),
    frame_rate=sample_rate,
    sample_width=y.dtype.itemsize,
    channels=1
)
audio_segment.export("test.mp3", format="mp3", bitrate="128k")

Isolating audio foreground and converting back to audio stream using librosa

I'm trying to isolate the foreground of an audio stream and then save it as a standalone audio stream using librosa.
Starting with this seemingly relevant example.
I have the full, foreground and background data isolated as the example does in S_full, S_foreground and S_background but I'm unsure as to what to do to use those as audio.
I attempted to use librosa.istft(...) to convert those and then save that as a .wav file using soundfile.write(...) but I'm left with a file of roughly the right size but unusable(?) data.
Can anyone describe or point me at an example?
Thanks.
In putting together the minimal example, I found that istft() with the original sampling rate does in fact work. I'll track down my bug somewhere else. FWIW, here's the working code:
import numpy as np
import librosa
from librosa import display
import soundfile
import matplotlib.pyplot as plt

y, sr = librosa.load('audio/rb-testspeech.mp3', duration=5)

S_full, phase = librosa.magphase(librosa.stft(y))
S_filter = librosa.decompose.nn_filter(S_full,
                                       aggregate=np.median,
                                       metric='cosine',
                                       width=int(librosa.time_to_frames(2, sr=sr)))
S_filter = np.minimum(S_full, S_filter)

margin_i, margin_v = 2, 10
power = 2
mask_v = librosa.util.softmask(S_full - S_filter,
                               margin_v * S_filter,
                               power=power)
S_foreground = mask_v * S_full

full = librosa.amplitude_to_db(S_full, ref=np.max)
librosa.display.specshow(full, y_axis='log', sr=sr)
plt.title('Full spectrum')
plt.colorbar()
plt.tight_layout()
plt.show()

print("y({}): {}".format(len(y), y))
print("sr: {}".format(sr))

# Re-apply the phase from magphase() before inverting, so the
# reconstruction is not done with an all-zero phase
full_audio = librosa.istft(S_full * phase)
foreground_audio = librosa.istft(S_foreground * phase)
print("full({}): {}".format(len(full_audio), full_audio))

soundfile.write('orig.WAV', y, sr)
soundfile.write('full.WAV', full_audio, sr)
soundfile.write('foreground.WAV', foreground_audio, sr)

how to repeat the audio wav file such that it becomes at least 6-seconds long in python

I am working on a sound classification problem, and I want my audio file to be at least 6 seconds long. If it is not, I want to extend it by looping it, and then compute its STFT, which is available in Python's librosa library.
This might help a bit
import math
import soundfile as sf
import numpy as np
import librosa

data, samplerate = sf.read('test1.wav')
channels = len(data.shape)
length_s = len(data) / float(samplerate)

if length_s < 6.0:
    n = math.ceil(6 * samplerate / len(data))
    if channels == 2:
        data = np.tile(data, (n, 1))
    else:
        data = np.tile(data, n)

sf.write('new.wav', data, samplerate)
# now calculate stft for each channel if stereo
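A sketch of the STFT step that the last comment hints at, reusing the data and channels variables from the snippet above (this is my addition, not part of the original answer):
import numpy as np
import librosa

if channels == 2:
    # librosa.stft expects a mono float array, so transform each channel separately
    stfts = [librosa.stft(np.ascontiguousarray(data[:, ch], dtype=np.float32)) for ch in range(2)]
else:
    stfts = [librosa.stft(data.astype(np.float32))]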

How to write pyaudio output into audio file?

I currently have the following code, which produces a sine wave of varying frequencies using the pyaudio module:
import pyaudio
import numpy as np

p = pyaudio.PyAudio()

volume = 0.5
fs = 44100
duration = 1
f = 440

samples = (np.sin(2 * np.pi * np.arange(fs * duration) * f / fs)).astype(np.float32).tobytes()

stream = p.open(format = pyaudio.paFloat32,
                channels = 1,
                rate = fs,
                output = True)
stream.write(samples)
However, instead of playing the sound, is there any way to make it so that the sound is written into an audio file?
Add this import at the top of your code:
from scipy.io.wavfile import write
Also, add this code at the bottom of your code, where s is the NumPy sample array from before the .tobytes() call (and numpy is imported). This worked for me:
scaled = numpy.int16(s / numpy.max(numpy.abs(s)) * 32767)
write('test.wav', 44100, scaled)
Using scipy.io.wavfile.write as suggested by @h lee produced the desired results:
import numpy
from scipy.io.wavfile import write

volume = 1
sample_rate = 44100
duration = 10
frequency = 1000

samples = (numpy.sin(2 * numpy.pi * numpy.arange(sample_rate * duration)
                     * frequency / sample_rate)).astype(numpy.float32)
write('test.wav', sample_rate, samples)
Another example can be found on the documentation: https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.wavfile.write.html
Handle your audio input as a numpy array like I did here in the second answer, but instead of just processing the frames and sending the data back to PyAudio, save each frame in a new output_array. Then, when the processing is done, you can use that output_array to write it to a .wav or .mp3 file.
If you do that, however, the sound will still play. If you don't want to play the sound, you have two options: either use the blocking mode or, if you want to stick with the non-blocking mode and the callbacks, do the following (see the sketch after this list):
- Erase output=True so that it defaults to False.
- Add an input=True parameter.
- In your callback, do not return ret_data; return None instead.
- Keep count of the frames you've processed so that, when you're done, you return paComplete as the second value of the returned tuple.
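A minimal sketch of that callback-based recording, assuming float32 mono input and a fixed recording length; frames, record_seconds and 'recorded.wav' are illustrative names, not part of the original answers:
import time
import numpy as np
import pyaudio
from scipy.io.wavfile import write

fs = 44100
record_seconds = 5  # assumed fixed length, just for the demo
frames = []

def callback(in_data, frame_count, time_info, status):
    # Input-only stream: store the incoming bytes, return no output data
    frames.append(in_data)
    return None, pyaudio.paContinue  # switch to paComplete when you are done

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=fs,
                input=True, stream_callback=callback)
stream.start_stream()
time.sleep(record_seconds)
stream.stop_stream()
stream.close()
p.terminate()

# Concatenate the recorded frames and write them to a .wav file
audio = np.frombuffer(b''.join(frames), dtype=np.float32)
write('recorded.wav', fs, audio)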

Demodulating an FSK signal in Python

I am trying to write a Python script that can demodulate an FSK modulated audio file and return the data encoded in the audio. The data being transmitted is GPS NMEA strings which are embedded as the audio channel in video files. Basically, text is encoded with FSK modulation, and I am trying to retrieve the text using Python. The device I am using to encode the data can also decode it, so I have been able to generate the correct output, but I need to be able to do it using software.
I have done some background reading to introduce myself to signal processing and FSK, and I have looked at example scripts (e.g. this one and minimodem).
I managed to write a Python script that runs successfully, although the output is incorrect. The correct output derived from the encoding/decoding device has 8,280 raw binary (0 and 1) characters, while the Python output has 1,344,786. I think I am missing a symbol synchronizer, but I'm not sure how this works.
My question now is: how can I add symbol synchronization to the script and/or symbol timing? Are there better examples or explanations of how to do FSK demodulation in Python? I would appreciate any feedback or direction. Thank you.
Here's my script so far:
from scipy.io.wavfile import read
import numpy as np
import wave
import matplotlib.pyplot as plt
import scipy.signal as signal
from scipy.signal import blackman, butter
from scipy.fftpack import fft, rfft, rfftfreq, irfft
import scipy.signal.signaltools as sigtool
import binascii
# Read in data; 'wav' allows getting parameters, 'wav1' is actual signal data
wavfile = 'Sample4_160224_mono.wav'
wavfile1 = open(wavfile, 'r')
wav = wave.open(wavfile, 'r')
wav_1 = read(wavfile1)
params = wav.getparams()
N = params[3] #Sample size
wav1 = read(wavfile1)
wav2 = wav1[1][0:N]
duration = float(params[3] / params[2])
n_samples = len(wav2)
Fs = params[2]
nyq = 0.5 * Fs #Nyquist rate
Fbit = (params[2]*params[0]*16)/100
print "Fbit", Fbit
# Windowing function
w = blackman(n_samples)
print "W is", w
# FFT
wfft = rfft(wav2 * w)
wfft_norm = wfft/N
wfft_norm = abs(wfft_norm[range(N/2)])
# Working with frequencies...
freqs = rfftfreq(len(wfft_norm))
index = np.argmax(np.abs(wfft)) #Returns the index of the maximum absolute value of the windowed FFT
freq = freqs[index] #Returns the frequency from the above index
freq_range = [freq - 0.01, freq + 0.01]
freq_in_Hz = abs(freq * params[2]) #Converts the Hz
freq_range_Hz = [abs(freq_range[0] * params[2]), abs(freq_range[1] * params[2])]
# Differentiator
diff = np.diff(wav2)
# Envelope detector
env = np.abs(sigtool.hilbert(diff))
print "ENV", len(env)
# Low-pass filter
h = signal.firwin(numtaps = 10, cutoff = freq_range[1], nyq = nyq)
filt = signal.lfilter(h, 1, env)
# Signal's mean
mean = np.mean(filt)
#Do some crazy stuff to get binary **maybe wrong**
rx_data = []
sampled_signal = env[Fs/Fbit/2:params[3]+1:]
for bit in sampled_signal:
    if bit > mean:
        rx_data.append(int(1))
    else:
        rx_data.append(int(0))
# Save raw binary output
rx_data1 = ''.join(map(str, (rx_data)))
outfile1 = open('FSK_wav6_output_binary.txt', 'w')
outfile1.write(rx_data1)
outfile1.close()
It seems that you use multiple channels, and the sound you need is embedded in one of them.
So far I have found few problems in your scripts:
The Nyquist rate is not half the sample rate of your sound. It is the rate required to sample the original sound wave without aliasing, and it should be at least 2 times the highest frequency in the signal. Hence,
nyq = 0.5 * Fs
is wrong.
If you take advantage of the noiseless sound to demodulate, then the Differentiator can be omitted.
For the low-pass filter:
h = signal.firwin(numtaps = 10, cutoff = freq_range[1], nyq = nyq)
the cutoff frequency should be your data sample rate; please read this.
filt is the final signal from which you can extract the specific data you desire.
How to choose points in sampled_signal to recreate the original signal actually depends on the ratio between the original signal rate and the sampling rate. Just like the first link you provided, assuming the data were written in 11025 Hz and the sampling or recording rate is 44100 Hz, then the code you gave:
sampled_signal = env[Fs/Fbit/2:params[3]+1:]
should be:
sampled_signal = filt[Fs/Fbit*2:params[3]:Fs/Fbit*4]
where Fs/Fbit*2 is the beginning, params[3] is the ending, Fs/Fbit*4 is the step length.
The correct output derived from the encoding/decoding device has 8,280 raw binary (0 and 1) characters, the Python output has 1,344,786.
That is normal and is caused by the different sample rates. You can add some special characters acting as a start sign and an end sign in your text and try to find them; that way you can locate the data with the correct length you need, as sketched below.
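A hypothetical sketch of that marker idea; the bit patterns are made up, and rx_data1 is the decoded bit string from the question:
start_marker = '1111100110101'  # made-up start pattern, agree on it with the encoder
end_marker = '1010110011111'    # made-up end pattern

begin = rx_data1.find(start_marker)
end = rx_data1.rfind(end_marker)
if begin != -1 and end > begin:
    payload = rx_data1[begin + len(start_marker):end]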
