I have audio recordings that start and end at different times.
audio 1: -----t1--------------------------s1->time
audio 2: ---------t2----s2------------------->time
audio 3: ------------------------t3-------s3->time
audio 1 is the longest and it overlaps with both audio 2 and 3.
audio 2 and audio 3 are short segments, but they do not overlap with each other at all.
Is there a python library that does this?
You could first use a Python library to read the audio file (numpy or scipy, for instance; see https://stackoverflow.com/a/26716031/3244382).
Then you have to determine t and s for each file. If the files are not too noisy, a simple threshold on the audio signal could be sufficient. A slightly more sophisticated approach would be to compute the RMS energy or the envelope (which smooths the signal) and apply a threshold to that.
Once you know t and s, you can write a new audio file from these boundaries with the same audio library.
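As a minimal sketch of that idea (assuming a mono WAV file; the file name, 20 ms window, and 10% threshold are arbitrary placeholders you would need to tune):

import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("audio1.wav")  # hypothetical file name, assumed mono

# Short-term RMS envelope over non-overlapping windows.
window = int(0.02 * rate)  # 20 ms windows
n_windows = len(data) // window
frames = data[:n_windows * window].astype(float).reshape(n_windows, window)
rms = np.sqrt((frames ** 2).mean(axis=1))

# Mark windows above an arbitrary fraction of the peak RMS as "active".
active = np.where(rms > 0.1 * rms.max())[0]
t = active[0] * window          # first active sample (t)
s = (active[-1] + 1) * window   # last active sample (s)

wavfile.write("audio1_trimmed.wav", rate, data[t:s])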
I've been using a bit of numpy and pysinewave to generate and analyse some sine waves and turn them into audio, but now that I have the audio I can't think of a way to detect the frequency or amplitude of the waves at a given point in time. For example, I want to detect the amplitude of a wave in the audio at second one. Is there a way to do this in Python with some library?
I tried converting the audio files back into numpy arrays to get the waves and detect the amplitude every x seconds, but I think my problem is much simpler to solve and I am overthinking it.
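One possible way to approach this, sketched under the assumption of a mono WAV file at a known sample rate (the file name is a placeholder), is to slice out the second of interest and look at its peak amplitude and dominant FFT bin:

import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read("tone.wav")  # hypothetical file, assumed mono

# Take the window from t=1 s to t=2 s.
segment = data[rate:2 * rate].astype(float)

amplitude = np.abs(segment).max()           # peak amplitude in that second
spectrum = np.abs(np.fft.rfft(segment))
freqs = np.fft.rfftfreq(len(segment), d=1.0 / rate)
dominant_freq = freqs[np.argmax(spectrum)]  # strongest frequency component

print(amplitude, dominant_freq)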
To perform an end-to-end test of an embedded platform that plays musical notes, we are trying to record via a microphone and identify whether a specific sound was played using the device's speakers. The testing setup is not a real-time system, so we don't really know when (or even if) the expected sound begins and ends.
The expected sound is represented in a wave file (or similar) we can read from disk.
How can we run a test that asserts whether the sound was played as expected?
There are a few ways to tackle this problem:
Convert the expected sound into a sequence of frequency-amplitude pairs. Then record the sound via the microphone and convert that recording into a corresponding sequence of frequency-amplitude pairs. Finally, compare the two sequences to see if they match. This can be accomplished with the scipy, numpy, and matplotlib modules.
First, generate a sequence of frequency-amplitude pairs for the expected sound. Use the scipy.io.wavfile.read() function to read in a wave file containing the expected sound; it returns a tuple containing the sample rate (in samples per second) and a numpy array of waveform amplitudes. Then use the numpy.fft.fft() function to convert the amplitudes into a sequence of frequency-amplitude pairs.
Next, record the sound via the microphone using the pyaudio module. Create a PyAudio object with the pyaudio.PyAudio() constructor, use the open() method to open a stream on the microphone, and read blocks of data from the stream with the read() method. Each block is a byte string that can be converted to a numpy array of waveform amplitudes (for example with numpy.frombuffer), and numpy.fft.fft() can again convert those amplitudes into frequency-amplitude pairs.
Finally, compare the two sequences of frequency-amplitude pairs. If they match, you can conclude that the expected sound was played and recorded correctly; if they don't, you can conclude that it was not.
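As a rough, non-authoritative sketch of that first approach (assuming both the expected sound and the recording are available as mono WAV files; the file names and the 0.9 threshold are placeholders):

import numpy as np
from scipy.io import wavfile

# Hypothetical file names; both assumed mono and at the same sample rate.
rate_exp, expected = wavfile.read("expected.wav")
rate_rec, recorded = wavfile.read("recording.wav")

# Truncate to a common length so the spectra are comparable.
n = min(len(expected), len(recorded))
spec_expected = np.abs(np.fft.rfft(expected[:n]))
spec_recorded = np.abs(np.fft.rfft(recorded[:n]))

# Normalize and compare the magnitude spectra; 0.9 is an arbitrary
# threshold that would need tuning for a real test.
spec_expected /= np.linalg.norm(spec_expected)
spec_recorded /= np.linalg.norm(spec_recorded)
similarity = float(np.dot(spec_expected, spec_recorded))
print("spectral similarity:", similarity)
assert similarity > 0.9, "expected sound not detected"

A more robust test would compare short-time spectra rather than one FFT over the whole signal, since the recording may contain the expected sound at an unknown offset; that is what the cross-correlation approach below addresses.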
Use a sound recognition system to identify the expected sound in the recording.
from pydub import AudioSegment
from pydub.silence import split_on_silence
from pydub.playback import play

def get_sound_from_recording():
    sound = AudioSegment.from_wav("recording.wav")
    # Split the recording on silences longer than 1000 ms; with the default
    # threshold anything under -16 dBFS is considered silence. Keep 200 ms
    # of silence at the beginning and end of each chunk.
    chunks = split_on_silence(sound, min_silence_len=1000, keep_silence=200)
    for chunk in chunks:
        play(chunk)
    return chunks
Cross-correlate the recording with the expected sound. This will produce a sequence of values that indicates how closely the recording matches the expected sound. A high value at a particular time index indicates that the recording and expected sound match closely at that time.
import matplotlib.pyplot as plt
from scipy import signal
from scipy.io import wavfile

# read in the recording and the expected (reference) sound
sampling_freq, audio = wavfile.read('audio_file.wav')
reference_freq, reference = wavfile.read('reference_sound.wav')

# cross-correlate the recording with the reference signal (both 1-D)
corr = signal.correlate(audio.astype(float), reference.astype(float), mode='full')

# plot the cross-correlation signal
plt.plot(corr)
plt.show()
This way you can set up your test to check if you are getting the correct output.
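As a hedged follow-up to that snippet (reusing the corr, reference, and sampling_freq variables from above), the location and strength of the best match can be read off the correlation peak:

import numpy as np

# Index of the correlation peak: the offset (in samples) at which the
# reference best lines up with the recording.
peak_index = int(np.argmax(np.abs(corr)))
offset_samples = peak_index - (len(reference) - 1)
offset_seconds = offset_samples / sampling_freq
print("best match at ~%.2f s, peak value %.3f" % (offset_seconds, corr[peak_index]))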
I need to
1. read in variable data from sensors
2. use those data to generate audio
3. spit out the generated audio to individual audio output channels in real time
My trouble is with item 3.
Parts 1 and 2 have a lot in common with a guitar effects pedal, I should think: take in some variable, adjust the audio output in real time as the input variable changes, and never stop sending a signal while doing it.
I have had no trouble using pyaudio to drive wav files to specific channels using the mapping[] parameter of pyaudio.play, nor have I had trouble generating sine waves dynamically and sending them out using pyaudio.stream.play.
I'm working with 8 audio output channels. My problem is that stream.play only lets you specify a count of channels, and as far as I can tell I can't say, for example, "stream generated_audio to channel 5".
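Not a definitive answer, but one pattern sometimes used with pyaudio is to open a single multi-channel output stream and do the channel routing yourself by interleaving the samples, writing zeros to every channel except the target one. A minimal sketch, assuming the output device actually accepts 8 float32 channels:

import numpy as np
import pyaudio

RATE = 44100
CHANNELS = 8
TARGET_CHANNEL = 5          # 0-based index of the channel to drive

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=CHANNELS,
                rate=RATE, output=True)

# One second of a 440 Hz sine, placed only in the target channel.
t = np.arange(RATE) / RATE
tone = 0.2 * np.sin(2 * np.pi * 440 * t).astype(np.float32)
frames = np.zeros((len(tone), CHANNELS), dtype=np.float32)
frames[:, TARGET_CHANNEL] = tone

stream.write(frames.tobytes())   # pyaudio expects interleaved samples
stream.stop_stream()
stream.close()
p.terminate()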
I am super new to audio processing. I have one reference audio file and several other audio recordings (the same sentence spoken by different speakers, differing in dialect and duration), and I want to align all of the audio files to the reference file with the least warping. I tried using MFCC and Chroma features (python/librosa), but I don't know what to do next. I was reading about DTW (Dynamic Time Warping) for alignment; would that work? Is there an example, open-source project, or audio tool which already does this? It seems to be a solved problem, but I couldn't find it. Please help.
I was following this:
https://librosa.github.io/librosa_gallery/auto_examples/plot_music_sync.html but how do I save back the aligned audio in the time domain?
This seems related - Dynamic time warping with python (final mapping)
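For what it's worth, a rough sketch of the DTW idea with librosa and soundfile is below; the file names are hypothetical, and the frame-repetition at the end is only a very crude way to get a time-domain signal back (proper alignment would time-stretch each segment instead):

import numpy as np
import librosa
import soundfile as sf

hop = 512
y_ref, sr = librosa.load("ref.wav", sr=None)      # hypothetical file names
y_other, _ = librosa.load("other.wav", sr=sr)

# MFCC features for both signals, then a DTW alignment path between them.
mfcc_ref = librosa.feature.mfcc(y=y_ref, sr=sr, hop_length=hop)
mfcc_other = librosa.feature.mfcc(y=y_other, sr=sr, hop_length=hop)
D, wp = librosa.sequence.dtw(X=mfcc_ref, Y=mfcc_other)
wp = wp[::-1]   # the path comes back end-to-start

# Crude reconstruction: for each reference frame, copy the matched frame
# of the other signal, so the result roughly follows the reference's timing.
aligned = []
for i_ref, j_other in wp:
    start = j_other * hop
    aligned.append(y_other[start:start + hop])
sf.write("other_aligned.wav", np.concatenate(aligned), sr)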
We have some screencasts that need to be dubbed into various languages, for which we have a textual script in the target language, as shown below:
Beginning Time    Audio Narration
0:0               blah nao lorep iposm...
1:20              xao dok dkjv dwv....
...
We can record each of the above units separately and then align them at the proper beginning times mentioned in the script above.
Example:
Input:
Input the N timing values: 0:0,1:20 ...
Then input the N audio recordings
Output:
Audio recordings aligned to the above timings. An overflow should be detected by the system for each unit individually, whereas an underflow is padded with silence.
Are there any platform-independent audio APIs or software, or a code snippet (preferably in Python), that would allow us to align these audio units based on the times provided?
If the input audio files are uncompressed (i.e., WAV files, etc.), the audio library I like to use is libsndfile. It appears to have a Python wrapper here: https://code.google.com/p/libsndfile-python/. With that in mind, the rest could be accomplished like so:
Open an output audio stream to write audio data to with libsndfile
For each input audio file, open an input stream with libsndfile
Extract the meta-data information for the given audio file based on your textual description 'script'
Write any silence needed to your master output stream, and then write the data from the input stream to the output stream. Note the current position/time. Repeat this step for each input audio file, checking that each audio clip's target start time is always >= the current position/time noted earlier. If it isn't, you have an overlap.
Of course, you have to worry about sample rates matching, etc., but that should be enough to get started. Also, I'm not exactly sure whether you are trying to write a single output file or one for each input file, but this answer should be tweakable enough. libsndfile will give you all the information you need (such as clip lengths), assuming it supports the input file format.
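As an illustration of those steps, here is a minimal sketch using the soundfile package (which also wraps libsndfile) instead of the wrapper linked above; the script timings, file names, fixed sample rate, and mono assumption are all placeholders:

import numpy as np
import soundfile as sf

# (start time in seconds, clip file) pairs taken from the narration script;
# file names are hypothetical, and 1:20 is written as 80 seconds.
script = [(0.0, "unit1.wav"), (80.0, "unit2.wav")]

rate = 44100      # assumed common sample rate for all (mono) clips
position = 0      # current write position, in samples

with sf.SoundFile("dubbed.wav", "w", samplerate=rate, channels=1) as out:
    for start_sec, path in script:
        target = int(start_sec * rate)
        if target < position:
            raise ValueError("overflow: %s starts before the previous clip ends" % path)
        out.write(np.zeros(target - position))   # pad the underflow with silence
        data, clip_rate = sf.read(path)
        assert clip_rate == rate, "sample rates must match"
        out.write(data)
        position = target + len(data)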