I am trying to wrap my head around how I'd go about isolating and amplifying specific sound streams in real time. I am playing with code that classifies specific sounds in a given sound environment (e.g. isolating the guitar in a clip of a song); that part is not difficult.
The issue is how one then selectively amplifies the target audio stream. The most common output from existing audio classifiers and diarizers is the probability that the audio signal belongs to a given class. The crux appears to be using that real-time class probability to identify the stream and then amplify it.
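One common way to turn a model's output into a selective boost is a soft time-frequency mask: take the STFT of the mix, scale each bin by a gain that grows with the probability that the bin belongs to the target, and invert. The sketch below uses scipy.signal.stft/istft and assumes you already have a mask (hypothetical here) with one value in [0, 1] per STFT bin, e.g. from a source-separation model. A single per-frame class probability can only raise or lower the whole mix over time; it cannot pull the guitar out of the other instruments. A real-time version would apply the same idea to short overlapping blocks rather than a whole array.

```python
import numpy as np
from scipy.signal import stft, istft


def amplify_with_mask(x, mask, gain=2.0, fs=16000, nperseg=512):
    """Boost each STFT bin of x by up to `gain`, in proportion to mask (values in [0, 1])."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    if np.shape(mask) != X.shape:
        raise ValueError(f"mask shape {np.shape(mask)} != STFT shape {X.shape}")
    # mask = 0 leaves a bin untouched, mask = 1 multiplies it by `gain`
    weights = 1.0 + (gain - 1.0) * np.clip(mask, 0.0, 1.0)
    _, y = istft(X * weights, fs=fs, nperseg=nperseg)
    return y[: len(x)]  # drop the padding that stft added at the end


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    mix = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 1500 * t)
    f, _, X = stft(mix, fs=fs, nperseg=512)
    # Hypothetical mask: pretend a model flagged the 1-2 kHz bins as the target
    mask = np.broadcast_to(((f >= 1000) & (f <= 2000))[:, None], X.shape).astype(float)
    boosted = amplify_with_mask(mix, mask, gain=4.0, fs=fs)
    print(boosted.shape)
```

With an all-ones mask the output is simply gain * x, and with an all-zeros mask it is x unchanged. If all you have is a per-frame probability p (one value per STFT frame), np.broadcast_to(p, X.shape) gives a valid but frequency-blind mask.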
Related
My goal is to be able to detect a specific noise that comes through the speakers of a PC using Python. In pseudocode, that means the following:
Sound is being played out of the speakers by applications such as games
My "audio to detect" sound occurs, and I want to detect it and take an action
An example of the specific sound I want to detect can be found here.
If I break that down, I believe I need two things:
A way to sample the audio that is being streamed to an audio device, perhaps something based on this, or potentially sounddevice, but I can't work out how to make this work from their API.
A way to compare each sample with my "audio to detect" sound file.
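For the first part, one option is a sounddevice InputStream whose callback keeps the last couple of seconds in a rolling buffer that the comparison step can scan. This is a sketch under assumptions: sounddevice records from input devices, so to capture what is going to the speakers you need a loopback-style input such as Windows' "Stereo Mix", a virtual audio cable, or a PulseAudio/PipeWire monitor source. DEVICE below is a placeholder for that device, and RollingBuffer is an illustrative helper written in plain NumPy so it can be tested without audio hardware.

```python
import threading

import numpy as np


class RollingBuffer:
    """Thread-safe buffer holding the most recent `size` mono samples."""

    def __init__(self, size):
        self._buf = np.zeros(size, dtype=np.float32)
        self._lock = threading.Lock()

    def push(self, block):
        block = np.asarray(block, dtype=np.float32).ravel()
        n = min(len(block), len(self._buf))
        if n == 0:
            return
        with self._lock:
            # shift old samples left and append the newest n at the end
            self._buf = np.roll(self._buf, -n)
            self._buf[-n:] = block[-n:]

    def snapshot(self):
        with self._lock:
            return self._buf.copy()


if __name__ == "__main__":
    import time

    import sounddevice as sd  # third-party: pip install sounddevice

    FS = 44100
    DEVICE = None  # placeholder: name or index of a loopback input, e.g. "Stereo Mix"
    buf = RollingBuffer(2 * FS)  # keep the last 2 seconds

    def callback(indata, frames, time_info, status):
        if status:
            print(status)
        buf.push(indata.mean(axis=1))  # downmix to mono

    with sd.InputStream(device=DEVICE, channels=2, samplerate=FS, callback=callback):
        for _ in range(20):
            time.sleep(0.5)
            recent = buf.snapshot()  # pass this to the comparison step
            print(f"peak level: {np.abs(recent).max():.3f}")
```

The callback runs on sounddevice's own thread, so the main loop is free to run the detector on each snapshot.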
The detection does not need to be exact; it just needs to be close. For example, there will be lots of other noises happening at the same time, so it's more about detecting the footprint of the "audio to detect" within an audio stream containing a variety of sounds.
Having investigated this, I found technologies mentioned in this post on SO and also this interesting article on Chromaprint. The Chromaprint article uses fpcalc to generate fingerprints, but because my "audio to detect" is only around 1-2 seconds long, fpcalc can't generate a fingerprint for it. I need something that works across shorter timespans.
Can somebody help me with the two parts of my question:
How do I sample the audio device on my PC using Python?
How should I attempt this comparison (ideally with a little example)
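For the comparison, a simple starting point (a swapped-in technique, not Chromaprint) is normalized cross-correlation: slide the 1-2 second template over the recent audio and divide the correlation by the energy of each window, so the score stays in [-1, 1] regardless of volume. This sketch assumes the target sound is played back essentially unchanged (same sample rate, no pitch or speed variation) and that both signals are mono arrays; threshold=0.6 is an arbitrary starting value to tune. If the game varies the sound, correlating log-spectrogram frames instead of raw samples tends to be more forgiving.

```python
import numpy as np
from scipy.signal import correlate, find_peaks


def find_template(signal, template, threshold=0.6):
    """Sample offsets where `template` starts in `signal`, via normalized cross-correlation."""
    signal = np.asarray(signal, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    n = len(template)
    if len(signal) < n:
        return np.array([], dtype=int)
    # 'valid' mode: index k compares template with signal[k:k + n]
    corr = correlate(signal, template, mode="valid", method="fft")
    # energy of every length-n window of the signal, via cumulative sums
    csum = np.concatenate(([0.0], np.cumsum(signal ** 2)))
    window_norm = np.sqrt(np.clip(csum[n:] - csum[:-n], 0.0, None))
    score = corr / (window_norm * np.linalg.norm(template) + 1e-12)
    # keep local maxima above threshold, at least one template length apart
    peaks, _ = find_peaks(score, height=threshold, distance=n)
    return peaks


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    template = rng.standard_normal(800)        # stand-in for the target clip
    stream = 0.5 * rng.standard_normal(20000)  # stand-in for captured audio
    stream[5000:5800] += template
    print(find_template(stream, template))
```

In a live setup you would run this on the most recent couple of seconds of captured audio every few hundred milliseconds and take your action whenever it returns any offsets.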
Many thanks in advance.
I need to
read in variable data from sensors
use those data to generate audio
spit out the generated audio to individual audio output channels in real time
My trouble is with item 3.
Parts 1 and 2 have a lot in common with a guitar effects pedal, I should think: take in some variable and adjust the audio output in real time as that variable changes, without ever interrupting the signal while doing it.
I have had no trouble using pyaudio to drive wav files to specific channels using the mapping[] parameter of pyaudio.play, nor have I had trouble generating sine waves dynamically and sending them out using pyaudio.stream.play.
I'm working with 8 audio output channels. My problem is that stream.play only lets you specify a count of channels, and as far as I can tell I can't say, for example, "stream generated_audio to channel 5".
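Instead of a per-call channel option, a common approach with a multichannel output stream is to open all 8 channels and, in every block, write the generated samples into the target channel's column and zeros into the rest. The sketch below assumes sounddevice's OutputStream callback interface (outdata is a frames-by-channels float array) rather than PyAudio, and an output device that actually exposes 8 channels; make_channel_callback and the current dictionary are illustrative names. Carrying the phase between callbacks lets the frequency change on the fly without clicks, so the signal never stops.

```python
import numpy as np


def make_channel_callback(get_freq, channel, n_channels, fs, amplitude=0.2):
    """Return an output callback that plays a sine on `channel` (1-based) and silence elsewhere."""
    if not 1 <= channel <= n_channels:
        raise ValueError(f"channel must be in 1..{n_channels}")
    state = {"phase": 0.0}

    def callback(outdata, frames, time_info, status):
        step = 2 * np.pi * get_freq() / fs  # phase increment per sample
        phases = state["phase"] + step * np.arange(frames)
        outdata.fill(0)  # silence on every channel...
        outdata[:, channel - 1] = amplitude * np.sin(phases)  # ...except the target
        # carry the phase into the next block so frequency changes don't click
        state["phase"] = (state["phase"] + step * frames) % (2 * np.pi)

    return callback


if __name__ == "__main__":
    import time

    import sounddevice as sd  # third-party: pip install sounddevice

    FS = 48000
    current = {"freq": 440.0}  # illustrative: updated from your sensor readings
    cb = make_channel_callback(lambda: current["freq"], channel=5, n_channels=8, fs=FS)
    with sd.OutputStream(samplerate=FS, channels=8, callback=cb):
        for k in range(10):
            current["freq"] = 300.0 + 50.0 * k  # stand-in for sensor-derived values
            time.sleep(0.2)
```

Generating several independent tones works the same way: write a different signal into each column of outdata.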
Let's say I have a few very long audio files (e.g. radio recordings). I need to extract the 5 seconds after a particular sound (e.g. an ad start sound) from each file. Each file may contain 3-5 such sounds, so I should end up with (3-5) × (number of source files) result files.
I found the librosa and scipy Python libraries, but I'm not sure whether they can help. Where should I start?
You could start by calculating the correlation of the signal with your particular sound. Not sure if librosa offers this. I'd start with scipy.signal.correlate or scipy.signal.convolve.
Not sure what your background is. Start here if you need some theory.
Basically the correlation will be high if the audio matches your particular signal or is very similar to it. After identifying these positions you can select an area around them.
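Following that suggestion, a sketch with scipy.signal.correlate: normalize the correlation so the threshold does not depend on volume, pick peaks at least one marker length apart, and cut 5 seconds of audio starting where each marker ends. It assumes mono (or downmixed) WAV files with the same sample rate as the marker clip; the file names in the __main__ block and threshold=0.6 are placeholders to adjust.

```python
import numpy as np
from scipy.signal import correlate, find_peaks


def find_marker_ends(audio, marker, threshold=0.6):
    """Sample indices where each occurrence of `marker` ends, via normalized cross-correlation."""
    audio = np.asarray(audio, dtype=np.float64)
    marker = np.asarray(marker, dtype=np.float64)
    n = len(marker)
    if len(audio) < n:
        return np.array([], dtype=int)
    corr = correlate(audio, marker, mode="valid", method="fft")
    csum = np.concatenate(([0.0], np.cumsum(audio ** 2)))
    window_norm = np.sqrt(np.clip(csum[n:] - csum[:-n], 0.0, None))
    score = corr / (window_norm * np.linalg.norm(marker) + 1e-12)
    starts, _ = find_peaks(score, height=threshold, distance=n)
    return starts + n  # the clip should begin where the marker finishes


def extract_clips(audio, ends, sr, seconds=5.0):
    """Slice `seconds` of audio after each marker end (shorter if the file ends first)."""
    length = int(round(seconds * sr))
    return [audio[e:e + length] for e in ends]


if __name__ == "__main__":
    from pathlib import Path

    from scipy.io import wavfile

    def to_mono(x):
        # average channels but keep the original dtype so output WAVs stay in range
        return x.mean(axis=1).astype(x.dtype) if x.ndim > 1 else x

    sr_marker, marker = wavfile.read("ad_start.wav")  # placeholder file names
    for path in [Path("radio1.wav"), Path("radio2.wav")]:
        sr, audio = wavfile.read(str(path))
        if sr != sr_marker:
            raise ValueError(f"{path}: sample rate {sr} != {sr_marker}")
        audio = to_mono(audio)
        ends = find_marker_ends(audio, to_mono(marker))
        for k, clip in enumerate(extract_clips(audio, ends, sr)):
            wavfile.write(str(path.with_name(f"{path.stem}_clip{k}.wav")), sr, clip)
```

For multi-hour recordings, one FFT over the whole file can use a lot of memory; processing the file in chunks that overlap by len(marker) - 1 samples keeps memory bounded without missing markers at chunk boundaries.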
I have a piece of python script that runs through a continuous loop (~5Hz) to obtain data from a set of sensors connected to my PC, much like a proximity sensor.
I would like to translate this sensor data into audio output, using Python and in a continuous manner. That is: while my sensor loop is running, I want to generate and play a continuous sinusoidal tone whose frequency is modulated by the sensor output (e.g. higher sensor value = higher frequency). This is roughly the output I want (without the GUI, of course): http://www.szynalski.com/tone-generator/
I've looked through a lot of the available packages (pyDub, pyAudio, winsound), but each seems to solve only one piece of the puzzle (signal generation, saving, or playing), and I can't figure out how to combine them.
It's possible to perform frequency modulation, link different frequencies together, and save them, but how do I play them in real time without clogging up my sensor loop?
It's possible to play threaded audio using winsound, but how do I update the frequency in real time?
Or is this not a feasible route to walk on using python and should I write a script that inputs the sensor data into another more audio-friendly language?
Thanks.
I have a piece of python script that runs through a continuous loop (~5Hz)
Does it not work if you just add winsound.Beep(frequency_to_play, 1) in the loop?
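winsound.Beep plays synchronously, so each call blocks the loop for its duration. An alternative sketch, assuming the third-party sounddevice package (not one of the packages mentioned above): run a callback-based output stream, which pulls samples on its own thread, and let the sensor loop only update a target frequency. SensorTone is a hypothetical helper; the linear value-to-frequency mapping, the 200-2000 Hz range, and the 50 ms glide (which smooths the steps of a ~5 Hz sensor loop) are arbitrary choices.

```python
import threading

import numpy as np


class SensorTone:
    """Phase-continuous sine whose pitch follows a value set from another thread."""

    def __init__(self, fs=44100, f_min=200.0, f_max=2000.0, glide=0.05, amplitude=0.2):
        self.fs = fs
        self.f_min, self.f_max = f_min, f_max
        self.amplitude = amplitude
        self._glide = glide  # seconds; time constant of the frequency smoothing
        self._target = f_min
        self._freq = f_min
        self._phase = 0.0
        self._lock = threading.Lock()

    @property
    def frequency(self):
        return self._freq

    def set_sensor_value(self, value, v_min=0.0, v_max=1.0):
        """Map a reading linearly onto [f_min, f_max]: higher value, higher pitch."""
        x = float(np.clip((value - v_min) / (v_max - v_min), 0.0, 1.0))
        with self._lock:
            self._target = self.f_min + x * (self.f_max - self.f_min)

    def render(self, frames):
        """Return the next `frames` samples (called from the audio thread)."""
        if frames == 0:
            return np.zeros(0)
        with self._lock:
            target = self._target
        # exponential glide from the current frequency toward the target
        decay = np.exp(-np.arange(1, frames + 1) / (self._glide * self.fs))
        freqs = target + (self._freq - target) * decay
        # integrate frequency into phase so the waveform never jumps
        phases = self._phase + 2 * np.pi * np.cumsum(freqs) / self.fs
        self._freq = float(freqs[-1])
        self._phase = float(phases[-1] % (2 * np.pi))
        return self.amplitude * np.sin(phases)


if __name__ == "__main__":
    import time

    import sounddevice as sd  # third-party: pip install sounddevice

    tone = SensorTone()

    def callback(outdata, frames, time_info, status):
        outdata[:, 0] = tone.render(frames)

    with sd.OutputStream(samplerate=tone.fs, channels=1, callback=callback):
        rng = np.random.default_rng()
        for _ in range(50):  # stand-in for the ~5 Hz sensor loop
            tone.set_sensor_value(rng.random())  # placeholder for the real sensor read
            time.sleep(0.2)
```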
I am super new to audio processing. I have one reference audio file and several other audio recordings (the same sentence spoken by different speakers, differing in dialect and duration), and I want to align all the audio files to the reference file with the least warping. I tried using MFCC and Chroma features (python/librosa), but I don't know what to do next. I was reading about DTW (Dynamic Time Warping) for alignment; would that work? Is there an example, open-source project, or audio tool that already does this? It seems to be a solved problem, but I couldn't find it. Please help.
I was following this:
https://librosa.github.io/librosa_gallery/auto_examples/plot_music_sync.html but how do I convert the aligned audio back to the time domain and save it?
This seems related - Dynamic time warping with python (final mapping)
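DTW is a reasonable fit here. The sketch below computes a DTW path between reference and query feature matrices (plain NumPy, O(n·m), fine for a few hundred MFCC frames), turns the path into a sample-level time map, and resamples the query waveform with np.interp. Assumptions to flag: this resampling is a crude variable-rate stretch, so it shifts pitch locally; for pitch-preserving output you would instead time-stretch each segment with a phase-vocoder or WSOLA tool. librosa.sequence.dtw can replace dtw_path (check the order of the path it returns). File names and the hop length are placeholders.

```python
import numpy as np


def dtw_path(X, Y):
    """DTW between feature matrices X (d, n) and Y (d, m); returns (i, j) pairs from start to end."""
    n, m = X.shape[1], Y.shape[1]
    # Euclidean distance between every reference frame i and query frame j
    cost = np.sqrt(((X[:, :, None] - Y[:, None, :]) ** 2).sum(axis=0))
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = cost[i - 1, j - 1] + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # backtrack from the end, always stepping to the cheapest predecessor
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((D[i - 1, j - 1], i - 1, j - 1),
                      (D[i - 1, j], i - 1, j),
                      (D[i, j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    return path[::-1]


def warp_to_reference(y, path, hop, n_ref_samples):
    """Resample query audio y onto the reference timeline (crude: pitch shifts with the stretch)."""
    path = np.asarray(path)
    ref_frames = np.unique(path[:, 0])
    # average query frame matched to each reference frame
    query_frames = np.array([path[path[:, 0] == i, 1].mean() for i in ref_frames])
    # piecewise-linear map from reference sample times to query sample times
    t_query = np.interp(np.arange(n_ref_samples), ref_frames * hop, query_frames * hop)
    return np.interp(t_query, np.arange(len(y)), y)


if __name__ == "__main__":
    import librosa
    import soundfile as sf

    hop = 512
    ref, sr = librosa.load("reference.wav", sr=16000)  # placeholder file names
    qry, _ = librosa.load("speaker1.wav", sr=16000)
    X = librosa.feature.mfcc(y=ref, sr=sr, n_mfcc=13, hop_length=hop)
    Y = librosa.feature.mfcc(y=qry, sr=sr, n_mfcc=13, hop_length=hop)
    aligned = warp_to_reference(qry, dtw_path(X, Y), hop, len(ref))
    sf.write("speaker1_aligned.wav", aligned, sr)
```

The aligned file has the same length as the reference, so it can be compared or mixed with it sample by sample.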