I am trying to wrap my head around how I'd go about isolating and amplifying specific sound streams in real time. I am playing with code that lets you classify specific sounds in a given sound environment (e.g. isolating the guitar in a clip of a song) -- not difficult.
The issue is, how does one then selectively amplify the target audio stream? The most common output from existing audio diarizers is a probability of the audio signal belonging to a given class. The crux appears to be using that real-time class probability to identify the stream and then amplify it.
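One naive idea I have considered is a frame-wise gain driven by the classifier output, though this boosts whole frames rather than truly isolating the stream; proper isolation would need a time-frequency mask from a source-separation model. A minimal sketch, assuming a hypothetical classifier that yields one probability per frame:

import numpy as np

def boost_target_frames(frames, probs, gain_db=12.0):
    """Scale each frame by a gain that grows with the classifier's
    probability that the frame contains the target class.

    frames: array of shape (n_frames, frame_len)
    probs:  array of shape (n_frames,), values in [0, 1]
    """
    max_gain = 10.0 ** (gain_db / 20.0)                 # dB -> linear
    gain = 1.0 + (max_gain - 1.0) * np.asarray(probs)[:, None]
    return frames * gain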
My goal is to detect, using Python, a specific noise that comes through the speakers of a PC. That means the following, in pseudocode:
Sound is being played out of the speakers by applications such as games, for example.
My "audio to detect" sound happens, and I want to detect that and take an action.
The specific sound I want to detect can, for example, be found here.
If I break that down, I believe I need two things:
A way to sample the audio that is being streamed to an audio device -- perhaps something based on this, or potentially sounddevice, but I can't work out how to do this from its API.
A way to compare each sample with my "audio to detect" sound file.
The detection does not need to be exact - it just needs to be close. There will be lots of other noises happening at the same time, so it's more about being able to detect the footprint of the "audio to detect" within an audio stream containing a variety of sounds.
Having investigated this, I found technologies mentioned in this post on SO and also this interesting article on Chromaprint. The Chromaprint article uses fpcalc to generate fingerprints, but because my "audio to detect" is only around 1-2 seconds, fpcalc can't generate the fingerprint. I need something that works across shorter timespans.
My question is: can somebody help me with these two parts?
How do I sample the audio device on my PC using Python?
How should I attempt this comparison (ideally with a little example)?
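For part 1, the closest I have got is a sketch like the following, using sounddevice to read blocks from the default input device. To capture what the speakers are playing, I assume the selected device would have to be a loopback device such as "Stereo Mix" on Windows; sd.query_devices() lists what is available.

import numpy as np
import sounddevice as sd

SAMPLE_RATE = 44100
BLOCK_SIZE = 4096   # samples delivered per callback

def callback(indata, frames, time, status):
    # indata is a (frames, channels) float32 array of captured audio
    if status:
        print(status)
    mono = indata.mean(axis=1)
    # ... each block would be compared against the "audio to detect" here ...
    print("block RMS:", np.sqrt(np.mean(mono ** 2)))

with sd.InputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
                    channels=2, callback=callback):
    sd.sleep(5000)   # capture for five seconds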
Many thanks in advance.
Let's say I have a few very long audio files (for example, radio recordings). I need to extract the 5 seconds that follow a particular sound (for example, an ad start sound) from each file. Each file may contain 3-5 such sounds, so I should end up with (3-5) × (number of source files) result files.
I found the librosa and scipy Python libraries, but I'm not sure whether they can help. What should I start with?
You could start by calculating the correlation of the signal with your particular sound. Not sure if librosa offers this. I'd start with scipy.signal.correlate or scipy.signal.convolve.
Not sure what your background is. Start here if you need some theory.
Basically, the correlation will be high wherever the audio matches your particular signal or is very similar to it. After identifying those positions, you can select a region around them.
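A minimal sketch of that approach, assuming both files are mono 16-bit WAVs at the same sample rate (the file names and the 0.7 threshold are made up for illustration and would need tuning):

import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

rate, recording = wavfile.read("radio_recording.wav")   # a long source file
_, template = wavfile.read("ad_start_sound.wav")        # the particular sound

recording = recording.astype(np.float64)
template = template.astype(np.float64)

# Cross-correlate; peaks mark positions where the template matches well
corr = correlate(recording, template, mode="valid")
corr /= np.max(np.abs(corr))                            # normalize to [-1, 1]

hits = np.flatnonzero(corr > 0.7)                       # threshold: tune empirically
# Collapse runs of neighbouring hits (< 1 s apart) into one detection each
starts = hits[np.insert(np.diff(hits) > rate, 0, True)] if hits.size else hits

for i, start in enumerate(starts):
    begin = start + len(template)                       # just after the sound ends
    clip = recording[begin:begin + 5 * rate]            # the following 5 seconds
    wavfile.write("clip_%d.wav" % i, rate, clip.astype(np.int16))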
Pure tones in PsychoPy end with clicks. How can I remove these clicks?
Tones generated within PsychoPy and tones imported as .wav files both have the same problem. I tried adding 0.025ms of fade-out to the .wav tones that I generated using Audacity, but they still end with a click when played in PsychoPy.
I am not sure how to proceed. I need to run a psychoacoustic experiment, and it cannot go ahead with tone presentation like this.
Crackling sounds or clicks are, to my knowledge, often associated with buffering errors. Many years back, I experienced similar problems on Linux systems when an incorrect bitrate was set. So there could be at least two possible culprits at work here: the bitrate, and the buffer size.
You already applied both an onset and offset ramp to allow the membranes to swing in/out, so this should not be the issue. (By the way, I think you meant 0.025 seconds instead of ms? Otherwise, the ramps would be too short!)
PyGame initializes the sound system with the following settings:
initPygame(rate=22050, bits=16, stereo=True, buffer=1024)
Whereas Pyo initializes it the following way:
initPyo(rate=44100, stereo=True, buffer=128)
The documentation of psychopy.sound states:
For control of bitrate and buffer size you can call psychopy.sound.init before
creating your first Sound object:
from psychopy import sound
sound.init(rate=44100, stereo=True, buffer=128)
s1 = sound.Sound('ding.wav')
So, I would suggest you:
Try out both sound backends, Pyo and PyGame -- you can change which one to use in the PsychoPy preferences under General / audio library. Change the field to ['pyo'] to use Pyo only, or to ['pygame'] to use only PyGame.
Experiment with different settings for bitrate and buffer size with both backends (Pyo, PyGame).
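A sketch of what that experimentation could look like in code; note that the name of the preference key has changed across PsychoPy versions (in older releases it lived under prefs.general), so treat the exact key as version-dependent:

from psychopy import prefs
prefs.general['audioLib'] = ['pyo']   # or ['pygame']; key is version-dependent

from psychopy import sound
sound.init(rate=44100, stereo=True, buffer=128)   # try e.g. 64, 128, 256, 512

s = sound.Sound('ding.wav')
s.play()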
If you want to get started with serious psychoacoustics, however, I would suggest you do not use either of the proposed solutions, and get some piece of professional sound hardware or a data-acquisition board with analog outputs, which will deliver undistorted sound with sub-millisecond precision, such as the devices produced by National Instruments or competitors. The NI boards can be controlled from Python via PyLibNIDAQmx.
Clicks at the beginning and end of sounds often occur because the sound starts or stops mid-cycle, so the waveform jumps abruptly from some value to zero. Such a step can only be represented by high-amplitude, high-frequency components superimposed on the signal, i.e. a click. So the solution is to make the wave start and stop while at zero.
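For a self-generated tone, one way to do that is to multiply the start and end of the waveform with a short raised-cosine ramp, for example (a sketch with made-up tone parameters):

import numpy as np

rate = 44100
t = np.arange(int(0.5 * rate)) / rate                  # a 500 ms tone
tone = np.sin(2 * np.pi * 440 * t)                     # 440 Hz pure tone

ramp_len = int(0.025 * rate)                           # 25 ms ramps
ramp = 0.5 * (1 - np.cos(np.linspace(0, np.pi, ramp_len)))   # rises 0 -> 1
tone[:ramp_len] *= ramp                                # fade in from zero
tone[-ramp_len:] *= ramp[::-1]                         # fade out to exactly zero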
Are you using an old version of PsychoPy? If so, upgrade. Newer versions add a Hamming window (fade in/out) to self-generated tones, which should avoid the click.
For the .wav files, try adding (extra) silence at the end, e.g. 50 ms. It might be that PsychoPy stops the sound prematurely.
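That padding can be added in a few lines rather than re-editing in Audacity, e.g. (a sketch assuming a mono .wav):

import numpy as np
from scipy.io import wavfile

rate, data = wavfile.read('tone.wav')
pad = np.zeros(int(0.05 * rate), dtype=data.dtype)   # 50 ms of silence
wavfile.write('tone_padded.wav', rate, np.concatenate([data, pad]))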
I am not really sure where to start on a project. Over the past several months I designed a channel for the Roku using BrightScript. I now need to build a similar project for a different device, but using Python. I don't know a lot about Python, but from what I have read it looks fairly easy to learn.
In BrightScript I had to draw a canvas, passing parameters such as size, location, and color; that canvas was essentially the video player. To create a video player in Python, do you have to draw a canvas or the like? BrightScript comes with components like roVideoPlayer, into which the code passes the needed information. Are there Python modules that can be imported to provide such components?
Thanks for the advice.