Detecting a noise within an audio stream in Python

My goal is to be able to detect a specific noise that comes through the speakers of a PC using Python. That means the following, in pseudo code:
Sound is being played out of the speakers by applications such as games, for example
My "audio to detect" sound happens, and I want to detect that, and take an action
An example of the specific sound I want to detect can be found here.
If I break that down, I believe I need two things:
A way to sample the audio that is being streamed to an audio device -- perhaps something based on this, or potentially sounddevice, though I can't work out from its API how to make this work
A way to compare each sample with my "audio to detect" sound file.
The detection does not need to be exact - it just needs to be close. For example, there will be lots of other noises happening at the same time, so it's more about being able to detect the footprint of the "audio to detect" within an audio stream containing a variety of sounds.
Having investigated this, I found technologies mentioned in this post on SO and also this interesting article on Chromaprint. The Chromaprint article uses fpcalc to generate fingerprints, but because my "audio to detect" is only around 1-2 seconds long, fpcalc can't generate the fingerprint. I need something which works across shorter timespans.
My question is: can somebody help me with these two parts:
How do I sample the audio device on my PC using Python
How should I attempt this comparison (ideally with a little example)
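To illustrate what I mean by part 1, here is a minimal sketch using sounddevice (this assumes it can open a loopback input such as "Stereo Mix" on Windows - the device setup is an assumption on my part, and I haven't verified it works):

import numpy as np
import sounddevice as sd

SAMPLE_RATE = 44100          # assumed sample rate
BLOCK_SECONDS = 2            # analyse the stream in ~2-second chunks

def process_block(samples):
    # placeholder for part 2: compare `samples` against the
    # "audio to detect", e.g. by cross-correlation
    pass

def callback(indata, frames, time, status):
    # indata is a (frames, channels) float32 array of captured audio
    process_block(indata.mean(axis=1))   # mix down to mono

# device= would need to name a loopback input such as "Stereo Mix";
# available devices can be listed with sd.query_devices()
with sd.InputStream(channels=2, samplerate=SAMPLE_RATE,
                    blocksize=int(SAMPLE_RATE * BLOCK_SECONDS),
                    callback=callback):
    sd.sleep(10 * 1000)      # capture for 10 seconds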
Many thanks in advance.

Related

How would you approach selectively isolating and amplifying specific audio streams?

I am trying to wrap my head around how I'd go about isolating and amplifying specific sound streams in real time. I am playing with code that lets you classify specific sounds in a given sound environment (e.g. isolating the guitar in a clip of a song) -- not difficult.
The issue is: how does one then selectively amplify the target audio stream? The most common output from existing audio diarizers is a probability of the audio signal belonging to a given class. The crux appears to be using that real-time class probability to identify the stream and then amplify it.

How to extract audio after particular sound?

Let's say I have a few very long audio files (for ex., radio recordings). I need to extract the 5 seconds after a particular sound (for ex., an ad start sound) from each file. Each file may contain 3-5 such sounds, so I should get (3-5) × (number of source files) result files.
I found librosa and scipy python libraries, but not sure if they can help. What should I start with?
You could start by calculating the correlation of the signal with your particular sound. Not sure if librosa offers this. I'd start with scipy.signal.correlate or scipy.signal.convolve.
Not sure what your background is. Start here if you need some theory.
Basically the correlation will be high if the audio matches your particular signal or is very similar to it. After identifying these positions you can select an area around them.
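A minimal sketch of that idea, assuming both files are mono WAVs at the same sample rate (the file names are placeholders):

import numpy as np
from scipy.io import wavfile
from scipy.signal import correlate

rate, recording = wavfile.read('radio.wav')    # long recording (placeholder)
_, pattern = wavfile.read('ad_start.wav')      # the particular sound (placeholder)
recording = recording.astype(np.float64)
pattern = pattern.astype(np.float64)

# cross-correlate; the peak marks where the pattern matches best
corr = correlate(recording, pattern, mode='valid')
start = np.argmax(corr)

# take the 5 seconds after the matched sound
clip = recording[start + len(pattern) : start + len(pattern) + 5 * rate]

Since each file may contain several occurrences, you would look for all peaks above some threshold (e.g. with scipy.signal.find_peaks) rather than a single argmax.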

OpenCV decentralized processing for stereo vision

I have a decent amount of experience with OpenCV and am currently familiarizing myself with stereo vision. I happen to have two JeVois cameras (don't ask why) and was wondering if it was possible to run some sort of code on each camera to distribute the workload and cut down on processing time. It needs to work so that each camera does part of the overall process (without the cameras needing to talk to each other), and the computer they're connected to receives that information and handles the rest of the work. If this is possible, does anyone have any solutions or tips? Thanks in advance!
To generalize the stereo-vision pipeline (look here for a more in-depth treatment):
Find the intrinsic/extrinsic values of each camera (good illustration here)
Solve for the transformation that will rectify your cameras' images (good illustration here)
Capture a pair of images
Transform the images according to Step 2.
Perform stereo-correspondence on that pair of rectified images
If we can assume that your cameras are going to remain perfectly stationary (relative to each other), you'll only need to perform Steps 1 and 2 one time after camera installation.
That leaves you with image capture (duh) and the image rectification as general stereo-vision tasks that can be done without the two cameras communicating.
Additionally, there are some pre-processing techniques (you could try this and this) that have been shown to improve the accuracy of some stereo-correspondence algorithms. These could also be done on each of your image-capture platforms individually.
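As a rough sketch of Steps 3-5 with OpenCV's Python bindings (this assumes Steps 1 and 2 have already produced the rectification maps, e.g. via cv2.stereoRectify and cv2.initUndistortRectifyMap, and that they were saved to a placeholder file):

import cv2
import numpy as np

# rectification maps precomputed once in Steps 1-2 (placeholder file)
maps = np.load('rectify_maps.npz')
map1x, map1y = maps['map1x'], maps['map1y']
map2x, map2y = maps['map2x'], maps['map2y']

# Step 3: capture a pair of images (placeholder files)
left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

# Step 4: rectify - this is the part each JeVois camera could do on-board
left_rect = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)

# Step 5: stereo correspondence on the host computer
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left_rect, right_rect)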

Filter applause from videos

I am currently watching a video from 27C3 and I would like to filter out the applause, as it is very loud. Is this possible? I have heard something like this was done for vuvuzelas.
I use Ubuntu. If this filter worked via ffmpeg, that would be great. If it is written in Python, that would also be OK.
Here is an example: http://www.youtube.com/watch?v=TIViQuCX7XM#t=5m4s
No, this isn't possible. The sound of applause covers a very wide band. Filtering vuvuzelas was somewhat possible because they were all close to the same pitch. Applause is all over.
If you want to experiment, pull up this video and play with your EQ in VLC. Even with the wide bands of a 7-band EQ, you'll be dropping quite a few to cut the audience, thus leaving you with nothing.
As Brad said, this is not possible with a static frequency filter. However, if you have some knowledge of signal theory, and a fair amount of free time, you could write an active noise control system. See Google Scholar for some examples of such a filter (like this one).
You could use a dynamic range compressor; this will not filter out the applause, but at least it will smooth out the loudness. You can give it a threshold so that it doesn't affect sound below that threshold.
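A minimal sketch of that kind of threshold-based compressor in Python (the numbers are arbitrary, and a real compressor would also smooth the gain with attack/release envelopes):

import numpy as np

def compress(samples, threshold=0.3, ratio=4.0):
    # samples: float array normalised to [-1, 1]
    out = samples.copy()
    loud = np.abs(out) > threshold
    # above the threshold, reduce the excess level by the given ratio
    excess = np.abs(out[loud]) - threshold
    out[loud] = np.sign(out[loud]) * (threshold + excess / ratio)
    return out

ffmpeg's acompressor audio filter does the same job from the command line, if you'd rather not write code.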

How can I detect and track people using OpenCV?

I have a camera that will be stationary, pointed at an indoors area. People will walk past the camera, within about 5 meters of it. Using OpenCV, I want to detect individuals walking past - my ideal return is an array of detected individuals, with bounding rectangles.
I've looked at several of the built-in samples:
None of the Python samples really apply
The C blob tracking sample looks promising, but doesn't accept live video, which makes testing difficult. It's also the most complicated of the samples, making extracting the relevant knowledge and converting it to the Python API problematic.
The C 'motempl' sample also looks promising, in that it calculates a silhouette from subsequent video frames. Presumably I could then use that to find strongly connected components and extract individual blobs and their bounding boxes - but I'm still left trying to figure out a way to identify blobs found in subsequent frames as the same blob.
Is anyone able to provide guidance or samples for doing this - preferably in Python?
The latest SVN version of OpenCV contains an (undocumented) implementation of HOG-based pedestrian detection. It even comes with a pre-trained detector and a Python wrapper. The basic usage is as follows:
from cv import *

storage = CreateMemStorage(0)
img = LoadImage(file)  # or read a frame from the camera
# bounding rectangles for the detected people
found = list(HOGDetectMultiScale(img, storage, win_stride=(8, 8),
                                 padding=(32, 32), scale=1.05,
                                 group_threshold=2))
So instead of tracking, you might just run the detector in each frame and use its output directly.
See src/cvaux/cvhog.cpp for the implementation and samples/python/peopledetect.py for a more complete Python example (both in the OpenCV sources).
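For reference, later OpenCV releases expose the same detector through the cv2 API; a minimal sketch:

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread('frame.png')    # or a frame from cv2.VideoCapture
# returns bounding rectangles (x, y, w, h) plus confidence weights
found, weights = hog.detectMultiScale(img, winStride=(8, 8),
                                      padding=(32, 32), scale=1.05)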
Nick,
What you are looking for is not people detection, but motion detection. If you tell us a lot more about what you are trying to solve/do, we can answer better.
Anyway, there are many ways to do motion detection, depending on what you are going to do with the results. The simplest would be differencing followed by thresholding, while a complex one could be proper background modeling -> foreground subtraction -> morphological ops -> connected component analysis, followed by blob analysis if required. Download the OpenCV code and look in the samples directory. You might see what you are looking for. Also, there is an O'Reilly book on OpenCV.
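A bare-bones sketch of the differencing-plus-thresholding variant (cv2 API; the parameter values are arbitrary):

import cv2

cap = cv2.VideoCapture(0)                 # live camera
_, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)        # difference against previous frame
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)   # simple morphological op
    # connected regions via contours (OpenCV 4 signature)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:      # ignore tiny blobs
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    prev = gray
    cv2.imshow('motion', frame)
    if cv2.waitKey(1) == 27:              # Esc quits
        break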
Hope this helps,
Nand
This is clearly a non-trivial task. You'll have to look into scientific publications for inspiration (Google Scholar is your friend here). Here's a paper about human detection and tracking: Human tracking by fast mean shift mode seeking
This is similar to a project we did as part of a Computer Vision course, and I can tell you right now that it is a hard problem to get right.
You could use foreground/background segmentation, find all blobs and then decide that they are a person. The problem is that it will not work very well since people tend to go together, go past each other and so on, so a blob might very well consist of two persons and then you will see that blob splitting and merging as they walk along.
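For the segmentation-and-blobs step itself, a minimal sketch with OpenCV's built-in MOG2 background subtractor (parameter values are arbitrary; the input file is a placeholder):

import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
cap = cv2.VideoCapture('people.mp4')      # placeholder input

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = subtractor.apply(frame)          # foreground mask (shadows marked 127)
    fg = cv2.medianBlur(fg, 5)            # suppress speckle noise
    _, fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)   # drop shadows
    # stats rows are (x, y, w, h, area); row 0 is the background
    n, labels, stats, _ = cv2.connectedComponentsWithStats(fg)
    blobs = [s for s in stats[1:] if s[4] > 1000]   # candidate "person" blobs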
You will need some method of discriminating between multiple persons in one blob. This is not a problem I expect anyone to be able to answer in a single SO post.
My advice is to dive into the available research and see if you can find anything there. The problem is not unsolvable, considering that products which do this exist: Autoliv has a product that detects pedestrians using an IR camera on a car, and I have seen other products that count customers entering and exiting stores.
