How can I detect and track people using OpenCV? (Python)

I have a camera that will be stationary, pointed at an indoors area. People will walk past the camera, within about 5 meters of it. Using OpenCV, I want to detect individuals walking past - my ideal return is an array of detected individuals, with bounding rectangles.
I've looked at several of the built-in samples:
- None of the Python samples really apply.
- The C blob tracking sample looks promising, but doesn't accept live video, which makes testing difficult. It's also the most complicated of the samples, which makes it hard to extract the relevant pieces and port them to the Python API.
- The C 'motempl' sample also looks promising, in that it calculates a silhouette from subsequent video frames. Presumably I could then use that to find connected components and extract individual blobs and their bounding boxes - but I'm still left trying to figure out a way to identify blobs found in subsequent frames as the same blob.
Is anyone able to provide guidance or samples for doing this - preferably in Python?

The latest SVN version of OpenCV contains an (undocumented) implementation of HOG-based pedestrian detection. It even comes with a pre-trained detector and a Python wrapper. The basic usage is as follows:
    # Legacy (pre-cv2) OpenCV Python API, as shipped with that SVN version
    from cv import *

    storage = CreateMemStorage(0)
    img = LoadImage(file)  # 'file' is a path string; or read from camera
    found = list(HOGDetectMultiScale(img, storage, win_stride=(8, 8),
                                     padding=(32, 32), scale=1.05,
                                     group_threshold=2))
So instead of tracking, you might just run the detector in each frame and use its output directly.
See src/cvaux/cvhog.cpp for the implementation and samples/python/peopledetect.py for a more complete Python example (both in the OpenCV sources).
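If you are on a newer OpenCV build where the cv2 module is available, the same pretrained people detector is exposed through cv2.HOGDescriptor. A minimal sketch (the image filename is a placeholder, and the window parameters simply mirror the legacy call above):

    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    img = cv2.imread("frame.jpg")  # placeholder path; or grab a camera frame
    rects, weights = hog.detectMultiScale(img, winStride=(8, 8),
                                          padding=(32, 32), scale=1.05)
    for (x, y, w, h) in rects:  # one rectangle per detected person
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)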

Nick,
What you are looking for is not people detection, but motion detection. If you tell us a lot more about what you are trying to solve/do, we can answer better.
Anyway, there are many ways to do motion detection, depending on what you are going to do with the results. The simplest would be frame differencing followed by thresholding, while a complex one could be proper background modeling -> foreground subtraction -> morphological ops -> connected component analysis, followed by blob analysis if required. Download the OpenCV code and look in the samples directory; you might see what you are looking for. There is also an O'Reilly book on OpenCV.
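To make the simple end of that concrete, here is a rough sketch of differencing followed by thresholding (the camera index, threshold value and minimum blob area are assumptions you would tune):

    import cv2

    cap = cv2.VideoCapture(0)          # assumed camera index
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)                 # frame differencing
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        mask = cv2.dilate(mask, None, iterations=2)    # fill small holes
        # OpenCV 4.x return signature; 3.x returns an extra value first
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours
                 if cv2.contourArea(c) > 500]          # drop tiny specks
        prev = gray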
Hope this helps,
Nand

This is clearly a non-trivial task. You'll have to look into scientific publications for inspiration (Google Scholar is your friend here). Here's a paper about human detection and tracking: Human tracking by fast mean shift mode seeking

This is similar to a project we did as part of a Computer Vision course, and I can tell you right now that it is a hard problem to get right.
You could use foreground/background segmentation, find all blobs and then decide whether each is a person. The problem is that this will not work very well, since people tend to walk together and pass each other, so a blob may well contain two persons, and you will then see that blob splitting and merging as they walk along.
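For reference, a minimal sketch of that segment-then-blob approach using OpenCV's built-in MOG2 background subtractor (the camera index and the area cutoff for "person-sized" are guesses):

    import cv2

    cap = cv2.VideoCapture(0)
    bg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg = bg.apply(frame)
        _, fg = cv2.threshold(fg, 200, 255, cv2.THRESH_BINARY)  # drop shadows
        fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)       # denoise
        contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        people = [cv2.boundingRect(c) for c in contours
                  if cv2.contourArea(c) > 1500]  # assumed person-sized blobs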
You will need some method of discriminating between multiple persons in one blob, and that is not a problem I expect anyone to be able to solve in a single SO post.
My advice is to dive into the available research and see if you can find anything there. The problem is not unsolvable, considering that products which do this exist: Autoliv has a product that detects pedestrians using an IR camera on a car, and I have seen other products which count customers entering and exiting stores.

Related

Python quality inspection with opencv (ssim)

I'm currently an intern at a quality-inspection company. My job is to write a program that can detect faulty products (for example, a missing screw). They take a picture of every single product. My idea is to choose an image that could serve as a benchmark, compare the other images to it using the SSIM score, and maybe display the faulty part with a rectangle. Is this a viable idea? (It's a strange internship, because it seems like I'm the only one there who can code...) That's why I'm asking here.
It sounds like a good idea if your goal is to classify different objects by comparing images against a benchmark image.
But in my experience, the SSIM score is sensitive to angle, lighting and environment.
So in conclusion: if your goal is to classify different objects in images, your idea should work; but if your goal is to classify exactly the same objects, it might not be able to.
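If you do try it, a minimal sketch with scikit-image's SSIM (filenames and the pass/fail threshold are assumptions; both images must be the same size and roughly aligned):

    import cv2
    from skimage.metrics import structural_similarity as ssim

    ref = cv2.imread("benchmark.jpg", cv2.IMREAD_GRAYSCALE)
    test = cv2.imread("product.jpg", cv2.IMREAD_GRAYSCALE)

    score, diff = ssim(ref, test, full=True)   # diff maps local similarity
    if score < 0.90:                           # assumed threshold, tune it
        diff = (diff * 255).astype("uint8")
        _, mask = cv2.threshold(diff, 0, 255,
                                cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if contours:                           # box the most dissimilar region
            x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
            cv2.rectangle(test, (x, y), (x + w, y + h), 255, 2)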

Discard images from a group of similar images

I am generating images (thumbnails) from a video every 3 seconds. Now I need to discard/remove all the similar images. Is there a way I could do this?
I generate the thumbnails using FFMPEG. I read about various image-diff solutions like the one given in this SO post, but I do not want to do this manually. What parameters should be considered to tell whether a particular image is similar to the other images present?
You can calculate the Structural Similarity Index between images and, based on the score, keep or discard an image. There are other measures you can use, but basically you want a method that returns a score. Try PIL or OpenCV:
https://pillow.readthedocs.io/en/3.1.x/reference/ImageChops.html?highlight=difference
https://www.pyimagesearch.com/2017/06/19/image-difference-with-opencv-and-python/
I don't have enough reputation to comment my idea on your problem, so I will just go ahead and post it as an answer in the hope of helping you.
I am quite confused about the term "similar", but since you are referring to video frames I am going to assume that you want to avoid keeping "similar" frames that were captured because of poor camera movement. If that's the case, you might want to consider using salient-point descriptors.
To be more specific, you can detect salient points (using, for instance, Harris) and then use a point-descriptor algorithm (such as SURF) and discard the frames that have been found to have "too many" points similar to a pre-selected frame.
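Note that SURF is patent-encumbered and missing from default OpenCV builds, so here is the same idea sketched with ORB instead; the match-distance and "too many" cutoffs are guesses to tune:

    import cv2

    orb = cv2.ORB_create(nfeatures=500)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    def too_similar(frame_a, frame_b, min_matches=100):
        """True when two frames share enough salient points to be near-duplicates."""
        _, des_a = orb.detectAndCompute(frame_a, None)
        _, des_b = orb.detectAndCompute(frame_b, None)
        if des_a is None or des_b is None:
            return False
        matches = bf.match(des_a, des_b)
        good = [m for m in matches if m.distance < 40]  # assumed cutoff
        return len(good) > min_matches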
Keep in mind that for the above process to be successful the frames must be as sharp as possible; I guess you don't want to extract a blurred frame as a thumbnail anyway. So applying blurred-image detection might be useful in your case.
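A common sharpness check (hedged; the threshold is an assumption to calibrate on your own frames) is the variance of the Laplacian, where a low value means few edges and therefore likely blur:

    import cv2

    def is_blurry(path, threshold=100.0):
        """Variance of the Laplacian as a crude sharpness score."""
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold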

OpenCV decentralized processing for stereo vision

I have a decent amount of experience with OpenCV and am currently familiarizing myself with stereo vision. I happen to have two JeVois cameras (don't ask why) and was wondering if it was possible to run some sort of code on each camera to distribute the workload and cut down on processing time. It needs to be so that each camera can do part of the overall process (without needing to talk to each other) and the computer they're connected to receives that information and handles the rest of the work. If this is possible, does anyone have any solutions or tips? Thanks in advance!
To generalize the stereo-vision pipeline (look here for a more in-depth treatment):
1. Find the intrinsic/extrinsic values of each camera (good illustration here)
2. Solve for the transformation that will rectify your cameras' images (good illustration here)
3. Capture a pair of images
4. Transform the images according to Step 2
5. Perform stereo correspondence on that pair of rectified images
If we can assume that your cameras are going to remain perfectly stationary (relative to each other), you'll only need to perform Steps 1 and 2 one time after camera installation.
That leaves you with image capture (duh) and the image rectification as general stereo-vision tasks that can be done without the two cameras communicating.
Additionally, there are some pre-processing techniques (you could try this and this) that have been shown to improve the accuracy of some stereo-correspondence algorithms. These could also be done on each of your image-capture platforms individually.
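Assuming Steps 1 and 2 have already produced rectification maps, a minimal sketch of what the host could run per frame pair (the SGBM parameters are placeholders to tune):

    import cv2

    def rectify_and_match(left, right, maps_l, maps_r, matcher):
        """Steps 4-5: rectify a captured pair, then run stereo correspondence."""
        left_r = cv2.remap(left, maps_l[0], maps_l[1], cv2.INTER_LINEAR)
        right_r = cv2.remap(right, maps_r[0], maps_r[1], cv2.INTER_LINEAR)
        return matcher.compute(left_r, right_r)  # disparity, scaled by 16

    # One-time setup (Steps 1-2): the K/dist/R/P inputs come from
    # cv2.stereoCalibrate + cv2.stereoRectify and are assumed available.
    # maps_l = cv2.initUndistortRectifyMap(K1, d1, R1, P1, size, cv2.CV_16SC2)
    # maps_r = cv2.initUndistortRectifyMap(K2, d2, R2, P2, size, cv2.CV_16SC2)
    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64,
                                    blockSize=9)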

Deep learning person detection with opencv

So I'm really new here. I'm currently working on a public art project where I need a little help with the programming, because I'm kind of lost between code samples.
First I'll give you a short description of the goal of the work and then state my problem.
I'm putting a webcam in the shop window of a gallery, facing out onto a public street. This webcam is connected to a TV screen that faces the street, so people see themselves being filmed (like CCTV). Then, if people stand still long enough for the camera, the webcam takes an automatic screenshot, which is emailed to a site holding a script for automatic attachment printing, and the people from the street instantly come into my gallery, on paper.
(and yes I have permission from the gallery to do this since it is slightly in the grey area of legality)
I come from an art background with an interest in programming, so this was all very, very new for me, and I think I've already made it quite far. I have a Raspberry Pi running with OpenCV and put a script on it for deep-learning object detection (https://www.pyimagesearch.com/2017/09/18/real-time-object-detection-with-deep-learning-and-opencv/ < the link I used for that).
I have also come across loads of pedestrian-tracking code, but have not yet found code suitable for a real-time video stream.
So what I need from you guys is a little help with how to make a timer in the script, so that when people stand still long enough for the camera, it will take the screenshot. It is a bit like a reversed security-cam script: those react to movement, and I want mine to react to no movement at all.
The automatic attachment printing part I think I have covered, because there are already a lot of scripts for that on the internet.
If you have any tips or tricks.. please let me know.
Help a girl out!
Marije
There are a number of things you can try.
Is the camera facing a shopping street? In that case you could go for simple background subtraction. For each frame, apply some preprocessing (e.g. blurring, morphology operations), call findContours and compute the center of the bounding rectangle of each contour.
Another option is to use the built-in (and pretrained) HOG people detector. This is based on SVM (support vector machines), another machine-learning technique. For this to work efficiently you'd have to tune the parameters adequately, and since you're using a Pi you'd also need to consider the trade-off between speed and accuracy. This technique also leaves us with rectangles, so we can again compute the center.
For both techniques, you'd want to make sure that the center point doesn't fluctuate too much from frame to frame (that would mean the person is moving). You'd also want to take the framerate into account and understand that you can't guarantee a person detection in every frame.
The caveat of the first technique, whilst it has more explanatory power, is that it'd detect ANYTHING that changes from frame to frame: that includes pets, bikes, cars (if on a public street) and so on. You could then consider filtering, e.g. by area or color.
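For the timer itself, a hedged sketch combining the pretrained HOG detector with a centroid-stability check; the hold time, drift tolerance and output path are all assumptions to tune on the Pi:

    import time
    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    cap = cv2.VideoCapture(0)

    HOLD_SECONDS = 3.0   # how long someone must stand still
    MAX_DRIFT = 20       # pixels of centroid wobble still counted as "still"
    last_center, still_since = None, None

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rects, _ = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)
        if len(rects) == 0:
            last_center, still_since = None, None
            continue
        x, y, w, h = rects[0]                    # follow the first detection
        center = (x + w // 2, y + h // 2)
        if (last_center is not None
                and abs(center[0] - last_center[0]) < MAX_DRIFT
                and abs(center[1] - last_center[1]) < MAX_DRIFT):
            if still_since is None:
                still_since = time.time()
            elif time.time() - still_since >= HOLD_SECONDS:
                cv2.imwrite("screenshot.jpg", frame)  # hand off to the emailer
                still_since = None                    # re-arm the timer
        else:
            still_since = None                        # they moved; reset
        last_center = center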

Image recognition - finding similar images [duplicate]

This question already has answers here: Checking images for similarity with OpenCV (6 answers). Closed 8 years ago.
Setup is as follows:
Database with paintings
robot that takes shots of paintings
I want to compare the shots the robot made with the images in our database.
The problem is that the shots won't be perfect. The painting will most likely be IN the shot, but the shot will also contain wall and other objects, and the incidence of light will also cause problems. Therefore I want to find images in the database that are similar to a certain degree.
I've been reading up on PIL, SciPy, OpenCV and machine learning.
Is there anything you guys can recommend for this problem?
Thanks in advance.
edit: I'm aware of the solutions presented in other posts, such as comparing histograms, template matching and feature matching. Comparing histograms is not going to cut it in my application, and neither will feature matching, as it is too much of a workload. Template matching might, but the angles at which the shots will be taken won't be anywhere near perfect.
You could use the SSIM index. There is a Python implementation in the scikit-image package.
Your problem sounds more like an application of feature detection and matching. Given a shot captured by the robot, you extract features from it, and compare them against the list of features you have in your database (each image having a lot of features). You might want to look at SURF, or some other descriptor that does your job. OpenCV has very well documented implementations for many variants. Feature matching would be the last stage where you actually make a decision about a match or a non-match.
Note that all of this is really heavy on processing, so forget real-time.
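For what it's worth, a sketch of that final match/non-match decision using ORB descriptors and Lowe's ratio test (the file path, ratio and database loop are stand-ins):

    import cv2

    orb = cv2.ORB_create(nfeatures=1000)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING)

    def match_score(query_des, db_des, ratio=0.75):
        """Count descriptor matches that survive Lowe's ratio test."""
        pairs = bf.knnMatch(query_des, db_des, k=2)
        return sum(1 for p in pairs
                   if len(p) == 2 and p[0].distance < ratio * p[1].distance)

    shot = cv2.imread("robot_shot.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder
    _, shot_des = orb.detectAndCompute(shot, None)
    # best = max(db, key=lambda painting: match_score(shot_des, painting.des))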
