I am trying to find the start and end times at which a person appears in a video.
My current approach is to find the person using face detection, and then track their face using dlib's object tracker (i.e. if the person turns around in the video, face detection alone can't tell me they are still there, so I need both detection and tracking techniques).
The problem is that the object tracker keeps tracking an object even across a camera shot cut or scene change.
So I tried to re-initialize the tracker at every shot, but detecting the shots is not so easy: even at very high sensitivity, ffmpeg and http://mklab.iti.gr/project/video-shot-segm don't return all of the shot cuts.
So it turns out that I need to compare the object's rectangle in the previous frame with the rectangle detected in the current frame.
Any idea of a function that can give me a "similarity score" between two rectangles in two frames?
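A standard measure for this (not from the original post, but the usual choice) is intersection over union (IoU): the area of overlap between the two rectangles divided by the area of their union. A minimal sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned (x1, y1, x2, y2) rectangles
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

IoU is 1.0 for identical rectangles and 0.0 for disjoint ones; a sudden drop toward 0 between consecutive frames (e.g. below a threshold like 0.3, which you would tune) is a strong hint of a shot cut or a lost track, at which point you can re-run face detection and re-initialize the tracker.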
I am trying to stitch together a moving car that is never completely visible horizontally (although it is completely visible vertically) in the camera viewport.
The camera is stationary, 1-2 meters from the moving object (similar to a gate setup), capturing a side view of the car. But since the camera is so close, it can only capture part of the car in each frame.
I tried stitching multiple frames using this tutorial, but that only works when the camera itself is rotated around its axis or moved. Otherwise, because the background is the same in every frame, the stitcher matches on background features and simply places the frames on top of each other (see the tutorial for reference).
Basically, what I'm trying to achieve is: given a video clip of a moving car (such that the complete car is never in one frame), build an image of the complete car by stitching the video frames.
Any algorithm, library or reference would be helpful.
For example:
Merging these two images to give a complete view of the car.
Thanks for your time!
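One possible direction (a sketch under stated assumptions, not a definitive answer): since the background is static, use background subtraction to restrict motion estimation to the car itself, estimate the per-frame horizontal shift with optical flow, and build a "slit-scan" mosaic by concatenating a center strip of each frame. All file names and thresholds below are placeholders:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("car_sideview.mp4")     # placeholder file name
bg_sub = cv2.createBackgroundSubtractorMOG2()  # needs a few warm-up frames in practice

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
prev_mask = bg_sub.apply(prev)
strips = []

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Restrict features to the foreground so the static background
    # cannot dominate the motion estimate
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01,
                                  minDistance=10, mask=prev_mask)
    if pts is not None:
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        good = status.ravel() == 1
        if good.any():
            # Median horizontal displacement of the car between the two frames
            dx = np.median((nxt - pts)[good][:, 0, 0])
            shift = int(round(abs(dx)))
            if shift > 0:
                cx = frame.shape[1] // 2
                # Collect the strip that just passed the center column
                # (assumes left-to-right motion; mirror for the other direction)
                strips.append(frame[:, cx:cx + shift])
    prev_gray, prev_mask = gray, bg_sub.apply(frame)

if strips:
    panorama = np.hstack(strips)
    cv2.imwrite("car_panorama.png", panorama)
```

The design choice here is to never match against the background at all: the shift comes only from features inside the foreground mask, which is exactly what defeats the "frames placed onto each other" failure mode described above.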
If you have a video file with a moving object, as well as a moving camera, is it possible to track the distance that the object moved between 10 or 20 frames? I'm using OpenCV to track the object and have no problem finding the distance it travels when the camera is stationary, but I can't seem to wrap my head around a non-stationary camera.
The feed is only a single 2D camera feed and no other tracking is being done.
The only thing I could think of is to grey out the frame horizontally from the lowest to the highest coordinate of the object within the frame, attempt to layer the frames together, and then measure the distance the object traveled, already knowing the x,y coordinates of the object in each frame. This doesn't seem like a very clean solution, so I'm curious whether there is anything else out there to solve this problem.
Assuming the object to be tracked does not cover the whole image and there is enough background visible, you could try to track the camera movement with visual odometry on the static background.
You can then track the relative motion of the object with respect to the camera as if the camera were static, and then transform the motion back to world coordinates using the known camera movement.
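A minimal sketch of that idea, assuming the background dominates the feature matches; `prev_frame`, `curr_frame`, `obj_prev`, and `obj_curr` (object centers from your tracker) are hypothetical inputs from your existing pipeline. It returns the object's displacement in pixels within the previous frame's coordinates; converting that to a metric distance still needs camera calibration and scene scale, as the answer implies:

```python
import cv2
import numpy as np

def compensated_displacement(prev_frame, curr_frame, obj_prev, obj_curr):
    # Match ORB features between the two frames; with RANSAC, the dominant
    # static background outvotes the moving object when fitting the homography
    orb = cv2.ORB_create(1000)
    g1 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    k1, d1 = orb.detectAndCompute(g1, None)
    k2, d2 = orb.detectAndCompute(g2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # Homography mapping current-frame coordinates back to the previous frame
    H, _ = cv2.findHomography(dst, src, cv2.RANSAC, 5.0)
    # Undo the camera motion for the object's current position...
    p = cv2.perspectiveTransform(np.float32([[obj_curr]]), H)[0][0]
    # ...then measure how far the object itself moved, in pixels
    return float(np.linalg.norm(p - np.float32(obj_prev)))
```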
I am building a type of "person counter" that is getting face images from live video footage.
If a new face is detected in some frame the program will count that face/person. I thus need a way to check if a particular face has already been detected.
I have tried using a training program to recognize a template image so as to avoid counting the same face multiple times, but with only one template the system was massively inaccurate, and it was also slightly too slow to run on every frame of the feed.
To better understand the process: at the beginning, when a face is detected, the frame is cropped and the (new) face is saved to a file location. Afterwards, faces detected in subsequent frames need to go through a process that checks whether a similar face has been detected before and exists in the database (if it does, the face shouldn't be added to the database).
One recipe to face (pun! ;) this could be, for every frame (a code sketch follows the list):
get all the faces in the frame (with OpenCV you can detect and crop them)
generate face embeddings for the collected faces (e.g. using a pre-trained model for the purpose <- most likely this is the pre-trained component you are looking for; it lets you "condense" each face image into a vector)
add all the so-obtained face embeddings to a list
Then, at some pre-defined time interval, run a clustering algorithm (see also Face clustering using the Chinese Whispers algorithm) on the list of face embeddings collected so far. This will group together faces belonging to the same person, and thus let you count the people appearing in the video.
Once the clusters are consolidated, you could prune some of the faces belonging to the same cluster/person (to save storage, if you want to).
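As a concrete (hypothetical) sketch of that recipe using dlib's pre-trained models — the two .dat files are the standard models downloadable from dlib.net, and `frames` stands in for your video's RGB frames:

```python
import dlib

detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_5_face_landmarks.dat")
face_rec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

embeddings = []
for frame in frames:                      # `frames`: iterable of RGB images (assumed)
    for det in detector(frame, 1):        # detect all faces in the frame
        shape = shape_predictor(frame, det)
        # 128-D embedding that "condenses" the face into a vector
        embeddings.append(face_rec.compute_face_descriptor(frame, shape))

# Cluster the embeddings; ~0.5 is a typical distance threshold for this model
labels = dlib.chinese_whispers_clustering(embeddings, 0.5)
print("Distinct people so far:", len(set(labels)))
```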
I am building a system which detects coins that are picked up from a tray. This tray will be kept in a public place. People will pick up one or more coins, but would be expected to keep them back after some time.
I would have a live stream through a webcam placed at the top. I will have a calibration step, say at the beginning of the day, that captures the initial state of the tray to be used for comparing with the live feed. A few slots might be empty to begin with, as you can see in the sample image.
I need to detect slots that had a coin initially but are missing it at any given point of time during the day.
I am trying out a few approaches using OpenCV:
SSIM difference: I can use SSIM to find the difference between my live image frame and the initial state (a minimal sketch of this follows the list). However, a number of slots are larger than the corresponding coin sizes (e.g. the top two rows). This means that if a coin was originally placed at the center but was later put back touching one of the edges, we may get a false positive.
Blob detection: Alternatively, I can pre-feed (or detect) slot coordinates and then run blob detection within every slot. If a blob was present in the original state but is missing in a camera frame, that would mean a coin has been picked up. However, accurate blob detection could be a challenge if the contrast between the coin and the tray is low.
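For reference, a minimal sketch of the SSIM-difference idea using scikit-image (file names are placeholders). The caveat above still applies: a coin merely shifted inside an oversized slot will also show up in the diff:

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity

base = cv2.imread("tray_initial.png", cv2.IMREAD_GRAYSCALE)  # calibration shot
live = cv2.imread("tray_live.png", cv2.IMREAD_GRAYSCALE)     # current frame

score, diff = structural_similarity(base, live, full=True)
# Low local SSIM = large change; rescale so changed areas become bright
diff = np.clip((1.0 - diff) * 255, 0, 255).astype("uint8")

_, mask = cv2.threshold(diff, 64, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
changed = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]
print("global SSIM:", round(score, 3), "candidate changed regions:", changed)
```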
I might also need to watch out for slight variations in lighting due to shadows of people moving around.
Any thoughts on these or any pointers on alternate approaches that can be tried out? Is there any analogous implementation that I can learn from?
Many thanks in advance.
Edit: Thanks to @I.Newton's suggestion. For those who stumble upon this question and would benefit from a sample implementation, look here: https://github.com/kewats/computer-vision-samples/tree/master/image-processing/missing-coins-detection
If you have complete control over the lighting conditions, you can use simple color thresholding to solve the problem.
First, make a mask for the boxes. You can do this in multiple ways: by color threshold, adaptive threshold, Canny edges, etc. I did it by color threshold.
Then make a mask for the coins by the same method.
Now flood fill your box mask from the center of each of these coins. It will retain only those boxes which do not contain a coin.
Now you can compare this with your initial mask to figure out whether all the coins are present.
This does not involve frame subtraction, so you need not worry about the coin sitting at a different position within its box. The only thing you need to ensure is consistent lighting conditions when making the masks. If you also want to make sure the coins are returned to the same box, you should go for template matching or similar, which again takes more effort.
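A rough sketch of those steps (the HSV ranges are placeholders that have to be tuned to the actual tray and coins; a morphological close is added so each slot becomes one solid blob that a seed point at a coin's center can erase):

```python
import cv2
import numpy as np

frame = cv2.imread("tray_live.png")                          # placeholder file
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# Step 1: color-threshold masks for the boxes/slots and for the coins
box_mask = cv2.inRange(hsv, (0, 0, 40), (180, 60, 160))      # placeholder range
coin_mask = cv2.inRange(hsv, (15, 80, 80), (35, 255, 255))   # placeholder range
# Close the holes the coins punch into the slot mask, so each slot is solid
box_mask = cv2.morphologyEx(box_mask, cv2.MORPH_CLOSE, np.ones((15, 15), np.uint8))

# Step 2: flood-fill the box mask from each coin's center; occupied slots
# get erased, so whatever stays white is a slot without a coin
filled = box_mask.copy()
ff_mask = np.zeros((filled.shape[0] + 2, filled.shape[1] + 2), np.uint8)
contours, _ = cv2.findContours(coin_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    m = cv2.moments(c)
    if m["m00"] > 0:
        cx, cy = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
        if filled[cy, cx] > 0:
            cv2.floodFill(filled, ff_mask, (cx, cy), 0)

# Step 3: any sizeable white blob left is an empty slot; compare against the
# calibration-time mask to flag coins that have gone missing
empties, _ = cv2.findContours(filled, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
print(len(empties), "slot regions currently without a coin")
```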
I'm trying to build a Python program to count the number of people crossing the road in two directions. The video file is something like this
For the detection phase I'm using BackgroundSubtractorMOG() to detect people. The problem now is that I want to identify each object separately and track its movement across consecutive frames.
I'm thinking of using MeanShift for that purpose, but I don't understand how to hand an object over to the tracking phase, or how to initialize the tracking window. As it stands, I end up detecting the objects independently in each frame.
I want to know how to tell whether an object has already been detected in a previous frame.
Provide some of your code here for reference.
Also, instead of plain per-frame object detection, try object tracking with the detection algorithm re-run periodically at some interval. This might solve your issue of matching previously detected objects.
Available tracking algorithms include Boosting, MIL, KCF, and TLD.
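A sketch of that detect-then-track loop with KCF (requires opencv-contrib-python; on OpenCV 4.5+ the constructor may live under cv2.legacy). MOG2 stands in for BackgroundSubtractorMOG here, and the file name, re-detection interval, and blob-area threshold are placeholders. Note that naively resetting all trackers at each re-detection would double count people; for an actual counter you still need to match new detections to existing tracks (e.g. by rectangle overlap):

```python
import cv2

cap = cv2.VideoCapture("crossing.mp4")            # placeholder file name
bg_sub = cv2.createBackgroundSubtractorMOG2()
trackers = []
REDETECT_EVERY = 30                               # re-run detection every N frames

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = bg_sub.apply(frame)                      # keep the background model updated
    if frame_idx % REDETECT_EVERY == 0:
        # Detection phase: blobs in the foreground mask seed one tracker each
        fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, None)
        contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        trackers = []
        for c in contours:
            if cv2.contourArea(c) > 500:          # skip small noise blobs
                t = cv2.TrackerKCF_create()       # cv2.legacy.TrackerKCF_create() on newer builds
                t.init(frame, cv2.boundingRect(c))
                trackers.append(t)
    else:
        # Tracking phase: each tracker follows "its" person between detections
        for t in trackers:
            ok_t, box = t.update(frame)
            if ok_t:
                x, y, w, h = map(int, box)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:                      # Esc quits
        break
    frame_idx += 1
```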