I am building a "person counter" that extracts face images from live video footage.
If a new face is detected in a frame, the program counts that face/person. I therefore need a way to check whether a particular face has already been detected.
I tried training a recognizer on a single template image to avoid counting the same face multiple times, but with only one template the system was massively inaccurate, and it was also slightly too slow to run on every frame of the feed.
To better understand the process: at the beginning, when a face is detected, the frame is cropped and the (new) face is saved to a file location. Faces detected in subsequent frames then need to go through a check for whether a similar face has been seen before and already exists in the database (if it does, it shouldn't be added again).
One recipe to face (pun! ;) this could be, for every frame:
detect all the faces in the frame (with OpenCV you can detect and crop them)
generate face embeddings for the collected faces (e.g. using a pre-trained model built for the purpose <- most likely this is the pre-trained component you are looking for; it lets you "condense" each face image into a vector)
add all the so-obtained face embeddings to a list
At some pre-defined time interval, run a clustering algorithm (see also "Face clustering using Chinese Whispers algorithm") on the list of face embeddings collected so far. This will let you group together faces belonging to the same person, and thus count the people appearing in the video.
Once the clusters are consolidated, you could prune some of the faces belonging to the same cluster/person (to save storage, if you want to).
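The recipe above can be sketched in a few lines. This toy uses random vectors as stand-ins for real face embeddings (which would come from dlib, FaceNet, or similar) and a simple greedy distance-threshold clustering in place of Chinese Whispers; the 0.6 threshold and 128-dimensional vectors are illustrative assumptions, not tuned values:

```python
import numpy as np

def count_people(embeddings, threshold=0.6):
    """Greedy clustering: assign each embedding to the first cluster
    whose centroid is within `threshold`, else start a new cluster.
    A stand-in for Chinese Whispers; real face embeddings would
    replace the toy vectors below."""
    centroids, counts = [], []
    for e in embeddings:
        placed = False
        for i, c in enumerate(centroids):
            if np.linalg.norm(e - c) < threshold:
                # Update the running centroid of cluster i.
                centroids[i] = (c * counts[i] + e) / (counts[i] + 1)
                counts[i] += 1
                placed = True
                break
        if not placed:
            centroids.append(e.copy())
            counts.append(1)
    return len(centroids)

# Toy data: two "people", three noisy observations each.
rng = np.random.default_rng(0)
person_a = rng.normal(0.0, 0.02, size=(3, 128))
person_b = rng.normal(1.0, 0.02, size=(3, 128))
embeddings = np.vstack([person_a, person_b])
print(count_people(embeddings))  # → 2
```

With real embeddings you would tune the threshold on labeled pairs, since it controls the trade-off between merging different people and splitting one person into several clusters.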
I am running a Haar cascade on MicroPython to detect license plates in real time. The image then gets saved and passed on to the next function, which pre-processes it (dilation, erosion, binarization) before segmenting the individual characters - 7 of them - and saving these detected characters. (These will eventually be passed to a character recognition model.)
It works well on a single image, but when it's running in real time, how would one control the segmentation threshold? The problem I'm facing is that the live video stream captures and continually overwrites the images. The segmentation is hit-and-miss, so sometimes I detect only 3 characters, other times I detect 8, of which 3 are repeated.
Currently, off the top of my head, I'm thinking of more straightforward rectangle detection - instead of Haar - with a high threshold, so that the image obtained is as clear as possible before pre-processing; this would perhaps help the segmentation. Is there any other way to improve segmentation, like aiming for exactly 7 characters or ensuring they're properly detected?
My current set-up detects 'contours' (blobs in my case, as it's MicroPython on an embedded device) and I then filter them by height > width, so it works well enough on still images; it's just that in a live video stream the captured quality and detection cannot be guaranteed.
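One way to make hit-and-miss segmentation more robust on a live stream is a temporal-consistency check: only accept a frame's character boxes when the expected 7 show up in several consecutive frames, since spurious frames rarely agree. A sketch under that assumption (the box format and the `stable_segmentation` helper are hypothetical, not from any library; on-device the boxes would come from your blob/contour step):

```python
def stable_segmentation(frame_boxes_stream, expected=7, agree_frames=3):
    """Accept a frame's character boxes only when exactly `expected`
    boxes were found in `agree_frames` consecutive frames.
    `frame_boxes_stream` yields one list of (x, y, w, h) boxes per frame."""
    streak = 0
    for boxes in frame_boxes_stream:
        if len(boxes) == expected:
            streak += 1
            if streak >= agree_frames:
                # Sort left-to-right so character order is stable.
                return sorted(boxes, key=lambda b: b[0])
        else:
            streak = 0  # a miss resets the run
    return None

# Toy stream: two noisy frames (3 and 8 boxes), then clean 7-box frames.
good = [(i * 10, 0, 8, 12) for i in range(7)]
stream = [good[:3], good + good[:1], good, good, good]
print(stable_segmentation(stream))
```

This trades a small amount of latency (a few frames) for much more reliable character counts, and it is cheap enough for an embedded device.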
I am trying to detect plants in photos. I've already labeled the photos containing plants (with labelImg), but I don't understand how to train the model with background-only photos, so that when there is no plant the model can tell me so.
Do I need to set the labeled box to the size of the whole image?
p.s. new to ml so don't be rude, please)
I recently had a problem where all my training images were zoomed in on the object. This meant that the training images all had very little background information. Since object detection models use space outside bounding boxes as negative examples of these objects, this meant that the model had no background knowledge. So the model knew what objects were, but didn't know what they were not.
So I disagree with @Rika, since sometimes background images are useful. In my case, introducing background images worked.
As I already said, object detection models use non-labeled space in an image as negative examples of a certain object. So you have to save annotation files without bounding boxes for background images. In the software you use here (labelImg), you can use "Verify Image" to save the annotation file of an image without boxes. That file says the image should be included in training but contains no bounding-box information, and the model uses it as negative examples.
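To illustrate what such a "boxless" annotation looks like, here is a sketch that writes a minimal Pascal VOC file with zero <object> entries (assuming your training pipeline reads VOC XML, as labelImg produces; the filename, image size, and helper name are placeholders):

```python
import xml.etree.ElementTree as ET

def write_background_annotation(image_name, width, height, out_path):
    """Write a Pascal VOC annotation with zero <object> entries,
    so the image is included in training purely as negatives."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = image_name
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    # No <object> elements: the whole image is background.
    ET.ElementTree(root).write(out_path)

write_background_annotation("background_001.jpg", 640, 480, "background_001.xml")
print(open("background_001.xml").read())
```

Whether the empty annotation alone is enough depends on your training framework; some (e.g. certain YOLO pipelines) instead expect an empty label file next to the image.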
In your case, you don't need to do anything special in that regard. Just grab the detection data you created and train your network with it. When it comes to testing, you usually set a confidence threshold for the bounding boxes, because you may get lots of them and you only want the ones with the highest confidence.
Then you get/show the ones with the highest confidence and there you go: you have your detection result, and you can do whatever you want with it, like cropping the detections using the bounding-box coordinates you get.
If there are no plants, your network will likely produce bounding boxes with confidence below your threshold (very low confidence), and you just ignore them.
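As a minimal sketch of that thresholding step, assuming the network returns (box, confidence) pairs (the exact output format depends on your framework):

```python
def filter_detections(detections, min_confidence=0.5):
    """Keep only the boxes the model is reasonably sure about.
    `detections` is a list of (box, confidence) pairs, where box
    is (x, y, w, h)."""
    return [(box, conf) for box, conf in detections if conf >= min_confidence]

dets = [((10, 10, 50, 80), 0.92),   # strong detection: a plant
        ((200, 40, 45, 70), 0.31),  # weak: likely background
        ((90, 15, 52, 85), 0.08)]   # very weak: ignore
print(filter_detections(dets))  # → [((10, 10, 50, 80), 0.92)]
```

If the filtered list comes back empty, you can treat the image as containing no plants.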
I am working on a face detection and recognition app in Python using TensorFlow and OpenCV. The overall flow is as follows:
while True:
    # 1) Read one frame with OpenCV: ret, frame = video_capture.read()
    # 2) Detect faces in the current frame with TensorFlow (using mxnet_mtcnn_face_detection)
    # 3) For each detected face, run the FaceNet algorithm (TensorFlow) and compare against my database to find the name of the detected face/person
    # 4) Display a box around each face with the person's name, using OpenCV
Now, my issue is that the overhead (runtime) of face detection and recognition is very high, so the output video sometimes looks like slow motion! I tried tracking methods (e.g., MIL, KCF), but then I cannot detect new faces entering the frame! Is there any approach to speed this up - at least to skip the face recognition step for faces that were already recognized in previous frames?
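A common mitigation for this kind of loop is to run the expensive detection + recognition stage only every N frames and let a cheap tracker carry the boxes in between, so new faces are still picked up at the next detection frame. A sketch of that pattern with stubbed-out stages (the `detect`/`track` callables are hypothetical placeholders for the MTCNN/FaceNet and MIL/KCF steps):

```python
def process_video(frames, detect, track, detect_every=10):
    """Run the expensive detector only every `detect_every` frames
    and a cheap tracker in between. `detect(frame)` returns face
    boxes; `track(frame, boxes)` updates existing boxes cheaply."""
    boxes, results = [], []
    for i, frame in enumerate(frames):
        if i % detect_every == 0:
            boxes = detect(frame)        # expensive: detection + recognition
        else:
            boxes = track(frame, boxes)  # cheap: update existing boxes
        results.append(boxes)
    return results

# Toy run: count how often the expensive path fires over 30 frames.
calls = {"detect": 0}
def fake_detect(frame):
    calls["detect"] += 1
    return ["face"]
def fake_track(frame, boxes):
    return boxes

results = process_video(range(30), fake_detect, fake_track, detect_every=10)
print(calls["detect"])  # → 3
```

You can additionally cache the recognized identity per tracked box, so FaceNet only runs on boxes the tracker has not already labeled.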
I am trying to find the start and end times at which a person appears in a video.
My current approach is to find the person using face detection and then track his face using dlib's object tracking (if the person turns around in the video, I can't tell that he is still there using face recognition alone, so I need both detection and tracking techniques).
The problem is that the object tracker keeps tracking an object even after a camera shot cut or a scene change.
So I tried to re-initialize the tracker on every shot. But detecting the shots is not so easy: even with very high sensitivity, ffmpeg and http://mklab.iti.gr/project/video-shot-segm do not return all of the shot cuts.
So it turns out that I need to compare the object rectangle from the previous frame with the rectangle detected in the current frame.
Any idea of a function that can give me a "similarity score" between two rectangles in two frames?
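One standard similarity score for two rectangles is intersection over union (IoU): it is 1.0 when the boxes coincide and falls toward 0.0 as they diverge, so a sudden drop between consecutive frames is a reasonable signal of a shot cut or a lost track. A minimal sketch, assuming (x, y, w, h) rectangles:

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) rectangles."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))  # overlap width
    ih = max(0, min(ay2, by2) - max(ay1, by1))  # overlap height
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # → 1.0  (identical)
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # → ~0.333 (half overlap)
print(iou((0, 0, 10, 10), (20, 20, 5, 5)))  # → 0.0  (disjoint)
```

In practice you would pick a threshold (e.g. re-initialize the tracker whenever the frame-to-frame IoU drops below ~0.3); the exact value depends on how fast your subjects move.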
I am trying to detect a vehicle in an image (actually in a sequence of frames from a video). I am new to OpenCV and Python, and I work under Windows 7.
Is there a way to get the horizontal edges and vertical edges of an image and then sum the resulting images into respective vectors?
Is there Python code or a function available for this?
I looked at this and this but could not get a clue how to do it.
You may use the following image for illustration.
EDIT
I was inspired by the idea presented in the following paper (sorry if you do not have access):
Betke, M.; Haritaoglu, E. & Davis, L. S. (2000). Real-time multiple vehicle detection and tracking from a moving vehicle. Machine Vision and Applications, 12, 69-83. Springer-Verlag.
I would take a look at the squares example for OpenCV, posted here. It uses Canny edge detection and then a contour find to return the sides of each square. You should be able to modify this code to get the horizontal and vertical lines you are looking for. Here is a link to the documentation for the Python call of Canny; it is rather helpful for all-around edge detection. In about an hour I can get home and give you a working example of what you are wanting.
Do some reading on Sobel filters.
http://en.wikipedia.org/wiki/Sobel_operator
You can basically get vertical and horizontal gradients at each pixel.
Here is the OpenCV function for it.
http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=sobel#sobel
Once you have these filtered images, you can collect statistics column- and row-wise to decide whether there is an edge and where it is located.
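To sketch the column/row-statistics idea in a self-contained way, here is a numpy-only toy (a hand-rolled "valid" convolution stands in for cv2.Sobel, and the 8x8 step image is illustrative):

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])  # responds to vertical edges
SOBEL_Y = SOBEL_X.T                                        # responds to horizontal edges

def conv2d_valid(img, kernel):
    """Tiny 'valid' 2-D filter, enough for a 3x3 kernel demo.
    (With OpenCV you would call cv2.Sobel instead.)"""
    h, w = img.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image with one vertical edge down the middle.
img = np.zeros((8, 8))
img[:, 4:] = 255.0
gx = np.abs(conv2d_valid(img, SOBEL_X))  # strong response at the vertical edge
gy = np.abs(conv2d_valid(img, SOBEL_Y))  # ~0: there are no horizontal edges here
col_profile = gx.sum(axis=0)             # column-wise sum: the peak sits at the edge
print(int(np.argmax(col_profile)), gy.max())  # → 2 0.0
```

Summing the gradient magnitudes per column (or per row) gives exactly the "vector" profile you describe: peaks in those profiles mark candidate edge locations.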
Typically, geometric approaches to object detection are not hugely successful, as the appearance model you assume can quite easily be violated by occlusion, noise, or orientation changes.
Machine learning approaches typically work much better in my opinion and would probably provide a more robust solution to your problem. Since you appear to be working with OpenCV, you could take a look at Cascade Classifiers, for which OpenCV provides both Haar wavelet and local binary pattern feature based classifiers.
The link I have provided is to a tutorial with very complete steps explaining how to create a classifier with several prewritten utilities. Basically, you will create a directory with 'positive' images of cars and a directory with 'negative' images of typical backgrounds. A utility, opencv_createsamples, can be used to create training images warped to simulate different orientations and average intensities from a small set of images. You then use the utility opencv_traincascade, setting a few command-line parameters to select different training options, and it outputs a trained classifier for you.
Detection can be performed using either the C++ or the Python interface with this trained classifier.
For instance, using Python you can load the classifier and perform detection on an image, getting back a selection of bounding rectangles:
import cv2

image = cv2.imread('path/to/image')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # the classifier expects grayscale
cc = cv2.CascadeClassifier('path/to/classifierfile')
objs = cc.detectMultiScale(gray)  # array of (x, y, w, h) rectangles