I'm working on a Raspberry Pi, an embedded Linux platform running Raspbian Jessie with Python 2.7 already installed. I have OpenCV algorithms that must run in real time and apply several Haar cascade classifiers to the same image. Is there any method to reduce the time these operations take, such as multithreading?
I have also heard about GPU computation, but I don't know where to start.
Thank you for the help.
If you haven't already done so, you should consider the following:
Reduce the image size to the minimum required for recognizing the target object of each classifier. If different objects require different resolutions, you can even use a set of copies of the original image at different sizes.
Identify search regions for each classifier and thereby reduce the search area. For example, if you are searching for face landmarks, you can define search regions for the left eye, right eye, nose, and mouth after running the face detector and finding the rectangle that contains the face.
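For example, a minimal sketch combining both ideas (the cascade file names, the 0.5 scale factor, and the input image are placeholders you would adapt to your own classifiers):

import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

img = cv2.imread('frame.jpg')

# 1) Downscale once: Haar detection cost grows with image area.
small = cv2.resize(img, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)

faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)

# 2) Run the secondary classifier only inside each detected face rectangle.
for (x, y, w, h) in faces:
    roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(roi)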
I am not sure manual optimization will help much, because OpenCV already includes hardware-level optimizations.
OpenCV with OpenCL for parallel processing may be of use to you, since the Raspberry Pi 3 is quad-core. I do not think the GPUs that come with these boards are powerful enough. You could also look at Qualcomm's DSPs for computer vision and neural networks, or Nvidia's Tegra GPUs.
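If you want to experiment with OpenCL, OpenCV exposes it through the transparent API (T-API): wrap images in cv2.UMat and supported operations are dispatched to OpenCL when a driver is available. A minimal sketch (note the stock Raspberry Pi GPU usually has no usable OpenCL driver, in which case this silently falls back to the CPU):

import cv2

cv2.ocl.setUseOpenCL(True)
print('OpenCL available:', cv2.ocl.haveOpenCL())

img = cv2.imread('frame.jpg')
u = cv2.UMat(img)                        # UMat routes work through the T-API
gray = cv2.cvtColor(u, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
result = blurred.get()                   # copy back to a regular numpy array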
I'm trying to learn computer vision, and more specifically OpenCV in Python.
I want to make a program that tracks my barbell in a video and shows me its path. (I know apps like this exist, but I want to make it myself.) I tried using Canny edge detection and the HoughCircles function, but I seem to get everything but a good result.
I have been using this code to find the edges of my image:
import cv2 as cv

# Grayscale, light blur, then edge extraction.
gray = cv.cvtColor(src=img, code=cv.COLOR_BGR2GRAY)
blur = cv.blur(gray, (2, 2))
canny = cv.Canny(blur, 60, 60)
And then this code to find the circle:
# dp=2 halves the accumulator resolution; minDist=1000 allows only one circle.
circles = cv.HoughCircles(canny, cv.HOUGH_GRADIENT, dp=2, minDist=1000,
                          circles=None, maxRadius=50)
This is the result:
[Result image: left = original image with detected circle; right = Canny edge image]
Is this the right way to go or should I use another method?
Training a YOLO model to detect the barbell will work better than anything you have tried with OpenCV so far. You need at least 500 images, which can be found on the internet easily. This tutorial is a good kick-start for YOLO. Give it a try.
If you tweak the parameters of HoughCircles, it may recognize the barbell [EDIT: but only with more preprocessing such as gamma correction and blurring, so better not]. However, OpenCV has many algorithms for this kind of object tracking; you only have to specify a region of the image first (if that's acceptable).
In your case the object is always visible and does not change much, so I would guess many of the available algorithms would work fine.
OpenCV has a built-in function for selection:
# Let the user draw a bounding box around the object in the first frame.
initBB = cv2.selectROI("Frame", frame, fromCenter=False, showCrosshair=True)
See this tutorial for tracking: https://www.pyimagesearch.com/2018/07/30/opencv-object-tracking/
The relevant part of the author's summary is:
CSRT Tracker: Discriminative Correlation Filter (with Channel and Spatial Reliability). Tends to be more accurate than KCF but slightly slower. (minimum OpenCV 3.4.2)
Use CSRT when you need higher object tracking accuracy and can tolerate slower FPS throughput
I guess accuracy is what you want if this is for offline use.
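For reference, a rough sketch of the selectROI + CSRT loop described in that tutorial (the video path is a placeholder, and depending on your OpenCV build the constructor may live under cv2.legacy.TrackerCSRT_create):

import cv2

cap = cv2.VideoCapture('barbell.mp4')    # placeholder video path
ok, frame = cap.read()

# Draw a box around the barbell plate once, then let CSRT follow it.
initBB = cv2.selectROI('Frame', frame, fromCenter=False, showCrosshair=True)
tracker = cv2.TrackerCSRT_create()
tracker.init(frame, initBB)

path = []                                 # bar center per frame
while True:
    ok, frame = cap.read()
    if not ok:
        break
    success, box = tracker.update(frame)
    if success:
        x, y, w, h = [int(v) for v in box]
        path.append((x + w // 2, y + h // 2))
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    for p in path:                        # draw the accumulated bar path
        cv2.circle(frame, p, 2, (0, 0, 255), -1)
    cv2.imshow('Frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break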
Can you share a sample video?
What's your problem exactly? Why do you track the barbell? Do you need semantic segmentation or normal detection? These are important questions. Canny is a very basic approach; it needs a very stable background to work. That's why deep learning exists to handle this kind of problem. If we are talking about deep learning, you can use Mask R-CNN, YOLOv4, etc.; there are many available solutions out there.
I am trying to create an application that is able to detect and track the iris of an eye in a live video stream. In order to do that, I want to use Python and OpenCV. While researching for this on the internet, it seemed to me that there are multiple possible ways to do that.
First Way:
Run a Canny filter to get the edges, and then use HoughCircles to find the iris.
Second Way:
Use Otsu's algorithm to find the ideal threshold, and then use cv2.findContours() to find the iris.
Since I want this to run on a Raspberry Pi (4B), my question is which of these methods is better, especially in terms of reliability and performance?
I would take a third path and start from a well-established method for facial landmark detection (e.g. dlib). You can use a pre-trained model to get a reliable estimate of the position of the eye.
This is an example output from a facial landmark detector: [example image]
Then you can go on from there to find the iris, using edge detection, Hough transforms, or whatever.
You can probably get away with a simple heuristic, since you can assume the iris is always at the center of mass of the keypoints around each eye.
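A minimal sketch of that heuristic, assuming dlib and its standard 68-point landmark model (shape_predictor_68_face_landmarks.dat, downloadable from the dlib site) are available:

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

img = cv2.imread('face.jpg')              # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for face in detector(gray):
    shape = predictor(gray, face)
    # In the 68-point scheme, points 36-41 outline one eye and 42-47 the other.
    for start, end in ((36, 42), (42, 48)):
        pts = np.array([(shape.part(i).x, shape.part(i).y)
                        for i in range(start, end)])
        cx, cy = pts.mean(axis=0).astype(int)   # center of mass ~ iris position
        cv2.circle(img, (int(cx), int(cy)), 2, (0, 255, 0), -1)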
There are also some good tutorials online in a similar setting (even for the Raspberry Pi), for example this one or this other one from PyImageSearch.
Hope you are doing well.
I am trying to build a following robot which follows a person.
I have a Raspberry Pi and a calibrated stereo camera setup. Using the camera setup, I can find the depth value of any pixel with respect to the camera's reference frame.
My plan is to use the camera feed to detect the person, then use the stereo setup to find the average depth value over the detection (and thus the distance), calculate the person's position with respect to the camera from that, and run my robot's motors accordingly using PID control.
I now have the robot running, with person detection using the HOG descriptor that comes with OpenCV. But the problem is that even with non-maximum suppression, the detector is not stable enough to use on a robot: there are too many false positives, and tracking is lost pretty often.
So my question is: can you suggest a good way to track only people? Maybe a light neural network of some sort, as I plan to run it on a Raspberry Pi 3B+.
I am using an Intel D435 as my depth camera.
TIA
The Raspberry Pi does not have the computational capacity to perform object detection and support the RealSense driver at the same time; check the processor load once you start the RealSense application. One of the simplest models for person detection is OpenCV's HOG descriptor, which you have already used.
You can use a pretrained model. Nowadays there are plenty to choose from, including lighter versions for mobile devices. Check this blog post. It's also worth checking TensorFlow Lite. Some architectures will give you bounding boxes, some masks. I guess you'd be more interested in masks.
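As one concrete example of a lightweight pretrained detector (bounding boxes rather than masks), you can run MobileNet-SSD through OpenCV's DNN module; the model file names below are placeholders for wherever you downloaded the Caffe prototxt/caffemodel pair:

import cv2

net = cv2.dnn.readNetFromCaffe('MobileNetSSD_deploy.prototxt',
                               'MobileNetSSD_deploy.caffemodel')
PERSON_CLASS_ID = 15                       # 'person' in the VOC label set

frame = cv2.imread('frame.jpg')            # placeholder input frame
h, w = frame.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                             0.007843, (300, 300), 127.5)
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    class_id = int(detections[0, 0, i, 1])
    if class_id == PERSON_CLASS_ID and confidence > 0.5:
        box = detections[0, 0, i, 3:7] * [w, h, w, h]
        x1, y1, x2, y2 = box.astype(int)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)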
I'm trying to extract the black region from an image using TensorFlow. Up to now I have been using OpenCV, but it fails to get the whole region because the grayscale values are very complicated.
The image I'm using is a photo of an electric meter; the whole meter is (normally) white except for the part with the numbers, which is black. I want to isolate this part in order to read the numbers later on.
To get the data, I have been using OpenCV's findContours function with a fixed threshold.
I have seen that TensorFlow is very powerful, so I thought this would not be a problem, but I can't find any documentation. Any hints? Thanks!
TensorFlow is a general-purpose math library that is unique in two respects:
It provides automatic differentiation.
It has efficient kernels built to run on either the CPU or GPU.
It does have a library of image functions, but it's nowhere near as extensive as OpenCV, and will never be. Those are mostly for data augmentation (as it pertains to ML) and data loading.
Note that you can run OpenCV code on the GPU in many cases (I'm not sure about findContours in particular), so sticking with OpenCV should be considered.
Within TensorFlow, however, you would have to re-write that function yourself. Looking at the code (which I provided a link to in your question), it doesn't look very hard to do: you could replicate it with symbolic TensorFlow operations in relatively short order. But nothing like that exists pre-built in TensorFlow, nor is it likely to in the future.
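If you do stick with OpenCV, here is a rough sketch of isolating the dark number region on a mostly white meter, using Otsu's method instead of a hand-tuned threshold (the file name is a placeholder):

import cv2

img = cv2.imread('meter.jpg')             # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Invert so the black region becomes foreground; Otsu picks the threshold.
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# OpenCV 4.x returns (contours, hierarchy); 3.x returns an extra image first.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    biggest = max(contours, key=cv2.contourArea)   # assume numbers = largest dark blob
    x, y, w, h = cv2.boundingRect(biggest)
    digits = img[y:y + h, x:x + w]                 # crop for later digit reading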
I have a decent amount of experience with OpenCV and am currently familiarizing myself with stereo vision. I happen to have two JeVois cameras (don't ask why) and was wondering if it is possible to run some sort of code on each camera to distribute the workload and cut down on processing time. It needs to work so that each camera does part of the overall process (without the cameras needing to talk to each other) and the computer they're connected to receives that information and handles the rest of the work. If this is possible, does anyone have any solutions or tips? Thanks in advance!
To generalize the stereo-vision pipeline (look here for a more in-depth treatment):
1. Find the intrinsic/extrinsic values of each camera (good illustration here)
2. Solve for the transformation that will rectify your cameras' images (good illustration here)
3. Capture a pair of images
4. Transform the images according to Step 2
5. Perform stereo correspondence on that pair of rectified images
If we can assume that your cameras are going to remain perfectly stationary (relative to each other), you'll only need to perform Steps 1 and 2 one time after camera installation.
That leaves you with image capture (duh) and the image rectification as general stereo-vision tasks that can be done without the two cameras communicating.
Additionally, there are some pre-processing techniques (you could try this and this) that have been shown to improve the accuracy of some stereo-correspondence algorithms. These could also be done on each of your image-capture platforms individually.
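As a sketch of how Step 4 could run independently on each platform, assume each camera loads its own rectification maps, precomputed offline in Steps 1 and 2 and stored in a placeholder file:

import cv2
import numpy as np

# Maps produced once by cv2.initUndistortRectifyMap after calibration.
maps = np.load('rectify_maps_cam0.npz')
map1, map2 = maps['map1'], maps['map2']

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rectified = cv2.remap(frame, map1, map2, interpolation=cv2.INTER_LINEAR)
    # send `rectified` to the host, which performs stereo correspondence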