I am new to using Python and object detection, and I need some help for a uni project!
1- I am wondering whether it is possible to use the same Haar cascade file I used to train the Pi to detect a specific object in a still image (using the picamera) for detecting the same object in a live video feed, since my uni project requires the robot to search for the specific object, find it, and retrieve it. Or is that a completely separate process (i.e. setting up classes etc.)? If so, where could I find the most helpful information on doing this? I haven't been too successful when looking online. (PyImageSearch has a blog post on this, but it goes about it using classes, and I'm not sure how to create a class just for a specific object, or whether I even need one.)
2- Currently, my Pi can sort of detect the object (a specific cube) in still images, but it isn't very accurate or consistent. It often detects around the edges of the object, and it also incorrectly detects other things in the background, or shadows in the image, as the object. It tends to find more than one cube (lots of small rectangles, mostly in close proximity to and on the object), so it reports 2+ cubes in an image (sometimes 15-20 or more) rather than just the one. I am wondering how I could reduce this error and improve the accuracy and consistency so it detects just the one cube, or at least doesn't wrongly flag background shadows or other things in the image.

I understand that lighting affects the result, but I am wondering whether the problem is the quality of the original picamera photo of the cube I used to train the Haar cascade (it was quite dark due to insufficient lighting), or the size of the image itself (I cropped it down to the edges so it is just the cube and resized it to 50x50), or the fact that I only trained a single image of the object against the negatives when training the cascade file. Do the images you supply for training have to be taken with the picamera, or could you take clearer pictures, say with your phone, and use them to train the cascade for detection via the picamera? I tried increasing the resolution in the code, but that made the analysis take too long and didn't make much difference.

Apologies for the long post; I am new to all this and wondering if there is any way to improve the results, or whether the only way to get higher accuracy is to retrain the cascade, which I'd rather not do as it took two days to complete on a Pi Zero W board!
Much Appreciated!
My specs: a Raspberry Pi Zero W board with a 16GB SD card, on Mac, running OpenCV 3.2 and Python 2.7.
Code for object detection in an image taken using the Pi camera:
import io
import picamera
import cv2
import numpy as np
# Capture one still frame from the Pi camera into an in-memory stream
stream = io.BytesIO()
with picamera.PiCamera() as camera:
    camera.resolution = (320, 240)
    camera.capture(stream, format='jpeg')

# Decode the JPEG data into an OpenCV image (numpy is imported as np)
buff = np.fromstring(stream.getvalue(), dtype=np.uint8)
image = cv2.imdecode(buff, 1)

# Load the trained cascade and run detection on the grayscale image
cube_cascade = cv2.CascadeClassifier('/home/pi/data/cube1-cascade-10stages.xml')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Cube = cube_cascade.detectMultiScale(gray, 1.1, 5)
print "Found " + str(len(Cube)) + " cube(s)"

# Draw a rectangle around each detection and save the result
for (x, y, w, h) in Cube:
    cv2.rectangle(image, (x, y), (x+w, y+h), (255, 255, 0), 2)
cv2.imwrite('result.jpg', image)
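In case it's useful context for part 1, this is roughly how I imagine reusing the same cascade on the live feed (an untested sketch based on picamera's capture_continuous; I haven't got this working yet):

import picamera
import picamera.array
import cv2

cube_cascade = cv2.CascadeClassifier('/home/pi/data/cube1-cascade-10stages.xml')

with picamera.PiCamera() as camera:
    camera.resolution = (320, 240)
    with picamera.array.PiRGBArray(camera, size=(320, 240)) as output:
        # capture_continuous keeps yielding frames from the live feed
        for frame in camera.capture_continuous(output, format='bgr', use_video_port=True):
            image = frame.array
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            cubes = cube_cascade.detectMultiScale(gray, 1.1, 5)
            for (x, y, w, h) in cubes:
                cv2.rectangle(image, (x, y), (x + w, y + h), (255, 255, 0), 2)
            output.truncate(0)  # reset the buffer before the next frame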
Thanks in advance.
Related
I am just starting with computer vision and I do not have much experience in this area, so apologies for the somewhat generic question, but I am not sure how to start and head in the correct direction.
As in the title: I am building a system which captures an image from a camera, and I would like to detect whether the 2 lines of stitches/seams are parallel to each other and whether the gap between the lines is within specified limits/thresholds. See the sample picture below:
Can those lines be detected with some functions in OpenCV, or do I need to use a machine learning approach and build a model which recognizes a single stitch in the picture and then, based on those detections, draws new lines and performs the calculations?
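For reference, the kind of OpenCV-only pipeline in question would look roughly like this (an untested sketch using cv2.HoughLinesP; the file name and all parameter values are assumptions that would need tuning):

import cv2
import numpy as np

# Detect roughly straight stitch lines with Canny edges + probabilistic Hough transform
img = cv2.imread("stitches.jpg")  # hypothetical sample picture
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                        minLineLength=100, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        # The angle of each segment could then be compared to check parallelism,
        # and the perpendicular distance between lines checked against the limit
        cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)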
I am implementing a motion-based object tracking program which uses background subtraction, a Kalman Filter and the Hungarian algorithm. Everything is working fine except for occlusions. When two objects are close enough to each other, the background subtraction recognizes them as one of the two objects. After they split, the program recognizes the two objects correctly again. I am looking for a solution/algorithm which will detect occlusion as shown in point c) of the example below.
I will appreciate any references or code examples referring to the occlusion detection problem when using background subtraction.
Object detection using a machine learning algorithm should reliably distinguish between these objects, even with significant occlusion. You haven't shared anything about your environment so I don't know what kind of constraints you have, but using an ML approach, here is how I would tackle your problem.
import cv2
from sort import *  # SORT tracker (see link below for the repo)

tracker = Sort()  # Create an instance of the tracker
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    # Not sure what your environment is, but get your objects'
    # bounding boxes from your detector like this
    detected_objects = detector.detect(frame)  # pseudo code
    # Get tracking IDs for the objects and their bounding boxes
    detected_objects_with_ids = tracker.update(detected_objects)
    ...
The above example uses this Kalman Filter and Hungarian algorithm, which can track multiple objects in real-time.
Again, I'm not sure about your environment, but you can find pre-built object detection models on the TensorFlow site.
I'm trying to learn computer vision, and more specifically OpenCV in Python.
I want to make a program that tracks my barbell in a video and shows me its path. (I know apps like this exist, but I want to make it myself.) I tried using the Canny edge detection and HoughCircles functions, but I seem to get everything but a good result.
I have been using this code to find the edges of my image:
gray = cv.cvtColor(src=img, code=cv.COLOR_BGR2GRAY)
blur = cv.blur(gray, (2,2))
canny = cv.Canny(blur, 60, 60)
And then this code to find the circle:
circles = cv.HoughCircles(canny, cv.HOUGH_GRADIENT, dp=2, minDist=1000, circles=None,maxRadius=50)
This is the result:
Result
left = original image with detected circle // right = canny image
Is this the right way to go or should I use another method?
Training a YOLO model for the barbell to detect the barbell object will work better than anything you have tried with OpenCV. You need at least 500 images, which can be found on the internet easily. This tutorial is a kick-start tutorial on YOLO. Give it a try.
If you tweak the parameters of HoughCircles it may recognize the barbell [EDIT: but only with more preprocessing, gamma correction, blurring etc., so better not]. However, OpenCV has many algorithms for this kind of object tracking; only a region of the image has to be specified first (if that's acceptable).
In your case the object is always visible and is not changing much, so I guess many of the available algorithms would work fine.
OpenCV has a built-in function for selecting that region:
initBB = cv2.selectROI("Frame", frame, fromCenter=False, showCrosshair=True)
See this tutorial for tracking: https://www.pyimagesearch.com/2018/07/30/opencv-object-tracking/
The summary of the author's suggestion is:
CSRT Tracker: Discriminative Correlation Filter (with Channel and Spatial Reliability). Tends to be more accurate than KCF but slightly slower. (minimum OpenCV 3.4.2)
Use CSRT when you need higher object tracking accuracy and can tolerate slower FPS throughput
I guess accuracy is what you want, if it is for offline usage.
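A minimal sketch of how selectROI plus the CSRT tracker could fit together (assuming OpenCV 3.4.2+ with the contrib trackers available; the video filename is a placeholder):

import cv2

cap = cv2.VideoCapture("barbell.mp4")  # hypothetical video file
ret, frame = cap.read()

# Let the user draw a box around the barbell in the first frame
initBB = cv2.selectROI("Frame", frame, fromCenter=False, showCrosshair=True)

tracker = cv2.TrackerCSRT_create()
tracker.init(frame, initBB)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    success, box = tracker.update(frame)
    if success:
        x, y, w, h = [int(v) for v in box]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("Frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break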
Can you share a sample video?
What's your problem exactly? Why do you need to track the barbell? Do you need semantic segmentation or normal detection? These are important questions. Canny is a very basic approach; it needs a very stable background to work. That's why deep learning exists to handle that kind of problem. If we are talking about deep learning, you can use Mask R-CNN, YOLOv4, etc.; there are many available solutions out there.
I've been using the dlib library to detect faces, testing on both Python and DotNet wrappers. Both produce the same bizarre issue: poor (low image quality, poor lighting, etc.) images pulled from the internet are almost always detected while photos shot from my iPhone with high quality/lighting are almost never detected.
I am completely certain that the faces shot on my iPhone are of higher quality than those pulled from the internet, yet regardless of how many times I change the lighting and camera angle of my iPhone shots the results are always the same. An occasional iPhone face is detected while the large majority of perfectly acceptable face shots go undetected. I'm at a complete loss, any ideas are much appreciated.
A sample of my relatively straightforward code is below.
from face_recognition import load_image_file, face_locations
faces = face_locations(load_image_file("foo.jpg"))
face_detected = len(faces) >= 1
I found a solution and figured I'd share it in case anyone runs into this specific problem in the future. Converting all images to grayscale fixed literally every undetected face in my dataset. I wrote this in C# instead of Python, but I will share it anyway; it can be done using the Emgu.CV library and the following code:
Image<Bgr, byte> emguImage = new Image<Bgr, byte>(path_to_image);
Image<Gray, byte> grayscaleEmguImage = emguImage.Convert<Gray, byte>().Clone();
Bitmap grayscaleBitmapImage = grayscaleEmguImage.ToBitmap();
It's fast and it works.
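For anyone who needs the same fix in Python, a rough equivalent using OpenCV with face_recognition might look like this (untested sketch; the file path is a placeholder):

import cv2
from face_recognition import face_locations

# Load the image, convert to grayscale, then stack back to three channels,
# since face_locations expects an RGB image array
image = cv2.imread("path_to_image.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray_rgb = cv2.cvtColor(gray, cv2.COLOR_GRAY2RGB)

faces = face_locations(gray_rgb)
face_detected = len(faces) >= 1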
Could you try deepface as well?
#!pip install deepface
from deepface import DeepFace
img = DeepFace.detectFace("foo.jpg")
It wraps several face detectors actually. Its default detector is mtcnn but you can still set the detector you want.
detectors = ['mtcnn', 'opencv', 'ssd', 'dlib']
DeepFace.detectFace("foo.jpg", detector_backend = detectors[0])
The detectFace function raises an exception if a face could not be detected.
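A small usage sketch based on that behavior (assuming it raises ValueError when no face is found; catch more broadly if not):

from deepface import DeepFace

try:
    # detectFace returns the detected/aligned face when one is found
    face = DeepFace.detectFace("foo.jpg", detector_backend='mtcnn')
    face_detected = True
except ValueError:
    # assumed to be raised when no face could be detected in the image
    face_detected = False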
I have a video file from the evening (6pm-9pm), and I want to detect the movement of people on the road.
While trying to find the difference between a handful of images from a "10 minute" time frame (10 equally spaced images within any 10-minute video clip), I'm facing these challenges:
All the images come out different (triggering an alert) because a plant is moving in the wind the whole time.
All 10 images also come out different because the sun is setting, so due to the "natural light variation" the 10 images from a 10-minute window differ even though there is no human movement.
How do I restrict my algorithm to focus only on movement in a certain area of the video rather than all of it? (I couldn't find anything on Google, and I don't know if there's an algorithm in OpenCV for this.)
This one is rather difficult to deal with. I recommend you try blurring the frames a little to reduce the noise from the moving plants. Also, if the range of the movement is not too large, try changing the difference threshold and the area threshold (if your algorithm includes contour detection as the following step). Hope this helps a little.
For detecting "movement" of people, roughly 10 frames per 10 minutes is too low a frame rate. The people in consecutive frames can be entirely different, which means you cannot detect the movement of a single person, only the differences between two frames. Since you are working with low-fps video, I recommend you try background subtraction to find people in the frames instead of people's movement between frames. For background subtraction, to solve
All 10 images also come out different because the sun is setting, so due to the "natural light variation" the 10 images from a 10-minute window differ even though there is no human movement.
you can try using the average image of all frames as the background_img in
difference = current_img - background_img
If the time span is longer, you can use the average of the images closest in time to current_img as background_img, and keep updating background_img as the video runs.
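A minimal sketch of that running-average idea (the video filename and parameter values are assumptions that would need tuning):

import cv2

cap = cv2.VideoCapture("evening.mp4")  # hypothetical video file
background = None
alpha = 0.05  # how quickly the background adapts; assumed value

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)
    if background is None:
        background = gray.astype("float")
        continue
    # Keep updating the background as a running average so slow lighting
    # changes (the sunset) are absorbed into it
    cv2.accumulateWeighted(gray, background, alpha)
    difference = cv2.absdiff(gray, cv2.convertScaleAbs(background))
    thresh = cv2.threshold(difference, 25, 255, cv2.THRESH_BINARY)[1]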
If your ROI is a rectangle in the frame, use
my_ROI = cv::Rect(x, y, width, height);
cv::Mat ROI_img = frame(my_ROI);
If not, try using a mask.
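In Python, the equivalent rectangular ROI is just a NumPy slice of the frame (placeholder coordinates):

# x, y, width, height are placeholder ROI coordinates
x, y, width, height = 100, 100, 200, 150
ROI_img = frame[y:y + height, x:x + width]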
I think what you are looking for is pedestrian detection. You can do this easily in Python with the OpenCV package.
# Initialize a HOG descriptor
hog = cv2.HOGDescriptor()
# Set it up for pedestrian detection
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
# Then run the detector on a frame; it returns bounding boxes and weights
rects, weights = hog.detectMultiScale(frame)
Example: Pedestrian Detection OpenCV