I have been using LabelImg to create the XML files in PASCAL VOC format. There is a prebuilt binary, which makes it really easy to start drawing bounding boxes around objects in images.
https://github.com/tzutalin/labelImg
At times, the bounding boxes end up covering unwanted pixels. I am looking for a more accurate tool on Windows that lets me draw polygons, or use some sort of point-to-point tool to annotate an object around its edges. I came across one that can be used on Mac OS X:
https://rectlabel.com/
Also, is it true that the TensorFlow Object Detection API only supports bounding-box annotations?
Related
I'm working on a project where I'd like to use Mask R-CNN to identify objects in a set of images, but I'm having a hard time understanding how bounding boxes (encoded pixels) are created for the ground-truth data. Can anyone point me in the right direction or explain this further?
Bounding boxes are typically labeled by hand. Most deep-learning people use a separate application for tagging. I believe this package is popular:
https://github.com/AlexeyAB/Yolo_mark
I developed my own RoR solution for tagging, because it's helpful to distribute the work among several people. The repository is open-source if you want to take a look:
https://github.com/asfarley/imgclass
I think it's a bit misleading to call this 'encoded pixels'. Bounding boxes are a labelled rectangle data type: each one is entirely defined by its class (car, bus, truck) and the (x, y) coordinates of the rectangle's corners.
The software for defining bounding-boxes generally consists of an image-display element, plus features to allow the user to drag bounding-boxes on the UI. My application uses a radio-button list to select the object type (car, bus, etc); then the user draws a bounding-box.
The result of completely tagging an image is a text-file, where each row represents a single bounding-box. You should check the library documentation for your training algorithm to understand exactly what format you need to input the bounding boxes.
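As a concrete illustration of "one row per bounding box", here is a minimal sketch of one common plain-text format (the YOLO style, where each row is "class x_center y_center width height", normalised to [0, 1]). The class id, coordinates and image size below are invented for illustration; your training library's documentation defines the exact format it expects.

```python
# Sketch of a YOLO-style annotation row: "class xc yc w h", all
# coordinates normalised by the image dimensions.

def box_to_yolo_row(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert pixel corner coordinates to a normalised YOLO row."""
    xc = (x1 + x2) / 2.0 / img_w
    yc = (y1 + y2) / 2.0 / img_h
    w = (x2 - x1) / float(img_w)
    h = (y2 - y1) / float(img_h)
    return "%d %.6f %.6f %.6f %.6f" % (class_id, xc, yc, w, h)

# e.g. a box from (100, 50) to (300, 250) in a 640x480 image:
row = box_to_yolo_row(0, 100, 50, 300, 250, 640, 480)
```

Other formats (PASCAL VOC XML, COCO JSON) carry the same information in different containers, which is why the library documentation matters.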
In my own application, I've developed some features to compare bounding boxes from different users. In any large ML effort you will probably encounter some mis-labelled images, and you really need a tool to identify them, because they can severely degrade your results.
I have drawn a simple pattern of geometric shapes on paper and placed it on an object as a marker. I am able to detect and analyse the pattern successfully. However, when the object moves a little faster, motion blur is introduced, which can be rotational or linear. The detected regions then overlap: for example, a strip of arrows moving in the direction of the arrows is detected as a single line once motion blur sets in. I therefore need to fix this somehow, so that I can detect the individual arrows and analyse them.
Below are images of markers with and without motion blur.
Is there any python module or open source implementation that can be used to solve it?
Motion can be in any direction at any speed, so the PSF is not known; it would be required for the Wiener and Lucy-Richardson deconvolution methods.
Also, it is a real-time tracking problem, so I need something that executes fast.
P.S. I'm using Python 2.7 and OpenCV 3.
This problem can be solved by limiting the exposure time of your camera. In OpenCV you can do this with:
cap.set(cv2.CAP_PROP_EXPOSURE, 40)
or using the v4l2-ctl command line utility.
The first step is to check whether your camera honours OpenCV capture properties such as
CAP_PROP_FRAME_WIDTH
CAP_PROP_FRAME_HEIGHT
in order to confirm the camera is suitable.
The second step is to use CAP_PROP_EXPOSURE, like:
cap.set(cv2.CAP_PROP_EXPOSURE, 40)
The value can be changed accordingly to avoid motion blur.
Imagine someone taking a burst shot with a camera: they will have multiple images, but since no tripod or stand was used, the images will be slightly different.
How can I align them so that they overlay neatly, and crop out the edges?
I have searched a lot, but most of the solutions were either making a 3D reconstruction or using matlab.
e.g. https://github.com/royshil/SfM-Toy-Library
Since I'm very new to OpenCV, I would prefer an easy-to-implement solution.
I have generated many datasets by manually rotating and cropping images in MS Paint, but any link to a corresponding dataset (slightly rotated and translated images) would also be helpful.
EDIT: I found a solution here
http://www.codeproject.com/Articles/24809/Image-Alignment-Algorithms
which gives close approximations to rotation and translation vectors.
How can I do better than this?
It depends on what you mean by "better" (accuracy, speed, low memory requirements, etc.). One classic approach is to align each frame #i (with i ≥ 2) with the first frame, as follows:
Local feature detection, for instance via SIFT or SURF (link)
Descriptor extraction (link)
Descriptor matching (link)
Alignment estimation via perspective transformation (link)
Transform image #i to match image 1 using the estimated transformation (link)
I am trying to detect a vehicle in an image (actually in a sequence of frames from a video). I am new to OpenCV and Python, and I work under Windows 7.
Is there a way to get the horizontal and vertical edges of an image and then sum the resulting images into respective vectors?
Is there Python code or a function available for this?
I looked at this and this, but could not work out how to do it.
You may use the following image for illustration.
EDIT
I was inspired by the idea presented in the following paper (sorry if you do not have access).
Betke, M., Haritaoglu, E. & Davis, L. S. (2000). Real-time multiple vehicle detection and tracking from a moving vehicle. Machine Vision and Applications, 12, 69-83.
I would take a look at the squares example for OpenCV, posted here. It uses Canny and then does a contour find to return the sides of each square. You should be able to modify this code to get the horizontal and vertical lines you are looking for. Here is a link to the documentation for the Python call of Canny; it is rather helpful for all-around edge detection. In about an hour I can get home and give you a working example of what you want.
Do some reading on Sobel filters.
http://en.wikipedia.org/wiki/Sobel_operator
You can basically get vertical and horizontal gradients at each pixel.
Here is the OpenCV function for it.
http://docs.opencv.org/modules/imgproc/doc/filtering.html?highlight=sobel#sobel
Once you have these filtered images, you can collect statistics column- and row-wise, decide whether there is an edge, and get its location.
Typically geometrical approaches to object detection are not hugely successful as the appearance model you assume can quite easily be violated by occlusion, noise or orientation changes.
Machine-learning approaches typically work much better, in my opinion, and would probably provide a more robust solution to your problem. Since you appear to be working with OpenCV, you could take a look at cascade classifiers, for which OpenCV provides Haar-wavelet and local-binary-pattern feature based variants.
The link I have provided is to a tutorial with very complete steps explaining how to create a classifier with several prewritten utilities. Basically, you will create a directory with 'positive' images of cars and a directory with 'negative' images of typical backgrounds. A utility, opencv_createsamples, can be used to create training images warped to simulate different orientations and average intensities from a small set of images. You then use the utility opencv_traincascade, setting a few command-line parameters to select different training options, and it outputs a trained classifier for you.
Detection can be performed using either the C++ or the Python interface with this trained classifier.
For instance, using Python you can load the classifier and perform detection on an image getting back a selection of bounding rectangles using:
import cv2

image = cv2.imread('path/to/image')
cc = cv2.CascadeClassifier('path/to/classifierfile')
objs = cc.detectMultiScale(image)  # list of (x, y, w, h) rectangles
Hi, I want to use the Python Imaging Library to crop images to a specific size for a website. I have a problem: these images are meant to show people's faces, so I need to crop automatically based on them.
I know face detection is a difficult concept so I'm thinking of using the face.com API http://developers.face.com/tools/#faces/detect which is fine for what I want to do.
I'm just a little stuck on how I would use this data to crop a selected area based on the majority of the faces.
Can anybody help?
Joe
There is a library for Python that has a concept of smart cropping and, among other options, can use face detection to do smarter cropping.
It uses OpenCV under the hood, but you are isolated from it.
https://github.com/globocom/thumbor
If you have some rectangle that you want to excise from an image, here's what I might try first:
(optional) If the image is large, do a rough square crop centred on the face, with sides sqrt(2) times the longer edge (if rectangular). In the worst case (a 45° rotation), it will still capture everything important.
Rotate based on the face orientation, something like rough_crop.rotate(math.degrees(math.atan(ydiff/xdiff))) (trig is fun).
Do a final crop. If you did the initial crop, the face should be centred; otherwise you'll have to transform (rotate) all your old coordinates into the new image (more trig!).
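A rough sketch of those three steps with PIL. The face centre (cx, cy) and the eye vector (eye_dx, eye_dy) are hypothetical inputs you would read out of the detection API's response; I've used atan2 rather than atan so a vertical eye vector doesn't divide by zero:

```python
import math
from PIL import Image

def crop_face(img, cx, cy, eye_dx, eye_dy, out_size):
    """Sketch of the three steps: (cx, cy) is the face centre and
    (eye_dx, eye_dy) the vector between the eyes; both are hypothetical
    values taken from a face-detection API."""
    side = int(out_size * math.sqrt(2))           # step 1: generous square
    rough = img.crop((cx - side // 2, cy - side // 2,
                      cx + side // 2, cy + side // 2))
    angle = math.degrees(math.atan2(eye_dy, eye_dx))
    rotated = rough.rotate(angle)                 # step 2: level the eyes
    c = side // 2                                 # face is now centred
    half = out_size // 2
    return rotated.crop((c - half, c - half,
                         c + half, c + half))     # step 3: final crop
```

Note that PIL's crop pads with black when the box runs past the image edge, which covers faces near the border.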