Change YOLO algorithm's backbone - python

Is there any way to change the YOLO algorithm's feature extractor (backbone) to DenseNet?
I saw that circular bounding boxes can be used instead of rectangular bounding boxes. How would you change the dataset format, training, and detection parts? Is there any tutorial about this?
A dataset for circular bounding boxes would be (class name, x, y, radius), i.e. four values per box. If we train like that, will it work? Is there any tool for this type of annotation?
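There is no standard tutorial or annotation tool for circular boxes, and the training code would also need an IoU that understands circles. Purely as a hedged illustration (the label layout "class x_center y_center radius" and the helper below are assumptions, not part of any YOLO repository), a circle-circle IoU could look like this:

import math

def circle_iou(c1, c2):
    """IoU of two circles given as (x, y, r) in the same units.

    Hypothetical helper: standard YOLO assumes rectangular boxes, so a
    circular label format (class, x, y, radius) would also need an IoU
    like this one inside the loss and the NMS code.
    """
    x1, y1, r1 = c1
    x2, y2, r2 = c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d >= r1 + r2:                 # no overlap
        inter = 0.0
    elif d <= abs(r1 - r2):          # one circle fully inside the other
        inter = math.pi * min(r1, r2) ** 2
    else:                            # partial overlap (lens area)
        a1 = r1 ** 2 * math.acos((d ** 2 + r1 ** 2 - r2 ** 2) / (2 * d * r1))
        a2 = r2 ** 2 * math.acos((d ** 2 + r2 ** 2 - r1 ** 2) / (2 * d * r2))
        corr = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                               * (d - r1 + r2) * (d + r1 + r2))
        inter = a1 + a2 - corr
    union = math.pi * r1 ** 2 + math.pi * r2 ** 2 - inter
    return inter / union if union > 0 else 0.0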

Related

Data augmentation for Tensorflow Object Detection API with polygon bounding box

I want to generate a lot of data from my dataset and save it to disk, then use the generated data for object detection. Is there any way to generate data with polygon bounding boxes and then forward it to the object detection task?
Yes, there are many annotation tools that support polygon bounding boxes. Here are some:
https://github.com/ryouchinsa/Rectlabel-support
https://github.com/buni-rock/Pixie
https://github.com/UniversalDataTool/react-image-annotate
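Beyond annotation, the augmentation itself just has to apply the same geometric transform to the pixels and to every polygon vertex. A minimal sketch, assuming each annotation is a list of (x, y) vertices in pixel coordinates (libraries such as imgaug also offer polygon support):

import numpy as np

def hflip_with_polygon(image, polygon):
    """Horizontally flip an image and its polygon annotation.

    image is assumed to be an HxWxC numpy array and polygon a list of
    (x, y) vertices in pixels. The flipped pair can then be saved to
    disk as an extra training sample for the detector.
    """
    h, w = image.shape[:2]
    flipped = image[:, ::-1].copy()                       # mirror the pixels
    flipped_poly = [(w - 1 - x, y) for x, y in polygon]   # mirror the vertices
    return flipped, flipped_poly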

Predicting the surface of the car using its 2d bbox and plate bbox

I'm trying to solve an interesting problem without using a GPU-intensive model at inference time (no deep learning).
Input: a 2D image containing car(s), with accurate bboxes, plus a bbox of each car's plate. (We also know that the cameras are located just a bit above the cars.)
Output: a prediction of the car's surface (the bottom face of a cuboid, i.e. of a 3D bbox).
Approach 1: I'm trying to leverage the fact that I have some prior knowledge beyond the car's 2D bbox, namely the 2D bbox of the plate, which can give me the orientation of the car. I thought about taking the angle between the center of the car's bbox and the center of the plate's bbox to work out which direction the car is facing.
Once I know the direction the car is facing, I can also roughly locate one of the edges of the surface, because the 3D bbox is bounded by the 2D bbox (so the surface is bounded too) and the plate's 2D bbox is only a few pixels away from the surface, so one edge of the surface can be estimated.
The problem is determining the lateral edges: how "long" should they be? I'm not quite sure how to estimate the lateral sides of the bottom surface, but I think they can be inferred from the size of the car's 2D bbox (which, again, should bound that surface). Maybe I'll be able to solve it after finding that first edge of the surface and then exploring ways to infer the lateral edges.
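A hedged sketch of the angle idea in Approach 1, assuming both boxes are given as (x, y, w, h) in pixels with (x, y) the top-left corner (the function name and convention are illustrative):

import math

def facing_angle(car_bbox, plate_bbox):
    """Rough facing direction from the car-center -> plate-center vector.

    Both boxes are assumed to be (x, y, w, h) in pixels. Returns the
    angle in degrees in image coordinates (0 = plate to the right of
    the car center, 90 = below it, since the image y axis points down).
    """
    cx_car = car_bbox[0] + car_bbox[2] / 2.0
    cy_car = car_bbox[1] + car_bbox[3] / 2.0
    cx_plate = plate_bbox[0] + plate_bbox[2] / 2.0
    cy_plate = plate_bbox[1] + plate_bbox[3] / 2.0
    return math.degrees(math.atan2(cy_plate - cy_car, cx_plate - cx_car))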
Approach 2: annotating the data with 3D bboxes using a pre-trained model, then trying to predict the 3D bbox from the 2D bbox (and probably some more priors, like the plate's 2D bbox). I would not use a deep model for this, but a simple NN with a few layers to predict the 3D bbox, trained in a supervised manner.
Deep learning-based object detection methods tend to achieve really high detection accuracy. Deep neural networks are the usual way to improve bounding-box accuracy, and designing a reasonable regression loss function is also important. So, if accuracy is an important factor for your project, you may need to consider using deep learning.
But if accuracy doesn't matter that much and you really prefer not to use deep learning, then you can use other, simpler approaches.
Conventional 2D object detection yields 4-degree-of-freedom (DoF) axis-aligned bounding boxes with center (x, y) and 2D size (w, h), whereas 3D bounding boxes in the autonomous-driving context generally have 7 DoF: 3D physical size (w, h, l), 3D center location (x, y, z), and yaw. Note that roll and pitch are normally assumed to be zero. Now the question is: how do we recover a 7-DoF object from a 4-DoF one?
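For concreteness, here are the two parameterizations side by side (a plain sketch, not tied to any particular dataset convention):

from dataclasses import dataclass

@dataclass
class Box2D:
    # 4 DoF: axis-aligned box in the image plane
    x: float    # center x (pixels)
    y: float    # center y (pixels)
    w: float    # width   (pixels)
    h: float    # height  (pixels)

@dataclass
class Box3D:
    # 7 DoF: metric box in the camera frame (roll and pitch assumed zero)
    x: float    # center x (meters)
    y: float    # center y (meters)
    z: float    # center z (meters)
    w: float    # width   (meters)
    h: float    # height  (meters)
    l: float    # length  (meters)
    yaw: float  # heading angle (radians)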
You can find a solution and an explanation of the approach based on this research, but it is a little complex since it comes from a paper.
In your 2nd Approach:
"Annotating the data with 3d bboxes with a pre-trained model"
You can try that, and then put all the work for the 3D bbox creation into inference. This is a very specific and complex problem to answer directly, even more so without deep learning, but I hope my answer helps a bit.
Here is another approach I can share just in case you want to consider:
You can also train your own model that has a different class for each direction of the car. It may take a lot of time to prepare the dataset for it, but with that model you can easily detect the car's direction.
With that, you could have a dedicated function create a 3D bbox based on the detected car direction. I can't recommend this approach if you would rather not build your own annotated dataset, since that really does take a lot of time.
You can use OpenCV to create the 3D bbox from the specific values you extract from the 2D bbox.
But do take note that it will not give you the best accuracy. Using deep learning is still the better way if you need higher accuracy; you can find many implementations of this online.
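As a very rough illustration of extracting those values with OpenCV, the sketch below guesses a bottom face by anchoring the near edge on the bottom of the 2D bbox and pushing the far edge back along the facing direction. The depth_ratio and the geometry are pure assumptions, to be tuned or replaced by real camera calibration:

import numpy as np
import cv2

def draw_guessed_bottom_face(image, car_bbox, angle_deg, depth_ratio=0.35):
    """Draw a crude bottom-face quadrilateral for a car.

    car_bbox is (x, y, w, h) in pixels and angle_deg is the facing
    angle (e.g. from the car-center -> plate-center vector). The near
    edge is taken to be the bottom of the 2D bbox and the far edge is
    shifted by a guessed depth; none of this comes from calibration.
    """
    x, y, w, h = car_bbox
    a = np.radians(angle_deg)
    ux, uy = np.cos(a), np.sin(a)            # facing direction in image coords
    depth = h * depth_ratio                  # guessed visible depth in pixels
    near_left = np.array([x, y + h], dtype=float)
    near_right = np.array([x + w, y + h], dtype=float)
    far_left = near_left - depth * np.array([ux, uy])    # push away from the plate
    far_right = near_right - depth * np.array([ux, uy])
    quad = np.array([near_left, near_right, far_right, far_left], dtype=np.int32)
    cv2.polylines(image, [quad.reshape(-1, 1, 2)], True, (0, 255, 0), 2)
    return image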

How to change yolo bounding style from (x,y,w,h) to (x,y,w,h,a)?

I'm training a YOLO model and I have the bounding boxes in the format (x, y, w, h, a), e.g. (197.996317 669.721413 390.070453 29.258397 7.696052), where "a" is the angle of the bounding box. I want to add the angle to the bounding-box format and train the YOLO network with it.
I have already calculated each bounding box and its angle, but I don't know how to fit that into the YOLO label style.
How can I add the angle to the YOLO bounding-box style?
Well, 60 images is really not enough: you don't have enough combinations, which is why you need to retrain.
Read this here https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
It also applies to the original repository.
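Note that stock YOLO/darknet labels only carry "class x_center y_center width height" (normalized); the angle has to be handled by a rotated-box fork or a custom data loader and loss. Purely as an illustration of the label side (the 6-column layout and the image size below are assumptions, not darknet's format), converting your pixel values could look like this:

import math

def to_rotated_yolo_line(class_id, x, y, w, h, angle_deg, img_w, img_h):
    """Format one rotated box as a hypothetical 6-column label line.

    Assumes (x, y) is the box center and (w, h) its size in pixels; the
    coordinates are normalized by the image size and the angle is
    converted to radians. Check the exact format expected by whichever
    rotated-box YOLO fork you use before adopting this layout.
    """
    return "{} {:.6f} {:.6f} {:.6f} {:.6f} {:.6f}".format(
        class_id, x / img_w, y / img_h, w / img_w, h / img_h,
        math.radians(angle_deg))

# Example with the values from the question (image size is assumed):
print(to_rotated_yolo_line(0, 197.996317, 669.721413,
                           390.070453, 29.258397, 7.696052, 1920, 1080))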

How to use boundary boxes with images for multi label image training?

I am working on machine learning for image classification and have managed to complete several projects successfully. All of those projects had images that always belonged to a single class. Now I want to try images with multiple labels per image. I read that I have to draw boxes (bounding boxes) around the objects in the images for training.
My questions are:
Do I have to crop those areas into single images and use them as before for training?
Are the drawn boxes only used for cropping?
Or do we really feed the original images and the box coordinates (top left [X, Y], width and height) to the training?
Any tutorials or materials related to this are appreciated.
Basically, you need to detect various objects in an image which belong to different classes. This is where Object Detection comes into the picture.
Object Detection tries to classify labels for various objects in an image and also predict their bounding boxes.
There are many algorithms for object detection. If you are a seasoned TensorFlow user, you can directly use the TensorFlow Object Detection API. You can select the architecture you need and feed the annotations along with the images.
To annotate the images (drawing bounding boxes around the objects and storing the coordinates separately), you can use the LabelImg tool.
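To answer the third bullet directly: yes, the usual workflow keeps the original image and stores the box coordinates in a separate annotation file. For example, LabelImg can save Pascal VOC XML, which can be read back like this (a small sketch; the file path is a placeholder):

import xml.etree.ElementTree as ET

def read_voc_annotation(xml_path):
    """Parse a Pascal VOC XML file (the format LabelImg writes).

    Returns a list of (class_name, xmin, ymin, xmax, ymax) tuples; the
    original image stays untouched on disk and is fed to training
    together with these coordinates.
    """
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(float(bb.find("xmin").text)),
                      int(float(bb.find("ymin").text)),
                      int(float(bb.find("xmax").text)),
                      int(float(bb.find("ymax").text))))
    return boxes

print(read_voc_annotation("annotation.xml"))  # placeholder path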
You can refer to these blogs:
Creating your own object detector
A Step-by-Step Introduction to the Basic Object Detection Algorithms
Instead of training a whole new object detector, you can use an available pretrained object detector. The pretrained TensorFlow Object Detection models can classify 80 object classes. If the objects you need to classify are included among these, then you get a ready-to-use model. The model draws a bounding box around the object of interest.
You can crop this part of the image and build a classifier on it, according to your needs.
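A minimal cropping sketch, assuming the detector returns a normalized box in the (ymin, xmin, ymax, xmax) order used by the TensorFlow Object Detection API:

def crop_detection(image, box):
    """Crop one detected region from an HxWxC numpy image.

    box is assumed to be (ymin, xmin, ymax, xmax) with values in
    [0, 1]; the crop can then be passed to a separate classifier.
    """
    h, w = image.shape[:2]
    ymin, xmin, ymax, xmax = box
    return image[int(ymin * h):int(ymax * h), int(xmin * w):int(xmax * w)]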

Parameters of dlib shape predictor model

I trained a dlib shape predictor model on my custom data (using the train_shape_predictor.py file) and got a .dat file as a result. Now I have an image containing an object that the dlib shape predictor was trained on. How do I use that trained model to predict the shape in the input image?
I am looking at the dlib shape predictor documentation, where it is mentioned that the dlib shape predictor accepts two arguments:
An image
A box (Dlib Rectangle)
Now, what will these parameters be in my case, given that I just have one image (containing an object to be predicted by the trained model)?
Any sort of help in this regard will be highly appreciated.
As the documentation says:
image is a numpy ndarray containing either an 8-bit grayscale or RGB image. --> Pass your image here.
box is the bounding box to begin the shape prediction inside. --> If you already have the bounding box of your object (e.g. roughly where a face is in the image), pass it here.
A typical application would be:
import dlib

detector = dlib.simple_object_detector("detector.svm")  # your trained detector (placeholder path)
predictor = dlib.shape_predictor("predictor.dat")        # the .dat file from train_shape_predictor.py

rects = detector(image)             # find candidate boxes
for rect in rects:
    shape = predictor(image, rect)  # predict the shape inside each box
You can use any object detector to find the bounding boxes. As Quang said, that is the correct approach for dlib.
You can also use OpenCV's detectors. However, keep in mind that dlib's rectangle uses four coordinates (left, top, right, bottom), while OpenCV represents a rectangle as (x, y, width, height).
After getting a bounding box from OpenCV, do this:
d_rect = dlib.rectangle(left=o_rect[0], top=o_rect[1], right=o_rect[0] + o_rect[2], bottom=o_rect[1] + o_rect[3])
where o_rect is the OpenCV rectangle (x, y, w, h).
