Parameters of dlib shape predictor model - python

I trained a dlib shape predictor model on my custom data (using the train_shape_predictor.py file) and got a .dat file as a result. Now I have an image containing an object that the model was trained on. How do I use that trained model to predict a shape in the input image?
Looking at the dlib shape predictor documentation, it says the predictor accepts two arguments:
An image
A box (Dlib Rectangle)
What should these parameters be in my case, given that I just have one image (containing an object to be predicted with the trained model)?
Any sort of help in that regard will be highly appreciated.

As the documentation says:
image is a numpy ndarray containing either an 8-bit grayscale or RGB image. --> Pass your image here.
box is the bounding box to begin the shape prediction inside. --> if you already have the bounding box of your object (e.g. roughly where a face is in the image), pass it here.
A typical application would be:
detector = dlib.simple_object_detector("detector.svm")  # your trained detector, or dlib.get_frontal_face_detector() for faces
predictor = dlib.shape_predictor("predictor.dat")        # the .dat file you trained
rects = detector(image)
for rect in rects:
    shape = predictor(image, rect)

You can use any object detector to find the bounding boxes. As Quang said, the above is how you do it with dlib.
You can also use OpenCV's detectors. However, keep in mind that dlib represents a rectangle as (left, top, right, bottom), while OpenCV uses (x, y, width, height).
After getting a bounding box from OpenCV, convert it like this:
x, y, w, h = o_rect
d_rect = dlib.rectangle(left=x, top=y, right=x + w, bottom=y + h)
where o_rect is the OpenCV rectangle (x, y, w, h).
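To make the whole flow concrete, here is a minimal sketch combining an OpenCV Haar cascade with a dlib shape predictor; the file names ("predictor.dat", "image.jpg") and the face cascade are placeholders for your own model and detector:
import cv2
import dlib

predictor = dlib.shape_predictor("predictor.dat")
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("image.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
    # convert the OpenCV (x, y, w, h) box into a dlib rectangle
    d_rect = dlib.rectangle(left=int(x), top=int(y), right=int(x + w), bottom=int(y + h))
    shape = predictor(gray, d_rect)  # full_object_detection object
    points = [(shape.part(i).x, shape.part(i).y) for i in range(shape.num_parts)]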

Related

Resize COCO bounding boxes for object detection

So here is my first question here. I am preparing a dataset for object detection. I have done the following things so far:
I have an original picture (size w4000 x h3000).
I used the annotation platform Roboflow to annotate it in COCO format, with close to 250 objects in the picture.
Roboflow returned a downscaled picture (2048x1536) with a respective json file with the annotations in COCO format.
Then, to obtain a dataset from my original picture (as I have a lot of objects and the picture is big enough), I decided to tile the original picture in patches of 224x224. For this purpose, I upscaled a bit (4032x3136) to be able to slice it properly, obtaining 252 pictures.
QUESTIONS
How can I resize the bounding boxes of the Roboflow 2048x1536 picture to my original picture (4032x3136)?
Once the b.boxes are resized to my original size picture, how can I resize them again, adapting the size to each of my patches (224x224) created by slicing the original picture?
Thank you!!
It sounds like the ultimate goal is to have tiled 224x224 images from the source 4032x3136 images with bounding boxes correctly updated.
In Roboflow, at least, you can add tiling as a preprocessing step to your original 4032x3136 images. The images will be broken into the number of tiles you select (2x2, 3x3, NxY). The bounding boxes will be correctly updated to cover the objects across each individual tile as well.
To reimplement this in code from what you've described, you would need to (a rough sketch follows this list):
Upscale your 2048x1536 images to 4032x3136
Scale the bounding boxes accordingly
Break the images into 224x224 tiles using something like PIL
Update the annotations so they are expressed in the coordinates of the respective tiles; one annotation set per tile
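A rough sketch of steps 2-4, assuming COCO-style [x, y, width, height] boxes; coco_boxes, the file name, and the non-overlapping tiling are all assumptions on my part:
from PIL import Image

SRC_W, SRC_H = 2048, 1536   # size Roboflow annotated
DST_W, DST_H = 4032, 3136   # upscaled original
TILE = 224
sx, sy = DST_W / SRC_W, DST_H / SRC_H

coco_boxes = []   # fill with [x, y, w, h] boxes loaded from the Roboflow JSON

def scale_box(box):
    x, y, w, h = box
    return [x * sx, y * sy, w * sx, h * sy]

def boxes_for_tile(boxes, tx, ty):
    # clip the scaled boxes to the tile starting at (tx, ty) and drop the empty ones
    out = []
    for x, y, w, h in boxes:
        x0, y0 = max(x, tx), max(y, ty)
        x1, y1 = min(x + w, tx + TILE), min(y + h, ty + TILE)
        if x1 > x0 and y1 > y0:
            out.append([x0 - tx, y0 - ty, x1 - x0, y1 - y0])  # tile-local coordinates
    return out

img = Image.open("original_4032x3136.jpg")
scaled = [scale_box(b) for b in coco_boxes]
for ty in range(0, DST_H, TILE):
    for tx in range(0, DST_W, TILE):
        tile = img.crop((tx, ty, tx + TILE, ty + TILE))
        tile_boxes = boxes_for_tile(scaled, tx, ty)
        # save tile and tile_boxes here, one annotation set per tile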

How to convert tensor output of image segmentation model to image? [duplicate]

This question already has an answer here: Image.fromarray just produces black image (1 answer). Closed 1 year ago.
I am trying out code for image segmentation for self-driving cars using the Berkeley DeepDrive dataset. I trained the model and tested an image on it, and got the output (the segmented image) as a tensor, but I need it in image format. I tried the Image.fromarray function and got the output below:
And the actual image is shown below:
The model I am using is from this git repo.
If I understand correctly, your tensors are the result of the model prediction and the underlying model is a U-Net. If that is the case, those tensors represent segmentation masks. If the image used for prediction is of size 512x512 (this depends on the model architecture), then the tensor predicted by the U-Net will be k x 512 x 512, i.e. k segmentation masks per image. You have to overlay these masks on the image with lighter colors to see how the predicted image is segmented by the model, so you need access to the image you used for prediction.
Since you are using the fast.ai APIs, I recommend checking the code of the show_results method of the learner object to see how they render the output. This should be a good starting point.
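For illustration, a minimal sketch of that overlay step, assuming prediction is a k x H x W PyTorch tensor of per-class scores and original is the PIL image fed to the model (both names are placeholders):
import matplotlib.pyplot as plt

pred = prediction.detach().cpu().numpy()   # k x H x W scores; skip this line if it is already a numpy array
class_map = pred.argmax(axis=0)            # H x W map of predicted class indices

overlay_base = original.resize((class_map.shape[1], class_map.shape[0]))  # match the mask size
plt.imshow(overlay_base)
plt.imshow(class_map, alpha=0.5, cmap="tab20")  # translucent class mask on top of the image
plt.axis("off")
plt.show()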
For tensorflow models I used:
import numpy as np
from PIL import Image

prediction = np.squeeze(prediction)            # drop the batch/channel axes -> (H, W)
r = prediction * 255                           # scale mask values from [0, 1] to [0, 255]
im = Image.fromarray(r.astype('uint8'), 'L')   # 'L' = 8-bit grayscale image
im = im.resize(original.size)                  # original: the PIL image fed to the model

How to change the YOLO bounding box style from (x,y,w,h) to (x,y,w,h,a)?

I'm training a YOLO model. I have the bounding boxes in the format (x, y, w, h, a), e.g. (197.996317 669.721413 390.070453 29.258397 7.696052), where "a" is the angle of the bounding box. I want to add the angle to the bounding box format and train the YOLO network with it.
I have already calculated the bounding boxes and their angles, but I don't know how to fit them into the YOLO network's format.
How can I add an angle to the YOLO bounding box style?
Well, 60 images is really not enough; you don't have enough combinations, which explains why you need to retrain.
Read this here https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
It also applies to the original repository.

How to use boundary boxes with images for multi label image training?

I am working on machine learning for image classification and have managed to complete several projects successfully. All projects had images that always belonged to one class. Now I want to try images with multiple labels on each image. I read that I have to draw boxes (bounding boxes) around the objects in the images for training.
My questions are:
Do I have to crop those areas into single images and use them as before for training?
Are the drawn boxes only used for cropping?
Or do we really feed the original images and the box coordinates (top-left [X, Y], width, and height) to the training?
Any tutorials or materials related to this are appreciated.
Basically, you need to detect various objects in an image that belong to different classes. This is where object detection comes into the picture.
Object detection tries to classify labels for various objects in an image and also predicts their bounding boxes.
There are many algorithms for object detection. If you are a seasoned TensorFlow user, you can directly use the TensorFlow Object Detection API. You can select the architecture you need and feed the annotations along with the images.
To annotate the images (draw bounding boxes around the objects and store their coordinates separately), you can use the LabelImg tool.
You can refer to these blogs:
Creating your own object detector
A Step-by-Step Introduction to the Basic Object Detection Algorithms
Instead of training a whole new object detector, you can use a pretrained one. The TensorFlow Object Detection models pretrained on COCO can classify 80 object classes. If the objects you need to classify are included among these, you get a ready-made model that draws a bounding box around the object of interest.
You can crop this part of the image and build a classifier on it, according to your needs.
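As an illustrative sketch of that idea, here is one way to get the crops using torchvision's COCO-pretrained Faster R-CNN rather than the TensorFlow Object Detection API (my substitution); the file name and the 0.5 score threshold are placeholders:
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# COCO-pretrained detector (80 classes); use pretrained=True on older torchvision versions
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = Image.open("image.jpg").convert("RGB")
with torch.no_grad():
    out = model([to_tensor(img)])[0]   # dict with "boxes", "labels", "scores"

crops = []
for box, score in zip(out["boxes"], out["scores"]):
    if score < 0.5:                    # keep only confident detections
        continue
    x0, y0, x1, y1 = [int(v) for v in box.tolist()]
    crops.append(img.crop((x0, y0, x1, y1)))   # feed these crops to your own classifier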

How to feed bounding boxes in Regression head in CNN for object detection?

I am working on cropping document images from the center. I used a pretrained VGG net and extracted features from the last convolutional layer.
I also generate all possible bounding boxes for each image, and I have the ground-truth bounding boxes.
Here are the details (25 images in total, for demo purposes):
The Feature map size: (25,512,14,14)
All bounding boxes size: (25,)
e.g. one image has 55167 bounding boxes, so (55167, 4) (this varies across images)
True bounding boxes: (25,4)
Now, how do I feed this into the network?
I have also gone through some papers and resources. I don't want a classification layer; I only want box coordinates as the result.
I am using the Keras library.
You should consider using a localization network, not only classification. This repository also supports two-stage training to save you some training time.
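As a minimal sketch of a pure regression head in Keras (my own example on the (25, 512, 14, 14) feature maps described above, not code from the linked repository; it regresses one box per image and ignores the per-image proposal boxes, which would need an RoI-based approach instead):
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# placeholders standing in for the real feature maps and ground-truth boxes
features = np.random.rand(25, 512, 14, 14).astype("float32")
true_boxes = np.random.rand(25, 4).astype("float32")   # in practice, normalize the coordinates

inputs = keras.Input(shape=(512, 14, 14))
x = layers.Permute((2, 3, 1))(inputs)         # channels-first -> channels-last (14, 14, 512)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation="relu")(x)
outputs = layers.Dense(4)(x)                  # box coordinates only, no classification layer

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss=keras.losses.Huber())
model.fit(features, true_boxes, epochs=10, batch_size=4)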
