Train A Custom Object Detection Model with YOLO v5 - python

I'm trying to train a model with YOLOv5 to detect multiple objects on sales flyers. Each image in the training dataset contains only one object and therefore a single bounding box.
I'm wondering whether that will hurt the model's performance, because what I ultimately want to do is detect multiple objects on each sales flyer.
Thank you for your help.

It will probably lower your AP if you work like this, but give it a try. It really depends on your training images and your data augmentations. I don't know about YOLOv5, but YOLOv4 has Mosaic data augmentation, which should address your problem to some extent, I guess.
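To illustrate the idea behind mosaic-style augmentation, here is a minimal sketch, not the actual YOLO implementation (which, as far as I know, also randomizes the mosaic center and rescales the images). It assumes four images already resized to the same square size and one pixel-coordinate box per image in [xmin, ymin, xmax, ymax] form. The function name `simple_mosaic` and the `tile` parameter are made up for this example.

```python
import numpy as np

def simple_mosaic(images, boxes, tile=320):
    """Paste four single-object images into a 2x2 grid and shift their boxes.

    images: list of four (tile, tile, 3) uint8 arrays.
    boxes:  list of four [xmin, ymin, xmax, ymax] boxes in pixels, one per image.
    Returns the (2*tile, 2*tile, 3) mosaic and a (4, 4) array of shifted boxes.
    """
    assert len(images) == 4 and len(boxes) == 4
    canvas = np.zeros((2 * tile, 2 * tile, 3), dtype=np.uint8)
    # (x, y) of the top-left corner of each tile on the canvas.
    offsets = [(0, 0), (tile, 0), (0, tile), (tile, tile)]
    out_boxes = []
    for img, box, (ox, oy) in zip(images, boxes, offsets):
        assert img.shape == (tile, tile, 3)
        canvas[oy:oy + tile, ox:ox + tile] = img
        xmin, ymin, xmax, ymax = box
        # Shift the box by the same offset as its image.
        out_boxes.append([xmin + ox, ymin + oy, xmax + ox, ymax + oy])
    return canvas, np.array(out_boxes, dtype=np.float32)

# Example with random placeholder images standing in for real crops.
rng = np.random.default_rng(0)
imgs = [rng.integers(0, 256, (320, 320, 3), dtype=np.uint8) for _ in range(4)]
bxs = [[10, 20, 100, 200]] * 4
mosaic, mosaic_boxes = simple_mosaic(imgs, bxs)
```

Each resulting training image then contains four objects with four boxes, which is closer to what a flyer with several products looks like.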

Related

The model cannot be trained well by MASK RCNN

I am using MRCNN in Python to train on 20 images (with the annotation info saved as a JSON file) for object detection. The problem is that, at best, the loss is around 4, which shows that the model has not learned well (the loss fluctuates a lot from epoch to epoch during training). Unsurprisingly, when I use the trained model for detection the result is wrong: it cannot detect the object and instead selects some pixels, seemingly at random, as the object.
Can someone kindly help me improve the performance, and also give some hints about initial weights if the object is not one of the classes in the COCO dataset?
Thanks in advance.

Object Detection - Mask RCNN

I have a project in which I first need to detect whether an image is fake and, if it is, detect the forged object. For the first part I am using ELA and a CNN to detect whether the image is forged, but for the object detection I need to use Mask R-CNN, and unfortunately I have trouble understanding how to use it. I am using the CASIA v2 dataset and I have the ground truth masks for all the forged images.
I saw that every model online uses the COCO model for Mask R-CNN, but I need the model to be trained on my dataset. I also saw that I need a list of labels, but for my project I only need to display "Fake" on the detected object. Is it all right if I only write "Fake" in label.txt?
Also, I am fairly new to deep learning, so any help is useful.

How to get better results when using the TensorFlow object detection API for a custom trained model to recognize hand poses?

In my project I need to train an object detection model that can recognize hands in different poses in real time from an RGB webcam.
For this, I'm using the TensorFlow Object Detection API.
So far I have trained a model based on the ssd_inception_v2 architecture, with the ssd_inception_v2_coco model as the fine-tune checkpoint.
I want to detect 10 different classes of hand poses. Each class has 300 images, which are augmented. In total there are 2400 images for training and 600 for evaluation. The labeling was done with LabelImg.
The problem is that the model isn't able to detect the different classes properly. Although the results still weren't good, they were much better when I trained with the same images but only about 3 different classes. It seems like the problem is the SSD architecture: I've read several times that SSD networks are not good at detecting small objects.
Finally, my questions are the following:
Could I get better results by using a faster_rcnn_inception architecture for this use case?
Does anyone have advice on how to optimize the model?
Do I need more images?
Thanks for your answers!

tensorflow object detection trained model not working

I trained on my dataset with the TensorFlow Object Detection API using both SSD and Faster R-CNN models. There were 220 training and 30 test images in my dataset.
I trained the model for 200k steps and got the loss under 1. But when I tested my trained model on a video, it was detecting and labelling almost everything in the video.
Can anyone tell me why that is happening?
Thank you

You are using only one class: you trained your model with images that all belong to the same class and tested it on the same class.
So the problem is that the model is skewed (it predicts the same thing for all images).
No matter which image you test it on, you will get the same output.
Solution:
Train your model with a nearly equal number of negative images.
E.g. 220 images containing the object to be identified (label them as 1) and roughly another 220 images not containing the object (label them as 0).
Use the F1 score to check your accuracy, because it will help you understand whether the dataset is skewed.
Check this to learn about different kinds of accuracy measures.
Take this course to learn more about CNNs.
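As a rough illustration of why F1 is more telling than plain accuracy here, this is a minimal sketch that computes precision, recall and F1 from counts of true positives, false positives and false negatives. The function name and the counts in the example are hypothetical, not taken from the question.

```python
def f1_score(tp, fp, fn):
    """Return (precision, recall, f1) from detection counts.

    tp: correct detections, fp: spurious detections, fn: missed objects.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for a model that "detects almost everything":
# it finds all 30 real objects but also fires on 170 other regions.
precision, recall, f1 = f1_score(tp=30, fp=170, fn=0)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With these made-up counts recall is perfect, but precision is only 0.15, so F1 comes out at about 0.26 and exposes the problem.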

How to do object detection using CNN's features in tensorflow?

I am trying to build an end-to-end unified model that detects (localizes) an object in an image. The object itself can be of many types, like "text in the wild", but the features surrounding the object should determine where the region of interest is.
It is like detecting a human face without considering the features of the face itself, i.e. it is at some range of distance above the neck.
I'm expecting the output to be the coordinates of the object, or something like the ImageNet format for bounding boxes: [xmin, ymin, xmax, ymax]
I have a dataset of 500 images. Are there any examples of object detection in TensorFlow based on surrounding features, i.e. the feature maps from conv1 or conv2?

There is a TensorFlow-based framework for object detection/localization that you can check out:
https://github.com/Russell91/TensorBox
Though I am not sure that 500 images would be enough to successfully retrain the provided model(s).

Object detection using deep learning is broadly classified into one-stage detectors (YOLO, SSD) and two-stage detectors like Faster R-CNN. Google's repo [1] contains pre-trained models for various detection architectures.
You could pick a pre-trained model and then train it on your dataset. The two-stage model is modular, and you have a choice of different feature extractors depending on whether speed or accuracy is more crucial for you.
[1] Google's object detection repository
