Object Detection - Mask RCNN - python

I have a project in which I first need to detect whether an image is fake or not, and, if it is fake, to detect the forged object. For the first part I am using ELA and a CNN to detect whether the image is forged, but for the object detection I need to use Mask R-CNN, and unfortunately I have trouble understanding how to use it. I am using the CASIA v2 dataset and I have the ground-truth masks for all the forged images.
I saw that every model online uses COCO weights for Mask R-CNN, but I need the model to be trained on my own dataset. I also saw that I need a list of labels, but for my project I only need to display "Fake" on the detected object. Is it alright if label.txt contains only the single entry "Fake"?
Also, I am a little bit new to Deep Learning, so any help is useful.
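For what it's worth, the COCO-based examples you have seen typically use COCO only as pre-trained starting weights; you then fine-tune on your own dataset, and a single foreground label is perfectly fine. Below is a minimal sketch assuming the popular Matterport Mask_RCNN implementation; the names CasiaConfig, CasiaDataset, and load_casia are placeholders, not part of the library:

```python
# A sketch of a single-class setup for the Matterport Mask_RCNN library,
# not a full training script. One foreground class "Fake" plus background.
from mrcnn.config import Config
from mrcnn import utils

class CasiaConfig(Config):          # hypothetical name
    NAME = "casia_forgery"
    NUM_CLASSES = 1 + 1             # background + "Fake"
    IMAGES_PER_GPU = 1

class CasiaDataset(utils.Dataset):  # hypothetical name
    def load_casia(self, image_dir):
        self.add_class("casia_forgery", 1, "Fake")   # the only label needed
        # ... call self.add_image(...) for each forged image, and implement
        # load_mask() to return your CASIA v2 ground-truth masks.
```

You would then load the COCO weights with `by_name=True`, excluding the class-specific head layers, and fine-tune on the forged images; the displayed label will simply be "Fake".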

Related

Conditional Detection using Tensorflow Object Detection API

I am detecting hand gestures using the TensorFlow Object Detection API, but I want to apply a condition to this detection: I want to detect a person's hand gesture only when the person is wearing a whistle, and otherwise detect nothing. Can I achieve this with the TensorFlow Object Detection API? If not, please suggest some good methods of achieving this. Thanks :)
The first thing you should do is customize a pre-trained TensorFlow model with one class. Using this technique you can generate a model which has only one class, named "hand" for example. How to do this? Don't worry, just follow the steps below:
Download the TensorFlow models repository from GitHub and build it (you can clone it with git instead of downloading it).
Once the build is complete, label your training images with the labelImg tool. labelImg writes one Pascal-VOC XML annotation file per image.
Next, convert these XML annotations to a CSV file, and the CSV file to a TFRecord file (a sketch of the first conversion follows this answer).
Then train your own model.
Once the model has finished training, export it and simply run your detection with it.
It is worth saying that if you don't have an Nvidia GPU, you should use Google Colab, because this process is very time-consuming on a CPU.
best wishes to you.
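As a hedged sketch of the XML-to-CSV step above (paths and the class name "hand" are placeholders; the resulting CSV is what generate_tfrecord.py-style scripts consume):

```python
# Collect every Pascal-VOC XML file written by labelImg into one CSV.
import glob
import xml.etree.ElementTree as ET
import pandas as pd

def xml_to_csv(xml_dir):
    rows = []
    for xml_file in glob.glob(f"{xml_dir}/*.xml"):
        root = ET.parse(xml_file).getroot()
        for obj in root.findall("object"):
            box = obj.find("bndbox")
            rows.append({
                "filename": root.find("filename").text,
                "width": int(root.find("size/width").text),
                "height": int(root.find("size/height").text),
                "class": obj.find("name").text,        # e.g. "hand"
                "xmin": int(box.find("xmin").text),
                "ymin": int(box.find("ymin").text),
                "xmax": int(box.find("xmax").text),
                "ymax": int(box.find("ymax").text),
            })
    return pd.DataFrame(rows)

xml_to_csv("images/train").to_csv("train_labels.csv", index=False)
```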
You can also prepare the dataset by labeling (1) hands wearing a whistle and (2) hands not wearing a whistle, then train a classification model on it, not forgetting to preprocess the images before training.
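If you go this classification route, a minimal Keras sketch could look like the following; the directory layout (data/whistle vs. data/no_whistle) and all hyperparameters are assumptions:

```python
import tensorflow as tf

# Labels are inferred from the two sub-directory names.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", image_size=(224, 224), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),            # the preprocessing step
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),  # whistle / no whistle
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=10)
```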

Can I use DeepSORT without deep learning detection such as YOLO?

I am new to computer vision, and I still haven't tried any kind of neural-network detector such as YOLO; however, I wish to do object tracking before entering the field of detection. I started reading about Deep SORT, and all the projects use deep-learning detectors that need training. My question is: can I give an ROI result to my Deep SORT tracker instead of YOLO detections, so that it continues tracking the object selected with the ROI?
Here is a link where I found information about the code of DeepSORT: DeepSORT: Deep Learning to Track Custom Objects in a Video
In DeepSORT, you need to have detection in order to perform tracking. It is a tracking-by-detection method. The detection results are input to the Kalman filter component of DeepSORT. The filter generates tracking predictions. Also, the bounding boxes from detection are used to extract crops of RoI from the input image. These image crops are used by the trained Siamese model for feature extraction. The feature extraction by the Siamese model helps in reducing ID Switch.
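To make that interface concrete, here is a hedged sketch assuming the layout of the original nwojke/deep_sort repository: DeepSORT consumes Detection objects, so whatever produces the boxes must keep producing them every frame. A single hand-drawn ROI gives you a box for one frame only, which is why a detector is still needed. The `encoder` (appearance feature extractor) is assumed to be created elsewhere:

```python
from deep_sort import nn_matching
from deep_sort.tracker import Tracker
from deep_sort.detection import Detection

metric = nn_matching.NearestNeighborDistanceMetric("cosine", matching_threshold=0.2)
tracker = Tracker(metric)

def update_tracker(frame, boxes_tlwh, encoder):
    """boxes_tlwh: (x, y, w, h) boxes from *some* detector for this frame."""
    features = encoder(frame, boxes_tlwh)      # Siamese appearance embeddings
    detections = [Detection(box, 1.0, feat)
                  for box, feat in zip(boxes_tlwh, features)]
    tracker.predict()                          # Kalman filter prediction step
    tracker.update(detections)                 # association + correction step
    return tracker.tracks
```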
If you are only interested in tracking, and ID switches in case of occlusion are not your concern, then you can have a look at CenterTrack. It does joint detection and tracking in a single model, so you can avoid training a model from scratch: the authors provide pre-trained models for tracking both pedestrians and vehicles. Compared to DeepSORT, CenterTrack is pretty fast.
[Sorry for the late reply] I think you should try a Siamese network for tracking by selecting the ROI region. You can find many variants at the link below:
https://github.com/HonglinChu/SiamTrackers.

Train A Custom Object Detection Model with YOLO v5

I'm trying to train a model with YOLO v5 to detect multiple objects on sales flyers. Each image in the training dataset contains only one object and, obviously, a single bounding box.
I'm wondering whether that will hurt the performance of the model, because what I'm ultimately trying to do is detect multiple objects on each sales flyer.
Thank you for your help.
It will probably lower your AP if you work like this, but give it a try. It really depends on your training images and your data augmentations. I don't know about YOLOv5, but YOLOv4 has its Mosaic data augmentation, which I guess will address your problem in a way.
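For reference, a multi-object image in YOLO's label format is just extra lines in the per-image .txt file, so single-object images and any multi-object flyers you add later mix freely. A small sketch of the format (class indices, boxes, and the filename are hypothetical; each line is class, center-x, center-y, width, height, normalized to 0-1):

```python
# Write a YOLO-format label file containing two objects for one image.
boxes = [
    (0, 0.25, 0.40, 0.20, 0.15),  # e.g. a product photo
    (1, 0.70, 0.55, 0.18, 0.10),  # e.g. a price tag
]
with open("flyer_001.txt", "w") as f:
    for cls, xc, yc, w, h in boxes:
        f.write(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")
```

(YOLOv5's default hyperparameters also enable mosaic augmentation, which stitches four training images together, so multiple objects per training mosaic come essentially for free.)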

How to train Tensorflow Object Detection images that do not contain objects?

I am training an object detection network using Tensorflow's object detection,
https://github.com/tensorflow/models/tree/master/research/object_detection
I can successfully train a network based on my own images and labels.
However, I have a large dataset of images that do not contain any of my labeled objects, and I want to be able to train the network to not detect anything in these images.
From what I understand, with TensorFlow object detection I need to give it a set of images and corresponding XML files that box and label the objects in each image. The scripts convert the XML to CSV and then to another format for training, and they do not allow XML files that have no objects.
How can I give it images and XML files that contain no objects?
Or, how does the network learn what is not an object?
For example, if you want to detect "hot dogs" you can train it with a set of images containing hot dogs. But how do you train it on what is not a hot dog?
An Object Detection CNN can learn what is not an object, simply by letting it see examples of images without any labels.
There are two main architecture types:
two-stage, where the first stage proposes objects/regions (RPN) and the second performs classification and bounding-box refinement;
one-stage, which directly classifies and regresses bounding boxes based on the feature vector corresponding to a certain cell in the feature map.
In either case, there is a part responsible for deciding what is an object and what is not. In an RPN you have an "objectness" score, and in one-stage detectors there is the classification confidence, where you usually have a background class (i.e. everything which is not one of the supported classes).
So in both cases, when a specific region in an image doesn't contain any supported class, you teach the CNN to decrease the objectness score or to increase the background confidence accordingly.
You might want to take a look at this solution.
For the TensorFlow Object Detection API to include your negative examples, you need to add them to the CSV file you have created from the XML, either by modifying the script that generates the CSV file or by adding the examples afterwards.
To generate XML files without class labels in LabelImg, press "Verify Image".
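As a hedged sketch of what a negative example ultimately looks like in the TFRecord (the feature keys follow the standard TF Object Detection API create_*_tf_record scripts; the box and class lists are simply left empty):

```python
import tensorflow as tf

def negative_example(image_path, width, height):
    # Encode an image that contains none of the labeled objects.
    with open(image_path, "rb") as f:
        encoded = f.read()
    feature = {
        "image/encoded": tf.train.Feature(bytes_list=tf.train.BytesList(value=[encoded])),
        "image/format":  tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"jpeg"])),
        "image/width":   tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "image/height":  tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        # Empty lists = "nothing to detect in this image".
        "image/object/bbox/xmin": tf.train.Feature(float_list=tf.train.FloatList(value=[])),
        "image/object/bbox/xmax": tf.train.Feature(float_list=tf.train.FloatList(value=[])),
        "image/object/bbox/ymin": tf.train.Feature(float_list=tf.train.FloatList(value=[])),
        "image/object/bbox/ymax": tf.train.Feature(float_list=tf.train.FloatList(value=[])),
        "image/object/class/text":  tf.train.Feature(bytes_list=tf.train.BytesList(value=[])),
        "image/object/class/label": tf.train.Feature(int64_list=tf.train.Int64List(value=[])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
```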

Does inception model label multiple object in one image?

I used retrain.py to train TensorFlow on my own dataset of traffic signs, but it seems it doesn't capture multiple objects in one image. I am using label_image.py to detect the objects in my image. I have an image with two road signs that both exist in my dataset, but I get only one sign with high accuracy; it doesn't detect the other sign.
You have misunderstood what a classification CNN does. Inception is built and trained to classify an image, not objects in an image. For this reason you will only get a single result from label_image.py, as it uses softmax to generate a confidence that the image as a whole belongs to a certain class.
It does not identify individual objects, as I explained to you in another question here: Save Image of Detected object from image using Tensor-flow
If you are trying to detect multiple signs then you will need to use object detection models.
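A tiny sketch of why the classifier behaves this way: softmax turns the network's scores into a single probability distribution for the whole image, so scripts in the style of label_image.py just report the arg-max (the class names below are hypothetical):

```python
import numpy as np

logits = np.array([3.2, 0.4, 1.1])             # scores for stop / yield / speed-limit
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: probabilities sum to 1
print(probs.argmax())                          # one winning class per image,
                                               # no matter how many signs it shows
```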
