How to define new models in the TensorFlow Object Detection API? - python

The TensorFlow Object Detection API is a marvelous resource and a unique piece of well-documented code. Its performance on object detection encourages me to use this API for detecting object poses, similar to Poirson et al.
In the case of the Faster R-CNN meta-architecture, pose detection requires adding a new regression layer alongside the existing bbox classification & regression layers, and modifying the ground-truth feeding pipeline.
So, is there an easy way to modify the networks? Or should I dig into the code and make the proper modifications myself, which seems challenging? Any sample work or guidance would be appreciated.
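For context, here is a minimal Keras sketch of the architectural idea only, not the Object Detection API's internals; the feature dimension, class count, and the 4-value pose parameterization are all illustrative assumptions:

```python
import tensorflow as tf

# Hypothetical second-stage heads: `roi_features` stands in for the pooled
# per-ROI feature vector of Faster R-CNN; 1024 dims and 20 classes are assumptions.
num_classes = 20
roi_features = tf.keras.Input(shape=(1024,), name="roi_features")

# The two heads the API already has: classification and box regression.
cls_logits = tf.keras.layers.Dense(num_classes + 1, name="cls_head")(roi_features)
box_deltas = tf.keras.layers.Dense(4 * num_classes, name="box_head")(roi_features)

# The new head: e.g. 4 pose values per class (the exact parameterization is an assumption).
pose_preds = tf.keras.layers.Dense(4 * num_classes, name="pose_head")(roi_features)

heads = tf.keras.Model(roi_features, [cls_logits, box_deltas, pose_preds])
```

The harder part, as noted above, is the ground-truth pipeline: each box would also need a pose target, which means touching the API's input decoders and loss wiring.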

Related

Conditional Detection using Tensorflow Object Detection API

I am detecting hand gestures using the TensorFlow Object Detection API, but I want to apply a condition to this detection: detect a person's hand gesture only when the person is wearing a whistle, otherwise no detection. Can I achieve this using the TensorFlow Object Detection API? If not, please suggest some good methods of achieving this. Thanks :)
The first thing you should do is customize a pre-trained TensorFlow model with one class; using this technique you can generate a model which has only one class, named "hand" for example. But how to do this? Don't worry, just follow the steps below:
Download the TensorFlow models repository from GitHub and build it (you can clone it with git instead of downloading it).
Once the build is done, label your training images with the labelImg annotation tool; the annotations (XML by default) are then converted into a CSV file.
Next, convert this CSV file into a TFRecord file, as sketched after these steps.
Then train your own model.
Once your model has been trained, export it and simply perform the target detection.
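A minimal sketch of the CSV-to-TFRecord conversion mentioned above, assuming one annotation row per box and a single "hand" class (the helper name and column layout are assumptions; the feature keys follow the Object Detection API's standard TFRecord layout):

```python
import tensorflow as tf

def make_example(filename, image_bytes, width, height, boxes, labels):
    """Build one tf.train.Example in the layout the Object Detection API expects.
    `boxes` are per-object [xmin, ymin, xmax, ymax] in pixels; normalized below."""
    feature = {
        "image/height": tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        "image/width": tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        "image/filename": tf.train.Feature(bytes_list=tf.train.BytesList(value=[filename.encode()])),
        "image/encoded": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "image/format": tf.train.Feature(bytes_list=tf.train.BytesList(value=[b"jpeg"])),
        "image/object/bbox/xmin": tf.train.Feature(float_list=tf.train.FloatList(value=[b[0] / width for b in boxes])),
        "image/object/bbox/ymin": tf.train.Feature(float_list=tf.train.FloatList(value=[b[1] / height for b in boxes])),
        "image/object/bbox/xmax": tf.train.Feature(float_list=tf.train.FloatList(value=[b[2] / width for b in boxes])),
        "image/object/bbox/ymax": tf.train.Feature(float_list=tf.train.FloatList(value=[b[3] / height for b in boxes])),
        "image/object/class/text": tf.train.Feature(bytes_list=tf.train.BytesList(value=[l.encode() for l in labels])),
        # Single-class setup: every object maps to label-map id 1 ("hand").
        "image/object/class/label": tf.train.Feature(int64_list=tf.train.Int64List(value=[1] * len(labels))),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Usage: serialize each example into the record file.
# with tf.io.TFRecordWriter("train.record") as writer:
#     writer.write(make_example(...).SerializeToString())
```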
It is worth noting that if you don't have an Nvidia GPU, you should use Google Colab, because this process is very time-consuming on a CPU.
Best wishes to you.
You can prepare the dataset by labeling the hand in two classes: (1) wearing a whistle and (2) not wearing a whistle. After that, train the classification model on these classes, and don't forget to preprocess the images before training.
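If you go this two-class route, the conditional behaviour becomes a simple post-processing filter; a sketch, assuming class id 1 is "hand wearing a whistle" (ids and threshold are placeholders):

```python
import numpy as np

HAND_WITH_WHISTLE = 1  # assumed label-map id; e.g. 2 = hand without whistle

def keep_conditional(boxes, classes, scores, score_thresh=0.5):
    """Return only hand detections from the 'wearing a whistle' class."""
    keep = (classes == HAND_WITH_WHISTLE) & (scores >= score_thresh)
    return boxes[keep], classes[keep], scores[keep]
```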

Can I use DeepSORT without deep learning detection such as YOLO?

I am new to computer vision and have not yet tried any kind of neural-network detector such as YOLO; however, I wish to do object tracking before entering the field of detection. I started reading about Deep SORT, and all the projects use deep-learning detections that need training. My question is: can I give an ROI result to my Deep SORT tracker instead of YOLO detections, and have it continue tracking the object selected with the ROI?
Here is a link where I found information about the DeepSORT code: DeepSORT: Deep Learning to Track Custom Objects in a Video
In DeepSORT, you need detections in order to perform tracking; it is a tracking-by-detection method. The detection results are input to the Kalman filter component of DeepSORT, and the filter generates the tracking predictions. Also, the bounding boxes from detection are used to extract RoI crops from the input image. These image crops are fed to the trained Siamese model for feature extraction, which helps in reducing ID switches.
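To answer the question directly: the detections do not have to come from YOLO; you can construct them from your own ROIs. A sketch against the widely used nwojke/deep_sort reference implementation (module paths are from that repo; the constant placeholder feature is an assumption, and without a real appearance embedding the appearance matching is effectively disabled, so the tracker behaves much like plain SORT):

```python
import numpy as np
from deep_sort import nn_matching
from deep_sort.detection import Detection
from deep_sort.tracker import Tracker

# Appearance metric + tracker; 0.2 is an assumed cosine-distance threshold.
metric = nn_matching.NearestNeighborDistanceMetric("cosine", matching_threshold=0.2)
tracker = Tracker(metric)

def update_with_rois(rois):
    """`rois` is a list of (x, y, w, h) boxes you produced yourself.
    np.ones(128) is a dummy stand-in for the Siamese appearance feature."""
    detections = [Detection(np.asarray(r, dtype=float), 1.0, np.ones(128)) for r in rois]
    tracker.predict()
    tracker.update(detections)
    return [(t.track_id, t.to_tlwh()) for t in tracker.tracks if t.is_confirmed()]
```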
If you are only interested in tracking, and ID switches in case of occlusion are not your concern, then you can have a look at CenterTrack. It does joint detection and tracking in a single model. In this case, you can avoid training a model from scratch: the authors provide pre-trained models for tracking both pedestrians and vehicles. Compared to DeepSORT, CenterTrack is pretty fast.
[Sorry for the late reply] I think you should try a Siamese network for tracking by selecting the ROI region. You can find many variants at this link:
https://github.com/HonglinChu/SiamTrackers.

Implementation of Combining Faster R-CNN and U-net Network for instance segmentation?

I saw an article discussing "Combining Faster R-CNN and U-net Network for Efficient Whole Heart Segmentation"; has anyone seen such an implementation on GitHub or elsewhere?
I want to create an instance segmentation model which can perform well on biological images (Arabidopsis seedlings).
I would appreciate it if someone could give me some feedback :)

Bounding boxes using tensorflow and inception-v3

Is it possible to have bounding boxes prediction using TensorFlow?
I found TensorBox on GitHub, but I'm looking for a better-supported or maybe official way to address this problem.
I need to retrain the model for my own classes.
It is unclear what exactly you mean. Do you need object detection? I assume so from the 'bounding boxes'. If so, Inception networks are not directly applicable to your task; they are classification networks.
You should look at object detection models, like the Single Shot Detector (SSD) or You Only Look Once (YOLO). They often use pre-trained convolutional layers from classification networks but have additional layers on top. If you want Inception (aka GoogLeNet), YOLO is based on it. Take a look at this implementation: https://github.com/thtrieu/darkflow or any other you can find on Google.
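A minimal darkflow usage sketch (the cfg/weights paths are placeholders to be downloaded per the repo's README):

```python
import cv2
from darkflow.net.build import TFNet

# Placeholder paths; fetch the cfg and weights as described in the darkflow README.
options = {"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.5}
tfnet = TFNet(options)

img = cv2.imread("dog.jpg")
# Each result is a dict with 'label', 'confidence', 'topleft', 'bottomright'.
results = tfnet.return_predict(img)
for r in results:
    print(r["label"], r["confidence"], r["topleft"], r["bottomright"])
```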
The COCO 2016 winner for object detection was implemented in TensorFlow. Some state-of-the-art techniques are Faster R-CNN, R-FCN and SSD. Check the slides from http://image-net.org/challenges/talks/2016/GRMI-COCO-slidedeck.pdf (slide 14 has the key TensorFlow ops for you to recreate this pipeline; a sketch of two of them follows this answer).
Edit 6/19/2017:
TensorFlow released some techniques to predict bounding boxes (the Object Detection API):
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
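For reference, two of the key TensorFlow ops such pipelines are built from, shown with toy data: non-max suppression to prune overlapping boxes, and `crop_and_resize` for the ROI cropping used in Faster R-CNN-style second stages.

```python
import tensorflow as tf

# Toy boxes in [y1, x1, y2, x2] order (normalized), with confidence scores.
boxes = tf.constant([[0.10, 0.10, 0.50, 0.50],
                     [0.12, 0.11, 0.52, 0.49],
                     [0.60, 0.60, 0.90, 0.90]])
scores = tf.constant([0.9, 0.8, 0.7])

# Keep at most 10 boxes, suppressing overlaps above 0.5 IoU.
keep = tf.image.non_max_suppression(boxes, scores, max_output_size=10, iou_threshold=0.5)
kept_boxes = tf.gather(boxes, keep)

# ROI cropping: extract 7x7 crops from a batch-of-one feature map / image.
image = tf.random.uniform([1, 64, 64, 3])
crops = tf.image.crop_and_resize(
    image, kept_boxes,
    box_indices=tf.zeros([tf.shape(kept_boxes)[0]], dtype=tf.int32),
    crop_size=[7, 7])
```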

How to do object detection using CNN's features in tensorflow?

I am trying to make an end-to-end unified model that detects (localizes) the object in an image. The object itself can be of many types, like "text in the wild", but the surrounding features of the object should determine where the region of interest is.
For example, detecting a human face without considering the features of the face itself, i.e., locating it at some range of distance above the neck.
I expect the output to be the coordinates of the object, or bounding boxes in the ImageNet format: [xmin, ymin, xmax, ymax].
I have a dataset of 500 images. Are there any examples of object detection in TensorFlow based on surrounding features, i.e., the feature maps from conv1 or conv2?
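For reference, pulling early ("conv1"/"conv2"-style) feature maps out of a pretrained backbone in Keras is straightforward; a minimal sketch, assuming VGG16 as the backbone and `block2_conv2` as the layer of interest (both are illustrative choices, not part of the question):

```python
import tensorflow as tf

# Pretrained backbone; pull the second conv block's output as the feature map.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
conv2_maps = tf.keras.Model(inputs=base.input,
                            outputs=base.get_layer("block2_conv2").output)

# A detection head could then be trained on these early, context-heavy maps.
features = conv2_maps(tf.random.uniform([1, 224, 224, 3]))  # random image, shape demo only
print(features.shape)  # (1, 112, 112, 128)
```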
There is a TensorFlow-based framework for object detection/localization that you can check out:
https://github.com/Russell91/TensorBox
Though, I am not sure that 500 images would be enough to successfully retrain the provided model(s).
Object detection using deep learning is broadly classified into one-stage detectors (YOLO, SSD) and two-stage detectors like Faster R-CNN. Google's repo [1] contains pre-trained models for various detection architectures.
You could pick a pre-trained model and then train it on your dataset. The two-stage model is modular, and you have a choice of different feature extractors depending on whether speed or accuracy is crucial for you.
[1] Google's object detection repository
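As a minimal sketch of that workflow's inference side, using an SSD MobileNet v2 checkpoint served via TensorFlow Hub (verify the model handle and output keys on the TF Hub page; they are stated here from memory):

```python
import tensorflow as tf
import tensorflow_hub as hub

# COCO-trained SSD MobileNet v2 from TF Hub.
detector = hub.load("https://tfhub.dev/tensorflow/ssd_mobilenet_v2/2")

image = tf.io.decode_jpeg(tf.io.read_file("test.jpg"))  # uint8 HxWx3
outputs = detector(image[tf.newaxis, ...])               # batch of one

boxes = outputs["detection_boxes"][0].numpy()    # [ymin, xmin, ymax, xmax], normalized
scores = outputs["detection_scores"][0].numpy()
classes = outputs["detection_classes"][0].numpy()  # COCO class ids
print(boxes[scores > 0.5], classes[scores > 0.5])
```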
