Bounding boxes using tensorflow and inception-v3 - python

Is it possible to have bounding boxes prediction using TensorFlow?
I found TensorBox on github but I'm looking for a better supported or maybe official way to address this problem.
I need to retrain the model for my own classes.

It is unclear what exactly you mean. Do you need object detection? I assume so from the mention of 'bounding boxes'. If so, Inception networks are not directly applicable to your task: they are classification networks.
You should look at object detection models, like the Single Shot Detector (SSD) or You Only Look Once (YOLO). They often reuse pre-trained convolutional layers from classification networks, but add extra layers on top of them. If you want Inception (aka GoogLeNet), YOLO is based on it. Take a look at this implementation: https://github.com/thtrieu/darkflow or any other you can find on Google.
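For a rough idea, darkflow's Python API is typically used like the sketch below (the option keys and return format follow its README; the cfg/weights paths are placeholders):

    import cv2
    from darkflow.net.build import TFNet

    # Paths are placeholders; point them at your downloaded cfg and weights.
    options = {"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.5}
    tfnet = TFNet(options)

    img = cv2.imread("sample_img/sample_dog.jpg")
    predictions = tfnet.return_predict(img)
    # Each prediction is a dict like:
    # {"label": "dog", "confidence": 0.9,
    #  "topleft": {"x": ..., "y": ...}, "bottomright": {"x": ..., "y": ...}}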

The COCO 2016 winner for object detection was implemented in TensorFlow. Some state-of-the-art techniques are Faster R-CNN, R-FCN, and SSD. Check the slides from http://image-net.org/challenges/talks/2016/GRMI-COCO-slidedeck.pdf (slide 14 has the key TensorFlow ops you need to recreate this pipeline).
Edit 6/19/2017:
TensorFlow has released the Object Detection API, which ships models that predict bounding boxes:
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
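For reference, inference with a model exported by that API follows roughly this TF1-era pattern (the tensor names are the ones the API exports; the file paths are placeholders):

    import numpy as np
    import tensorflow as tf
    from PIL import Image

    # Load a frozen detection graph exported by the Object Detection API.
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile("frozen_inference_graph.pb", "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")

    image_np = np.array(Image.open("test.jpg"))

    with tf.Session(graph=graph) as sess:
        boxes, scores, classes = sess.run(
            ["detection_boxes:0", "detection_scores:0", "detection_classes:0"],
            feed_dict={"image_tensor:0": image_np[None, ...]})  # batch of one
    # boxes holds [ymin, xmin, ymax, xmax] rows, normalized to [0, 1].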

Related

How to implement RPN in Faster RCNN for object detection?

I am trying to implement Faster R-CNN to identify airplanes in images. I am stuck at implementing the Region Proposal Network (RPN). How can I implement an RPN and train it to produce bounding box proposals with a Python script?
There are plenty of ready-to-use implementations of various neural networks, including Faster R-CNN. Consider using DL frameworks such as PyTorch or Keras.
For example, see this PyTorch tutorial on fine-tuning the Mask R-CNN model.
Faster R-CNN is a two-stage object detection model: the first stage is an RPN (Region Proposal Network), and the second is a classifier. For your task, you can ignore the second stage if you don't need it; a short sketch follows the links below.
Some implementations:
Faster R-CNN in PyTorch
Faster R-CNN in Keras
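As a concrete starting point, here is a hedged sketch of the torchvision route (the weights argument name varies across torchvision versions, and num_classes=2 assumes airplane vs. background):

    import torchvision
    from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

    # Faster R-CNN pre-trained on COCO; its first stage is the RPN.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # Swap the second-stage box head for 2 classes: background + airplane.
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

    # The RPN is exposed as model.rpn and is optimized jointly with the ROI
    # heads during training, so you rarely implement it from scratch.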

Can I use DeepSORT without deep learning detection such as YOLO?

I am new to computer vision and have not yet tried any kind of neural-network detector such as YOLO; I would like to do object tracking before entering the field of detection. I started reading about Deep SORT, and all the projects use deep learning detections that need training. My question is: can I give an ROI result to my Deep SORT tracker instead of YOLO detections, so that it keeps tracking the object selected via the ROI?
Here is a link where I found information about the DeepSORT code: DeepSORT: Deep Learning to Track Custom Objects in a Video.
In DeepSORT, you need detections in order to perform tracking; it is a tracking-by-detection method. The detection results are input to the Kalman filter component of DeepSORT, which generates the tracking predictions. The bounding boxes from detection are also used to extract RoI crops from the input image, and these crops are fed to a trained Siamese model for feature extraction. The features from the Siamese model help reduce ID switches.
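To make the input format concrete, here is a hedged sketch using the community deep-sort-realtime package (the DeepSort constructor and update_tracks signature are assumed from that project's README). It feeds a hand-drawn ROI once, in place of detector output:

    import cv2
    from deep_sort_realtime.deepsort_tracker import DeepSort

    # n_init=1 confirms a track on its first hit, since we supply only one box.
    tracker = DeepSort(max_age=30, n_init=1)
    cap = cv2.VideoCapture("video.mp4")
    ok, frame = cap.read()

    # A hand-drawn ROI stands in for detector output on the first frame only.
    x, y, w, h = cv2.selectROI("select object", frame)
    first = True

    while ok:
        # DeepSORT consumes ([left, top, width, height], confidence, class) triples.
        # Without a detector there are no fresh detections after the first frame,
        # so the Kalman filter only coasts and drops the track after max_age.
        detections = [([x, y, w, h], 1.0, "object")] if first else []
        first = False
        tracks = tracker.update_tracks(detections, frame=frame)
        for t in tracks:
            if t.is_confirmed():
                print("track", t.track_id, t.to_ltrb())
        ok, frame = cap.read()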
If you are only interested in tracking, and ID switches in case of occlusion are not a concern, then you can have a look at CenterTrack. It does joint detection and tracking in a single model, so you can avoid training a model from scratch. The authors provide pre-trained models for tracking both pedestrians and vehicles. Compared to DeepSORT, CenterTrack is also quite fast.
[Sorry for the late reply] I think you should try a Siamese network for tracking by selecting the ROI region. You can find many variants at this link:
https://github.com/HonglinChu/SiamTrackers

How to define new models in Tensorflow Object Detection API?

The Tensorflow Object Detection API is a marvelous resource and a unique piece of well-documented code. Its performance on object detection encourages me to use this API for detecting object poses, similar to Poirson et al.
In the case of the faster-rcnn meta-architecture, pose detection requires adding a new regression layer alongside the bbox classification & regression layers, and modifying the ground-truth feeding pipeline.
So, is there an easy way to modify the networks? Or should I dig into the code and make the proper modifications myself, which seems challenging? Any sample work or guidance will be appreciated.
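There is no ready-made switch for this in the API, but the idea of the extra head can be sketched framework-agnostically in Keras (this illustrates the concept only, not the Object Detection API's internal mechanism; num_classes and the pose parameterization are placeholders):

    import tensorflow as tf
    from tensorflow.keras import layers, Model

    num_classes = 20      # placeholder
    num_pose_bins = 8     # placeholder, e.g. discretized viewpoint bins

    backbone = tf.keras.applications.InceptionV3(
        include_top=False, pooling="avg", input_shape=(299, 299, 3))

    features = backbone.output
    cls_head = layers.Dense(num_classes, activation="softmax", name="cls")(features)
    box_head = layers.Dense(4, name="box")(features)                  # bbox regression
    pose_head = layers.Dense(num_pose_bins, activation="softmax",
                             name="pose")(features)                   # the new head

    model = Model(backbone.input, [cls_head, box_head, pose_head])
    # The ground-truth pipeline must now also feed pose targets for the "pose" output.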

Can Tensorflow be used to detect if a particular feature exists in an image?

For example, using OpenCV and haarcascade_frontalface_default.xml, we can detect whether a face exists in an image. I would like to know if such a thing (detecting an object of interest) is possible with TensorFlow and, if yes, how?
Yes and no. TensorFlow is a graph computation library mostly suited for neural networks.
You can create a neural network that determines whether a face is in the image, and you can find existing implementations that use TensorFlow.
There is, however, no built-in Haar-feature-based cascade classifier in TensorFlow.
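A minimal sketch of the neural-network route, assuming you have a labeled face/no-face dataset of 64x64 crops:

    import tensorflow as tf

    # Tiny binary classifier: outputs P(a face is present in the crop).
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(train_images, train_labels, epochs=10)  # plug in your dataset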

How to create bounding boxes around the ROIs using TensorFlow

I'm using Inception v3 and TensorFlow to identify some objects within an image.
However, it just creates a list of probable objects, and I need it to tell me their positions in the image.
I'm following the flowers tutorial: https://www.tensorflow.org/versions/r0.9/how_tos/image_retraining/index.html
    bazel-bin/tensorflow/examples/image_retraining/retrain --image_dir ~/flower_photos
Inception is a classification network, not a localization network.
You need another architecture to predict the bounding boxes, like R-CNN and its newer (and faster) variants (Fast R-CNN, Faster R-CNN).
Alternatively, if you want to use Inception and you have a training set annotated with classes and bounding box coordinates, you can add a regression head to Inception and make the network learn to regress the bounding box coordinates.
It's the same idea as transfer learning, but you use the last convolutional layer's output as a feature extractor and train this new head to regress 4 coordinates + 1 class for every bounding box in your training set.
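A hedged sketch of that regression-head idea, for the simplified one-box-per-image case (NUM_CLASSES and the normalized box targets are assumptions):

    import tensorflow as tf

    NUM_CLASSES = 5  # placeholder

    # Frozen Inception as a feature extractor, per the transfer-learning analogy.
    base = tf.keras.applications.InceptionV3(
        include_top=False, pooling="avg", input_shape=(299, 299, 3))
    base.trainable = False

    cls_out = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax",
                                    name="cls")(base.output)
    box_out = tf.keras.layers.Dense(4, activation="sigmoid",
                                    name="box")(base.output)  # [x, y, w, h] in [0, 1]

    model = tf.keras.Model(base.input, [cls_out, box_out])
    model.compile(optimizer="adam",
                  loss={"cls": "sparse_categorical_crossentropy", "box": "mse"})
    # Targets per image: one class index and one 4-vector of normalized coordinates.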
By default, Inception does not output coordinates. There are specific tools for that, like Faster R-CNN, available for Caffe.
If you want to stick with TensorFlow, you can retrain Inception to output coordinates if you have human-annotated images.
Putting bounding boxes around objects is usually called detection in the lingo of the field, and there is a whole category of networks designed for it. There is a separate category for detection in the PASCAL VOC competition, and that's a good place to find good detection networks.
My favorite detection network (currently the leader on the 2012 PASCAL VOC dataset) is YOLO, which starts with a typical classifier but then adds extra layers to support bounding boxes. Instead of just returning a class, it produces a downsampled grid of the original image, where each grid cell has its own class. It then has a regression layer that predicts the exact position and size of the bounding boxes. You can start with a pre-trained classifier, then modify it into a YOLO network and retrain it. The procedure is described in the original YOLO paper.
I like YOLO because it has a simple structure compared to other detection networks, it lets you use transfer learning from classification networks (which makes it easier to train), and its detection speed is very fast; it was actually developed for real-time detection in video.
There is an implementation of YOLO in TensorFlow if you would like to avoid the custom Darknet framework used by the YOLO authors.
