How to implement RPN in Faster RCNN for object detection? - python

I am trying to implement Faster RCNN for identifying airplanes in images. I am stuck at implementing the Region Proposal Network (RPN). How can I implement an RPN and train it to produce bounding box proposals using a Python script?

There are plenty of ready-to-use implementations of various neural networks, including Faster RCNN. Consider using DL frameworks such as PyTorch or Keras.
For example, see this PyTorch tutorial on fine-tuning the Mask R-CNN model.
Faster RCNN is a two-stage object detection model, where the first stage is an RPN (Region Proposal Network) and the second is a classifier. For your task, you can ignore the second stage if you don't need it; a minimal RPN head sketch follows the links below.
Some implementations:
Faster RCNN in Pytorch
Faster RCNN in Keras
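If you do want to build the RPN yourself, here is a minimal PyTorch sketch of just the RPN head. The in_channels=512 and num_anchors=9 defaults are illustrative (a VGG16-style backbone with 3 scales x 3 aspect ratios); anchor generation, anchor-to-ground-truth assignment, and the losses from the Faster RCNN paper are omitted:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Minimal RPN head: a 3x3 conv slides over a backbone feature map, then
    1x1 convs predict, for each of num_anchors anchors per location, an
    objectness logit and 4 box-regression deltas (dx, dy, dw, dh)."""

    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
        self.objectness = nn.Conv2d(in_channels, num_anchors, kernel_size=1)
        self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)

    def forward(self, feature_map):
        t = torch.relu(self.conv(feature_map))
        return self.objectness(t), self.bbox_deltas(t)

# e.g. on a VGG16-style feature map:
# scores, deltas = RPNHead()(torch.randn(1, 512, 50, 50))
```

At training time you would label each anchor as positive/negative by its IoU with the ground-truth airplane boxes, then optimize a binary cross-entropy loss on the objectness logits plus a smooth L1 loss on the deltas of positive anchors, as described in the Faster RCNN paper.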

Related

Can you integrate opencv SIFT with a tensorflow model?

I am trying to create a CNN, but using the SIFT algorithm instead of any pooling layers.
The problem is that I can't seem to find any Python implementation of the algorithm in TensorFlow or PyTorch. The only implementation I have seen is with OpenCV.
Is it possible to use the opencv SIFT implementation as a layer in a Tensorflow CNN Model?
If so, how would you go about creating it?
While this is an interesting idea, I believe it has numerous issues that make it highly impractical, if not impossible.
Layers of a network have to be differentiable with regard to their input so that gradients can be calculated, which are then used to update the weights.
While I think it might be possible to write a fully differentiable SIFT implementation, this alone would be impractical.
Further, SIFT does not produce a constant number of outputs and takes a long time to compute, which would slow down training a lot.
The only practical way to use SIFT with neural networks would be to run SIFT first and then use the top N detected keypoints as input to the first layer. However, I'm not sure this would be successful.
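A minimal sketch of that last approach, assuming OpenCV's SIFT and a small Keras classifier on top (the keypoint count N, layer sizes, and class count are arbitrary placeholders):

```python
import cv2
import numpy as np
import tensorflow as tf

N = 64  # number of keypoints to keep; an arbitrary illustrative choice

def top_n_sift_descriptors(gray_image, n=N):
    # SIFT runs outside the graph, as a fixed (non-trainable) preprocessing step.
    sift = cv2.SIFT_create()  # OpenCV >= 4.4; older builds need opencv-contrib
    keypoints, descriptors = sift.detectAndCompute(gray_image, None)
    if descriptors is None:  # no keypoints found
        return np.zeros((n, 128), dtype=np.float32)
    order = np.argsort([-kp.response for kp in keypoints])[:n]  # strongest first
    feats = descriptors[order].astype(np.float32)
    if feats.shape[0] < n:  # pad to a fixed size for the network
        feats = np.vstack([feats, np.zeros((n - feats.shape[0], 128), np.float32)])
    return feats

# A small trainable classifier on top of the fixed SIFT features
# (layer sizes and the 10-class output are placeholders):
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(N, 128)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```

Note that the gradients stop at the descriptors: SIFT itself is never updated, only the layers above it.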

How to improve YOLOv3 detection time? (OpenCV + Python)

I'm using a custom-trained YOLOv3 model with OpenCV 4.2.0 compiled with CUDA. When testing the code in Python, I'm running OpenCV on the GPU (GTX 1050 Ti), but detection on a single image (416px x 416px) takes 0.055 s (~20 FPS). My config file is set up for small-object detection, because I need to detect ~10px x 10px objects in 2500px x 2000px images, so I split the original image into 30 smaller pieces. My goal is to reach 0.013 s (~80 FPS) per 416px x 416px image. Is this possible in Python with OpenCV? If not, what is the proper way to do it?
PS. Currently, detection uses about 50% of the CPU, 5 GB of RAM and 6% of the GPU.
Some of the preferred ways to improve detection time with an already-trained YOLOv3 model are:
Quantisation: run inference with INT8 instead of FP32. You can use this repo for that purpose.
Use an inference accelerator such as TensorRT, since you're using an Nvidia GPU. The tool includes a good number of inference-oriented optimisations, along with INT8 and FP16 quantisation, to reduce detection time. This thread discusses YOLOv3 inference with TensorRT 5; use this repo for YOLOv3 on TensorRT 7.
Use an inference library such as tkDNN, a deep neural network library built with cuDNN and TensorRT primitives, specifically designed to work on NVIDIA Jetson boards.
If you're open to retraining the model, there are a few more options besides the ones above:
You can train the tinier versions rather than the full YOLO models; of course, this comes at the cost of a drop in accuracy/mAP. You can train tiny-YOLOv4 (the latest model) or tiny-YOLOv3.
Model pruning: if you can rank the neurons in the network by how much they contribute, you can remove the low-ranking neurons, resulting in a smaller and faster network. See this pruned YOLOv3 research paper and its implementation; this is another pruned YOLOv3 implementation.
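Since you already have OpenCV built with CUDA, one quick experiment before moving to TensorRT is OpenCV's own FP16 CUDA target. A minimal sketch with placeholder file names (FP16 only pays off on GPUs with fast half-precision support, so benchmark it against DNN_TARGET_CUDA on your 1050 Ti):

```python
import cv2

# Assumes an OpenCV >= 4.2 build with the CUDA DNN backend enabled.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")  # placeholder paths
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA_FP16)  # trades precision for speed

img = cv2.imread("tile_416.png")  # one 416x416 tile of the large image
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())
```

Since you split the large image into 30 tiles, batching them into a single forward pass with cv2.dnn.blobFromImages may also help amortize per-call overhead.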

Is it possible to remove categories in a pretrained tensorflow model?

I am currently using Tensorflow Object Detection API for my human detection app.
I tried filtering in the API itself, which worked, but I am still not content with it because it's slow. So I'm wondering if I could remove the other categories from the model itself to also make it faster.
If that is not possible, can you please give me other suggestions to make the API faster, since I will be using two cameras? Thanks in advance, and pardon my English :)
Your question addresses several topics around using pretrained neural network models.
Theoretical methods
In general, you can always neutralize categories by removing the corresponding neurons from the softmax layer and computing a new softmax only over the relevant rows of the weight matrix.
This method will certainly work (maybe that is what you meant by filtering), but it will not speed up the network's computation by much, since most of the FLOPs (multiplications and additions) remain in the layers below.
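As a concrete illustration, a minimal NumPy sketch (the weight-matrix orientation and the function name are assumptions; adjust to your model's final layer):

```python
import numpy as np

def reduced_softmax(features, W, b, keep):
    """Softmax over only the kept categories; `keep` holds their class indices.

    features: (d,) activations feeding the final layer
    W: (d, num_classes) weights, b: (num_classes,) biases
    (this orientation is an assumption; transpose the slice if yours differs)
    """
    logits = features @ W[:, keep] + b[keep]  # drop the unwanted classes' weights
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

# e.g. keep only 2 classes out of a 90-class head:
# probs = reduced_softmax(feat, W, b, keep=np.array([0, 5]))
```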
As with decision trees, pruning is possible but may reduce performance. I explain what pruning means below; note that accuracy on your categories may hold up, since you are not just trimming the network, you are also predicting fewer categories.
Transfer learning: transfer the learning to your problem; see Stanford's computer vision course here. What I've most often seen work well is keeping the convolutional layers as-is and fine-tuning on a medium-size dataset of the objects you'd like to detect.
I can add more theoretical methods if you request, but the above are the most common and accurate ones I know.
Practical methods
Make sure you are serving your TensorFlow model (e.g., with TensorFlow Serving) rather than just running inference from a Python script. This can significantly improve performance.
You can export the network's parameters and load them into a faster framework such as CNTK or Caffe. These frameworks run in C++/C# and can perform inference much faster. Make sure you load the weights correctly; some frameworks use a different tensor-dimension order when saving/loading (little/big-endian-like issues).
If your application performs inference on several images, you can distribute the computation across several GPUs. This can also be done in TensorFlow; see Using GPUs.
Pruning a neural network
Maybe this is the most interesting method of adapting big networks to simple tasks. You can see a beginner's guide here.
Pruning means removing parameters from your network, specifically whole nodes/neurons in a decision tree/neural network (respectively). To do that for object detection, the simplest way is as follows:
Randomly prune neurons from the fully connected layers.
Train one more epoch (or more) with a low learning rate, only on the objects you'd like to detect.
(Optional) Repeat the above several times, validate, and pick the best network.
The above procedure is the most basic one, but you can find plenty of papers proposing algorithms for it, for example Automated Pruning for Deep Neural Network Compression and An iterative pruning algorithm for feedforward neural networks. A sketch of the first two steps follows.
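A minimal Keras-style sketch of steps 1-2 (the fraction, seed, and function name are illustrative; the papers above rank neurons by contribution rather than dropping them at random):

```python
import numpy as np

def randomly_prune_neurons(dense_layer, fraction=0.3, seed=0):
    """Step 1 above: zero out a random subset of a Dense layer's neurons.

    Zeroing a neuron's incoming weights and bias removes its contribution;
    `fraction` is an illustrative choice.
    """
    w, b = dense_layer.get_weights()  # w: (in_dim, units), b: (units,)
    rng = np.random.default_rng(seed)
    drop = rng.choice(w.shape[1], size=int(fraction * w.shape[1]), replace=False)
    w[:, drop] = 0.0
    b[drop] = 0.0
    dense_layer.set_weights([w, b])

# Step 2: fine-tune briefly with a low learning rate on the target objects, e.g.
# model.compile(optimizer=tf.keras.optimizers.Adam(1e-5), loss="categorical_crossentropy")
# model.fit(target_only_dataset, epochs=1)
```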

How to define new models in Tensorflow Object Detection API?

The Tensorflow Object Detection API is a marvelous resource and a unique piece of well-documented code. Its performance on object detection encourages me to use this API for detecting object poses, similar to Poirson et al.
With the faster-rcnn meta-architecture, pose detection requires adding a new regression layer alongside the bbox classification & regression layers, and modifying the ground-truth feeding pipeline.
So, is there an easy way to modify the networks? Or should I dig into the code and make the proper modifications, which seems challenging? Any sample work or guidance would be appreciated.

Bounding boxes using tensorflow and inception-v3

Is it possible to do bounding box prediction using TensorFlow?
I found TensorBox on GitHub, but I'm looking for a better-supported or maybe official way to address this problem.
I need to retrain the model for my own classes.
It is unclear what exactly you mean; do you need object detection? I assume so from the 'bounding boxes'. If so, Inception networks are not directly applicable to your task: they are classification networks.
You should look at object detection models such as the Single Shot Detector (SSD) or You Only Look Once (YOLO). They often use pre-trained convolutional layers from classification networks, but add extra layers on top. If you want Inception (aka GoogLeNet), YOLO is based on it. Take a look at this implementation: https://github.com/thtrieu/darkflow or any other you can find on Google.
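If you try darkflow, prediction looks roughly like this, per its README (the cfg/weights paths and threshold are placeholders; the repo also documents training flags for retraining on your own classes):

```python
import cv2
from darkflow.net.build import TFNet

# Paths are placeholders; fetch the cfg/weights as described in the darkflow README.
options = {"model": "cfg/yolo.cfg", "load": "bin/yolo.weights", "threshold": 0.3}
tfnet = TFNet(options)

img = cv2.imread("example.jpg")
for pred in tfnet.return_predict(img):  # dicts with label, confidence, box corners
    print(pred["label"], pred["confidence"],
          (pred["topleft"]["x"], pred["topleft"]["y"]),
          (pred["bottomright"]["x"], pred["bottomright"]["y"]))
```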
The COCO 2016 winner for object detection was implemented in TensorFlow. Some state-of-the-art techniques are Faster R-CNN, R-FCN and SSD. Check the slides at http://image-net.org/challenges/talks/2016/GRMI-COCO-slidedeck.pdf (slide 14 has the key TensorFlow ops you need to recreate this pipeline).
Edit 6/19/2017:
TensorFlow released some techniques for predicting bounding boxes:
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
