Can you integrate opencv SIFT with a tensorflow model? - python

I am trying to create a CNN, but using the SIFT algorithm instead of any pooling layers.
Problem is I can't seem to find any Python implementation of the algorithm in Tensorflow or PyTorch. The only implementation I have seen of it is with opencv.
Is it possible to use the opencv SIFT implementation as a layer in a Tensorflow CNN Model?
If so, how would you go about creating it?

While this is an interesting idea, I believe it has several issues that make it somewhere between highly impractical and impossible.
Layers of a network have to be differentiable with respect to their input so that gradients can be computed and then used to update the weights.
While I think it might be possible to write a fully differentiable SIFT implementation, doing so would be impractical on its own.
Furthermore, SIFT does not produce a constant number of outputs and is expensive to compute, which would slow down training considerably.
The only practical way to combine SIFT with a neural network would be to run SIFT first and then feed the top N detected keypoints into the first layer. However, I'm not sure this would be successful.
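To make that last option concrete, here is a minimal sketch of using OpenCV SIFT purely as a preprocessing step (assuming OpenCV 4.4+, where SIFT is exposed as cv2.SIFT_create; the keypoint cap, model shape, and class count are arbitrary choices for illustration):

```python
import cv2
import numpy as np
import tensorflow as tf

N_KEYPOINTS = 64  # arbitrary cap on keypoints kept per image

def sift_features(gray_image):
    """Return an (N_KEYPOINTS, 128) array of SIFT descriptors, padded/truncated."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray_image, None)
    out = np.zeros((N_KEYPOINTS, 128), dtype=np.float32)
    if desc is not None:
        out[: min(len(desc), N_KEYPOINTS)] = desc[:N_KEYPOINTS]
    return out

# SIFT runs outside the differentiable graph; the network only sees its descriptors.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(N_KEYPOINTS, 128)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```

Since SIFT sits outside the graph, no gradients flow through it, which is exactly why it cannot act as a drop-in replacement for a pooling layer.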

Related

How to understand/debug/visualize U-Net segmentation results

I am training a U-Net architecture for a segmentation task. This is in Python using Keras. I have now run into an issue that I am trying to understand:
I have two very similar images from a microscopy image series (they are consecutive frames), where my current U-Net model performs very well on one but extremely poorly on the one immediately following it. However, there is little difference between the two to the eye, and their histograms also look very much alike. For some measurements the model performs well across the whole frame range, but for others this issue appears.
I am using data augmentation during training (histogram stretching, affine transformations, noise addition) and I am surprised that the model is still so brittle.
Since the U-Net is still mostly a black box to me, I want to find out what steps I can take to better understand the issue and then adjust the training/model accordingly.
I know there are ways to visualize what individual layers learn (e.g. as discussed in F. Chollet's book, see here), and I should be able to apply these to a U-Net, which is fully convolutional.
However, these kinds of methods are practically always discussed in the context of classification networks, not semantic segmentation.
So my question is:
Is this the best/most direct approach to reach an understanding of how U-Net models attain a segmentation result? If not, what are better ways to understand/debug U-Nets?
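For what it's worth, the Chollet-style visualization of intermediate activations mentioned in the question applies to a U-Net directly, because it only needs layer outputs. A minimal sketch, where the model file, layer name, and frame paths are placeholders you would replace with your own:

```python
import matplotlib.pyplot as plt
import tensorflow as tf

model = tf.keras.models.load_model("unet.h5")     # placeholder model file
good_frame = plt.imread("frame_0100.png")         # frame the model handles well (placeholder)
bad_frame = plt.imread("frame_0101.png")          # consecutive frame it fails on (placeholder)

# Expose one intermediate feature map of the U-Net (pick a name from model.summary()).
layer_name = "conv2d_5"                           # hypothetical layer name
activation_model = tf.keras.Model(model.input, model.get_layer(layer_name).output)

# Assumes the frames are 2D grayscale arrays, preprocessed exactly as during training.
maps_good = activation_model.predict(good_frame[None, ..., None])
maps_bad = activation_model.predict(bad_frame[None, ..., None])

# Compare a few channels for the two consecutive frames side by side.
for ch in range(4):
    plt.subplot(2, 4, ch + 1)
    plt.imshow(maps_good[0, :, :, ch])
    plt.subplot(2, 4, ch + 5)
    plt.imshow(maps_bad[0, :, :, ch])
plt.show()
```

Comparing where the activations diverge between the two frames is often more informative for segmentation than class-level saliency methods.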
I suggest you use the U-Net container on NGC https://ngc.nvidia.com/catalog/resources/nvidia:unet_industrial_for_tensorflow
I also suggest you read about Mixed Precision Training: https://arxiv.org/abs/1710.03740
https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/
Let me know how you are progressing, and if there is a public repo, I am happy to have a look.
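If you want to try the mixed-precision suggestion with plain tf.keras rather than the NGC container, recent TensorFlow versions (2.4+) expose it as a global dtype policy. A minimal sketch (the toy layers only stand in for your U-Net):

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute in float16 (Tensor Cores), keep variables in float32.
mixed_precision.set_global_policy("mixed_float16")

inputs = tf.keras.Input(shape=(256, 256, 1))
x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
# ... the rest of the U-Net encoder/decoder would go here ...
# Keep the final activation in float32 for numerical stability of the loss.
outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid", dtype="float32")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```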

Is it possible to remove categories in a pretrained tensorflow model?

I am currently using Tensorflow Object Detection API for my human detection app.
I tried filtering in the API itself, which worked, but I am still not satisfied with it because it's slow. So I'm wondering if I could remove the other categories in the model itself to also make it faster.
If it is not possible, can you please give me other suggestions to make the API faster since I will be using two cameras. Thanks in advance and also pardon my english :)
Your question addresses several topics around using pretrained neural network models.
Theoretical methods
In general, you can always neutralize categories by removing the corresponding neurons from the softmax layer and computing a new softmax only over the relevant rows of the weight matrix (see the sketch after this list).
This method will certainly work (maybe that is what you meant by filtering), but it will not accelerate the network's computation time by much, since most of the flops (multiplications and additions) remain.
As with decision trees, pruning is possible but may reduce performance. I explain pruning below; note that accuracy on your categories may be preserved, since you are not just trimming the network, you are also predicting fewer categories.
Use transfer learning for your problem. See Stanford's computer vision course here. What I have most often seen work well is keeping the convolutional layers as-is and preparing a medium-sized dataset of the objects you'd like to detect.
I will add more theoretical methods if you request, but the above are the most common and accurate ones I know.
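A hedged sketch of the first point, assuming a plain Keras Sequential classifier that ends in a Dense softmax (the Object Detection API's heads are structured differently, so treat this only as an illustration of the idea; the model path and kept indices are placeholders):

```python
import tensorflow as tf

keep = [0, 15, 27]  # hypothetical indices of the categories you want to keep

base = tf.keras.models.load_model("classifier.h5")   # placeholder path
dense = base.layers[-1]           # assumes a final Dense(num_classes, softmax) with bias
w, b = dense.get_weights()        # w: (features, num_classes), b: (num_classes,)

# Rebuild the head so it only computes scores for the kept categories.
new_head = tf.keras.layers.Dense(len(keep), activation="softmax")
trimmed = tf.keras.Sequential(base.layers[:-1] + [new_head])
trimmed.build(base.input_shape)
new_head.set_weights([w[:, keep], b[keep]])
```

As noted above, this mostly shrinks the last matrix multiplication, so the speedup is marginal.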
Practical methods
Make sure you are serving your TensorFlow model rather than just running inference from a Python script; this can significantly improve performance (a minimal export sketch follows this list).
You can export the parameters of the network and load them in a faster framework such as CNTK or Caffe. These frameworks are implemented in C++/C# and can run inference much faster. Make sure you load the weights correctly; some frameworks use a different ordering of tensor dimensions when saving/loading (little/big-endian-like issues).
If your application performs inference on several images, you can distribute the computation across several GPUs. This can also be done in TensorFlow; see Using GPUs.
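For the serving point, the usual first step is exporting a SavedModel that TensorFlow Serving can load. A minimal sketch with placeholder paths:

```python
import tensorflow as tf

model = tf.keras.models.load_model("detector.h5")    # placeholder path

# TensorFlow Serving expects a numbered version directory under the model base path.
export_dir = "serving/human_detector/1"
tf.saved_model.save(model, export_dir)

# Then serve it, for example with:
#   tensorflow_model_server --model_name=human_detector \
#       --model_base_path=/absolute/path/to/serving/human_detector --rest_api_port=8501
```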
Pruning a neural network
Maybe this is the most interesting method of adapting big networks for simple tasks. You can see a beginner's guide here.
Pruning means removing parameters from your network, specifically whole nodes (in a decision tree) or neurons (in a neural network). To do that for object detection, the simplest procedure is the following (a rough sketch follows the references below):
Randomly prune neurons from the fully connected layers.
Train one more epoch (or more) with a low learning rate, only on the objects you'd like to detect.
(optional) Repeat the above several times, validate, and choose the best network.
The above procedure is the most basic one, but you can find plenty of papers that suggest algorithms to do so. For example
Automated Pruning for Deep Neural Network Compression and An iterative pruning algorithm for feedforward neural networks.
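A hedged sketch of the basic procedure above for a generic Keras classifier; the drop fraction, learning rate, and the choice to simulate pruning by zeroing whole units are illustrative assumptions, not an established recipe:

```python
import numpy as np
import tensorflow as tf

def prune_dense_units(model, drop_fraction=0.2, seed=0):
    """Zero out a random fraction of units in every Dense layer except the output layer."""
    rng = np.random.default_rng(seed)
    for layer in model.layers[:-1]:
        if isinstance(layer, tf.keras.layers.Dense):
            w, b = layer.get_weights()
            drop = rng.choice(w.shape[1], int(drop_fraction * w.shape[1]), replace=False)
            w[:, drop] = 0.0
            b[drop] = 0.0
            layer.set_weights([w, b])

model = tf.keras.models.load_model("classifier.h5")  # placeholder path
prune_dense_units(model, drop_fraction=0.2)

# Fine-tune briefly with a low learning rate, only on the categories you care about.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
# model.fit(x_subset, y_subset, epochs=1)
```

Zeroing weights only simulates pruning; to actually save compute you would rebuild the layers with fewer units, or use a dedicated toolkit such as tensorflow_model_optimization.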

What should the output layer of my CNN look like?

I am running a model to detect a few interesting features in an image. I have a set of images measuring 600x200 px. These images have features such as rock fragments that I would like to identify. Imagine a 4x12 grid overlaid on the image; using an annotation tool, I can manually produce annotations such as ((4,9), (3,10), (3,11), (3,12)) to identify the interesting cells. I can build a CNN model with Keras that takes the grayscale image as input, but how should I encode the output? One way that seems intuitive to me is to treat it as a sparse matrix of shape (12,4,1) in which the interesting cells are 1 and all others are 0.
Is there a better way to encode the outputs?
What should the activation function on the last layer be? I am using ReLU for the hidden layers.
What should the loss function be? Will mean_squared_error work?
Your problem is really similar to both detection and segmentation problems (you can read about them e.g. here). The approach you proposed is reasonable, because in both detection and segmentation tasks computing a feature map like the one you describe is a usual part of the training pipeline. However, there are several problems you might come across:
memory issues: you either need to use sparse tensors or use generators in order to keep memory usage manageable,
loss and activation: losses and activations for segmentation are currently not provided out of the box by the Keras API, so you need to implement them on your own. Here and here you can find examples of how to tackle this problem.
For detection only (not classification of these points) I would advise you to use sigmoid and binary_crossentropy; for classification, use softmax and categorical_crossentropy.
Of course, there are other ways to tackle this problem. One could frame it as a regression in which you predict the pixel locations where there is something interesting, but dealing with outputs of varying size in Keras is rather cumbersome.
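Putting the question and the answer together, a hedged sketch of the grid encoding plus a sigmoid/binary_crossentropy head (the architecture is arbitrary, and whether the grid is (4, 12) or (12, 4) depends on your annotation convention):

```python
import numpy as np
import tensorflow as tf

GRID_ROWS, GRID_COLS = 4, 12  # a 200x600 px image split into 50x50 px cells

def encode_cells(annotations):
    """Turn 1-indexed (row, col) cell annotations into a binary (4, 12, 1) target."""
    target = np.zeros((GRID_ROWS, GRID_COLS, 1), dtype=np.float32)
    for row, col in annotations:
        target[row - 1, col - 1, 0] = 1.0
    return target

y = encode_cells([(4, 9), (3, 10), (3, 11), (3, 12)])  # example from the question

# Minimal model: downsample 200x600 -> 4x12 and predict a per-cell probability.
inputs = tf.keras.Input(shape=(200, 600, 1))
x = tf.keras.layers.Conv2D(16, 3, activation="relu", padding="same")(inputs)
x = tf.keras.layers.MaxPooling2D(pool_size=(50, 50))(x)     # 200x600 -> 4x12
outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy")
```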

Visualizing a feature/kernel produced by a CNN via an "optimal" input image using Keras

So I've been doing a lot of research regarding the visualization of CNN's and I can't seem to find a solution to what I'm trying to do, or at least to my understanding of the methodologies employed. A lot of it is pretty new and cutting edge, so I could just not be properly grasping the concepts.
Basically, I want to take a learned kernel/feature as trained by a CNN and essentially manufacture an "optimized" picture such that when the kernel is convolved with said picture, we have the highest convolutional sum possible.
If I'm not mistaken, this should exaggerate the features of that kernel on the image level rather than at the filter/kernel level, which seems to be what most have done in terms of visualizing these filters.
In case what I'm asking is not clear, here's an example (probably bad, but it'll get the point across.)
Assume we are using MNIST and I've created a CNN like so:
5x5 Conv with 10 kernels/Feature Maps
Relu
2x2 MaxPool 2 stride
Dense + Softmax
Let's say I've fully trained my model and now want to look at one of the 10 5x5 kernels it produced and get a better idea of what it's looking for. I want to manufacture a new 28x28 picture such that when convolved with this 5x5 kernel, the sum of the 28x28 convolution is maximized.
Are there techniques that already do something like this? I feel like everything I see involves either "unwinding" or "reversing" the neural net (https://arxiv.org/pdf/1311.2901.pdf), viewing the feature maps as pictures pass through (http://kvfrans.com/visualizing-features-from-a-convolutional-neural-network/), or just looking at the kernels themselves (https://www.youtube.com/watch?v=AgkfIQ4IGaM).
Is it even something useful to look at? I feel like this is the closest thing I've seen to what I'm requesting. https://arxiv.org/pdf/1312.6034.pdf
Any insight would be a huge help, thanks!
This is called activation maximization, and Keras even has an example of it available here. Note that the code in the post might be outdated for current Keras versions, but an updated version is available in the Keras examples folder.
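For reference, a minimal activation-maximization sketch in current tf.keras, using gradient ascent on the input image (the model path, layer name, step size, and iteration count are placeholders; the official Keras example linked above is more complete):

```python
import tensorflow as tf

model = tf.keras.models.load_model("mnist_cnn.h5")   # placeholder path
layer = model.get_layer("conv2d")                    # hypothetical name of the 5x5 conv layer
feature_extractor = tf.keras.Model(model.input, layer.output)

filter_index = 0                                     # which of the 10 kernels to visualize
image = tf.Variable(tf.random.uniform((1, 28, 28, 1)))

for _ in range(100):                                 # plain gradient ascent on the input
    with tf.GradientTape() as tape:
        activation = feature_extractor(image)
        # Maximize the mean response of one feature map.
        loss = tf.reduce_mean(activation[..., filter_index])
    grads = tape.gradient(loss, image)
    grads = grads / (tf.norm(grads) + 1e-8)          # normalize for stable step sizes
    image.assign_add(0.1 * grads)

# image now approximates the 28x28 input that drives this kernel hardest.
```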

Bounding boxes using tensorflow and inception-v3

Is it possible to have bounding boxes prediction using TensorFlow?
I found TensorBox on github but I'm looking for a better supported or maybe official way to address this problem.
I need to retrain the model for my own classes.
It is unclear what exactly you mean. Do you need object detection? I assume so from the 'bounding boxes'. If so, Inception networks are not directly applicable to your task; they are classification networks.
You should look at object detection models, like the Single Shot Detector (SSD) or You Only Look Once (YOLO). They often use pre-trained convolutional layers from classification networks, but have additional layers on top of them. If you want Inception (aka GoogLeNet), YOLO is based on it. Take a look at this implementation: https://github.com/thtrieu/darkflow or any other you can find on Google.
The COCO2016 winner for object detection was implemented in tensorflow. Some state of the art techniques are Faster R-CNN, R-FCN and SSD. Check the slides from http://image-net.org/challenges/talks/2016/GRMI-COCO-slidedeck.pdf (Slide 14 has key tensorflow ops for you to recreate this pipeline).
Edit 6/19/2017:
TensorFlow has released tooling to predict bounding boxes:
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
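For context, inference against a detection SavedModel exported by the TensorFlow Object Detection API typically looks like the sketch below; the model path is a placeholder and the output keys (detection_boxes, detection_scores, detection_classes) should be double-checked against the model version you download:

```python
import numpy as np
import tensorflow as tf

# Placeholder path to a detection model exported by the Object Detection API.
detect_fn = tf.saved_model.load("ssd_mobilenet_v2/saved_model")

image = np.zeros((300, 300, 3), dtype=np.uint8)       # stand-in for a real uint8 image
input_tensor = tf.convert_to_tensor(image)[tf.newaxis, ...]

detections = detect_fn(input_tensor)
boxes = detections["detection_boxes"][0].numpy()      # normalized [ymin, xmin, ymax, xmax]
scores = detections["detection_scores"][0].numpy()
classes = detections["detection_classes"][0].numpy().astype(int)

keep = scores > 0.5                                   # arbitrary confidence threshold
print(boxes[keep], classes[keep])
```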
