How to use Inception-v3 as a convolutional network - python

So, I've retrained the Inception-v3 network to classify specific kinds of data - for training I've provided it with 200x200 pictures. Now, when I run the graph on another 200x200 picture it works just fine. What I want to achieve is to turn it into a filter for a convolutional network - i.e. slide it as a filter through the whole picture and get the probability of each pixel being in a given class.
It seems fairly simple to do manually - just split the picture into small sections, classify each of them, put the results together, and voila. But that would be very inefficient. Instead, I want to do something like what is described here: http://cs231n.github.io/convolutional-networks/#convert. Basically, convert the last FC layer into a CONV layer by reshaping the weights. Seems simple enough, but I can't figure out how to actually do this.
My main problem is that at the end of the Inception-v3 net, right before the last FC layer, there's a pooling operation that reshapes the data to shape (1, 2048), so I won't really be able to perform a convolution there.
Could anyone help me out?

My most immediate solution is to skip the fully connected layer at the end, as it would cause the input image to lose its spatial structure. Doing Conv -> FC -> Conv seems redundant.
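For what it's worth, here is roughly how that conversion could look in Keras. This is only a sketch: it assumes the retraining was done in Keras with the stock InceptionV3 layer names ("mixed10" for the last conv block, "predictions" for the final Dense layer); the file path, pool size, and num_classes are likewise assumptions, not details from the original setup.

import tensorflow as tf

# Load the retrained model and pull out the final Dense layer's weights.
trained = tf.keras.models.load_model("retrained_inception.h5")  # assumed path
w, b = trained.get_layer("predictions").get_weights()   # w: (2048, num_classes)
num_classes = w.shape[1]

features = trained.get_layer("mixed10").output           # (None, h, w, 2048)

# Replace the global average pooling with a stride-1 pool of the training-time
# spatial extent (here assumed 4x4), so spatial structure survives on larger inputs.
pooled = tf.keras.layers.AveragePooling2D(pool_size=4, strides=1)(features)

# The Dense weights reshaped to a 1x1 kernel turn the classifier into a conv
# layer, applying the same classifier at every spatial position.
conv_head = tf.keras.layers.Conv2D(num_classes, kernel_size=1)
probs = tf.keras.layers.Softmax()(conv_head(pooled))
fcn = tf.keras.Model(trained.input, probs)
conv_head.set_weights([w.reshape(1, 1, -1, num_classes), b])

# Note: to actually feed images larger than the training size, the backbone
# needs to have been built with input_shape=(None, None, 3).

The output of fcn is then a class-probability map rather than a single vector, which is the sliding-filter behavior described above without ever splitting the picture.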

Related

How do you determine what convolutional layers look for in Tensorflow?

This is my first post, so I'm sorry if I'm doing something wrong. I'm a beginner when it comes to machine learning. I want to create an accurate model with the cifar10 dataset. I've learned that a Conv2D layer looks for a specific thing in an image classification model. But how do I know what the layer is looking for? I'm not sure if I make sense here, or if anything I've learned so far is true. Could someone help me out in understanding how layers work in TensorFlow?
Convolutional layers extract features from images. They apply a filter to images and output a feature map. Then, one may add dense layers afterwards to obtain labels from these features.
Conv2D is just a 2D convolution used with 2D images - this means that the kernel is 2D as well.
Here are a few helpful links: https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/
https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac
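To actually see what a given conv layer is responding to, you can cut the model at that layer and plot its feature maps. A minimal sketch, assuming a deliberately tiny Keras model trained on cifar10 for one epoch (none of these names or sizes come from your code):

import tensorflow as tf
import matplotlib.pyplot as plt

(x_train, y_train), _ = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0

# A tiny model; one epoch of training is enough to get non-random filters.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_train, y_train, epochs=1, batch_size=128)

# A sub-model ending at the conv layer outputs its feature maps directly.
extractor = tf.keras.Model(model.input, model.layers[0].output)
maps = extractor(x_train[:1])          # shape (1, 30, 30, 32)

# Each channel is one filter's response map; bright areas mark where that
# filter "found" its pattern in the input image.
fig, axes = plt.subplots(4, 8, figsize=(12, 6))
for i, ax in enumerate(axes.flat):
    ax.imshow(maps[0, :, :, i], cmap="viridis")
    ax.axis("off")
plt.show()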

How to train different size of image using cnn? [duplicate]

I am trying to train a model that classifies images.
The problem I have is that the images have different sizes. How should I format my images, or change the model architecture?
You didn't say what architecture you're talking about. Since you said you want to classify images, I'm assuming it's a partly convolutional, partly fully connected network like AlexNet, GoogLeNet, etc. In general, the answer to your question depends on the network type you are working with.
If, for example, your network only contains convolutional units - that is to say, does not contain fully connected layers - it can be invariant to the input image's size. Such a network could process the input images and in turn return another image ("convolutional all the way"); you would have to make sure that the output matches what you expect, since you have to determine the loss in some way, of course.
If you are using fully connected units though, you're in for trouble: here you have a fixed number of learned weights your network has to work with, so varying inputs would require a varying number of weights - and that's not possible.
If that is your problem, here are some things you can do:
Don't care about squashing the images. A network might learn to make sense of the content anyway; do scale and perspective mean anything to the content anyway?
Center-crop the images to a specific size. If you fear you're losing data, do multiple crops and use these to augment your input data, so that the original image will be split into N different images of correct size.
Pad the images with a solid color to a square size, then resize.
Do a combination of that.
The padding option might introduce an additional error source to the network's prediction, as the network might (read: likely will) be biased to images that contain such a padded border.
If you need some ideas, have a look at the Images section of the TensorFlow documentation; there are pieces like resize_image_with_crop_or_pad that take care of the bigger part of the work.
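For example, in current TensorFlow, where the op is named tf.image.resize_with_crop_or_pad, the fixed-size step is a one-liner; the 224x224 target and the file name here are just for illustration:

import tensorflow as tf

image = tf.io.decode_jpeg(tf.io.read_file("example.jpg"), channels=3)  # any size
# Smaller images are zero-padded to 224x224, larger ones are center-cropped.
fixed = tf.image.resize_with_crop_or_pad(image, 224, 224)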
As for just not caring about squashing, here's a piece of the preprocessing pipeline of the famous Inception network:
# This resizing operation may distort the images because the aspect
# ratio is not respected. We select a resize method in a round robin
# fashion based on the thread number.
# Note that ResizeMethod contains 4 enumerated resizing methods.
# We select only 1 case for fast_mode bilinear.
num_resize_cases = 1 if fast_mode else 4
distorted_image = apply_with_random_selector(
    distorted_image,
    lambda x, method: tf.image.resize_images(x, [height, width], method=method),
    num_cases=num_resize_cases)
They're totally aware of it and do it anyway.
Depending on how far you want or need to go, there is actually a paper called Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition that handles inputs of arbitrary sizes by processing them in a very special way.
Try making a spatial pyramid pooling layer. Then put it after your last convolutional layer so that the FC layers always get constant-dimensional vectors as input. During training, train on images from the entire dataset using a particular image size for one epoch. Then for the next epoch, switch to a different image size and continue training.
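A minimal sketch of such a layer, assuming max pooling over 1x1, 2x2 and 4x4 pyramid levels (the levels are a choice, not fixed by the paper) and feature maps at least 4 pixels per side:

import tensorflow as tf

class SpatialPyramidPooling(tf.keras.layers.Layer):
    """Pools (batch, h, w, c) feature maps of any h, w into a fixed-length
    vector of size c * sum(n * n for n in levels)."""
    def __init__(self, levels=(1, 2, 4), **kwargs):
        super().__init__(**kwargs)
        self.levels = levels

    def call(self, x):
        h, w = tf.shape(x)[1], tf.shape(x)[2]
        pooled = []
        for n in self.levels:
            for i in range(n):
                for j in range(n):
                    # Integer bin edges tile the whole map for any h, w.
                    r0, r1 = h * i // n, h * (i + 1) // n
                    c0, c1 = w * j // n, w * (j + 1) // n
                    pooled.append(tf.reduce_max(x[:, r0:r1, c0:c1, :], axis=[1, 2]))
        return tf.concat(pooled, axis=-1)

# Usage: any-size conv features in, fixed-length vector out, so the
# following Dense layers always see a constant input dimension.
# x = SpatialPyramidPooling()(last_conv_output)
# x = tf.keras.layers.Dense(10, activation="softmax")(x)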

Customized convolutional layer in TensorFlow

Let's assume I want to make the following layer in a neural network: instead of having a square convolutional filter that moves over some image, I want the shape of the filter to be some other shape, say a rectangle, circle, triangle, etc. (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited and doesn't have many examples. A Python implementation of a convolutional layer, for example by extending tf.keras.layers.Layer, would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can just define a tensor of weights where I can constrain certain entries to be identically zero and have some weights appear in multiple places in the tensor; then I should be able to build a convolutional layer (and other layers) by hand. How would I do this, and also include these variables in training?
Edit2: Let me add some more clarifications. Take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding, so the output is also 10x10), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights at the correct locations in this matrix (so some entries are zero, and some entries are equal, i.e. all 25 weights show up in many locations in the matrix). I then multiply the input with this matrix to get an output. So my question is twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming that I want to later customize what this filter looks like, so the standard conv2d is not good enough)?
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However, I don't know whether this approach will suffer from performance issues.
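(To make the Edit2 idea concrete at a toy scale, here is a plain NumPy sketch: a 3x3 kernel over a 4x4 zero-padded image, so a 16x16 matrix instead of the 100x100 one. Doing this literally costs O((h*w)^2) memory, which is why it is mostly useful for understanding rather than speed.)

import numpy as np

def conv_as_matrix(kernel, h, w):
    # Row (r*w + c) of M holds the kernel weights at the columns of the
    # input pixels touched by the window centred on (r, c); out-of-image
    # taps are dropped, which implements zero padding.
    kh, kw = kernel.shape
    M = np.zeros((h * w, h * w))
    for r in range(h):
        for c in range(w):
            for i in range(kh):
                for j in range(kw):
                    rr, cc = r + i - kh // 2, c + j - kw // 2
                    if 0 <= rr < h and 0 <= cc < w:
                        M[r * w + c, rr * w + cc] = kernel[i, j]
    return M

kernel = np.random.rand(3, 3)
img = np.random.rand(4, 4)
# Equals a (cross-)correlation of img with kernel under zero padding.
out = (conv_as_matrix(kernel, 4, 4) @ img.ravel()).reshape(4, 4)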
Just use regular conv. layers with square filters, and zero out some values after each weight update:
import tensorflow as tf  # TF1-style graph/session code

g = tf.get_default_graph()
# Run one optimization step, then re-apply the binary mask so the masked
# kernel entries are forced back to zero.
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
conv1_filter = g.get_tensor_by_name('conv1/weights:0')
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
EDIT: if you're not familiar with TensorFlow, you might get confused by the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this, you can access the first-layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
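If you'd rather not re-zero the weights after every step, newer TensorFlow (2.7+, where Conv2D exposes a convolution_op method intended for subclassing) lets you bake the mask into the forward pass, so masked taps stay exactly zero and receive zero gradient. A sketch, with an illustrative cross-shaped mask:

import numpy as np
import tensorflow as tf

class MaskedConv2D(tf.keras.layers.Conv2D):
    def __init__(self, filters, kernel_size, mask, **kwargs):
        super().__init__(filters, kernel_size, **kwargs)
        self.mask = tf.constant(mask, dtype=tf.float32)  # shape (kh, kw)

    def convolution_op(self, inputs, kernel):
        # Broadcast the 2D mask over the in/out channel axes of the kernel.
        return super().convolution_op(inputs, kernel * self.mask[:, :, None, None])

cross = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=np.float32)   # "plus"-shaped filter
layer = MaskedConv2D(16, 3, cross, padding="same")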

How to feed image sequences to convolutional layers and apply conv-lstm cells after?

I'm currently trying to implement the following paper: https://research.nvidia.com/sites/default/files/publications/dnn_denoise_author.pdf
I'm having trouble adapting my network, which currently processes only single images, to process image sequences.
My data has the following shape: (7, 512, 512, 1), where 7 is the number of frames in my sequence, 512 is the width and height of an image and 1 the number of channels.
My question is: how do I pass a sequence through convolutional layers? (The Conv3D suggestion that I saw in other questions of this type seems weird, since I have 7 frames.)
Then I wish to pass the result of the convolutional layers to a ConvLSTM block; however, is this even possible given the feature maps obtained after the convolutions and max pooling? (Other answers about ConvLSTM blocks only cover applying them to the sequence directly.) The result of these operations will again be fed to convolutions and max pooling, and so on.
I have also checked the other questions involving CNN and RNN and I was thinking of using the TimeDistributed(...(...)) type of functions, but I am not sure if I'm going in the right direction. Any piece of advice is more than welcomed.
Thank you for your time!
I am facing a similar situation, where I have an n-frame sequence and would like to predict the next frame after the given sequence. The solution would be to forward frame n through the network, compute the loss against frame n+1, and repeat (n+1, n+2, etc.). I hope I've understood this right.
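On the original question, the TimeDistributed direction looks right: wrapping the per-frame Conv2D/MaxPooling layers in TimeDistributed keeps the time axis, and the resulting sequence of feature maps can then go into a ConvLSTM2D. A minimal sketch using the question's (7, 512, 512, 1) input; the filter counts and depths are illustrative:

import tensorflow as tf

inputs = tf.keras.Input(shape=(7, 512, 512, 1))
# TimeDistributed applies the wrapped layer to each of the 7 frames
# independently, so the time axis survives the conv/pooling blocks.
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"))(inputs)
x = tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D())(x)
# ConvLSTM2D consumes the (time, h, w, channels) feature-map sequence;
# return_sequences=True keeps the time axis for further per-frame blocks.
x = tf.keras.layers.ConvLSTM2D(32, 3, padding="same", return_sequences=True)(x)
x = tf.keras.layers.TimeDistributed(
    tf.keras.layers.Conv2D(1, 3, padding="same"))(x)
model = tf.keras.Model(inputs, x)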

Visualizing a feature/kernel produced by a CNN via an "optimal" input image using Keras

So I've been doing a lot of research regarding the visualization of CNN's and I can't seem to find a solution to what I'm trying to do, or at least to my understanding of the methodologies employed. A lot of it is pretty new and cutting edge, so I could just not be properly grasping the concepts.
Basically, I want to take a learned kernel/feature as trained by a CNN and essentially manufacture an "optimized" picture such that when the kernel is convolved with said picture, we have the highest convolutional sum possible.
If I'm not mistaken, this should exaggerate the features of that kernel on the image level rather than at the filter/kernel level, which seems to be what most have done in terms of visualizing these filters.
In case what I'm asking is not clear, here's an example (probably bad, but it'll get the point across).
Assume we are using MNIST and I've created a CNN like so:
5x5 Conv with 10 kernels/Feature Maps
Relu
2x2 MaxPool 2 stride
Dense + Softmax
Let's say I've fully trained my model and now want to look at one of the 10 5x5 kernels it produced and get a better idea of what it's looking for. I want to manufacture a new 28x28 picture such that when convolved with this 5x5 kernel, the sum of the 28x28 convolution is maximized.
Are there techniques that already do something like this? I feel like everything I see involves either "unwinding" or "reversing" the neural net (https://arxiv.org/pdf/1311.2901.pdf), viewing the feature maps as pictures pass through (http://kvfrans.com/visualizing-features-from-a-convolutional-neural-network/), or just looking at the kernels themselves (https://www.youtube.com/watch?v=AgkfIQ4IGaM).
Is it even something useful to look at? I feel like this is the closest thing I've seen to what I'm requesting. https://arxiv.org/pdf/1312.6034.pdf
Any insight would be a huge help, thanks!
This is called activation maximization, and Keras even has an example of it available here. Note that the code in that post might be outdated for current Keras versions, but an updated version is available in the examples folder of Keras.
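In case the linked example moves, here is the core of the technique as a minimal sketch for the MNIST setup in the question: gradient ascent on the input pixels to maximize one filter's response. `model` is assumed to be the trained network, and the layer name "conv1" is illustrative:

import tensorflow as tf

def maximize_filter(model, layer_name="conv1", filter_index=0, steps=200, lr=1.0):
    # Sub-model exposing the chosen layer's activations.
    extractor = tf.keras.Model(model.input, model.get_layer(layer_name).output)
    img = tf.Variable(tf.random.uniform((1, 28, 28, 1), 0.4, 0.6))  # gray noise
    for _ in range(steps):
        with tf.GradientTape() as tape:
            # The objective: this filter's mean response over the image.
            loss = tf.reduce_mean(extractor(img)[..., filter_index])
        grads = tape.gradient(loss, img)
        # Normalizing the gradient keeps the ascent step size stable.
        img.assign_add(lr * grads / (tf.norm(grads) + 1e-8))
    return img.numpy().squeeze()   # the "optimized" 28x28 input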
