Implementing LeNet in Pytorch


Sorry if this question is incredibly basic. I feel like there is a wealth of resources online, but most of them are half-complete or skip over the details that I want to know.
I am trying to implement LeNet with Pytorch for practice.
https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html
How come in this example and many examples online, they define the convolutional layers and the fc layers in init, but the subsampling and activation functions in forward?
What is the purpose of using torch.nn.functional for some functions, and torch.nn for others? For example, you have convolution with torch.nn (https://pytorch.org/docs/stable/nn.html#conv1d) and convolution with torch.nn.functional (https://pytorch.org/docs/stable/nn.functional.html#conv1d). Why choose one or the other?
Let's say I want to try different image sizes, like 28x28 (MNIST). The tutorial recommends I resize MNIST. Is there a way to instead change the values of LeNet? What happens if I don't change them?
What is the purpose of num_flat_features? If you wanted to flatten the features, couldn't you just do x = x.view(-1, 16*5*5)?

How come in this example and many examples online, they define the
convolutional layers and the fc layers in init, but the subsampling
and activation functions in forward?
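Any layer with trainable parameters should be defined in __init__. Subsampling, certain activations, dropout, etc. don't have any trainable parameters, so they can either be defined in __init__ or used directly via the torch.nn.functional interface during forward. (For dropout, the functional form needs training=self.training passed explicitly, whereas nn.Dropout follows model.train() and model.eval() automatically.)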
What is the purpose of using torch.nn.functional for some functions, and torch.nn for others?
The torch.nn.functional functions are the actual functions used at the heart of the majority of torch.nn layers; they call into compiled C++ code. For example, nn.Conv2d subclasses nn.Module, as should any custom layer or model which contains trainable parameters. The class handles registering parameters and encapsulates other functionality required for training and testing. During forward it actually uses nn.functional.conv2d to apply the convolution operation. As mentioned in the first question, when performing a parameterless operation like ReLU there's effectively no difference between using the nn.ReLU class and the nn.functional.relu function.
The reason they are provided is they give some freedom to do unconventional things. For example in this answer which I wrote the other day, providing a solution without nn.functional.conv2d would have been difficult.
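A minimal sketch of the relationship described above, assuming a recent PyTorch install. The layer sizes and tensor shapes are arbitrary choices for illustration. It runs the same convolution once through the nn.Conv2d module and once through torch.nn.functional.conv2d using the module's own weight and bias, then compares nn.ReLU with nn.functional.relu.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 1, 32, 32)  # (batch, channels, height, width)

# Module form: nn.Conv2d owns and registers its weight and bias.
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5)
out_module = conv(x)

# Functional form: the caller passes the parameters in explicitly.
out_functional = F.conv2d(x, conv.weight, conv.bias)

# Parameterless op: the module and the function compute the same thing.
relu_module = nn.ReLU()(out_module)
relu_functional = F.relu(out_module)

same_conv = torch.allclose(out_module, out_functional)
same_relu = torch.equal(relu_module, relu_functional)
n_params = len(list(conv.parameters()))  # weight and bias
print(same_conv, same_relu, n_params)
```

The practical difference is bookkeeping: the module version shows up in model.parameters() and state_dict(), while the functional version leaves parameter handling to you.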
Let's say I want to try different image sizes, like 28x28 (MNIST). The
tutorial recommends I resize MNIST. Is there a way to instead change
the values of LeNet? What happens if I don't change them?
There's no obvious way to change an existing, trained model to support different image sizes. The size of the input to the linear layer is necessarily fixed, and the number of features at that point in the model is generally determined by the size of the input to the network. If the size of the input differs from the size that the model was designed for, then when the data reaches the linear layers it will have the wrong number of elements and the program will crash. Some models can handle a range of input sizes, usually by using something like an nn.AdaptiveAvgPool2d layer before the linear layer to ensure the input shape to the linear layer is always the same. Even so, if the input image size is too small then the downsampling and/or pooling operations in the network will cause the feature maps to vanish at some point, causing the program to crash.
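To see why the linear layer's input size depends on the image size, here is a small sketch in plain Python. It assumes the LeNet variant implied by the 16*5*5 in the question: two 5x5 convolutions without padding, each followed by 2x2 max pooling, with 16 output channels at the end. The function names are made up for this example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling window (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


def lenet_flat_features(image_size, channels=16):
    """Number of inputs the first linear layer sees for a square image."""
    s = conv_out(image_size, kernel=5)    # conv1: 5x5, no padding
    s = conv_out(s, kernel=2, stride=2)   # 2x2 max pool
    s = conv_out(s, kernel=5)             # conv2: 5x5, no padding
    s = conv_out(s, kernel=2, stride=2)   # 2x2 max pool
    return channels * s * s


for size in (32, 28):
    print(size, lenet_flat_features(size))
```

Under these assumptions a 32x32 input gives 16*5*5 = 400 features and a 28x28 input gives 16*4*4 = 256, so a freshly built (untrained) model for 28x28 images would need its first linear layer sized for 256 inputs instead of 400.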
What is the purpose of num_flat_features? If you wanted to flatten the
features, couldn't you just do x = x.view(-1, 16*5*5)?
When you define the linear layer you need to tell it how large the weight matrix is. A linear layer's weights are simply an unconstrained matrix (and bias vector). The shape of the weight matrix therefore is determined by the input shape, but you don't know the input shape before you run forward so it needs to be provided as an additional parameter (or hard coded) when you initialize the model.
To get to the actual question. Yes, during forward you could simply use
x = x.view(-1, 16*5*5)
Better yet, use
x = torch.flatten(x, start_dim=1)
This tutorial was written before the .flatten function was added to the library. The authors effectively just wrote their own flatten functionality which could be used regardless of the shape of x. This was probably so you had some portable code that could be used in your model without hard coding sizes. From a programming perspective it's nice to generalize such things since it means you wouldn't have to worry about changing those magic numbers if you decide to change part of the model (though this concern didn't appear to extend to the initialization).
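A quick sketch comparing the three options on a dummy tensor. The num_flat_features helper here is written from memory of the tutorial, so treat it as an approximation of the original rather than a verbatim copy. The batch size of 8 is arbitrary.

```python
import torch


def num_flat_features(x):
    # Product of every dimension except the batch dimension.
    n = 1
    for s in x.size()[1:]:
        n *= s
    return n


x = torch.randn(8, 16, 5, 5)  # (batch, channels, height, width)

a = x.view(-1, 16 * 5 * 5)            # hard-coded size
b = x.view(-1, num_flat_features(x))  # tutorial-style helper
c = torch.flatten(x, start_dim=1)     # built-in, no magic numbers

print(a.shape, b.shape, c.shape)
```

All three produce the same (8, 400) tensor here. The difference only shows up when the feature map shape changes: the hard-coded view silently reshuffles the batch dimension or raises an error, while the other two adapt.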

Related

Which layer should I use when I build a Neural Network with Tensorflow 2.x?

I'm currently studying TensorFlow 2.0 and Keras. I know that the activation functions are used to calculate the output of each layer of a neural network, based on mathematical functions. However, when searching about layers, I can't find concise and easy-to-read information for a beginner in deep learning.
There's a keras documentation, but I would like to know synthetically:
what are the most common layers used to create a model (Dense, Flatten, MaxPooling2D, Dropout, ...).
In which case to use each of them? (Classification, regression, other)
what is the appropriate way to use each layer depending on each case?

Depending on the problem you want to solve, there are different activation functions and loss functions that you can use.
Regression problem: You want to predict the price of a building. You have N features. Of course, the price of the building is a real number, therefore you need to have mean_squared_error as a loss function and a linear activation for your last node. In this case, you can have a couple of Dense() layers with relu activation, while your last layer is a Dense(1,activation='linear').
In between the Dense() layers, you can add Dropout() so as to mitigate the overfitting effect(if present).
Classification problem: You want to detect whether or not someone is diabetic while taking into account several factors/features. In this case, you can use again stacked Dense() layers but your last layer will be a Dense(1,activation='sigmoid'), since you want to detect whether a patient is or not diabetic. The loss function in this case is 'binary_crossentropy'. In between the Dense() layers, you can add Dropout() so as to mitigate the overfitting effect(if present).
Image processing problems: Here you surely have stacks of [Conv2D(), MaxPool2D(), Dropout()]. MaxPooling2D is an operation which is typical for image processing and also some natural language processing (not going to expand upon it here). Sometimes, in convolutional neural network architectures, the Flatten() layer is used. Its purpose is to collapse the feature maps into a 1D vector whose length is equal to the total number of elements across the entire feature map depth. For example, if you had a matrix of [28,28], flattening it would reduce it to (1,784), where 784=28*28.
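The regression and binary classification setups described above could be sketched roughly as follows, assuming TensorFlow 2.x with its bundled Keras. The number of features, the layer widths, the dropout rate, and the build_mlp helper are all arbitrary choices for illustration, not recommendations.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

n_features = 8  # hypothetical number of input features


def build_mlp(last_activation):
    """Stacked Dense layers with Dropout in between and a single output node."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(16, activation="relu"),
        layers.Dense(1, activation=last_activation),
    ])


# Regression: linear output, mean squared error loss.
regressor = build_mlp("linear")
regressor.compile(optimizer="adam", loss="mean_squared_error")

# Binary classification: sigmoid output, binary cross-entropy loss.
classifier = build_mlp("sigmoid")
classifier.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

batch = np.zeros((4, n_features), dtype="float32")
reg_out = regressor(batch).numpy()
clf_out = classifier(batch).numpy()
print(reg_out.shape, clf_out.shape)
```

The two models differ only in the last activation and the loss, which is the point made above: the problem type mostly decides the output layer and the loss, not the hidden layers.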
Although the question is quite broad and maybe some people will vote to close it, I tried to provide you a short overview of what you asked. I recommend that you start by learning the basics behind neural networks and then delve deeper into using a framework, such as TensorFlow or PyTorch.

How to create custom neural network with custom weight initialization in tensorflow or pytorch

I'm trying to create a small neural network with custom connections between neurons. The connections should exist over several layers and not be fully connected (sparse) as shown in the picture. I would also like to do the weight initialization manually and not completely randomly. My goal is to determine whether a connection is positive or negative. Is it possible to create such a neural net in tensorflow (python/js) or pytorch?

To summarize:
Can you do it? -- Yes, absolutely.
Is it going to be pretty? -- No, absolutely not.
In my explanation, I will further focus on PyTorch, as this is the library that I am more comfortable with, and that is especially more useful if you have custom operations that you can easily express in a pythonic manner. Tensorflow also has eager execution mode (more serious integration from version 2, if I remember that correctly), but it is traditionally done with computational graphs, which make this whole thing a little uglier than it needs to be.
As you hopefully know, backpropagation (the "learning" step in any ANN) is basically an inverse pass through the network, to calculate gradients, or at least close enough to the truth for our problem at hand. Importantly, torch functions store this "reverse" direction, which makes it trivial for the user to call backpropagation functions.
To model a simple network as described in your image, we have only one major disadvantage:
The available operations usually excel at what they do because they are simple and can be optimized quite heavily. In your case, you have to express different layers as custom operations, which generally scales incredibly poorly, unless you can express the functionals as some form of matrix operation, which I do not see straight away in your example. I am further assuming that you are applying some form of non-linearity, as it would otherwise be a network that would fail for any non-linearly separable problem.
import torch
import torch.nn as nn

class CustomNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.h_1_1 = nn.Sequential(nn.Linear(1, 2), nn.ReLU())  # top node in first layer
        self.h_1_2 = nn.Sequential(nn.Linear(1, 2), nn.ReLU())  # bottom node in first layer
        # Note that these nodes have no shared weights, which is why we
        # have to initialize them separately.
        self.h_2_1 = nn.Sequential(nn.Linear(1, 1), nn.ReLU())  # top node in second layer
        self.h_2_2 = nn.Sequential(nn.Linear(1, 1), nn.ReLU())  # bottom node in second layer
        self.h_3_1 = nn.Sequential(nn.Linear(2, 1), nn.ReLU())  # top node in third layer
        self.h_3_2 = nn.Sequential(nn.Linear(2, 1), nn.ReLU())  # bottom node in third layer
        # out doesn't require an activation function due to pairing with the loss function
        self.out = nn.Linear(2, 1)

    def forward(self, x):
        # x.shape: (batch_size, 2)
        # Slices like 0:1 keep the feature dimension, so each piece is (batch_size, 1).
        # first layer. shape of (batch_size, 2), respectively
        out_top = self.h_1_1(x[:, 0:1])
        out_bottom = self.h_1_2(x[:, 1:2])
        # second layer. shape of (batch_size, 1), respectively
        out_top_2 = self.h_2_1(out_top[:, 0:1])
        out_bottom_2 = self.h_2_2(out_bottom[:, 0:1])
        # third layer. shape of (batch_size, 1), respectively
        # additional concatenation of previous outputs required.
        out_top_3 = self.h_3_1(torch.cat([out_top_2, -1 * out_top[:, 1:2]], dim=1))
        out_bottom_3 = self.h_3_2(torch.cat([out_bottom_2, -1 * out_bottom[:, 1:2]], dim=1))
        return self.out(torch.cat([out_top_3, out_bottom_3], dim=1))
As you can see, any computational step is (in this case rather explicitly) given, and very much possible. Again, once you want to scale your number of neurons for each layer, you are going to have to be a little more creative in how you process, but for-loops do very much work in PyTorch as well. Note that this will in any case be much slower than a vanilla linear layer, though.
If you can live with separately trained weights, you can always also just define separate linear layers of smaller size and arrange them in a more convenient fashion.
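The question also asks about setting the initial weights by hand and reading off whether a connection ends up positive or negative. That part is not covered in the answer above, so the following is only a minimal sketch of one common way to do it in PyTorch. The specific weight values are made up.

```python
import torch
import torch.nn as nn

layer = nn.Linear(2, 1)

# Overwrite the random initialization with hand-picked values.
# no_grad() keeps these assignments out of the autograd graph.
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[0.5, -0.5]]))  # one positive, one negative connection
    layer.bias.zero_()

y = layer(torch.tensor([[1.0, 1.0]]))  # 0.5 * 1 + (-0.5) * 1 + 0

# The sign of each connection can be read back from the weight tensor,
# for example after training has finished.
signs = torch.sign(layer.weight.detach())
print(y.item(), signs.tolist())
```

The same pattern applies to the nn.Linear layers inside an nn.Sequential, e.g. by indexing into the container to reach the linear layer first.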

Customized convolutional layer in TensorFlow

Let's assume I want to make the following layer in a neural network: instead of having a square convolutional filter that moves over some image, I want the shape of the filter to be some other shape, say a rectangle, circle, triangle, etc. (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited, without many examples. A Python implementation of a convolutional layer, for example by extending tf.keras.layers.Layer, would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can just define a tensor of weights, but where I can customize entries in the tensor that are identically zero and some weights showing up in multiple places in this tensor, then I should be able to by hand build a convolutional layer and other layers. How would I do this, and also include these variables in training?
Edit2: Let me add some more clarifications. We can take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding so the output is also 10x10), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights in the correct locations in this matrix (so some entries are zero, and some entries are equal, i.e. all 25 weights will show up in many locations in this matrix). I then multiply the input with this matrix to get an output. So my question would be twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming that I want to later customize what this filter looks like and thus the standard conv2d is not good enough)?
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However I don't know if this approach will suffer from performance issues.

Just use regular conv. layers with square filters, and zero out some values after each weight update:
g = tf.get_default_graph()
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
conv1_filter = g.get_tensor_by_name('conv1:0')
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
EDIT: if you're not familiar with tensorflow, you might get confused about using the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this you can access first layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
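As an illustration of what my_mask could look like, here is a NumPy sketch that builds a roughly circular binary mask for a 5x5 filter and applies it to a random stand-in for the filter weights. The radius, the channel counts, and every name other than my_mask are hypothetical. The (height, width, in_channels, out_channels) layout is the one TensorFlow uses for conv2d filters.

```python
import numpy as np


def circular_mask(size, radius):
    """Binary (size, size) mask: 1 inside the circle around the center, else 0."""
    c = (size - 1) / 2
    yy, xx = np.mgrid[0:size, 0:size]
    return ((yy - c) ** 2 + (xx - c) ** 2 <= radius ** 2).astype(np.float32)


mask_2d = circular_mask(5, radius=2)

rng = np.random.default_rng(0)
filters = rng.normal(size=(5, 5, 3, 8)).astype(np.float32)  # stand-in for conv1 weights

# Repeat the 2D pattern over every (in_channel, out_channel) pair so the
# mask has the same shape and dtype as the filters.
my_mask = np.broadcast_to(mask_2d[:, :, None, None], filters.shape).copy()

masked = filters * my_mask
print(mask_2d)
```

Multiplying by the mask after every update keeps the zeroed positions at zero, which is what makes the square filter behave like a differently shaped one.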

Does convolution kernel need to be designed in CNN (Convolutional Neural Networks)?

I am new to Convolutional Neural Networks. I am reading some tutorial and testing some sample codes using Keras. To add a convolution layer, basically I just need to specify the number of kernels and the size of the kernel.
My question is what each kernel looks like? Are they generic to all computer vision applications?

My question is what each kernel looks like?
This depends on the parameters you chose for your Convolutional Layer:
It will indeed depend on the kernel_size parameter you mentioned, as it will determine the shape and size of your kernel. Say you pass this parameter as (3,3) (on a Conv2D layer naturally), you will then obtain a 3x3 Kernel Matrix.
It will depend on your kernel_initializer parameter, which determines the way that MxN Kernel Matrix is going to be filled. Its default value is "glorot_uniform", which is explained on its doc page:
Glorot uniform initializer, also called Xavier uniform initializer. It draws samples from a uniform distribution within [-limit, limit] where limit is sqrt(6 / (fan_in + fan_out)) where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor.
This is telling us the specific way it fills that kernel matrix. You may well select any other kernel initializer you desire to fit your needs. You may even build Custom Initializers, also exemplified in that doc page:
from keras import backend as K
def my_init(shape, dtype=None):
    # or whatever you want to customize
    return K.random_normal(shape, dtype=dtype)
model.add(Dense(64, kernel_initializer=my_init))
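To make the Glorot limit quoted above concrete, here is a small sketch in plain Python that evaluates limit = sqrt(6 / (fan_in + fan_out)) for two kernel shapes. The way fan_in and fan_out are derived for a convolution kernel (channels multiplied by the kernel area) is my understanding of how Keras does it, so treat that part as an assumption. glorot_limit is a made-up helper name.

```python
import math
import random


def glorot_limit(shape):
    """limit = sqrt(6 / (fan_in + fan_out)) for a Dense or Conv2D kernel shape."""
    receptive_field = math.prod(shape[:-2])  # 1 for Dense, kh * kw for Conv2D
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    return math.sqrt(6.0 / (fan_in + fan_out))


dense_limit = glorot_limit((64, 64))       # Dense kernel: (inputs, units)
conv_limit = glorot_limit((3, 3, 3, 32))   # Conv2D kernel: (kh, kw, in_channels, filters)

# Draw the nine values of one 3x3 kernel slice from U(-limit, limit).
random.seed(0)
sample = [random.uniform(-conv_limit, conv_limit) for _ in range(9)]
print(round(dense_limit, 4), round(conv_limit, 4))
```

So before training, a kernel is just a small grid of random numbers in a narrow range around zero. Larger layers get a narrower range.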
Furthermore, it will depend on your kernel_regularizer parameter, which defines regularization functions applied to the weights of your kernel. Its default value is None, but you can select others from the ones available. You can again define your own custom regularizers in a similar fashion:
def l1_reg(weight_matrix):
    # same here, fit your own needs
    return 0.01 * K.sum(K.abs(weight_matrix))
model.add(Dense(64, input_dim=64,
                kernel_regularizer=l1_reg))
Are they generic to all computer vision applications?
This I think may be a bit broad, however I would venture and say yes. Keras has available many kernels that were designed to specifically adapt to Deep Learning applications; it includes those ones that are most commonly used throughout the literature and well-known applications.
The good thing is that, as illustrated before, if any of those kernels does not fit your needs you could define your own custom initializer, or enhance it by using regularizers. This enables you to tackle those really specific CV problems you may have.

The actual kernel values are learned during the learning process, that's why you only need to set the number of kernels and their size.
What might be confusing is that the learned kernel values actually mimic things like Gabor and edge detection filters. These are generic to many computer vision applications, but instead of being engineered manually, they are learned from a big classification dataset (like ImageNet).
Also, the kernel values are part of a feature hierarchy that can be used directly as features for a variety of computer vision problems. In that sense they are also generic.
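For a feel of what such an edge detection filter does, here is a plain Python sketch that slides a hand-engineered Sobel kernel over two tiny images. This is the manual design that a CNN replaces with learned values. The images and the correlate2d_valid helper are made up for this example.

```python
def correlate2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1), as a conv layer does."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hand-engineered Sobel kernel that responds to vertical edges.
sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

edge_image = [[0, 0, 1, 1]] * 4  # dark left half, bright right half
flat_image = [[1, 1, 1, 1]] * 4  # no edges at all

edge_response = correlate2d_valid(edge_image, sobel_x)
flat_response = correlate2d_valid(flat_image, sobel_x)
print(edge_response, flat_response)
```

The kernel gives a strong response wherever its window straddles the edge and zero on the flat image. Kernels in the first layer of a trained CNN often end up behaving similarly, without anyone writing the numbers by hand.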

How to Switch from Keras Tensortype to numpy array for a custom layer?

So I have a custom layer, that does not have any weights.
In a first step, I tried to implement the functions manipulating the input tensors in Keras, but I did not succeed for many reasons. My second approach was to implement the functions with numpy operations. Since the custom layer I am implementing does not have any weights, my understanding is that I could use numpy operations, as I don't need backpropagation because there are no weights, right? I would then just convert the output of my layer to a tensor with:
Keras.backend.variable(value = output)
So the main idea is to implement a custom layer, that takes tensors, convert them to numpy arrays, operate on them with numpy operations, then convert the output to a tensor.
The problem is that I seem not to be able to use .eval() in order to convert the input tensors of my layer into numpy arrays, so that they could be manipulated with numpy operations.
Can anybody tell me how I can get around this problem?

As mentioned by Daniel Möller in the comments, Keras needs to be able to backpropagate through your layer in order to calculate the gradients for the previous layers. Your layer needs to be differentiable for this reason.
For the same reason you can only use Keras operations, as those can be automatically differentiated with autograd. If your layer is something simple, have a look at the Lambda layer where you can implement custom layers quickly.
As an aside, Keras backend functions should cover a lot of use cases, so if you're stuck writing your layer with those, you might want to post another question here.
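A minimal sketch of the Lambda approach, assuming TensorFlow 2.x with its bundled Keras. The operation itself (scale, then clip) is a made-up placeholder for whatever the NumPy code was meant to do. The point is that it is written with differentiable TensorFlow ops, so gradients still reach whatever comes before the layer.

```python
import tensorflow as tf

# A weightless custom layer built from TensorFlow ops instead of NumPy.
scale_and_clip = tf.keras.layers.Lambda(
    lambda t: tf.clip_by_value(2.0 * t, 0.0, 1.0)
)

x = tf.constant([[-1.0, 0.25, 3.0]])
with tf.GradientTape() as tape:
    tape.watch(x)  # x is a constant, so ask the tape to track it
    y = scale_and_clip(x)

# Gradient of the layer output with respect to its input.
grad = tape.gradient(y, x)
print(y.numpy(), grad.numpy())
```

If the gradient came back as None, that would indicate the chain was broken, which is what happens when the computation leaves the tensor world for NumPy arrays.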
Hope this helps.
