PyTorch RNN has RNN.weight_ih, which is a weight between input and hidden layer, and RNN.weight_hh, which is a weight between hidden and hidden. Why is there no weight between hidden and output?
When I was learning about RNNs, I learned that there are 3 weights.
There’s no weight there because the PyTorch RNN doesn’t prescribe how to create the output from the hidden state. When you apply the RNN to a sequence, it returns the sequence of hidden states.
You can decide what to do with these: maybe a linear transformation is the right way to get the output (as you learned it). Maybe you don’t need the outputs, except for the final one. In that case, you can save O(T) computations by only computing the final output, manually.
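As a rough sketch of the point above (not the only way to do it), the hidden-to-output weight can be added by hand as a separate linear layer. The sizes and the name `readout` are illustrative assumptions; `nn.RNN` stores its parameters as `weight_ih_l0` and `weight_hh_l0`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T, batch, n_in, n_hidden, n_out = 5, 2, 3, 4, 1  # illustrative sizes

rnn = nn.RNN(input_size=n_in, hidden_size=n_hidden)  # input-hidden and hidden-hidden weights only
readout = nn.Linear(n_hidden, n_out)                  # hidden-to-output weight, added by hand

x = torch.randn(T, batch, n_in)         # (seq_len, batch, features)
hidden_seq, h_last = rnn(x)             # hidden_seq: (T, batch, n_hidden)

all_outputs = readout(hidden_seq)       # one output per time step
final_output = readout(hidden_seq[-1])  # output for the last time step only
```

If only `final_output` is needed, the readout is applied once instead of T times.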
I need a kind of custom backpropagation so that, for an arbitrary layer of the network, I can decide whether to actually update the weights leaving that layer or keep them unchanged.
For example, I would like to study what happens if, during training, I prevent some of the weights connecting the input layer to the first layer from being updated.
Is there a simple way to just "correct" the normal backpropagation by intervening between the layers?
Thanks
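One possible approach, sketched here in PyTorch under the assumption that plain SGD is used, is to multiply the gradient of a weight tensor by a binary mask before the optimizer step. The layer size and the choice of which entry to freeze are purely illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 2)  # weight shape is (out_features, in_features) = (2, 3)

# 1 = update as usual, 0 = keep this weight fixed (hypothetical choice of entry)
mask = torch.ones_like(layer.weight)
mask[0, 1] = 0.0

# The hook rewrites the gradient of `weight` before the optimizer sees it.
layer.weight.register_hook(lambda grad: grad * mask)

opt = torch.optim.SGD(layer.parameters(), lr=0.1)  # no momentum, no weight decay
before = layer.weight.detach().clone()

x, y = torch.randn(8, 3), torch.randn(8, 2)
loss = ((layer(x) - y) ** 2).mean()
opt.zero_grad()
loss.backward()
opt.step()
after = layer.weight.detach().clone()
```

Note that an optimizer with weight decay would still change the masked entry, because the decay term is added independently of the gradient.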
I am creating a CNN to predict the distributed strain applied to an optical fiber from the measured light spectrum (2D), which is ideally a Lorentzian curve. The label is a 1D array where only the strained section is non-zero (the label looks like a square wave).
My CNN has 10 alternating convolution and pooling layers, all activated by ReLU. These are followed by 3 fully-connected hidden layers with softmax activation, then an output layer activated by ReLU. Usually, CNNs and other neural networks use ReLU for the hidden layers and softmax for the output layer (in the case of classification problems). In this case, however, I use softmax first to determine the positions along the optical fiber where strain is applied (i.e. non-zero), and then use ReLU in the output layer for regression. My CNN is able to predict the labels rather accurately, but I cannot find any supporting publication where softmax is used in hidden layers followed by ReLU in the output layer, nor any explanation of why this approach is not recommended (e.g. not mathematically sound), other than the answers I found on Quora/Stack Overflow. I would really appreciate it if anyone could enlighten me on this matter, as I am pretty new to deep learning and wish to learn from this. Thank you in advance!
If you look at the way a layer l sees the input from a previous layer l-1, it is assuming that the dimensions of the feature vector are linearly independent.
If the model is building some kind of confidence using a set of neurons, then those neurons had better be linearly independent; otherwise the model is simply exaggerating the value of one neuron.
If you apply softmax in hidden layers, you are essentially combining multiple neurons and tampering with their independence. Also, one reason ReLU is preferred is that it gives better gradients than saturating activations such as sigmoid. And if your goal is to add normalization to your layers, you would be better off using an explicit batch normalization layer.
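A small NumPy sketch of the coupling argument above: changing a single pre-activation leaves the other ReLU outputs untouched, but shifts every softmax output, because the softmax outputs are constrained to sum to 1. The input values are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift by the max for numerical stability
    return e / e.sum()

def relu(z):
    return np.maximum(z, 0.0)

z = np.array([1.0, 2.0, 3.0])
z_bumped = z.copy()
z_bumped[0] += 1.0  # change only the first pre-activation

relu_change = relu(z_bumped) - relu(z)           # only the first entry moves
softmax_change = softmax(z_bumped) - softmax(z)  # every entry moves
```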
Let's assume I want to make the following layer in a neural network: instead of having a square convolutional filter that moves over some image, I want the filter to have some other shape, say a rectangle, circle, triangle, etc. (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited and has few examples. A Python implementation of a convolutional layer, for example one that extends tf.keras.layers.Layer, would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can define a tensor of weights in which I can force some entries to be identically zero and have some weights appear in multiple places. With that I should be able to build a convolutional layer, and other layers, by hand. How would I do this, and also include these variables in training?
Edit2: Let me add some more clarifications. We can take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding so the output is also 10x10), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights in the correct locations in this matrix (so some entries are zero and some entries are equal, i.e. all 25 weights will show up in many locations in this matrix). I then multiply the input with this matrix to get an output. So my question is twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming that I want to later customize what this filter looks like, so that the standard conv2d is not good enough)?
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However I don't know if this approach will suffer from performance issues.
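The matrix construction described in Edit2 can be checked with a short NumPy sketch. It only verifies the arithmetic (it is not a trainable TensorFlow layer), and it assumes zero padding and the cross-correlation convention that deep-learning convolution layers use.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 10   # input size
K = 5        # filter size
P = K // 2   # zero padding so the output is also 10x10

kernel = rng.normal(size=(K, K))  # the 25 shared weights
image = rng.normal(size=(H, W))

# Build the (H*W) x (H*W) matrix: row = output pixel, column = input pixel.
M = np.zeros((H * W, H * W))
for i in range(H):
    for j in range(W):
        for di in range(K):
            for dj in range(K):
                ii, jj = i + di - P, j + dj - P
                if 0 <= ii < H and 0 <= jj < W:  # positions outside the image are padding
                    M[i * W + j, ii * W + jj] = kernel[di, dj]

out_matrix = (M @ image.ravel()).reshape(H, W)

# Reference result: slide the filter over the zero-padded image.
padded = np.pad(image, P)
out_direct = np.zeros((H, W))
for i in range(H):
    for j in range(W):
        out_direct[i, j] = np.sum(padded[i:i + K, j:j + K] * kernel)
```

Each row of `M` holds at most 25 non-zero entries out of 100, and the matrix grows with the square of the number of pixels, which suggests why a dense version of this approach would scale poorly to larger images.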
Just use regular conv. layers with square filters, and zero out some values after each weight update:
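import tensorflow as tf  # TensorFlow 1.x graph-mode API
g = tf.get_default_graph()
conv1_filter = g.get_tensor_by_name('conv1:0')
# Build the masking op once, so the graph does not grow on every training step
apply_mask = tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask))
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})  # one weight update
sess.run(apply_mask)  # zero out the masked filter entries again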
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
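As an illustration of what such a mask could look like, here is a NumPy sketch that builds a disc-shaped binary mask in the (height, width, in_channels, out_channels) layout used by tf.nn.conv2d filters. The helper name `circular_mask` and the sizes are hypothetical.

```python
import numpy as np

def circular_mask(k, in_ch, out_ch):
    """Binary mask of shape (k, k, in_ch, out_ch) with a disc-shaped footprint."""
    c = (k - 1) / 2.0  # centre of the filter
    yy, xx = np.mgrid[0:k, 0:k]
    disc = ((yy - c) ** 2 + (xx - c) ** 2 <= c ** 2).astype(np.float32)
    # Repeat the 2-D footprint over all input and output channels
    return np.broadcast_to(disc[:, :, None, None], (k, k, in_ch, out_ch)).copy()

my_mask = circular_mask(5, 3, 8)  # 5x5 filters, 3 input channels, 8 output channels
```

Any other footprint (triangle, ring, and so on) can be produced the same way by changing the condition that defines `disc`.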
EDIT: if you're not familiar with tensorflow, you might get confused about using the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this you can access first layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
I need to implement neuron freezing in a CNN for deep learning research.
I tried to find a suitable function in the TensorFlow docs, but I didn't find anything.
How can I freeze specific neurons when the layers are implemented with tf.nn.conv2d?
A neuron in a dense neural network layer simply corresponds to a column in a weight matrix. You could therefore redefine your weight matrix as a concatenation of 2 parts/variables, one trainable and one not. Then you could either:
selectively pass only the trainable part in the var_list argument of the minimize function of your optimizer, or
use tf.stop_gradient on the vector/column corresponding to the neuron you want to freeze.
The same concept could be used for convolutional layers, although in this case the definition of a "neuron" becomes unclear; still, you could freeze any column(s) of a convolutional kernel.
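The two-part weight matrix idea can be sketched as follows. This uses the TensorFlow 2 eager API rather than the TF 1.x optimizer.minimize call mentioned above, and the shapes, learning rate, and variable names are assumptions made for illustration.

```python
import tensorflow as tf

tf.random.set_seed(0)
w_train = tf.Variable(tf.random.normal([4, 2]))                    # columns (neurons) that train
w_frozen = tf.Variable(tf.random.normal([4, 1]), trainable=False)  # the frozen neuron

x = tf.random.normal([8, 4])
y = tf.random.normal([8, 3])

train_before = w_train.numpy().copy()
frozen_before = w_frozen.numpy().copy()

for _ in range(5):
    with tf.GradientTape() as tape:
        # stop_gradient blocks any gradient from flowing into the frozen column
        w = tf.concat([w_train, tf.stop_gradient(w_frozen)], axis=1)
        loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
    # Only the trainable part is handed to the update, like a restricted var_list
    grad = tape.gradient(loss, w_train)
    w_train.assign_sub(0.1 * grad)  # plain gradient-descent step
```

Here both mechanisms are shown together; either one alone (leaving the frozen part out of the update, or wrapping it in tf.stop_gradient) should be enough to keep that column fixed.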
As clarified in the comments, you want to freeze neurons in a tf.nn.conv2d convolution. While there is no direct way of doing this in TensorFlow (as far as my search showed), you could try slicing the tensor and applying tf.stop_gradient() to the slice. Here is a Stack Overflow answer to give you an intuition for how to use tf.stop_gradient()
I haven't tested it, but according to the docs I think it should work.
I am new to Keras. How can I print the outputs of a layer, both intermediate and final, during the training phase?
I am trying to debug my neural network and want to know how the layers behave during training. To do so, I am trying to extract the input and output of a layer during training, for every step.
The FAQ (https://keras.io/getting-started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer) has a method to extract output of intermediate layer for building another model but that is not what I want. I don't need to use the intermediate layer output as input to other layer, I just need to print their values out and perhaps graph/chart/visualize it.
I am using Keras 2.1.4
I think I have found an answer myself, although not strictly accomplished by Keras.
Basically, to access layer output during training, one needs to modify the computation graph by adding a print node.
A more detailed description can be found in this StackOverflow question:
How can I print the intermediate variables in the loss function in TensorFlow and Keras?
I will quote an example here. Say you would like to have your loss printed at every step; you then need to define your custom loss function as follows:
for Theano backend:
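import theano
from keras import backend as K
def loss_fn(y_true, y_pred):  # loss_fn is an illustrative name
    diff = y_pred - y_true
    diff = theano.printing.Print('shape of diff', attrs=['shape'])(diff)  # print node
    return K.square(diff)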
for Tensorflow backend:
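import tensorflow as tf  # tf.Print is a TensorFlow 1.x op
from keras import backend as K
def loss_fn(y_true, y_pred):  # loss_fn is an illustrative name
    diff = y_pred - y_true
    diff = tf.Print(diff, [tf.shape(diff)])  # print node: logs the shape at each step
    return K.square(diff)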
Outputs of other layers can be accessed similarly.
There is also a nice tutorial from Google about using tf.Print():
Using tf.Print() in TensorFlow
If you want to know more info on each neuron, you need to use the following to get their bias and weights.
weights = model.layers[0].get_weights()[0]
biases = model.layers[0].get_weights()[1]
Index 0 holds the weights and index 1 holds the biases.
You can also get them for every layer:
for layer in model.layers:
    weights = layer.get_weights()  # list of numpy arrays
After each training run, if you can access each layer and extract its weights and biases as NumPy arrays, you should be able to visualize how the neurons change over the course of training.
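A minimal sketch of collecting these arrays during training, assuming tf.keras (rather than standalone Keras 2.1.4) and a toy model with random data. The names `history` and `snapshot` are hypothetical.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

history = []  # one (weights, biases) snapshot of the first Dense layer per epoch

def snapshot(epoch, logs):
    weights, biases = model.layers[0].get_weights()
    history.append((weights.copy(), biases.copy()))

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 4)).astype("float32")
y = rng.normal(size=(32, 1)).astype("float32")

model.fit(x, y, epochs=3, batch_size=8, verbose=0,
          callbacks=[tf.keras.callbacks.LambdaCallback(on_epoch_end=snapshot)])
```

The arrays in `history` can then be plotted with any charting library to see how the weights move from epoch to epoch.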
Hope it helps.