I am currently working on a mood classification project with supervised learning, using TensorFlow.
In machine learning theory, as you know, there is always an x0 which is +1. When I create a placeholder for the input dataset, does TensorFlow automatically produce the x0 part, or should I add it manually?
Thanks
There are two ways of thinking about x0. Either your input has an extra dimension that always contains 1, in which case a linear regression or a fully connected layer in a neural network is represented as:
out = W * in
where * is matrix-vector multiplication; or, more commonly, you do not add that extra dimension and instead model the layer as
out = W * in + b
This is, in part, to highlight the difference between W, which is how we "weight" the input, and b, which is how much we "shift" it (b is called a "bias" term). Another reason this representation is preferable is that it is common to regularize W but not b.
Now, back to your question: the TensorFlow neural network library models a fully connected layer in terms of a weight matrix and a bias vector, so you do not need to add an extra 1 to your input vector.
If you use low-level tensor operations instead of the high-level predefined layers, TensorFlow makes no assumptions about your input. If you want to express your model as operations on a vector with an extra 1 in it, it is your responsibility to add that 1 to the vector; TensorFlow will not do it for you.
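To make the equivalence of the two formulations concrete, here is a minimal NumPy sketch (not TensorFlow code). The shapes and random values are made up for illustration, and the constant 1 is appended at the end of the input rather than at position 0, which is only a convention.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # weights: 4 inputs -> 3 outputs
b = rng.normal(size=3)        # bias vector
x = rng.normal(size=4)        # one input example

# Formulation 1: explicit bias term, out = W * in + b.
out_bias = W @ x + b

# Formulation 2: fold b into the weight matrix and append a constant 1 to the input.
W_aug = np.hstack([W, b[:, None]])   # shape (3, 5)
x_aug = np.append(x, 1.0)            # shape (5,)
out_aug = W_aug @ x_aug

same = np.allclose(out_bias, out_aug)   # both formulations give the same output
```

With the augmented form, any regularizer applied to the whole matrix would also penalize the bias column, which is one practical reason to keep W and b separate.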
I'm trying to create a small neural network with custom connections between neurons. The connections should exist over several layers and not be fully connected (sparse) as shown in the picture. I would also like to do the weight initialization manually and not completely randomly. My goal is to determine whether a connection is positive or negative. Is it possible to create such a neural net in tensorflow (python/js) or pytorch?
To summarize:
Can you do it? -- Yes, absolutely.
Is it going to be pretty? -- No, absolutely not.
In my explanation, I will focus on PyTorch, as it is the library I am more comfortable with, and it is especially useful when you have custom operations that can easily be expressed in a pythonic manner. TensorFlow also has an eager execution mode (more seriously integrated from version 2, if I remember correctly), but it is traditionally used with computational graphs, which make this whole thing a little uglier than it needs to be.
As you hopefully know, backpropagation (the "learning" step in any ANN) is basically an inverse pass through the network to calculate gradients, or at least something close enough to them for our problem at hand. Importantly, torch functions store this "reverse" direction, which makes it trivial for the user to call backpropagation functions.
To model a simple network as described in your image, we have only one major disadvantage:
The available operations usually excel at what they do because they are simple and can be optimized quite heavily. In your case, you have to express the different layers as custom operations, which generally scales incredibly poorly unless you can express them as some form of matrix operation, which I do not see straight away in your example. I am further assuming that you are applying some form of non-linearity, as the network would otherwise fail on any problem that is not linearly separable.
import torch
import torch.nn as nn

class CustomNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.h_1_1 = nn.Sequential(nn.Linear(1, 2), nn.ReLU())  # top node in first layer
        self.h_1_2 = nn.Sequential(nn.Linear(1, 2), nn.ReLU())  # bottom node in first layer
        # Note that these nodes have no shared weights, which is why we
        # have to initialize separately.
        self.h_2_1 = nn.Sequential(nn.Linear(1, 1), nn.ReLU())  # top node in second layer
        self.h_2_2 = nn.Sequential(nn.Linear(1, 1), nn.ReLU())  # bottom node in second layer
        self.h_3_1 = nn.Sequential(nn.Linear(2, 1), nn.ReLU())  # top node in third layer
        self.h_3_2 = nn.Sequential(nn.Linear(2, 1), nn.ReLU())  # bottom node in third layer
        # out doesn't require activation function due to pairing with loss function
        self.out = nn.Linear(2, 1)

    def forward(self, x):
        # x.shape: (batch_size, 2)
        # Slices such as x[:, 0:1] keep the feature dimension, as nn.Linear expects.
        # first layer. shape of (batch_size, 2), respectively
        out_top = self.h_1_1(x[:, 0:1])
        out_bottom = self.h_1_2(x[:, 1:2])
        # second layer. shape of (batch_size, 1), respectively
        out_top_2 = self.h_2_1(out_top[:, 0:1])
        out_bottom_2 = self.h_2_2(out_bottom[:, 0:1])
        # third layer. shape of (batch_size, 1), respectively
        # additional concatenation of previous outputs required.
        out_top_3 = self.h_3_1(torch.cat([out_top_2, -1 * out_top[:, 1:2]], dim=1))
        out_bottom_3 = self.h_3_2(torch.cat([out_bottom_2, -1 * out_bottom[:, 1:2]], dim=1))
        return self.out(torch.cat([out_top_3, out_bottom_3], dim=1))
As you can see, every computational step is given (in this case rather explicitly), and it is very much possible. Again, once you want to scale the number of neurons in each layer, you will have to be a little more creative in how you process them, but for-loops do very much work in PyTorch as well. Note that this will in any case be much slower than a vanilla linear layer, though.
If you can live with separately trained weights, you can always just define separate linear layers of smaller size and arrange them in a more convenient fashion.
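One common way to express a sparse layer as a matrix operation, the scalable route mentioned above, is to multiply a dense weight matrix elementwise by a fixed binary mask. The following is a minimal NumPy sketch of the forward pass only. The mask layout and the hand-picked weights are hypothetical and do not reproduce the network in the picture; in PyTorch the same masking would be applied inside forward.

```python
import numpy as np

# Hypothetical connectivity: mask[i, j] = 1 if input j feeds output i.
mask = np.array([[1, 0, 1],
                 [0, 1, 0]], dtype=float)

# Manual (non-random) initialization. Entries outside the mask are ignored.
W = np.array([[0.5, 9.9, -0.3],
              [9.9, 0.2, 9.9]])
b = np.zeros(2)

def sparse_layer(x, W, b, mask):
    """ReLU layer whose weights are forced to zero wherever mask is 0."""
    return np.maximum((W * mask) @ x + b, 0.0)

x = np.array([1.0, 2.0, 3.0])
y = sparse_layer(x, W, b, mask)

# Sign of each existing connection: +1, -1, or 0 where there is no connection.
signs = np.sign(W * mask)
```

Because the mask is applied on every forward pass, gradients for the masked entries are zero as well, so the forbidden connections stay absent during training.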
I am building a vanilla neural network from scratch using NumPy and testing the model's performance with different activation functions. I am especially keen to see how the 'Maxout' activation function would affect my model's performance.
After some searching, I was not able to find a NumPy implementation, only the definition (https://ibb.co/kXCpjKc). The formula for forward propagation is clear: I would take max(Z) (where Z = w.T * x + b). But the derivative that I will use in backpropagation is not clear to me.
What does j = argmax(z) mean in this context? How do I implement it in NumPy?
Any help would be much appreciated! Thank you!
Changing any of the non-maximum values slightly does not affect the output, so their gradient is zero. The gradient is passed from the next layer only to the neuron that achieved the max (gradient = 1 in the link you provided). See this Stack Exchange answer: https://datascience.stackexchange.com/a/11703.
In a neural network setting you need the gradient with respect to each of the x_i, so you need the full derivative. In the link you provided, only a partial derivative is defined. The partial derivative is a vector (almost all zeros, with a 1 where the neuron is the maximum), so the full gradient becomes a matrix.
You can implement this in numpy using np.argmax.
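As a rough sketch of what that could look like, the code below implements the forward and backward pass for a single Maxout unit on a single example. The function names, shapes, and numbers are assumptions made for illustration; j = argmax(z) is simply the index of the linear piece that produced the maximum.

```python
import numpy as np

def maxout_forward(x, W, b):
    """x: (d,), W: (k, d), b: (k,). Returns the max over k linear pieces and its index."""
    z = W @ x + b        # (k,) pre-activations
    j = np.argmax(z)     # index of the winning piece
    return z[j], j

def maxout_backward(grad_out, x, W, j):
    """Route the upstream gradient to the winning piece only; all others get zero."""
    grad_W = np.zeros_like(W)
    grad_b = np.zeros(W.shape[0])
    grad_W[j] = grad_out * x     # dz_j/dW_j = x
    grad_b[j] = grad_out         # dz_j/db_j = 1
    grad_x = grad_out * W[j]     # dz_j/dx = W_j
    return grad_W, grad_b, grad_x

x = np.array([1.0, -2.0])
W = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
b = np.array([0.0, 0.5, 0.0])

out, j = maxout_forward(x, W, b)
grad_W, grad_b, grad_x = maxout_backward(1.0, x, W, j)
```

For a batch, the same idea applies per example: take np.argmax along the piece axis and scatter the upstream gradient into the selected rows.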
Let's assume I want to make the following layer in a neural network: instead of having a square convolutional filter that moves over some image, I want the filter to have some other shape, say a rectangle, circle, triangle, etc. (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited and has few examples. A Python implementation of a convolutional layer, for example one that extends tf.keras.layers.Layer, would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can just define a tensor of weights, but where I can customize entries in the tensor that are identically zero and some weights showing up in multiple places in this tensor, then I should be able to by hand build a convolutional layer and other layers. How would I do this, and also include these variables in training?
Edit2: Let me add some more clarifications. We can take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding so the output is also 10x10), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights in the correct locations in this matrix (so some entries are zero and some entries are equal, i.e. all 25 weights will show up in many locations in this matrix). I then multiply the input by this matrix to get an output. So my question is twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming that I want to later customize what this filter looks like, so the standard conv2d is not good enough)?
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However I don't know if this approach will suffer from performance issues.
Just use regular conv. layers with square filters, and zero out some values after each weight update:
g = tf.get_default_graph()
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
conv1_filter = g.get_tensor_by_name('conv1:0')
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
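To illustrate what such a mask might look like, here is a small NumPy sketch that builds a circular binary mask for a 5x5 filter and re-applies it after a simulated weight update. The filter size, radius, learning rate, and random gradient are all made up for the example; this is not TensorFlow code.

```python
import numpy as np

k = 5                          # filter height and width
yy, xx = np.mgrid[:k, :k]      # row and column index grids
c = (k - 1) / 2                # centre of the filter

# Binary mask that is 1 inside a disc of radius 2 around the centre, 0 elsewhere.
my_mask = ((yy - c) ** 2 + (xx - c) ** 2 <= 2 ** 2).astype(np.float32)

rng = np.random.default_rng(0)
conv1_filter = rng.normal(size=(k, k)).astype(np.float32)

# Stand-in for one optimizer step (hypothetical gradient), followed by re-masking.
grad = rng.normal(size=(k, k)).astype(np.float32)
conv1_filter = (conv1_filter - 0.1 * grad) * my_mask
```

A TensorFlow conv2d filter has shape (height, width, in_channels, out_channels), so a 2D mask like this one would need two trailing axes, for example my_mask[:, :, None, None], to broadcast over the channels.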
EDIT: if you're not familiar with tensorflow, you might get confused about using the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this you can access first layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
I am self-studying the Udacity PyTorch course.
My question is about the last paragraph of the following section:
Learning
In the code you've been working with, you've been setting the values of filter weights explicitly, but neural networks will actually learn the best filter weights as they train on a set of image data. You'll learn all about this type of neural network later in this section, but know that high-pass and low-pass filters are what define the behavior of a network like this, and you know how to code those from scratch!
In practice, you'll also find that many neural networks learn to detect the edges of images because the edges of object contain valuable information about the shape of an object.
I have studied all of the sections up to the 44th, but I am still unable to answer the following questions:
What is the initialized weight when I do torch.nn.Conv2d? And how to define it myself?
How does PyTorch update weights in the convolutional layer?
When you declare nn.Conv2d, the weights are initialized via this code.
In particular, the weights use the initialization proposed by Kaiming He et al.: they are drawn from a uniform distribution on (-bound, bound), where bound=\sqrt{6/((1+a^2)fan_in)} (See here). If the layer has a bias, it is also drawn from a uniform distribution whose bound depends on fan_in.
You can initialize the weights manually too. This has been answered elsewhere (See here) and I won't repeat it.
When you call optimizer.step and the optimizer has the parameters of the convolutional filter registered, they are updated.
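As a quick numeric check of the bound formula above, the sketch below evaluates it with the standard library only. The layer dimensions are hypothetical. The value a = sqrt(5) is what I recall PyTorch passing for convolutional weights, so treat it as an assumption and confirm it against the source for your version.

```python
import math

def kaiming_uniform_bound(fan_in, a=0.0):
    """bound = sqrt(6 / ((1 + a^2) * fan_in)), the formula quoted above."""
    return math.sqrt(6.0 / ((1.0 + a ** 2) * fan_in))

# Hypothetical layer: 3 input channels and a 5x5 kernel.
# For a convolution, fan_in = in_channels * kernel_height * kernel_width.
fan_in = 3 * 5 * 5

bound_a0 = kaiming_uniform_bound(fan_in)                   # a = 0
bound_a5 = kaiming_uniform_bound(fan_in, a=math.sqrt(5))   # a = sqrt(5) (assumed)
```

With a = sqrt(5) the expression simplifies to 1 / sqrt(fan_in), since 1 + a^2 = 6 cancels the 6 in the numerator.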
1. In PyTorch, Conv2d is designed to accept a 4D tensor of shape (N, C, H, W) as input to the forward pass, where N is the number of samples in the mini-batch, C is the number of input channels (for example, the 3 color channels of an image), and H and W are the height and width of the image.
Your weights should reflect that and be a 4D tensor of shape (F, C, K_H, K_W), where F is the number of different kernels you would like to have in this layer, C is the number of input channels, and K_H and K_W are the height and width of the kernels. The exact initialization values can be computed using the formula in the PyTorch docs for nn.Conv2d.
Here is a great figure which will help to visualize computation.
Cross-correlation computation with 2 input channels. Ref. http://www.d2l.ai/chapter_convolutional-neural-networks/channels.html, Fig. 6.4.1
2. The weights are updated by the backpropagation algorithm, which calculates gradients. PyTorch executes this under the hood. If you are initializing the weights yourself, you should set requires_grad=True on the weight tensor to say explicitly that this tensor should be updated through backpropagation.
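The multi-channel cross-correlation described in point 1 can be sketched in plain NumPy for a single sample and a single kernel, so the input has shape (C, H, W) and the kernel has shape (C, K_H, K_W). The function name and the small integer values are chosen for illustration; there is no padding, stride, or bias here.

```python
import numpy as np

def corr2d_multi_in(X, K):
    """X: (C, H, W), K: (C, K_H, K_W). Valid cross-correlation, summed over channels."""
    _, H, W = X.shape
    _, kh, kw = K.shape
    Y = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Multiply the window by the kernel elementwise and sum over all channels.
            Y[i, j] = np.sum(X[:, i:i + kh, j:j + kw] * K)
    return Y

X = np.array([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
              [[1, 2, 3], [4, 5, 6], [7, 8, 9]]], dtype=float)
K = np.array([[[0, 1], [2, 3]],
              [[1, 2], [3, 4]]], dtype=float)

Y = corr2d_multi_in(X, K)   # shape (2, 2)
```

A layer with F kernels repeats this computation once per kernel and stacks the F results as output channels, which is where the (F, C, K_H, K_W) weight shape comes from.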
I've been given a fully trained model by another researcher that has inputs as placeholders. Regarding it as a function f(x), I would like to find x to minimize my distance metric (loss function) dist(x, f(x)). This could be something like the euclidean distance between the two points.
I tried to use TensorFlow's built-in optimizer functions. The issue is that tf.train.AdamOptimizer(1e-4).minimize(loss, var_list=[input_placeholder]) fails, complaining that input_placeholder isn't of a supported type. Thus, I cannot get gradients for my input.
How can I optimize a function in TensorFlow when the inputs have to be specified in this way? Unfortunately, these placeholders are not passed through a Variable first, and I have to treat that model as a black box.
Using the Keras functional API detailed in this question, I created a dense layer with no bias to sit right before the model I was given. Holding its input as a constant all 1's vector, I optimized the joined model using only the Variable in the dense layer, giving me the optimal vector as the output of that layer.
All TensorFlow Optimizer subclasses allow you to minimize while only modifying a particular set of Variables, which I got out of Keras fairly simply.
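The idea behind this trick can be illustrated without TensorFlow. In the NumPy sketch below, a hypothetical linear function f(x) = A x + c stands in for the black-box model, the loss is the squared Euclidean distance between x and f(x), and the gradient is written out by hand where a TensorFlow optimizer would compute it automatically. All names and values are assumptions for the example.

```python
import numpy as np

# Hypothetical stand-in for the black-box model: f(x) = A x + c.
A = np.array([[0.5, 0.1], [0.0, 0.3]])
c = np.array([1.0, 2.0])
M = np.eye(2) - A            # so that x - f(x) = M x - c

ones = np.ones(1)            # constant input to the bias-free dense layer
W = np.zeros((1, 2))         # the only trainable variable

def loss_and_grad(W):
    x = ones @ W                           # dense layer output = candidate input x
    r = M @ x - c                          # residual x - f(x)
    grad_x = 2.0 * M.T @ r                 # gradient of the squared distance w.r.t. x
    return r @ r, np.outer(ones, grad_x)   # chain rule back to W

# Plain gradient descent on W only; the "model" parameters A and c stay fixed.
for _ in range(500):
    loss, grad_W = loss_and_grad(W)
    W -= 0.5 * grad_W

x_opt = ones @ W
final_loss, _ = loss_and_grad(W)
```

Since the dense layer's input is a constant 1, its weight row is the optimized input itself, which is why training only that Variable is equivalent to optimizing x.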