I have this function for a neural network; it calculates the next layer from a list of inputs and a list of weights. Is there any way to make it faster or more efficient?
The argument inp is the input, weights are the weights, layerlength is the length of the next layer, and rounds is just the number of decimal places to round the output to.
def output(inp, weights, layerlength, rounds):
    layer = []
    count = 0
    lappend = layer.append
    for a in range(layerlength):
        total = 0
        for b in range(len(inp)):
            total += inp[b] * weights[count]
            count += 1
        lappend(round(total, rounds))
    return layer
In general, try not to use for loop constructs in Python; they are extremely slow. Use matrix operations implemented in numpy instead, so the loops run under the hood in optimized C code (often 50 to 100 times faster).
You can easily reformulate your piece of code without any Python for loops by defining your layer and inp vectors and your weights matrix all as numpy.array() and then performing matrix multiplication on them.
EDIT:
I hope I am not helping you cheat on your homework here ;)
import numpy as np
# 10 dimensional input
inpt = np.arange(10)
# 20 neurons in the first (fully connected) layer
weights = np.random.rand(10, 20)
# matmul: to calculate the input to the non-linearity of the first layer,
# you multiply each input dimension by all the weights assigned to a specific neuron
# of the first layer and sum them up, and do this for every neuron in that layer.
# A single matrix multiplication does all of that at once.
layer = np.matmul(inpt, weights)
print(inpt.shape)
print()
print(weights.shape)
print()
print(layer.shape)
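If you still want the rounding behaviour of the original function, numpy can apply it to the whole vector at once (assuming rounds is the same decimal count as in your code):
layer = np.round(layer, rounds)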
So I'm assuming you're computing the activations of one layer.
Make sure you use linear algebra libraries like NumPy (or TensorFlow, PyTorch, etc.). These will make sure your computations run much more efficiently on the CPU (or GPU). Using for loops typically adds a lot of computational overhead.
For example, in numpy you can write your feedforward pass for one layer as:
output = inp.dot(weights)
inp is here your n by m input matrix, weights is your m by k weight matrix. output will then be an n by k matrix of your forward-step activations.
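A minimal, self-contained sketch of that with made-up shapes (n=4 examples, m=3 inputs, k=5 neurons):
import numpy as np

# hypothetical batch of 4 examples with 3 features feeding 5 neurons
inp = np.random.rand(4, 3)
weights = np.random.rand(3, 5)
output = inp.dot(weights)   # same as inp @ weights
print(output.shape)         # (4, 5)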
Related
I recently started learning Python and am trying to implement my first neural network. My goal is to write a function that generates a neural net with a variable number of layers and nodes. All the necessary information for that is stored in layerStructure (e.g. the first layer has four nodes, the third layer has three nodes).
import numpy as np
# Vector of input layer
input = np.array([1, 2, 3, 4])
# Number of nodes in each layer
layerStructure = np.array([len(input), 2, 3])
# Generating empty weight matrix container
weightMatrix_arr = np.array([])
# Initialising random weight matrices
for ii in range(len(layerStructure[0:-1])):
    randmatrix = np.random.rand(layerStructure[ii+1], layerStructure[ii])
    print(randmatrix)
The code above generates the following output:
[[0.6067148 0.66445212 0.54061231 0.19334004]
[0.22385007 0.8391435 0.73625366 0.86343394]]
[[0.61794333 0.9114799 ]
[0.10626486 0.95307027]
[0.50567023 0.57246852]]
My first attempt was to store each random weight matrix in a container array called weightMatrix_arr. However, since the shape of individual matrices varies, I cannot use np.append() to store them all in the matrix container. How can I save these matrices in order to access them during the backpropagation?
You can use a list instead of an np.array:
# Generating an empty weight LIST container
weightMatrixes = []
# Initialising random weight matrices
for ii in range(len(layerStructure[0:-1])):
    randmatrix = np.random.rand(layerStructure[ii+1], layerStructure[ii])
    weightMatrixes.append(randmatrix)
    print(randmatrix)
Otherwise you can set the weight container's dtype to object. Note that np.append with no axis would flatten each matrix, so pre-allocate an object array and assign into it instead:
# Generating an empty weight container with dtype=object
weightMatrixes = np.empty(len(layerStructure) - 1, dtype=object)
# Initialising random weight matrices
for ii in range(len(layerStructure[0:-1])):
    randmatrix = np.random.rand(layerStructure[ii+1], layerStructure[ii])
    weightMatrixes[ii] = randmatrix
Note that either way you can't access the inner layer indexes without first indexing the layer matrix:
weightMatrixes[layer, 0, 3] # ERROR
weightMatrixes[layer][0, 3] # OK
If memory consumption is not a problem, you can shape all layers as the longest one and simply ignore the extra cells according to the layerStructure values, as sketched below.
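A minimal sketch of that padding idea, reusing layerStructure from the question (the padded cells are simply never read):
max_dim = int(layerStructure.max())
padded = np.zeros((len(layerStructure) - 1, max_dim, max_dim))
for ii in range(len(layerStructure) - 1):
    rows, cols = layerStructure[ii + 1], layerStructure[ii]
    # only the top-left rows x cols block of each slice holds real weights
    padded[ii, :rows, :cols] = np.random.rand(rows, cols)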
I used a Python dictionary to store the weights for each hidden layer, with the layer number as the key, so that retrieving the weights later is easy. It is a simple and clean way to store the model weights, and the shape of the weights doesn't matter. Below is a snippet of code.
"""def generate_weights(layers):
Weights={}
for i in range(1,len(layers)):
w0=2*np.random.random((layers[i-1],layers[i]))-1
Weights[i-1] = w0
return Weights
generate_weights([3,4,2])"""
I wrote a very basic tensorflow model where I want to predict a number:
import tensorflow as tf
import numpy as np
def HW_numbers(x):
    y = (2 * x) + 1
    return y
x = np.array([1.0,2.0,3.0,4.0,5.0,6.0,7.0], dtype=float)
y = np.array(HW_numbers(x))
model = tf.keras.models.Sequential([tf.keras.layers.Dense(units=1,input_shape=[1])])
model.compile(optimizer='sgd',loss='mean_squared_error')
model.fit(x,y,epochs = 30)
print(model.predict([10.0]))
The above code works fine. But if I add an activation function to the Dense layer, the prediction becomes weird. I have tried 'relu', 'sigmoid', 'tanh', etc.
My question is, why is that? What exactly is the activation function doing in that single layer that messes up the prediction?
I am using TensorFlow 2.0.
Currently, you are learning a linear function. Since it can be described by a single neuron, you only need a single neuron to learn it. On the other hand, an activation function is:
to learn and make sense of something really complicated and Non-linear complex functional mappings between the inputs and response variable. It introduces non-linear properties to our Network. Their main purpose is to convert an input signal of a node in an A-NN to an output signal. That output signal now is used as an input in the next layer in the stack.
Hence, as you have just a single neuron here (a specific case), you do not need to pass the value to the next layer. In other words, all the input, hidden, and output layers are merged together, so the activation function is not helpful in your case, unless you want to make a decision based on the output of the neuron.
Your network consists of just one neuron. So what it does with no activation function is to multiply your input with the neuron's weight. This weight will eventually converge to something around 2.1.
But with relu as an activation function, only positive numbers are propagated through your network. So if your neuron's weight is initialized with a negative number, you will always get zero as an output. So with relu, you have a 50:50 chance to get good results.
With the activation functions tanh and sigmoid, the output of the neuron is limited to [-1,1] and [0, 1] respectively, so your output can't be more than one.
So for such a small neural network, these activation functions don't match the problem.
I am using PyTorch to implement a neural network that has (say) 5 inputs and 2 outputs:
import torch
import torch.nn as nn

class myNetwork(nn.Module):
    def __init__(self):
        super(myNetwork, self).__init__()
        self.layer1 = nn.Linear(5, 32)
        self.layer2 = nn.Linear(32, 2)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = self.layer2(x)
        return x
Obviously, I can feed this an (N x 5) Tensor and get an (N x 2) result,
net = myNetwork()
nbatch = 100
inp = torch.rand([nbatch,5])
inp.requires_grad = True
out = net(inp)
I would now like to compute the derivatives of the NN output with respect to one element of the input vector (let's say the 5th element), for each example in the batch. I know I can calculate the derivatives of one element of the output with respect to all inputs using torch.autograd.grad, and I could use this as follows:
deriv = torch.zeros([nbatch, 2])
for i in range(nbatch):
    for j in range(2):
        deriv[i, j] = torch.autograd.grad(out[i, j], inp, retain_graph=True)[0][i, 4]
However, this seems very inefficient: it calculates the gradient of out[i,j] with respect to every single element in the batch, and then discards all except one. Is there a better way to do this?
By virtue of backpropagation, even if you only computed the gradient w.r.t. a single input, the computational savings wouldn't necessarily amount to much: you would only save some work in the first layer, since all later layers need to be backpropagated through either way.
So this may not be the optimal way, but it doesn't actually create much overhead, especially if your network has many layers.
By the way, is there a reason that you need to loop over nbatch? If you wanted the gradient of each element of a batch w.r.t a parameter, I could understand that, because pytorch will lump them together, but you seem to be solely interested in the input...
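That said, because each out[i] depends only on inp[i], you can get all per-example derivatives for one output dimension in a single grad call by summing over the batch. A sketch reusing net, inp, out, and nbatch from the question (an assumption about what you need, not a drop-in tested solution):
deriv = torch.zeros([nbatch, 2])
for j in range(2):
    # d(out[:, j].sum()) / d(inp) has shape (nbatch, 5); row i is d out[i, j] / d inp[i]
    grad = torch.autograd.grad(out[:, j].sum(), inp, retain_graph=True)[0]
    deriv[:, j] = grad[:, 4]   # derivative w.r.t. the 5th input element
This reduces 2 * nbatch backward passes to just 2.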
I would like to make a PyTorch model that takes the outer product of the input with itself and then does a linear regression on that. As an example, consider the input vector [1,2,3]; I would then like to compute w and b to optimize [1*1, 1*2, 1*3, 2*1, 2*2, 2*3, 3*1, 3*2, 3*3] @ w + b.
For a batch input with r rows and c columns, I can do this in PyTorch with
(input.reshape(r,c,1) @ input.reshape(r,1,c)).reshape(r,c**2) @ weights + b
My problem is that it is extraordinarily slow. It is something like a factor of 1000 slower and more memory-hungry than adding a fully connected c*c ReLU layer, even though it has the same number of weights.
My question is why this happens.
Is reshape a very expensive operation for PyTorch? Could I reformulate it in a different way that would make things more efficient?
Another equivalent formulation I know is torch.diag(input @ weights @ input.T) + b, but now we are computing way more values than we need (r*r) just to throw them away again.
When you have to reshape a tensor during the training loop of a model it's always best to use view instead of reshape. There doesn't appear to be any performance overhead with a view, but it does require that the tensor data is contiguous.
If your tensor isn't contiguous to begin with, you can copy it with .contiguous() and make it contiguous.
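A small sketch of the difference (a transpose is a common way to end up with non-contiguous data):
import torch

t = torch.rand(4, 3).t()        # transposing makes the storage non-contiguous
# t.view(12) would raise a RuntimeError here
flat = t.contiguous().view(12)  # copy into contiguous memory first, then view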
It turns out that PyTorch has torch.bilinear, which is backed by CUDA and does exactly what I need. That's neat and very fast. It still leaves the case of higher-order interactions; I don't see a torch.trilinear and so forth, but for now it's great.
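For reference, the module form is nn.Bilinear, which computes the x^T W x + b form described above when both inputs are the same tensor. A minimal sketch (the sizes here are made up):
import torch
import torch.nn as nn

r, c = 8, 3                      # hypothetical batch size and feature count
x = torch.rand(r, c)
bilinear = nn.Bilinear(c, c, 1)  # one output unit, i.e. a regression head
out = bilinear(x, x)             # shape (r, 1)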
I want to use the weights and biases from a Keras ML model to make a mathematical prediction function in another program that does not have Keras installed (and cannot).
I have a simple MLP model that I'm using to fit data. I'm running in Python with Keras and a TensorFlow backend; for now, I'm using an input layer, 1 hidden layer, and 1 output layer. All layers are ReLU, my optimizer is adam, and the loss function is mean_squared_error.
From what I understand, the weights I get for the layers should be used mathematically in the form:
(SUM (w*i)) + b
Where the sum is over all weights and inputs, and b is the bias on the neuron. For example, let's say I have an input layer of shape (33, 64). There are 33 inputs feeding 64 neurons. I'll have a vector input of dim 33 and a vector output of dim 64. This would make each SUM have 33 input terms times their 33 weights, and the output would be all of the 64 SUMs plus the 64 biases (respectively).
The next layer, in my case it's 32 neurons, will do the same but with 64 inputs and 32 outputs. The output layer I have goes to a single value, so input 32 and output 1.
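(In matrix terms, each of those layer computations is just a vector-matrix product plus a bias; a minimal numpy illustration using the shapes described above:)
import numpy as np

x = np.random.rand(33)                                 # one input sample
W1, b1 = np.random.rand(33, 64), np.random.rand(64)
h = x @ W1 + b1                                        # all 64 SUMs plus their biases
print(h.shape)                                         # (64,)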
I have written code to try to mimic the model. Here is a snippet for making a single prediction:
def modelR(weights, biases, data):
    # This is the input layer.
    y = []
    for i in range(len(weights[0][0])):
        x = np.zeros(len(weights[0][0]))
        for j in range(len(data)):
            x[i] += weights[0][j][i]*data[j]
        y.append(x[i]+biases[0][i])
    # This is the hidden layer.
    z = []
    for i in range(len(weights[1][0])):
        x = np.zeros(len(weights[1][0]))
        for j in range(len(y)):
            x[i] += weights[1][j][i]*y[j]
        z.append(x[i]+biases[1][i])
    # This is the output layer.
    p = 0.0
    for i in range(len(z)):
        p += weights[-1][i][0]*z[i]
    p = p+biases[-1][0]
    return p
To be clear, "weights" and "biases" are derived via:
weights = []
biases = []
for i in range(len(model.layers)):
    weights.append(model.layers[i].get_weights()[0])
    biases.append(model.layers[i].get_weights()[1])
weights = np.asarray(weights)
biases = np.asarray(biases)
So the first weight on the first neuron for the first input is weight[0][0][0], the first weight on the first input for the second neuron is weight[0][1][0], etc. I could be wrong on this, which may be where I'm getting stuck. But this makes sense as we're going from (1 x 33) vector to a (1 x 64) vector, so we ought to have a (33 x 64) matrix.
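(A quick way to confirm that shape convention, assuming the same model object as above, is to print what Keras reports for each layer:)
for layer in model.layers:
    w, b = layer.get_weights()
    print(layer.name, w.shape, b.shape)   # expect (33, 64) (64,), (64, 32) (32,), (32, 1) (1,)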
Any ideas of where I'm going wrong? Thanks!
EDIT: ANSWER FOUND
I'm marking jhso's answer as correct, even though it didn't work properly in my code as such (I'm probably missing an import statement somewhere). The key was the activation function. I was using ReLU, so I shouldn't have been passing along any negative values. Also, jhso shows a nice way to avoid loops and simply do the matrix multiplication (which I didn't know Python could do). Now I just have to figure out how to do it in C++!
I think it's good to familiarise yourself with linear algebra when working with machine learning. When we have an equation of the form sum(matrix elem times another matrix elem) it's often a simple matrix multiplication of the form matrix1 * matrix2.T. This simplifies your code quite a bit:
def modelR(weights, biases, data):
    # This is the input layer.
    y = np.matmul(data, weights[0]) + biases[0][None, :]
    y_act = relu(y)  # also dropout or any other function you use here
    z = np.matmul(y_act, weights[1]) + biases[1][None, :]
    z_act = relu(z)  # also dropout and any other function you use here
    p = np.matmul(z_act, weights[2]) + biases[2][None, :]
    p_act = sigmoid(p)
    return p_act
I made a guess at which activation function you use. I'm also unsure of how your data is structured; just make sure that the features/weights are always the inner dimension of the multiplication, i.e. if your input is (Bx10) and your weights are (10x64), then input*weights is good enough and will produce an output of shape (Bx64).
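One thing the snippet above leaves out is the relu and sigmoid helpers it calls; minimal numpy versions (the sigmoid is an assumption about your output activation, as noted) would be:
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))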