Simple neural network - how to store weights? - python

I recently started learning Python and am trying to implement my first neural network. My goal is to write a function that generates a neural net with a variable number of layers and nodes per layer. All the necessary information for that is stored in layerStructure (e.g. the first layer has four nodes, the third layer has three).
import numpy as np
#Vector of input layer
input = np.array([1,2,3,4])
#Amount of nodes in each layer
layerStructure = np.array([len(input),2,3])
#Generating empty weight matrix container
weightMatrix_arr = np.array([])
#Initialising random weight matrices
for ii in range(len(layerStructure[0:-1])):
    randmatrix = np.random.rand(layerStructure[ii+1], layerStructure[ii])
    print(randmatrix)
The code above generates the following output:
[[0.6067148 0.66445212 0.54061231 0.19334004]
[0.22385007 0.8391435 0.73625366 0.86343394]]
[[0.61794333 0.9114799 ]
[0.10626486 0.95307027]
[0.50567023 0.57246852]]
My first attempt was to store each random weight matrix in a container array called weightMatrix_arr. However, since the shapes of the individual matrices vary, I cannot use np.append() to store them all in the container. How can I save these matrices so that I can access them during backpropagation?

You can use a list instead of an np.array:
#Generating empty weight LIST container
weightMatrixes = []
#Initialising random weight matrices
for ii in range(len(layerStructure[0:-1])):
    randmatrix = np.random.rand(layerStructure[ii+1], layerStructure[ii])
    weightMatrixes.append(randmatrix)
    print(randmatrix)
Otherwise you can set the weightMatrix_arr dtype to object:
#Generating an object-dtype container for the weight matrices
weightMatrixes = np.empty(len(layerStructure) - 1, dtype=object)
#Initialising random weight matrices
#(np.append would flatten each matrix, so pre-allocate and assign each slot instead)
for ii in range(len(layerStructure) - 1):
    weightMatrixes[ii] = np.random.rand(layerStructure[ii+1], layerStructure[ii])
Note that either way you can't index into an inner matrix with a single combined index; you have to select the layer matrix first:
weightMatrixes[layer, 0, 3] # ERROR
weightMatrixes[layer][0, 3] # OK

If memory consumption is not a problem, you can shape all layers as the largest one and just ignore the extra cells according to the corresponding layerStructure values.
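For completeness, here is a minimal sketch of that padded approach (my own illustration, not from the question): one 3-D array sized for the largest layer, with only the valid block of each slice used.

import numpy as np

layerStructure = np.array([4, 2, 3])
n_layers = len(layerStructure) - 1
max_dim = layerStructure.max()

# one big array; only the top-left block of each slice is meaningful
padded_weights = np.zeros((n_layers, max_dim, max_dim))
for ii in range(n_layers):
    rows, cols = layerStructure[ii + 1], layerStructure[ii]
    padded_weights[ii, :rows, :cols] = np.random.rand(rows, cols)

# during backpropagation, slice out only the valid block for layer ii:
# padded_weights[ii, :layerStructure[ii+1], :layerStructure[ii]]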

I used a Python dictionary to store the weights for each layer, with the layer number as the key, so the weights are easy to access during retrieval. It is simple and clean, and the shape of the weights doesn't matter. Below is a snippet of the code.
"""def generate_weights(layers):
Weights={}
for i in range(1,len(layers)):
w0=2*np.random.random((layers[i-1],layers[i]))-1
Weights[i-1] = w0
return Weights
generate_weights([3,4,2])"""

Related

Train on transformed output

I have a recurrent neural network model that maps an (N,) sequence to an (N,3) sequence. My target outputs are actually (N,N) matrices. However, I have a deterministic function implemented in numpy that converts (N,3) into these (N,N) matrices in a particular way that I want. How can I use this operation in training? I.e. currently my neural network outputs (N,3) sequences; how do I apply my conversion to (N,N) on these before calling keras.fit?
Edit: I should also note that it is much harder to do the reverse function from (N,N) to (N,3) so it's not a viable option to just convert my target outputs to the (N,3) output representations.
You can use a Lambda layer as the last layer of your model:
def convert_to_n_times_n(x):
    # transform x from shape (N, 3) to (N, N) here, using tf ops
    ...

transformation_layer = tf.keras.layers.Lambda(convert_to_n_times_n)
You probably want to use "tf-native methods" within your function as much as possible to avoid unnecessary conversions of tensors to numpy arrays and back.
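Purely as an illustration of that tf-native pattern (your real conversion will look different), the body of the function can be written entirely with TensorFlow ops; the Gram-style product below is a made-up placeholder:

import tensorflow as tf

def convert_to_n_times_n(x):
    # x arrives with shape (batch, N, 3); the result has shape (batch, N, N)
    # placeholder transform for illustration only -- substitute your own tf ops
    return tf.matmul(x, x, transpose_b=True)

transformation_layer = tf.keras.layers.Lambda(convert_to_n_times_n)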
If you only want to use the layer during training, but not during inference, you can achieve that using the functional API:
# create your original model (N,) -> (N, 3)
input_ = Input(shape=(N,))
x = SomeFancyLayer(...)(input_)
x = ...
...
inference_output = OtherFancyLayer(...)(x)
inference_model = Model(inputs=input_, outputs=inference_output)
# create & fit the training model
training_output = transformation_layer(inference_output)
training_model = Model(inputs=input_, outputs=training_output)
training_model.compile(...)
training_model.fit(X, Y)
# run inference using your original model
inference_model.predict(...)

how to make neural network function faster?

I have this function for a neural network; it calculates the next layer from a list of inputs and a list of weights. Is there any way to make it faster or more efficient?
The arguments: inp is the input, weights are the weights, layerlength is the length of the next layer, and rounds is the number of decimal places to round the output to.
def output(inp, weights, layerlength, rounds):
    layer = []
    count = 0
    lappend = layer.append
    for a in range(layerlength):
        total = 0
        for b in range(len(inp)):
            total += inp[b] * weights[count]
            count += 1
        lappend(round(total, rounds))
    return layer
In general, try not to use for loop constructs in Python; they are extremely slow. Use matrix operations implemented with numpy instead, so the loops run under the hood in optimized C code (50 to 100 times faster).
You can easily reformulate the above piece of code without any Python for loops by defining your layer and inp vectors and your weights matrix as numpy arrays and then performing a matrix multiplication on them.
EDIT:
I hope I am not helping you cheat on your homework here ;)
import numpy as np
# 10 dimensional input
inpt = np.arange(10)
# 20 neurons in the first (fully connected) layer
weights = np.random.rand(10, 20)
# mat_mul: to calculate the input to the non-linearity of the first layer
# you need to multiply each input dimension with all the weights assigned to a specific neuron of the first layer
# and then sum them up, and this for all the neurons in that layer
# you can do all of that in this single Matrix multiplication
layer = np.matmul(inpt, weights)
print(inpt.shape)
print()
print(weights.shape)
print()
print(layer.shape)
So I'm assuming you're computing the activations of one layer.
Make sure you use linear algebra libraries like Numpy (or Tensorflow, PyTorch, etc). These make your computations run much more efficiently on the CPU (or GPU). Using for loops typically adds a lot of computational overhead.
For example, in numpy you can write your feedforward pass for one layer as:
output = inp.dot(weights)
Here inp is your n by m input matrix and weights is your m by k weight matrix. output will then be an n by k matrix of your forward-step activations.
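For example, with made-up shapes:

import numpy as np

inp = np.random.rand(4, 10)       # n=4 samples, m=10 inputs each
weights = np.random.rand(10, 20)  # m=10 inputs, k=20 neurons in the next layer
output = inp.dot(weights)         # pre-activation values of the next layer
print(output.shape)               # (4, 20)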

Set bias in CNN

I have two massive numpy arrays of weights and biases for a CNN. I can set weights for each layer (using set_weights) but I don't see a way to set the bias for each layer. How do I do this?
You do this by using layer.set_weights(weights). From the documentation:
weights: a list of Numpy arrays. The number
of arrays and their shape must match
number of the dimensions of the weights
of the layer (i.e. it should match the
output of `get_weights`).
You don't just put the weights for the filters in there, but the values for every parameter the layer has. The order in which you have to supply them depends on layer.weights. You can look at the code or print the names of the layer's weights by doing something like
print([p.name for p in layer.weights])
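For a typical Conv2D layer, for instance, get_weights() returns [kernel, bias], so both can be set in one call. A small sketch (the layer configuration here is made up for illustration):

import numpy as np
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Sequential

model = Sequential([Conv2D(64, 3, input_shape=(32, 32, 3))])
layer = model.layers[0]

kernel, bias = layer.get_weights()               # kernel (3, 3, 3, 64), bias (64,)
layer.set_weights([kernel, np.ones_like(bias)])  # keep the kernel, set every bias to 1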

Confusion with weights dumping from neural net in keras

I created a simple 2-layer network, one hidden layer. I am dumping the weights from the middle layer to visualize what the hidden neurons are learning.
I am using
weights = model.layers[0].get_weights()
When I look at the weights structure, I find that len(weights) = 2, len(weights[0]) = 500, and len(weights[1]) = 100.
I want to create an array m of size (500,100), so that m.shape = (500,100).
I tried numpy.reshape(weights, 500, 100), zip(weights[0], weights[1]), then, by chance, I wrote numpy.array(weights[0]) and this came back with shape (500,100).
Can someone explain why?
What Keras returns here is not a single array but a list whose elements can themselves have different shapes. To illustrate the concept, consider the list:
>>> list=[[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]],[1,2,3]]
Here, the first element of list is itself a list of five three-element lists, while the second element is a flat list of length three. When you do:
>>> len(list)
the output is:
2 (which is 2 in your case)
Also,
>>> len(list[0])
5 (which is 500 in your case)
>>> len(list[1])
3 (which is 100 in your case)
But when you try to convert to array:
>>> np.array(list[0]).shape
the answer is:
(5, 3) (which is (500, 100) in your case)
This is because list[0] (which is weights[0] in your case) contains equal-length list elements. So when I asked you to return
len(weights[0][0])
it returned:
100
because each of the 500 entries in that list holds 100 values. Now, if you are wondering what those 100 values mean: they are the corresponding weights of the connections, i.e.
weights[0][0] = the weights from the first input neuron to all 100 hidden neurons
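In other words, get_weights() for that Dense layer simply returns [kernel, bias], and the kernel is already a single ndarray of shape (500, 100). A quick sanity check (the output layer size below is a made-up stand-in for the question's model):

import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential([Dense(100, input_shape=(500,)), Dense(10)])
weights = model.layers[0].get_weights()

kernel, bias = weights           # kernel is already an ndarray, no zipping or reshaping needed
print(kernel.shape, bias.shape)  # (500, 100) (100,)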

Compute updates in Theano after N number of loss calculations

I've constructed a LSTM recurrent NNet using lasagne that is loosely based on the architecture in this blog post. My input is a text file that has around 1,000,000 sentences and a vocabulary of 2,000 word tokens. Normally, when I construct networks for image recognition my input layer will look something like the following:
l_in = nn.layers.InputLayer((32, 3, 128, 128))
(where the dimensions are batch size, channels, height and width), which is convenient because all the images are the same size, so I can process them in batches. Since each instance in my LSTM network has a varying sentence length, I have an input layer that looks like the following:
l_in = nn.layers.InputLayer((None, None, 2000))
As described in the above-referenced blog post:
Masks: Because not all sequences in each minibatch will always have the same length, all recurrent layers in lasagne accept a separate mask input which has shape (batch_size, n_time_steps), which is populated such that mask[i, j] = 1 when j <= (length of sequence i) and mask[i, j] = 0 when j > (length of sequence i). When no mask is provided, it is assumed that all sequences in the minibatch are of length n_time_steps.
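For reference, a mask like the one described above can be built directly from the per-sentence lengths in a minibatch (a small sketch with made-up lengths):

import numpy as np

lengths = [5, 3, 7]                    # hypothetical sequence lengths in one minibatch
n_time_steps = max(lengths)
mask = np.zeros((len(lengths), n_time_steps), dtype='float32')
for i, seq_len in enumerate(lengths):
    mask[i, :seq_len] = 1.0            # 1 for valid time steps of sequence i, 0 for padding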
My question is: Is there a way to process this type of network in mini-batches without using a mask?
Here is a simplified version of my network.
# -*- coding: utf-8 -*-
import theano
import theano.tensor as T
import lasagne as nn
softmax = nn.nonlinearities.softmax
def build_model():
    l_in = nn.layers.InputLayer((None, None, 2000))
    lstm = nn.layers.LSTMLayer(l_in, 4096, grad_clipping=5)
    rs = nn.layers.SliceLayer(lstm, 0, 0)
    dense = nn.layers.DenseLayer(rs, num_units=2000, nonlinearity=softmax)
    return l_in, dense
model = build_model()
l_in, l_out = model
all_params = nn.layers.get_all_params(l_out)
target_var = T.ivector("target_output")
output = nn.layers.get_output(l_out)
loss = T.nnet.categorical_crossentropy(output, target_var).sum()
updates = nn.updates.adagrad(loss, all_params, 0.005)
train = theano.function([l_in.input_var, target_var], loss, updates=updates)
From there I have a generator that spits out (X, y) pairs, and I am computing train(X, y) and updating the gradient with each iteration. What I want to do instead is run N training steps and then update the parameters with the average gradient.
To do this, I tried creating a compute_gradient function:
gradient = theano.grad(loss, all_params)
compute_gradient = theano.function(
    [l_in.input_var, target_var],
    outputs=gradient
)
and then looping over several training instances to create a "batch" and collect the gradient calculations to a list:
grads = []
for _ in xrange(1024):
    X, y = train_gen.next()  # generator for producing training data
    grads.append(compute_gradient(X, y))
This produces a list of lists:
>>> grads
[[<CudaNdarray at 0x7f83b5ff6d70>,
<CudaNdarray at 0x7f83b5ff69f0>,
<CudaNdarray at 0x7f83b5ff6270>,
<CudaNdarray at 0x7f83b5fc05f0>],
[<CudaNdarray at 0x7f83b5ff66f0>,
<CudaNdarray at 0x7f83b5ff6730>,
<CudaNdarray at 0x7f83b5ff6b70>,
<CudaNdarray at 0x7f83b5ff64f0>] ...
From here I would need to take the mean of the gradient for each parameter and then update the model parameters. Is it possible to do this in pieces like this, or does the gradient calculation/parameter update need to happen all in one theano function?
Thanks.
NOTE: this is a solution, but by no means do I have enough experience to verify it is the best one, and the code is just a sloppy example.
You need 2 theano functions. The first is the grad one you already seem to have, judging from the information provided in your question.
After computing the batched gradients, you want to immediately feed them as input arguments back into another theano function dedicated to updating the shared variables. For this you need to specify the expected batch size at compile time of your neural network. So you could do something like this (for simplicity I will assume you have a global list variable where all your params are stored):
params  # list of shared variables you wish to update
BATCH_SIZE = 1024  # size of the expected training batch

# placeholders for the batch gradients, flattened so they can be fed into a theano function
# (one T.matrix per param per training example)
G = [T.matrix() for i in range(BATCH_SIZE) for param in params]

# start from the gradients of the first example, then sum over the rest of the batch
updates = [G[i] for i in range(len(params))]
for i in range(len(params)):
    for j in range(1, BATCH_SIZE):
        updates[i] += G[j * len(params) + i]

# (param, new_value) tuples for theano.function's updates argument:
# a plain gradient-descent step against the mean gradient, using the question's learning rate
for i in range(len(params)):
    updates[i] = (params[i], params[i] - 0.005 * updates[i] / BATCH_SIZE)

update = theano.function(G, updates=updates)
This way theano takes the mean of the gradients and updates the params as usual.
I don't know whether you need to flatten the inputs as I did, but probably.
EDIT: gathering from how you edited your question, it seems important that the batch size can vary. In that case you could add 2 theano functions to your existing one:
The first theano function takes a batch of 2 of your param gradients and returns their sum. You can apply this theano function with Python's reduce() to get the sum over the whole batch of gradients.
The second theano function takes those summed param gradients and a scalar (the batch size) as input, and is therefore able to update the NN params with the mean of the summed gradients.
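Roughly, those two functions could look like this (an untested sketch in the same spirit: it assumes every param is a matrix, reuses all_params and the grads list from the question, and uses a plain SGD step with a made-up learning rate instead of adagrad):

import theano
import theano.tensor as T
from functools import reduce

LEARNING_RATE = 0.005  # illustration only

# 1) element-wise sum of two gradient lists
g_a = [T.matrix() for p in all_params]
g_b = [T.matrix() for p in all_params]
sum_grads = theano.function(g_a + g_b, [a + b for a, b in zip(g_a, g_b)])

# 2) apply the mean of the summed gradients to the shared params
g_sum = [T.matrix() for p in all_params]
batch_size = T.scalar()
updates = [(p, p - LEARNING_RATE * g / batch_size) for p, g in zip(all_params, g_sum)]
apply_mean_grads = theano.function(g_sum + [batch_size], updates=updates)

# usage: fold the collected per-example gradients into one sum, then update once
summed = reduce(lambda a, b: sum_grads(*(list(a) + list(b))), grads)
apply_mean_grads(*(list(summed) + [float(len(grads))]))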
