Confusion with weights dumping from neural net in keras - python

I created a simple 2-layer network, one hidden layer. I am dumping the weights from the middle layer to visualize what the hidden neurons are learning.
I am using
weights = model.layers[0].get_weights()
When I look at the weights structure, I find that len(weights) = 2, len(weights[0]) = 500, and len(weights[1]) = 100.
I want to create an array m of size (500,100), so that m.shape = (500,100).
I tried numpy.reshape(weights, 500, 100), zip(weights[0], weights[1]), then, by chance, I wrote numpy.array(weights[0]) and this came back with shape (500,100).
Can someone explain why?

Keras's get_weights() returns the weights as a nested list-like structure (a list of arrays), not a single tensor. To illustrate the concept, consider this nested list:
>>> lst = [[[1,2,3],[1,2,3],[1,2,3],[1,2,3],[1,2,3]], [1,2,3]]
Here, the first element of lst is itself a list of lists, while the second element is a flat list. When you do:
>>> len(lst)
2      (which is 2 in your case)
Also,
>>> len(lst[0])
5      (which is 500 in your case)
>>> len(lst[1])
3      (which is 100 in your case)
But when you convert the first element to an array:
>>> np.array(lst[0]).shape
(5, 3) (which is (500, 100) in your case)
This is because lst[0] (weights[0] in your case) contains elements that are themselves n-length lists. That is why, when I asked you to return
len(weights[0][0])
it returned:
100
weights[0] contains 500 such elements, each of length 100. Now, if you are wondering what each of those 100 values means: they are the corresponding weights of the connections, i.e.
weights[0][0] = weights between the first input and all 100 hidden neurons
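As a concrete check, here is a minimal sketch of a comparable 2-layer model (the layer sizes match the question; the exact architecture and activations are assumptions):
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(100, activation='relu', input_dim=500))  # hidden layer: 500 inputs -> 100 neurons
model.add(Dense(10, activation='softmax'))               # output layer (size assumed)

weights = model.layers[0].get_weights()
kernel = np.array(weights[0])  # connection weights, shape (500, 100)
biases = np.array(weights[1])  # one bias per hidden neuron, shape (100,)
print(kernel.shape, biases.shape)  # (500, 100) (100,)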

Related

How can I more efficiently multiply every element in a batch of tensors with every other batch element, except for itself?

So, I have this code that multiplies every element in a batch of tensors with every other element, except for itself. The code works, but it becomes painfully slow with larger batch sizes (ideally I want to be able to use it with batch sizes of up to 1000 or more, but even a couple hundred would be okay). It basically freezes when using the PyTorch autograd system with large batch sizes (like 50 or greater).
I need help making the code faster and more efficient, while still getting the same output. Any help would be appreciated!
import torch

tensor = torch.randn(50, 512, 512)
batch_size = tensor.size(0)

list1 = []
for i in range(batch_size):
    list2 = []
    for j in range(batch_size):
        if j != i:
            x_out = (tensor[i] * tensor[j]).sum()
            list2.append(x_out)
    list1.append(sum(list2))
out = sum(list1)
I thought that torch.prod might be able to be used, but it doesn't seem to result in the same output as the code above. NumPy answers are acceptable as long as they can be recreated in PyTorch.
You could do the following:
import torch
tensor = torch.randn(50, 512, 512)
batch_size = tensor.size(0)
tensor = tensor.reshape(batch_size, -1)
prod = torch.matmul(tensor, tensor.transpose(0,1))
out = torch.sum(prod) - torch.trace(prod)
Here, you first flatten each batch element into a row vector. Multiplying the resulting matrix by its own transpose gives a batch_size x batch_size matrix whose (i, j)-th entry is the sum of the elementwise product of tensor[i] and tensor[j], i.e. exactly the x_out from the loop. Summing all the values in this matrix and subtracting its trace (the sum of the diagonal elements, which correspond to j == i) gives the desired result.
I tried both methods with a batch_size of 1000, and the time taken dropped from 61.43s to 0.59s.
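If you want to convince yourself the two methods agree, here is a quick sketch of a check on a small batch (the sizes and tolerance are arbitrary):
import torch

tensor = torch.randn(5, 8, 8)
batch_size = tensor.size(0)

# Original double loop
out_loop = sum(
    (tensor[i] * tensor[j]).sum()
    for i in range(batch_size)
    for j in range(batch_size)
    if j != i
)

# Vectorized version
flat = tensor.reshape(batch_size, -1)
prod = torch.matmul(flat, flat.transpose(0, 1))
out_matmul = torch.sum(prod) - torch.trace(prod)

print(torch.allclose(out_loop, out_matmul, atol=1e-4))  # True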

Simple neural network - how to store weights?

I recently started learning Python and am trying to implement my first neural network. My goal is to write a function that generates a neural net with a variable number of layers and nodes. All the necessary information for that is stored in layerStructure (e.g. the first layer has four nodes, the third layer has three nodes).
import numpy as np

#Vector of input layer
input = np.array([1,2,3,4])

#Number of nodes in each layer
layerStructure = np.array([len(input),2,3])

#Generating empty weight matrix container
weightMatrix_arr = np.array([])

#Initialising random weight matrices
for ii in range(len(layerStructure[0:-1])):
    randmatrix = np.random.rand(layerStructure[ii+1],layerStructure[ii])
    print(randmatrix)
The code above generates the following output:
[[0.6067148 0.66445212 0.54061231 0.19334004]
[0.22385007 0.8391435 0.73625366 0.86343394]]
[[0.61794333 0.9114799 ]
[0.10626486 0.95307027]
[0.50567023 0.57246852]]
My first attempt was to store each random weight matrix in a container array called weightMatrix_arr. However, since the shape of individual matrices varies, I cannot use np.append() to store them all in the matrix container. How can I save these matrices in order to access them during the backpropagation?
You can use a list instead of an np.array:
#Generating empty weight LIST container
weightMatrixes = []

#Initialising random weight matrices
for ii in range(len(layerStructure[0:-1])):
    randmatrix = np.random.rand(layerStructure[ii+1],layerStructure[ii])
    weightMatrixes.append(randmatrix)
    print(randmatrix)
Otherwise you can set the container's dtype to object. Note that np.append would ravel each matrix into flat values, so it is safer to preallocate the object array and assign into it:
#Generating empty object-dtype container
weightMatrixes = np.empty(len(layerStructure) - 1, dtype=object)

#Initialising random weight matrices
for ii in range(len(layerStructure[0:-1])):
    randmatrix = np.random.rand(layerStructure[ii+1],layerStructure[ii])
    weightMatrixes[ii] = randmatrix
Note that either way you cannot index into a layer's weights in a single step; you first have to select the layer matrix:
weightMatrixes[layer, 0, 3] # ERROR
weightMatrixes[layer][0, 3] # OK
If memory consumption is not a problem, you can also pad all layers to the shape of the largest one and simply ignore the extra cells according to the corresponding layerStructure value.
I used a Python dictionary to store the weights of each hidden layer, with the layer number as the key. This makes retrieving the weights simple and clean, and the shapes of the weight matrices don't matter. Below is a snippet of the code:
"""def generate_weights(layers):
Weights={}
for i in range(1,len(layers)):
w0=2*np.random.random((layers[i-1],layers[i]))-1
Weights[i-1] = w0
return Weights
generate_weights([3,4,2])"""

BERT sentence embedding by summing last 4 layers

I used Chris McCormick's tutorial on BERT with pytorch-pretrained-bert to get a sentence embedding as follows:
tokenized_text = tokenizer.tokenize(marked_text)
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
segments_ids = [1] * len(tokenized_text)
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
batch_i = 0  # only one sentence in the batch

model = BertModel.from_pretrained('bert-base-uncased')
model.eval()

with torch.no_grad():
    encoded_layers, _ = model(tokens_tensor, segments_tensors)

# Holds the list of 12 layer embeddings for each token
# Will have the shape: [# tokens, # layers, # features]
token_embeddings = []
# For each token in the sentence...
for token_i in range(len(tokenized_text)):
    # Holds 12 layers of hidden states for each token
    hidden_layers = []
    # For each of the 12 layers...
    for layer_i in range(len(encoded_layers)):
        # Look up the vector for `token_i` in `layer_i`
        vec = encoded_layers[layer_i][batch_i][token_i]
        hidden_layers.append(vec)
    token_embeddings.append(hidden_layers)
Now, I am trying to get the final sentence embedding by summing the last 4 layers as follows:
summed_last_4_layers = [torch.sum(torch.stack(layer)[-4:], 0) for layer in token_embeddings]
But instead of getting a single torch vector of length 768 I get the following:
[tensor([-3.8930e+00, -3.2564e+00, -3.0373e-01, 2.6618e+00, 5.7803e-01,
-1.0007e+00, -2.3180e+00, 1.4215e+00, 2.6551e-01, -1.8784e+00,
-1.5268e+00, 3.6681e+00, ...., 3.9084e+00]), tensor([-2.0884e+00, -3.6244e-01, ....2.5715e+00]), tensor([ 1.0816e+00,...-4.7801e+00]), tensor([ 1.2713e+00,.... 1.0275e+00]), tensor([-6.6105e+00,..., -2.9349e-01])]
What did I get here? How do I pool the sum of the last four layers?
Thank you!
You create a list using a list comprehension that iterates over token_embeddings. It is a list that contains one tensor per token, not one tensor per layer as you probably thought (judging from your for layer in token_embeddings). You thus get a list with a length equal to the number of tokens; for each token, you have a vector that is a sum of the BERT embeddings from the last 4 layers.
It is more efficient to avoid the explicit for loops and list comprehensions altogether:
summed_last_4_layers = torch.stack(encoded_layers[-4:]).sum(0)
Now, the variable summed_last_4_layers contains the same data, but in the form of a single tensor of shape 1 × sentence length × 768 (the leading 1 is the batch dimension).
To get a single (i.e., pooled) vector, you can pool over the token dimension of the tensor. Max-pooling or average-pooling might make much more sense in this case than summing all the token embeddings: when summing, the vectors of sentences of different lengths end up in different ranges and are not really comparable.
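For instance, a minimal sketch of both pooling options (assuming summed_last_4_layers has shape [1, num_tokens, 768] as above):
# Average pooling over the token dimension -> a single vector of length 768
sentence_embedding_mean = summed_last_4_layers.mean(dim=1).squeeze(0)

# Max pooling over the token dimension -> a single vector of length 768
sentence_embedding_max = summed_last_4_layers.max(dim=1).values.squeeze(0)

print(sentence_embedding_mean.shape)  # torch.Size([768])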

Keras sequence prediction with multiple simultaneous sequences

My question is very similar to what it seems this post is asking, although that post doesn't pose a satisfactory solution. To elaborate, I am currently using keras with tensorflow backend and a sequential LSTM model. The end goal is I have n time-dependent sequences with equal time steps (the same number of points on each sequence and the points are all the same time apart) and I would like to feed all n sequences into the same network so it can use correlations between the sequences to better predict the next step for each sequence. My ideal output would be an n-element 1-D array with array[0] corresponding to the next-step prediction for sequence_1, array[1] for sequence_2, and so on.
My inputs are sequences of single values, so each of n inputs can be parsed into a 1-D array.
I was able to get a working model for each sequence independently using the code at the end of this guide by Jakob Aungiers, although my difficulty is adapting it to accept multiple sequences at once and correlate between them (i.e. be analyzed in parallel). I believe the issue is related to the shape of my input data, which is currently in the form of a 4-D numpy array because of how Jakob's Guide splits the inputs into sub-sequences of 30 elements each to analyze incrementally, although I could also be completely missing the target here. My code (which is mostly Jakob's, not trying to take credit for anything that isn't mine) presently looks like this:
As-is this complains with "ValueError: Error when checking target: expected activation_1 to have shape (None, 4) but got array with shape (4, 490)", I'm sure there are plenty of other issues but I'd love some direction on how to achieve what I'm describing. Anything stick out immediately to anyone? Any help you could give will be greatly appreciated.
Thanks!
-Eric
Keras is already prepared to work with batches containing many sequences; there is no secret at all.
There are two possible approaches, though:
You input your entire sequences (all steps at once) and predict n results
You input only one step of all sequences and predict the next step in a loop
Suppose:
nSequences = 30
timeSteps = 50
features = 1 #(as you said: single values per step)
outputFeatures = 1
First approach: stateful=False:
inputArray = arrayWithShape((nSequences,timeSteps,features))
outputArray = arrayWithShape((nSequences,outputFeatures))
input_shape = (timeSteps,features)
#use layers like this:
LSTM(units) #if the first layer in a Sequential model, add the input_shape
#if you want to return the same number of steps (like a new sequence parallel to the input), use return_sequences=True
Train like this:
model.fit(inputArray,outputArray,....)
Predict like this:
newStep = model.predict(inputArray)
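Putting the first approach together, a minimal sketch (the number of units, optimizer and loss are assumptions; random data stands in for your sequences):
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense

nSequences, timeSteps, features, outputFeatures = 30, 50, 1, 1

inputArray = np.random.rand(nSequences, timeSteps, features)
outputArray = np.random.rand(nSequences, outputFeatures)

model = Sequential()
model.add(LSTM(32, input_shape=(timeSteps, features)))
model.add(Dense(outputFeatures))
model.compile(optimizer='adam', loss='mse')

model.fit(inputArray, outputArray, epochs=2)
newStep = model.predict(inputArray)  # shape: (nSequences, outputFeatures)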
Second approach: stateful=True:
inputArray = sameAsBefore
outputArray = inputArray[:,1:] #one step after input array
inputArray = inputArray[:,:-1] #eliminate the last step
batch_input_shape = (nSequences, 1, features) #stateful layers require the batch size
#use layers like this:
LSTM(units, stateful=True) #if the first layer in a Sequential model, add batch_input_shape
Train like this:
model.reset_states() #you need this in stateful=True models
#if you don't reset states,
#the stateful model will think that your inputs are new steps of the same previous sequences
for step in range(inputArray.shape[1]): #for each time step
    model.fit(inputArray[:,step:step+1], outputArray[:,step:step+1], shuffle=False, ...)
Predict like this:
model.reset_states()
predictions = np.empty(inputArray.shape)
for step in range(inputArray.shape[1]): #for each time step
    predictions[:,step] = model.predict(inputArray[:,step:step+1])
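For completeness, a minimal sketch of how the stateful model itself could be assembled (the number of units, optimizer and loss are assumptions):
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(32, stateful=True, return_sequences=True,
               batch_input_shape=(nSequences, 1, features)))
model.add(Dense(outputFeatures))
model.compile(optimizer='adam', loss='mse')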

How to iterate over the elements of a tensor in TensorFlow?

I have coded a neural network that returns a list of 3 numbers for every input sample. These values are then subtracted from the actual values to get the difference.
For example,
actual = [1,2,3]
predicted = [0,0,1]
diff = [1,2,2]
So my tensor now has the shape [batch_size, 3]
What I want to do is to iterate over the tensor elements to construct my loss function.
For instance, if my batch_size is 2 and
diff = [[a,b,c],[d,e,f]]
I want the loss to be
Loss = mean(sqrt(a^2+b^2+c^2), sqrt(d^2+e^2+f^2))
I know that TensorFlow has a tf.nn.l2_loss() function that computes the L2 loss of the entire tensor. But what I want is the mean of l2 losses of elements of a tensor along some axis.
How do I go about doing this?
You can combine tf.square, tf.reduce_sum, tf.sqrt and tf.reduce_mean: square the differences, sum along the last axis, take the square root, and then average over the batch. Both tf.reduce_sum and tf.reduce_mean have an axis argument that indicates which dimensions to reduce.
For more reduction operations, see https://www.tensorflow.org/api_guides/python/math_ops#Reduction
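Concretely, a minimal sketch (the first row of diff comes from the question's example; the second row is made up for illustration; written for TF 2, where the ops run eagerly):
import tensorflow as tf

diff = tf.constant([[1., 2., 2.], [3., 0., 4.]])  # shape [batch_size, 3]
per_sample_l2 = tf.sqrt(tf.reduce_sum(tf.square(diff), axis=1))  # [3.0, 5.0]
loss = tf.reduce_mean(per_sample_l2)  # (3.0 + 5.0) / 2 = 4.0
print(loss.numpy())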
