I'm trying to create a model which has words as inputs. Most of those words are in the glove word vector set (~50000). However, some of the frequent words are not (~1000). The question is, how do I concatenate the following two embedding layers to create one giant Embedding lookup table?
trained_em = Embedding(50000, 50,
weights=np.array([word2glove[w] for w in words_in_glove]),
trainable=False)
untrained_em = Embedding(1000, 50)
As far as I understand these are simply two lookup tables with same number of dimensions. So I'm hoping that there is a way to stack these two lookup tables.
Edit 1:
I just realised that this is probably going to be more than stacking Embedding layers because the input sequence would be a number from 0-50999. However untrained_em above only expect a number from 0-999. So perhaps a different solution is required.
Edit 2:
This is what I would expect to do in a numpy array representing the Embedding:
np.random.seed(42) # Set seed for reproducibility
pretrained = np.random.randn(15,3)
untrained = np.random.randn(5,3)
final_embedding = np.vstack([pretrained, untrained])
word_idx = [2, 5, 19]
np.take(final_embedding, word_idx, axis=0)
I believe the last bit can be done with something to do with keras.backend.gather but not sure how to put it all together.
Turns out that I need to implement a custom layer. Which was implemented by tweaking the orignial Embedding class.
The two most important parts shown in the class below are self.embeddings = K.concatenate([fixed_weight, variable_weight], axis=0) and out = K.gather(self.embeddings, inputs). The first is hopefully self explanatory while the second picks out the relevant input rows from the embeddings table.
However, in the particular application that I'm working on, turns out that it works out better using an Embedding layer instead of the modified layer. Perhaps because the learning rate is too high. I will report back on this after I have experimented more.
from keras.engine.topology import Layer
import keras.backend as K
from keras import initializers
import numpy as np
class Embedding2(Layer):
def __init__(self, input_dim, output_dim, fixed_weights, embeddings_initializer='uniform',
input_length=None, **kwargs):
kwargs['dtype'] = 'int32'
if 'input_shape' not in kwargs:
if input_length:
kwargs['input_shape'] = (input_length,)
else:
kwargs['input_shape'] = (None,)
super(Embedding2, self).__init__(**kwargs)
self.input_dim = input_dim
self.output_dim = output_dim
self.embeddings_initializer = embeddings_initializer
self.fixed_weights = fixed_weights
self.num_trainable = input_dim - len(fixed_weights)
self.input_length = input_length
def build(self, input_shape, name='embeddings'):
initializer = initializers.get(self.embeddings_initializer)
shape1 = (self.num_trainable, self.output_dim)
variable_weight = K.variable(initializer(shape1), dtype=K.floatx(), name=name+'_var')
fixed_weight = K.variable(self.fixed_weights, name=name+'_fixed')
self._trainable_weights.append(variable_weight)
self._non_trainable_weights.append(fixed_weight)
self.embeddings = K.concatenate([fixed_weight, variable_weight], axis=0)
self.built = True
def call(self, inputs):
if K.dtype(inputs) != 'int32':
inputs = K.cast(inputs, 'int32')
out = K.gather(self.embeddings, inputs)
return out
def compute_output_shape(self, input_shape):
if not self.input_length:
input_length = input_shape[1]
else:
input_length = self.input_length
return (input_shape[0], input_length, self.output_dim)
So, my suggestion is to use only one Embedding layer (taking into consideration your indexing problem), and transfer the weights from the old layer to the new one.
So, what you're going to to in this suggestion is...
Create your new model with 51000 words:
inp = Input((1,))
emb = Embedding(51000,50)(inp)
out = the rest of the model.....
model = Model(inp,out)
Now take the embedding layer and give it the weights you had:
weights = np.array([word2glove[w] for w in words_in_glove])
newWeights = model.layers[1].get_weights()[0]
newWeights[:50000,:] = weights
model.layers[1].set_weights([newWeights])
This will give you a new embedding, larger than the previous one, with a great part of its weights already trained, and the remaining randomly initialized.
Unfortunately, you will have to let everything be trained.
Related
I am trying to make a model in tensorflow using the keras subclasses method.
Q1) I am correctly calling layers as layers = [] and then using layers.append(GTLayer....) ?
Q2) calling GTLayer in init of GTN will run class GTLayer and will it call self.conv1 (which will return a tensor A from GTNconv) and self.conv2 (which will again return a tensor A from GTNconv)and then start the call mrthod of GTLayer to H,W , Am I right?
Q3) What happens to the returned H and W from 'Q2' will it store in layers[] list ? and then when we further call the GTNs call method it will bring up those layer? Am I correct?
Q4)Later in the GTNs call method I had to implement linear layers and thus I defined model = tf.keras.models.Sequential() and after theat initialised self.linear1 and self.linear2, this way I have implemented subclassing and sequential both! Is that correct?
Q5) I will finally get loss, y, Ws from calling GTN , now if I assign my model = GTN(arguments..) how will I do the training and back-propagation steps? using an optimiser and loss function? will it follow model.compile() and model.fit ? Or can we make it any different in the sub-classing method of keras?
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
class GTN(layers.Layer):
def __init__(self, num_edge, num_channels,num_layers,norm):
super(GTN, self).__init__()
self.num_edge = num_edge
self.num_channels = num_channels
self.num_layers = num_layers
self.is_norm = norm
layers = []
for i in tf.range(num_layers):
if i == 0:
layers.append(GTLayer(num_edge, num_channels, first=True))
else:
layers.append(GTLayer(num_edge, num_channels, first=False))
model = tf.keras.models.Sequential()
self.loss = tf.keras.losses.BinaryCrossentropy(from_logits=True)
self.linear1 = model.add(tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None))
self.linear2 = model.add(tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None))
def gcn_conv(self,X,H):
X = tf.matmul(X, self.weight)
H = self.norm(H, add=True)
return tf.matmul(tf.transpose(H),X)
def call(self, A, X, target_x, target):
A = tf.expand_dims(A, 0)
Ws = []
for i in range(self.num_layers):
H = self.normalization(H)
H, W = self.layers[i](A, H)
Ws.append(W)
for i in range(self.num_channels):
X_tmp = tf.nn.relu(self.gcn_conv(X,H[i])).numpy()
X_ = tf.concat((X_,X_tmp), dim=1)
X_ = self.linear1(X_)
X_ = tf.nn.relu(X_).numpy()
y = self.linear2(X_[target_x])
loss = self.loss(y, target)
return loss, y, Ws
class GTLayer(keras.layers.Layer):
def __init__(self, in_channels, out_channels, first=True):
super(GTLayer, self).__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.conv1 = GTConv(in_channels, out_channels)
self.conv2 = GTConv(in_channels, out_channels)
def call(self, A, H_=None):
a = self.conv1(A)
b = self.conv2(A)
H = tf.matmul( a, b)
W = [tf.stop_gradient(tf.nn.softmax(self.conv1.weight, axis=1).numpy()),
tf.stop_gradient(tf.nn.softmax(self.conv2.weight, axis=1).numpy()) ]
return H,W
class GTConv(keras.layers.Layer):
def __init__(self, in_channels, out_channels):
super(GTConv, self).__init__()
def call(self, A):
A = tf.add_n(tf.nn.softmax(self.weight))
return A
Q1
No. There are two possibilities here
1 - If you want to access a standard layers property of Keras models:
Only Model has a layers property, a keras.layers.Layer doesn't have this property
You are not supposed to mess with the layers property of a Model, you should just read it
The variable you are creating named layers is not a property of your class because you did not use self.layers.
2 - If you just want a list named layers for personal use in your class:
I recommend you don't use a standard name like this and change it to myLayers or something like that to avoid confusion.
The variable layers you created is not being used anywhere else in your code, you just created it and never used.
Remember that layers = [] just creates a local variable, while self.layers = [] creates a property in your class that can be used in other methods inside your class
Q2
You are not "calling" GTLayer, you are "creating" GTLayer. This means that you are running GTLayer.__init__().
This distinction is important in Keras:
This is "creating" a layer: layer_instance = GTLayer(...), which runs __init__
This is "calling" a layer: layer_instance(input_tensors), which runs __call__ (which will eventually run call as defined by you)
You can do both in the same line as output_tensors = GTLayer(...)(input_tensors)
So, this is happening in GTN.__init__:
You are "creating" two instances of the GTLayer.
This runs GTLayer.__init__() for each instance
This hits the lines self.conv1 = GTConv(in_channels, out_channels) and self.conv2 = GTConv(in_channels, out_channels)
This is also "creating" (not "calling") GTConv.
self.conv1 and self.conv2 are "Layer" instances, not tensors.
Q3
No tensor is produced here because you never "called" any layer in GTN.__init__().
(And this is ok. Usually, you "create" layers inside __init__() and "call" layers inside call.)
Your layers local variable will have "instances of GTLayer".
Q4
You mixed two approaches in a strange way.
You can, of course, use a Sequential model if you want, but it's not necessary, and you're not using it correcly.
If in call you are calling each layer (that is X_ = self.linear1(X_) and y = self.linear2(X_[target_x])), you don't need a Sequential model at all, and you can just have the following in GTN.__init__() (this is the best approach for subclassing):
self.linear1 = tf.keras.layers.Dense(self.w_out, input_shape=(self.w_out*self.num_channels,), activation=None)
self.linear2 = tf.keras.layers.Dense(self.num_class, input_shape=(self.w_out,), activation=None)
But you could have self.submodel = Sequential(...) and then use self.submodel in GTN.call(). But having a Model inside a layer sounds weird and might cause some strange behavior in specific cases. And, of course, the ReLUs should be a part of this submodel.
Q5
I will finally get loss, y, Ws from calling GTN
That loss and weights coming from call is a "very very" strange thing. I never saw this and I don't understand why you're doing it this way. This is not standard use of Keras and only in very specific and otherwise unsolvable cases you'd try something like this. I cannot say it will work.
How will I do the training and back-propagation steps?
You should have implemented a keras.models.Model, not a keras.layers.Layer. Only models have the ability to compile and train.
Usually, you'd not create a loss in call, you'd create a loss in model.compile, unless you're dealing with unconventional losses, like weight or activity regularization, things that really depend on the layer and not on the model's inputs/outputs.
Extra tips
There is no need to create custom layers if you're not going to create custom trainable weights. It's not wrong, of course, but also not necessary. It can help organize your code, or just add extra complication.
You are trying to use weight from your layers, but you never defined any weight anywhere.
I'm pretty sure there is a better way to achieve what you want, but I don't know what you want (and that would be something for another question, I think...)
This might be a good reading for subclassing: https://www.tensorflow.org/guide/keras/custom_layers_and_models?hl=en
I recently did a massive refactor to my PyTorch LSTM code, in order to support multitask learning. I created an MTLWrapper, which holds a BaseModel (which can be one of several variations on a regular LSTM network), which remained the same as it was before the refactor, minus a linear hidden2tag layer (takes hidden sequence and converts to tag space), which now sits in the wrapper. The reason for this is that for multitask learning, all the parameters are shared, except for the final linear layer, which I have one of for each task. These are stored in a nn.ModuleList, not just a regular python list.
What happens now is that my forward pass returns a list of tag scores tensors (one for each task), rather than a single tensor of the tag scores for a single task. I compute the losses for each of these tasks and then try to backpropagate with the average of these losses (technically also averaged over all the sentences of a batch, but this was true before the refactor too). I call model.zero_grad() before running the forward pass on each sentence in a batch.
I don't know exactly where it happened, but after this refactor, I started getting this error (on the second batch):
RuntimeError: Trying to backward through the graph a second time, but
the buffers have already been freed. Specify retain_graph=True when
calling backward the first time.
Following the advice, I added the retain_graph=True flag, but now I get the following error instead (also on the second backward step):
RuntimeError: one of the variables needed for gradient computation has
been modified by an inplace operation: [torch.FloatTensor [100, 400]],
which is output 0 of TBackward, is at version 2; expected version 1
instead. Hint: the backtrace further above shows the operation that
failed to compute its gradient. The variable in question was changed
in there or anywhere later. Good luck!
The hint in the backtrace is not actually helpful, because I have no idea where a tensor of the shape [100, 400] even came from - I don't have any parameters of size 400.
I have a sneaky suspicion that the problem is actually that I shouldn't need the retain_graph=True, but I have no way to confirm that vs. finding the mystery variable that is being changed according to the second error. Either way, I'm at a complete loss how to solve this issue. Any help is appreciated!
Code snippets:
import torch
import torch.nn as nn
import torch.nn.functional as F
class MTLWrapper(nn.Module):
def __init__(self, embedding_dim, hidden_dim, dropout,..., directions=1, device='cpu', model_type):
super(MTLWrapper, self).__init__()
self.base_model = model_type(embedding_dim, hidden_dim, dropout, ..., directions, device)
self.linear_taggers = []
for tagset_size in tagset_sizes:
self.linear_taggers.append(nn.Linear(hidden_dim*directions, tagset_size))
self.linear_taggers = nn.ModuleList(self.linear_taggers)
def init_hidden(self, hidden_dim):
return self.base_model.init_hidden(hidden_dim)
def forward(self, sentence):
lstm_out = self.base_model.forward(sentence)
tag_scores = []
for linear_tagger in self.linear_taggers:
tag_space = linear_tagger(lstm_out.view(len(sentence), -1))
tag_scores.append(F.log_softmax(tag_space))
tag_scores = torch.stack(tag_scores)
return tag_scores
Inside the train function:
for i in range(math.ceil(len(train_sents)/batch_size)):
batch = r[i*batch_size:(i+1)*batch_size]
losses = []
for j in batch:
sentence = train_sents[j]
tags = train_tags[j]
# Step 1. Remember that Pytorch accumulates gradients.
# We need to clear them out before each instance
model.zero_grad()
# Also, we need to clear out the hidden state of the LSTM,
# detaching it from its history on the last instance.
model.hidden = model.init_hidden(hidden_dim)
sentence_in = sentence
targets = tags
# Step 3. Run our forward pass.
tag_scores = model(sentence_in)
loss = [loss_function(tag_scores[i], targets[i]) for i in range(len(tag_scores))]
loss = torch.stack(loss)
avg_loss = sum(loss)/len(loss)
losses.append(avg_loss)
losses = torch.stack(losses)
total_loss = sum(losses)/len(losses) # average over all sentences in batch
total_loss.backward(retain_graph=True)
running_loss += total_loss.item()
optimizer.step()
count += 1
And code for one possible BaseModel (the others are practically identical):
class LSTMTagger(nn.Module):
def __init__(self, embedding_dim, hidden_dim, dropout, vocab_size, alphabet_size,
directions=1, device='cpu'):
super(LSTMTagger, self).__init__()
self.device = device
self.hidden_dim = hidden_dim
self.directions = directions
self.dropout = nn.Dropout(dropout)
self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
# The LSTM takes word embeddings as inputs, and outputs hidden states
# with dimensionality hidden_dim.
self.lstm = nn.LSTM(embedding_dim, hidden_dim, dropout=dropout, bidirectional=directions == 2)
# The linear layer that maps from hidden state space to tag space
self.hidden = self.init_hidden(hidden_dim)
def init_hidden(self, dim):
# Before we've done anything, we don't have any hidden state.
# Refer to the PyTorch documentation to see exactly
# why they have this dimensionality.
# The axes semantics are (num_layers, minibatch_size, hidden_dim)
return (torch.zeros(self.directions, 1, dim).to(device=self.device),
torch.zeros(self.directions, 1, dim).to(device=self.device))
def forward(self, sentence):
word_idxs = []
for word in sentence:
word_idxs.append(word[0])
embeds = self.word_embeddings(torch.LongTensor(word_idxs).to(device=self.device))
lstm_out, self.hidden = self.lstm(
embeds.view(len(sentence), 1, -1), self.hidden)
lstm_out = self.dropout(lstm_out)
return lstm_out
The problem is that when I was resetting the hidden states of the model (model.hidden = model.init_hidden(hidden_dim)) I didn't actually reassign the reinitialized weights to the BaseModel, but only in the MTLWrapper (which doesn't technically even use hidden layers).
I amended my MTLWrapper's init_hidden() function as follows:
class MTLWrapper(nn.Module):
def init_hidden(self, hidden_dim):
self.base_model.hidden = self.base_model.init_hidden(hidden_dim)
return self.base_model.init_hidden(hidden_dim)
This resolved the first error, and my code runs without the retain_graph=True flag.
I am working with the keras-capsnet implementation of Capsule Networks, and am trying to apply the same layer to 30 images per sample.
The weights are initialized within the init and build arguments for the class, shown below. I have successfully shared the weights between the primary routing layers which just use tf.layers.conv2d, where I can assign them the same name and set reuse = True.
Does anyone know how to initialize weights in a Keras custom layer so that they may be reused? I am much more familiar with the tensorflow API than with the Keras one!
def __init__(self, num_capsule, dim_capsule, routings=3,
kernel_initializer='glorot_uniform',
**kwargs):
super(CapsuleLayer, self).__init__(**kwargs)
self.num_capsule = num_capsule
self.dim_capsule = dim_capsule
self.routings = routings
self.kernel_initializer = initializers.get(kernel_initializer)
def build(self, input_shape):
assert len(input_shape) >= 3, "The input Tensor should have shape=[None, input_num_capsule, input_dim_capsule]"
self.input_num_capsule = input_shape[1]
self.input_dim_capsule = input_shape[2]
# Weights are initialized here each time the layer is called
self.W = self.add_weight(shape=[self.num_capsule, self.input_num_capsule,
self.dim_capsule, self.input_dim_capsule],
initializer=self.kernel_initializer,
name='W')
self.built = True
The answer was simple. Set up a layer without calling it on input, and then use that built layer to call the data individually.
I'm trying to create a very basic multivariate time series auto-encoder.
I want to be able to reconstruct the exact two signals I pass in.
Most of the references I'm looking at are using older versions of APIs or use embeddings.
I'm trying to use the latest higher level APIs, but its not obvious how you cobble them together.
class Seq2SeqConfig():
def __init__(self):
self.sequence_length = 15 # num of time steps
self.hidden_units = 64 # ?
self.num_features = 2
self.batch_size = 10
# Expect input as batch major.
encoder_inputs = tf.placeholder(shape=(None, config.sequence_length, config.num_features), dtype=tf.float32)
decoder_inputs = tf.placeholder(shape=(None, config.sequence_length, config.num_features), dtype=tf.float32))
# Convert inputs to time major
encoder_inputs_tm = tf.transpose(encoder_inputs, [1, 0, 2])
decoder_inputs_tm = tf.transpose(decoder_inputs, [1, 0, 2])
# setup encoder
encoder_cell = tf.contrib.rnn.LSTMCell(config.hidden_units)
encoder_outputs, encoder_final_state = tf.nn.dynamic_rnn(
cell=encoder_cell,
inputs=encoder_inputs_tm,
time_major=True)
# setup decoder
decoder_cell = tf.contrib.rnn.LSTMCell(config.hidden_units)
# The sequence length is mandatory. Not sure what the expectation is here?
helper = tf.contrib.seq2seq.TrainingHelper(
decoder_inputs_tm,
sequence_length=tf.constant(config.sequence_length, dtype=tf.int32, shape=[config.batch_size]),
time_major=True)
decoder = tf.contrib.seq2seq.BasicDecoder(decoder_cell, helper, encoder_final_state)
decoder_outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, output_time_major=True)
# loss calculation
loss_op = tf.reduce_mean(tf.square(decoder_outputs.rnn - decoder_targets_tm)
The loss operation fails because the shapes are different.
decoder_targets is (?, 15, 2) and decoder_outputs.rnn is (?, ?, 64).
Question 1:
Am I missing an operation somewhere where I reshape the decoder output?
I loosely followed this tensorflow tutorial: https://www.tensorflow.org/tutorials/seq2seq
There is a projection_layer operation passed into the basic decoder. Is that the purpose of this?
projection_layer = layers_core.Dense(tgt_vocab_size, use_bias=False)
I don't see a layers_core.Dense() function anywhere. I assume its deprecated or internal.
Question 2:
Which helper does one use for Inference when not using embeddings?
Question 3:
What would the ideal size of the hidden units be?
I assume because we want to reduce the dimensions in the latent space, it needs to be less that the size of the inputs. How does that translate when you have a input with sequence length = 15 and number of features = 2.
Should the number of hidden units be < 15, < 2 or < 15 *2?
Figured out the answer to Question 1
from tensorflow.python.layers.core import Dense
output_layer = Dense(config.num_features)
decoder = tf.contrib.seq2seq.BasicDecoder(decoder_cell, helper, encoder_final_state, output_layer)
Reference: https://github.com/udacity/deep-learning/blob/master/seq2seq/sequence_to_sequence_implementation.ipynb
Other two questions still stand.
Regarding question 3: I suggest you run several training and validation cycles with different hyperparameters to find what works best for your data and requirements. You can take a look at my implementation here (https://github.com/fritzfitzpatrick/tensorflow-seq2seq-generic-example) where I have built a very simple training & validation loop that stops once the validation loss has not gone down for a number of cycles to prevent overfitting.
Regarding question 2: I am still working on a CustomHelper implementation at the moment, and it looks like it is going somewhere. You can find the full sample code here (https://github.com/fritzfitzpatrick/tensorflow-seq2seq-generic-example/blob/master/tensorflow_custom_helper.ipynb).
batch_size = 5
features_dec_inp = 2 # number of features in target sequence
go_token = 2
end_token = 3
sess = tf.InteractiveSession()
def initialize_fn():
finished = tf.tile([False], [batch_size])
start_inputs = tf.fill([batch_size, features_dec_inp], go_token)
return (finished, start_inputs)
def next_inputs_fn(time, outputs, state, sample_ids):
del time, sample_ids
# finished needs to update after last step.
# one could use conditional logic based on sequence length
# if sequence length is known in advance
finished = tf.tile([False], [batch_size])
# next inputs should be the output of the dense layer
# unless the above finished logic returns [True]
# in which case next inputs can be anything in the right shape
next_inputs = tf.fill([batch_size, features_dec_inp], 0.5)
return (finished, next_inputs, state)
helper = tf.contrib.seq2seq.CustomHelper(
initialize_fn = initialize_fn,
sample_fn = tf.identity,
next_inputs_fn = next_inputs_fn)
print(helper)
Regarding question 1: This is the code that I am using to reduce the dimensionality of my decoder output to the number of features in my target sequence:
train_output_dense = tf.layers.dense(
train_dec_out_logits, # [batch_size x seq_length x num_units]
features_dec_exp_out) # [batch_size x seq_length x num_target_features]
Imagine a fully-connected neural network with its last two layers of the following structure:
[Dense]
units = 612
activation = softplus
[Dense]
units = 1
activation = sigmoid
The output value of the net is 1, but I'd like to know what the input x to the sigmoidal function was (must be some high number, since sigm(x) is 1 here).
Folllowing indraforyou's answer I managed to retrieve the output and weights of Keras layers:
outputs = [layer.output for layer in model.layers[-2:]]
functors = [K.function( [model.input]+[K.learning_phase()], [out] ) for out in outputs]
test_input = np.array(...)
layer_outs = [func([test_input, 0.]) for func in functors]
print layer_outs[-1][0] # -> array([[ 1.]])
dense_0_out = layer_outs[-2][0] # shape (612, 1)
dense_1_weights = model.layers[-1].weights[0].get_value() # shape (1, 612)
dense_1_bias = model.layers[-1].weights[1].get_value()
x = np.dot(dense_0_out, dense_1_weights) + dense_1_bias
print x # -> -11.7
How can x be a negative number? In that case the last layers output should be a number closer to 0.0 than 1.0. Are dense_0_out or dense_1_weights the wrong outputs or weights?
Since you're using get_value(), I'll assume that you're using Theano backend. To get the value of the node before the sigmoid activation, you can traverse the computation graph.
The graph can be traversed starting from outputs (the result of some computation) down to its inputs using the owner field.
In your case, what you want is the input x of the sigmoid activation op. The output of the sigmoid op is model.output. Putting these together, the variable x is model.output.owner.inputs[0].
If you print out this value, you'll see Elemwise{add,no_inplace}.0, which is an element-wise addition op. It can be verified from the source code of Dense.call():
def call(self, inputs):
output = K.dot(inputs, self.kernel)
if self.use_bias:
output = K.bias_add(output, self.bias)
if self.activation is not None:
output = self.activation(output)
return output
The input to the activation function is the output of K.bias_add().
With a small modification of your code, you can get the value of the node before activation:
x = model.output.owner.inputs[0]
func = K.function([model.input] + [K.learning_phase()], [x])
print func([test_input, 0.])
For anyone using TensorFlow backend: use x = model.output.op.inputs[0] instead.
I can see a simple way just changing a little the model structure. (See at the end how to use the existing model and change only the ending).
The advantages of this method are:
You don't have to guess if you're doing the right calculations
You don't need to care about the dropout layers and how to implement a dropout calculation
This is a pure Keras solution (applies to any backend, either Theano or Tensorflow).
There are two possible solutions below:
Option 1 - Create a new model from start with the proposed structure
Option 2 - Reuse an existing model changing only its ending
Model structure
You could just have the last dense separated in two layers at the end:
[Dense]
units = 612
activation = softplus
[Dense]
units = 1
#no activation
[Activation]
activation = sigmoid
Then you simply get the output of the last dense layer.
I'd say you should create two models, one for training, the other for checking this value.
Option 1 - Building the models from the beginning:
from keras.models import Model
#build the initial part of the model the same way you would
#add the Dense layer without an activation:
#if using the functional Model API
denseOut = Dense(1)(outputFromThePreviousLayer)
sigmoidOut = Activation('sigmoid')(denseOut)
#if using the sequential model - will need the functional API
model.add(Dense(1))
sigmoidOut = Activation('sigmoid')(model.output)
Create two models from that, one for training, one for checking the output of dense:
#if using the functional API
checkingModel = Model(yourInputs, denseOut)
#if using the sequential model:
checkingModel = model
trainingModel = Model(checkingModel.inputs, sigmoidOut)
Use trianingModel for training normally. The two models share weights, so training one is training the other.
Use checkingModel just to see the outputs of the Dense layer, using checkingModel.predict(X)
Option 2 - Building this from an existing model:
from keras.models import Model
#find the softplus dense layer and get its output:
softplusOut = oldModel.layers[indexForSoftplusLayer].output
#or should this be the output from the dropout? Whichever comes immediately after the last Dense(1)
#recreate the dense layer
outDense = Dense(1, name='newDense', ...)(softPlusOut)
#create the new model
checkingModel = Model(oldModel.inputs,outDense)
It's important, since you created a new Dense layer, to get the weights from the old one:
wgts = oldModel.layers[indexForDense].get_weights()
checkingModel.get_layer('newDense').set_weights(wgts)
In this case, training the old model will not update the last dense layer in the new model, so, let's create a trainingModel:
outSigmoid = Activation('sigmoid')(checkingModel.output)
trainingModel = Model(checkingModel.inputs,outSigmoid)
Use checkingModel for checking the values you want with checkingModel.predict(X). And train the trainingModel.
So this is for fellow googlers, the working of the keras API has changed significantly since the accepted answer was posted. The working code for extracting a layer's output before activation (for tensorflow backend) is:
model = Your_Keras_Model()
the_tensor_you_need = model.output.op.inputs[0] #<- this is indexable, if there are multiple inputs to this node then you can find it with indexing.
In my case, the final layer was a dense layer with activation softmax, so the tensor output I needed was <tf.Tensor 'predictions/BiasAdd:0' shape=(?, 1000) dtype=float32>.
(TF backend)
Solution for Conv layers.
I had the same question, and to rewrite a model's configuration was not an option.
The simple hack would be to perform the call function manually. It gives control over the activation.
Copy-paste from the Keras source, with self changed to layer. You can do the same with any other layer.
def conv_no_activation(layer, inputs, activation=False):
if layer.rank == 1:
outputs = K.conv1d(
inputs,
layer.kernel,
strides=layer.strides[0],
padding=layer.padding,
data_format=layer.data_format,
dilation_rate=layer.dilation_rate[0])
if layer.rank == 2:
outputs = K.conv2d(
inputs,
layer.kernel,
strides=layer.strides,
padding=layer.padding,
data_format=layer.data_format,
dilation_rate=layer.dilation_rate)
if layer.rank == 3:
outputs = K.conv3d(
inputs,
layer.kernel,
strides=layer.strides,
padding=layer.padding,
data_format=layer.data_format,
dilation_rate=layer.dilation_rate)
if layer.use_bias:
outputs = K.bias_add(
outputs,
layer.bias,
data_format=layer.data_format)
if activation and layer.activation is not None:
outputs = layer.activation(outputs)
return outputs
Now we need to modify the main function a little. First, identify the layer by its name. Then retrieve activations from the previous layer. And at last, compute the output from the target layer.
def get_output_activation_control(model, images, layername, activation=False):
"""Get activations for the input from specified layer"""
inp = model.input
layer_id, layer = [(n, l) for n, l in enumerate(model.layers) if l.name == layername][0]
prev_layer = model.layers[layer_id - 1]
conv_out = conv_no_activation(layer, prev_layer.output, activation=activation)
functor = K.function([inp] + [K.learning_phase()], [conv_out])
return functor([images])
Here is a tiny test. I'm using VGG16 model.
a_relu = get_output_activation_control(vgg_model, img, 'block4_conv1', activation=True)[0]
a_no_relu = get_output_activation_control(vgg_model, img, 'block4_conv1', activation=False)[0]
print(np.sum(a_no_relu < 0))
> 245293
Set all negatives to zero to compare with the results retrieved after an embedded in VGG16 ReLu operation.
a_no_relu[a_no_relu < 0] = 0
print(np.allclose(a_relu, a_no_relu))
> True
easy way to define new layer with new activation function:
def change_layer_activation(layer):
if isinstance(layer, keras.layers.Conv2D):
config = layer.get_config()
config["activation"] = "linear"
new = keras.layers.Conv2D.from_config(config)
elif isinstance(layer, keras.layers.Dense):
config = layer.get_config()
config["activation"] = "linear"
new = keras.layers.Dense.from_config(config)
weights = [x.numpy() for x in layer.weights]
return new, weights
I had the same problem but none of the other answers worked for me. Im using a newer version of Keras with Tensorflow so some answers dont work now. Also the structure of the model is given so i can't change it easely. The general idea is to create a copy of the original model that will work exactly like the original one but spliting the activation from the outputs layers. Once this is done we can easely access the outputs values before the activation is applied.
First we will create a copy of the original model but with no activation on the outputs layers. This will be done using Keras clone_model function (See Docs).
from tensorflow.keras.models import clone_model
from tensorflow.keras.layers import Activation
original_model = get_model()
def f(layer):
config = layer.get_config()
if not isinstance(layer, Activation) and layer.name in original_model.output_names:
config.pop('activation', None)
layer_copy = layer.__class__.from_config(config)
return layer_copy
copy_model = clone_model(model, clone_function=f)
This alone will only make a clone with new weights so we must copy the original_model weights to the new one:
copy_model.build(original_model.input_shape)
copy_model.set_weights(original_model.get_weights())
Now we will add the activations layers:
from tensorflow.keras.models import Model
old_outputs = [ original_model.get_layer(name=name) for name in copy_model.output_names ]
new_outputs = [ Activation(old_output.activation)(output) if old_output.activation else output
for output, old_output in zip(copy_model.outputs, old_outputs) ]
copy_model = Model(copy_model.inputs, new_outputs)
Finally we could create a new model whose evaluation will be the outputs with no activation applied:
no_activation_outputs = [ copy_model.get_layer(name=name).output for name in original_model.output_names ]
no_activation_model = Model(copy.inputs, no_activation_outputs)
Now we could use copy_model like the original_model and no_activation_model to access pre-activation outputs. Actually you could even modify the code to split a custom set of layers instead of the outputs.