I'm running into an issue with a model I'm trying to build. While debugging it I ran into an oddity that I think may be the cause, but I'm not sure what I'm doing wrong. I've reduced what I think the problem is to a small snippet.
Here's a Colab where you can try running it:
https://colab.research.google.com/drive/1pSTwCwMFGlWgJOP3gn9WF6pZq2CiP4XJ
import keras
from keras.layers import Layer, Dense, Input, Reshape
import keras.backend as K

class SimplePermute(Layer):
    def __init__(self, **kwargs):
        super(SimplePermute, self).__init__(**kwargs)

    def call(self, inputs, **kwargs):
        return K.permute_dimensions(inputs, [0,2,1])

test_i = Input(shape=(10, 256))
test = SimplePermute()(test_i)
print(test.get_shape())
print(K.int_shape(test))
test = Dense(units=100, activation="softmax", name="sft2")(test)
print(test.get_shape())
print(K.int_shape(test))
I'd expect the second series of prints to print the permuted tensor shape - that is [?, 256, 10]. However, the K.int_shape() returns [?, 10, 256], while TF's get_shape() returns the properly permuted shape.
I believe this internal mismatch is causing the errors I'm seeing downstream in the model.
Your custom layer doesn't have the compute_output_shape method implemented. This is what Keras uses to determine the _keras_shape property of the tensors, which is returned by K.int_shape.
You can use the standard Permute((2,1)) layer.
Or you can use a Lambda(lambda x: K.permute_dimensions(x, [0,2,1])) layer.
Or you can implement the compute_output_shape method:
def compute_output_shape(self, input_shape):
    return (input_shape[0], input_shape[2], input_shape[1])
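The shape bookkeeping that compute_output_shape has to do can be sketched in plain Python, without Keras. The helper name permuted_shape below is hypothetical and only illustrates the idea; None stands for the unknown batch dimension.

```python
def permuted_shape(input_shape, pattern=(0, 2, 1)):
    """Return the shape obtained by reordering `input_shape` by `pattern`.

    `None` stands for an unknown dimension such as the batch size.
    """
    return tuple(input_shape[axis] for axis in pattern)


# Input(shape=(10, 256)) has the full shape (None, 10, 256).
print(permuted_shape((None, 10, 256)))  # (None, 256, 10)
```

This is the same reordering that the compute_output_shape method above hard-codes for the [0, 2, 1] pattern.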
I am developing an LSTM autoencoder model for anomaly detection. I have my keras model setup as below:
from keras.models import Sequential
from keras import Model, layers
from keras.layers import Layer, Conv1D, Input, Masking, Dense, RNN, LSTM, Dropout, RepeatVector, TimeDistributed, Masking, Reshape

def create_RNN_with_attention():
    x=Input(shape=(X_train_dt.shape[1], X_train_dt.shape[2]))
    RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
    attention_layer = attention()(RNN_layer_1)
    dropout_layer_1 = Dropout(rate=0.2)(attention_layer)
    repeat_vector_layer = RepeatVector(n=X_train_dt.shape[1])(dropout_layer_1)
    RNN_layer_2 = LSTM(units=64, return_sequences=True)(repeat_vector_layer)
    dropout_layer_1 = Dropout(rate=0.2)(RNN_layer_2)
    output = TimeDistributed(Dense(X_train_dt.shape[2], trainable=True))(dropout_layer_1)
    model=Model(x,output)
    model.compile(loss='mae', optimizer='adam')
    return model
Notice the attention layer that I added, attention_layer. Before adding it, the model compiled without problems; after adding it, the model throws the following error: ValueError: Input 0 is incompatible with layer repeat_vector_40: expected ndim=2, found ndim=1
My attention layer is setup as follows:
import keras.backend as K

class attention(Layer):
    def __init__(self,**kwargs):
        super(attention,self).__init__(**kwargs)

    def build(self,input_shape):
        self.W=self.add_weight(name='attention_weight', shape=(input_shape[-1],1),
                               initializer='random_normal', trainable=True)
        self.b=self.add_weight(name='attention_bias', shape=(input_shape[1],1),
                               initializer='zeros', trainable=True)
        super(attention, self).build(input_shape)

    def call(self,x):
        # Alignment scores. Pass them through tanh function
        e = K.tanh(K.dot(x,self.W)+self.b)
        # Remove dimension of size 1
        e = K.squeeze(e, axis=-1)
        # Compute the weights
        alpha = K.softmax(e)
        # Reshape to tensorFlow format
        alpha = K.expand_dims(alpha, axis=-1)
        # Compute the context vector
        context = x * alpha
        context = K.sum(context, axis=1)
        return context
The idea of the attention mask is to allow the model to focus on more prominent features as it trains.
Why am I getting the error above and how can I fix this?
I think that the problem lies in this line:
RNN_layer_1 = LSTM(units=64, return_sequences=False)(x)
This layer outputs a tensor of shape (batch_size, 64). That means you output a single vector per sample and then run the attention mechanism over the batch dimension instead of a sequence dimension. It also means the attention layer's output has the batch dimension squeezed out, which no Keras layer accepts. This is why the RepeatVector layer raises the error: it expects a 2D input of shape (batch_size, dim).
If you want to run attention mechanism over a sequence then you should switch the line mentioned above to:
RNN_layer_1 = LSTM(units=64, return_sequences=True)(x)
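To see the shape argument concretely, here is a minimal NumPy sketch that mirrors the operations in the question's attention.call(). It is only a stand-in for the Keras backend calls, and the function name attention_context and the sizes are made up for illustration. The 2D case uses a concrete batch of one so that NumPy broadcasting is defined.

```python
import numpy as np


def attention_context(x, W, b):
    """NumPy mirror of the operations in the question's attention.call()."""
    e = np.tanh(x @ W + b)                    # alignment scores
    e = np.squeeze(e, axis=-1)                # drop the trailing size-1 axis
    e = e - e.max(axis=-1, keepdims=True)     # shift for numerical stability
    alpha = np.exp(e) / np.exp(e).sum(axis=-1, keepdims=True)  # softmax
    alpha = np.expand_dims(alpha, axis=-1)
    return np.sum(x * alpha, axis=1)          # weighted sum over axis 1


rng = np.random.default_rng(0)
batch, timesteps, units = 4, 10, 64
W = rng.normal(size=(units, 1))               # shape (input_shape[-1], 1)

# return_sequences=True: the LSTM output is (batch, timesteps, units).
x_seq = rng.normal(size=(batch, timesteps, units))
b_seq = np.zeros((timesteps, 1))              # shape (input_shape[1], 1)
context_seq = attention_context(x_seq, W, b_seq)
print(context_seq.shape)                      # (4, 64): 2D, as RepeatVector expects

# return_sequences=False: the LSTM output is (batch, units).
x_vec = rng.normal(size=(1, units))
b_vec = np.zeros((units, 1))
context_vec = attention_context(x_vec, W, b_vec)
print(context_vec.ndim)                       # 1: the batch axis is gone
```

With a sequence input the sum over axis 1 removes the time axis and leaves (batch, units); with a vector input the same sum removes the only non-batch structure, which matches the "found ndim=1" in the error.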
I was using TensorFlow 2.1.0 to build a model, and while building a custom layer that needs to convert a tensor to a NumPy array, a problem occurred. I distinctly remember that aTensor.numpy() is a real thing, so it must be something I did wrong. Could anyone tell me how I can fix it? I'm still a noob at TensorFlow.
Here is the code (the code is about a layer, not the whole model):
import numpy as np
import tensorflow as tf

class CIN_Layer(tf.keras.layers.Layer):
    def __init__(self, in_shape):
        super(CIN_Layer, self).__init__()
        self.in_shape = in_shape

    # this is the custom part
    def get3DTensor(self, inputs, lastLayerOutput=None):
        print(type(inputs))
        inputs = inputs.numpy()  # FIX HERE: problem occurs here
        interaction = []
        if lastLayerOutput is None:
            lastLayerOutput = inputs.copy()
        else:
            lastLayerOutput = lastLayerOutput.numpy()
        for i in range(inputs.shape[0]):
            interaction.append(np.dot(inputs[i].reshape([-1,1]), lastLayerOutput[i].reshape([1,-1])))
        return tf.convert_to_tensor(np.array(interaction))

    def build(self, input_shape):
        self.kernel = self.add_weight('CIN_kernel', shape=[self.in_shape[-1] for i in range(3)])

    def call(self, inputs, lastLayerOutput=None):
        interaction = self.get3DTensor(inputs, lastLayerOutput)
        return tf.reduce_sum(tf.matmul(inputs, self.kernel))

inputs = tf.keras.layers.Input(shape=(5,10))
cin_layer = CIN_Layer(in_shape=(5,10))
lastLayerOutput = cin_layer(inputs)
output = tf.keras.layers.Dense(1)(lastLayerOutput)
model = tf.keras.Model(inputs=inputs, outputs=output)
model.compile(loss='mean_squared_error', optimizer=optimizer)
model.summary()
If there are other ways to insert some numpy code in a tensorflow model, please do tell.
In TensorFlow 2.0 there are two types of tensor objects: Tensor and EagerTensor. Only an EagerTensor has a numpy() method. EagerTensors are tensors whose values are readily available at runtime; e.g. tf.ones((2, 3)) creates an EagerTensor because its value is known, and any operation performed on it also produces an EagerTensor. In your code, the 'inputs' parameter of the call method is a symbolic Tensor during layer definition. Its value is known only during graph execution (the forward pass), so you cannot call numpy() on it. During the forward pass you should do your operations using tensors only and not alternate between tensors and NumPy arrays, since that makes it impossible to trace the graph for backprop.
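One way to stay in tensor operations is to replace the per-sample Python loop with a single batched expression. The sketch below checks in NumPy that the loop from get3DTensor equals a batched outer product written with einsum. It assumes, as in the question's first call, that lastLayerOutput is a copy of inputs. tf.einsum accepts the same subscript string, so the vectorized line should carry over to TensorFlow tensors, though that translation is not tested here.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(3, 5, 10))   # (batch, 5, 10), as in Input(shape=(5, 10))
last = inputs.copy()                   # the lastLayerOutput=None case

# The loop from the question's get3DTensor.
looped = np.array([
    np.dot(inputs[i].reshape([-1, 1]), last[i].reshape([1, -1]))
    for i in range(inputs.shape[0])
])

# Same result without a Python loop: flatten each sample,
# then take one outer product per batch element.
flat_in = inputs.reshape(inputs.shape[0], -1)
flat_last = last.reshape(last.shape[0], -1)
vectorized = np.einsum("bi,bj->bij", flat_in, flat_last)

print(looped.shape)                      # (3, 50, 50)
print(np.allclose(looped, vectorized))   # True
```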
In TF 1.x, it was possible to build layers with custom variables. Here's an example:
import numpy as np
import tensorflow as tf

def make_custom_getter(custom_variables):
    def custom_getter(getter, name, **kwargs):
        if name in custom_variables:
            variable = custom_variables[name]
        else:
            variable = getter(name, **kwargs)
        return variable
    return custom_getter

# Make a custom getter for the dense layer variables.
# Note: custom variables can result from arbitrary computation;
# for the sake of this example, we make them just constant tensors.
custom_variables = {
    "model/dense/kernel": tf.constant(
        np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
    "model/dense/bias": tf.constant(
        np.random.rand(64), name="custom_bias", dtype=tf.float32),
}
custom_getter = make_custom_getter(custom_variables)

# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with tf.variable_scope("model", custom_getter=custom_getter):
    Layer = tf.layers.Dense(64)
    hiddens = Layer(x)
print(Layer.variables)
The printed variables of the constructed dense layer will be custom tensors we specified in the custom_variables dict:
[<tf.Tensor 'custom_kernel:0' shape=(784, 64) dtype=float32>, <tf.Tensor 'custom_bias:0' shape=(64,) dtype=float32>]
This allows us to create layers/models that use provided tensors in custom_variables directly as their weights, so that we could further differentiate the output of the layers/models with respect to any tensors that custom_variables may depend on (particularly useful for implementing functionality in modulating sub-nets, parameter generation, meta-learning, etc.).
Variable scopes used to make it easy to nest all of the graph-building inside scopes with custom getters and build models on top of the provided tensors as their parameters. Since sessions and variable scopes are no longer advisable in TF 2.0 (and all of that low-level stuff is moved to tf.compat.v1), what would be the best practice to implement the above using Keras and TF 2.0?
(Related issue on GitHub.)
Answer based on the comment below
Given you have:
kernel = createTheKernelVarBasedOnWhatYouWant() #shape (784, 64)
bias = createTheBiasVarBasedOnWhatYouWant() #shape (64,)
Make a simple function copying the code from Dense:
def custom_dense(x):
    inputs, kernel, bias = x
    outputs = K.dot(inputs, kernel)
    outputs = K.bias_add(outputs, bias, data_format='channels_last')
    return outputs
Use the function in a Lambda layer:
layer = Lambda(custom_dense)
hiddens = layer([x, kernel, bias])
Warning: kernel and bias must be produced by a Keras layer, or come from kernel = Input(tensor=the_kernel_var) and bias = Input(tensor=bias_var)
If the warning above is bad for you, you can always use kernel and bias "from outside", like:
def custom_dense(inputs):
    outputs = K.dot(inputs, kernel) #where kernel is not part of the arguments anymore
    outputs = K.bias_add(outputs, bias, data_format='channels_last')
    return outputs
layer = Lambda(custom_dense)
hiddens = layer(x)
This last option makes it a bit more complicated to save/load models.
Old answer
You should probably use a Keras Dense layer and set its weights in a standard way:
layer = tf.keras.layers.Dense(64, name='the_layer')
layer.set_weights([np.random.rand(784, 64), np.random.rand(64)])
If you need these weights to be non-trainable, set this before compiling the Keras model:
model.get_layer('the_layer').trainable=False
If you want direct access to the variables as tensors, they are:
kernel = layer.kernel
bias = layer.bias
There are plenty of other options, but that depends on your exact intention, which is not clear in your question.
Below is a general-purpose solution that works with arbitrary Keras models in TF2.
First, we need to define an auxiliary function canonical_variable_name and a context manager custom_make_variable with the following signatures (see implementation in meta-blocks library).
def canonical_variable_name(variable_name: str, outer_scope: str):
    """Returns the canonical variable name: `outer_scope/.../name`."""
    # ...

@contextlib.contextmanager
def custom_make_variable(
    canonical_custom_variables: Dict[str, tf.Tensor], outer_scope: str
):
    """A context manager that overrides `make_variable` with a custom function.

    When building layers, Keras uses `make_variable` function to create weights
    (kernels and biases for each layer). This function wraps `make_variable` with
    a closure that infers the canonical name of the variable being created (of the
    form `outer_scope/.../var_name`) and looks it up in the `custom_variables` dict
    that maps canonical names to tensors. The function adheres to the following logic:

    * If there is a match, it does a few checks (shape, dtype, etc.) and returns
      the found tensor instead of creating a new variable.
    * If there is a match but checks fail, it throws an exception.
    * If there are no matching `custom_variables`, it calls the original
      `make_variable` utility function and returns a newly created variable.
    """
    # ...
Using these functions, we can create arbitrary Keras models with custom tensors used as variables:
import numpy as np
import tensorflow as tf

canonical_custom_variables = {
    "model/dense/kernel": tf.constant(
        np.random.rand(784, 64), name="custom_kernel", dtype=tf.float32),
    "model/dense/bias": tf.constant(
        np.random.rand(64), name="custom_bias", dtype=tf.float32),
}

# Compute hiddens using a dense layer with custom variables.
x = tf.random.normal(shape=(1, 784), name="inputs")
with custom_make_variable(canonical_custom_variables, outer_scope="model"):
    # tf.layers is not available in TF2; use the Keras layer instead.
    Layer = tf.keras.layers.Dense(64)
    hiddens = Layer(x)
print(Layer.variables)
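The implementation of custom_make_variable is elided above, so the following is not the meta-blocks code. It is only a plain-Python sketch of the general pattern the docstring describes: temporarily swap a variable factory for a closure that looks names up in a dict, and restore the original on exit. The names factory, make_variable and override_make_variable are hypothetical stand-ins, and only a shape check is shown.

```python
import contextlib
import types

# Hypothetical stand-in for the module that owns the variable factory.
factory = types.SimpleNamespace(
    make_variable=lambda name, shape: ("created", name, shape)
)


@contextlib.contextmanager
def override_make_variable(custom_variables, outer_scope):
    """Temporarily route variable creation through `custom_variables`."""
    original = factory.make_variable

    def patched(name, shape):
        canonical_name = f"{outer_scope}/{name}"
        if canonical_name not in custom_variables:
            return original(name, shape)      # no match: create as usual
        value, value_shape = custom_variables[canonical_name]
        if value_shape != shape:              # match, but the check fails
            raise ValueError(f"shape mismatch for {canonical_name}")
        return value                          # match: reuse the given value

    factory.make_variable = patched
    try:
        yield
    finally:
        factory.make_variable = original      # always restore the original


custom = {"model/dense/kernel": ("custom_kernel", (784, 64))}
with override_make_variable(custom, outer_scope="model"):
    kernel = factory.make_variable("dense/kernel", (784, 64))
    bias = factory.make_variable("dense/bias", (64,))
after = factory.make_variable("dense/kernel", (784, 64))
```

Inside the with block the kernel comes from the dict while the bias is created normally; after the block the factory behaves as before.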
Not entirely sure I understand your question correctly, but it seems to me that it should be possible to do what you want with a combination of custom layers and keras functional api.
Custom layers allow you to build any layer you want in a way that is compatible with Keras, e.g.:
class MyDenseLayer(tf.keras.layers.Layer):
    def __init__(self, num_outputs, activation=None):
        super(MyDenseLayer, self).__init__()
        self.num_outputs = num_outputs
        # Resolve names such as 'relu'; None gives the identity (linear) activation.
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        self.kernel = self.add_weight(name="kernel",
                                      shape=[int(input_shape[-1]),
                                             self.num_outputs],
                                      initializer='normal')
        self.bias = self.add_weight(name="bias",
                                    shape=[self.num_outputs,],
                                    initializer='normal')

    def call(self, inputs):
        return self.activation(tf.matmul(inputs, self.kernel) + self.bias)
and the functional api allows you to access the outputs of said layers and re-use them:
inputs = tf.keras.Input(shape=(784,), name='img')
x1 = MyDenseLayer(64, activation='relu')(inputs)
x2 = MyDenseLayer(64, activation='relu')(x1)
outputs = MyDenseLayer(10, activation='softmax')(x2)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')
Here x1 and x2 can be connected to other subnets.
I am using the intermediate outputs of a larger model as the input to smaller models and I'm trying to make it one contiguous Model. In order to do so, I have to use the K.function() as a part of the model. This leads to the question:
Is there any way to use a K.function() within a Keras layer?
I created a simple custom layer using:
class ActivationExtraction(Layer):
    """
    Extracts all of the outputs of the input_model network and feeds it as input
    to the next layer
    """
    def __init__(self, input_model, **kwargs):
        self.input_model = input_model
        # Extracts all outputs
        outputs = [layer.output for layer in input_model.layers]
        self.output_dim = np.array(outputs).shape
        self.names = [layer.name for layer in input_model.layers]
        # Evaluation function
        self.output_function = K.function([input_model.input] + [K.learning_phase()],
                                          outputs)
        super(ActivationExtraction, self).__init__(**kwargs)

    def build(self, input_shape):
        super(ActivationExtraction, self).build(input_shape)

    def call(self, x):
        return self.output_function([x, 0])

    def compute_output_shape(self, input_shape):
        return self.output_dim
However, when I define the model, it returns the error
TypeError: The value of a feed cannot be a tf.Tensor object. Acceptable feed
values include Python scalars, strings, lists, numpy ndarrays, or TensorHandles.
I don't know a workaround for evaluating the tensor at compile time (because of the input shape being dynamic). I have tried using
def call(self, x):
    x = K.get_session().run(x)
    return self.output_function([x, 0])
as a long shot to try and evaluate the tensor, but I'm not sure what I would feed it (I have limited experience with tensorflow).
I also haven't been able to find a way to evaluate a tensor on the fly inside a Keras layer.
As stated in the title, I was wondering how to have a custom layer return multiple tensors: out1, out2, ..., outn?
I tried
keras.backend.concatenate([out1, out2], axis = 1)
But this only works for tensors of the same length, and there must be a better solution than concatenating tensors two by two every time, mustn't there?
In the call method of your layer, where you perform the layer calculations, you can return a list of tensors:
def call(self, inputTensor):
    #calculations with inputTensor and the weights you defined in "build"
    #inputTensor may be a single tensor or a list of tensors
    #output can also be a single tensor or a list of tensors
    return [output1,output2,output3]
Take care of the output shapes:
def compute_output_shape(self,inputShape):
    #calculate shapes from input shape
    return [shape1,shape2,shape3]
The result of using the layer is a list of tensors.
Naturally, some kinds of keras layers accept lists as inputs, others don't.
You have to manage the outputs properly using a functional API Model. You're probably going to have problems using a Sequential model while having multiple outputs.
I tested this code on my machine (Keras 2.0.8) and it works perfectly:
from keras.layers import *
from keras.models import *
import numpy as np

class Lay(Layer):
    def __init__(self):
        super(Lay,self).__init__()

    def build(self,inputShape):
        super(Lay,self).build(inputShape)

    def call(self,x):
        return [x[:,:1],x[:,-1:]]

    def compute_output_shape(self,inputShape):
        return [(None,1),(None,1)]

inp = Input((2,))
out = Lay()(inp)
print(type(out))
out = Concatenate()(out)
model = Model(inp,out)
model.summary()
data = np.array([[1,2],[3,4],[5,6]])
print(model.predict(data))

import keras
print(keras.__version__)
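The relationship between call and compute_output_shape in a multi-output layer can be checked without Keras. The sketch below is a NumPy stand-in for the Lay layer above: the two plain functions mirror its methods and are not Keras API. It verifies that each declared shape agrees with the actual output outside the batch axis.

```python
import numpy as np


def call(x):
    """NumPy mirror of Lay.call: first and last column of each row."""
    return [x[:, :1], x[:, -1:]]


def compute_output_shape(input_shape):
    """One shape per output; None marks the unknown batch size."""
    return [(input_shape[0], 1), (input_shape[0], 1)]


data = np.array([[1, 2], [3, 4], [5, 6]])
outputs = call(data)
declared = compute_output_shape((None, 2))

# Every declared shape should agree with the real one outside the batch axis.
for out, shape in zip(outputs, declared):
    assert out.shape[1:] == shape[1:]

print([out.tolist() for out in outputs])  # [[[1], [3], [5]], [[2], [4], [6]]]
```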