I'm using the functional API of TensorFlow 2 and tensorflow.keras.layers to build the model.
I have an input tensor (in_1) with shape [batch_size, length, dim] and I would like to compute the mean along the length dimension and obtain an output tensor (out_1) with shape [batch_size, dim].
Which of this should I use to do it? (all these options works, in terms of output shape and training)
out_1 = Lambda(lambda x: tf.math.reduce_mean(x, axis=1))(in_1)
out_1 = Lambda(lambda x: tf.keras.backend.mean(x, axis=1))(in_1)
out_1 = tf.math.reduce_mean(in_1, axis=1)
This last one automatically creates a TensorFlowOpLayer, is this something that should be avoided?
Are there other ways to do this?
What's the difference between tf.math.reduce_mean and tf.keras.backend.mean, which should I use?
I know that custom functions should be called inside the Lambda layer, but is it true also for TensorFlow functions such as tf.math.reduce_mean which can process the tensor in "one fell swoop"? How should I call them if I need to specify a parameter (e.g. axis)?
First, for the difference between tf.keras.backend.mean and tf.math.reduce_mean: There is none. You can check the source code for the keras backend version, which simply uses reduce_mean (from math_ops, but internally that's the same one that's exposed in tf.math). IMHO this is a bit of a failure in the TF re-design where they incorporated Keras: Keras is now contained in TF, but Keras also uses TF in the "backend", so you basically have every operation twice: Once the TF version, and once the Keras version which, after all, also just uses the TF version.
Anyway, for the difference between using Lambda or not: It also doesn't (really) matter. Here is a minimal example:
inp = tf.keras.Input((10,))
layer = tf.reduce_mean(inp, axis=-1)
model = tf.keras.Model(inp, layer)
print(model.layers)
gives the output
[<tensorflow.python.keras.engine.input_layer.InputLayer at 0x7f1a651500b8>,
<tensorflow.python.keras.engine.base_layer.TensorFlowOpLayer at 0x7f1a9912d8d0>]
We can see that the reduce_mean operation was automatically converted to a TensorFlowOpLayer. Now, this may be technically different from a Lambda layer, but I doubt that this makes any practical difference. I suppose this would not work for a Sequential model, where you need to supply a list of layers, so there Lambda would likely be needed.
Related
What does the following line of code do? How to interprete?
model.add(tf.keras.layers.Lambda(lambda x: x * 200))
My interpretation:
Lambda is like a function.
>>> f = lambda x: x + 1
>>> f(3)
4
In the second example the function is called using f(3). But what is the purpose of model.add?
The model.add method adds a layer to the associated Keras model. Now, the argument of this method usually is a Keras layer. In your case, it is a special kind of layer called Lambda. You are right that lambda is a function. In principle, lambda is common syntactic sugar that allows you to declare a simple function without naming it. It would be just like:
def my_func(x):
return x*200
model.add(tf.keras.layers.Lambda(my_func))
As you can see, this is way more code for a very basic functionality. Coming back to the Lambda layer, this just applies the given function to all of the nodes of the previous layer. If you don't understand what a Keras model is or how machine learning works, at least in a broad sense, you may want to start with some tutorials on that instead of looking into what the individual lines of code do. This way you could become productive way faster.
I bet it is used a a last layer. Normally, you can just a have a Dense layer output. However, you can help the training by scaling up the output to around the same figures as your labels. This will depend on the activation functions you used in your model. LSTM or SimpleRNN use tanh by default and that has an output range of [-1,1]. You will use this Lambda() layer to scale the output by 200 before it adjusts the layer weights.
I want to extract features of a optical image and save them into numpy array . I've seen similar questions , also can be seen here : https://keras.io/getting_started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer-feature-extraction , but don't know how to go about it .
Keras documentation does exaclty specify how to do that. If you have defined your model model_full you can create another one, that is just a part of it - from the input layer, to the one you're interested in.
model_part = Model(
inputs=model_full.input,
outputs=model_full.get_layer("intermed_layer").output)
Then you should be able to obtain output from intermediate layer using:
intermed_output = model_part(data)
In order to do that, you just need a model_full defined, which I assume you already have.
2nd approach
You can also use built-in Keras function, which I guess you already saw in documentation as well. It may look kind of complicated at first, but it's just creating a function with bound values i.e.
from keras import backend as K
get_3rd_layer_output = K.function(
[model.layers[0].input], # param 1 will be treated as layer[0].output
[model.layers[3].output]) # and this function will return output from 3rd layer
# here X is param 1 (input) and the function returns output from layers[3]
output = get_3rd_layer_output([X])[0]
Clearly, again model has to be defined. Not sure if there are any other requirements apart from that.
I want to use the external optimizer interface within tensorflow, to use newton optimizers, as tf.train only has first order gradient descent optimizers. At the same time, i want to build my network using tf.keras.layers, as it is way easier than using tf.Variables when building large, complex networks. I will show my issue with the following, simple 1D linear regression example:
import tensorflow as tf
from tensorflow.keras import backend as K
import numpy as np
#generate data
no = 100
data_x = np.linspace(0,1,no)
data_y = 2 * data_x + 2 + np.random.uniform(-0.5,0.5,no)
data_y = data_y.reshape(no,1)
data_x = data_x.reshape(no,1)
# Make model using keras layers and train
x = tf.placeholder(dtype=tf.float32, shape=[None,1])
y = tf.placeholder(dtype=tf.float32, shape=[None,1])
output = tf.keras.layers.Dense(1, activation=None)(x)
loss = tf.losses.mean_squared_error(data_y, output)
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, method="L-BFGS-B")
sess = K.get_session()
sess.run(tf.global_variables_initializer())
tf_dict = {x : data_x, y : data_y}
optimizer.minimize(sess, feed_dict = tf_dict, fetches=[loss], loss_callback=lambda x: print("Loss:", x))
When running this, the loss just does not change at all. When using any other optimizer from tf.train, it works fine. Also, when using tf.layers.Dense() instead of tf.keras.layers.Dense() it does work using the ScipyOptimizerInterface. So really the question is what is the difference between tf.keras.layers.Dense() and tf.layers.Dense(). I saw that the Variables created by tf.layers.Dense() are of type tf.float32_ref while the Variables created by tf.keras.layers.Dense() are of type tf.float32. As far as I now, _ref indicates that this tensor is mutable. So maybe that's the issue? But then again, any other optimizer from tf.train works fine with keras layers.
Thanks
After a lot of digging I was able to find a possible explanation.
ScipyOptimizerInterface uses feed_dicts to simulate the updates of your variables during the optimization process. It only does an assign operation at the very end. In contrast, tf.train optimizers always do assign operations. The code of ScipyOptimizerInterface is not that complex so you can verify this easily.
Now the problem is that assigining variables with feed_dict is working mostly by accident. Here is a link where I learnt about this. In other words, assigning variables via feed dict, which is what ScipyOptimizerInterface does, is a hacky way of doing updates.
Now this hack mostly works, except when it does not. tf.keras.layers.Dense uses ResourceVariables to model the weights of the model. This is an improved version of simple Variables that has cleaner read/write semantics. The problem is that under the new semantics the feed dict update happens after the loss calculation. The link above gives some explanations.
Now tf.layers is currently a thin wrapper around tf.keras.layer so I am not sure why it would work. Maybe there is some compatibility check somewhere in the code.
The solutions to adress this are somewhat simple.
Either avoid using components that use ResourceVariables. This can be kind of difficult.
Patch ScipyOptimizerInterface to do assignments for variables always. This is relatively easy since all the required code is in one file.
There was some effort to make the interface work with eager (that by default uses the ResourceVariables). Check out this link
I think the problem is with the line
output = tf.keras.layers.Dense(1, activation=None)(x)
In this format output is not a layer but rather the output of a layer, which might be preventing the wrapper from collecting the weights and biases of the layer and feed them to the optimizer. Try to write it in two lines e.g.
output = tf.keras.layers.Dense(1, activation=None)
res = output(x)
If you want to keep the original format then you might have to manually collect all trainables and feed them to the optimizer via the var_list option
optimizer = tf.contrib.opt.ScipyOptimizerInterface(loss, var_list = [Trainables], method="L-BFGS-B")
Hope this helps.
I just recently started playing around with Keras and got into making custom layers. However, I am rather confused by the many different types of layers with slightly different names but with the same functionality.
For example, there are 3 different forms of the concatenate function from https://keras.io/layers/merge/ and https://www.tensorflow.org/api_docs/python/tf/keras/backend/concatenate
keras.layers.Concatenate(axis=-1)
keras.layers.concatenate(inputs, axis=-1)
tf.keras.backend.concatenate()
I know the 2nd one is used for functional API but what is the difference between the 3? The documentation seems a bit unclear on this.
Also, for the 3rd one, I have seen a code that does this below. Why must there be the line ._keras_shape after the concatenation?
# Concatenate the summed atom and bond features
atoms_bonds_features = K.concatenate([atoms, summed_bond_features], axis=-1)
# Compute fingerprint
atoms_bonds_features._keras_shape = (None, max_atoms, num_atom_features + num_bond_features)
Lastly, under keras.layers, there always seems to be 2 duplicates. For example, Add() and add(), and so on.
First, the backend: tf.keras.backend.concatenate()
Backend functions are supposed to be used "inside" layers. You'd only use this in Lambda layers, custom layers, custom loss functions, custom metrics, etc.
It works directly on "tensors".
It's not the choice if you're not going deep on customizing. (And it was a bad choice in your example code -- See details at the end).
If you dive deep into keras code, you will notice that the Concatenate layer uses this function internally:
import keras.backend as K
class Concatenate(_Merge):
#blablabla
def _merge_function(self, inputs):
return K.concatenate(inputs, axis=self.axis)
#blablabla
Then, the Layer: keras.layers.Concatenate(axis=-1)
As any other keras layers, you instantiate and call it on tensors.
Pretty straighforward:
#in a functional API model:
inputTensor1 = Input(shape) #or some tensor coming out of any other layer
inputTensor2 = Input(shape2) #or some tensor coming out of any other layer
#first parentheses are creating an instance of the layer
#second parentheses are "calling" the layer on the input tensors
outputTensor = keras.layers.Concatenate(axis=someAxis)([inputTensor1, inputTensor2])
This is not suited for sequential models, unless the previous layer outputs a list (this is possible but not common).
Finally, the concatenate function from the layers module: keras.layers.concatenate(inputs, axis=-1)
This is not a layer. This is a function that will return the tensor produced by an internal Concatenate layer.
The code is simple:
def concatenate(inputs, axis=-1, **kwargs):
#blablabla
return Concatenate(axis=axis, **kwargs)(inputs)
Older functions
In Keras 1, people had functions that were meant to receive "layers" as input and return an output "layer". Their names were related to the merge word.
But since Keras 2 doesn't mention or document these, I'd probably avoid using them, and if old code is found, I'd probably update it to a proper Keras 2 code.
Why the _keras_shape word?
This backend function was not supposed to be used in high level codes. The coder should have used a Concatenate layer.
atoms_bonds_features = Concatenate(axis=-1)([atoms, summed_bond_features])
#just this line is perfect
Keras layers add the _keras_shape property to all their output tensors, and Keras uses this property for infering the shapes of the entire model.
If you use any backend function "outside" a layer or loss/metric, your output tensor will lack this property and an error will appear telling _keras_shape doesn't exist.
The coder is creating a bad workaround by adding the property manually, when it should have been added by a proper keras layer. (This may work now, but in case of keras updates this code will break while proper codes will remain ok)
Keras historically supports 2 different interfaces for their layers, the new functional one and the old one, that requires model.add() calls, hence the 2 different functions.
For the TF -- their concatenate() functions does not do everything that required for Keras to work, hence, the additional calls to make ._keras_shape variable correct and not to upset Keras that expects that variable to have some particular value.
I need to implement a custom objective function for Keras where i need an additional tensorflow placeholder for computation. In tensorflow, i have it as following,
pre_cost1 = tf.multiply((self.input_R - self.Decoder) , self.input_mask_R)
cost1 = tf.square(self.l2_norm(pre_cost1))
where input_mask_R is the tensorflow placeholder. input_R and Decoder are the placeholders corresponding to y_true and y_pred for Keras loss function respectively. I have the Keras loss function implemented as,
def custom_objective(y_true, y_pred):
pre_cost1 = tf.multiply((y_true - y_pred))
cost1 = tf.square(l2_norm(pre_cost1))
return cost1
I need to add the additional information for input mask in the loss function for keras. (It needs to be tensorflow placeholder since its a mask for the input which is different for each row of the input data).
Use the keras backend:
import keras.backend as K
Most functions for tensors are there, such as:
input_mask_R = K.placeholder(shape=(yourshape))
But maybe, since you want a predefined mask, what you need is:
input_mask_R = K.constant(arrayWithValues, shape=(yourshape))
And you can actually multiply and square also with K.multiply and K.square. That way, if you ever think of changing the backend, everything will be ok. (Also I'm not sure if Keras will handle direct calls to tensorflow functions.....)
See documentation: https://keras.io/backend/