I have defined the custom class as follows:
class MLPClassifier(nn.Module):
"""
A basic multi-layer perceptron classifier with 3 layers.
"""
def __init__(self, input_size, hidden_size, num_classes):
"""
The constructor for the MLPClassifier class.
"""
super(MLPClassifier, self).__init__()
self.fc1 = nn.Linear(input_size, hidden_size) # weights & biases for the input-to-hidden layer
self.ac1 = nn.ReLU() # non-linear activation for the input-to-hidden layer
self.fc2 = nn.Linear(hidden_size, num_classes) # weights & biases for the hidden-to-output layer
self.ac2 = nn.Softmax(dim=1) # non-linear activation for the hidden-to-output layer
When I run the following script, I get this output:
hyper_param_input_size = 4
hyper_param_hidden_size = 64
hyper_param_num_classes = 3
model = MLPClassifier(hyper_param_input_size, hyper_param_hidden_size, hyper_param_num_classes)
for p in model.parameters():
print(p.shape)
>>> torch.Size([64, 4])
>>> torch.Size([64])
>>> torch.Size([3, 64])
>>> torch.Size([3])
How on earth does PyTorch automatically know about my internally defined attributes when I never explicitly told it? Does it loop through everything in the class and check if isinstance(self, nn.Layer) or something?
The nn.Module.parameters function recursively goes through all child modules of the parent module and returns all of their parameters. It has nothing to do with the actual structure of your MLPClassifier. When you assign a new submodule as an attribute inside __init__, the parent module registers it as a child module (nn.Module overrides __setattr__ to do this), so that its parameters, if it has any (e.g. your nn.ReLU and nn.Softmax don't have any), can later be accessed via the parameters call.
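A quick way to see this registration in action (a small sketch using the model instance from the script above):
# Submodules assigned as attributes in __init__ are picked up by
# nn.Module.__setattr__ and registered as named children.
for name, child in model.named_children():
    print(name, type(child).__name__,
          [tuple(p.shape) for p in child.parameters()])
# fc1 Linear [(64, 4), (64,)]
# ac1 ReLU []
# fc2 Linear [(3, 64), (3,)]
# ac2 Softmax []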
Related
In the Keras docs, it says that if we want to pick an intermediate layer's output of a model (Sequential or Functional), all we need to do is the following:
model = ... # create the original model
layer_name = 'my_layer'
intermediate_layer_model = keras.Model(inputs=model.input,
outputs=model.get_layer(layer_name).output)
intermediate_output = intermediate_layer_model(data)
So, here we get two models; the intermediate_layer_model is a sub-model of its parent model, and they are independent as well. Likewise, if we get the intermediate layer's output feature maps of the parent model (or base model), do some operation with them, and get some output feature maps from that operation, then we can also feed these output feature maps back into the parent model.
input = tf.keras.Input(shape=(size,size,3))
model = tf.keras.applications.DenseNet121(input_tensor = input)
layer_name = "conv1_block1" # for example
output_feat_maps = SomeOperationLayer()(model.get_layer(layer_name).output)
# assume, they're able to add up
base = Add()([model.output, output_feat_maps])
# bind all
imputed_model = tf.keras.Model(inputs=[model.input], outputs=base)
So, in this way we have one modified model. It's quite easy with the functional API, and all the Keras ImageNet models are (mostly) written with the functional API. In the model subclassing API, we can use these models. My concern here is: what should we do if we need the intermediate output feature maps of these functional-API models inside the call function?
class Subclass(tf.keras.Model):
def __init__(self, dim):
super(Subclass, self).__init__()
self.dim = dim
self.base = DenseNet121(input_shape=self.dim)
# building new model with the desired output layer of base model
self.mid_layer_model = tf.keras.Model(self.base.inputs,
self.base.get_layer(layer_name).output)
def call(self, inputs):
# forward with base model
x = self.base(inputs)
# forward with mid_layer_model
mid_feat = self.mid_layer_model(inputs)
# do some op with it
mid_x = SomeOperationLayer()(mid_feat)
# assume, they're able to add up
out = tf.keras.layers.add([x, mid_x])
return out
The issue is that here we technically have two models working jointly. But unlike building a model like this, here we simply want the intermediate output feature maps (for some inputs) of the base model in a forward pass, use them somewhere else, and get some output. Like this:
mid_x = SomeOperationLayer()(self.base.get_layer(layer_name).output)
But it gives ValueError: Graph disconnected. So, currently, we have to build a new model from the base model based on our desired intermediate layer. In the __init__ method we define or create a new self.mid_layer_model that gives our desired output feature maps, like this: mid_feat = self.mid_layer_model(inputs). Next, we take mid_feat, do some operation on it to get some output, and lastly add them with tf.keras.layers.add([x, mid_x]). So creating a new model with the desired intermediate output works, but at the same time we repeat the same operations twice, i.e. in the base model and in its sub-model. Maybe I'm missing something obvious; please add something. Is it just how it is, or are there strategies we can adopt? I've asked in the forum here, no response yet.
Update
Here is a working example. Let's say we have a custom layer like this
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Add
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
class ConvBlock(tf.keras.layers.Layer):
def __init__(self, kernel_num=32, kernel_size=(3,3), strides=(1,1), padding='same'):
super(ConvBlock, self).__init__()
# conv layer
self.conv = tf.keras.layers.Conv2D(kernel_num,
kernel_size=kernel_size,
strides=strides, padding=padding)
# batch norm layer
self.bn = tf.keras.layers.BatchNormalization()
def call(self, input_tensor, training=False):
x = self.conv(input_tensor)
x = self.bn(x, training=training)
return tf.nn.relu(x)
And we want to insert this layer into an ImageNet model and construct a model like this:
input = tf.keras.Input(shape=(32, 32, 3))
base = DenseNet121(weights=None, input_tensor = input)
# get output feature maps of at certain layer, ie. conv2_block1_0_relu
cb = ConvBlock()(base.get_layer("conv2_block1_0_relu").output)
flat = Flatten()(cb)
dense = Dense(1000)(flat)
# adding up
adding = Add()([base.output, dense])
model = tf.keras.Model(inputs=[base.input], outputs=adding)
from tensorflow.keras.utils import plot_model
plot_model(model,
show_shapes=True, show_dtype=True,
show_layer_names=True,expand_nested=False)
Here the computation from the input up to layer conv2_block1_0_relu is done only once. Next, if we want to translate this functional API model to the subclassing API, we have to build a model from the base model's input to layer conv2_block1_0_relu first. Like:
class ModelWithMidLayer(tf.keras.Model):
def __init__(self, dim=(32, 32, 3)):
super().__init__()
self.dim = dim
self.base = DenseNet121(input_shape=self.dim, weights=None)
# building sub-model from self.base which gives
# desired output feature maps: ie. conv2_block1_0_relu
self.mid_layer = tf.keras.Model(self.base.inputs,
self.base.get_layer("conv2_block1_0_relu").output)
self.flat = Flatten()
self.dense = Dense(1000)
self.add = Add()
self.cb = ConvBlock()
def call(self, x):
# forward with base model
bx = self.base(x)
# forward with mid layer
mx = self.mid_layer(x)
# make same shape or do whatever
mx = self.dense(self.flat(mx))
# combine
out = self.add([bx, mx])
return out
def build_graph(self):
x = tf.keras.layers.Input(shape=(self.dim))
return tf.keras.Model(inputs=[x], outputs=self.call(x))
mwml = ModelWithMidLayer()
plot_model(mwml.build_graph(),
show_shapes=True, show_dtype=True,
show_layer_names=True,expand_nested=False)
Here model_1 is actually a sub-model of DenseNet, which probably leads the whole model (ModelWithMidLayer) to compute the same operation twice. If this observation is correct, then this is a concern.
I thought it might be more complex, but it's actually rather simple. We just need to build a model with the desired output layers in the __init__ method and use it normally in the call method.
import tensorflow as tf
from tensorflow.keras.applications import DenseNet121
from tensorflow.keras.layers import Add
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten
class ConvBlock(tf.keras.layers.Layer):
def __init__(self, kernel_num=32, kernel_size=(3,3), strides=(1,1), padding='same'):
super(ConvBlock, self).__init__()
# conv layer
self.conv = tf.keras.layers.Conv2D(kernel_num,
kernel_size=kernel_size,
strides=strides, padding=padding)
# batch norm layer
self.bn = tf.keras.layers.BatchNormalization()
def call(self, input_tensor, training=False):
x = self.conv(input_tensor)
x = self.bn(x, training=training)
return tf.nn.relu(x)
class ModelWithMidLayer(tf.keras.Model):
def __init__(self, dim=(32, 32, 3)):
super().__init__()
self.dim = dim
self.base = DenseNet121(input_shape=self.dim, weights=None)
# building sub-model from self.base which gives
# desired output feature maps: ie. conv2_block1_0_relu
self.mid_layer = tf.keras.Model(
inputs=[self.base.inputs],
outputs=[
self.base.get_layer("conv2_block1_0_relu").output,
self.base.output])
self.flat = Flatten()
self.dense = Dense(1000)
self.add = Add()
self.cb = ConvBlock()
    def call(self, x):
        # a single forward pass through self.mid_layer yields both the
        # intermediate feature maps and self.base.output, so the base
        # model is only computed once
        mx, bx = self.mid_layer(x)
        # make same shape or do whatever
        mx = self.dense(self.flat(mx))
        # combine
        out = self.add([bx, mx])
        return out
def build_graph(self):
x = tf.keras.layers.Input(shape=(self.dim))
return tf.keras.Model(inputs=[x], outputs=self.call(x))
mwml = ModelWithMidLayer()
tf.keras.utils.plot_model(mwml.build_graph(),
show_shapes=True, show_dtype=True,
show_layer_names=True,expand_nested=False)
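As a quick sanity check (a sketch with random data; the shapes follow the 32x32x3 input and the 1000-way base output used above), the model trains like any other subclassed model:
import numpy as np
x = np.random.random((8, 32, 32, 3)).astype("float32")
y = np.random.random((8, 1000)).astype("float32")
mwml.compile(optimizer="adam", loss="mse")
mwml.fit(x, y, epochs=1)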
I want to name the outputs of a subclassed TensorFlow Keras Model, so I can pass targets to them in fit(), e.g. self.model.fit(np_inputs, {'q_values': np_targets}, verbose=0)
The model looks like this:
class MyModel(tf.keras.models.Model):
def __init__(self, name):
super(MyModel, self).__init__()
self.input_layer = tf.keras.Input(shape=(BOARD_SIZE * 3,))
self.d1 = tf.keras.layers.Dense(BOARD_SIZE * 3 * 9, activation='relu')
self.d2 = tf.keras.layers.Dense(BOARD_SIZE * 3 * 100, activation='relu')
self.d3 = tf.keras.layers.Dense(BOARD_SIZE * 3 * 9, activation='relu')
self.q_values_l = tf.keras.layers.Dense(BOARD_SIZE, activation=None, name='q_values')
self.probabilities_l = tf.keras.layers.Softmax(name='probabilities')
@tf.function
def call(self, input_data):
x = self.d1(input_data)
x = self.d2(x)
x = self.d3(x)
q = self.q_values_l(x)
p = self.probabilities_l(q)
return p, q
I naively assumed the name of the corresponding layers would also be assigned to the outputs, but this does not seem to be the case.
I only have targets for one of the outputs, hence the need to specify exactly which output the targets are for when calling fit().
In the functional way of using Keras this works well, but I can't replicate it in the subclass approach. I can't use the functional Keras way in my case for unrelated reasons.
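For reference, this is roughly what I mean by the functional way working well; the target dict keys match the layer names (a sketch of my model rebuilt with the functional API):
inp = tf.keras.Input(shape=(BOARD_SIZE * 3,))
x = tf.keras.layers.Dense(BOARD_SIZE * 3 * 9, activation='relu')(inp)
x = tf.keras.layers.Dense(BOARD_SIZE * 3 * 100, activation='relu')(x)
x = tf.keras.layers.Dense(BOARD_SIZE * 3 * 9, activation='relu')(x)
q = tf.keras.layers.Dense(BOARD_SIZE, activation=None, name='q_values')(x)
p = tf.keras.layers.Softmax(name='probabilities')(q)
functional_model = tf.keras.Model(inp, [p, q])
# only q_values gets a loss, so only q_values needs targets
functional_model.compile(optimizer='adam', loss={'q_values': 'mse'})
functional_model.fit(np_inputs, {'q_values': np_targets}, verbose=0)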
Why not just pass a dummy target?
model.fit(np_inputs, [np.zeros((len(np_inputs),)), np_targets], ...)
Maybe even None can be passed instead of np.zeros.
You can compile the model exactly the same way:
model.compile(loss=[p_loss, q_loss], ...)
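If you go the dummy-target route, a zero loss weight keeps the dummy output from influencing training. A sketch (p_loss/q_loss are placeholders from above, and the dummy targets must match the shape of p):
model.compile(optimizer='adam',
              loss=[p_loss, q_loss],
              loss_weights=[0.0, 1.0])  # ignore the loss on p, train only on q
model.fit(np_inputs,
          [np.zeros((len(np_inputs), BOARD_SIZE)), np_targets],
          verbose=0)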
I would like to add a node-specific variable to each Keras activation function. I would like each node to compute the activation value (output) with a different value (alpha).
This can be done globally, for example with the alpha parameter for the relu activation function (link):
# Build Model
...
model.add(Dense(units=128))
model.add(Activation(lambda x: custom_activation(x, alpha=0.1)))
...
I can also write a custom activation function, but the alpha parameter is also global. (link)
# Custom activation function
def custom_activation(x, alpha=0.0):
return (K.sigmoid(x + alpha))
# Build Model
...
model.add(Dense(units=128))
model.add(Activation(lambda x: custom_activation(x, alpha=0.1)))
...
Inside the custom function I only currently have access to the following variables:
(Pdb) locals()
{'x': <tf.Tensor 'dense/Identity:0' shape=(None, 128) dtype=float32>, 'alpha': 0.1}
I would like to use a custom activation function, but with alpha unique to each node in the network. For example, if there are 128 units in the layer, then I would like there to also be 128 values of alpha, one for each unit / node. I would then like the activation function to apply each unit's own alpha to that unit's output.
How do I create an alpha value that is unique to each unit / node in a layer?
I would not recommend using a Lambda layer for this; it is too hacky. I suggest you write your own layer, as follows:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Custom layer
class CustomAct(tf.keras.layers.Layer):
def __init__(self):
super(CustomAct, self).__init__()
def build(self, input_shape):
self.alpha = self.add_weight(name='alpha',
shape=[input_shape[1], ],
initializer='uniform',
trainable=True)
super(CustomAct, self).build(input_shape)
def call(self, x):
return tf.sigmoid(x+self.alpha)
def get_alpha(self):
return self.alpha
inputs = np.random.random([16, 32]).astype(np.float32)
# Model
model = tf.keras.models.Sequential()
model.add(tf.keras.Input(inputs.shape[-1]))
model.add(tf.keras.layers.Dense(128))
model.add(CustomAct())
# Test
model.compile(loss="MSE")
alpha_after_initialization = model.layers[-1].get_alpha()
plt.plot(alpha_after_initialization.numpy())
x = np.random.random([18, 32])
y = np.random.random([18, 128])
for _ in range(20):
model.fit(x, y)
alpha_after_20_steps = model.layers[-1].get_alpha()
plt.plot(alpha_after_20_steps.numpy())
plt.show()
Of course, if you use standalone Keras, you should change all the tf references to their Keras equivalents.
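As a side note, if a leaky-ReLU-style activation would also suit you, Keras already ships a layer that learns one alpha per unit out of the box, so you could drop it in instead of a custom layer (a sketch):
# tf.keras.layers.PReLU learns a trainable alpha for every unit by default;
# use shared_axes if you want the alphas shared across some dimensions.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(128, input_shape=(32,)),
    tf.keras.layers.PReLU(),
])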
I have a Keras model defined as follows:
class ConvLayer(Layer) :
def __init__(self, nf, ks=3, s=2, **kwargs):
self.nf = nf
self.grelu = GeneralReLU(leak=0.01)
self.conv = (Conv2D(filters = nf,
kernel_size = ks,
strides = s,
padding = "same",
use_bias = False,
activation = "linear"))
super(ConvLayer, self).__init__(**kwargs)
def rsub(self): return -self.grelu.sub
def set_sub(self, v): self.grelu.sub = -v
def conv_weights(self): return self.conv.weight[0]
def build(self, input_shape):
# No weight to train.
super(ConvLayer, self).build(input_shape) # Be sure to call this at the end
def compute_output_shape(self, input_shape):
output_shape = (input_shape[0],
input_shape[1]/2,
input_shape[2]/2,
self.nf)
return output_shape
def call(self, x):
return self.grelu(self.conv(x))
def __repr__(self):
return f'ConvLayer(nf={self.nf}, activation={self.grelu})'
class ConvModel(tf.keras.Model):
def __init__(self, nfs, input_shape, output_shape, use_bn=False, use_dp=False):
super(ConvModel, self).__init__(name='mlp')
self.use_bn = use_bn
self.use_dp = use_dp
self.num_classes = num_classes
# backbone layers
self.convs = [ConvLayer(nfs[0], s=1, input_shape=input_shape)]
self.convs += [ConvLayer(nf) for nf in nfs[1:]]
# classification layers
self.convs.append(AveragePooling2D())
self.convs.append(Dense(output_shape, activation='softmax'))
def call(self, inputs):
for layer in self.convs: inputs = layer(inputs)
return inputs
I'm able to compile this model without any issues
>>> model.compile(optimizer=tf.keras.optimizers.Adam(lr=lr),
loss='categorical_crossentropy',
metrics=['accuracy'])
But when I query the summary for this model, I see this error
>>> model = ConvModel(nfs, input_shape=(32, 32, 3), output_shape=num_classes)
>>> model.summary()
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-220-5f15418b3570> in <module>()
----> 1 model.summary()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/network.py in summary(self, line_length, positions, print_fn)
1575 """
1576 if not self.built:
-> 1577 raise ValueError('This model has not yet been built. '
1578 'Build the model first by calling `build()` or calling '
1579 '`fit()` with some data, or specify '
ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.
I'm providing input_shape for the first layer of my model, so why is it throwing this error?
The error says what to do:
This model has not yet been built. Build the model first by calling build()
model.build(input_shape) # `input_shape` is the shape of the input data
# e.g. input_shape = (None, 32, 32, 3)
model.summary()
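Applied to the model from the question, that looks like this (a sketch; it assumes nfs and num_classes are defined as before, and None is the batch dimension):
model = ConvModel(nfs, input_shape=(32, 32, 3), output_shape=num_classes)
model.build(input_shape=(None, 32, 32, 3))
model.summary()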
There is a very big difference between keras subclassed model and other keras models (Sequential and Functional).
Sequential and Functional models are data structures that represent a DAG of layers. In simple words, a Functional or Sequential model is a static graph of layers built by stacking one on top of the other like LEGO. So when you provide input_shape to the first layer, these (Functional and Sequential) models can infer the shapes of all the other layers and build the model. Then you can print the input/output shapes using model.summary().
On the other hand, a subclassed model is defined via the body of Python code (a call method). For a subclassed model, there is no graph of layers; we cannot know how the layers are connected to each other (because that is defined in the body of call, not as an explicit data structure), so we cannot infer input/output shapes. So for a subclassed model, the input/output shapes are unknown to us until it is first called with proper data. In the compile() method, Keras does a deferred compile and waits for proper data. In order for it to infer the shapes of the intermediate layers, we need to run it on proper data and then use model.summary(). Without running the model on some data, it will throw an error, as you noticed. Please check the GitHub gist for the complete code.
The following is an example from Tensorflow website.
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
class ThreeLayerMLP(keras.Model):
def __init__(self, name=None):
super(ThreeLayerMLP, self).__init__(name=name)
self.dense_1 = layers.Dense(64, activation='relu', name='dense_1')
self.dense_2 = layers.Dense(64, activation='relu', name='dense_2')
self.pred_layer = layers.Dense(10, name='predictions')
def call(self, inputs):
x = self.dense_1(inputs)
x = self.dense_2(x)
return self.pred_layer(x)
def get_model():
return ThreeLayerMLP(name='3_layer_mlp')
model = get_model()
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255
model.compile(loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
optimizer=keras.optimizers.RMSprop())
model.summary() # This will throw an error as follows
# ValueError: This model has not yet been built. Build the model first by calling `build()` or calling `fit()` with some data, or specify an `input_shape` argument in the first layer(s) for automatic build.
# Need to run with real data to infer shape of different layers
history = model.fit(x_train, y_train,
batch_size=64,
epochs=1)
model.summary()
Thanks!
Another method is to pass the input_shape argument to the first layer, like this:
model = Sequential()
model.add(Bidirectional(LSTM(n_hidden,return_sequences=False, dropout=0.25,
recurrent_dropout=0.1),input_shape=(n_steps,dim_input)))
# X is a train dataset with features excluding a target variable
input_shape = X.shape
model.build(input_shape)
model.summary()
Make sure you create your model properly. A small typo like the following may also cause a problem:
model = Model(some-input, some-output, "model-name")
while the correct code should be:
model = Model(some-input, some-output, name="model-name")
If your TensorFlow/Keras version is 2.5.0, then just add tensorflow when you import Keras packages.
Not this:
from tensorflow import keras
from keras.models import Sequential
import tensorflow as tf
Like this:
from tensorflow import keras
from tensorflow.keras.models import Sequential
import tensorflow as tf
Version mismatches between your TensorFlow and Keras can be the reason for this.
I encountered the same problem while training an LSTM model for regression.
Error:
ValueError: This model has not yet been built. Build the model first
by calling build() or by calling the model on a batch of data.
Earlier:
from tensorflow.keras.models import Sequential
from tensorflow.python.keras.models import Sequential
Corrected:
from keras.models import Sequential
I was also facing the same error, so I removed model.summary(), and the issue was resolved, since the error arises if summary() is called before the model is built.
Here is the LINK to the description, which states:
Raises:
ValueError: if `summary()` is called before the model is built.
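Alternatively, as the error message itself suggests, calling the model on a single batch of (dummy) data builds it, after which summary() works; a sketch with an illustrative input shape:
import numpy as np
model(np.zeros((1, 10, 4), dtype="float32"))  # one forward pass builds the model
model.summary()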
I want to create a simple neural network using Tensorflow and Keras.
When I try to instantiate a Model by subclassing the Model class
class TwoLayerFC(tf.keras.Model):
def __init__(self, hidden_size, num_classes):
super(TwoLayerFC, self).__init__()
self.fc1 = keras.layers.Dense(hidden_size,activation=tf.nn.relu)
self.fc2 = keras.layers.Dense(num_classes)
def call(self, x, training=None):
x = tf.layers.flatten(x)
x = self.fc1(x)
x = self.fc2(x)
return x
This is how I test the network
def test_TwoLayerFC():
tf.reset_default_graph()
input_size, hidden_size, num_classes = 50, 42, 10
model = TwoLayerFC(hidden_size, num_classes)
with tf.device(device):
x = tf.zeros((64, input_size))
scores = model(x)
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
scores_np = sess.run(scores)
print(scores_np.shape)
I get an error:
TypeError: __init__() takes at least 3 arguments (2 given)
I followed this tutorial, and it seems that there should be two parameters.
I read your code and I see a PyTorch-style model being created, including the mistake of passing two numbers to the second Dense layer (a Keras Dense layer takes only the output size; the input size is inferred).
Keras models should not follow the same logic as PyTorch models.
This model should be created like this:
input_tensor = Input(input_shape)
output_tensor = Flatten()(input_tensor)
output_tensor = Dense(hidden_size, activation='relu')(output_tensor)
output_tensor = Dense(num_classes)(output_tensor)
model = keras.models.Model(input_tensor, output_tensor)
This model instance is ready to be compiled and trained:
model.compile(optimizer=..., loss = ..., metrics=[...])
model.fit(x_train, y_train, epochs=..., batch_size=..., ...)
There is no reason in Keras to subclass Model, unless you're a really advanced user trying some very unconventional things.
By the way, be careful not to mix tf.keras.anything with keras.anything. The first is a version of Keras maintained directly by TensorFlow, while the second is the original Keras. They're not exactly the same; TensorFlow's version seems more buggy, and mixing the two in the same code sounds like a bad idea.