Reading through the documentation on implementing custom layers with tf.keras, I see two options to inherit from: tf.keras.layers.Layer and tf.keras.Model.
In the context of creating custom layers, what is the difference between these two? What differs technically?
If I were to implement the transformer encoder, for example, which one would be more suitable? (Assuming the transformer is only a "layer" in my full model.)
In the documentation:
The Model class has the same API as Layer, with the following differences:
- It exposes built-in training, evaluation, and prediction loops (model.fit(), model.evaluate(), model.predict()).
- It exposes the list of its inner layers, via the model.layers property.
- It exposes saving and serialization APIs.
Effectively, the "Layer" class corresponds to what we refer to in the literature as a "layer" (as in "convolution layer" or "recurrent layer") or as a "block" (as in "ResNet block" or "Inception block"). Meanwhile, the "Model" class corresponds to what is referred to in the literature as a "model" (as in "deep learning model") or as a "network" (as in "deep neural network").
So if you want to be able to call .fit(), .evaluate(), or .predict() on those blocks, or to save and load those blocks separately, you should use the Model class. The Layer class is leaner, so you won't bloat the layers with unnecessary functionality... but I would guess that generally isn't a big problem.
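For instance, a minimal sketch (layer sizes and names are made up) of a block built as a Model via the functional API, so it can be fit and saved on its own yet still be nested inside a larger model:

import tensorflow as tf

# An illustrative block built as a Model: it gets .fit()/.save() for free.
inputs = tf.keras.Input(shape=(64,))
x = tf.keras.layers.Dense(32, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(16)(x)
block = tf.keras.Model(inputs, outputs, name="my_block")

block.compile(optimizer="adam", loss="mse")
# block.fit(data, targets)       # trainable on its own
# block.save("my_block.h5")      # saveable on its own

# ...and it still composes like a layer inside a bigger model:
big_in = tf.keras.Input(shape=(64,))
big_out = tf.keras.layers.Dense(1)(block(big_in))
big_model = tf.keras.Model(big_in, big_out)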
A layer takes in a tensor and gives out a tensor, which is the result of some tensor operations.
A model is a composition of multiple layers.
If you are building a new model architecture out of existing keras/tf layers, then build a custom model.
If you are implementing your own custom tensor operations within a layer, then build a custom layer. If you use non-tensor operations inside your custom layer, then you have to code how the layer forward-propagates and backward-propagates yourself.
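As an illustration, here is a minimal custom-layer sketch (the name ScaledDense and all shapes are made up). Since it only uses differentiable TensorFlow ops, autodiff handles backpropagation and no manual gradient code is needed:

import tensorflow as tf

class ScaledDense(tf.keras.layers.Layer):
    """Illustrative custom layer: a dense transform with a learned scale."""
    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Persistent weights, created once the input shape is known.
        self.kernel = self.add_weight(
            name="kernel", shape=(input_shape[-1], self.units),
            initializer="glorot_uniform", trainable=True)
        self.scale = self.add_weight(
            name="scale", shape=(), initializer="ones", trainable=True)

    def call(self, inputs):
        # Plain tensor ops: TensorFlow differentiates these automatically.
        return self.scale * tf.matmul(inputs, self.kernel)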
Related
I'm learning to use PyTorch. If anyone is familiar with PyTorch, could you tell me whether all models can be nn.Sequential?
I'm asking because some framework features only accept as input a model defined as nn.Sequential.
Yes, but only in the sense that you could wrap a highly complex model as a single step in nn.Sequential. If you want an example of a model that breaks sequential behavior, look up ResNet and its ilk. These require data to be passed between "layers"; however, even those can be implemented using nn.Sequential, by creating special nn.Module classes to handle the residual functionality and stacking those blocks together sequentially.
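A minimal PyTorch sketch of that pattern (names and sizes are illustrative): the skip connection lives inside an nn.Module, so the blocks stack cleanly in nn.Sequential:

import torch.nn as nn

class ResidualBlock(nn.Module):
    """Illustrative block: the residual add is hidden inside the module,
    so from the outside it behaves like a single sequential step."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.body(x)  # the data "branches" only inside the block

# Blocks stack in nn.Sequential even though each one branches internally.
model = nn.Sequential(ResidualBlock(64), ResidualBlock(64), nn.Linear(64, 10))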
I am trying to use a custom neural network with the DqnAgent() from tf_agents. In my model I need layer sharing, so I use the functional API to build it. The model takes a dict as input and has one layer with n neurons as output; the last layer is a Concatenate layer, not a Dense layer. The type of the model I get from the functional API, keras.Model(inputs=[...], outputs=[...]), is keras.engine.functional.Functional.
Now I want to use my model with the TF agent like this:
agent = dqn_agent.DqnAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    q_network=model,
    optimizer=optimizer,
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter)
I get the following error, though:
AttributeError: 'Functional' object has no attribute 'create_variables'
In call to configurable 'DqnAgent' (<class 'tf_agents.agents.dqn.dqn_agent.DqnAgent'>)
The q_network argument expects a network of type network.Network. I am not sure how to convert or wrap my model so that the DqnAgent() will accept it. How could I manage to do this? Any support is much appreciated. If you need more information about anything, let me know.
Additional information about my network:
- Input: a dict with multiple inputs.
- Multiple shared dense layers; the output of the last one has shape (1,).
- All those shape-(1,) outputs are concatenated.
- One multiply layer to eliminate infeasible actions by multiplying the outputs by 0 or 1, respectively.
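One common workaround, sketched below with hedges (the class name KerasModelWrapper is made up, and the constructor/call signatures follow the tf_agents network.Network convention as I understand it), is to subclass network.Network and delegate the forward pass to the already-built Keras model; the Network base class then supplies create_variables:

from tf_agents.networks import network

class KerasModelWrapper(network.Network):
    """Hypothetical wrapper exposing a ready-built keras.Model as a
    tf_agents network.Network, so DqnAgent accepts it as q_network."""
    def __init__(self, keras_model, input_tensor_spec, name='QNetwork'):
        super().__init__(
            input_tensor_spec=input_tensor_spec, state_spec=(), name=name)
        self._keras_model = keras_model

    def call(self, observation, step_type=None, network_state=(), training=False):
        # Delegate to the functional model (the dict input works unchanged).
        q_values = self._keras_model(observation, training=training)
        return q_values, network_state

# q_net = KerasModelWrapper(model, train_env.observation_spec())
# ...then pass q_network=q_net to DqnAgent instead of the raw model.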
In some tf.keras tutorials, I've seen them instantiate their model class like this:
model = tf.keras.Sequential()
While in some places, they use something like this:
model = tf.keras.Model(inputs=input, outputs=output)
But looking at the docs here, the two seem the same, though I'm not sure, nor is it explicitly mentioned. What are the differences between the two?
There are two class APIs for defining a model in tf.keras. According to the docs:
Sequential class: Sequential groups a linear stack of layers into a tf.keras.Model.
Model class: Model groups layers into an object with training and inference features.
A Sequential model is the simplest type of model: a linear stack of layers. But the Sequential API is limited in certain ways; we can't build complex networks such as multi-input or multi-output networks with it.
Using the Model class, however, we can instantiate a Model with the functional API (and also by subclassing the Model class), which allows us to create arbitrary graphs of layers. This gives us more flexibility: we can easily define models where each layer connects not just with the previous and next layers but also shares feature information with other layers in the model, as in ResNet or EfficientNet.
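For example, a minimal sketch (shapes are illustrative) of a skip connection, which the functional API expresses naturally but Sequential cannot:

import tensorflow as tf

inputs = tf.keras.Input(shape=(32,))
x = tf.keras.layers.Dense(32, activation="relu")(inputs)
y = tf.keras.layers.Dense(32)(x)
# x feeds both the dense path and the add node -- a branching graph
# that a linear stack of layers cannot represent.
outputs = tf.keras.layers.Add()([x, y])
model = tf.keras.Model(inputs=inputs, outputs=outputs)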
In fact, most of the SOTA models you can get from tf.keras.applications are implemented using the functional API. In the subclassing API, by contrast, we define our layers in __init__ and implement the model's forward pass in the call method.
Generally speaking, every model definition using the Sequential API can also be achieved with the functional API or the Model subclassing API, and in those two we can create complex architectures that are not possible with the Sequential API. If you are wondering which one to choose, the answer is: it totally depends on your needs. Check out the following blog post, where we discuss the various model strategies in tf.keras with more examples: Model Sub-Classing and Custom Training Loop from Scratch in TensorFlow 2.
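For completeness, a minimal sketch of the subclassing pattern just described (layer sizes are illustrative):

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Layers are defined in __init__ ...
        self.dense1 = tf.keras.layers.Dense(64, activation="relu")
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, inputs, training=False):
        # ... and the forward pass is written imperatively in call().
        x = self.dense1(inputs)
        return self.dense2(x)

model = MyModel()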
It's because they come from different versions of TensorFlow. According to the documentation (TensorFlow 2.0), tf.keras.Sequential is the most recent way of calling it. If you go to the documentation and click on "View aliases", you can see the different aliases used in older versions of TensorFlow.
I was trying to load a Keras model that I saved during training, so I went to the Keras documentation, where I saw this:
Only topological loading (by_name=False) is supported when loading weights from the TensorFlow format. Note that topological loading differs slightly between TensorFlow and HDF5 formats for user-defined classes inheriting from tf.keras.Model: HDF5 loads based on a flattened list of weights, while the TensorFlow format loads based on the object-local names of attributes to which layers are assigned in the Model's constructor.
Could you please explain the above?
For clarity, let's consider two cases.
Case 1: Simple model, and
Case 2: Complex model, where user-defined classes inheriting from tf.keras.Model were used.
Case 1: Simple model (as in Keras functional and Sequential models)
When you save model weights (using model.save_weights) and then load them (using model.load_weights), the load_weights method uses topological loading by default. This is the same for the TensorFlow saved_model ('tf') format as well as the 'h5' format. For example,
loadedh5_model.load_weights('./MyModel_h5.h5')
# The line above is the same as the line below
# (the second and third arguments are the defaults):
# loadedh5_model.load_weights('./MyModel_h5.h5', by_name=False, skip_mismatch=False)
If you want to load the weights of specific layers of a saved model, you need to use by_name=True. There are use cases that require this type of loading (see the sketch below).
loadedh5_model.load_weights('./MyModel_h5.h5',by_name=True, skip_mismatch=False)
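A hypothetical example of such a use case (layer names and shapes are made up, and it assumes the saved file contains a layer named "encoder"): transferring the weights of identically named layers into a new architecture, while new layers keep their fresh initialization:

import tensorflow as tf

new_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", name="encoder"),  # reused name
    tf.keras.layers.Dense(3, name="new_head"),                     # new layer
])
new_model.build(input_shape=(None, 16))
# Layers whose names match ("encoder") receive the saved weights;
# mismatched or missing layers are skipped instead of raising.
new_model.load_weights('./MyModel_h5.h5', by_name=True, skip_mismatch=True)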
Case 2: Complex model (as in Keras subclassed models)
As of now, only the 'tf' format is supported when user-defined classes inheriting from tf.keras.Model are used in the model creation.
Only topological loading (by_name=False) is supported when loading weights from the TensorFlow format. Note that topological loading differs slightly between TensorFlow and HDF5 formats for user-defined classes inheriting from tf.keras.Model: HDF5 loads based on a flattened list of weights, while the TensorFlow format loads based on the object-local names of attributes to which layers are assigned in the Model's constructor.
The main reason is the way weights are stored in the h5 format versus the tf format.
For example, consider Case 1, where HDF5 loads based on a flattened list of weights; the weights are loaded without any error. In Case 2, however, the model has user-defined classes, which require a different approach than just loading flattened weights. To take care of assigning weights to custom classes, the 'tf' format loads the weights based on the object-local names of the attributes to which layers are assigned in the Model's constructor.
The following paragraph from the Keras website clarifies this further:
When loading a weight file in TensorFlow format, returns the same status object as tf.train.Checkpoint.restore. When graph building, restore ops are run automatically as soon as the network is built (on first call for user-defined classes inheriting from Model, immediately if it is already built).
Another point to understand: Keras functional and Sequential models are static graphs of layers that can use flattened weights without any problem. A Keras subclassed model (as in our Case 2) is a piece of Python code (a call method); there is no graph of layers. So as soon as the network is built with custom classes, restore ops are run to update the status objects. Hope it helps.
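To make that concrete, a minimal sketch (layer names and the checkpoint path are made up) of saving and restoring a subclassed model in the 'tf' format, where weights are tracked by the constructor's attribute names:

import tensorflow as tf

class SubclassedModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # The tf format tracks weights by these attribute names
        # ("dense_a", "dense_b"), not by a flat list of weights.
        self.dense_a = tf.keras.layers.Dense(8)
        self.dense_b = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.dense_b(self.dense_a(x))

m = SubclassedModel()
m(tf.zeros((1, 4)))                   # build the variables with a first call
m.save_weights('./subclassed_ckpt')   # no .h5 extension -> 'tf' format

m2 = SubclassedModel()
status = m2.load_weights('./subclassed_ckpt')  # same status object as
m2(tf.zeros((1, 4)))                  # tf.train.Checkpoint.restore; the
                                      # restore ops run on this first call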
TensorFlow has some docs for subclassing the (tf.)keras Model and Layer classes.
However, it is unclear which to use for "modules" or "blocks" (e.g., several layers collectively).
Since such a block is technically several layers, I feel that subclassing Layer would be deceiving, and while subclassing Model works, I am unsure whether there are any penalties for doing so.
e.g.
x = inputs
a = self.dense_1(x)  # <--- self.dense_1 = tf.keras.layers.Dense(...)
b = self.dense_2(a)
c = self.add([x, b]) # <--- self.add = tf.keras.layers.Add()
Which is appropriate to use?
(Please note that this answer is old; later, Keras changed to allow and use subclassing regularly.)
Initially, there is no need to subclass anything with Keras. Unless you have a particular reason for it (one beyond building, training, and predicting), you don't subclass for Keras.
Building a Keras model:
Whether you use Sequential (the model is ready, just add layers) or Model (create a graph of layers and finally call Model(inputs, outputs)), you don't need to subclass.
The moment you create an instance of Sequential or Model, you have a fully defined model, ready to use in all situations.
This model can even be used as part of other models, and its layers can be easily accessed to get intermediate outputs and create new branches in your graph.
So I don't see any reason at all to subclass Model, unless you are using some additional framework that requires it (but I don't think one exists). This habit seems to come from PyTorch users (because this kind of model building is typical for PyTorch: create a subclass of Module, add layers, and write a call method). But PyTorch doesn't offer the same ease as Keras for getting intermediate results.
The main advantage of using Keras is exactly this: you can easily access layers from blocks and models and instantly start branching from that point, without needing to rebuild any call methods or add any extra code to the models (see the sketch below). So when you subclass Model, you just defeat the purpose of Keras, making it all more difficult.
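To illustrate that advantage, a minimal sketch (names are made up): branching from an intermediate layer of a functional model takes two lines, with no call method to rewrite:

import tensorflow as tf

inputs = tf.keras.Input(shape=(32,))
x = tf.keras.layers.Dense(16, activation="relu", name="hidden")(inputs)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)

# Grab the intermediate output and build a new branch from it:
hidden_out = model.get_layer("hidden").output
feature_extractor = tf.keras.Model(inputs=model.input, outputs=hidden_out)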
The docs you mentioned say:
Model subclassing is particularly useful when eager execution is enabled since the forward pass can be written imperatively.
I don't really understand what "imperatively" means here, and I don't see how it would be easier than just building a model with regular layers.
Another quote from the docs:
Key Point: Use the right API for the job. While model subclassing offers flexibility, it comes at a cost of greater complexity and more opportunities for user errors. If possible, prefer the functional API.
Well... it's always possible.
Subclassing layers
Here, there may be good reasons to do so, namely:
- You want a layer that performs custom calculations not available with the regular layers;
- This layer must have persistent weights.
If you don't need both of these things, you don't need to subclass a layer. If you just want custom calculations without weights, a Lambda layer is enough (see the sketch below).
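To put the distinction in code (both snippets are illustrative sketches): a stateless custom calculation fits in a Lambda layer, while persistent weights call for a Layer subclass:

import tensorflow as tf

# Custom calculation without weights: a Lambda layer is enough.
halve = tf.keras.layers.Lambda(lambda t: t * 0.5)

# Custom calculation WITH persistent weights: subclass Layer.
class Bias(tf.keras.layers.Layer):
    def build(self, input_shape):
        self.b = self.add_weight(name="b", shape=(input_shape[-1],),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        return inputs + self.b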