I'm learning to use PyTorch. Could anyone familiar with PyTorch tell me whether every model can be written as nn.Sequential?
I'm asking because some framework features only accept a model defined as nn.Sequential as input.
Yes, but only in the sense that you could wrap an arbitrarily complex model as a single step in nn.Sequential. For an example of a model that breaks purely sequential behavior, look up ResNet and its relatives. These need data to be passed between non-adjacent "layers" (skip connections); however, even those can be built with nn.Sequential by writing custom nn.Module classes that handle the residual logic and then stacking those blocks in an nn.Sequential.
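As a rough illustration of that last point, here is a minimal sketch in which the skip connection lives inside one module, so the outer model is still a plain nn.Sequential. The ResidualBlock class, the Linear layers, and the layer sizes are assumptions made up for this example; they are not taken from ResNet itself.

```python
import torch
from torch import nn


class ResidualBlock(nn.Module):
    """Hypothetical block: returns x + f(x), hiding the skip connection."""

    def __init__(self, features: int):
        super().__init__()
        # f(x): a small stack that keeps the feature size unchanged
        self.body = nn.Sequential(
            nn.Linear(features, features),
            nn.ReLU(),
            nn.Linear(features, features),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual addition happens here, invisible to the outer container
        return x + self.body(x)


# The outer model is an ordinary nn.Sequential of blocks
model = nn.Sequential(
    nn.Linear(8, 16),
    ResidualBlock(16),
    ResidualBlock(16),
    nn.Linear(16, 2),
)

out = model(torch.randn(4, 8))  # batch of 4 samples with 8 features each
```

Anything that only accepts nn.Sequential sees four steps here and never needs to know that two of them branch internally.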
In some tf.keras tutorials, I've seen the model class instantiated like this:
model = tf.keras.Sequential()
While in some places, they use something like this:
model = tf.keras.Model(inputs=input, outputs=output)
Looking at the docs, the two seem to be the same, but I'm not sure, and it isn't explicitly stated. What are the differences between the two?
There are two class APIs for defining a model in tf.keras. According to the docs:
Sequential class: Sequential groups a linear stack of layers into a tf.keras.Model.
Model class: Model groups layers into an object with training and inference features.
A Sequential model is the simplest type of model: a linear stack of layers. The Sequential API has limitations, though. We can't use it to build more complex networks, such as multi-input or multi-output networks.
With the Model class, we can instantiate a model through the Functional API (or by subclassing Model), which lets us create arbitrary graphs of layers. This gives us more flexibility: we can easily define models in which a layer connects not only to the previous and next layers but also shares feature information with other layers in the model, as in models like ResNet and EfficientNet.
In fact, most of the SOTA models available from tf.keras.applications are implemented with the Functional API. In the subclassing API, by contrast, we define our layers in __init__ and implement the model's forward pass in the call method.
Generally speaking, any model defined with the Sequential API can also be built with the Functional API or the Model subclassing API, while the Functional and subclassing APIs can express complex layer arrangements that the Sequential API cannot. If you are wondering which one to choose, it depends entirely on your needs. For more examples, check out the following blog post, where we discuss the various modeling strategies in tf.keras: "Model Sub-Classing and Custom Training Loop from Scratch in TensorFlow 2".
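To make the comparison concrete, here is a small sketch of the three styles side by side. The layer sizes, the variable names, and the SkipModel class are invented for illustration, and the skip connection is just one example of something a plain linear stack cannot express.

```python
import tensorflow as tf

# 1) Sequential API: a plain linear stack of layers
sequential_model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])

# 2) Functional API: same layer types, plus a skip connection around the
#    hidden layer, which a linear stack cannot express
inputs = tf.keras.Input(shape=(16,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
merged = tf.keras.layers.Add()([inputs, hidden])
outputs = tf.keras.layers.Dense(1)(merged)
functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)


# 3) Subclassing API: layers are created in __init__, forward pass in call
class SkipModel(tf.keras.Model):
    """Hypothetical subclassed version of the functional model above."""

    def __init__(self):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(16, activation="relu")
        self.out = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.out(x + self.hidden(x))


subclassed_model = SkipModel()

batch = tf.zeros((4, 16))
seq_out = sequential_model(batch)
fun_out = functional_model(batch)
sub_out = subclassed_model(batch)
```

All three objects are tf.keras.Model instances, so compile(), fit(), and predict() are available on each of them.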
It's because they are from different versions of TensorFlow. According to the documentation (TensorFlow 2.0), tf.keras.Sequential is the most recent way of calling the function. If you go to the documentation and click on "View aliases", you can see the different aliases used for that function in older versions of TensorFlow.
I'm using TensorFlow 1.14 and the tf.keras API to build a number (>10) of different neural networks. (I'm also interested in answers to this question for TensorFlow 2.) I'm wondering how I should organize my project.
I convert the Keras models into estimators using tf.keras.estimator.model_to_estimator and use TensorBoard for visualization. I also sometimes use model.summary(). Each of my models has a number (>20) of hyperparameters and takes one of three types of input data. I sometimes use hyperparameter optimization, so I often manually delete models and call tf.keras.backend.clear_session() before trying the next set of hyperparameters.
Currently I'm using functions that take hyperparameters as arguments and return the respective compiled keras model to be turned into an estimator. I use three different "Main_Datatype.py" scripts to train models for the three different input data types. All data is loaded from .tfrecord files and there is an input function for each data type, which is used by all estimators taking that type of data as input. I switch between models (i.e. functions returning a model) in the Main scripts. I also have some building blocks that are part of more than one model, for which I use helper functions returning them, piecing together the final result using the Keras functional API.
The slight incompatibilities between the different models are beginning to confuse me, and I've decided to organize the project using classes. I'm planning to make a class for each model that keeps track of the hyperparameters, the correct naming of the model, and its model directory. However, I'm wondering if there are established or recommended ways to do this in TensorFlow.
Question: Should I be subclassing tf.keras.Model instead of using functions to build models or python classes that encapsulate them? Would subclassing keras.Model break (or require much work to enable) any of the functionality that I use with keras estimators and tensorboard? I've seen many issues people have with using custom Model classes and am somewhat reluctant to put in the work only to find that it doesn't work for me. Do you have other suggestions how to better organize my project?
Thank you very much in advance.
Subclass only if you absolutely need to. I personally prefer to try the options in the order below. If the complexity of the model you are designing cannot be achieved with the first two options, then of course subclassing is the only option left.
1. tf.keras Sequential API
2. tf.keras Functional API
3. Subclass tf.keras.Model
Seems like a reasonable thing to do; see the guide at https://www.tensorflow.org/guide/keras/custom_layers_and_models and the API docs at https://www.tensorflow.org/api_docs/python/tf/keras/Model
TensorFlow has some docs on subclassing the (tf) Keras Model and Layer classes.
However, it is unclear which one to use for "modules" or "blocks" (i.e. several layers grouped together).
Since a block is technically several layers, I feel that subclassing Layer would be misleading, and while subclassing Model works, I am unsure whether doing so carries any penalties.
e.g.
x = inputs
a = self.dense_1(x) # <--- self.dense_1 = tf.keras.layers.Dense(...)
b = self.dense_2(a)
c = self.add([x, b])
Which is appropriate to use?
(Please note that this answer is old. Keras later changed to allow subclassing and now uses it regularly.)
Initially, there is no need to subclass anything with Keras. Unless you have a particular reason for it (and building, training, or predicting is not one), you don't subclass in Keras.
Building a Keras model:
Either using Sequential (the model is ready already, just add layers), or using Model (create a graph with layers and finally call Model(inputs, outputs)), you don't need to subclass.
The moment you create an instance of Sequential or Model, you have a fully defined model, ready to use in all situations.
This model can even be used as part of other models, and its layers can be easily accessed to get intermediate outputs and create new branches in your graph.
So, I don't see any reason at all to subclass Model, unless you are using some additional framework that requires it (but I don't think so). This seems to come from PyTorch users, because this kind of model building is typical for PyTorch: create a subclass of Module, add layers, and write a forward method. But PyTorch doesn't offer the same ease as Keras does for getting intermediate results.
The main advantage of using Keras is exactly this: you can easily access layers from blocks and models and instantly start branching from that point without needing to rebuild any call methods or adding any extra code for that in the models. So, when you subclass Model, you just defeat the purpose of Keras making it all more difficult.
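A minimal sketch of what that looks like with the functional API is shown below. The layer names ("hidden_1", "head", "aux_head") and the sizes are made up for the example; get_layer(), layer.output, and model.input are the standard Keras accessors being illustrated.

```python
import tensorflow as tf

# A small functional model with named layers (names are illustrative)
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(16, activation="relu", name="hidden_1")(inputs)
x = tf.keras.layers.Dense(16, activation="relu", name="hidden_2")(x)
outputs = tf.keras.layers.Dense(1, name="head")(x)
model = tf.keras.Model(inputs, outputs)

# Grab an intermediate output from the existing graph, no call method needed
hidden_1_out = model.get_layer("hidden_1").output
feature_extractor = tf.keras.Model(inputs=model.input, outputs=hidden_1_out)

# Start a new branch from that point; the original layers are shared
aux = tf.keras.layers.Dense(3, name="aux_head")(hidden_1_out)
branched = tf.keras.Model(inputs=model.input, outputs=[model.output, aux])

batch = tf.zeros((2, 8))
features = feature_extractor(batch)
main_out, aux_out = branched(batch)
```

The feature extractor and the branched model reuse the same layer objects and weights as the original model, so training one affects the others.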
The docs you mentioned say:
Model subclassing is particularly useful when eager execution is enabled since the forward pass can be written imperatively.
I don't really understand what "imperatively" means, and I don't see how it would be easier than just building a model with regular layers.
Another quote from the docs:
Key Point: Use the right API for the job. While model subclassing offers flexibility, it comes at a cost of greater complexity and more opportunities for user errors. If possible, prefer the functional API.
Well... it's always possible.
Subclassing layers
Here, there may be good reasons to do so. And these reasons are:
You want a layer that performs custom calculations that are not available with regular layers
This layer must have persistent weights.
If you don't need "both" things above, you don't need to subclass a layer. If you just want "custom calculations" without weights, a Lambda layer is enough.
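The two cases might look roughly like this. ScaleShift is a hypothetical layer invented for the example (a custom calculation with persistent weights), while the squaring Lambda stands in for a custom calculation that needs no weights.

```python
import tensorflow as tf


class ScaleShift(tf.keras.layers.Layer):
    """Hypothetical layer: y = x * scale + shift, with trainable weights."""

    def build(self, input_shape):
        dim = input_shape[-1]
        # Persistent, trainable weights are the reason to subclass Layer
        self.scale = self.add_weight(
            name="scale", shape=(dim,), initializer="ones", trainable=True
        )
        self.shift = self.add_weight(
            name="shift", shape=(dim,), initializer="zeros", trainable=True
        )

    def call(self, x):
        return x * self.scale + self.shift


# Custom calculation without weights: a Lambda layer is enough
square = tf.keras.layers.Lambda(lambda t: tf.square(t))

x = tf.constant([[1.0, 2.0, 3.0]])
scale_shift = ScaleShift()
scaled = scale_shift(x)   # identity at initialization (scale=1, shift=0)
squared = square(x)
```

Only the subclassed layer contributes entries to trainable_weights; the Lambda layer is a pure function of its input.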
Reading through the documentation on implementing custom layers with tf.keras, I see two options to inherit from: tf.keras.layers.Layer and tf.keras.Model.
In the context of creating custom layers, I'm asking myself what the difference between these two is. What is technically different?
If I were to implement the transformer encoder, for example, which one would be more suitable? (Assume the transformer is only a "layer" in my full model.)
In the documentation:
The Model class has the same API as Layer, with the following differences:
- It exposes built-in training, evaluation, and prediction loops (model.fit(), model.evaluate(), model.predict()).
- It exposes the list of its inner layers, via the model.layers property.
- It exposes saving and serialization APIs.
Effectively, the "Layer" class corresponds to what we refer to in the literature as a "layer" (as in "convolution layer" or "recurrent layer") or as a "block" (as in "ResNet block" or "Inception block").
Meanwhile, the "Model" class corresponds to what is referred to in the literature as a "model" (as in "deep learning model") or as a "network" (as in "deep neural network").
So if you want to be able to call .fit(), .evaluate(), or .predict() on those blocks, or you want to save and load those blocks separately, you should use the Model class. The Layer class is leaner, so you won't bloat the blocks with unnecessary functionality, but I would guess that this generally wouldn't be a big problem.
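As a sketch of the "block as Layer, network as Model" split described in the quoted docs, the example below groups several layers into one Layer subclass and only uses Model for the outer network. ResidualDenseBlock and its sizes are hypothetical names chosen for illustration.

```python
import tensorflow as tf


class ResidualDenseBlock(tf.keras.layers.Layer):
    """Hypothetical block that groups several layers behind one Layer."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.dense_1 = tf.keras.layers.Dense(units, activation="relu")
        self.dense_2 = tf.keras.layers.Dense(units)
        self.add = tf.keras.layers.Add()

    def call(self, inputs):
        x = inputs
        a = self.dense_1(x)
        b = self.dense_2(a)
        return self.add([x, b])  # skip connection around the two Dense layers


# The blocks are plain layers; only the outer network is a Model
inputs = tf.keras.Input(shape=(8,))
x = ResidualDenseBlock(8)(inputs)
x = ResidualDenseBlock(8)(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="mse")

preds = model(tf.zeros((4, 8)))
```

Here fit(), evaluate(), and predict() exist on model but not on the individual blocks, which matches the division of responsibilities in the quote.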
A layer takes in a tensor and returns a tensor that is the result of some tensor operations.
A model is a composition of multiple layers.
If you are building a new model architecture from existing Keras/TF layers, build a custom model.
If you are implementing your own custom tensor operations within a layer, build a custom layer. If you use non-tensor operations inside your custom layer, you have to code yourself how the layer propagates forward and backward.
I am trying to build a custom CNN architecture using PyTorch. I want about the same level of control I would have if I built the architecture using only NumPy.
I am new to PyTorch and would like to see some code samples of CNNs implemented without the nn.Module class, if possible.
You have to implement a backward() function in your custom class.
However, it is not clear from your question which of these you need:
a new series of CNN blocks, in which case you are better off using nn.Module and something like
nn.Sequential(nn.Conv2d( ...) )
or just gradient descent, with the backward computation handled on your own (see https://github.com/jcjohnson/pytorch-examples#pytorch-autograd).
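For the second case, a rough sketch of a tiny CNN written without nn.Module might look like the following. It keeps the parameters as raw tensors, uses torch.nn.functional for the operations, lets autograd compute the gradients, and applies the gradient descent step by hand. The shapes, the learning rate, and the random stand-in data are all assumptions made for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Parameters are plain tensors tracked by autograd, no nn.Module involved
conv_w = (0.1 * torch.randn(4, 1, 3, 3)).requires_grad_()
conv_b = torch.zeros(4, requires_grad=True)
fc_w = (0.1 * torch.randn(10, 4 * 13 * 13)).requires_grad_()
fc_b = torch.zeros(10, requires_grad=True)
params = [conv_w, conv_b, fc_w, fc_b]


def forward(x):
    x = F.conv2d(x, conv_w, conv_b)  # (N, 4, 26, 26) for 28x28 inputs
    x = F.relu(x)
    x = F.max_pool2d(x, 2)           # (N, 4, 13, 13)
    x = x.flatten(1)                 # (N, 676)
    return F.linear(x, fc_w, fc_b)   # (N, 10) class scores


# Random stand-in data (assumption: 28x28 grayscale images, 10 classes)
images = torch.randn(8, 1, 28, 28)
labels = torch.randint(0, 10, (8,))

lr = 0.01
losses = []
for step in range(5):
    loss = F.cross_entropy(forward(images), labels)
    loss.backward()                  # autograd fills p.grad for each parameter
    with torch.no_grad():            # manual gradient descent update
        for p in params:
            p -= lr * p.grad
            p.grad.zero_()
    losses.append(loss.item())
```

If the backward pass itself has to be hand-written, the place for a custom backward() is a torch.autograd.Function subclass; the sketch above deliberately leaves that part to autograd.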