Saving model in pytorch and keras - python

I have trained a model with Keras and saved it with the help of PyTorch. Will this cause any problems in the future? As far as I know, the only difference between them is that Keras saves its model's weights as doubles while PyTorch saves its weights as floats.

You can convert your model to double by doing:
model.double()
Note that after this, you will need your input to be a DoubleTensor.
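A minimal sketch of the cast, assuming a toy model; the shapes here are placeholders:

import torch
from torch import nn

model = nn.Linear(4, 2)          # placeholder model
model.double()                   # cast all parameters to float64

x = torch.randn(1, 4).double()   # inputs must now be float64 as well
print(model(x).dtype)            # torch.float64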

Related

TorchVision using pretrained weights for entire model vs backbone

TorchVision detection models have a weights and a weights_backbone parameter. Does using pretrained weights imply that the model uses pretrained weights_backbone under the hood? I am training a RetinaNet model and am unsure which of the two options I should use and what the differences are.
The difference is pretty simple: you can either choose to do transfer learning on the backbone only or on the whole network.
RetinaNet from torchvision has a ResNet50 backbone. You should be able to do both of:
retinanet_resnet50_fpn(weights=RetinaNet_ResNet50_FPN_Weights.COCO_V1)
retinanet_resnet50_fpn(weights_backbone=ResNet50_Weights.IMAGENET1K_V1)
As implied by their names, the backbone weights are different in the two cases. The former were trained on COCO (object detection) while the latter were trained on ImageNet (classification).
To answer your question: pretrained weights implies that the whole network, including the backbone weights, is initialized. However, I don't think that it calls weights_backbone under the hood.
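A runnable version of the two options, assuming torchvision >= 0.13 (where the weights enums live):

from torchvision.models import ResNet50_Weights
from torchvision.models.detection import (
    RetinaNet_ResNet50_FPN_Weights,
    retinanet_resnet50_fpn,
)

# whole detector pretrained on COCO (backbone included)
model_full = retinanet_resnet50_fpn(
    weights=RetinaNet_ResNet50_FPN_Weights.COCO_V1)

# only the ResNet50 backbone pretrained, on ImageNet
model_backbone = retinanet_resnet50_fpn(
    weights_backbone=ResNet50_Weights.IMAGENET1K_V1)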

Can I train my pretrained model with a totally different architecture?

I have trained a pretrained ResNet18 model with my custom dataset in PyTorch and wondered whether I could transfer my model file to train another one with a different architecture, e.g. ResNet50. I know I have to save my model accordingly (explained well in another post here), but this was a question I had never thought about before.
I was planning to use more advanced models like VisionTransformers (ViT) but I couldn't figure out whether I had to start with a pretrained ViT already or I could just take my previous model file and use it as the pretrained model to train a ViT.
Example Scenario: ResNet18 --> ResNet50 --> Inception v3 --> ViT
My best guess is that it's not possible due to the number of weights, neurons and layer structures, but I would love to hear it if I'm missing a crucial point here. Thanks!
Between models that only differ in number of layers (ResNet-18 and ResNet-50), it has been done to initialize some layers of the larger model from the weights of the smaller model's layers. Inversely, you can truncate a larger model by taking a subset of regularly spaced layers and use them to initialize a smaller model. In both cases, you need to retrain everything at the end if you hope to achieve semi-decent performance.
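For the ResNet-18 to ResNet-50 direction, a minimal sketch of copying whichever tensors happen to line up; note that since ResNet-18 uses BasicBlocks and ResNet-50 Bottlenecks, in practice little beyond the stem actually matches:

import torch
from torchvision.models import ResNet18_Weights, resnet18, resnet50

small = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
large = resnet50()  # randomly initialized

# copy only the parameters whose names and shapes match
src, dst = small.state_dict(), large.state_dict()
matching = {k: v for k, v in src.items()
            if k in dst and v.shape == dst[k].shape}
dst.update(matching)
large.load_state_dict(dst)
print(f'transferred {len(matching)} of {len(dst)} tensors')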
The whole point of using architectures that vastly differ (vision transformers vs CNNs) is to learn different features from the inputs and unlock new levels of semantic understanding. Recent models like BeiT also use new self-supervised training schemes that have nothing to do with the classic ImageNet pretraining. Using trained weights from another model would go against the point.
Having said that, if you want to use a ViT, why not start from the pretrained weights available on HuggingFace and fine-tune on the data you used to train your ResNet50?
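A minimal sketch of that route, assuming the transformers library; the checkpoint name and class count are examples, not requirements:

from transformers import ViTForImageClassification

model = ViTForImageClassification.from_pretrained(
    'google/vit-base-patch16-224-in21k',  # pretrained ViT weights
    num_labels=10,                        # assumption: your class count
)
# then fine-tune on your dataset with a standard training loop or Trainer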

If we expand or reduce the layer of the same model, can we still be able to train from pretrained model in Pytorch?

If a pretrained model such as ResNet101 was trained on the ImageNet dataset and I then change some layers inside it, can I still use the pretrained weights on a different ABC dataset?
Let's say this is a ResNet34 model.
It is pretrained on ImageNet and saved as a ResNet.pt file.
If I changed some layers inside it, let's say I made it deeper by introducing some layers in conv4_x (check image):
model = Resnet34()  # I have changed some layers inside this Resnet34()
optimizer = optim.Adam(model.parameters(), lr=0.00005)
model.load_state_dict(torch.load('Resnet.pt')['state_dict'])  # the pretrained model of ResNet before the changes
optimizer.load_state_dict(torch.load('Resnet.pt')['optimizer'])
Can I do this? Or is there another method?
You can do anything you like - the question is: would it be better than training from scratch?
Here are a few issues you might encounter:
1. A mismatch between the weights saved in ResNet.pt (the trained weights of the original ResNet34) and the state_dict of your modified model.
You would probably need to manually make sure that the old weights are correctly assigned to the original layers and that only the new layers are left uninitialized.
2. Initializing the weights of the new layers.
Since you are training a ResNet, you can take advantage of the residual connections and initialize the weights of each new layer such that it initially makes no contribution to the predicted value and only passes the input directly to the output via the residual link; see the sketch below.
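A minimal sketch of both steps, assuming the extra blocks were appended to torchvision's layer3 (the paper's conv4_x); the class name is a placeholder for your modified model:

import torch
from torch import nn

model = Resnet34()  # assumption: your modified architecture

# 1. Load the old weights; strict=False skips the keys the checkpoint
#    does not cover (i.e. the newly added blocks)
state = torch.load('Resnet.pt')['state_dict']
missing, unexpected = model.load_state_dict(state, strict=False)
print('left uninitialized (new layers):', missing)

# 2. Zero the last BatchNorm scale of each new block so the block starts
#    as an identity mapping: out = relu(0 + x)
for block in model.layer3[6:]:      # assumption: new blocks appended here
    nn.init.zeros_(block.bn2.weight)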

How to save different keras model into one .h5py

I am trying to do video classification using Conv2D and LSTM. After getting the features from Conv2D, I pass them to the LSTM. As they are two different models, how can I merge them into one to get a single .h5 file?
Also, the features from Conv2D are passed to the LSTM as sequences of frames saved as ".npy" array files.
I need to save the model for different purposes.
Assuming you are using tf.keras, you can merge the two parts with the functional API: wire the Conv2D features into the LSTM and wrap everything in a single Model:
model = keras.models.Model(inputs=inputs, outputs=outputs)
And to save your model you can just use .save as explained here:
model.save('your_model.h5')
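A minimal end-to-end sketch of that idea, assuming 16-frame clips of 64x64 RGB images and 10 classes (all placeholders); the Conv2D part is applied per frame via TimeDistributed so both stages live in one model:

import tensorflow as tf
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(16, 64, 64, 3))  # (frames, H, W, C)

# per-frame CNN features
x = layers.TimeDistributed(layers.Conv2D(32, 3, activation='relu'))(inputs)
x = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)

# temporal modelling over the sequence of frame features
x = layers.LSTM(64)(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = models.Model(inputs=inputs, outputs=outputs)
model.save('your_model.h5')  # one HDF5 file containing both parts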
Not exactly sure if this is what you need but hope it helps.

TensorFlow: What is the easiest way to incorporate predictions from one model in the training of a new model?

What is the simplest way to use a tf.estimator-trained model A during the training of another model B?
The weights in model A are fixed. In model B, I would like to take some inputs, do some computation, feed the results into model A, then do some more computation on its output.
A simple example:
Model A returns tf.matmul(input, weights).
In model B, I would like to do the following:
x1 = tf.matmul(new_inputs, new_weights1)
x2 = modelA(x1)  # with fixed weights
return tf.matmul(x2, new_weights2)
But with more complicated models A and B, each of which is trained as a tf.estimator (though I'm happy to not use estimators if there's another easy solution -- I'm using them because I would like to use ML Engine).
This question is related, but the proposed solution does not work for training model B, because the gradients of tf.py_func are [None]. I have tried registering a gradient for tf.py_func, but this fails with
Unsupported object type Tensor
I have also tried tf.import_graph_def for model A, but this seems to load the pretrained graph without the actual weights.
For model composability, Keras works a whole lot better. You can convert a Keras model to an estimator:
https://cloud.google.com/blog/products/gcp/new-in-tensorflow-14-converting-a-keras-model-to-a-tensorflow-estimator
So you can still train on ML Engine.
With Keras, it is then just a matter of loading the intermediate layers' weights and biases from a checkpoint and making those layers non-trainable. See:
Is it possible to save a trained layer to use layer on Keras?
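In tf.keras, a minimal sketch of the x1 → model A → x2 pattern from the question, with model A frozen; the file name and layer sizes are placeholders:

import tensorflow as tf
from tensorflow.keras import layers, models

model_a = models.load_model('model_a.h5')  # previously trained model A
model_a.trainable = False                  # fix its weights

inputs = layers.Input(shape=(128,))
x1 = layers.Dense(64)(inputs)      # new_weights1
x2 = model_a(x1)                   # frozen model A
outputs = layers.Dense(10)(x2)     # new_weights2

model_b = models.Model(inputs=inputs, outputs=outputs)
model_b.compile(optimizer='adam', loss='mse')
# gradients flow through model A into the new layers,
# but model A's own weights stay fixed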
