Can I train my pretrained model with a totally different architecture? - python

I have fine-tuned a pretrained ResNet18 model on my custom dataset in PyTorch and wondered whether I could use the resulting model file to train another model with a different architecture, e.g. ResNet50. I know I have to save my model accordingly (explained well in another post here), but this is a question I had never thought about before.
I was planning to use more advanced models like Vision Transformers (ViT), but I couldn't figure out whether I had to start from an already pretrained ViT or whether I could just take my previous model file and use it as the pretrained model to train a ViT.
Example Scenario: ResNet18 --> ResNet50 --> Inception v3 --> ViT
My best guess is that it's not possible because the number of weights, the neurons and the layer structures differ, but I would love to hear if I am missing a crucial point here. Thanks!

Between models that differ mainly in the number of layers (ResNet-18 and ResNet-50), some layers of the larger model have been initialized from the weights of the smaller model's layers. Conversely, you can truncate a larger model by taking a subset of regularly spaced layers and use them to initialize a smaller model. In both cases, you need to retrain everything afterwards if you hope to achieve semi-decent performance.
The whole point of using architectures that differ vastly (vision transformers vs. CNNs) is to learn different features from the inputs and unlock new levels of semantic understanding. Recent models like BEiT also use new self-supervised training schemes that have nothing to do with classic ImageNet pretraining. Using trained weights from another model would go against the point.
Having said that, if you want to use a ViT, why not start from the pretrained weights available on HuggingFace and fine-tune them on the data you used to train your ResNet?
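A minimal sketch of the partial weight transfer described above, assuming PyTorch. The two toy nn.Sequential models and the helper copy_matching_weights are hypothetical stand-ins, not torchvision's ResNets; with real ResNet-18 and ResNet-50 checkpoints far fewer tensors would match, because the two use different block types. Only tensors whose name and shape agree are copied, and everything else keeps its fresh initialization, so the larger model still has to be retrained.

```python
import torch
from torch import nn


def copy_matching_weights(src: nn.Module, dst: nn.Module) -> list:
    """Copy every tensor whose name and shape match from src into dst."""
    src_state = src.state_dict()
    dst_state = dst.state_dict()
    matched = {
        name: tensor
        for name, tensor in src_state.items()
        if name in dst_state and tensor.shape == dst_state[name].shape
    }
    dst_state.update(matched)
    dst.load_state_dict(dst_state)
    return sorted(matched)


torch.manual_seed(0)
# Hypothetical "small" and "large" models that share only their first layer.
small = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
large = nn.Sequential(
    nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)
)

copied = copy_matching_weights(small, large)
print(copied)  # names of the tensors that were transferred
```

Matching by name and shape is only one possible rule; mapping layers by depth or position would need a hand-written mapping for the architectures involved.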

Related

Does it make sense to train a pre-trained architecture (ResNet) with specific images before further training and evaluating it with my own imagery?

I was wondering if it is useful to train a pre-trained resnet (pre-trained with imagenet) with images that are closer to my classification problem. I want to use 50,000 labeled images of trees from a paper to update the weights of the pre-trained resnet. Then I would like to use these weights to re-train and evaluate the resnet, hopefully better fitted this way, with my own set of images of trees.
I have already used the pre-trained resnet on my own images with moderate success. Because my dataset is small (~5,000 images), I thought it might be smart to first train the pre-trained resnet further on more similar data.
Any suggestions or experiences you want to share?

How to train pre trained model (MNIST) tensorflow

I have a project with Fashion MNIST that predicts clothes from uploaded images, and I want to make some improvements to it. Is it possible to modify my project so that it trains automatically after each upload and prediction?
You can retrain your model manually using transfer learning (a method of reusing an already trained model for another task):
Instantiate a base model and load pre-trained weights into it.
Freeze all layers in the base model by setting trainable = False.
Create a new model on top of the output of one (or several) layers from the base model.
Train your new model on your new dataset.
Please refer to this gist for working code example. Thank You.
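The steps above can be sketched roughly as follows, assuming tf.keras. The small Sequential base is a hypothetical stand-in for a genuinely pre-trained model (in practice you would load saved weights into it), and the 10-unit head assumes the ten Fashion MNIST classes; this is an illustration, not the code from the gist.

```python
import tensorflow as tf

# Stand-in for a pre-trained base model; real code would load saved weights here.
base = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(8, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
    ],
    name="base",
)

# Freeze all layers in the base model.
base.trainable = False

# Build a new model on top of the base model's output.
inputs = tf.keras.Input(shape=(28, 28, 1))
features = base(inputs, training=False)
outputs = tf.keras.layers.Dense(10)(features)
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# Train on the new dataset (new_images and new_labels are placeholders):
# model.fit(new_images, new_labels, epochs=5)
```

Only the new Dense layer's kernel and bias are updated during training; the frozen base acts as a fixed feature extractor.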

Why some TensorFlow-Hub models are not fine tunable?

I am just learning image classification with TensorFlow and found that there is a TensorFlow Hub where we can use many models for many classification tasks. For example, I want to build a food classifier and develop the model so that it covers foods in my country and has higher accuracy on some specific foods. I tried to use and tune this model: https://tfhub.dev/google/aiy/vision/classifier/food_V1/1, but why does its page say that the model is not fine-tunable?
What determines whether a model can or cannot be fine-tuned?
Thank you.
The publisher/creator of the model makes the decision on whether the model is fine-tunable or not. Making a model fine-tunable requires the model creator to make sure that the TF computation graph supports fine-tuning. For example, if the model contains dropout or batchnorm, the computation graph for fine-tuning and for inference-only will be different. The publisher/creator of the model has to make sure that the model is exported correctly to support both these cases. Sometimes publishers do not go through these steps and mark the model as not fine-tunable.

How to add a new class to an existing classifier in deep learning?

I trained a deep learning model to classify the given images into three classes. Now I want to add one more class to my model. I tried to check out "Online learning", but it seems to train on new data for existing classes. Do I need to train my whole model again on all four classes or is there any way I can just train my model on new class?
You have probably used a softmax after a 3-neuron dense layer at the end of the architecture to classify into 3 classes. Adding a class means doing a softmax over a 4-neuron dense layer, so there is no way to accommodate that extra neuron in your current graph with frozen weights. You are modifying the graph, and hence you will have to train the whole model from scratch
-----or-----
one way would be to load the model, remove the last layer, replace it with a 4-neuron layer and train the network again. This will basically train the weights of the last layer from scratch. I don't think there is any way to keep the weights of the last layer intact while adding a new class.
You have to remove the final fully-connected layer, freeze the weights in the feature extraction layers, add a new fully-connected layer with four outputs and retrain the model with images of the original three classes and the new fourth class.
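A minimal sketch of this recipe, assuming PyTorch. The tiny nn.Sequential network is a hypothetical stand-in for the trained three-class classifier; the layer sizes and learning rate are illustrative only.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Hypothetical stand-in for the trained 3-class classifier.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 8 * 8, 32),
    nn.ReLU(),
    nn.Linear(32, 3),  # original head: three classes
)

# Freeze every existing parameter (the feature-extraction layers).
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh four-output head.
# Newly created layers are trainable by default.
model[3] = nn.Linear(32, 4)

# Hand only the trainable parameters to the optimizer.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01
)

logits = model(torch.randn(2, 3, 8, 8))
print(trainable, logits.shape)
```

The training loop itself is omitted; it would iterate over images of all four classes, as described above.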
I tried to check out "Online learning", but it seems to train on new data for existing classes.
Online learning is a term used to refer to a model which takes a continual or sequential stream of input data while training, in contrast to offline learning (also called batch learning), where the model is pre-trained on a static predefined dataset.
Continual learning (also called incremental, continuous, lifelong learning) refers to a branch of ML working in an online learning context where models are designed to learn new tasks while maintaining performance on historic tasks. It can be applied to multiple problem paradigms (including Class-incremental learning, where each new task presents new class labels for an ever expanding super-classification problem).
Do I need to train my whole model again on all four classes or is there any way I can just train my model on new class?
Naively re-training the model on the updated dataset is indeed a solution. Continual learning seeks to address contexts where access to historic data (i.e. the original 3 classes) is not possible, or where retraining on an ever larger dataset is impractical (for reasons of efficiency, space, privacy, etc.). Multiple such models using different underlying architectures have been proposed, but almost all examples deal exclusively with image classification problems.
Related q's:
How to fine-tune a keras model with existing plus newer classes?

training the same model with different data sets in tensorflow

The problem:
I have a model that I would like to train with independent data sets. Afterwards, I would like to extract the weights of each model (the model is the same for each instance but trained using different datasets) and finally compute an average of these weights. Basically, my intention is to mimic tensorflow running on multiple devices and then average their weights so that they are used by one model.
My solution:
I added this model multiple times to tensorflow and am currently training each of these models separately with its own dataset, but this uses GBs of memory, and I am wondering if there is a better way to do this?
One possible solution is to fine-tune your network starting from the weights of a similar network trained on similar data (e.g. if your dataset consists of images, you can use AlexNet weights). Don't worry if your network does not have the same architecture: you can load the weights of only as many layers as you need with the 'load_with_skip' function of
https://github.com/joelthchao/tensorflow-finetune-flickr-style/blob/master/network.py
Fine-tuning takes much less time than training a network from scratch.
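For the averaging step described in the question, a minimal sketch assuming the weights of each trained copy are available as a list of NumPy arrays (for example, what Keras returns from model.get_weights()). The helper average_weights and the toy arrays are hypothetical. Training the copies one after another and keeping only their weight lists may avoid holding every model in memory at the same time.

```python
import numpy as np


def average_weights(weight_sets):
    """Element-wise mean of several weight lists with identical shapes."""
    return [
        np.mean(np.stack(layer_copies), axis=0)
        for layer_copies in zip(*weight_sets)
    ]


# Toy weights for two copies of the same two-tensor model.
model_a = [np.array([[1.0, 2.0]]), np.array([0.0])]
model_b = [np.array([[3.0, 4.0]]), np.array([2.0])]

averaged = average_weights([model_a, model_b])
print(averaged)
```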
