The problem:
I have a model that I would like to train with independent data sets. Afterwards, I would like to extract the weights of each model (the model is the same for each instance but trained using different datasets) and finally, compute and average of these weights. Basically, my intention is to mimic tensorflow running on multiple devices and then average their weights so that they are used by one model.
My solution:
I added this model multiple times to tensorflow and am currently training each of these models separately with its unique dataset.. but this is using GBs of memory, and am wondering if there is a better way to do this?
One of the possible solutions is that you can fine-tune your network weights with other similar networks(similar datasets, i.e, if your dataset is images, you can use AlexNet weights)don't afraid if your network has no same architecture, you can simply load weights of layers as much as you need by 'load_with_skip' function of
https://github.com/joelthchao/tensorflow-finetune-flickr-style/blob/master/network.py
Fine-tuning takes much less than train networks from scratch.
Related
I have trained a pretrained ResNet18 model with my custom dataset on Pytorch and wondered whether I could transfer my model file to train another one with a different architecture, e.g. ResNet50. I know I have to save my model accordingly (explained well on another post here) but this was a question that I have never thought before.
I was planning to use more advanced models like VisionTransformers (ViT) but I couldn't figure out whether I had to start with a pretrained ViT already or I could just take my previous model file and use it as the pretrained model to train a ViT.
Example Scenario: ResNet18 --> ResNet50 --> Inception v3 --> ViT
My best guess it that it's not possible due to number of weights, neurons and layer structures but I would love to hear that if I miss a crucial point here. Thanks!
Between models that only differ in number of layers (Resnet-18 and Resnet-50), it has been done to initialize some layers of the larger model from the weights of the smaller model's layers. Inversely, you can truncate a larger model by taking a subset of regularly spaced layers and initialize a smaller model. In both cases, you need to retrain everything at the end if you hope to achieve semi-decent performances.
The whole point of using architectures that vastly differ (vision transformers vs CNNs) is to learn different features from the inputs and unlock new levels of semantic understanding. Recent models like BeiT also use new self-supervised training schemes that have nothing to do with the classic ImageNet pretraining. Using trained weights from another model would go against the point.
Having said that,if you want to use a ViT, why not start from the available pretrained weights on HuggingFace and fine-tune it on the data you used to train your ResNet50 ?
Using Tensorflow's Custom Object Classification API w/ SSD MobileNet V2 FPNLite 320x320 as the base, I was able to train my model to succesfully detect classes A and B using Training Data 1 (about 200 images). This performed well on Test Set 1, which only has images of class A and B.
I wanted to add several classes to the model, so I constructed a separate dataset, Training Data 2 (about 300 images). This dataset contains labeled data for class B, and new classes C, D and E. However it does NOT include data for class A. Upon training the model on this data, it performed well on Test Set 2 which contained only images of B, C, D and E (however the accuracy on B did not go up despite extra data)
Concerned, I checked the accuracy of the model on Test Set 1 again, and as I had assumed, the model didn't recognize class A at all. In this case I'm assuming I didn't actually refine the model but instead retrained the model completely.
My Question: Am I correct in assuming I cannot refine the model on a completely separate set of data, and instead if I want to add more classes to my trained model that I must combine Training Set 1 and Training Set 2 and train on the entirety of the data?
Thank you!
It mostly depends on your hyperparameters, namely, your learning rate and the number of epochs trained. Higher learning rates will make the model forget the old data faster. Also, be sure not to be overfitting your data, have a validation set as well. Models that have overfit the training data tend to be very sensitive to weight (and data) perturbations.
TLDR. If not trained on all data, ML models tend to forget old data in favor of new data.
There is a lot of "moving parts". I propose the followings:
Take the "SSD MobileNet V2 FPNLite 320x320" as a basemodel without its last classification layer (argument include_top=False when loading the model), and freeze its parameters using command basemodel.trainable=False
Add new prediction layer with command prediction_layer=tf.keras.layers.Dense(1) and make other required things (details step by step in page https://www.tensorflow.org/tutorials/images/transfer_learning)
After the procedure above verify that you have understanding which parameters of the new network (including "old" convolutional part and your own new prediction layer) are trainable and which are not. Change the hyperparameters if needed.
Next train the network using a standard procedures.
Use directly final number of classes according to your idea (25). If you have no data yet for all classes, do not worry, generate some random images for the purpose, and of course take into account that the results are not valid for the classes with no appropriate data.
For simplicity divide the data - principally independently from the number of classes - to training and test data and nothing more complicated in first hand. When amount of data increases the statistics will diminish problems with sampling. And when training, monitor how the amount of data increase the performance of the classification.
So - in a nutshell - 1) make the network - 2) select which parameters to train - 3) train with one dataset and 4) test with another.
And finally direct answer for the question in title and in the end of the question:
-According to experience first utilize out all performance of the basemodel by training only the last layers of the network. After you are sure no more performance can be found this way, begin to finetune the convolutional layers tuning carefully hyperparameters.
-You can refine the model totally only by using your own new data; this is special benefit and art of transfer learning
If a big model consists of end-to-end individual models, can I (after training) preserve only one model and freeze/discard other models during inference?
An example: this struct2depth (see below) have three models training in an unsupervised fashion. However, what I really need is the object motion, namely 3D Object Motion Estimation part. So I wonder if this is feasible to
train on the original networks, but
inference with only Object Motion Estimator, i.e. other following layers frozen/discarded?
I saw that in tensorflow one can obtain tensor-output of a specified layer, but to save unnecessary computation I'd like to simply freeze all other parts... don't know if it's possible.
Looking forward to some insights. Thanks in advance!
You can ignore weights by setting them to 0. For this, you can directly get a weight W and do W.assign(tf.mul(W,0)). I know that you care about speeding up inference but unless you rewrite your code to use sparse representations, you will probably not be speeding up inference since weights can't be removed fully.
What you can alternatively do, is look at existing solutions for pruning in custom layers:
class MyDenseLayer(tf.keras.layers.Dense, tfmot.sparsity.keras.PrunableLayer):
def get_prunable_weights(self):
# Prune bias also, though that usually harms model accuracy too much.
return [self.kernel, self.bias]
# Use `prune_low_magnitude` to make the `MyDenseLayer` layer train with pruning.
model_for_pruning = tf.keras.Sequential([
tfmot.sparsity.keras.prune_low_magnitude(MyDenseLayer(20, input_shape=input_shape)),
tf.keras.layers.Flatten()
])
You can e.g. use ConstantSparsity (see here) and set the parameters such that your layers are fully pruned.
Another alternative is to construct a second, smaller model that you only use for inference. You can then save the required weights separately (instead of saving the entire model) after training and load them in the second model.
In my case, I would like to weekly tune/adjust the model parameters value.
I have pre-trained the model by using the 100K data rows, Keras, and saved the model.
Then, as the new data collection (10K data rows), I need to tune the model parameter but don't want to retrain the whole dataset (110K).
How can I just partially fit the data on the model? load model -> model.fit(10K_data)?
Yes, that is correct you will train only on the new dataset (10k) model.fit(10K_data). I will recommend to change the learning rate for the retraining (reducing the learning rate) as you will just want to do a minor update to the parameters while keeping the earlier learning intact (or trying to leavarage the earlier learning).
I trained a deep learning model to classify the given images into three classes. Now I want to add one more class to my model. I tried to check out "Online learning", but it seems to train on new data for existing classes. Do I need to train my whole model again on all four classes or is there any way I can just train my model on new class?
You probably have used a softmax after 3 neuron dense layer at the end of the architecture to classify into 3 classes. Adding a class will lead to doing a softmax over 4 neuron dense layer so there will be no way to accommodate that extra neuron in your current graph with frozen weights, basically you're modifying the graph and hence you'll have to train the whole model from scratch
-----or-----
one way would be loading the model and removing the last layer , changing it to 4 neurons and training the network again! This will basically train the weights of the last layer from scratch . I don't think there is anyway to keep these(weights of the last layer) weights intact while adding a new class .
You have to remove the final fully-connected layer, freeze the weights in the feature extraction layers, add a new fully-connected layer with four outputs and retrain the model with images of the original three classes and the new fourth class.
I tried to check out "Online learning", but it seems to train on new data for existing classes.
Online learning is a term used to refer to a model which takes a continual or sequential stream of input data while training, in contrast to offline learning (also called batch learning), where the model is pre-trained on a static predefined dataset.
Continual learning (also called incremental, continuous, lifelong learning) refers to a branch of ML working in an online learning context where models are designed to learn new tasks while maintaining performance on historic tasks. It can be applied to multiple problem paradigms (including Class-incremental learning, where each new task presents new class labels for an ever expanding super-classification problem).
Do I need to train my whole model again on all four classes or is there any way I can just train my model on new class?
Naively re-training the model on the updated dataset is indeed a solution. Continual learning seeks to address contexts where access to historic data (i.e. the original 3 classes) is not possible, or when retraining on an increasingly large dataset is impractical (for efficiency, space, privacy etc concerns). Multiple such models using different underlying architectures have been proposed, but almost all examples exclusively deal with image classification problems.
Related q's:
How to fine-tune a keras model with existing plus newer classes?