I have been browsing the documentation for the tensorflow.keras.save_model() API and I came across the parameter include_optimizer and I am wondering what would be the advantage of not including the optimizer, or perhaps what problems could arise if the optimizer isn't saved with the model?
To give more context for my specific use-case, I want to save a model and then use the generated .pb file with Tensorflow Serving. Is there any reason I would need to save the optimizer state, would not saving it reduce the overall size of the resultant file? If I don't save it is it possible that the model will not work correctly in TF serving?
Saving the optimizer state will require more space, as the optimizer has parameters that are adjusted during training. For some optimizers, this space can be significant, as several meta-parameters are saved for each tuned model parameter.
Saving the optimizer parameters allows you to restart training in exactly the same state as you saved the checkpoint, whereas without saving the optimizer state, even the same model parameters might result in a variety of training outcomes with different optimizer parameters.
Thus, if you plan on continuing to train your model from the saved checkpoint, you'd probably want to save the optimizer's state as well. However, if you're instead saving the model state for future use only for inference, you don't need the optimizer state for anything. Based on your description of wanting to deploy the model on TF Serving, it sounds like you'll only be doing inference with the saved model, so are safe to exclude the optimizer.
I am training a model using transfer learning based on Resnet152. Based on PyTorch tutorial, I have no problem in saving a trained model, and loading it for inference. However, the time needed to load the model is slow. I don't know if I did it correct, here is my code:
To save the trained model as state dict:
torch.save(model.state_dict(), 'model.pkl')
To load it for inference:
model = models.resnet152()
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, len(classes))
st = torch.load('model.pkl', map_location='cuda:0' if torch.cuda.is_available() else 'cpu')
model.load_state_dict(st)
model.eval()
I timed the code and found that the first line model = models.resnet152() takes the longest time to load. On CPU, it takes 10 seconds to test one image. So my thinking is that this might not be the proper way to load it?
If I save the entire model instead of the state.dict like this:
torch.save(model, 'model_entire.pkl')
and test it like this:
model = torch.load('model_entire.pkl')
model.eval()
on the same machine it takes only 5 seconds to test one image.
So my question is: is it the proper way to load the state_dict for inference?
In the first code snippet, you are downloading a model from TorchVision (with random weights), and then loading your (locally stored) weights to it.
In the second example you are loading a locally stored model (and its weights).
The former will be slower since you need to connect to the server hosting the model and download it, as opposed to a local file, but it is more reproduceable not relying on your local file. Also, the time difference should be a one-off initialisation, and they should have the same time complexity (as by the point you are performing inference the model has already been loaded in both, and they are equivalent).
I am building a model for machine comprehension. It's a heavy model required to train on lots of data and this requires me more time. I have used keras callbacks to save model after every epoch and also save a history of loss and accuracy.
The problem is, when I am loading a trained model, and try to continue it's training using initial_epoch argument, the loss and accuracy values are same as untrained model.
Here is the code: https://github.com/ParikhKadam/bidaf-keras
The code used to save and load model is in /models/bidaf.py
The script I am using to load the model is:
from .models import BidirectionalAttentionFlow
from .scripts.data_generator import load_data_generators
import os
import numpy as np
def main():
emdim = 400
bidaf = BidirectionalAttentionFlow(emdim=emdim, num_highway_layers=2,
num_decoders=1, encoder_dropout=0.4, decoder_dropout=0.6)
bidaf.load_bidaf(os.path.join(os.path.dirname(__file__), 'saved_items', 'bidaf_29.h5'))
train_generator, validation_generator = load_data_generators(batch_size=16, emdim=emdim, shuffle=True)
model = bidaf.train_model(train_generator, epochs=50, validation_generator=validation_generator, initial_epoch=29,
save_history=False, save_model_per_epoch=False)
if __name__ == '__main__':
main()
The training history is quite good which is:
epoch,accuracy,loss,val_accuracy,val_loss
0,0.5021367247352657,5.479433422293752,0.502228641179383,5.451400522458351
1,0.5028450897193741,5.234336488338403,0.5037527732234647,5.0748545675049
2,0.5036885394022954,5.042028017280698,0.5039489093881276,5.0298488218407975
3,0.503893446146289,4.996997425685413,0.5040753162241299,4.976164487656699
4,0.5040576918224873,4.955544574118662,0.5041905890181151,4.931354981493792
5,0.5042372655790888,4.909940965651957,0.5043896965802341,4.881359395178988
6,0.504458428129642,4.8542871887472465,0.5045972716586732,4.815464454729135
7,0.50471843351102,4.791098495962496,0.5048680457262408,4.747811231472629
8,0.5050776754196002,4.713560494026321,0.5054184527602898,4.64730478015052
9,0.5058853749443502,4.580552254050073,0.5071290369370443,4.446513280167718
10,0.5081544614246304,4.341471499420364,0.5132941329030303,4.145318906086552
11,0.5123970410575613,4.081624463197288,0.5178775145611896,4.027316586998608
12,0.5149879128865782,3.9577423109634613,0.5187159608315838,3.950151870168726
13,0.5161411008840144,3.8964761709052578,0.5191430166876064,3.906301355196609
14,0.5168211272672539,3.8585826589385697,0.5191263493850466,3.865382308412537
15,0.5173216891201444,3.830764191839807,0.519219763635108,3.8341492204942607
16,0.5177805591697787,3.805340048675155,0.5197178382215892,3.8204319018292585
17,0.5181171635676399,3.7877712072310343,0.5193657963810704,3.798006804522368
18,0.5184295824699279,3.77086071548255,0.5193122694008523,3.7820449101377243
19,0.5187343664397653,3.7555085003534194,0.5203585262348183,3.776260506494833
20,0.519005008308583,3.7430062334375065,0.5195983755362352,3.7605361109533995
21,0.5192872482429703,3.731001830462149,0.5202017035842986,3.7515058917231405
22,0.5195097722222706,3.7194103983513553,0.5207148585133065,3.7446572377159795
23,0.5197511249107636,3.7101052441559905,0.5207420740297026,3.740088335181619
24,0.5199862479678652,3.701593302911729,0.5200187951731082,3.7254406861185188
25,0.5200847805044403,3.6944093077914464,0.520112738649039,3.7203616696860786
26,0.5203289568582412,3.6844954882274092,0.5217114634669081,3.7214983577364547
27,0.5205629846610852,3.6781935968943595,0.520915311442328,3.705435317731209
28,0.5206827641463226,3.6718110897539193,0.5214088439286978,3.7003081666703377
Also, I have already taken care of loading custom objects such as layers, loss function and accuracy.
I am kind of frustrated by now as I took me days to train this model upto epochs and now I can't resume training. I have referred various threads in keras issues and found many people are facing such issues but can't find a solution.
Someone in a thread said that "Keras will not save RNN states" (I ain't using stateful RNNs) and someone else said "Keras reinitializes all the weights before saving which we can handle using a flag." I mean, if such problems exist in Keras, what will be the use of functions like save().
I have also tried saving only weights after every epoch and then building model from scratch and then loading those weights into it. But that didn't work. You can find the old code I used to save weights only in the above listed github repo's older branches.
I have referred this issue with no help - #4875
That issue is open from past two years. Can't understand what all the developers are doing! Is anyone here who can help? Should I switch to tensorflow or I will face the same issues in that too?
Please help...
Edit1:
I haven't tried saving model using model.save() but I have seen people on other threads saying that the issue was solved with model.save() and models.save_model(). If it is actually solved, ModelCheckpoint should also save optimizer state to resume training but it doesn't (or can't) whatsoever the reason. I have verified the code of ModelCheckpoint callback which indirectly calls model.save() which leads a call to models.save_model(). So theoretically, if the issue in the base i.e. models.save_model() is solved, then it should also be solved in other functions.
Sorry, but I don't have a powerful machine to check this practically. If someone here has, I have shared my code on github and the link is provided in the issue. Please try the resume training on it and detect the cause of this problem.
I am using the computer provided by a national institute and hence, students here need to share this single computer for their projects. I can't use that computer for such tasks. Thank you..
Edit2:
Recently, I tried to check if the weights are saved correctly. For that, I evaluated the model with my validation generator. I saw that the output loss and accuracy remained the same as those in the beginning of model training. Seeing this, I reached a conclusion that its actually the issue with saving weights of the model. I might be wrong here..
BTW, I have also used multi_gpu_model() in my model code. Can it cause this issue? I can't try training model on CPU as it too heavy for that and will take a few days for only 1 epoch to complete. Can anyone help debugging?
I see no response in such issues these days. Just list current issues on the README.md in keras github so that users can be aware before trying keras out and wasting months behind it.
I am trying to convert a keras model to tpu model in google colab, but this model has another model inside.
Take a look at the code:
https://colab.research.google.com/drive/1EmIrheKnrNYNNHPp0J7EBjw2WjsPXFVJ
This is a modified version of one of the examples in the google tpu documentation:
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb
If the sub_model is converted and used directly it works, but if the sub model is inside another model it does not work. I need the sub model type of network because i am trying to train a GAN network that has 2 networks inside (gan=generator+discriminator) so if this test works probably it will work with the gan too.
I have tried several things:
Convert to tpu the model without converting the sub model, in that case when training starts an error is prompted related to the inputs of the sub model.
Convert both the model and sub model to tpu, in that case an error is prompted when converting the "parent" model, the exception only says at the end "layers".
Convert only the sub model to tpu, in that case no error is prompted but the training is not accelerated by the tpu and it is extremely slow like if no conversion to tpu was made at all.
Using fixed batch size or not, both have the same result, the model does not work.
Any ideas? Thanks a lot.
Divide into parts only use submodel at tpu first. Then put something simple instead of submodel and use the model in TPU. If this does not work , create something very simple which includes similar structure with models you are sure that are working and then step by step add things to converge your complex model which you want to use in TPU.
I am struggling with such things. What I did at the very beginning using MNIST is trained the model and get the coefficients outside rewrite relu dense dropout and NN matricies myself and run the model using numpy and then cupy and then pyopencl and then I replaced functions with my own raw cuda C and opencl functions so that getting deeper and simpler I can find what is wrong when something does not work. At last I write my genetic selective training algo and learned a lot.
And most important it gave me the opportunity to try some crazy ideas for training and modelling and manuplating and making sense of NN coffecients.
The problem in my opinion is TF - Keras etc are too high level. Optimizers - Solvers , there is too much unknown. Even neural networks are not under control. GAN is problematic while training it does not converge everytime takes days to train most of the time. Even if you train. You dont know any idea how it converges. Most of the tricks - techniques which protects you from vanishing gradient are not mathematically backed they are nevertheless works very amazingly. (?!?)
**Go simpler deeper and and complexity step by step. Follow a practicing on which you comprehend as much as you can ** It will cost some time and energy but you will benefit it tremendously in my opinion.
I have Python code that generates a deep convolutional neural network using Keras. I'm trying to save the model, but the result is gigantic (100s of MBs). I'd like to pare that down a bit to make something more manageable.
The problem is that model.save() stores (quoting the Keras FAQ):
the architecture of the model, allowing to re-create the model
the weights of the model
the training configuration (loss, optimizer)
the state of the optimizer, allowing to resume training exactly where you left off.
If I'm not doing any more training, I think I just need the first two.
I can use model.to_json() to make a JSON string of the architecture and save that off, and model.save_weights() to make a separate file containing the weights. That's about a third the size of the full model.save() result. But I'm wondering if there's some way to store these in a single self-contained file? (Short of outputting two files, zipping them together, and deleting the originals.) Alternatively, maybe there's a way to delete the training configuration and optimizer state when training is complete, so that model.save() doesn't give me something nearly so big?
Thanks.
The save function of a Model has a parameter exactly for this, called include_optimizer, setting it to false will save the model without including the optimizer state, which should lead to a much smaller HDF5 file:
model.save("something.hdf5", include_optimizer=False)