Keras model creation taking too much memory? [Kaggle] - python

I am building a model for semantic segmentation on Kaggle using Keras, and no matter how complex the model is, just building it takes 15.1 GB of GPU memory, so I don't have any space left to actually load images, even when using a generator. I always get an OOM error. Am I doing something wrong? Why does Keras take 15 GB just to create the model?
I have not yet compiled the model. This is right after just building the model [model = Model(img_input, o)].

Related

Why does "load_model" cause RAM memory problems while predicting?

I trained a neural network (transformer architecture) and saved it using:
model.save(directory + args.name, save_format="tf")
After that, I want to load the model again with another script to test it by letting it make iterative predictions:
from keras.models import load_model
model = load_model(args.model)
for i in range(very_big_number):
    out, _ = model(something, training=False)
However, I have noticed that the RAM usage increases with each prediction, and I don't know why. At some point the program stops because there is no more memory available.
If I use the same architecture, but only load the weights of the model with model.load_weights(...), I do not have the problem.
My question now is, why does load_model seem to cause this and how do I solve the problem?
I'm using tensorflow 2.5.0.
Edit:
As I was not able to solve the problem and the answers did not help either, I fell back on the load_weights method: I created a new model and loaded the weights of the saved model into it like this:
model = myModel()
# load only the weights from the SavedModel directory, without keeping a second full model in memory
model.load_weights(args.model + "/variables/variables")
In this way, the RAM usage remained constant. Nevertheless, it is a non-optimal solution, in my opinion.
There is a fundamental difference between load_model and load_weights. When you save a model using save_model, you save the following things:
A Keras model consists of multiple components:
The architecture, or configuration, which specifies what layers the model contains and how they're connected.
A set of weights values (the "state of the model").
An optimizer (defined by compiling the model).
A set of losses and metrics (defined by compiling the model or calling add_loss() or add_metric()).
However, when you save the weights using save_weights, you only save the weights, which is useful for inference. When you want to resume the training process, you need the full model object; that is why everything is saved in the model. When you just want to predict and get results, save_weights is enough. To learn more, you can check the documentation on saving/loading models.
So, as you can see, load_model has many more things to load than load_weights, and therefore more overhead, hence your higher RAM usage.
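As a rough illustration, here is a minimal sketch of the two save/load paths (assuming TensorFlow 2.x and a toy Sequential model rather than the poster's transformer):

import tensorflow as tf

# Toy stand-in model; the poster's actual model is a transformer.
def build_model():
    model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_model()

# Full SavedModel: architecture + weights + optimizer state + losses/metrics.
model.save("full_model", save_format="tf")
restored = tf.keras.models.load_model("full_model")   # heavier to load

# Weights only: the architecture has to be rebuilt in code before loading.
model.save_weights("weights_only/ckpt")
fresh = build_model()
fresh.load_weights("weights_only/ckpt")               # lighter to load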

Saving model in pytorch and keras

I have trained a model with Keras and saved it with the help of PyTorch. Will it cause any problems in the future? As far as I know, the only difference between them is that Keras saves its model's weights as doubles while PyTorch saves its weights as floats.
You can convert your model to double by doing
model.double()
Note that after this, you will need your input to be DoubleTensor.
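A minimal sketch of what that looks like (the Linear layer below is a hypothetical stand-in for the actual model):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # hypothetical stand-in for the real model
model.double()             # converts all parameters and buffers to float64

x = torch.randn(1, 10)     # float32 by default
out = model(x.double())    # inputs must also be float64 (DoubleTensor)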

Tensorflow Object Detection API Untrained Faster-RCNN Model

I am currently trying to build an object detector using the TensorFlow Object Detection API with Python. I have managed to retrain the faster-rcnn model by following the instructions posted here and here.
However, training time is considerably long. I understand that I am using transfer learning as opposed to training a faster-rcnn model from scratch. I am wondering if there is any way to download an untrained faster-rcnn model and train it from scratch (end-to-end) instead of having to resort to transfer learning.
I am familiar with the advantages of transfer learning; however, my object detector is aimed at being quickly trainable, narrow in scope, and trained on letters as opposed to objects, so I do not think transfer learning is the best route.
I believe solving this will have something to do with the pipeline.config file, particularly in this part:
fine_tune_checkpoint: "PATH/TO/PRETRAINED/model.ckpt"
from_detection_checkpoint: true
num_steps: 200000
But I am not sure how to specify that there is no fine_tune_checkpoint.
To train your own model from scratch, do the following:
Comment out the following lines:
# fine_tune_checkpoint: <YOUR PATH>
# from_detection_checkpoint: true
Remove your downloaded pretrained model or rename its path in case you followed the tutorial.
You don't have to download an "empty" model. Instead you can specify your own weight initialization in the config file, e.g., as done here: How to initialize weight for convolution layers in Tensorflow Object Detection API?
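For illustration, here is a hedged sketch of what such an initializer block can look like inside pipeline.config (the exact enclosing hyperparams block and the values are assumptions and depend on your model):

first_stage_box_predictor_conv_hyperparams {
  regularizer {
    l2_regularizer {
      weight: 0.0
    }
  }
  initializer {
    truncated_normal_initializer {
      mean: 0.0
      stddev: 0.01
    }
  }
}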

What is the proper way to load a transfer learning model for inference in PyTorch?

I am training a model using transfer learning based on Resnet152. Based on the PyTorch tutorial, I have no problem saving a trained model and loading it for inference. However, the time needed to load the model is slow. I don't know if I did it correctly; here is my code:
To save the trained model as state dict:
torch.save(model.state_dict(), 'model.pkl')
To load it for inference:
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet152()                      # build the architecture (random weights)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, len(classes))    # replace the head for my classes
st = torch.load('model.pkl', map_location='cuda:0' if torch.cuda.is_available() else 'cpu')
model.load_state_dict(st)
model.eval()
I timed the code and found that the first line, model = models.resnet152(), takes the longest time. On CPU, it takes 10 seconds to test one image. So my thinking is that this might not be the proper way to load it?
If I save the entire model instead of the state_dict like this:
torch.save(model, 'model_entire.pkl')
and test it like this:
model = torch.load('model_entire.pkl')
model.eval()
on the same machine it takes only 5 seconds to test one image.
So my question is: is it the proper way to load the state_dict for inference?
In the first code snippet, you are constructing the ResNet-152 architecture from TorchVision (with random weights), and then loading your (locally stored) weights into it.
In the second example you are loading a locally stored model (and its weights).
The former will be slower since you need to build the full architecture before loading the weights into it, as opposed to unpickling a single local file, but it is more reproducible because it does not depend on a pickled model object. Also, the time difference should be a one-off initialisation cost; the two approaches should have the same time complexity afterwards (by the point you are performing inference, the model has already been loaded in both cases, and they are equivalent).
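A quick hedged way to check this yourself, assuming the 'model.pkl' and 'model_entire.pkl' files from the question exist and that num_classes matches whatever len(classes) was at save time:

import time
import torch
import torch.nn as nn
from torchvision import models

num_classes = 10  # hypothetical; must match len(classes) used when saving

# One-off load cost of the state_dict approach.
t0 = time.time()
model_a = models.resnet152()                                 # construct the architecture
model_a.fc = nn.Linear(model_a.fc.in_features, num_classes)  # rebuild the head
model_a.load_state_dict(torch.load('model.pkl', map_location='cpu'))
model_a.eval()
print('state_dict load:', time.time() - t0)

# One-off load cost of unpickling the entire model.
t0 = time.time()
model_b = torch.load('model_entire.pkl', map_location='cpu')
model_b.eval()
print('entire model load:', time.time() - t0)

# After loading, a single forward pass should take roughly the same time in both.
x = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    print(model_a(x).shape, model_b(x).shape)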

How to use keras model inside other model in TPU

I am trying to convert a Keras model to a TPU model in Google Colab, but this model has another model inside it.
Take a look at the code:
https://colab.research.google.com/drive/1EmIrheKnrNYNNHPp0J7EBjw2WjsPXFVJ
This is a modified version of one of the examples in the google tpu documentation:
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb
If the sub_model is converted and used directly, it works, but if the sub model is inside another model it does not. I need this nested-model kind of network because I am trying to train a GAN, which has two networks inside (GAN = generator + discriminator), so if this test works it will probably work with the GAN too.
I have tried several things:
Convert the model to TPU without converting the sub model; in that case, when training starts, an error is raised related to the inputs of the sub model.
Convert both the model and the sub model to TPU; in that case, an error is raised when converting the "parent" model, and the exception only says "layers" at the end.
Convert only the sub model to TPU; in that case no error is raised, but training is not accelerated by the TPU and is extremely slow, as if no conversion to TPU had been made at all.
Using a fixed batch size or not makes no difference; both give the same result, and the model does not work.
Any ideas? Thanks a lot.
Divide the problem into parts: first use only the sub model on the TPU. Then put something simple in place of the sub model and run the parent model on the TPU. If this does not work, create something very simple with a similar structure out of models you are sure work, and then add things step by step until you converge on the complex model you want to use on the TPU.
I struggle with such things too. What I did at the very beginning, using MNIST, was train the model, extract the coefficients, rewrite relu, dense, dropout and the NN matrices myself, and run the model using numpy, then cupy, then pyopencl; then I replaced those functions with my own raw CUDA C and OpenCL functions, so that by going deeper and simpler I could find what was wrong when something did not work. In the end I wrote my own genetic selective training algorithm and learned a lot.
Most importantly, it gave me the opportunity to try some crazy ideas for training, modelling, manipulating, and making sense of NN coefficients.
The problem, in my opinion, is that TF, Keras, etc. are too high level. With optimizers and solvers there is too much that is unknown; even the neural networks themselves are not fully under control. GANs are problematic: training does not converge every time, it usually takes days, and even when it does train, you have no idea how it converges. Many of the tricks and techniques that protect you from vanishing gradients are not mathematically backed, yet they nevertheless work amazingly well. (?!?)
Go simpler and deeper, and add complexity step by step. Follow a practice in which you comprehend as much as you can. It will cost some time and energy, but in my opinion you will benefit tremendously.
