I'm attempting to make multiple sequential predictions from a tensorflow network, but performance seems very poor (~500ms per prediction for a 2-layer 8x8 convolutional network) even for a CPU. I suspect that part of the problem is that it appears to be reloading the network parameters every time. Each call to classifier.predict in the code below results in the following line of output - which I therefore see hundreds of times.
INFO:tensorflow:Restoring parameters from /tmp/model_data/model.ckpt-102001
How can I reuse the checkpoint that is already loaded?
(I can't do batch predictions here because the output of the network is a move to play in a game, which then needs to be applied to the current state before feeding the the new game state.)
Here's the loop that's doing the predictions.
def rollout(classifier, state):
while not state.terminated:
predict_input_fn = tf.estimator.inputs.numpy_input_fn(x={"x": state.as_nn_input()}, shuffle=False)
prediction = next(classifier.predict(input_fn=predict_input_fn))
index = np.random.choice(NUM_ACTIONS, p=prediction["probabilities"]) # Select a move according to the network's output probabilities
state.apply_move(index)
classifier is a tf.estimator.Estimator created with...
classifier = tf.estimator.Estimator(
model_fn=cnn_model_fn, model_dir=os.path.join(tempfile.gettempdir(), 'model_data'))
The Estimator API is a high-level API.
The tf.estimator framework makes it easy to construct and train
machine learning models via its high-level Estimator API. Estimator
offers classes you can instantiate to quickly configure common model
types such as regressors and classifiers.
The Estimator API abstracts away lots of the complexity of TensorFlow, but loses some generality in the process. Having read the code, it's clear that there's no way to run multiple sequential predictions without reloading the model each time. The low-level TensorFlow APIs allow this behaviour. But...
Keras is a high-level framework that supports this use case. Simple define the model and then call predict repeatedly.
def rollout(model, state):
while not state.terminated:
predictions = model.predict(state.as_nn_input())
for _, prediction in enumerate(predictions):
index = np.random.choice(bt.ACTIONS, p=prediction)
state.apply_mode(index)
Unscientific benchmarking shows that this is ~100x faster.
Related
I trained neural network (transformer architecture) and saved it by using:
model.save(directory + args.name, save_format="tf")
After that, I want to load the model again with another script to test it by letting it make iterative predictions:
from keras.models import load_model
model = load_model(args.model)
for i in range(very_big_number):
out, _ = model(something, training=False)
However, I have noticed that the RAM usage increases with each prediction and I don't know why. At some point the programme stops because there is no more memory available. You can also see the RAM consumption in the following screenshot:
If I use the same architecture, but only load the weights of the model with model.load_weigts( ... ), I do not have the problem.
My question now is, why does load_model seem to cause this and how do I solve the problem?
I'm using tensorflow 2.5.0.
Edit:
As I was not able to solve the problem and the answers did not help either, I simply used the load_weights method so that I created a new model and loaded the weights of the saved model like this:
model = myModel()
saved_model = load_model(args.model)
model.load_weights(saved_model + "/variables/variables")
In this way, the usage of RAM remained constant. Nevertheless an non-optimal solution, in my opinion.
There is a fundamental difference between load_model and load_weights. When you save an model using save_model you save the following things:
A Keras model consists of multiple components:
The architecture, or configuration, which specifies what layers the model contain, and how they're connected.
A set of weights values (the "state of the model").
An optimizer (defined by compiling the model).
A set of losses and metrics (defined by compiling the model or calling
add_loss() or add_metric()).
However when you save the weights using save_weights, you only saves the weights, and this is useful for the inference purpose, while when you want to resume the training process, you need a model object, that is the reason we save everything in the model. When you just want to predict and get the result save_weights is enough. To learn more, you can check the documentation of save/load models.
So, as you can see when you do load_model, it has many things to load as compared to load_weights, thus it will have more overhead hence your RAM usage.
Using Tensorflow's Custom Object Classification API w/ SSD MobileNet V2 FPNLite 320x320 as the base, I was able to train my model to succesfully detect classes A and B using Training Data 1 (about 200 images). This performed well on Test Set 1, which only has images of class A and B.
I wanted to add several classes to the model, so I constructed a separate dataset, Training Data 2 (about 300 images). This dataset contains labeled data for class B, and new classes C, D and E. However it does NOT include data for class A. Upon training the model on this data, it performed well on Test Set 2 which contained only images of B, C, D and E (however the accuracy on B did not go up despite extra data)
Concerned, I checked the accuracy of the model on Test Set 1 again, and as I had assumed, the model didn't recognize class A at all. In this case I'm assuming I didn't actually refine the model but instead retrained the model completely.
My Question: Am I correct in assuming I cannot refine the model on a completely separate set of data, and instead if I want to add more classes to my trained model that I must combine Training Set 1 and Training Set 2 and train on the entirety of the data?
Thank you!
It mostly depends on your hyperparameters, namely, your learning rate and the number of epochs trained. Higher learning rates will make the model forget the old data faster. Also, be sure not to be overfitting your data, have a validation set as well. Models that have overfit the training data tend to be very sensitive to weight (and data) perturbations.
TLDR. If not trained on all data, ML models tend to forget old data in favor of new data.
There is a lot of "moving parts". I propose the followings:
Take the "SSD MobileNet V2 FPNLite 320x320" as a basemodel without its last classification layer (argument include_top=False when loading the model), and freeze its parameters using command basemodel.trainable=False
Add new prediction layer with command prediction_layer=tf.keras.layers.Dense(1) and make other required things (details step by step in page https://www.tensorflow.org/tutorials/images/transfer_learning)
After the procedure above verify that you have understanding which parameters of the new network (including "old" convolutional part and your own new prediction layer) are trainable and which are not. Change the hyperparameters if needed.
Next train the network using a standard procedures.
Use directly final number of classes according to your idea (25). If you have no data yet for all classes, do not worry, generate some random images for the purpose, and of course take into account that the results are not valid for the classes with no appropriate data.
For simplicity divide the data - principally independently from the number of classes - to training and test data and nothing more complicated in first hand. When amount of data increases the statistics will diminish problems with sampling. And when training, monitor how the amount of data increase the performance of the classification.
So - in a nutshell - 1) make the network - 2) select which parameters to train - 3) train with one dataset and 4) test with another.
And finally direct answer for the question in title and in the end of the question:
-According to experience first utilize out all performance of the basemodel by training only the last layers of the network. After you are sure no more performance can be found this way, begin to finetune the convolutional layers tuning carefully hyperparameters.
-You can refine the model totally only by using your own new data; this is special benefit and art of transfer learning
I have a basic question. I want to know whether it is possible to use Keras (e.g. the functional API) to specify a neural network model, and then use the Keras optimization routines to train the network outside of the Keras training process. In other words, use Keras purely to specify a neural network, pull the weights out and put them into a loss function (outside the Keras APIs if necessary) then use one of the built-in optimization routines purely to minimize the loss function over a single batch of data (no multiple epochs etc. or anything beyond minimization of the loss function with regard to one set of data at this point).
My reason for wanting to do this is that I would like to use a dynamic optimization process that changes from batch iteration to batch iteration, which seems difficult to implement entirely within the Keras APIs.
I am trying to convert a keras model to tpu model in google colab, but this model has another model inside.
Take a look at the code:
https://colab.research.google.com/drive/1EmIrheKnrNYNNHPp0J7EBjw2WjsPXFVJ
This is a modified version of one of the examples in the google tpu documentation:
https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb
If the sub_model is converted and used directly it works, but if the sub model is inside another model it does not work. I need the sub model type of network because i am trying to train a GAN network that has 2 networks inside (gan=generator+discriminator) so if this test works probably it will work with the gan too.
I have tried several things:
Convert to tpu the model without converting the sub model, in that case when training starts an error is prompted related to the inputs of the sub model.
Convert both the model and sub model to tpu, in that case an error is prompted when converting the "parent" model, the exception only says at the end "layers".
Convert only the sub model to tpu, in that case no error is prompted but the training is not accelerated by the tpu and it is extremely slow like if no conversion to tpu was made at all.
Using fixed batch size or not, both have the same result, the model does not work.
Any ideas? Thanks a lot.
Divide into parts only use submodel at tpu first. Then put something simple instead of submodel and use the model in TPU. If this does not work , create something very simple which includes similar structure with models you are sure that are working and then step by step add things to converge your complex model which you want to use in TPU.
I am struggling with such things. What I did at the very beginning using MNIST is trained the model and get the coefficients outside rewrite relu dense dropout and NN matricies myself and run the model using numpy and then cupy and then pyopencl and then I replaced functions with my own raw cuda C and opencl functions so that getting deeper and simpler I can find what is wrong when something does not work. At last I write my genetic selective training algo and learned a lot.
And most important it gave me the opportunity to try some crazy ideas for training and modelling and manuplating and making sense of NN coffecients.
The problem in my opinion is TF - Keras etc are too high level. Optimizers - Solvers , there is too much unknown. Even neural networks are not under control. GAN is problematic while training it does not converge everytime takes days to train most of the time. Even if you train. You dont know any idea how it converges. Most of the tricks - techniques which protects you from vanishing gradient are not mathematically backed they are nevertheless works very amazingly. (?!?)
**Go simpler deeper and and complexity step by step. Follow a practicing on which you comprehend as much as you can ** It will cost some time and energy but you will benefit it tremendously in my opinion.
I'm in the process of getting my feet wet with tensorflow. I've started and successfully run the MNIST classifier with layers tutorial on my computer, now my objective is to play around with the parameters of the tutorial program (step size, layer properties). To do this, I want to collect the data on each run of the convolutional neural network and see if I can learn something from the changes of the network's output based on the changes of the network's design.
At the end of each run of the program (CNN MNIST Classifier) tensorflow returns numerous lines labeling what the classifier is doing. This one caught my eye:
INFO:tensorflow:Saving dict for global step 161: accuracy = 0.1663,
global_step = 161, loss = 2.286773
Now, I can copy/screenshot this line every time I want to record my run, but I'd like to figure out how to access the dictionary where this data is being collected.
I've tried find tensorflow in my bash, this wasn't very helpful.
My question is:
How would I find the run history for this classifier? Can you give me any advice on how to record/access this information in the future?