I am conducting research that requires me to know how much memory a deep learning model (CNN) uses at run time when I run it in Google Colab. Is there any code I can use to find this out? Basically, I want to know how much memory has been used over the whole model run (after all epochs have completed). I am coding in Python.
Regards
Avik
As explained in this post and confirmed by my own observations, TensorFlow always tries to allocate the entire GPU memory, no matter how small or large your model is. This is unlike, for example, MXNet, which only allocates enough memory to run the model.
Diving a little deeper, I learned that this is indeed the default behaviour in TensorFlow: use all of the available GPU memory to speed things up. Fair enough :)
You might think that more memory allocation means faster training, but that is not the case most of the time. You can restrict how much memory TF uses, as shown in the following code:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
# Allow this process to allocate at most 90% of the GPU memory...
config.gpu_options.per_process_gpu_memory_fraction = 0.9
# ...and only make GPU 0 visible to it.
config.gpu_options.visible_device_list = "0"
set_session(tf.Session(config=config))
Here is the Tensorflow documentation if you need more details of how to set restrictions on TF memory usage.
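As for the original question of how much memory the run actually used in total: recent TF 2.x releases let you query the allocator after training; a minimal sketch, assuming TF 2.5+ and a GPU runtime:
import tensorflow as tf

# After training finishes (all epochs done), ask TF for its allocation stats.
# 'peak' is the maximum number of bytes TF allocated on this GPU during the run.
info = tf.config.experimental.get_memory_info('GPU:0')
print(f"Peak GPU memory used: {info['peak'] / 1024**2:.1f} MiB")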
Related
I'm using a GPU to train quite a lot of models. I want to tune the architecture of the network, so I train different models sequentially to compare their performances (I'm using keras-tuner).
The problem is that some models are very small and others are very large. I don't want to allocate all of the GPU memory to my training runs, only the quantity I need. I've set TF_FORCE_GPU_ALLOW_GROWTH to true, meaning that when a model requires a large quantity of memory, the GPU will allocate it. However, once that big model has been trained, the memory is not released, even if the next training runs are tiny models.
Is there a way to force the GPU to release unused memory? Something like TF_FORCE_GPU_ALLOW_SHRINK?
Automatic shrinking might be difficult to achieve. If so, I would be happy with a manual release that I could add in a callback to be run after each training.
You can try limiting GPU memory growth using this code:
import tensorflow as tf

# Start with a small allocation and let it grow as the model needs it,
# instead of grabbing all of the GPU memory up front.
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
The second method is to configure a virtual GPU device with tf.config.set_logical_device_configuration and set a hard limit on the total memory to allocate on the GPU.
Please check this link for more details.
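For reference, a minimal sketch of that second method (the 4096 MB cap is only an example value):
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
if gpus:
    # Expose a single logical GPU with a hard memory cap (in MB).
    # This must run before the GPU has been initialized by any other op.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]
    )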
I have a CPU+GPU instance that I'm using to train TF models. My data is on an SSD. I use TF's Dataset API with interleaving and mapping, and no py_function, so that it runs efficiently without being I/O bound. It was working well, with <1% of time spent waiting on input data, but I can't track down the change that caused the program to become I/O bound. A quick summary of the code: it loads npy files using tf.data.FixedLengthRecordDataset, stacks them and batches them. Any hints you can see from the profile? It looks sparse, with a lot of interruptions, as if parallelism isn't working properly.
# Interleave the per-file parsing across the input files.
ds = dataset.interleave(
    numpy_file_parser, tf.data.experimental.AUTOTUNE
)

ds_train = (
    ds
    .repeat()
    .shuffle(1000, reshuffle_each_iteration=True)  # 1000-element shuffle buffer
    .batch(batch_size)
    .prefetch(tf.data.experimental.AUTOTUNE)       # overlap the input pipeline with training
)
Inefficient attempt: (screenshot omitted)
For comparison, here is the profile from when it was not I/O bound: (screenshot omitted)
Turns out it was caused by TF 2.3.0. I'm using a compute capability 6.1 GPU, which is not fully supported in TF 2.3. From the release notes:
GPU: TF 2.3 includes PTX kernels only for compute capability 7.0 to reduce the TF pip binary size. Earlier releases included PTX for a variety of older compute capabilities.
Reverting to TF 2.2 fixes the problem.
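If you want to check which compute capability TF reports for your GPU before picking a release, here is a small sketch (assuming TF 2.4+, where tf.config.experimental.get_device_details is available):
import tensorflow as tf

gpu = tf.config.list_physical_devices('GPU')[0]
details = tf.config.experimental.get_device_details(gpu)
print(details.get('compute_capability'))  # e.g. (6, 1) for a GTX 10-series card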
I want to find out how much GPU memory my TensorFlow model needs at inference time. So I used tf.contrib.memory_stats.MaxBytesInUse, which returned 6168 MB.
But with config.gpu_options.per_process_gpu_memory_fraction I can use a much smaller fraction of my GPU, and the model still runs fine without needing more time for one inference step.
Is there a way to determine how much GPU memory a Tensorflow model requires? I could just decrease the GPU memory fraction until TF crashes, but I guess there is a more elegant and precise way?
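For reference, the op mentioned above is usually evaluated inside the session that runs the model; a rough sketch, assuming TF 1.x (where tf.contrib is still available):
import tensorflow as tf

max_bytes = tf.contrib.memory_stats.MaxBytesInUse()
with tf.Session() as sess:
    # ... run one inference step here ...
    print("Peak GPU bytes in use:", sess.run(max_bytes))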
We have a machine with Ubuntu 18.04 installed and an RTX 2080 Ti GPU, with about 3-4 users using it remotely. Is it possible to set a maximum threshold of GPU usage per user (say 60%) so that the others can use the rest?
We are running TensorFlow deep learning models, if that helps to suggest an alternative.
My apologies for taking so long to come back here to answer the question, even after figuring out a way to do this.
It is indeed possible to threshold GPU usage with TensorFlow's per_process_gpu_memory_fraction. [Hence I edited the question]
The following snippet assigns 46% of the GPU memory to the user.
import tensorflow as tf

init = tf.global_variables_initializer()
# Let this process claim at most 46% of the GPU memory.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.46)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    sess.run(init)
    # Training happens here
Currently, we have two users using the same GPU simultaneously without any issues, with 46% assigned to each. Make sure you don't split it 50-50 (an "aborted (core dumped)" error occurs if you do). Try to keep around 300 MB of GPU memory idle.
As a matter of fact, this GPU division does not slow down training. Surprisingly, it offers about the same speed as when the full memory is used, at least in our experience, although this may change with high-dimensional data.
I am using a pre-trained vector to create an embedding, like so:
import numpy
import gensim
import tensorflow

# Load the pre-trained fastText vectors and build the embedding matrix.
ft_model = gensim.models.KeyedVectors.load_word2vec_format("ft_model.vec")
vocabulary = ft_model.vocab
embeddings = numpy.array([ft_model.word_vec(x) for x in vocabulary.keys()])

vocabulary_size = len(vocabulary)
embedding_size = embeddings.shape[1]

# Non-trainable variable that will hold the pre-trained embeddings.
W = tensorflow.Variable(
    tensorflow.constant(0.0, shape=[vocabulary_size, embedding_size]),
    trainable=False,
    name="W"
)
embedding_placeholder = tensorflow.placeholder(
    tensorflow.float32, [vocabulary_size, embedding_size],
    name="fasttext_vector"
)
embedding_init = W.assign(embedding_placeholder)

# max_length is the maximum sequence length (defined elsewhere).
data_placeholder = tensorflow.placeholder(tensorflow.int32, shape=[None, max_length])
embedding_layer = tensorflow.nn.embedding_lookup(W, data_placeholder)
I get an error after it briefly runs through one or two training batches, and the code crashes completely!
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[5000,14621,100]
The stack trace clearly states that this is caused by the embedding_layer=tensorflow.nn.embedding_lookup(W, data_placeholder) line.
Any idea what could be causing this? 100 is the embedding size, but those other numbers (5000, 14621) are rather strange, larger than I expected, and seem to be causing TensorFlow to completely chew up all of the GPU memory!
Embedding lookups seem like such a common thing, and the .vec file I am incorporating is very small.
It could be that your computer runs out of memory (RAM). Take a look at the task manager before you initiate your model.
I have 16 GB and was at 79%, so it ran out. It might help to use a Jupyter notebook to see the amount of RAM left after the data has been prepared.
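For example, a quick check from a notebook cell (this sketch assumes psutil, which is commonly preinstalled in Colab and Jupyter environments):
import psutil

mem = psutil.virtual_memory()
# percent = fraction of RAM in use; available = what is still free for new allocations
print(f"RAM used: {mem.percent:.0f}%  (available: {mem.available / 1024**3:.1f} GiB)")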