I have a Python application that uses TensorFlow, and I'm running it inside a Docker container. When running locally I see memory usage stay well under 4GB of RAM, although some large files are being written and processed. When TensorFlow reaches the point of creating its first checkpoint file, I get the following exception:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
My model is complex, so this checkpoint file may be over 1GB. My data consists of images, and I've already downloaded about 30GB of data just to begin running the model, so I don't know whether it's coincidence that the crash keeps happening at this point or whether this file really is too large. I'm only loading a small batch of images into memory per training epoch, so I'm trying to keep RAM usage low. My VirtualBox config looks like this:
The error appears to come from C++, so I assume it originates inside TensorFlow's native code. Has anyone seen anything like this, or does anyone know what I can change? I feel that there is enough RAM allocated, but maybe my disk access isn't configured correctly?
Most likely not enough RAM. 4GB is very small for training with TensorFlow (particularly in Python with a default install). Your model size may well be below 1GB, but during training - and when writing out a checkpoint - TensorFlow will temporarily allocate additional memory for buffering, and that is probably where you're hitting the out-of-memory error. If you normally run fine with 4GB until checkpointing, 8GB should be enough.
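If you want to confirm that the checkpoint write is what pushes the process over the limit, a quick sanity check is to log the resident memory of the training process right before and after the save. This is only a minimal sketch using psutil; the log_rss helper is hypothetical, and the commented-out save call stands in for whatever checkpoint call your training loop already makes.

import os
import psutil

def log_rss(tag):
    # Resident set size of the current process, reported in GB.
    rss_gb = psutil.Process(os.getpid()).memory_info().rss / 1024 ** 3
    print(f"[{tag}] resident memory: {rss_gb:.2f} GB")

log_rss("before checkpoint")
# saver.save(...) or model.save_weights(...)  <- your existing checkpoint call
log_rss("after checkpoint")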
I started using my GPU to train a CNN model on the Cats and Dogs dataset.
But when I run the model I sometimes get this error:
InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized
What should I do to prevent this?
The error message is generated when there is not enough GPU memory during training, which is usually caused by the batch size.
The simplest solution is to restart the kernel in JupyterLab and reduce the batch size to a value that fits in GPU memory.
Beyond that, you can monitor memory usage during runs and log run metadata, which can then be used to determine the optimal batch size; TensorBoard is useful for this. Additionally, by default TensorFlow will try to allocate as much GPU memory as possible. You can change this with the GPU memory-growth option so that TensorFlow only allocates as much memory as it actually needs (see the sketch below). Check this GitHub issue out.
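A minimal sketch of enabling memory growth with the TF 2.x configuration API (the _EagerConst in the error suggests eager execution, so 2.x seems a safe assumption); this has to run before any tensors are placed on the GPU:

import tensorflow as tf

# Allocate GPU memory incrementally instead of grabbing almost all of it
# up front; must be called before the GPUs are initialized.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

In TF 1.x the equivalent knob is tf.GPUOptions(allow_growth=True) passed through the session config.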
Note that repeatedly running your models can create overhead, and it is best to restart your runtime from time to time to clear everything out if you are conducting very intensive experiments.
I am trying to train a neural network on a subset of the GPUs that I have access to.
The problem is that I want to use only two GPUs, so I wrote the following command in my .bashrc:
export CUDA_VISIBLE_DEVICES=6,7
However, when I watch the GPUs, I find that my program is mainly using other GPUs (4, 5) at 100% memory utilization, and is using the visible GPUs only at very low memory utilization (9% at most).
I have already closed and reopened all of my terminals so that ~/.bashrc is re-sourced, but the problem still persists!
Any help is appreciated!
P.S.: I have already run the same program successfully on only two GPUs, but on a different server, so memory is not a constraint here.
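For reference, one way to take .bashrc sourcing out of the equation entirely is to set the variable from inside the script itself, before TensorFlow (or any other CUDA-using library) is imported. This is only a minimal sketch, assuming a TensorFlow 2.x script with a single entry point; the GPU indices are the ones from the question.

import os

# Must be set before TensorFlow initializes the CUDA driver, otherwise the
# restriction has no effect on contexts that already exist.
os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"

import tensorflow as tf

# Inside this process the two allowed devices now appear as GPU:0 and GPU:1.
print(tf.config.list_physical_devices("GPU"))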
Recently I discovered something rather strange in a project I have been working on for quite a while. The model is rather conventional: a convnet with a few fully connected layers. For data loading I use the tf.data API, but the same thing happens with the queue-based code I had before porting to tf.data. A few hours after training begins, CPU usage rises to very high levels, 1500-2000% as reported by htop, whereas at the beginning of training everything is fine and the main process shows only about 200% CPU usage. Attached is a screenshot of the htop output; another worrying thing is that all the child processes also show a pretty high CPU load.
I am using tensorflow-gpu version 1.11, running on an NVIDIA Tesla V100. I am pretty sure that the model runs on the GPU and not on the CPU: nvidia-smi shows the GPU at about 70% utilization.
Obviously, I cannot ask for an exact cause of this, and it would be difficult to strip the problem down to a reproducible test case. However, maybe you could point me to some debugging techniques that are applicable in such a case.
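Not an exact answer, but one knob that is often worth checking when diagnosing this kind of CPU blow-up is the size of TensorFlow's own thread pools. A minimal sketch for the 1.x session API matching the tensorflow-gpu 1.11 mentioned above; the thread counts are purely illustrative, not a recommendation.

import tensorflow as tf

# Cap TensorFlow's internal thread pools so op parallelism cannot fan out
# across every core on the machine.
config = tf.ConfigProto(
    intra_op_parallelism_threads=4,  # threads used inside a single op
    inter_op_parallelism_threads=2,  # ops that may execute concurrently
)

with tf.Session(config=config) as sess:
    # ... build and run the training loop as usual ...
    pass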
I made a very interesting observation while trying to tune my recommendation engine on a Tesla K80 hosted on Google Cloud Platform. Unfortunately I have been unsuccessful at finding any literature that might point me in the right direction. Here is my predicament ...
I run the same code for fitting a fully connected model from both a Python script and a Jupyter notebook. What is surprising is that, with the same hyper-parameters (batch size etc.), the code runs faster in the Jupyter notebook kernel and uses more GPU memory than when I fit the model with the Python script invoked from the shell.
Since I want my code to run in the least possible time, and Jupyter often ends up closing the connection when left unattended due to a web-socket timeout, is there any way I can increase the amount of GPU memory a Python process can use? I'm open to any alternative way of making this work. Thanks in advance.
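For what it's worth, if this is TensorFlow 1.x, the share of GPU memory a single process may claim can be set explicitly through the session config. A minimal sketch; the 0.9 fraction is just an example value, not something taken from the question.

import tensorflow as tf

# Let this process pre-allocate up to 90% of the card's memory instead of
# relying on TensorFlow's default allocation behaviour.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.9)

with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    # ... fit the model here ...
    pass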
I have Theano code that uses batches of data. When I increase the batch size, it throws a SegFault error. It works perfectly fine up to a batch size of 750, but increasing it to 1000 throws the error. I also checked that it uses only 50MB of GPU memory at any time, while I have 128MB of GPU memory on my system. Can anyone help me debug this issue?
You can profile Theano's memory use by enabling the profile and profile_memory configuration flags. See the documentation for more details.
There is also more information on Python/Theano memory management here.
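A minimal sketch of one way to turn those flags on, by setting THEANO_FLAGS from Python before Theano is imported (running the script with THEANO_FLAGS=profile=True,profile_memory=True on the command line works just as well); the tiny compiled function is only there to give the profiler something to record.

import os

# The flags must be set before the first import of theano, because the
# configuration is read at import time.
os.environ["THEANO_FLAGS"] = "profile=True,profile_memory=True"

import numpy as np
import theano
import theano.tensor as T

# Any function compiled from here on collects timing and memory profiles,
# which Theano prints when the process exits.
x = T.vector("x")
f = theano.function([x], x * 2)
f(np.asarray([1.0, 2.0, 3.0], dtype=theano.config.floatX))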