About cleaning GPU memory used by TensorFlow operations - python

Here is my installed environment:
Cuda 10.0
Nvcc
Ubuntu 20.04
Python 3.7
Tensorflow 2.0 GPU
When I simply run a style transfer script like the one provided here
https://www.tensorflow.org/tutorials/generative/style_transfer
it runs well, and the GPU memory is freed once the process finishes. But when I try to run it through Django-q, the GPU memory is kept even after my style transfer task is done.
I have tried the following methods to see whether the GPU memory can be released while keeping my Django-q running.
I manually used the command 'kill -9 ' and restarted Django-q from the command line; it worked.
I ran the commands from method 1 programmatically; it released the GPU memory, but later style transfer jobs could no longer reach the GPU.
I restarted the server and used a script to initialize my Django-q again; it worked, but it took time.
I used the session-clearing method of TensorFlow (see the sketch below), but it did not allow my later style transfer jobs to utilize the GPU.
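For reference, the session-clearing method mentioned above presumably corresponds to something like the following minimal sketch (the exact calls used in the task are an assumption). Note that clear_session() drops the graph/session state Keras keeps around but does not hand the allocator's already-reserved GPU memory back to the OS, which is why nvidia-smi can still show it as used.
import gc
import tensorflow as tf

# release the Keras/TensorFlow graph and session state, then force Python GC;
# memory already reserved by TensorFlow's allocator stays with the process
tf.keras.backend.clear_session()
gc.collect()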
I have the following questions, and I would greatly appreciate answers to them as well:
Why, when I run the commands programmatically, can my later style transfer jobs no longer utilize the GPU?
How can I release the GPU memory and still use the GPU for later jobs, while keeping Django-q running and without restarting the server? (This is my main question.)
Thanks for your time.

Related

How to make my JupyterLab on a server run on GPU?

I am currently using JupyterLab on the Tambora server provided by my campus. Server spec: https://server-if.github.io/.
I am running the code from https://github.com/huggingface/transformers/tree/main/examples/flax/image-captioning with my own dataset.
Obviously I need a GPU/TPU to run this code; otherwise it takes far too much time and resources. I've tried to utilize TensorFlow.
Then I tried to use it so my code would run on the GPU:
But the output always shows this (see the bottom), even after I set TF_CPP_MIN_LOG_LEVEL=0:
Is there any way I can use the GPU on this JupyterLab?
Also, I've tried to check this:
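Independent of the checks above, a quick way to confirm whether TensorFlow can see the server's GPU at all (a minimal sketch, assuming TensorFlow is installed in the JupyterLab kernel) is:
import tensorflow as tf

# lists the GPUs TensorFlow can see; an empty list means only the CPU is visible
print(tf.config.list_physical_devices("GPU"))
# the installed build must be CUDA-enabled for any GPU to show up
print(tf.test.is_built_with_cuda())
If the list is empty, the fix is usually on the environment side (CUDA-enabled build, drivers, matching CUDA/cuDNN versions) rather than in the notebook code.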

Display GPU Usage While Code is Running in Colab

I have a program running on Google Colab in which I need to monitor GPU usage while it is running. I am aware that you would usually use nvidia-smi in a command line to display GPU usage, but since Colab only allows one cell to run at any one time, this isn't an option. Currently I am using GPUtil and monitoring GPU and VRAM usage with GPUtil.getGPUs()[0].load and GPUtil.getGPUs()[0].memoryUsed, but I can't find a way for those calls to execute at the same time as the rest of my code, so the usage numbers are much lower than they actually should be. Is there any way to print the GPU usage while other code is running?
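One way to make the GPUtil readings overlap with the training code (a sketch, assuming GPUtil is installed, as the question implies) is to poll it from a background thread in the same cell:
import threading
import time

import GPUtil

stop = threading.Event()
readings = []

def poll(interval=1.0):
    # record (load, memoryUsed) for GPU 0 until the main code signals stop
    while not stop.is_set():
        gpu = GPUtil.getGPUs()[0]
        readings.append((gpu.load, gpu.memoryUsed))
        time.sleep(interval)

monitor = threading.Thread(target=poll, daemon=True)
monitor.start()

# ... run the training / inference code here ...

stop.set()
monitor.join()
print("peak load:", max(r[0] for r in readings))
print("peak memory (MB):", max(r[1] for r in readings))
Because the GPUtil query is cheap, polling once a second has negligible impact on the job being measured.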
If you have Colab Pro, you can open the Terminal, located on the left side and indicated by '>_' with a black background.
You can run commands from there even while a cell is running.
Run this command to see GPU usage in real time:
watch nvidia-smi
Use wandb to log system metrics:
!pip install wandb
import wandb
wandb.init()
This outputs a URL at which you can view graphs of various system metrics.
A slightly clearer explanation:
Go to Weights & Biases and create your account.
Run the following commands.
!pip install wandb
import wandb
wandb.init()
Go to the link shown in your notebook for authorization and copy the API key.
Paste the key into the notebook input field.
After authorization you will find another link in the notebook - see your model and system metrics there.
You can run a script in the background to track GPU usage.
Step 1: Create a file to monitor GPU usage from a Jupyter cell.
%%writefile gpu_usage.sh
#! /bin/bash
#comment: run for 10 seconds, change it as per your use
end=$((SECONDS+10))
while [ $SECONDS -lt $end ]; do
nvidia-smi --format=csv --query-gpu=power.draw,utilization.gpu,memory.used,memory.free,fan.speed,temperature.gpu >> gpu.log
#comment: or use below command and comment above using #
#nvidia-smi dmon -i 0 -s mu -d 1 -o TD >> gpu.log
done
Step 2: Execute the above script in the background in another cell.
%%bash --bg
bash gpu_usage.sh
Step 3: Run the inference.
Note that the script will record GPU usage for the first 10 seconds; change this to match your model's running time.
The GPU utilization results will be saved in the gpu.log file.
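If you then want to summarize the log, a small sketch along these lines can load it with pandas (assuming gpu.log was produced by the script above; each nvidia-smi invocation writes its own CSV header row, which is skipped here):
import pandas as pd

cols = ["power.draw", "utilization.gpu", "memory.used",
        "memory.free", "fan.speed", "temperature.gpu"]

rows = []
with open("gpu.log") as f:
    for line in f:
        line = line.strip()
        # skip blank lines and the repeated header rows
        if not line or line.startswith("power.draw"):
            continue
        # values keep their unit suffixes (W, %, MiB) as written by nvidia-smi
        rows.append([v.strip() for v in line.split(",")])

df = pd.DataFrame(rows, columns=cols)
print(df.head())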
There is another way to see GPU usage, but it only shows memory usage. Click Runtime -> Manage sessions. This lets you see how much memory your notebook is taking, so you can decide whether to increase your batch size.
You can use Netdata to do this - it's open source and free, and you can monitor a lot more than just GPU usage on your Colab instance. Here's me monitoring CPU usage while training a large language model.
Just claim your colab instance as a node, and if you're using Netdata cloud you can monitor multiple colab instances simultaneously as well.
Pretty neat.

GPU crashes when running Keras/tensorflow-gpu, specifically when clock speed goes to idle at 0 MHz

I'm using Jupyter Notebook to run Keras with a Tensorflow GPU backend. I've done some testing with various dummy models while simultaneously monitoring my GPU usage using MSI Afterburner, GPU-Z, nvidia-smi and Task Manager. My GPU is a GeForce GTX 960M, which has no issues running games. The temperatures are also low when running Keras.
What I've noticed is that Keras runs fine (e.g. loading or training a model) in the beginning, but whenever Keras is not running anything, the GPU naturally wants to idle down from 1097 MHz to 0 MHz, and as soon as it does that the GPU crashes. I can see that the GPU "is lost" in nvidia-smi. I then have to disable and re-enable my GPU in the Device Manager to get it to work.
Does anyone have any idea why this might be happening?
Edit: I can temporarily prevent this from happening for very small programs by using the "allow_growth" feature as follows:
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
# allocate GPU memory on demand instead of reserving it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
set_session(sess)
However, this only works if the operation is very small, using only about 0.1 GB of GPU memory, such as loading a model or running a tiny model. If the program uses even 0.3 GB, my GPU crashes, since the memory does not drop to 0 GB before the clock speed falls to 0 MHz (the lower power state).
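For reference, on TensorFlow 2.x the counterpart of allow_growth is the memory-growth setting, roughly like this (a minimal sketch using tf.config; it must run before any op allocates GPU memory):
import tensorflow as tf

# enable on-demand allocation for every visible GPU
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)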
I was finally able to figure out the issue thanks to someone on another forum. It was a driver issue: the latest drivers provided by Nvidia cause the problem, unlike the older drivers provided by my laptop manufacturer.
Since I was not able to run TensorFlow with my old drivers and do more troubleshooting, what I did was download eDrawings Viewer and open some random assembly drawings I found online. First I tried with the latest Nvidia drivers: when I manipulate the models, my card is in the P0 state, but if I do nothing and let the software idle, the card drops to a lower power state and the GPU crashes. When I did the same exercise with my ASUS manufacturer-certified drivers (this software, unlike TF, was compatible with the older drivers), my GPU did NOT crash.
What I also discovered was that eDrawings Viewer does not crash even with the latest Nvidia drivers if I go into the Nvidia Control Panel and select "Prefer Maximum Performance" under Power Management Mode. The card stays in the P0 state whenever the software is open, even after idling for minutes. Unfortunately, since python.exe does not have a graphical interface, this option does not apply to my case. As a workaround, I can still run TensorFlow without crashes by running eDrawings Viewer in the background (or really any program with a graphical interface), which keeps my card in the P0 state.

sending job via ssh in python

I have to run a script on several machines in a compute cluster using SSH. Before I run the script, I have to log in to a node in the cluster using ssh and then use nvidia-smi to check which GPU is free (as there is no job scheduler in place at the moment). Each node has several GPUs. So I typically access, say, GPU1 by issuing ssh gpu1...followed by nvidia-smi, which just outputs a list of GPUs, their processes, and the utilization of each GPU.
I need to automate all this. That is, say we have 4 GPUs: gpu1...gpu4.
I want to be able to ssh into each of these, check their utilization, and then run a python script run_test.py -arg1 on the GPU that is free.
How can I write a Python script that does all this?
I'm new to Python, so I need some help please...
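A rough sketch of that automation (assuming passwordless SSH to hosts named gpu1...gpu4, that paramiko is installed, and that the idle threshold and script path are only examples) could look like this:
import paramiko

HOSTS = ["gpu1", "gpu2", "gpu3", "gpu4"]
QUERY = ("nvidia-smi --query-gpu=index,utilization.gpu,memory.used "
         "--format=csv,noheader,nounits")

def free_gpu(client):
    # return the index of an apparently idle GPU on this host, or None
    _, stdout, _ = client.exec_command(QUERY)
    for line in stdout.read().decode().splitlines():
        index, util, mem = [v.strip() for v in line.split(",")]
        if int(util) == 0 and int(mem) < 100:
            return int(index)
    return None

for host in HOSTS:
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host)
    gpu = free_gpu(client)
    if gpu is None:
        client.close()
        continue
    # pin the job to the chosen GPU and wait for it to finish
    _, stdout, _ = client.exec_command(
        f"CUDA_VISIBLE_DEVICES={gpu} python run_test.py -arg1")
    print(stdout.read().decode())
    client.close()
    break
Setting CUDA_VISIBLE_DEVICES keeps the job on the GPU you picked even if run_test.py itself does not select a device.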

Running python process on EC2 from local ipython notebook

I'm playing around with some python deep learning packages (Theano/Lasagne/Keras). I've been running it on CPU on my laptop, which takes a very long time to train the models.
For a while I was also using Amazon GPU instances, with an iPython notebook server running, which obviously ran much faster for full runs, but was pretty expensive to use for prototyping.
Is there any way to set things up that would let me prototype in iPython on my local machine, and then, when I have a large model to train, spin up a GPU instance, do all the processing/training on that, and then shut down the instance?
Is a setup like this possible, or does anyone have any suggestions to combine the convenience of the local machine with temporary processing on AWS?
My thoughts so far were along the lines of:
1. Prototype on my local ipython notebook.
2. Set up a cell to run a long process from start to finish.
3. Use boto to start up an EC2 instance, then ssh into the instance using boto's sshclient_from_instance:
ssh_client = sshclient_from_instance(instance,
    key_path='<path to SSH keyfile>',
    user_name='ec2-user')
4. Get the contents of the cell I've set up using the solution here (say the script is in cell 13) and execute that script using:
ssh_client.run('python -c "' + _i13 + '"')
5. Shut down the instance using boto.
This just seems a bit convoluted; is there a proper way to do this?
When it comes to EC2, you don't have to terminate the instance every time. The beauty of AWS is that you can stop and start your instance as needed, and you only pay for the time it is up and running. Also, you can always try your code on a smaller, cheaper instance first, and if it's too slow for your liking you just scale up to a larger instance.
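To make that stop/start cycle scriptable, a minimal sketch using boto3 (the region and instance ID are placeholders) could look like this:
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
INSTANCE_ID = "i-0123456789abcdef0"  # placeholder instance ID

# start the GPU instance before training and wait until it is running
ec2.start_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

# ... push the notebook cell's code over SSH and run the training here ...

# stop (not terminate) the instance so you only pay for storage while idle
ec2.stop_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])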
