I am currently using JupyterLab on the Tambora server provided by my campus. Server specs: https://server-if.github.io/.
I am running the image-captioning example from https://github.com/huggingface/transformers/tree/main/examples/flax/image-captioning with my own dataset.
Obviously, I need a GPU/TPU to run this code; otherwise it takes far too much time and resources. I've tried to use TensorFlow.
Then I tried to set things up so that my code runs on the GPU.
But the output always shows the same message at the bottom, even after I set TF_CPP_MIN_LOG_LEVEL=0.
Is there any way I can use the GPU from this JupyterLab?
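For reference, the kind of check I mean is roughly the sketch below; the Flax example itself runs on JAX, so I look at both frameworks (assuming both are installed):

import tensorflow as tf
import jax

# ask each framework which accelerators it can actually see
print("TensorFlow GPUs:", tf.config.list_physical_devices("GPU"))
print("JAX devices:", jax.devices())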
Also, I've tried to check this:
Related
I'm running a CNN algorithm on an interpreter which is running on a remote machine. The code is pretty simple and resembles this: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
But I added a TensorBoard writer to draw the training and validation loss as well as 3 images per epoch.
I used both PyCharm and MobaXterm to access the remote server (an Amazon AWS EC2 instance), but I've noticed significant performance differences depending on whether I run the code from PyCharm or from the MobaXterm command line.
The results, averaged over 10 iterations:
PyCharm: each iteration takes 55 s.
MobaXterm: each iteration takes 95 s.
I've made sure that both are using the same conda environment and the same Python to run (Python 3.7).
Maybe I'm missing some key concept about MobaXterm and working with a remote interpreter?
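In case it helps, this is the kind of sanity check I could add at the top of the script to confirm both launch methods really use the same interpreter and device (a sketch, not my actual code):

import sys
import torch

# print which interpreter is running and whether CUDA is visible,
# to rule out the two launch methods using different environments
print("interpreter:", sys.executable)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))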
I have a program running on Google Colab and I need to monitor GPU usage while it is running. I am aware that you would usually use nvidia-smi on a command line to display GPU usage, but since Colab only allows one cell to run at a time, this isn't an option. Currently I am using GPUtil and monitoring GPU and VRAM usage with GPUtil.getGPUs()[0].load and GPUtil.getGPUs()[0].memoryUsed, but I can't find a way to make those calls execute at the same time as the rest of my code, so the reported usage numbers are much lower than they actually should be. Is there any way to print the GPU usage while other code is running?
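What I have in mind is roughly the sketch below: a background thread that keeps polling GPUtil while the main code runs. I'm not sure whether this is a sensible approach on Colab, and the names are only illustrative:

import threading
import time

import GPUtil

readings = []

def poll_gpu(stop_event, interval=1.0):
    # sample load and VRAM of the first GPU until asked to stop
    while not stop_event.is_set():
        gpu = GPUtil.getGPUs()[0]
        readings.append((gpu.load, gpu.memoryUsed))
        time.sleep(interval)

stop = threading.Event()
threading.Thread(target=poll_gpu, args=(stop,), daemon=True).start()

# ... run the model / the rest of the code here ...

stop.set()
if readings:
    print("peak load:", max(r[0] for r in readings))
    print("peak VRAM (MB):", max(r[1] for r in readings))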
If you have Colab Pro, you can open the Terminal, located on the left side and indicated by '>_' on a black background.
You can run commands from there even while a cell is running.
Run this command to see GPU usage in real time:
watch nvidia-smi
I used wandb to log system metrics:
!pip install wandb
import wandb
wandb.init()
This outputs a URL where you can view graphs of various system metrics.
A slightly clearer explanation:
Go to Weights & Biases and create your account.
Run the following commands.
!pip install wandb
import wandb
wandb.init()
Go to the link shown in your notebook for authorization and copy the API key.
Paste the key into the notebook input field.
After authorization you will find another link in the notebook; follow it to see your model and system metrics.
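For illustration, a minimal run might look like the sketch below; the project name and the train_one_step function are placeholders, and wandb collects GPU utilization and memory as system metrics automatically once wandb.init() has been called:

import wandb

run = wandb.init(project="colab-gpu-monitoring")  # hypothetical project name

for step in range(100):
    loss = train_one_step()    # placeholder for your own training step
    wandb.log({"loss": loss})  # custom metrics; system metrics are logged in the background

run.finish()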
You can run a script in the background to track GPU usage.
Step 1: Create a file to monitor GPU usage from a Jupyter cell.
%%writefile gpu_usage.sh
#!/bin/bash
# run for 10 seconds; change this to match how long your code runs
end=$((SECONDS+10))
while [ $SECONDS -lt $end ]; do
    # append one sample per second to gpu.log
    nvidia-smi --format=csv --query-gpu=power.draw,utilization.gpu,memory.used,memory.free,fan.speed,temperature.gpu >> gpu.log
    sleep 1
    # alternatively, comment out the two lines above and use dmon instead:
    # nvidia-smi dmon -i 0 -s mu -d 1 -o TD >> gpu.log
done
Step 2: Execute the above script in the background in another cell.
%%bash --bg
bash gpu_usage.sh
Step 3: Run the inference.
Note that the script records GPU usage for the first 10 seconds only; change that to match your model's running time.
The GPU utilization results will be saved in the gpu.log file.
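As a follow-up, a rough way to summarise gpu.log afterwards could look like this sketch (it assumes the csv query above; each nvidia-smi call repeats the header line, so those rows are filtered out):

import pandas as pd

# the log contains one header row plus one data row per nvidia-smi call,
# so drop the repeated header rows before summarising
df = pd.read_csv("gpu.log")
df = df[df[df.columns[0]] != df.columns[0]]
print(df.describe(include="all"))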
There is another way to see GPU usage, but it only shows memory usage. Click Runtime -> Manage sessions. This lets you see how much memory your notebook is using, so you can decide, for example, whether to increase your batch size.
You can use Netdata to do this - it's open source and free, and you can monitor a lot more than just GPU usage on your Colab instance. Here's me monitoring CPU usage while training a large language model.
Just claim your Colab instance as a node, and if you're using Netdata Cloud you can monitor multiple Colab instances simultaneously as well.
Pretty neat.
I am working on some very lengthy calculations (8 hours). While they were running, I was working on something else in Chrome. Something went wrong on that website and Chrome shut down, taking the tab where my Jupyter notebook was running with it. Now I have opened the notebook again and the icon still indicates the program is running (it shows the hourglass), but I am not sure whether this is actually true; if it isn't, I would like to restart the computation as quickly as I can.
Hope you guys can help! Thanks!
I have just tested this on locally running Jupyter 4.4.0.
Cells submitted for execution will complete as usual (assuming no exception occurs) as long as the kernel is still alive. After the computation is done, you can continue working in the notebook as usual. All state in that kernel session is preserved: for example, if you defined a function or saved a result in a variable, it will still be available later. If the kernel is doing intensive computation, you can check your system monitor: a python process consuming lots of CPU means it is probably still running.
If you have unsaved changes to your notebook, for example new code or cells, they will be lost. The code in them still seems to be executed though if it was set to run (Ctrl+Enter).
If you open localhost:8888 in a browser again, you should be able to see if the kernel is running (e.g. the hourglass icon). The running/idle detection seems to work fine upon reconnect.
However, the new browser session never gets updates from other sessions. This means that everything sent by the running code to the standard output (e.g. with print) after the disconnect is irretrievably lost, but you can still see what it printed before you got disconnected, assuming it was (auto-)saved. Once the kernel is done and you run cells from this new session, your browser will correctly get updates and display output as usual. Apparently (#641, #1150, #2833; thanks @unutbu) it is still not fixed, because Jupyter's architecture would require a huge rework for that to function.
You can also attach a console with jupyter console --existing your-kernel-session-uuid, but it will not respond until the kernel is idle.
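If you prefer checking from a terminal on the same machine, a rough psutil-based look at which python processes are busy could be something like this (psutil is an extra dependency, not part of Jupyter):

import psutil

# list python processes with their current CPU usage; a kernel that is
# still computing will usually show sustained high cpu_percent
for proc in psutil.process_iter(['pid', 'name']):
    if 'python' in (proc.info['name'] or '').lower():
        print(proc.info['pid'], proc.info['name'], proc.cpu_percent(interval=1.0))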
I'm using an EC2 spot instance (connecting from my Windows machine to an Ubuntu instance) to run a function that is well beyond my laptop's capabilities. The kernel-busy dot has been filled for hours. Previously I would just listen to my laptop, since it was obvious when something was actually running as opposed to the notebook being stuck. Is there any way I can tell now?
If I try something like 1+1 in the cell below my function, it also turns into an asterisk, but I can open a new notebook and have zero issues running simple commands there.
This is because, although each notebook has multiple cells, it has only one kernel, so the commands in the other cells are queued until the first cell finishes its task. When you open a new notebook, that notebook gets its own kernel, so it can run other simple commands quickly without competing with whatever is consuming so much CPU.
I'm playing around with some Python deep learning packages (Theano/Lasagne/Keras). I've been running them on the CPU of my laptop, which takes a very long time to train the models.
For a while I was also using Amazon GPU instances with an IPython notebook server running, which obviously ran much faster for full runs, but was pretty expensive to use for prototyping.
Is there any way to set things up so that I can prototype in IPython on my local machine and then, when I have a large model to train, spin up a GPU instance, do all the processing/training there, and then shut the instance down?
Is a setup like this possible, or does anyone have suggestions for combining the convenience of the local machine with temporary processing on AWS?
My thoughts so far were along the lines of:
1. Prototype in a local IPython notebook.
2. Set up a cell that runs the long process from start to finish.
3. Use boto to start up an EC2 instance, then SSH into it using boto's sshclient_from_instance:
ssh_client = sshclient_from_instance(instance,
                                     key_path='<path to SSH keyfile>',
                                     user_name='ec2-user')
4. Get the contents of the cell I've set up using the solution here (say the script is in cell 13) and execute that script with:
ssh_client.run('python -c "' + _i13 + '"')
5. Shut down the instance using boto.
This just seems a bit convoluted; is there a proper way to do this?
When it comes to EC2, you don't have to terminate the instance every time. The beauty of AWS is that you can stop your instance when you're done and start it again when you need it, and you only pay for the time it is up and running. You can also always try your code on a smaller, cheaper instance, and if it's too slow for your liking you just scale up to a larger instance.
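As a rough sketch of that stop/start workflow using the newer boto3 library plus paramiko for the SSH step (the instance ID, region, key path, and script name below are all placeholders):

import boto3
import paramiko

INSTANCE_ID = "i-0123456789abcdef0"   # placeholder instance ID
ec2 = boto3.client("ec2", region_name="us-east-1")

# start the stopped instance and wait until it is running
ec2.start_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])

# look up its current public IP address
desc = ec2.describe_instances(InstanceIds=[INSTANCE_ID])
ip = desc["Reservations"][0]["Instances"][0]["PublicIpAddress"]

# run the training script over SSH
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(ip, username="ec2-user", key_filename="<path to SSH keyfile>")
_, stdout, stderr = ssh.exec_command("python train.py")   # placeholder script
print(stdout.read().decode())
ssh.close()

# stop (not terminate) the instance so you only pay for storage until next time
ec2.stop_instances(InstanceIds=[INSTANCE_ID])

This way the prototyping stays local and the instance only accrues compute charges while the training job is actually running.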