I want to fine-tune ALBERT.
I see that one can distribute neural net training over multiple GPUs using TensorFlow: https://www.tensorflow.org/guide/distributed_training
I was wondering if it's possible to distribute fine-tuning across both my laptop's GPU and a Colab GPU?
I don't think that's possible. GPU distributed training needs a fast interconnect between the GPUs (e.g., NVLink, or NCCL over a high-bandwidth network), and there is no such link between your laptop's GPU and Colab's GPUs. This is a good read: https://lambdalabs.com/blog/introduction-multi-gpu-multi-node-distributed-training-nccl-2-0/
Related
I'm running some notebooks which, at different points, are both CPU- and GPU-intensive. Running the notebook on my local PC is fast in terms of CPU power, but slow because my GPU cannot be used for Torch (I have a Ryzen 9 with an AMD GPU). On the other hand, running the notebook on the Colab GPU is fast in the GPU sections, but terribly slow in the CPU sections.
I know that it is possible to use my CPU via a local runtime, but then I am also stuck with my local GPU. Is it possible to allocate only my local CPU and use the Google Colab GPU at the same time?
(An alternative solution would be to run the CPU-intensive code on my local machine, store the intermediate results, and then use the Google GPU for the GPU-intensive parts. But this is of course suboptimal.)
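For completeness, the "store intermediate results" fallback mentioned above could look roughly like this (the array and file name are placeholders; the idea is just: compute locally, save, upload, load in Colab):

```python
import numpy as np
import torch

# --- Run locally (CPU-intensive part) ---
features = np.random.rand(10_000, 128).astype("float32")  # stand-in for real preprocessing output
np.save("features.npy", features)  # upload this file to Google Drive / Colab

# --- Run in the Colab notebook (GPU-intensive part) ---
features = torch.from_numpy(np.load("features.npy"))
device = "cuda" if torch.cuda.is_available() else "cpu"
features = features.to(device)  # continue with the GPU-heavy Torch code here
```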
As far as I know, you can distribute training among multiple GPUs, multiple machines, or TPUs with one of TensorFlow's built-in distribution strategies.
MultiWorkerMirroredStrategy is most likely the one you are looking for. It has two implementations for cross-device communication. Notably, CommunicationImplementation.RING is RPC-based and supports both CPUs and GPUs.
Details: Link 1 Link 2
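A rough sketch of how that strategy could be set up (the TF_CONFIG host addresses are placeholders, and whether a Colab VM will accept an inbound worker connection is a separate question):

```python
import json
import os
import tensorflow as tf

# Each participating machine needs a TF_CONFIG describing the cluster.
# The hosts below are placeholders; the second machine would use "index": 1.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["laptop-host:12345", "colab-host:12345"]},
    "task": {"type": "worker", "index": 0},
})

# RING collectives are RPC-based and work on both CPUs and GPUs.
options = tf.distribute.experimental.CommunicationOptions(
    implementation=tf.distribute.experimental.CommunicationImplementation.RING)
strategy = tf.distribute.MultiWorkerMirroredStrategy(communication_options=options)

with strategy.scope():
    # Model creation and compilation must happen inside the strategy scope.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="mse")
```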
When I train a simple neural network and check the GPU usage, I notice that my TensorFlow script runs on the CPU. This is my configuration:
[screenshot of the configuration]
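A quick, generic way to check what TensorFlow actually sees (not specific to the configuration in the screenshot):

```python
import tensorflow as tf

# An empty list here means TensorFlow cannot see a GPU and will fall back to the CPU.
print(tf.config.list_physical_devices("GPU"))

# Optionally log which device every op is placed on during training.
tf.debugging.set_log_device_placement(True)
```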
I am experimenting with designing a semantic segmentation network using PyTorch. It performs well on my computer. For better performance, I moved the network to a computer with much more GPU capacity. However, even when I reproduce the environment (matching only the PyTorch and torchvision versions) and run the same experiment, performance degrades both on the new PC and on Google Colab. I just copied and pasted the code and ran it as a test before doing other experiments.
The network structure is the same, but are there other external factors that could degrade performance (e.g., GPU, RAM, etc.)?
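One external factor worth ruling out first is run-to-run nondeterminism; a minimal sketch of pinning the usual seeds in PyTorch (this assumes randomness contributes to the gap, which the question does not confirm):

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    # Pin the common sources of randomness so runs on different machines start from the same state.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for reproducible cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
```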
I just got an intern position in which I need to help serve inference requests on multiple GPUs. However, all the GitHub resources I could find are about training.
Is there any example of using multiple GPUs for inference in TensorFlow (Python)?
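One possible starting point, assuming a Keras model, is tf.distribute.MirroredStrategy, which works for inference as well as training; a minimal sketch (the MobileNetV2 model and random input below are placeholders):

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs by default
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    model = tf.keras.applications.MobileNetV2(weights=None)  # placeholder model

# Distribute the input pipeline so each replica receives a shard of every batch.
images = tf.random.uniform((64, 224, 224, 3))  # placeholder input
dataset = tf.data.Dataset.from_tensor_slices(images).batch(16)
dist_dataset = strategy.experimental_distribute_dataset(dataset)

def predict_step(batch):
    return model(batch, training=False)

for batch in dist_dataset:
    per_replica = strategy.run(predict_step, args=(batch,))
    # Collect the per-replica outputs back on the host for post-processing.
    preds = tf.concat(strategy.experimental_local_results(per_replica), axis=0)
```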
I am testing the new TensorFlow Object Detection API in Python, and I succeeded in installing it on Windows using Docker. However, my trained model (Faster RCNN ResNet101 COCO) takes up to 15 seconds to make a prediction (with very good accuracy, though), probably because I only use TensorFlow on the CPU.
My three questions are:
Considering the latency, where is the problem? I heard Faster RCNN was a good model for low-latency visual detection; is it because of the CPU-only execution?
With such latency, is it possible to do efficient real-time video processing using TensorFlow on a GPU, or should I use a more popular model like YOLO?
The popular means of using TensorFlow with a GPU in Docker is nvidia-docker, but it is not supported on Windows. Should I continue to look for a Docker (or conda) solution for local prediction, or should I deploy my model directly to a virtual instance with a GPU (I am comfortable with Google Cloud Platform)?
Any advice and/or good practice concerning real-time video processing with TensorFlow is very welcome!
Considering the latency, where is the problem? I heard Faster RCNN was a good model for low-latency visual detection; is it because of the CPU-only execution?
Of course; it's because you are running on the CPU only. Faster RCNN is a fairly heavy two-stage detector, so CPU-only inference is slow.
With such latency, is it possible to do efficient real-time video processing using TensorFlow on a GPU, or should I use a more popular model like YOLO?
YOLO is fast, but when I once used it for face detection the accuracy was not that great. Still, it is a good alternative.
The popular means of using TensorFlow with a GPU in Docker is nvidia-docker, but it is not supported on Windows. Should I continue to look for a Docker (or conda) solution for local prediction, or should I deploy my model directly to a virtual instance with a GPU (I am comfortable with Google Cloud Platform)?
I think you can still use your local GPU on Windows, since TensorFlow supports GPUs through Python.
And here is an example that does exactly that. It has a client which can read a webcam or IP-camera stream, and a server that uses the TensorFlow Python GPU build with a ready-to-use pre-trained model for predictions.
Unfortunately, TensorFlow does not support TensorFlow Serving on Windows. Also, as you said, nvidia-docker is not supported on Windows, and Bash on Windows has no GPU support either. So I think this is the only easy way to go for now.
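For reference, the client side of an example like the one described above might look roughly like this (the server URL, endpoint, and response format are placeholders, not the linked project's actual API):

```python
import cv2
import requests

SERVER_URL = "http://localhost:5000/predict"  # placeholder inference endpoint

cap = cv2.VideoCapture(0)  # 0 = default webcam; an IP-camera/RTSP URL also works here
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Encode the frame as JPEG to keep the request payload small.
        _, jpeg = cv2.imencode(".jpg", frame)
        resp = requests.post(SERVER_URL, data=jpeg.tobytes(),
                             headers={"Content-Type": "image/jpeg"})
        print(resp.json())  # detections returned by the server (format assumed)
finally:
    cap.release()
```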