I am experimenting with designing a semantic segmentation network in PyTorch. It performs well on my computer. Hoping for better performance, we moved the network to a computer with far more GPU capacity. However, even when I match only the PyTorch and torchvision versions of the original environment and rerun the experiment, performance degrades on the new PC and on Google Colab. I simply copied and pasted the code and ran it as a sanity check before doing other experiments.
The network structure is the same, so are there other external factors (e.g. GPU, RAM, etc.) that could degrade performance?
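To rule out run-to-run noise when comparing machines, a minimal reproducibility sketch (assuming PyTorch 1.8 or newer; the seed value is arbitrary) could look like this:

    import random
    import numpy as np
    import torch

    # Pin the usual sources of nondeterminism before comparing machines.
    def set_seed(seed: int = 42) -> None:
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.benchmark = False      # no autotuned kernels
        torch.backends.cudnn.deterministic = True   # prefer deterministic kernels

    set_seed(42)
    # Print versions/backends on each machine so environment differences are visible.
    print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())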
Related
I'm running some notebooks which, at different points, are both CPU- and GPU-intensive. Running the notebook on my local PC is fast in the CPU-bound sections, but slow in the GPU-bound ones because my GPU cannot be used by Torch (I have a Ryzen 9 with an AMD GPU). On the other hand, running the notebook on the Colab GPU is fast in the GPU sections, but terribly slow in the CPU sections.
I know that it is possible to use my own CPU via a local runtime, but then I am also stuck with my local GPU. Is it possible to allocate only my local CPU and use the Google Colab GPU at the same time?
(An alternative solution would be to run the CPU-intensive code on my local machine, store the intermediate results, and then use the Google GPU for the GPU-intensive parts. But this is of course suboptimal.)
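Something like the following sketch could implement that workaround (the tensor is just a stand-in for the CPU-heavy output, and the Drive path is a placeholder):

    import torch

    # Locally: do the CPU-intensive work, then save the intermediate result.
    features = torch.randn(10_000, 512)      # stand-in for the CPU-heavy output
    torch.save(features, "intermediate.pt")  # upload this file to Google Drive

    # In the Colab GPU runtime:
    # from google.colab import drive
    # drive.mount("/content/drive")
    # features = torch.load("/content/drive/MyDrive/intermediate.pt").cuda()
    # ... run the GPU-intensive parts here ...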
As far as I know, you can distribute training among multiple GPUs, multiple machines, or TPUs with a built-in distribution strategy.
MultiWorkerMirroredStrategy is most likely the one you are looking for. It has two implementations for cross-device communication; notably, CommunicationImplementation.RING is RPC-based and supports both CPU and GPU.
Details: Link 1 Link 2
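A minimal sketch of that setup (assuming TF 2.4+; every worker also needs a TF_CONFIG environment variable describing the cluster before this runs):

    import tensorflow as tf

    # Pick the RPC-based RING implementation for cross-device communication.
    options = tf.distribute.experimental.CommunicationOptions(
        implementation=tf.distribute.experimental.CommunicationImplementation.RING)
    strategy = tf.distribute.MultiWorkerMirroredStrategy(communication_options=options)

    with strategy.scope():
        # Any Keras model built in this scope is replicated across the workers.
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(8,))])
        model.compile(optimizer="adam", loss="mse")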
I want to fine-tune ALBERT.
I see one can distribute neural-net training over multiple GPUs using TensorFlow: https://www.tensorflow.org/guide/distributed_training
I was wondering if it's possible to distribute fine-tuning across both my laptop's GPU and a Colab GPU?
I don't think that's possible, because in order to do GPU distributed training you need NVLink connections among your GPUs, and there is no such link between your laptop's GPU and Colab's GPUs. This is a good read: https://lambdalabs.com/blog/introduction-multi-gpu-multi-node-distributed-training-nccl-2-0/
Recently I discovered something rather strange in a project I have been working on for quite a while. The model is rather conventional: a convnet with a few fully connected layers. For data loading I use the tf.data API, but the same thing happened with the queue-based code I had before porting to tf.data. A few hours after training begins, the CPU usage rises to very high levels, 1500-2000% as reported by the htop utility, whereas at the beginning of training everything is fine and the main process shows only about 200% CPU usage. Attached is a screenshot of the htop output; another worrying thing is that all the child processes also show a pretty high CPU load.
I am using tensorflow-gpu version 1.11, running on an NVIDIA Tesla V100. I am pretty sure that the model runs on the GPU and not on the CPU: nvidia-smi shows the GPU occupied at about a 70% rate.
Obviously, I cannot ask for the exact cause of this, and it would be difficult to strip the problem down to a reproducible test case. However, maybe you could point me to some debugging techniques that are applicable in such a case.
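One knob worth checking is whether capping the session thread pools and the input-pipeline parallelism changes the behaviour; a sketch (assuming TF 1.x as above, with a dummy dataset standing in for the real pipeline):

    import tensorflow as tf

    # Dummy pipeline; the real one would replace this, keeping
    # num_parallel_calls at a fixed, small value.
    dataset = (tf.data.Dataset.from_tensor_slices(tf.zeros([1024, 64]))
               .map(lambda x: x * 2.0, num_parallel_calls=2)
               .batch(32)
               .prefetch(1))
    batch = dataset.make_one_shot_iterator().get_next()

    # Cap the session thread pools so a runaway input pipeline cannot
    # claim every core; then watch htop again over a few hours.
    config = tf.ConfigProto(intra_op_parallelism_threads=4,
                            inter_op_parallelism_threads=4)
    with tf.Session(config=config) as sess:
        print(sess.run(batch).shape)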
I am testing the new TensorFlow Object Detection API in Python, and I succeeded in installing it on Windows using Docker. However, my trained model (Faster R-CNN ResNet-101 COCO) takes up to 15 seconds to make a prediction (with very good accuracy, though), probably because I only use the CPU build of TensorFlow.
My three questions are:
Considering the latency, where is the problem? I heard Faster R-CNN was a good model for low-latency visual detection; is it because of the CPU-only execution?
With such latency, is it possible to do efficient real-time video processing using TensorFlow on a GPU, or should I use a more popular model like YOLO?
The popular way to use TensorFlow with a GPU in Docker is nvidia-docker, but it is not supported on Windows. Should I continue to look for a Docker (or conda) solution for local prediction, or should I deploy my model directly to a virtual instance with a GPU (I am comfortable with Google Cloud Platform)?
Any advice and/or good practice concerning real-time video processing with TensorFlow is very welcome!
Considering the latency, where is the problem? I heard Faster R-CNN was a good model for low-latency visual detection, is it because of the CPU-only execution?
Of course, it's because you are using the CPU.
With such latency, is it possible to do efficient real-time video processing using TensorFlow on a GPU, or should I use a more popular model like YOLO?
YOLO is fast, but when I once used it for faces the accuracy was not that great. Still, it is a good alternative.
The popular way to use TensorFlow with a GPU in Docker is nvidia-docker, but it is not supported on Windows. Should I continue to look for a Docker (or conda) solution for local prediction, or should I deploy my model directly to a virtual instance with a GPU (I am comfortable with Google Cloud Platform)?
I think you can still use your local GPU on Windows, since TensorFlow supports GPU execution from Python.
And here is an example that does exactly that: it has a client which can read a webcam or IP-cam stream, while the server uses the TensorFlow Python GPU build and a ready-to-use pre-trained model for predictions.
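A rough sketch of that client side (the server URL and response format below are placeholders, not part of the linked example):

    import cv2
    import requests

    SERVER_URL = "http://localhost:5000/predict"  # hypothetical GPU-backed server

    cap = cv2.VideoCapture(0)  # 0 = default webcam; pass an RTSP/HTTP URL for an IP cam
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        _, jpeg = cv2.imencode(".jpg", frame)  # compress the frame before sending
        resp = requests.post(SERVER_URL, data=jpeg.tobytes(),
                             headers={"Content-Type": "image/jpeg"})
        print(resp.status_code)
    cap.release()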
Unfortunately, TensorFlow does not support TensorFlow Serving on Windows. Also, as you said, nvidia-docker is not supported on Windows, and Bash on Windows has no GPU support either. So I think this is the only easy way to go for now.
I have been studying neural networks for some weeks. Furthermore, even though I have always used R, the Keras library in Python was really helpful for someone with a small programming background like me.
Keras is a very nice interface which allows the customization I need without ever invoking the backend, except for some custom loss metrics I used.
Just as straightforward is the hardware specification, which for example lets you switch from the CPU of the machine where Python+Keras is installed to that machine's (compatible) GPU, allowing you to exploit the strong parallelization of neural networks when training them.
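For example, with a TensorFlow backend (an assumption; the model below is just a toy one) the switch can be as simple as pinning the device:

    import tensorflow as tf

    # Use the GPU if one is visible, otherwise fall back to the CPU.
    gpus = tf.config.list_physical_devices("GPU")
    device = "/GPU:0" if gpus else "/CPU:0"

    with tf.device(device):
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    print("Training on", device)
    model.fit(tf.random.normal([256, 32]), tf.random.normal([256, 1]), epochs=1)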
I was wondering if there is anything which allows you to switch to Hadoop-cluster training of neural networks with the same kind of ease.
Moreover, is there some open-source Hadoop cluster available to do so?
Thank you for your help.