CatBoost on GPU provides much worse performance than on CPU - python

We are testing CatBoost on both CPU and GPU.
While it runs much faster on GPU than on CPU, the results we get on GPU are much worse, even though we are using the same data.
I am talking around 50% worse.
How is this possible?
We are using the following code to run it on CPU, changing only task_type to "GPU" for the GPU runs:
from catboost import CatBoostClassifier

catBoostModel = CatBoostClassifier(
    task_type="CPU",              # switched to "GPU" for the GPU runs
    early_stopping_rounds=50,
    eval_metric="Precision",
    cat_features=["Symbol"],
    auto_class_weights="Balanced",
    thread_count=-1
)
What are we missing?

There are some hyperparameters for which CatBoost uses different default values on CPU and GPU. There are also some hyperparameters that are only available on GPU or only available on CPU. The CatBoost documentation provides all the details.
This means that even if you are running the same code both on CPU and on GPU, you are likely training two different models. You can use model.get_all_params() (where model is your trained model object) to get the list of all hyperparameters and compare between CPU and GPU.
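For example, a minimal sketch of such a comparison (assuming cpu_model and gpu_model are your two already-trained CatBoostClassifier objects; the variable names are placeholders):

# Compare the effective hyperparameters of the CPU-trained and GPU-trained models.
cpu_params = cpu_model.get_all_params()
gpu_params = gpu_model.get_all_params()
for key in sorted(set(cpu_params) | set(gpu_params)):
    if cpu_params.get(key) != gpu_params.get(key):
        print(key, "CPU:", cpu_params.get(key), "GPU:", gpu_params.get(key))

Any parameter that shows up in this diff, and that you did not set explicitly, is a candidate explanation for the gap in results.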

Related

How to release GPU memory in tensorflow? (opposite of `allow_growth` → `allow_shrink`?)

I'm using a GPU to train quite a lot of models. I want to tune the architecture of the network, so I train different models sequentially to compare their performances (I'm using keras-tuner).
The problem is that some models are very small and others are very large. I don't want to allocate all the GPU memory to my training runs, only the quantity I need. I've set TF_FORCE_GPU_ALLOW_GROWTH to true, meaning that when a model requires a large quantity of memory, the GPU will allocate it. However, once that big model has been trained, the memory is not released, even if the following models are tiny.
Is there a way to force the GPU to release unused memory? Something like TF_FORCE_GPU_ALLOW_SHRINK?
Automatic shrinking might be difficult to achieve. If so, I would be happy with a manual release that I could add in a callback to be run after each training.
You can try enabling memory growth, so that TensorFlow only allocates GPU memory as it is needed, using this code:
import tensorflow as tf

# Allocate GPU memory on demand instead of reserving it all up front.
gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
The second method is to configure a virtual GPU device with tf.config.set_logical_device_configuration and set a hard limit on the total memory to allocate on the GPU.
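For example, a rough sketch (the 4096 MB limit is just a placeholder value; adjust it to what your largest model needs):

import tensorflow as tf

# Cap this process's GPU allocation with a fixed-size virtual device.
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]  # MB, placeholder
    )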
Please check this link for more details.

CPU RAM out of memory when using multiple Pytorch models in GPU

I'm interested in running multiple PyTorch models on a single GPU, more precisely YOLOv5-small on a single 3090. However, I have a problem when loading several models: the CPU RAM runs out of memory, even though I want to run inference on the GPU.
First I tried loading the model the default way:
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model = model.to('cuda')
but whenever the model is loaded onto the GPU, both CPU RAM and GPU RAM usage increase by a huge amount.
Then I tried to load a torchvision model to see if there was a problem in the yolov5 implementation:
model = torch.hub.load('pytorch/vision:v0.9.0', 'deeplabv3_resnet101', pretrained=True)
In this case, loading the model on the CPU increased CPU RAM usage by a reasonable amount. But when the model is sent to the GPU with model.to('cuda'), I observe the same behaviour as in the first scenario.
I have tried using gc.collect() and del model once it has been sent to the GPU, but I haven't managed to solve this issue.
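For reference, the cleanup I tried looks roughly like this (the torch.cuda.empty_cache() call is an extra step beyond what I described above; it only clears PyTorch's cached GPU memory, not CPU RAM):

import gc
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
model = model.to('cuda')
# ... run inference ...
del model
gc.collect()
torch.cuda.empty_cache()  # extra: release PyTorch's cached GPU memory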
I am using:
CPU: AMD Ryzen 9 3900XT 12-Core Processor.
32 GB of RAM.
GPU: GeForce RTX 3090.
Hope someone knows exactly what is happening.
Thanks in advance.

Do I need a GPU while working with a pretrained model?

I am already using Google Colab to train my model, so I will not use my own GPU for training. I want to ask: is there a performance difference between GPU and CPU when working with a pre-trained model? I already trained a model with a Google Colab GPU and used it on my own local CPU. Should I use a GPU for testing?
It depends on how many predictions you need to make. During training you perform many calculations, so GPU parallelisation shortens the overall training time. When using an already-trained model, you typically only need an occasional prediction per unit of time, and in that situation a CPU is usually fine. However, if you need to make as many predictions as during training, a GPU would be beneficial. This is particularly true for reinforcement learning, where your model must adapt to continuously changing environmental input.
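As a rough illustration (assuming a Keras/TensorFlow model, which the question does not specify; the model path and input shape are placeholders), you can simply time a prediction on the local CPU and decide whether that latency is acceptable for your testing workload:

import time
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("my_model.h5")      # hypothetical path to the Colab-trained model
x = np.random.rand(1, 224, 224, 3).astype("float32")   # hypothetical input shape

start = time.time()
model.predict(x)
print(f"One prediction took {time.time() - start:.3f} s on CPU")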

How can I determine how much GPU memory a Tensorflow model requires?

I want to find out how much GPU memory my Tensorflow model needs at inference. So I used tf.contrib.memory_stats.MaxBytesInUse which returned 6168 MB.
But with config.gpu_options.per_process_gpu_memory_fraction I can restrict TensorFlow to a much smaller fraction of my GPU, and the model still runs fine without needing more time for one inference step.
Is there a way to determine how much GPU memory a Tensorflow model requires? I could just decrease the GPU memory fraction until TF crashes, but I guess there is a more elegant and precise way?
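For reference, the two things mentioned above look roughly like this in TF 1.x (the 0.3 fraction is just an example value):

import tensorflow as tf

# 1. Measure the peak GPU memory actually used while running the model.
with tf.Session() as sess:
    # ... build the graph and run inference here ...
    max_bytes = sess.run(tf.contrib.memory_stats.MaxBytesInUse())
    print("Peak GPU memory: %.0f MB" % (max_bytes / 1024 ** 2))

# 2. Restrict the process to a fraction of the GPU and check that inference still works.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3  # example value
sess = tf.Session(config=config)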

Why does GPU usage stay low during NN training?

I'm running a NN training on my GPU with pytorch.
But the GPU usage is strangely "limited" at about 50-60%.
That's a waste of computing resources, but I can't get it any higher.
I'm sure the hardware is fine, because running two of my processes at the same time, or training a simpler NN (a DCGAN, for instance), can both push GPU usage to 95% or more (which is how it is supposed to be).
My NN contains several convolution layers and it should use more GPU resources.
Besides, I assume the data is being fed from the dataset fast enough, because I set num_workers=64 in my DataLoader instance and my disk works just fine.
I'm just confused about what is happening.
Dev details:
GPU: Nvidia GTX 1080 Ti
OS: Ubuntu 64-bit
I can only guess without further research, but it could be that your network is small in terms of layer size (not number of layers), so each training step is not enough work to occupy all the GPU resources. Or at least the ratio between the data size and the transfer speed (to GPU memory) is poor, so the GPU stays idle most of the time.
tl;dr: the GPU jobs are not long enough to justify the memory transfers
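If that is the diagnosis, a couple of standard knobs to try (a sketch, not a guaranteed fix; dataset and model are placeholders for your own objects) are a larger batch size and pinned-memory, asynchronous host-to-device copies:

import torch
from torch.utils.data import DataLoader

loader = DataLoader(
    dataset,
    batch_size=256,      # larger batches give the GPU more work per step
    num_workers=8,
    pin_memory=True,     # page-locked host memory enables faster, async copies
)

device = torch.device("cuda")
for inputs, targets in loader:
    inputs = inputs.to(device, non_blocking=True)    # asynchronous copy from pinned memory
    targets = targets.to(device, non_blocking=True)
    # ... forward pass, backward pass, optimizer step ...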
