I am already using Google Colab to train my model, so I will not use my own GPU for training. I want to ask: is there a performance difference between GPU and CPU when working with a pre-trained model? I already trained a model with the Google Colab GPU and then used it on my own local CPU. Should I use a GPU for testing?
It depends on how many predictions you need to make. During training you perform many calculations, so parallelisation on a GPU shortens the overall training time. When using a trained model, however, you usually only need to make a few predictions per unit of time, and in that situation a CPU should be fine. If you need to make as many predictions as you did during training, then a GPU would be beneficial. This is particularly true with reinforcement learning, where your model must adapt to a continuously changing environment.
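To make that call concrete, here is a minimal sketch (my own illustration, not from the posts above) that times a batch of predictions with a saved Keras model. The model path "my_model.keras" and the input shape are placeholders for your own model and data; run the script once with CUDA_VISIBLE_DEVICES="" to force the CPU and once with the GPU visible, then compare the timings.

import time
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("my_model.keras")        # hypothetical path
batch = np.random.rand(256, 224, 224, 3).astype("float32")  # dummy inputs

model.predict(batch, verbose=0)   # warm-up: graph tracing, first transfers
start = time.perf_counter()
model.predict(batch, verbose=0)   # timed run
elapsed = time.perf_counter() - start
device = "GPU" if tf.config.list_physical_devices("GPU") else "CPU"
print(f"256 predictions took {elapsed:.3f} s on {device}")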
We are testing CatBoost on both CPU and GPU.
While it runs much faster on GPU than on CPU, the results we get on GPU are much worse, even though we are using the same data.
I am talking around 50% worse.
How is this possible?
We are using the following code to run it on CPU, changing only task_type to "GPU" when running on GPU:
catBoostModel = CatBoostClassifier(
    task_type="CPU",
    early_stopping_rounds=50,
    eval_metric="Precision",
    cat_features=["Symbol"],
    auto_class_weights="Balanced",
    thread_count=-1
)
What are we missing?
There are some hyperparameters for which CatBoost uses different default values on CPU and GPU. There are also some hyperparameters that are only available on GPU or only available on CPU. The CatBoost documentation provides all the details.
This means that even if you are running the same code both on CPU and on GPU, you are likely training two different models. You can use model.get_all_params() (where model is your trained model object) to get the list of all hyperparameters and compare between CPU and GPU.
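For example, here is a rough sketch (not the original poster's code; X_train, y_train, X_val, y_val stand in for your own data splits) that fits the same configuration once with task_type="CPU" and once with task_type="GPU", then prints every effective hyperparameter whose value differs:

from catboost import CatBoostClassifier

def effective_params(task_type):
    model = CatBoostClassifier(
        task_type=task_type,
        early_stopping_rounds=50,
        eval_metric="Precision",
        cat_features=["Symbol"],
        auto_class_weights="Balanced",
    )
    model.fit(X_train, y_train, eval_set=(X_val, y_val), verbose=False)
    return model.get_all_params()

cpu_params = effective_params("CPU")
gpu_params = effective_params("GPU")

# print only the hyperparameters whose effective values differ
for key in sorted(set(cpu_params) | set(gpu_params)):
    if cpu_params.get(key) != gpu_params.get(key):
        print(f"{key}: CPU={cpu_params.get(key)!r}  GPU={gpu_params.get(key)!r}")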
I am trying to train an XGBoost classifier in Python using the xgboost package. I am using the defaults for all the classifier's parameters, and my training set has around 16,000 elements with 180,000 features per element. I am not using the GPU to train the model, but still, the training process has taken more than five hours and is still going. I have 32GB of RAM and a 6-core Intel i7. I am wondering if this is a normal amount of time for training this classifier with the amount of data I have, because I have heard of people training the model in a couple of minutes.
If training time is a concern, you can switch the tree construction method (tree_method) to hist, which is a histogram-based method. With a GPU it should be set to gpu_hist. You can find more details about the xgboost implementation here: http://arxiv.org/abs/1603.02754
This is the secret sauce that leads to very fast training without much compromise in solution quality. In fact, GPU-based training, and libraries such as LightGBM, rely on histogram-based techniques for faster training and therefore faster iterations/experiments, which matters a lot in time-constrained Kaggle-type competitions. hist may cut training time to half or less, and gpu_hist on a GPU may bring it down to minutes.
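As an illustration only (my own sketch, not the asker's code; X and y stand in for the 16,000 x 180,000 training data, for which a scipy sparse matrix is advisable), switching to the histogram method is a one-line change:

from xgboost import XGBClassifier

# CPU: histogram-based tree construction instead of the much slower exact method
clf = XGBClassifier(tree_method="hist", n_jobs=-1)

# GPU: "gpu_hist" in older releases; in xgboost >= 2.0 the equivalent is
# tree_method="hist" combined with device="cuda"
# clf = XGBClassifier(tree_method="gpu_hist")

clf.fit(X, y)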
PS: I would also suggest reducing the dimensionality of your data (16k x 180k) by removing correlated/rank-correlated features, which will improve not only your training time but also model performance.
I work on medical imaging, so I need to train massive 3D CNNs that are difficult to fit into one GPU. I wonder if there is a way to split a massive Keras or TensorFlow graph across multiple GPUs, such that each GPU only runs a small part of the graph during training and inference. Is this type of distributed training possible with either Keras or TensorFlow?
I have tried using with tf.device('/gpu:#') when building the graph, but I am experiencing memory overflow. The logs seem to indicate that the entire graph is still being placed on gpu:0.
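For reference, this is roughly the manual device-placement pattern being attempted, shown as a self-contained sketch with placeholder shapes and layer sizes (not the actual network); note that the device string uses forward slashes, e.g. '/gpu:0':

import tensorflow as tf

inputs = tf.keras.Input(shape=(64, 64, 64, 1))       # dummy 3D volume

with tf.device('/gpu:0'):                            # first half on GPU 0
    x = tf.keras.layers.Conv3D(32, 3, activation='relu')(inputs)
    x = tf.keras.layers.MaxPool3D()(x)

with tf.device('/gpu:1'):                            # second half on GPU 1
    x = tf.keras.layers.Conv3D(64, 3, activation='relu')(x)
    x = tf.keras.layers.GlobalAveragePooling3D()(x)
    outputs = tf.keras.layers.Dense(2, activation='softmax')(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')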
My Keras model is currently looking at a lot of data, and I personally don't feel comfortable letting my GPU reach 85 degrees... Is there a way to tell my GPU to take a break after a set number of epochs?
I understand I could just break the process down into multiple training cycles, but because I am using ReduceLROnPlateau as a callback on an RNN model, I would still like the entire training process to happen in one cycle, with the GPU taking small breaks. That would allow for longer training runs with less risk to my personal hardware.
(Not adding code due to this just being a general question.)
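For illustration only, here is a minimal sketch of one way such a break could look: a custom Keras callback that sleeps every few epochs and can sit alongside ReduceLROnPlateau. The interval and pause length are arbitrary placeholders.

import time
import tensorflow as tf

class CooldownCallback(tf.keras.callbacks.Callback):
    def __init__(self, every_n_epochs=5, pause_seconds=120):
        super().__init__()
        self.every_n_epochs = every_n_epochs
        self.pause_seconds = pause_seconds

    def on_epoch_end(self, epoch, logs=None):
        # `epoch` is zero-based, so this pauses after epochs N, 2N, 3N, ...
        if (epoch + 1) % self.every_n_epochs == 0:
            print(f"\nCooling down for {self.pause_seconds} s")
            time.sleep(self.pause_seconds)

# model.fit(x, y, epochs=100,
#           callbacks=[CooldownCallback(),
#                      tf.keras.callbacks.ReduceLROnPlateau()])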
First of all: this question is connected to neural network inference and not training.
I have discovered that when running inference with a trained neural network on a single image, over and over, on a GPU (e.g. a P100), TensorFlow's utilization of the available compute does not reach 100%, but instead sits at around 70%. This is also the case if the image does not have to be transferred to the GPU. The issue therefore has to be connected to constraints on the parallelization of the calculations. My best guesses for the reasons are:
TensorFlow can only exploit the parallelization capabilities of a GPU up to a certain level (the higher utilization of the same model when run as a TensorRT model also suggests this). In this case, the question is: what is the reason for that?
The inherent structure of a neural network, with several sequential layers, prevents higher usage. In that case the problem is not framework overhead but lies in the general design of neural networks, and the question is: what restrictions cause this?
Both of the above combined.
Thanks for your ideas on the issue!
Why do you expect the GPU utilization to go to 100% when you run the neural network prediction for a single image?
GPU utilization is reported per time unit (e.g. 1 second). This means that when the neural network computation finishes before this time unit has elapsed (e.g. within 0.5 s), the GPU may be used by other programs for the rest of that interval, or not be used at all. If no other program uses the GPU either, then you will not reach 100%.