I am trying to train an XGBoost classifier in Python using the xgboost package. I am using the defaults for all of the classifier's parameters, and my training set has around 16,000 elements with 180,000 features per element. I am not using the GPU to train the model, but still, the training process has taken more than five hours and is still going. I have 32 GB of RAM and a 6-core Intel i7. I am wondering whether this is a normal amount of time for training this classifier with the amount of data I have, because I have heard of people training similar models in a couple of minutes.
If training time is a concern, you can switch the tree_method parameter to hist, which is a histogram-based tree construction method. With a GPU it should be set to gpu_hist. You can find more details about its XGBoost implementation here: http://arxiv.org/abs/1603.02754
This is the secret sauce that leads to very fast training without much compromise in solution quality. In fact, GPU-based training, and libraries such as LightGBM, also rely on histogram-based techniques for faster training and therefore faster iterations/experiments, which matters a lot in time-constrained Kaggle-style competitions. hist may cut training time in half or more, and gpu_hist on a GPU may bring it down to minutes.
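For example, with the scikit-learn wrapper (a minimal sketch; the data shapes, n_estimators and max_depth below are just illustrative):

```python
import numpy as np
from xgboost import XGBClassifier

# Illustrative data shapes only; substitute your own 16k x 180k matrix.
X_train = np.random.rand(1000, 5000)
y_train = np.random.randint(0, 2, size=1000)

# tree_method="hist" selects histogram-based tree construction, which is usually much
# faster on CPU for wide data than the default exact method; use "gpu_hist" on a GPU.
clf = XGBClassifier(tree_method="hist", n_estimators=100, max_depth=6)
clf.fit(X_train, y_train)
```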
PS: I would also suggest reducing the dimensionality of your data (16k x 180k) by removing correlated/rank-correlated features, which will improve not only your training time but potentially also model performance.
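A rough sketch of such a correlation filter (the threshold is hypothetical; for a matrix as wide as 180k columns you would apply this blockwise or after a cheaper pre-filter such as variance thresholding):

```python
import numpy as np
import pandas as pd

def drop_correlated(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature out of every pair whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# X_reduced = drop_correlated(pd.DataFrame(X_train), threshold=0.95)
```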
I am already using Google Colab to train my model, so I will not use my own GPU for training. I want to ask: is there a performance difference between GPU and CPU when working with a pre-trained model? I have already trained a model on a Google Colab GPU and used it on my own local CPU. Should I use a GPU for testing?
It depends on how many predictions you need to make. During training you are doing a huge number of calculations, so parallelisation on a GPU shortens the overall training time. When using a trained model, you usually only need to make a sparse number of predictions per unit of time, and in that situation a CPU should be fine. However, if you need to make as many predictions as during training, then a GPU would be beneficial. This can particularly be true in reinforcement learning, where your model must adapt to continuously changing environmental input.
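Assuming PyTorch, a minimal sketch of using a GPU-trained checkpoint on a local CPU (the architecture and file name here are placeholders):

```python
import torch
import torch.nn as nn

# Placeholder architecture: it must match whatever was trained on the Colab GPU.
model = nn.Sequential(nn.Linear(100, 64), nn.ReLU(), nn.Linear(64, 2))

# map_location="cpu" remaps tensors that were saved on the GPU onto the CPU.
model.load_state_dict(torch.load("model.pt", map_location="cpu"))
model.eval()

with torch.no_grad():                    # no gradients are needed for prediction
    logits = model(torch.randn(1, 100))  # a few predictions at a time: CPU latency is usually fine
```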
I have an implementation of a GRU-based network in PyTorch, which I train on the 4 GB GPU in my laptop, and it obviously takes a lot of time (4+ hours for one epoch). I am looking for ideas/leads on how I can move this deep-learning model to train on a couple of Spark clusters instead.
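For context, the network has roughly this shape (a simplified sketch; the real layer sizes and output head differ):

```python
import torch
import torch.nn as nn

class GRUClassifier(nn.Module):
    def __init__(self, input_size=128, hidden_size=256, num_layers=2, num_classes=2):
        super().__init__()
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):              # x: (batch, seq_len, input_size)
        out, _ = self.gru(x)
        return self.fc(out[:, -1, :])  # classify from the last time step
```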
So far, I have only come across the SparkTorch GitHub library, which unfortunately has limited documentation, and the examples provided are far too trivial:
https://github.com/dmmiller612/sparktorch
To summarize, I am looking for answers to the following two questions:
1. Is it a good idea to train a deep learning model on Spark clusters, given that I have read in places that the communication overhead undermines the gains in training speed?
2. How can I convert the PyTorch model (and the underlying dataset) in order to perform distributed training across the worker nodes? (The plain-PyTorch baseline I have in mind is sketched below.)
Any leads appreciated.
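For what it is worth, outside of Spark the standard building block I have seen for this is torch.nn.parallel.DistributedDataParallel, roughly as below (one process per worker, launched e.g. with torchrun); what I am unsure about is how, or whether, something equivalent maps onto Spark worker nodes:

```python
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Plain-PyTorch distributed baseline, not Spark: gradients are averaged across workers.
dist.init_process_group(backend="gloo")   # use "nccl" when every worker has a GPU
model = nn.GRU(input_size=128, hidden_size=256, batch_first=True)  # placeholder for the real network
ddp_model = DDP(model)
# ... wrap the dataset with torch.utils.data.distributed.DistributedSampler,
# then run the usual forward/backward/optimizer-step loop on ddp_model ...
```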
My Keras model is currently training on a lot of data, and I personally don't feel comfortable letting my GPU reach 85 degrees... Is there a way to tell my GPU to take a break after a set number of epochs?
I understand I could just break the process into multiple training cycles, but because I am using ReduceLROnPlateau as a callback on an RNN model, I would still like the entire training process to be done in one cycle, with the GPU taking small breaks, to allow for longer training runs with less risk to my personal hardware.
(Not adding my model code, since this is just a general question.)
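To make the intent concrete, something like this hypothetical callback (sleeping for a fixed pause every few epochs) is the behaviour I have in mind:

```python
import time
import tensorflow as tf

class CooldownCallback(tf.keras.callbacks.Callback):
    """Pause training every `every_n` epochs so the GPU can cool down."""

    def __init__(self, every_n=10, pause_seconds=120):
        super().__init__()
        self.every_n = every_n
        self.pause_seconds = pause_seconds

    def on_epoch_end(self, epoch, logs=None):
        # `epoch` is 0-indexed; sleep after every `every_n` completed epochs.
        if (epoch + 1) % self.every_n == 0:
            time.sleep(self.pause_seconds)

# model.fit(x, y, epochs=200,
#           callbacks=[CooldownCallback(), tf.keras.callbacks.ReduceLROnPlateau()])
```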
I have installed the CPU version of TensorFlow. I only have a few images as a dataset, and I am training on a machine with 4 GB of RAM and a Core i5-3340M at 2.70 GHz, with batch size 1, and it is still extremely slow. All the images are the same size (200x185, I think). Will it train like this? Kindly tell me how I can speed up this process.
Training process
If your network is deep, it can take a long time to train on a CPU, because the CPU is not optimized for these calculations the way a GPU is.
I would suggest getting a graphics card; even an older graphics card can significantly improve performance (on the order of 100x faster).
Let's put some numbers on this. You are dealing with images of size 200x185, so we are talking about 37,000 features per image if we work with gray levels; if we work with RGB, multiply that by 3. How many images are you using for training? Keep in mind that SGD (stochastic gradient descent with mini-batch size 1) tends to be very slow on big datasets. Give us some numbers: how many training images, and what does "slow" mean, i.e. how much time per epoch? Details such as the programming language, library (TensorFlow, etc.) and optimizer would also help us judge whether your code is slow and whether it can be made faster.
Batch size is another parameter that affects training time: a larger batch size reduces the time per epoch, but may require more epochs to reach the same performance as batch size 1.
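For illustration, a hypothetical tiny Keras model for 200x185 grayscale images (the layer sizes are placeholders); the batch size is simply the batch_size argument of fit:

```python
import tensorflow as tf

# Placeholder model for 200x185 grayscale images.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(200, 185, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# batch_size=1 means one forward/backward pass per image, so each epoch has many slow steps.
# A larger batch vectorises the work, making each epoch faster, though it may take
# more epochs to reach the same accuracy.
# model.fit(x_train, y_train, epochs=10, batch_size=1)
# model.fit(x_train, y_train, epochs=30, batch_size=32)
```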
And if your network is deep (a CNN, etc.), you should run it on a GPU.
For learning purposes, I am trying to implement a CNN from scratch, but the results do not seem to improve from random guessing. I know this is not the best approach on home hardware, and following course.fast.ai I have obtained much better results via transfer learning, but for a deeper understanding I would like to see, at least in theory, how one could do it otherwise.
Testing on CIFAR-10 posed no issues - a small CNN trained from scratch in a matter of minutes with an error of less than 0.5%.
However, when trying to test against the Cats vs. Dogs Kaggle dataset, the results did not budge from 50% accuracy. The architecture is basically a copy of AlexNet, including the non-state-of-the-art choices (large filters, histogram equalization, Nesterov-SGD optimizer). For more details, I put the code in a notebook on GitHub:
https://github.com/mspinaci/deep-learning-examples/blob/master/dogs_vs_cats_with_AlexNet.ipynb
(I also tried different architectures, more VGG-like and using Adam optimizer, but the result was the same; the reason why I followed the structure above was to match as closely as possible the Caffe procedure described here:
https://github.com/adilmoujahid/deeplearning-cats-dogs-tutorial
and that seems to converge quickly enough, according to the author's description here: http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/).
I was expecting some fitting to happen quickly, possibly flattening out due to the many suboptimal choices (e.g. small dataset, no data augmentation). Instead, I saw no improvement at all, as the notebook shows.
So I thought that maybe I was simply overestimating my GPU and patience, and that the model was too complicated even to overfit my data in a few hours (I ran 70 epochs, each time roughly 360 batches of 64 images). Therefore I tried to overfit as hard as I could, running these other models:
https://github.com/mspinaci/deep-learning-examples/blob/master/Ridiculously%20overfitting%20models...%20or%20maybe%20not.ipynb
The purely linear model started showing some overfitting - around 53.5% training accuracy vs. 52% validation accuracy (which I guess is therefore my best result). That matched my expectations. However, to try to overfit as hard as I could, the second model is a simple two-layer feedforward neural network, without any regularization, which I trained on just 2,000 images with a batch size of up to 500. I was expecting the NN to overfit wildly, quickly getting to 100% training accuracy (after all, it has 77M parameters for 2k pictures!). Instead, nothing happened, and the accuracy flattened out at 50% quickly enough.
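For reference, the overfitting sanity check is along these lines (a sketch only; the exact sizes are in the notebook, and the numbers below are illustrative - 224*224*3 inputs into 512 hidden units already gives roughly 77M weights):

```python
from keras.models import Sequential
from keras.layers import Flatten, Dense
from keras.optimizers import SGD

# Two-layer fully connected net, no regularization: 150528 * 512 ≈ 77M weights.
model = Sequential([
    Flatten(input_shape=(224, 224, 3)),
    Dense(512, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True),
              loss='binary_crossentropy', metrics=['accuracy'])

# With ~2000 training images and no regularization this should, in principle, drive the
# training accuracy towards 100%; the puzzle is why it stays stuck around 50% instead.
# model.fit(x_train, y_train, batch_size=500, epochs=50)
```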
Any tips about why none of the "multi-layer" models seems able to pick up any features (be they "true" ones or mere overfitting) would be very much appreciated!
Note on versions etc.: the notebooks were run on Python 2.7, Keras 2.0.8 and Theano 0.9.0. The OS is Windows 10, and the GPU is a GeForce GTX 960M - not very powerful, but it should be sufficient for basic tasks.