I want to use transfer learning to classify images; this is my first try at transfer learning. I am currently using the VGG16 model. Since my data are very different from the images used to train the original model, theory tells me I should retrain many layers, potentially including the hidden layers.
My computer has 8 GB of RAM and an i5 at 2.40 GHz, with no GPU. My dataset is small (3000 images), and the data are stored as matrices in Python memory, not saved in a folder. Almost all my RAM is taken up by those images.
The original VGG16 model has about 130 million parameters. If I keep only the weights of the hidden layers and create two new (and small, of size 512 and 256) fully connected layers at the end, I still have 15M parameters to train, out of roughly 30M parameters in total.
I currently use an image size of 224*224, the same as the VGG16 input.
My computer needs 1h30 for one epoch. After 10 epochs I get poor accuracy (50%, versus 90% with a convnet trained from scratch).
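For concreteness, here is a minimal Keras sketch of this kind of setup (TensorFlow's bundled Keras is assumed; the number of classes and the choice of unfreezing only the last convolutional block are illustrative placeholders, not necessarily my exact configuration):

```python
# Minimal transfer-learning sketch: frozen VGG16 base, new 512/256 dense head.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

num_classes = 2  # placeholder: replace with your number of classes

# Convolutional base with ImageNet weights, without the original classifier.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# Unfreeze only the last convolutional block; everything earlier stays frozen.
base.trainable = True
for layer in base.layers:
    if not layer.name.startswith("block5"):
        layer.trainable = False

# New, small fully connected head (512 and 256 units, as described above).
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(512, activation="relu"),
    layers.Dense(256, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()  # shows trainable vs. non-trainable parameter counts
```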
My questions:
The computer crashes after X epochs, and I don't know why. Could it be a RAM problem? Since the first epoch already runs to completion and later epochs are just weight adjustments, shouldn't the subsequent epochs have no additional impact on memory?
Should I unfreeze the input layer so I can use images of reduced dimensions, to ease the memory problem and shorten training time? Would that hurt the convnet's performance much?
Is it normal to need 1h30 per epoch with 15M trainable parameters? Since I still need to find the optimal number of layers to unfreeze, the shape of the new fully connected layers, the learning rate, the optimizer... it looks impossible to me to tune a transfer learning model with my current computing resources in a decent amount of time.
Do you have any tips for transfer learning?
thanks
No specific tips for transfer learning, but if you are lacking computing power, it might be helpful to consider moving to cloud resources. AWS, Google Cloud, Azure, and other services are available at really reasonable prices.
Most of them also provide some free resources, which can be enough for small ML projects or student tasks.
Notably:
Google Colab provides a free GPU for a limited time.
AWS provides ~250 hours of training per month on SageMaker.
Azure Notebooks also provides some free (but limited) computing power.
Most of these services also provide free general compute power, on which you can also run ML tasks, but that might require some additional manual tweaking.
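For example, once a notebook runtime is up on any of these services, a quick check like the following (TensorFlow 2.x assumed) confirms that a GPU is actually visible before you launch a long training run:

```python
import tensorflow as tf

# Lists the GPUs TensorFlow can see; an empty list means a CPU-only runtime
# (on Colab: Runtime -> Change runtime type -> GPU).
print(tf.config.list_physical_devices("GPU"))
```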
Let’s say one saves a Tensorflow model created using large data and GPUs. If one wanted to then use the saved model to do a single prediction with one small piece of data, does one still need the huge computers that created the model?
I’m wondering more generally how the size and computing resources needed to generate a deep learning model relate to using the model to make predictions.
This is relevant because if one is using Google Cloud Compute it costs more money if one has to use the huge computers all the time. If one could just use the huge computers to train the model and then more modest ones to run their app that make predictions it would save a lot of money.
Resources needed for prediction depend on the model size - not on the training device.
If the model has 200 billion variables, you will not be able to run it on a workstation (because you do not have enough memory).
But you can use a model with 10 million variables with no problems, even if it was trained on a GPU or TPU.
Every variable takes 4 to 8 bytes. If you have 8 GB of memory, you will probably be able to run a model with hundreds of millions of variables.
Prediction is fast (assuming you have enough memory). The big resources are needed to train a model quickly; it is efficient to train on a GPU/TPU even if your model is small.
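As a rough sketch of that arithmetic, and of loading a GPU-trained model on a CPU-only machine (Keras assumed; the file path and input shape are placeholders):

```python
import numpy as np
import tensorflow as tf

# Back-of-the-envelope memory estimate: 4 bytes per float32 parameter,
# 8 bytes per float64 parameter.
params = 10_000_000  # e.g. a 10M-parameter model
print("~%d MB of weights at float32" % (params * 4 // 10**6))  # ~40 MB
print("~%d MB of weights at float64" % (params * 8 // 10**6))  # ~80 MB

# A saved model contains weights and architecture, not the training device,
# so a model trained on a GPU/TPU loads and predicts fine on a CPU-only machine.
model = tf.keras.models.load_model("trained_model.h5")    # placeholder path
sample = np.zeros((1, 224, 224, 3), dtype="float32")      # placeholder input shape
print(model.predict(sample))
```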
I have an implementation of a GRU-based network in PyTorch, which I train using the 4 GB GPU in my laptop, and it obviously takes a lot of time (4+ hours for one epoch). I am looking for ideas/leads on how I can move this deep learning model to train on a couple of Spark clusters instead.
So far, I have only come across this GitHub library called SparkTorch, which unfortunately has limited documentation and the examples provided are way too trivial.
https://github.com/dmmiller612/sparktorch
To summarize, I am looking for answers to the following two questions:
Is it a good idea to train a deep learning model on Spark clusters? I have read in places that the communication overhead undermines the gains in training speed.
How do I convert the PyTorch model (and the underlying dataset) in order to perform distributed training across the worker nodes?
Any leads appreciated.
First of all: this question is connected to neural network inference and not training.
I have discovered that when running inference with a trained neural network on only one image, over and over, on a GPU (e.g. a P100), the utilization of the computing power with TensorFlow does not reach 100% but stays around 70%. This is also the case when the image does not have to be transferred to the GPU. Therefore, the issue has to be connected to constraints on the parallelization of the calculations. My best guesses for the reasons are:
TensorFlow can only exploit the parallelization capabilities of a GPU up to a certain level. (The higher utilization of the same model when run as a TensorRT model also suggests this.) In this case, the question is: what is the reason for that?
The inherent neural network structure, with several sequential layers, prevents higher usage. In that case, the problem is not framework overhead but lies in the general design of neural networks, and the question is: what are the restrictions that cause this?
Both of the above combined.
Thanks for your ideas on the issue!
Why do you expect the GPU utilization to go to 100% when you run the neural network prediction for one image?
GPU utilization is measured per time unit (e.g. 1 second). This means that if the neural network finishes before this time unit has elapsed (e.g. within 0.5 s), then for the rest of that time the GPU may be used by other programs or not used at all. If the GPU is not used by any other program either, then you will not reach 100%.
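To make that concrete, here is a small timing sketch (TensorFlow/Keras assumed; MobileNetV2 with random weights is just a stand-in for the trained network). If a single forward pass takes only a few milliseconds, the GPU is idle for most of any one-second utilization window unless you keep it busy, e.g. by batching many images:

```python
import time
import numpy as np
import tensorflow as tf

model = tf.keras.applications.MobileNetV2(weights=None)   # stand-in model
image = np.random.rand(1, 224, 224, 3).astype("float32")

model.predict(image)  # warm-up: graph building and memory allocation
start = time.perf_counter()
for _ in range(100):
    model.predict(image)
elapsed = time.perf_counter() - start
print("average latency per image: %.1f ms" % (elapsed / 100 * 1000))
```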
I have installed the CPU version of TensorFlow. I have only a few images as a dataset, and I am training on a machine with 4 GB of RAM and a Core i5-3340M at 2.70 GHz, with batch size 1, and it is still extremely slow. All the images are the same size (200x185, I think). Will it train like this? Kindly tell me how I can speed up this process.
Training process
If your network is deep, it could take a long time to train using a CPU, since a CPU is not optimized for these calculations the way a GPU is.
I would suggest you get a graphics card; even an old graphics card can significantly improve performance (it could be something like 100x faster).
Let's put some numbers here. You are dealing with images of size 200x185, so we are talking about 37,000 features per image if we deal with gray levels; if we deal with RGB, multiply that by 3. How many images are you using for training? Keep in mind that SGD (stochastic gradient descent with mini-batch size 1) tends to be very slow for big datasets... Give us some numbers: how many training images, what "slow" means, and how much time one epoch takes. Other details (programming language, library (TensorFlow, etc.), optimizer, etc.) would help us judge whether your code is "slow" and whether it can be made faster.
Batch size is another parameter that affects training time: a larger batch size reduces the time per epoch, but it will require more epochs to reach the same performance as batch size 1.
And if your network is deep (a CNN, etc.), you should run it on a GPU.
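As a small illustration of the feature count mentioned above and of where the batch size enters (Keras assumed; the one-layer model and random data are placeholders, not the actual dataset):

```python
import numpy as np
import tensorflow as tf

width, height, channels = 200, 185, 3
print("features per image:", width * height, "(grayscale) or",
      width * height * channels, "(RGB)")

# Tiny placeholder model and random data, just to show where batch_size goes.
x = np.random.rand(64, height, width, channels).astype("float32")
y = np.random.randint(0, 2, size=(64,))
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(height, width, channels)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# batch_size=16 means 4 weight updates over these 64 samples; batch_size=1 would mean 64.
model.fit(x, y, batch_size=16, epochs=1, verbose=0)
```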
For learning purposes, I am trying to implement a CNN from scratch, but the results do not seem to improve over random guessing. I know this is not the best approach on home hardware, and following course.fast.ai I have obtained much better results via transfer learning, but for a deeper understanding I would like to see, at least in theory, how one could do it otherwise.
Testing on CIFAR-10 posed no issues - a small CNN trained from scratch in a matter of minutes with an error of less than 0.5%.
However, when trying to test against the Cats vs. Dogs Kaggle dataset, the results did not budge from 50% accuracy. The architecture is basically a copy of AlexNet, including the non-state-of-the-art choices (large filters, histogram equalization, Nesterov-SGD optimizer). For more details, I put the code in a notebook on GitHub:
https://github.com/mspinaci/deep-learning-examples/blob/master/dogs_vs_cats_with_AlexNet.ipynb
(I also tried different architectures, more VGG-like and using Adam optimizer, but the result was the same; the reason why I followed the structure above was to match as closely as possible the Caffe procedure described here:
https://github.com/adilmoujahid/deeplearning-cats-dogs-tutorial
and that seems to converge quickly enough, according to the author's description here: http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/).
I was expecting some fitting to happen quickly, possibly flattening out due to the many suboptimal choices made (e.g. small dataset, no data augmentation). Instead, I saw no improvement at all, as the notebook shows.
So I thought that maybe I was simply overestimating my GPU and my patience, and that the model was too complicated to even overfit my data in a few hours (I ran 70 epochs, each with roughly 360 batches of 64 images). Therefore I tried to overfit as hard as I could, running these other models:
https://github.com/mspinaci/deep-learning-examples/blob/master/Ridiculously%20overfitting%20models...%20or%20maybe%20not.ipynb
The purely linear model started showing some overfitting: around 53.5% training accuracy vs 52% validation accuracy (which I guess is thus my best result). That matched my expectations. However, to try to overfit as hard as I could, the second model is a simple two-layer feedforward neural network, without any regularization, that I trained on just 2000 images with a batch size of up to 500. I was expecting the NN to overfit wildly, quickly getting to 100% training accuracy (after all, it has 77M parameters for 2k pictures!). Instead, nothing happened, and the accuracy quickly flattened out at 50%.
Any tip about why none of the "multi-layer" models seems able to pick up any feature (be it "true" or due to overfitting) would be very much appreciated!
Note on versions etc.: the notebooks were run on Python 2.7, Keras 2.0.8 and Theano 0.9.0. The OS is Windows 10, and the GPU is a GeForce GTX 960M, which is not very powerful but should be sufficient for basic tasks.
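Along the lines of that last experiment, a scaled-down "can the network memorize a tiny dataset?" check might look like the sketch below (TensorFlow's Keras with random placeholder data is assumed here, not the notebook's actual setup); a small unregularized network is normally expected to approach 100% training accuracy on such a tiny set:

```python
import numpy as np
from tensorflow import keras

# Placeholder data: 200 random "images" with random binary labels.
x = np.random.rand(200, 64, 64, 3).astype("float32")
y = np.random.randint(0, 2, size=(200,))

# Small, unregularized two-layer feedforward net, scaled down from the one above.
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(64, 64, 3)),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, batch_size=32, epochs=200, verbose=0)
print("training accuracy:", model.evaluate(x, y, verbose=0)[1])
```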