Although the slim documentation states that train_image_classifier.py can be used to train models from scratch, I found it hard in practice. In my case, I am trying to train ResNet from scratch on a local machine with 6x K80s. I used this:
DATASET_DIR=/nv/hmart1/ashaban6/scratch/data/imagenet_RF_record
TRAIN_DIR=/nv/hmart1/ashaban6/scratch/train_dir
DEPTH=50
NUM_CLONES=8
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7,8" python train_image_classifier.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_name=imagenet \
  --model_name=resnet_v1_${DEPTH} \
  --max_number_of_steps=100000000 \
  --batch_size=32 \
  --learning_rate=0.1 \
  --learning_rate_decay_type=exponential \
  --dataset_split_name=train \
  --dataset_dir=${DATASET_DIR} \
  --optimizer=momentum \
  --momentum=0.9 \
  --learning_rate_decay_factor=0.1 \
  --num_epochs_per_decay=30 \
  --weight_decay=0.0001 \
  --num_readers=12 \
  --num_clones=$NUM_CLONES
I followed the same settings as suggested in the paper. I am using 8 GPUs on a local machine with batch_size 32, so the effective batch size is 32x8=256. The learning rate is initially set to 0.1 and is decayed by a factor of 10 every 30 epochs. After 70K steps (70000x256/1.2e6 ~ 15 epochs), the top-1 accuracy on the validation set is as low as ~14%, while it should be around 50% after that many iterations. I used this command to get the top-1 accuracy:
DATASET_DIR=/nv/hmart1/ashaban6/scratch/data/imagenet_RF_record
CHECKPOINT_FILE=/nv/hmart1/ashaban6/scratch/train_dir/
DEPTH=50
CUDA_VISIBLE_DEVICES="10" python eval_image_classifier.py --alsologtostderr --checkpoint_path=${CHECKPOINT_FILE} --dataset_dir=${DATASET_DIR} --dataset_name=imagenet --dataset_split_name=validation --model_name=resnet_v1_${DEPTH}
With the lack of working examples it is hard to say whether there is a bug in the slim training code or a problem in my script. Is anything wrong in my script? Has anyone successfully trained ResNet from scratch?
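For reference, here is the schedule arithmetic I am assuming (a quick standalone sanity check of the flags above, not code from slim; 1,281,167 is the ImageNet-1k training-set size):

# Sanity-check the effective batch size and exponential-decay schedule
# implied by the training flags above (standalone arithmetic, not slim code).
num_train_images = 1281167        # ImageNet-1k training split
batch_per_clone = 32
num_clones = 8
num_epochs_per_decay = 30

effective_batch = batch_per_clone * num_clones                 # 256
steps_per_epoch = float(num_train_images) / effective_batch    # ~5005 steps
decay_every_steps = num_epochs_per_decay * steps_per_epoch     # ~150k steps

steps_done = 70000
print("epochs completed: %.1f" % (steps_done / steps_per_epoch))  # ~14 (~15 with the 1.2e6 figure above)
print("first LR decay at step ~%d" % decay_every_steps)           # ~150000, so LR is still 0.1 at 70k steps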
Related
I'm training DeepSpeech from scratch (without a checkpoint) with a language model generated using KenLM, as described in its documentation. The dataset is the Common Voice dataset for the Persian language.
My configurations are as follows:
Batch size = 2 (due to cuda OOM)
Learning rate = 0.0001
Num. neurons = 2048
Num. epochs = 50
Train set size = 7500
Test and Dev sets size = 5000
Dropout for layers 1 to 5 = 0.2 (0.4 was also tried, with the same results)
Train and val losses decrease during training, but after a few epochs the val loss stops decreasing. The train loss is about 18 and the val loss is about 40.
The predictions are all empty strings at the end of the process. Any ideas how to improve the model?
The Persian dataset in Common Voice has around 280 hours of validated audio, so this should be enough to create a model that has better accuracy than you're reporting.
What would help here is to know the CER and WER figures for the model. Seeing these indicates whether the best course of action lies with the hyperparameters of the acoustic model or with the KenLM language model. The difference is explained here, in the testing section of the DeepSpeech PlayBook.
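For reference, WER is just a word-level edit distance normalized by the length of the reference (CER is the same computed over characters); a minimal standalone sketch, not DeepSpeech's own implementation:

# Word error rate (WER): Levenshtein distance over words, normalized by the
# reference length. Standalone sketch, not DeepSpeech's implementation.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return float(d[len(r)][len(h)]) / max(len(r), 1)

print(wer("in neighborhood", "in the neighborhood"))  # 0.5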
It is also likely you would need to perform transfer learning on the Persian dataset. I am assuming that the Persian dataset is written in Alefbā-ye Fārsi. This means that you need to drop the alphabet layer in order to learn from the English checkpoints (which use Latin script).
More information on how to perform transfer learning is in the DeepSpeech documentation, but essentially you need to do two things (combined in the command sketch after this list):
Use the --drop_source_layers 3 flag to drop the source layers, to allow for transfer learning from another alphabet
Use the --load_checkpoint_dir deepspeech-data/deepspeech-0.9.3-checkpoint flag to specify where to load checkpoints from on which to perform transfer learning.
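Putting the two flags together with the settings from the question, a transfer-learning run would look roughly like this (file paths, CSV names and the alphabet file are placeholders; the remaining flags are the standard DeepSpeech training flags):

python DeepSpeech.py \
  --drop_source_layers 3 \
  --load_checkpoint_dir deepspeech-data/deepspeech-0.9.3-checkpoint \
  --save_checkpoint_dir persian-checkpoints \
  --alphabet_config_path data/alphabet_fa.txt \
  --train_files data/cv-fa/train.csv \
  --dev_files data/cv-fa/dev.csv \
  --test_files data/cv-fa/test.csv \
  --n_hidden 2048 \
  --epochs 50 \
  --train_batch_size 2 \
  --dropout_rate 0.2 \
  --learning_rate 0.0001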
Maybe you need to decrease the learning rate or use a learning rate scheduler.
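As a generic illustration of the scheduler idea (shown here with Keras' ReduceLROnPlateau purely to illustrate the concept; this is not DeepSpeech code):

# Halve the learning rate whenever the validation loss has not improved
# for 3 epochs (generic Keras illustration of "reduce LR on plateau").
from keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5,
                              patience=3, min_lr=1e-6, verbose=1)
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=50, callbacks=[reduce_lr])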
I'm trying to train my YOLO model to identify fire extinguishers and to label them as "Fire Safety". Currently I get either overfit or underfit results (see the images below).
My sample image set with annotations is around 1500 images.
I use the yolo-new.cfg config with width=608 and height=608.
And I have trained using the following command:
python flow --model cfg/yolo-new.cfg --labels one_label.txt --train \
  --trainer adam \
  --dataset "C://Users//G//Desktop//Development//ML//YOLO//BBox-Label-Tool//Images//002" \
  --annotation "C://Users//G//Desktop//Development//ML//YOLO//BBox-Label-Tool//AnnotationsXML//002" \
  --batch 4 --gpu 0.8
So after 13000 steps:
So I went to validate my results and this is what I get (checkpoint 13000):
I thought this might be a case of severe overfitting, so I iterated through the checkpoints to see which gives the closest fit.
This is what I get using checkpoint 6500
This is what I get using checkpoint 6000
This is what I get using checkpoint 5500
So, as you can see, checkpoint 6000 gives the best result in my case, but it isn't good enough. How do I improve on this? Increase the batch size? (My GTX 1070 Ti can't handle it; CUDA out of memory occurs.) Any ideas how to solve this?
Using YOLOv3 to train my image set solved my issue. https://github.com/AlexeyAB/darknet
One thing to note is not to leave any blank annotations; perhaps this is one of the reasons why the detection did not work as planned.
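A quick way to check for blank annotations, assuming Pascal VOC-style XML files like the ones darkflow reads (the folder path below is a placeholder):

# Flag annotation files that contain no <object> entries, i.e. images that
# were annotated but ended up with no boxes. Folder path is a placeholder.
import os
import xml.etree.ElementTree as ET

annotation_dir = "AnnotationsXML/002"
for name in sorted(os.listdir(annotation_dir)):
    if not name.endswith(".xml"):
        continue
    root = ET.parse(os.path.join(annotation_dir, name)).getroot()
    if not root.findall("object"):
        print("no boxes in", name)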
I'm running a Keras model for binary classification on two separate computers; the first runs Python 2.7.5 with TensorFlow 1.0.1 and Keras 2.0.2 and computes on the CPU; the second runs Python 2.7.5 with TensorFlow 1.2.1 and Keras 2.0.6 and uses the GPU.
My model is modified from the siamese network model at https://gist.github.com/mmmikael/0a3d4fae965bdbec1f9d. I added regularization (activity_regularizer=keras.regularizers.l1 in the Dense layers), but for everything else I'm using the same structure as the mnist example.
I use the exact same code and training data on both computers, but the first gives me a classification accuracy of 86% and a recall of 88% on the test set, while the other gives me an accuracy of 52% and a recall of 100% (it classifies every test sample as "positive"). These results are reproducible with separate initializations.
I'm not even sure where to start looking for why the performance is so vastly different. I've been reading through the keras/tensorflow release notes to see if any of the changes pertain to something in my model, but I don't see anything that looks helpful. It doesn't make sense that a version change in tensorflow/keras would cause that much of a difference in the performance. Any sort of help figuring this out would be greatly appreciated.
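For context, the kind of layer change described above looks roughly like this in Keras 2 (layer size and rate here are illustrative, not the exact values used):

# Dense layer with an L1 activity regularizer in Keras 2; note that
# regularizers.l1(rate) is called to produce the regularizer instance.
from keras.layers import Dense
from keras import regularizers

hidden = Dense(128, activation='relu',
               activity_regularizer=regularizers.l1(1e-4))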
I am trying to use the ssd_inception_v2_coco pre-trained model from the TensorFlow Object Detection API, training it on a single-class dataset and applying transfer learning. I trained the net for around 20k steps (total loss around 1), and using the checkpoint data I created the inference_graph.pb and used it in the detection code.
To my surprise, when I tested the net on the training data, the graph was not able to detect even 1 out of 11 cases (0/11). I am lost in finding the issue.
What might be the possible mistake?
P.S.: I am not able to run train.py and eval.py at the same time due to memory issues, so I don't have precision info from TensorBoard.
Has anyone faced a similar kind of issue?
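For reference, the detection side follows the usual frozen-graph pattern; a minimal TF1-style sketch (paths are placeholders, not my exact code) that also prints the raw scores before any threshold is applied:

# Run the frozen detection graph on one training image and print the raw
# top scores (before thresholding). Paths below are placeholders.
import numpy as np
import tensorflow as tf
from PIL import Image

graph_def = tf.GraphDef()
with tf.gfile.GFile("inference_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

with tf.Graph().as_default() as graph:
    tf.import_graph_def(graph_def, name="")

image = np.expand_dims(np.array(Image.open("train_image.jpg").convert("RGB")), axis=0)

with tf.Session(graph=graph) as sess:
    scores, classes = sess.run(
        [graph.get_tensor_by_name("detection_scores:0"),
         graph.get_tensor_by_name("detection_classes:0")],
        feed_dict={graph.get_tensor_by_name("image_tensor:0"): image})
    print("top raw scores:", scores[0][:5])
    print("top classes:   ", classes[0][:5])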
For learning purposes, I am trying to implement a CNN from scratch, but the results do not seem to improve over random guessing. I know this is not the best approach on home hardware, and following course.fast.ai I have obtained much better results via transfer learning, but for a deeper understanding I would like to see, at least in theory, how one could do it otherwise.
Testing on CIFAR-10 posed no issues: a small CNN trained from scratch in a matter of minutes with an error of less than 0.5%.
However, when trying to test on the Cats vs. Dogs Kaggle dataset, the results did not budge from 50% accuracy. The architecture is basically a copy of AlexNet, including the non-state-of-the-art choices (large filters, histogram equalization, Nesterov SGD optimizer). For more details, I put the code in a notebook on GitHub:
https://github.com/mspinaci/deep-learning-examples/blob/master/dogs_vs_cats_with_AlexNet.ipynb
(I also tried different architectures, more VGG-like and using Adam optimizer, but the result was the same; the reason why I followed the structure above was to match as closely as possible the Caffe procedure described here:
https://github.com/adilmoujahid/deeplearning-cats-dogs-tutorial
and that seems to converge quickly enough, according to the author's description here: http://adilmoujahid.com/posts/2016/06/introduction-deep-learning-python-caffe/).
I was expecting some fitting to happen quickly, possibly flattening out due to the many suboptimal choices made (e.g. small dataset, no data augmentation). Instead, I saw no improvement at all, as the notebook shows.
So I thought that maybe I was simply overestimating my GPU and my patience, and that the model was too complicated to even overfit my data in a few hours (I ran 70 epochs, each with roughly 360 batches of 64 images). Therefore I tried to overfit as hard as I could, running these other models:
https://github.com/mspinaci/deep-learning-examples/blob/master/Ridiculously%20overfitting%20models...%20or%20maybe%20not.ipynb
The purely linear model started showing some overfitting: around 53.5% training accuracy vs. 52% validation accuracy (which I guess is thus my best result). That matched my expectations. However, to try to overfit as hard as I could, the second model is a simple two-layer feedforward neural network without any regularization, which I trained on just 2000 images with a batch size of up to 500. I was expecting the NN to overfit wildly, quickly getting to 100% training accuracy (after all, it has 77M parameters for 2k pictures!). Instead, nothing happened, and the accuracy quickly flattened at 50%.
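For concreteness, that second model is roughly the following (sizes here are illustrative; the exact code is in the notebook linked above):

# Deliberate overfit test: a two-layer fully connected network on a small
# subset of Cats vs. Dogs, no regularization. Sizes are illustrative only.
from keras.models import Sequential
from keras.layers import Flatten, Dense
from keras.optimizers import SGD

model = Sequential([
    Flatten(input_shape=(128, 128, 3)),
    Dense(1500, activation='relu'),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True),
              loss='binary_crossentropy', metrics=['accuracy'])

# x_small, y_small: 2000 preprocessed images and their 0/1 labels
# model.fit(x_small, y_small, batch_size=500, epochs=100)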
Any tip about why none of the "multi-layer" models seems able to pick up any feature (be it "true" or due to overfitting) would be very much appreciated!
Note on versions etc.: the notebooks were run on Python 2.7, Keras 2.0.8, Theano 0.9.0. The OS is Windows 10, and the GPU is a not-so-powerful GeForce GTX 960M, which should nevertheless be sufficient for basic tasks.