I am working on building an object detection model with 22 new classes (most of which are not in the COCO or PETS datasets).
What I've already done is:
Prepared images with multiple labels using LabelImg.
Halved the size of images bigger than 500k.
Converted the XML annotations to a CSV file.
Converted the CSV file and images to TFRecords.
Using the TensorFlow sample config files, I trained with several pretrained checkpoints.
Results: SSD_Mobilenet and SSD_Inception found no classes at all (loss ~10.0), while Faster RCNN Inception did succeed in detecting some of the objects (loss ~0.7).
My questions are:
What is the difference between train.py from Object Detection (which I used above), retrain.py from image_retraining, and train_image_classifier.py from Slim?
Which is better for my task? Or should I do it in a different way?
While running train.py on FRCNN Inception I found that the loss stayed around 0.7 and did not go lower even after 100k steps. Is there a target loss to aim for?
How do you suggest changing the config file to improve this?
I found other models, for instance Inception V4 etc., which don't have sample config files (TF-Slim). Should I try them, and if so, how can I use them?
I am pretty new to this field and need some support in understanding the terms and actions.
BTW: I am using a GTX 1060 GPU for training, but eval does not run in parallel, so I can't get the mAP for validation. I tried to force eval onto the CPU, but with no success.
Thanks.
1) What is the difference between train.py from Object Detection, which I used in the above, retrain.py from image_retraining, and train_image_classifier.py from Slim?
Ans: As far as I know, none. train.py imports trainer.py, which calls slim.learning.train (the same function used in train_image_classifier.py) to do the training.
2) Which is better for my task? Or should I do it in a different way?
Ans: The answer above covers this question too.
3) While running train.py on FRCNN Inception I found that the loss was around 0.7 and not going lower even after 100k steps. Is there any goal in terms of loss to achieve?
Ans: If you use TensorBoard to visualize your results, you will see that once your classification loss curve stops changing much (has converged), your model is trained. That said, a loss of 0.7 is high after training for so many steps; check your pipeline config file parameters.
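You can start TensorBoard against the training directory to watch the loss curves while training runs (the path below is an assumption; point --logdir at whatever you passed as train_dir):

tensorboard --logdir=training/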
4) How do you suggest changing the config file to improve this?
Ans: The learning rate is a good place to start.
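For example, in the sample Faster R-CNN configs the learning rate is set under train_config. The values below mirror the shipped samples and are illustrative, not tuned recommendations:

train_config: {
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 90000
            learning_rate: 0.00002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
}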
5) I found other models, for instance Inception V4 etc., which don't have sample config files (TF-Slim). Should I try them, and if so, how can I use them?
Ans: Currently I don't have an answer for this, but I will get back to you.
(Not a complete answer, but I hope it helps in some way!)
Are your annotated objects small relative to the image size?
I had the same problem of no or few detections with SSD, and found that the model is very sensitive to the settings that determine the size of the box proposals (the anchor generator). Here is a link with some details.
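For example, an SSD pipeline config exposes the proposal sizes through the anchor generator. The values below are the sample-config defaults, shown only for orientation; lowering min_scale is a common tweak for small objects:

anchor_generator {
  ssd_anchor_generator {
    num_layers: 6
    min_scale: 0.2     # lower this if your objects are small relative to the image
    max_scale: 0.95
    aspect_ratios: 1.0
    aspect_ratios: 2.0
    aspect_ratios: 0.5
  }
}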
Further, having an active eval job running is very important when debugging and tuning a model. TotalLoss and the other parameters returned by the train job do not tell you how well the model actually performs, only whether it is converging. The eval job gives you e.g. mAP, which is a real measure of performance.
A simple way to force an eval job onto the CPU is the following (a command-line sketch follows the list):
a) install a virtual environment dedicated to the eval job, instructions here
b) activate the virtual environment and install tensorflow cpu in it (yes, you should install TensorFlow again, this time without GPU support)
c) start the train job as usual on your tensorflow-gpu (in whatever way you have installed it)
d) run the eval job in the virtual environment (this forces it onto the CPU and works great! I also run TensorBoard from this installation to minimise the risk of interference with the train job)
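Put together, the setup might look like this (the script path and directories are assumptions based on the TF1 Object Detection API layout; adjust them to your checkout):

# Dedicated CPU-only environment for the eval job
python -m venv eval_env
source eval_env/bin/activate
pip install tensorflow        # CPU build, without GPU support

# Run eval here while train.py keeps running on the GPU elsewhere
python object_detection/eval.py \
    --logtostderr \
    --pipeline_config_path=training/pipeline.config \
    --checkpoint_dir=training/ \
    --eval_dir=eval/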
Retraining adds a layer on top of a pretrained model, which saves you time. It is useful for thousands of pictures, unnecessary for millions of labelled pictures, and less accurate than training from scratch. There are template config files; if there is no config file for your model, create your own. Look at the explanations on the TensorFlow GitHub.
I am using the TensorFlow Object Detection API to retrain a COCO-pretrained Faster RCNN Inception v2 model on my custom dataset, and recently noticed that for several of my models the BoxClassifierLoss gets worse over the course of training (e.g., from 0.17 up to 0.38, back down to 0.24 after 100 epochs, and thereafter worsening again or fluctuating without improvement).
Therefore I am interested in freezing the BoxClassifier to preserve the initial weights that apparently work better.
I read that there is a 'freeze_variables' parameter in the train.proto, but I am unsure as to what variables to freeze exactly.
To the best of my understanding, Vinod's answer is not related to the question asked.
If you want to freeze your model to export it, then you can use export_inference_graph.
But I understand that what you wish is to freeze variables during training.
As you mentioned yourself, you can specify variables in update_trainable_variables or freeze_variables in order to choose which variables will be trained and which will not.
Essentially these are fed to the filter_variables function on your graph in order to choose the variables to include in and exclude from training. As the description explains, it expects a regular-expression pattern. To find your variables' names, so you can include or exclude them, you can inspect your graph; one way to do so is the Graph tab in TensorBoard.
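For illustration, freezing the second-stage box predictor might look like this in the train_config of your pipeline config. The scope name "SecondStageBoxPredictor" is an assumption based on common Faster R-CNN variable scopes; verify the exact names in TensorBoard first:

train_config: {
  # Variables matching this pattern keep their current (pretrained) weights
  freeze_variables: "SecondStageBoxPredictor"
}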
On the other hand, I want to point out that this might not be the solution in your case. At the beginning of a training session it is natural to see high loss or a loss increase. However, if the loss still fluctuates after a full training session, you should inspect the magnitude of the fluctuation. If it's a minor fluctuation, it's natural; if the magnitude is large, then maybe something is wrong in the training configuration. Further analysis of what is going wrong can only be done with more information, e.g. the config file, loss graph, data examples, etc.
You can freeze the model.ckpt checkpoint files, which are stored in the following location:
C:\tensorflow1\models\research\object_detection\training
These checkpoint files are saved frequently during training, so you can check the details of each one as your error decreases and then freeze the best checkpoint into your final model.
For freezing the model, you can use following command:
python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/faster_rcnn_inception_v2_pets.config --trained_checkpoint_prefix training/model.ckpt-XXXX --output_directory inference_graph
Here, XXXX is the number in the file name model.ckpt-XXXX.meta.
In my case it is model.ckpt-1970.meta, so XXXX = 1970.
I'm trying to train my YOLO model to identify fire extinguishers and label them as "Fire Safety". Currently I get either overfitted or underfitted results (see below).
I have around ~1500 annotated sample images.
My yolo-new.cfg config uses width=608 and height=608.
And I have trained using the following command:
python flow --model cfg/yolo-new.cfg --labels one_label.txt --train
--trainer adam --dataset "C://Users//G//Desktop//Development//ML//YOLO//BBox-Label-Tool//Images//002"
--annotation "C://Users//G//Desktop//Development//ML//YOLO//BBox-Label-Tool//AnnotationsXML//002"
--batch 4 --gpu 0.8
After 13000 steps, I went to validate my results, and this is what I got (checkpoint 13000):
Thinking this might be a case of severe overfitting, I iterated through the checkpoints to see which had the closest fit.
This is what I got using checkpoint 6500.
This is what I got using checkpoint 6000.
This is what I got using checkpoint 5500.
So, as you can see, checkpoint 6000 gives the best result in my case, but it isn't good enough. How do I improve on this? Increase the batch size? (My GTX 1070 Ti can't handle it; CUDA runs out of memory.) Any ideas on how to solve this?
Using YOLOv3 to train my image set solved my issue. https://github.com/AlexeyAB/darknet
One thing to note: do not leave any objects unannotated ("blanks") during annotation; this might be one of the reasons why the detection did not work as planned.
Hi, I have a problem with Keras on Python 3.6.
My environment is Keras with Python, CPU only.
The problem is that when I repeatedly call predict on the same Keras model with different inputs, it gets slower and slower.
My code is as simple as this:

for i in range(100):
    model.predict(x)
The first run is fast, maybe 2 seconds, but the second run takes 3 seconds and the third takes 5 seconds... it keeps getting slower, even with the same input.
How can I keep repeated Keras predictions fast? The slowdown is critical for my application.
How can I fix it?
Try using the __call__ method directly. The documentation of the predict method states the following:
For small numbers of inputs that fit in one batch, directly use __call__() for faster execution, e.g., model(x).
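A minimal sketch of that advice, assuming model and x are your own Keras model and a single small batch:

# Calling the model directly avoids the per-call overhead that predict() adds
for i in range(100):
    y = model(x, training=False)   # returns a tensor; use y.numpy() if needed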
I see that performance is critical in this case. So, if that doesn't help, you could use OpenVINO, which is optimized for Intel hardware but should work with any CPU. Your performance should be much better than with Keras directly.
It's rather straightforward to convert the Keras model to OpenVINO. The full tutorial on how to do it can be found here. Some snippets below.
Install OpenVINO
The easiest way to do it is using PIP. Alternatively, you can use this tool to find the best way in your case.
pip install openvino-dev[tensorflow2]
Save your model as SavedModel
OpenVINO cannot convert an HDF5 model, so you have to save it as a SavedModel first.
import tensorflow as tf
from custom_layer import CustomLayer   # only needed if your model uses custom layers

model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')
Use Model Optimizer to convert SavedModel model
The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (change data_type). Run in the command line:
mo --saved_model_dir "model" --data_type FP32 --output_dir "model_ir"
Run the inference
The converted model can be loaded by the runtime and compiled for a specific device e.g. CPU or GPU (integrated into your CPU like Intel HD Graphics). If you don't know what is the best choice for you, use AUTO.
from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")
# Get output layer
output_layer_ir = compiled_model_ir.output(0)
# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]
Disclaimer: I work on OpenVINO.
If your model predicts in batches, different samples in the same batch take slightly different amounts of time, and as you call predict again and again over more and more batches, the measured prediction time will get longer and longer.
My goal is to reduce the size and complexity of a pretrained model (a TensorFlow frozen graph as a Protobuf .pb file) as far as possible, to make inference (in my case real-time object detection using webcams) as fast as possible.
(See my project repo for more information: https://github.com/GustavZ/realtime_object_detection)
Let's take a look at the pre-trained ssd_mobilenet_v1_coco provided by the tensorflow object detection API:
link to ssd_mobilenet graph
Which layers are not necessary for inference (i.e., needed only for the already-completed training) and can thus be removed (from the config file, when exporting a new frozen model with the export_inference_graph.py script)?
It would be very nice to get a general answer on how to optimize models for inference, as well as an answer for my specific case, since I think this could be of interest to others.
EDIT: I now know about the optimize_for_inference.py script provided by TensorFlow, but I have no experience using it. For example, how do I know which Input and Output nodes are really necessary, or how do I read them from TensorBoard?
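From what I can tell, a typical invocation looks something like this (the node names below are the defaults for detection models exported by the Object Detection API; I assume they can be verified in TensorBoard's Graph tab):

python -m tensorflow.python.tools.optimize_for_inference \
    --input frozen_inference_graph.pb \
    --output optimized_graph.pb \
    --frozen_graph True \
    --input_names image_tensor \
    --output_names "detection_boxes,detection_scores,detection_classes,num_detections"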
I am writing a neural network in tensorflow and I want to be able to export my final trained network and import it in another program to play a game. I have found multiple forum posts like:
Tensorflow: How to use a trained model in a application?
Tensorflow: how to save/restore a model?
I also saw in the TF documentation that they use Estimators to save the model, but I am not sure whether that is what I'm looking for or how to apply it.
Those posts talk about exporting the entire session, importing it into the application, and using Session.run; but as I understand it, that requires feeding in the expected output and will run another training step on my network. I don't want to continue training my network, it's finished; I now only want to evaluate a specific state given to me by the game.
Thanks in advance for any help available.
As far as I know, there are two ways of doing it:
checkpoint files (MetaGraph)
SavedModel
SavedModel is very convenient, but the learning curve is steeper than for checkpoint files. You can check this tutorial.
Also, importing a model does not continue the training; it basically restores all the variables you learned.
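A minimal TF1-style sketch of inference-only restoring. The file prefix 'model' and the tensor names 'input:0'/'output:0' are placeholders; use the names from your own graph:

import tensorflow as tf

with tf.Session() as sess:
    # Rebuild the graph from the .meta file and restore the trained weights
    saver = tf.train.import_meta_graph('model.meta')
    saver.restore(sess, 'model')

    graph = tf.get_default_graph()
    x = graph.get_tensor_by_name('input:0')    # your input placeholder
    y = graph.get_tensor_by_name('output:0')   # your prediction op

    # Running only the prediction op never touches the train op,
    # so no additional training happens here.
    game_state = [[0.0, 0.0, 0.0, 0.0]]        # example input for this sketch
    prediction = sess.run(y, feed_dict={x: game_state})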