RuntimeError during processing .jpg file (exception 20) retrain inceptionV3 tensorflow - python

My system setup
OS: Ubuntu 16.04LTS
GPU: GTX1060
tensorflow version: tensorflow-gpu (1.6.0)
I am trying to retrain an InceptionV3 classifier that I trained on the MSCeleb-1M dataset using https://github.com/tensorflow/models/blob/master/research/slim/train_image_classifier.py.
I then tried to retrain it on custom images and classes with https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/image_retraining/retrain.py.
I noticed that the script targets an outdated InceptionV3 architecture, so I modified the bottleneck tensor and input tensor names to match the nodes of my own InceptionV3 model. However, when I feed my own images into the retrain script, I keep hitting this error:
INFO:tensorflow:Creating bottleneck at /home/m360/MachineLearning/models/msceleb-small-inception-v3/bottleneck/tulips/5524946579_307dc74476.jpg_inception_v3.txt
Traceback (most recent call last):
File "tensorflow/examples/image_retraining/retrain.py", line 1486, in <module>
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "tensorflow/examples/image_retraining/retrain.py", line 1187, in main
bottleneck_tensor, FLAGS.architecture)
File "tensorflow/examples/image_retraining/retrain.py", line 500, in cache_bottlenecks
resized_input_tensor, bottleneck_tensor, architecture)
File "tensorflow/examples/image_retraining/retrain.py", line 442, in get_or_create_bottleneck
bottleneck_tensor)
File "tensorflow/examples/image_retraining/retrain.py", line 397, in create_bottleneck_file
str(e)))
RuntimeError: Error during processing file /home/m360/MachineLearning/my_dataset/flower_photos/tulips/5524946579_307dc74476.jpg (20)
I do not understand what has gone wrong: I have searched online and found no documentation for this specific exception code. I suspect the problem might be in the script's decode_jpeg function, but I could not figure it out.
Any help would be appreciated. Thank you very much.
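One way to narrow this down is to rule out the image files themselves before looking at the graph. The following is a minimal sketch, not part of retrain.py: it assumes the dataset is a directory tree of .jpg/.jpeg files and only checks the JPEG start and end markers. The function names looks_like_jpeg and find_suspect_files are hypothetical. If every file passes, the file contents are less likely to be the cause.

```python
import os

JPEG_SOI = b"\xff\xd8"  # start-of-image marker at the beginning of a JPEG
JPEG_EOI = b"\xff\xd9"  # end-of-image marker at the end of a JPEG


def looks_like_jpeg(path):
    """Heuristic check: the file starts with SOI and ends with EOI."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) >= 4 and data[:2] == JPEG_SOI and data[-2:] == JPEG_EOI


def find_suspect_files(root):
    """Return a sorted list of .jpg/.jpeg files under root that fail the check."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".jpg", ".jpeg")):
                full_path = os.path.join(dirpath, name)
                if not looks_like_jpeg(full_path):
                    suspects.append(full_path)
    return sorted(suspects)
```

This is only a heuristic: some valid JPEGs carry trailing bytes after the end marker, so a flagged file deserves a closer look rather than automatic deletion.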

Related

Error while loading fine-tuned simpletransformer model in Docker Container

I am saving and loading a model using torch.save() and torch.load() commands.
While loading a fine-tuned simple transformer model in Docker Container, I am facing this error which I am not able to resolve:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 594, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.7/dist-packages/torch/serialization.py", line 853, in _load
result = unpickler.load()
File "/usr/local/lib/python3.7/dist-packages/transformers/models/xlm_roberta/tokenization_xlm_roberta.py", line 161, in __setstate__
self.sp_model.Load(self.vocab_file)
File "/usr/local/lib/python3.7/dist-packages/sentencepiece.py", line 367, in Load
return self.LoadFromFile(model_file)
File "/usr/local/lib/python3.7/dist-packages/sentencepiece.py", line 177, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "/home/jupyter/.cache/huggingface/transformers/9df9ae4442348b73950203b63d1b8ed2d18eba68921872aee0c3a9d05b9673c6.00628a9eeb8baf4080d44a0abe9fe8057893de20c7cb6e6423cddbf452f7d4d8": No such file or directory Error #2
If anyone has any idea about it, please let me know.
I am using:
torch ==1.7.1+cu101
sentence-transformers 0.3.9
simpletransformers 0.51.15
transformers 4.4.2
tensorflow 2.2.0
I suggest saving and loading state_dict objects instead. They are plain Python dictionaries, so they can easily be saved, updated and restored, which gives you flexibility when restoring the model later. Here are the recommended save/load methods using state_dict:
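Save
import torch
torch.save(model.state_dict(), PATH)  # store only the parameters, not the pickled object
Load
model = TheModelClass(*args, **kwargs)  # rebuild the architecture first
model.load_state_dict(torch.load(PATH))
model.eval()  # switch dropout and batch-norm layers to inference mode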

Where to download the resnet50.h5 file

I got the following error when trying to load a ResNet50 model. Where should I download the resnet50.h5 file?
Traceback (most recent call last):
File "C:\Users\drlng\Desktop\image-captioning-keras-resnet-main\app.py", line 61, in <module>
resnet = load_model('resnet.h5')
File "C:\Users\drlng\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\keras\saving\save.py", line 211, in load_model
loader_impl.parse_saved_model(filepath)
File "C:\Users\drlng\AppData\Local\Programs\Python\Python38\lib\site-packages\tensorflow\python\saved_model\loader_impl.py", line 111, in parse_saved_model
raise IOError("SavedModel file does not exist at: %s/{%s|%s}" %
OSError: SavedModel file does not exist at: resnet.h5/{saved_model.pbtxt|saved_model.pb}
I use resnet50.py to build my model
and read the ResNet50 weights from the link below:
weights best!
You can download pre-trained models.
It works well.
If you are looking for pre-trained weights for ResNet-50, you can find them here
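The "SavedModel file does not exist" message in the traceback typically appears when load_model cannot treat the given path as an HDF5 file, for example because 'resnet.h5' is missing from the working directory, and so falls back to looking for a SavedModel directory. A minimal sketch for checking the path before loading follows; it uses only the standard library, and the helper name describe_model_file is hypothetical.

```python
import os

# Every HDF5 file normally begins with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def describe_model_file(path):
    """Classify a path as 'missing', 'directory', 'hdf5' or 'not-hdf5'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.isdir(path):
        return "directory"
    with open(path, "rb") as f:
        header = f.read(len(HDF5_SIGNATURE))
    return "hdf5" if header == HDF5_SIGNATURE else "not-hdf5"
```

Calling describe_model_file('resnet.h5') from the directory where app.py runs shows whether the file is absent or was downloaded in some other form, such as an HTML error page saved under the .h5 name.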

Parallel Keras model training using python multiprocessing

I am training multiple Keras MLP models simultaneously on a 64-core CPU workstation.
To do this, I use a Python multiprocessing pool so that each CPU trains one model.
For each model I use an EarlyStopping and a ModelCheckpoint callback, defined like this:
es = EarlyStopping(monitor='val_mse', mode='min', verbose=VERBOSE_ALL, patience=10)
mc = ModelCheckpoint('best_model.h5', monitor='val_mse', mode='min', verbose=VERBOSE_ALL, save_best_only=True)
Using a single model the training runs through without any problems.
When I start using the multiprocessing pool, however, I run into issues with the callbacks. An HDF5 model-saving error comes up:
Traceback (most recent call last):
File "C:\Users\ICN_admin\Anaconda3\lib\site-packages\tensorflow_core\python\keras\callbacks.py", line 1029, in _save_model
self.model.save(filepath, overwrite=True)
File "C:\Users\ICN_admin\Anaconda3\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1008, in save
signatures, options)
File "C:\Users\ICN_admin\Anaconda3\lib\site-packages\tensorflow_core\python\keras\saving\save.py", line 112, in save_model
model, filepath, overwrite, include_optimizer)
File "C:\Users\ICN_admin\Anaconda3\lib\site-packages\tensorflow_core\python\keras\saving\hdf5_format.py", line 92, in save_model_to_hdf5
f = h5py.File(filepath, mode='w')
File "C:\Users\ICN_admin\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 394, in __init__
swmr=swmr)
File "C:\Users\ICN_admin\Anaconda3\lib\site-packages\h5py\_hl\files.py", line 176, in make_fid
fid = h5f.create(name, h5f.ACC_TRUNC, fapl=fapl, fcpl=fcpl)
File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py\h5f.pyx", line 105, in h5py.h5f.create
OSError: Unable to create file (file signature not found)
This error occurs more or less sporadically; I can catch the exception and repeat the model training.
But is there a way to work around this issue by setting flags or using a different file format for the callback?
Tensorflow version: 2.1.0
Keras version: 2.3.1
Imports used:
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.callbacks import ModelCheckpoint
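Every pool worker in the setup above is given the same 'best_model.h5' path, so concurrent workers may try to create the same HDF5 file at once, which could explain the sporadic error. A minimal sketch, assuming that is the cause, derives a separate checkpoint path for each model. The helper name checkpoint_path and the model_id argument are hypothetical.

```python
import os


def checkpoint_path(model_id, directory="checkpoints"):
    """Build a checkpoint file name that is unique for each model_id."""
    return os.path.join(directory, "best_model_{}.h5".format(model_id))


# Each worker would receive its own model_id (for example its index in the
# list of jobs handed to the pool), so no two workers share a file.
paths = [checkpoint_path(i) for i in range(4)]
```

Inside the worker function, the path would then replace the fixed file name, as in ModelCheckpoint(checkpoint_path(model_id), monitor='val_mse', mode='min', save_best_only=True). The target directory has to exist before training starts.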

How can I convert Tensorflow frozen graph to TF Lite model?

I am using Faster RCNN (the repo I am using can be found in the link) to detect cars in a video frame. I use Keras 2.2.3 and Tensorflow 1.15.0, and I want to deploy and run the model on my Android device. Each part of Faster RCNN is implemented in Keras, so to deploy it on Android I want to convert each part to a TF Lite model. The final network, the classifier, has a custom layer called RoiPoolingConv, and I cannot convert this final network to TF Lite. At first, I tried the following:
converter = tf.lite.TFLiteConverter.from_keras_model_file('model_classifier_with_architecture.h5',
    custom_objects={"RoiPoolingConv": RoiPoolingConv})
tfmodel = converter.convert()
open("model_cls.tflite", "wb").write(tfmodel)
This gives the following error
Traceback (most recent call last):
File "Keras-FasterRCNN/model_to_tflite.py", line 26, in <module>
custom_objects={"RoiPoolingConv": RoiPoolingConv})
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/lite/python/lite.py", line 747, in from_keras_model_file
keras_model = _keras.models.load_model(model_file, custom_objects)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 146, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 212, in load_model_from_hdf5
custom_objects=custom_objects)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/saving/model_config.py", line 55, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/layers/serialization.py", line 89, in deserialize
printable_module_name='layer')
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 192, in deserialize_keras_object
list(custom_objects.items())))
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1131, in from_config
process_node(layer, node_data)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 1089, in process_node
layer(input_tensors, **kwargs)
File "/home/alp/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 475, in __call__
previous_mask = _collect_previous_mask(inputs)
File "/home/alp/.local/lib/python3.6/site-packages/keras/engine/base_layer.py", line 1441, in _collect_previous_mask
mask = node.output_masks[tensor_index]
AttributeError: 'Node' object has no attribute 'output_masks'
As a workaround, I tried converting the Keras models to Tensorflow frozen graphs and then running the TF Lite conversion on those frozen graphs. This yields the following error:
Traceback (most recent call last):
File "/home/alp/.local/bin/toco_from_protos", line 11, in <module>
sys.exit(main())
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/lite/toco/python/toco_from_protos.py", line 59, in main
app.run(main=execute, argv=[sys.argv[0]] + unparsed)
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/alp/.local/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/alp/.local/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/home/alp/.local/lib/python3.6/site-packages/tensorflow/lite/toco/python/toco_from_protos.py", line 33, in execute
output_str = tensorflow_wrap_toco.TocoConvert(model_str, toco_str, input_str)
Exception: We are continually in the process of adding support to TensorFlow Lite for more ops. It would be helpful if you could inform us of how this conversion went by opening a github issue at https://github.com/tensorflow/tensorflow/issues/new?template=40-tflite-op-request.md
and pasting the following:
Some of the operators in the model are not supported by the standard TensorFlow Lite runtime. If those are native TensorFlow operators, you might be able to use the extended runtime by passing --enable_select_tf_ops, or by setting target_ops=TFLITE_BUILTINS,SELECT_TF_OPS when calling tf.lite.TFLiteConverter(). Otherwise, if you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: ADD, CAST, CONCATENATION, CONV_2D, DEPTHWISE_CONV_2D, FULLY_CONNECTED, MUL, PACK, RESHAPE, RESIZE_BILINEAR, SOFTMAX, STRIDED_SLICE. Here is a list of operators for which you will need custom implementations: AddV2.
Is there a way to convert a model with a custom layer to a TF Lite model?

Trained Keras Model fails to load with load_model

I have trained a Keras model with the Tensorflow backend. It was saved with model.save. I now want to reload the model using load_model; however, I get the following error:
Traceback (most recent call last):
File "<ipython-input-235-387752c910a4>", line 1, in <module>
load_model('MyModel.h5')
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\models.py", line 243, in load_model
model = model_from_config(model_config, custom_objects=custom_objects)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\models.py", line 317, in model_from_config
return layer_module.deserialize(config, custom_objects=custom_objects)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
printable_module_name='layer')
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
list(custom_objects.items())))
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\engine\topology.py", line 2514, in from_config
process_layer(layer_data)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\engine\topology.py", line 2500, in process_layer
custom_objects=custom_objects)
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\layers\__init__.py", line 55, in deserialize
printable_module_name='layer')
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\utils\generic_utils.py", line 144, in deserialize_keras_object
list(custom_objects.items())))
File "C:\Anaconda\envs\tensorflow\lib\site-packages\keras\models.py", line 1367, in from_config
if 'class_name' not in config[0] or config[0]['class_name'] == 'Merge':
KeyError: 0
From what I read, there seems to be a bug in Keras when a model that was trained with an older version of Keras is loaded with a recent version. So there might be a version mismatch. However, I couldn't find a report that corresponds to my situation. Downgrading Keras or retraining is not an option.
Did anyone have this issue and maybe even found a solution? I would appreciate it a lot!
Thanks!
For future reference: It is an issue in the config files. Keras 2.2.4 has a fix for this:
Keras 2.2.4
fchollet released this on Oct 3 · 79 commits to master since this release
This is a bugfix release, addressing two issues:
Ability to save a model when a file with the same name already exists.
Issue with loading legacy config files for the Sequential model.
So I ended up creating a new virtual environment with the most recent TF and Keras versions.
