I'm trying to traing Mobilenet to recognize custom objects.
I'm following this guide:
https://towardsdatascience.com/how-to-train-your-own-object-detector-with-tensorflows-object-detector-api-bec72ecfe1d9
and using a checkpoint and pipeline.config from here:
ssdlite_mobilenet_v2_coco
The Problem
When I start traing with the following command:
python object_detection/model_main.py \
--pipeline_config_path=C:\t\models\pipeline.config \
--model_dir=C:\t\models\ \
--num_train_steps=50000 \
--alsologtostderr
I get the following:
C:\tensorflow\models-master\research>path=C:\t\models\pipeline.config \ --model_dir=C:\t\models\ \ --num_train_steps=50000 \ --alsologtostderr
WARNING:tensorflow:Estimator's model_fn (<function create_model_fn.<locals>.model_fn at 0x0000013B6CD26C80>) includes params argument, but params are not pa
ssed to Estimator.
Traceback (most recent call last):
File "object_detection/model_main.py", line 101, in <module>
tf.app.run()
File "C:\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 125, in run
_sys.exit(main(argv))
File "object_detection/model_main.py", line 97, in main
tf.estimator.train_and_evaluate(estimator, train_spec, eval_specs[0])
File "C:\Python36\lib\site-packages\tensorflow\python\estimator\training.py", line 447, in train_and_evaluate
return executor.run()
File "C:\Python36\lib\site-packages\tensorflow\python\estimator\training.py", line 531, in run
return self.run_local()
File "C:\Python36\lib\site-packages\tensorflow\python\estimator\training.py", line 681, in run_local
eval_result, export_results = evaluator.evaluate_and_export()
File "C:\Python36\lib\site-packages\tensorflow\python\estimator\training.py", line 886, in evaluate_and_export
hooks=self._eval_spec.hooks)
File "C:\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 453, in evaluate
input_fn, hooks, checkpoint_path)
File "C:\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1346, in _evaluate_build_graph
model_fn_lib.ModeKeys.EVAL))
File "C:\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 985, in _get_features_and_labels_from_input_fn
result = self._call_input_fn(input_fn, mode)
File "C:\Python36\lib\site-packages\tensorflow\python\estimator\estimator.py", line 1074, in _call_input_fn
return input_fn(**kwargs)
File "C:\Python36\lib\site-packages\object_detection\inputs.py", line 493, in _eval_input_fn
transform_input_data_fn=transform_and_pad_input_data_fn)
File "C:\Python36\lib\site-packages\object_detection\builders\dataset_builder.py", line 150, in build
raise ValueError('Unsupported input_reader_config.')
ValueError: Unsupported input_reader_config.
A comment in "dataset_builder.py" says:
Raises:
ValueError: On invalid input reader proto.
ValueError: If no input paths are specified.
Question:
Is it a problem with pipeline.config file?
Does it mean that "dataset_builder.py" can't read it?
OR
Shall I pass some additional input path as it stated in the comment?
If I remember correctly, cause of the problem was I didn't prepare test data. There was a training data only. So I prepared a list of test images with related XML files and generated test TFrecord.
Then the error has gone.
P.S.
I had a lot of other errors later but it's another story :)
Related
Does torch.save work on hugging face models (I am using vit)? I assumed yes.
My error:
File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch/serialization.py", line 379, in save
_save(obj, opened_zipfile, pickle_module, pickle_protocol)
File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch/serialization.py", line 499, in _save
zip_file.write_record(name, storage.data_ptr(), num_bytes)
OSError: [Errno 116] Stale file handle
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/shared/rsaas/miranda9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_dist_maml_l2l.py", line 1815, in <module>
main()
File "/shared/rsaas/miranda9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_dist_maml_l2l.py", line 1748, in main
train(args=args)
File "/shared/rsaas/miranda9/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_dist_maml_l2l.py", line 1795, in train
meta_train_iterations_ala_l2l(args, args.agent, args.opt, args.scheduler)
File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/training/meta_training.py", line 213, in meta_train_iterations_ala_l2l
log_train_val_stats(args, args.it, step_name, train_loss, train_acc, training=True)
File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/logging_uu/wandb_logging/supervised_learning.py", line 55, in log_train_val_stats
_log_train_val_stats(args=args,
File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/logging_uu/wandb_logging/supervised_learning.py", line 113, in _log_train_val_stats
save_for_supervised_learning(args, ckpt_filename='ckpt.pt')
File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/checkpointing_uu/supervised_learning.py", line 54, in save_for_supervised_learning
torch.save({'training_mode': args.training_mode,
File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch/serialization.py", line 380, in save
return
File "/home/miranda9/miniconda3/envs/metalearning_gpu/lib/python3.9/site-packages/torch/serialization.py", line 259, in __exit__
self.file_like.write_end_of_file()
RuntimeError: [enforce fail at inline_container.cc:298] . unexpected pos 2736460544 vs 2736460432
my code:
# - ckpt
args_pickable: Namespace = uutils.make_args_pickable(args)
# note not saving any objects, to make sure checkpoint is loadable later with no problems
torch.save({'training_mode': args.training_mode,
'it': args.it,
'epoch_num': args.epoch_num,
# 'args': args_pickable, # some versions of this might not have args!
# decided only to save the dict version to avoid this ckpt not working, make it args when loading
'args_dict': vars(args_pickable), # some versions of this might not have args!
'model_state_dict': get_model_from_ddp(args.model).state_dict(),
'model_str': str(args.model), # added later, to make it easier to check what optimizer was used
'model_hps': args.model_hps,
'model_option': args.model_option,
'opt_state_dict': args.opt.state_dict(),
'opt_str': str(args.opt),
'opt_hps': args.opt_hps,
'opt_option': args.opt_option,
'scheduler_str': str(args.scheduler),
'scheduler_state_dict': try_to_get_scheduler_state_dict(args.scheduler),
'scheduler_hps': args.scheduler_hps,
'scheduler_option': args.scheduler_option,
},
pickle_module=pickle,
f=args.log_root / ckpt_filename)
if this is not the right way to checkpoint hugging face (HF) models, what is?
cross: hf discussion forum: https://discuss.huggingface.co/t/torch-save-with-hugging-face-models-fails/25034
I have written the code for an entity extraction model using bert but when I run the train.py file I get a value error.
This is the structure of my code with the configuration file in VSCode, I have downloaded bert models from here
Error
>> (myenv) PS D:\Transformers\bert-entity-extraction> python src/train.py
Configuration Complete!
Traceback (most recent call last):
File "src/train.py", line 83, in <module>
model = EntityModel(num_tag = num_tag, num_pos = num_pos)
File "D:\Transformers\bert-entity-extraction\src\model.py", line 25, in __init__
self.bert = transformers.BertModel.from_pretrained(config.BASE_MODEL_PATH)
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\modeling_utils.py", line 1080, in from_pretrained
**kwargs,
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\configuration_utils.py", line 427, in from_pretrained
config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\configuration_utils.py", line 492, in get_config_dict
user_agent=user_agent,
File "C:\Users\hp\anaconda3\envs\myenv\lib\site-packages\transformers\file_utils.py", line 1289, in cached_path
raise ValueError(f"unable to parse {url_or_filename} as a URL or as a local path")
ValueError: unable to parse D:\Transformers\bert-entity-extraction\input\bert-base-uncased_L-12_H-768_A-12\config.json as a URL or as a local path
How to fix this?
I am trying to use Huggingface transformer api to load a locally downloaded M-BERT model but it is throwing an exception.
I clone this repo: https://huggingface.co/bert-base-multilingual-cased
bert = TFBertModel.from_pretrained("input/bert-base-multilingual-cased")
The directory structure is:
But I am getting this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1277, in from_pretrained
missing_keys, unexpected_keys = load_tf_weights(model, resolved_archive_file, load_weight_prefix)
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 467, in load_tf_weights
with h5py.File(resolved_archive_file, "r") as f:
File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/files.py", line 408, in __init__
swmr=swmr)
File "/usr/local/lib/python3.7/dist-packages/h5py/_hl/files.py", line 173, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 88, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 81, in <module>
__main__()
File "train.py", line 59, in __main__
model = create_model(num_classes)
File "/content/drive/My Drive/msc-project/code/model.py", line 26, in create_model
bert = TFBertModel.from_pretrained("input/bert-base-multilingual-cased")
File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_tf_utils.py", line 1280, in from_pretrained
"Unable to load weights from h5 file. "
OSError: Unable to load weights from h5 file. If you tried to load a TF 2.0 model from a PyTorch checkpoint, please set from_pt=True.
Where am I going wrong?
Need help!
Thanks in advance.
As it was already pointed in the comments - your from_pretrained param should be either id of a model hosted on huggingface.co or a local path:
A path to a directory containing model weights saved using
save_pretrained(), e.g., ./my_model_directory/.
See documentation
Looking at your stacktrace it seems like your code is run inside:
/content/drive/My Drive/msc-project/code/model.py so unless your model is in:
/content/drive/My Drive/msc-project/code/input/bert-base-multilingual-cased/ it won't load.
I would also set the path to be similar to documentation example ie:
bert = TFBertModel.from_pretrained("./input/bert-base-multilingual-cased/")
Tensorflow object detection API using cnn
Traceback (most recent call last):
File "export_inference_graph.py", line 147, in tf.app.run()
File "C:\Users\Ali Salar\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run_sys.exit(main(argv))
File "export_inference_graph.py", line 143, in main
FLAGS.output_directory, input_shape)
File "C:\tensorflow2\models\research\object_detection\exporter.py", line 453, in export_inference_graph
graph_hook_fn=None)
File "C:\tensorflow2\models\research\object_detection\exporter.py", line 421, in _export_inference_graph
placeholder_tensor, outputs)
File "C:\tensorflow2\models\research\object_detection\exporter.py", line 280, in write_saved_model
builder = tf.saved_model.builder.SavedModelBuilder(saved_model_path)
File "C:\Users\Ali Salar\Anaconda3\envs\tensorflow\lib\sit`enter code here`e-packages\tensorflow\python\saved_model\builder_impl.py", line 90, in init
"directory: %s" % export_dir)
AssertionError: Export directory already exists. Please specify a different export directory: inference_graph\saved_model
It's saying the directory already exists. Either change your command in the terminal to point to a new directory or remove everything in this directory that you are currently setting as the output directory.
python3 export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path YOUR_PIPELINE.CONFIG_FILE_PATH \
--trained_checkpoint_prefix YOUR_MODEL_CHECKPOINT_PATH \
--output_directory exported_graphs/saved_model2
Your try changing output directory.For example exported_graphs/saved_model2 as above.
I am trying to export trained TensorFlow models to C++ using freeze_graph.py. I am trying to export the ssd_mobilenet_v1_coco_2017_11_17 model using the following syntax:
bazel build tensorflow/python/tools:freeze_graph && \
bazel-bin/tensorflow/python/tools/freeze_graph \
--input_graph=frozen_inference_graph.pb \
--input_checkpoint=model.ckpt \
--output_graph=/tmp/frozen_graph.pb --output_node_names=softmax
The terminal said that the build was successful but showed the following error:
Traceback (most recent call last):
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 350, in <module>
app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 249, in main
FLAGS.saved_model_tags)
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 227, in freeze_graph
input_graph_def = _parse_input_graph_proto(input_graph, input_binary)
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 171, in _parse_input_graph_proto
text_format.Merge(f.read(), input_graph_def)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 525, in Merge
descriptor_pool=descriptor_pool)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 579, in MergeLines
return parser.MergeLines(lines, message)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 612, in MergeLines
self._ParseOrMerge(lines, message)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 627, in _ParseOrMerge
self._MergeField(tokenizer, message)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 671, in _MergeField
name = tokenizer.ConsumeIdentifierOrNumber()
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 1144, in ConsumeIdentifierOrNumber
raise self.ParseError('Expected identifier or number, got %s.' % result)
google.protobuf.text_format.ParseError: 2:1 : Expected identifier or number, got `.
On running that command again, I got this message:
WARNING: /home/my_username/tensorflow/tensorflow/core/BUILD:1814:1: in includes attribute of cc_library rule //tensorflow/core:framework_headers_lib: '../../external/nsync/public' resolves to 'external/nsync/public' not below the relative path of its package 'tensorflow/core'. This will be an error in the future. Since this rule was created by the macro 'cc_header_only_library', the error might have been caused by the macro implementation in /home/my_username/tensorflow/tensorflow/tensorflow.bzl:1138:30
WARNING: /home/my_username/tensorflow/tensorflow/core/BUILD:1814:1: in includes attribute of cc_library rule //tensorflow/core:framework_headers_lib: '../../external/nsync/public' resolves to 'external/nsync/public' not below the relative path of its package 'tensorflow/core'. This will be an error in the future. Since this rule was created by the macro 'cc_header_only_library', the error might have been caused by the macro implementation in /home/my_username/tensorflow/tensorflow/tensorflow.bzl:1138:30
WARNING: /home/my_username/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:exporter': No longer supported. Switch to SavedModel immediately.
WARNING: /home/my_username/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target '//tensorflow/contrib/learn:learn' depends on deprecated target '//tensorflow/contrib/session_bundle:gc': No longer supported. Switch to SavedModel immediately.
INFO: Analysed target //tensorflow/python/tools:freeze_graph (0 packages loaded).
INFO: Found 1 target...
Target //tensorflow/python/tools:freeze_graph up-to-date:
bazel-bin/tensorflow/python/tools/freeze_graph
INFO: Elapsed time: 0.419s, Critical Path: 0.00s
INFO: Build completed successfully, 1 total action
Traceback (most recent call last):
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 350, in <module>
app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/platform/app.py", line 124, in run
_sys.exit(main(argv))
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 249, in main
FLAGS.saved_model_tags)
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 227, in freeze_graph
input_graph_def = _parse_input_graph_proto(input_graph, input_binary)
File "/home/my_username/tensorflow/bazel-bin/tensorflow/python/tools/freeze_graph.runfiles/org_tensorflow/tensorflow/python/tools/freeze_graph.py", line 171, in _parse_input_graph_proto
text_format.Merge(f.read(), input_graph_def)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 525, in Merge
descriptor_pool=descriptor_pool)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 579, in MergeLines
return parser.MergeLines(lines, message)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 612, in MergeLines
self._ParseOrMerge(lines, message)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 627, in _ParseOrMerge
self._MergeField(tokenizer, message)
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 671, in _MergeField
name = tokenizer.ConsumeIdentifierOrNumber()
File "/home/my_username/.cache/bazel/_bazel_my_username/3572bc2aff1de1dd37356cf341944e54/execroot/org_tensorflow/bazel-out/k8-opt/bin/tensorflow/python/tools/freeze_graph.runfiles/protobuf_archive/python/google/protobuf/text_format.py", line 1144, in ConsumeIdentifierOrNumber
raise self.ParseError('Expected identifier or number, got %s.' % result)
google.protobuf.text_format.ParseError: 2:1 : Expected identifier or number, got `.
I am exporting the ssd_mobilenet_v1_coco_2017_11_17 just as for practice. I intend to export my own trained models and test the output with this program.
I have built TensorFlow 1.5 using Bazel v0.11.1. I validated the installation using the following code snippet provided in the TensorFlow website;
# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
I also ran the object detection iPython example notebook and it worked.
I am using Ubuntu 17.10.1 on a laptop with an Intel Core i5-8250U CPU, 8GB RAM, 1TB HDD and an NVIDIA MX150(2 GB) GPU. Please help. How do I export a trained model to C++?
in order to export object detection models I use the export_inference_graph.py code in research/object-detection. Here's an example of running the code:
python3 export_inference_graph.py \
--input_type image_tensor \
--pipeline_config_path path/to/model.config \
--trained_checkpoint_prefix path/to/model.ckpt-CHECKPOINTNUMBER \
--output_directory path/to/frozen_inference_graph
Then I use the created frozen_inference_graph.pb with a C++ code that is essentially the same as label_image with little modifications to run for a detection model rather than a classification one.