Tensorflow GPU model failing to train on custom images - python

I have recently been experimenting with TensorFlow object detection on a GPU, and I hit an error when trying to train my model on custom images. The stack trace is as follows:
WARNING:tensorflow:From C:\tensorflow1\models\research\object_detection\trainer.py:260: create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.create_global_step
WARNING:tensorflow:num_readers has been reduced to 1 to match input file shards.
Traceback (most recent call last):
File "train.py", line 184, in <module>
tf.app.run()
File "C:\Users\Dan\AppData\Local\conda\conda\envs\tensorflow1\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 180, in main
graph_hook_fn=graph_rewriter_fn)
File "C:\tensorflow1\models\research\object_detection\trainer.py", line 274, in train
train_config.prefetch_queue_capacity, data_augmentation_options)
File "C:\tensorflow1\models\research\object_detection\trainer.py", line 80, in create_input_queue
include_keypoints=include_keypoints))
File "C:\tensorflow1\models\research\object_detection\core\preprocessor.py", line 3147, in preprocess
(func.__name__))
ValueError: The function random_horizontal_flip does not exist in func_arg_map
I am using an Anaconda environment with Python 3.6. To reproduce this error, I followed all the steps at https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10 .
The command which gave me this error was:
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
Note that I did not encounter any issues before training the model. I would be grateful if someone could explain this error and help me fix it. Thanks in advance :-)

You might have forgotten to add the line below to ~/.bashrc (the backticks run pwd, so it should be evaluated from the models/research directory):
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
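A quick way to check whether those two directories are actually on PYTHONPATH is sketched below. It is only a diagnostic under assumptions: the helper name is made up for illustration, and the research path is taken from the traceback in the question, so adjust it to your own checkout.

```python
import os


def missing_pythonpath_entries(research_dir, pythonpath):
    """Return the required directories that are absent from a PYTHONPATH string."""
    required = [research_dir, os.path.join(research_dir, "slim")]
    present = {os.path.normpath(p) for p in pythonpath.split(os.pathsep) if p}
    return [d for d in required if os.path.normpath(d) not in present]


if __name__ == "__main__":
    # Path as it appears in the traceback above; change it for your machine.
    research = r"C:\tensorflow1\models\research"
    missing = missing_pythonpath_entries(research, os.environ.get("PYTHONPATH", ""))
    print("Missing from PYTHONPATH:", missing)
```

An empty list means both the research directory and its slim subdirectory are already listed.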

Related

Is there a way to fix this error code with DeepVirFinder?

I will try to be as much help as I can, but this is certainly a bit out of my depth.
I am trying to run the metagenomics package 'DeepVirFinder' on my fasta file 'my_seqs.fa' within terminal on my Mac. I have followed the GitHub repository instructions (as found here https://github.com/jessieren/DeepVirFinder). I have created a conda environment with all the necessary packages.
In my terminal I entered
python dvf.py -i ~/Documents/PairwiseANI/my_seqs.fna -o ~/Documents/DeepVirFinder/ -l 1000 -c 2
This produces the following error:
Using Theano backend.
1. Loading Models.
model directory /data2/joshcole/DeepVirFinder/models
Traceback (most recent call last):
File "dvf.py", line 131, in <module>
modDict[contigLengthk] = load_model(os.path.join(modDir, modName))
File "/home/ggb_joshcole/miniconda3/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 419, in load_model
model = _deserialize_model(f, custom_objects, compile)
File "/home/ggb_joshcole/miniconda3/envs/dvf/lib/python3.6/site-packages/keras/engine/saving.py", line 224, in _deserialize_model
model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
From the GitHub repository, what it should return upon a successful run (using example template names) is as follows:
Using Theano backend.
1. Loading Models.
model directory /auto/cmb-panasas2/renj/software/DeepVirFinder/models
2. Encoding and Predicting Sequences.
processing line 1
processing line 1389
3. Done. Thank you for using DeepVirFinder.
output in ./test/crAssphage.fa_gt300bp_dvfpred.txt
Any help on how to fix this error would be greatly appreciated. I have tried downloading and experimenting with potential conda fixes, but it doesn't appear to be a problem with any dependencies, and Python is fully up to date.
Thank you for reading
Apologies - I found out it was an error between h5py and tensorflow. Had to downgrade h5py to 2.10.0.
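For anyone checking their own environment: as far as I know, h5py 3.0 and later return string attributes as str rather than bytes, which is what breaks the .decode('utf-8') call in older Keras code. A minimal sketch of a version check follows; the helper name is hypothetical and only the major version number is inspected.

```python
def returns_str_attributes(h5py_version):
    """True if this h5py version is 3.x or later (string attributes come back as str)."""
    major = int(h5py_version.split(".")[0])
    return major >= 3


if __name__ == "__main__":
    try:
        import h5py
    except ImportError:
        print("h5py is not installed in this environment")
    else:
        print(h5py.__version__, returns_str_attributes(h5py.__version__))
```

If this prints True alongside an old Keras or TensorFlow, pinning h5py to 2.10.0 as described above is the usual workaround.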

Loading Model Error using Pixellib in Python

For context, I am running an Apple Silicon Mac and have used the Rosetta terminal with Miniconda to create an environment that runs Python 3.7.
Here is the code I am trying to run.
from pixellib.instance import instance_segmentation
segment_image = instance_segmentation()
segment_image.load_model("mask_rcnn_coco.h5")
And this is the error below. I think it may be due to issues with access to the GPU, but I cannot be sure. I have been working on it for a few days.
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
File "/Users/USERNAME/PycharmProjects/test/main.py", line 16, in <module>
segment_image.load_model("mask_rcnn_coco.h5")
File "/Users/USERNAME/miniconda3/envs/cowsUpdate/lib/python3.7/site-packages/pixellib/instance/__init__.py", line 65, in load_model
self.model.load_weights(model_path, by_name= True)
File "/Users/USERNAME/miniconda3/envs/cowsUpdate/lib/python3.7/site-packages/pixellib/instance/mask_rcnn.py", line 2110, in load_weights
hdf5_format.load_weights_from_hdf5_group_by_name(f, layers)
File "/Users/USERNAME/miniconda3/envs/cowsUpdate/lib/python3.7/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 718, in load_weights_from_hdf5_group_by_name
original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'
I encountered a similar issue when calling load_weights() on a tf.keras model. I downgraded h5py from 2.10 to 2.8.0 with TensorFlow 2.0.0, and then it worked. Maybe you can give that a try.
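To illustrate why the traceback ends in 'str' object has no attribute 'decode': the failing line calls .decode() on a value that is already a str. The sketch below shows a tolerant pattern that accepts both bytes and str. It is not the Keras code and not a patch for it, just a hypothetical helper to make the failure mode concrete.

```python
def as_text(value, encoding="utf-8"):
    """Return value as str, decoding it only when it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Older h5py hands back bytes, newer h5py hands back str; both work here.
print(as_text(b"2.2.4"))
print(as_text("2.2.4"))
```

Changing the installed h5py version, as suggested above, avoids having to touch library code at all.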

TypeError('Keyword argument not understood:', 'groups') in keras.models load_model

After training a model using Google Colab, I downloaded it using the following command (inside Google Colab):
model.save('model.h5')
from google.colab import files
files.download('model.h5')
My problem is that when I try to load the downloaded model.h5 using my local machine (outside Google Colab), I get the following error:
[input]
from keras.models import load_model
model = load_model('model.h5')
[output]
Traceback (most recent call last):
File "test.py", line 2, in <module>
model = load_model(filepath = 'saved_model/model2.h5',custom_objects=None,compile=True, )
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 184, in load_model
return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 177, in load_model_from_hdf5
model = model_config_lib.model_from_config(model_config,
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/saving/model_config.py", line 55, in model_from_config
return deserialize(config, custom_objects=custom_objects)
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/layers/serialization.py", line 105, in deserialize
return deserialize_keras_object(
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 369, in deserialize_keras_object
return cls.from_config(
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/engine/sequential.py", line 397, in from_config
layer = layer_module.deserialize(layer_config,
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/layers/serialization.py", line 105, in deserialize
return deserialize_keras_object(
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 375, in deserialize_keras_object
return cls.from_config(cls_config)
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 655, in from_config
return cls(**config)
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 582, in __init__
super(Conv2D, self).__init__(
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py", line 121, in __init__
super(Conv, self).__init__(
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py", line 456, in _method_wrapper
result = method(self, *args, **kwargs)
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py", line 294, in __init__
generic_utils.validate_kwargs(kwargs, allowed_kwargs)
File "/home/lucasmirachi/anaconda3/envs/myenviron/lib/python3.8/site-packages/tensorflow/python/keras/utils/generic_utils.py", line 792, in validate_kwargs
raise TypeError(error_message, kwarg)
TypeError: ('Keyword argument not understood:', 'groups')
Does anyone know why the 'groups' keyword argument is not understood?
Instead of using from keras.models I have tried using from tensorflow.keras.models, but I had no success and got the same error.
In both Google Colab and on my local machine I'm running Keras '2.4.3'
Thank you all in advance!
I commented earlier saying I had exactly the same error from doing exactly the same thing. I just solved it by upgrading both tensorflow and keras on my local machine:
pip install --upgrade tensorflow
pip install --upgrade keras
The error was probably due to differing versions of the packages between Colab and local machine. Hope this works for you, too.
I had the same issue because I was saving and loading the model with different versions of tensorflow. I saved a model with tf 2.3.0 then loaded it with tf 2.1.0.
I made sure that both saving and loading use the same venv which fixed the issue for me.
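One way to catch this kind of mismatch before it turns into a confusing deserialization error is to record the framework version next to the saved file and compare it at load time. A minimal sketch, assuming a JSON sidecar file; the helper names and the sidecar file name are hypothetical, and the version string would come from tf.__version__ on each machine.

```python
import json


def write_version_stamp(stamp_path, framework_version):
    """Write the version used for saving into a small JSON sidecar file."""
    with open(stamp_path, "w") as fh:
        json.dump({"framework_version": framework_version}, fh)


def versions_match(stamp_path, current_version):
    """Compare the recorded version with the version of the loading environment."""
    with open(stamp_path) as fh:
        saved = json.load(fh)["framework_version"]
    return saved == current_version


# Intended use (not run here, since it needs TensorFlow):
#   write_version_stamp("model.h5.version.json", tf.__version__)   # where you save
#   versions_match("model.h5.version.json", tf.__version__)        # where you load
```

If versions_match returns False, install the version recorded in the stamp (or re-save the model) before calling load_model.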
I faced the same issue so I checked for the versions of tensorflow and keras in google-colaboratory and found the following:
I solved the issue by installing tensorflow and keras with the commands below in my anaconda environment:
pip install tensorflow-gpu==2.4.1
pip install Keras==2.4.3
If you want to stay on the same tf version, a workaround is model.load_weights("model_path"). Not the best solution, but it works
In Colab I have TensorFlow version 2.9.2, and on my Raspberry Pi 4 I have TensorFlow version 2.4.1, so the versions differ.
In Colab I built a pre-trained VGG19 model with input_shape=(220,220,3) to classify two types of image.
I solved it like this:
In Colab (creating the model):
# serialize model to JSON
model_json = loaded_model2.to_json()
with open('/content/drive/MyDrive/dataset/extract/model_5.json', "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
loaded_model2.save_weights('/content/drive/MyDrive/model_5.h5')
print("Saved model to disk")
Then, on my Raspberry Pi, I create the model:
import tensorflow as tf
model_new = tf.keras.Sequential()
model_new.add(tf.keras.applications.VGG19(include_top=False, weights='imagenet', pooling='avg', input_shape=(220,220,3)))
model_new.add(tf.keras.layers.Dense(2, activation="softmax"))
opt = tf.keras.optimizers.SGD(0.004)
model_new.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
And then I load the weights from Colab into this model on my Raspberry Pi 4, using only the .h5 file with the weights:
model_new.load_weights('/home/pi/projects/models/model_5.h5')

AttributeError: 'FasterRcnn' object has no attribute 'inplace_batchnorm_update'

I am trying to train a pretrained "faster_rcnn_resnet101_kitti" model for the tensorflow object detection API.
But every time I try to run
python3 train.py --logtostderr --train_dir='/training/' --pipeline_config_path='/training/faster_rcnn_resnet101_kitti.config'
I receive the following error
Traceback (most recent call last):
File "train.py", line 167, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "train.py", line 163, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/usr/local/lib/python3.5/dist-packages/object_detection-0.1-py3.5.egg/object_detection/trainer.py", line 211, in train
detection_model = create_model_fn()
File "/usr/local/lib/python3.5/dist-packages/object_detection-0.1-py3.5.egg/object_detection/builders/model_builder.py", line 96, in build
add_summaries)
File "/usr/local/lib/python3.5/dist-packages/object_detection-0.1-py3.5.egg/object_detection/builders/model_builder.py", line 272, in _build_faster_rcnn_model
frcnn_config.inplace_batchnorm_update)
AttributeError: 'FasterRcnn' object has no attribute 'inplace_batchnorm_update'
I had this error too, and for me it was because I had not re-compiled my .proto-files after I pulled the last updates from the TF models repository.
To recompile (on Linux):
# From tensorflow/models/research/ folder
protoc object_detection/protos/*.proto --python_out=.
I assume that the failing code tries to read the attribute/field inplace_batchnorm_update from the Faster R-CNN config, which (presumably) does not exist in the older compiled versions. I hope this helps you too.
My versions are tensorflow-gpu 1.7.0 and the TF models commit 77d3bbefeb33e89bfa1eee707151e5d794d1222b with message "Merge pull request #3888 from hsm207/patch-3 Fix typo".
Recompiling on Windows
I know from my own experience that compiling many files as above is an easy one-liner on Linux, but not on Windows. For Windows, here is something to make the process less cumbersome:
In this issue, davemers0160 has shared a script for compiling on Windows.
Just save this file as a .bat file:
@echo off
setlocal
echo Searching for new .proto files...
for %%F in (object_detection\protos\*.proto) do (
    echo %%F
    protoc %%F --python_out=.
)
echo Complete!
Run that file from the same folder as mentioned above. As the question was about Linux, I've just added this at the bottom in case a Windows user comes along to read this too.
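If you would rather not maintain separate shell and batch versions, the same loop can be written in Python, which behaves the same on Linux and Windows. This is only a sketch: the function names are made up, it assumes protoc is on your PATH, and it passes the same --python_out=. flag used above.

```python
import glob
import os
import subprocess


def protoc_commands(proto_dir, python_out="."):
    """Build one protoc command (as an argument list) per .proto file in proto_dir."""
    pattern = os.path.join(proto_dir, "*.proto")
    return [["protoc", path, "--python_out=" + python_out]
            for path in sorted(glob.glob(pattern))]


def compile_protos(proto_dir):
    """Run protoc on every .proto file; raises CalledProcessError if one fails."""
    for cmd in protoc_commands(proto_dir):
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```

From the models/research folder you would call compile_protos(os.path.join("object_detection", "protos")).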
I had the same error after I updated the models repository.
I re-compiled the .proto files, but the error persisted.
According to the log:
File "/home/duane/anaconda3/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/builders/model_builder.py", line 164, in _build_ssd_model
inplace_batchnorm_update=ssd_config.inplace_batchnorm_update)
I think it may be caused by the installed object_detection-0.1-py3.6.egg being too old, so I re-installed using models/research/setup.py:
# From /models/research/
python setup.py build
python setup.py install
Then the error went away.
NOTE: I re-compiled the .proto files before re-running setup.py.
For more details, see #3968.
Hope this helps.
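The tracebacks in this thread point into a site-packages .egg rather than the git checkout, so it can help to confirm which copy of a package Python is really importing. A small diagnostic sketch using only the standard library (the helper name is hypothetical):

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # If this prints a path inside an old .egg, re-running setup.py is probably needed.
    print(module_origin("object_detection"))
```

If the printed path is not the checkout you just updated and re-compiled, the installed copy is likely stale.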

Errors when running TensorFlow object detection API model training

I was following the instructions for the TensorFlow Object Detection API and trying to train a model on the Oxford cats data set. I did every step in the instructions, but the training process didn't start and gave some errors. Could anyone who has seen a similar error share their experience?
My system is macOS Sierra 10.12.4, Python version 2.7.13
The error message is
Traceback (most recent call last):
File "object_detection/train.py", line 197, in <module>
tf.app.run()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "object_detection/train.py", line 144, in main
model_config, train_config, input_config = get_configs_from_multiple_files()
File "object_detection/train.py", line 126, in get_configs_from_multiple_files
text_format.Merge(f.read(), train_config)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/lib/io/file_io.py", line 125, in read
pywrap_tensorflow.ReadFromStream(self._read_buf, length, status))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py", line 24, in __exit__
self.gen.next()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.FailedPreconditionError: .
The command I ran in the console:
python object_detection/train.py --logtostderr --pipline_config_path=object_detection/models/pet_model/ssd_mobilenet_v1_pets.config --train_dir=object_detection/models/pet_model/train
Update
I just found the answer to this issue: it was caused by a typo. The flag --pipline_config_path is misspelled and should be --pipeline_config_path, so the command should be
python object_detection/train.py --logtostderr --pipeline_config_path=object_detection/models/pet_model/ssd_mobilenet_v1_pets.config --train_dir=object_detection/models/pet_model/train
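Since the misspelled flag did not produce an obvious "unknown flag" message here, a small spell check on flag names can save some time. A sketch using difflib from the standard library; KNOWN_FLAGS only lists the flags from the command above, not the script's full set, and the helper name is made up.

```python
import difflib

# Only the flags that appear in the command above; extend as needed.
KNOWN_FLAGS = ["logtostderr", "pipeline_config_path", "train_dir"]


def suggest_flag(name, known=KNOWN_FLAGS):
    """Return the closest known flag name as a one-element list, or [] if none is close."""
    return difflib.get_close_matches(name, known, n=1)


print(suggest_flag("pipline_config_path"))
```

For the misspelled flag this suggests pipeline_config_path, which is the fix described in the update.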
