TensorFlow offers many common evaluation metrics, but I don't know how to create my own metric.
I'm building a CNN model based on AlexNet for grasp detection, and I want to use the rectangle metric when evaluating the data (as in this paper: https://arxiv.org/pdf/1412.3128.pdf). Under the rectangle metric, a predicted grasp counts as correct when both of the following conditions are satisfied:
- The grasp angle is within 30 degrees of the ground truth grasp.
- The Jaccard index of the predicted grasp and the ground truth is greater than 25 percent.
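As a rough illustration of the two conditions above (not the paper's implementation and not my actual code), here is a minimal sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max) to keep the overlap computation short, whereas real grasp rectangles are rotated, and it assumes angles in degrees where a grasp rotated by 180 degrees is the same grasp. All function names are hypothetical.

```python
def angle_difference_deg(a, b):
    """Smallest difference between two grasp angles in degrees.

    Assumption: a grasp at angle t and one at t + 180 are equivalent.
    """
    diff = abs(a - b) % 180.0
    return min(diff, 180.0 - diff)


def jaccard_axis_aligned(box_a, box_b):
    """Jaccard index (intersection over union) of two axis-aligned boxes.

    Assumption: boxes are (x_min, y_min, x_max, y_max); rotated
    rectangles would need a polygon intersection instead.
    """
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def is_correct_grasp(pred_angle, pred_box, true_angle, true_box,
                     max_angle=30.0, min_overlap=0.25):
    """True when both rectangle-metric conditions hold."""
    angle_ok = angle_difference_deg(pred_angle, true_angle) < max_angle
    overlap_ok = jaccard_axis_aligned(pred_box, true_box) > min_overlap
    return angle_ok and overlap_ok
```

For example, `is_correct_grasp(10, (0, 0, 2, 2), 25, (1, 0, 3, 2))` passes both checks, because the angles differ by 15 degrees and the boxes overlap by one third.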
My first attempt was to use the AlexNet model available in TFLearn (https://github.com/tflearn/tflearn/blob/master/examples/images/alexnet.py) and to write a separate file that calculates the metric with NumPy. Below is the metric file. The code is incomplete (I'm not allowed to share all of it), but the main parts are as follows:
def grasp_error(grasps,targets,max_angle = 30,min_overlap=0.25):
    return np.mean([np.max([grasp_classification(grasps[i],targets,max_angle,min_overlap) for i in range(grasps.shape[0])])]) #for target in targets[i]])
# compute the error of the test set
def grasp_classification(grasp,target,max_angle = 30,min_overlap = 0.25):
    ...
    if abs(np.arctan2(grasp[sinpos],grasp[cospos]) - np.arctan2(target[sinpos],target[cospos]))< (max_angle * 2./180.)*np.pi:
        if jaccard_index(grasp,target) > min_overlap:
            return 1
    return 0
# computes the Jaccard index of two grasping rectangles
def jaccard_index(grasp,target):
    ...
    return intersect/overall
I have tried adding this to the metrics.py file in the TFLearn folder:
class Rectangle(Metric):
    def __init__(self, name="Rechtangle"):
        super(Rectangle, self).__init__(name)
        self.tensor = None
    def build(self, predictions, targets, inputs=None):
        with tf.name_scope('Rechtangle'): # <--------- name scope
            with tf.Session() as sess:
                #tf.InteractiveSession()
                prediction = predictions.eval()
                target = targets.eval()
                self.tensor = tf.convert_to_tensor(grasp_error(prediction, target, max_angle = 30,min_overlap=0.25))
        self.built = True
        self.tensor.m_name = self.name
        return self.tensor
I then use it at the end of the AlexNet definition:
rect_metric = tflearn.metrics.Rectangle()
network = regression(network, metric=rect_metric, optimizer='momentum',
                     loss='mean_square',
                     learning_rate=0.0005)
And I got this error:
File "gnet.py", line 57, in <module>
learning_rate=0.0005)
File "/usr/local/lib/python2.7/dist-packages/tflearn/layers/estimator.py", line 159, in regression
metric.build(incoming, placeholder, inputs)
File "/usr/local/lib/python2.7/dist-packages/tflearn/metrics.py", line 119, in build
prediction = predictions.eval()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 569, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3741, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value is_training
[[Node: is_training/read = Identity[T=DT_BOOL, _class=["loc:@is_training"], _device="/job:localhost/replica:0/task:0/cpu:0"](is_training)]]
There is a problem with calling eval() inside the metrics file, but isn't that the only way to turn a tensor into a NumPy array? Please let me know if you have any ideas on how to fix this. Thank you very much!
EDIT: I tried another way, as recommended here: https://github.com/tflearn/tflearn/issues/207 and implementing this in the code:
def rect_metric(prediction, target, inputs):
    x = []
    sess = tf.InteractiveSession()
    with sess as default:
        pred = prediction.eval(session=sess)
        tar = target.eval(session=sess)
        x = tf.reduce_sum(grasp_error(pred,tar))
    return x
Now the error does not show up but the training stops with this exception:
Reminder: Custom metric function arguments must be defined as: custom_metric(y_pred, y_true, x).
Note: I have already seen similar questions: the same error, tell torch not to use GPU, but the answers do not work for me.
I have installed PyTorch version 1.13.0+cu117 (the latest), and the code structure is as follows (an image classification task):
# os.environ["CUDA_VISIBLE_DEVICES"]="" # required?
device = torch.device("cpu") # use CPU
...
train_set = DataLoader(
    torchvision.datasets.ImageFolder(path, transform), **kwargs
)
...
model = myCNN().to(device)
optimizer = SGD(args)
loss = CrossEntropyLoss()
train()
I want to train on CPU.
For the DataLoader, following this, I've set pin_memory=True and non_blocking=pin_memory. The error persists even with pin_memory=False.
The training loop has the following structure:
for epoch in range(n_epochs):
    model.train()
    for inputs, labels in train_set:
        inputs, labels = inputs.to(device, non_blocking=non_blocking), labels.to(device, non_blocking=non_blocking)
        # compute loss, back-propagate
The error traceback (on calling train()):
Traceback (most recent call last):
File "code.py", line 233, in <module>
train()
File "code.py", line 122, in train
outputs = model(inputs)
File "...\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "code.py", line 87, in forward
output = self.network(input)
File "...\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "...\torch\nn\modules\container.py", line 204, in forward
input = module(input)
File "...\torch\nn\modules\module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "...\torch\nn\modules\conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "...\torch\nn\modules\conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
Edit: There was a comment regarding possible issues due to the model. The model is roughly:
class myCNN(nn.Module):
    def __init__(self, ...other args...):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size),
            ... similar convolutional layers ...
            nn.Flatten(),
            nn.Linear(in_features, out_features)
        )
    def forward(self, input):
        output = self.network(input)
        return output
Since I have transferred both the model and the data to the same device, what could be the reason for this error? How can I correct it?
The issue was caused by how I was using summary from torchinfo. It runs a forward pass (if an input size is provided), and by default it selects the device based on torch.cuda.is_available().
If the device argument (set as in the question) is passed to summary, training runs fine.
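To catch this kind of silent device change, one option is to check where the model's tensors live right before training. The sketch below is only an illustration under assumptions: the helper name parameter_devices and the tiny stand-in network are made up for the example, and the torchinfo call appears only as a comment showing where the device argument would go.

```python
import torch
from torch import nn


def parameter_devices(model: nn.Module) -> set:
    """Return the device types that hold the model's parameters and buffers."""
    tensors = list(model.parameters()) + list(model.buffers())
    return {t.device.type for t in tensors}


device = torch.device("cpu")

# Small stand-in network (hypothetical), moved to the chosen device.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
).to(device)

# If a summary is wanted, pass the same device explicitly, e.g.:
#   torchinfo.summary(model, input_size=(1, 3, 32, 32), device=device)

# Check the model's devices before running the training loop.
devices = parameter_devices(model)

inputs = torch.zeros(1, 3, 32, 32, device=device)
outputs = model(inputs)
```

If `devices` ever contains "cuda" while the inputs are on the CPU, something between model creation and the forward pass has moved the weights.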
I defined a custom tf.keras.Model and overrode train_step to implement custom training logic. The dataset trainDataset is a tf.data object whose elements are (image, label) pairs with different image sizes. I would like to perform data augmentation inside train_step, as in the code below. I believe my code has no logical flaws, including the part where I use model.fit to train the model.
However, an error occurs telling me that it "Cannot batch tensors with different shapes". It looks like something is executed before train_step and blocks the training process. How can I solve this?
model=GeneralCNN(cfg, network, augmentation)
model.compile(optimizer, loss, cfg['training'])
...
trainLogs=model.fit(trainDataset.batch(cfg['batch_size']), epochs=1, validation_data=valDataset)
...(subclass of tf.keras.Model)
    def train_step(self, data):
        tf.print('check!')
        images, labels = data
        images = self.augmentation(images) # <--- includes resizing
        # initialize important variables.
        batch_size = tf.shape(images)[0]
        # Train the network
        with tf.GradientTape() as tape:
            predictions = self.network(images)
            loss = self.loss_fn(labels, predictions)
        grads = tape.gradient(loss, self.network.trainable_weights)
        self.optimizer.apply_gradients(zip(grads, self.network.trainable_weights))
        # Update metrics
        self.lossMetric.update_state(loss)
        predictionsIndicies=tf.math.argmax(predictions, axis=1)
        self.accuracyMetric.update_state((predictionsIndicies, labels))
        return {"loss": self.lossMetric.result(), "accuracy": self.accuracyMetric.result()}
...
Error:
Traceback (most recent call last):
File "train.py", line 116, in <module>
app.run(main)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "train.py", line 76, in main
trainLogs=model.fit(P, epochs=1, steps_per_epoch=1000, validation_data=valDataset)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/training.py", line 1183, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 889, in __call__
result = self._call(*args, **kwds)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
return self._stateless_fn(*args, **kwds)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3024, in __call__
filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1961, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 596, in call
ctx=ctx)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot batch tensors with different shapes in component 0. First element had shape [375,500,3] and element 1 had shape [333,500,3].
[[node IteratorGetNext (defined at train.py:76) ]] [Op:__inference_train_function_1852]
Function call stack:
train_function
Edit: augmentation code, just in case.
The code works when I resize the dataset before model.fit.
def BuildAugmentation(cfg):
    augmentationType = cfg['augmentation']
    if augmentationType=='none':
        return SimpleResize(cfg)
    elif augmentationType=='simple':
        return SimpleAugmentation(cfg)
def SimpleResize(cfg):
    # resizing only w/o augmentations
    model=tf.keras.models.Sequential([
        tfPreprocessing.Resizing(cfg['image_size'], cfg['image_size'])
    ])
    return model
def SimpleAugmentation(cfg):
    # custom simple augmentation w/ humble augmentations
    model=tf.keras.models.Sequential([
        tfPreprocessing.RandomRotation(factor=0.02),
        tfPreprocessing.RandomZoom(height_factor=0.2, width_factor=0.2),
        tfPreprocessing.Resizing(cfg['image_size'], cfg['image_size']),
        tfPreprocessing.RandomFlip("horizontal")
    ])
    return model
I'm trying to do transfer learning of an Inception-resnet v2 model pretrained on imagenet, using my own dataset and classes.
My original codebase was a modification of a tf.slim sample which I can't find anymore and now I'm trying to rewrite the same code using the tf.estimator.* framework.
I am running, however, into the problem of loading only some of the weights from the pretrained checkpoint, initializing the remaining layers with their default initializers.
Researching the problem, I found this GitHub issue and this question, both mentioning the need to use tf.train.init_from_checkpoint in my model_fn. I tried, but given the lack of examples in both, I guess I got something wrong.
This is my minimal example:
import sys
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import tensorflow as tf
import numpy as np
import inception_resnet_v2
NUM_CLASSES = 900
IMAGE_SIZE = 299
def input_fn(mode, num_classes, batch_size=1):
    # some code that loads images, reshapes them to 299x299x3 and batches them
    return tf.constant(np.zeros([batch_size, 299, 299, 3], np.float32)), tf.one_hot(tf.constant(np.zeros([batch_size], np.int32)), NUM_CLASSES)
def model_fn(images, labels, num_classes, mode):
    with tf.contrib.slim.arg_scope(inception_resnet_v2.inception_resnet_v2_arg_scope()):
        logits, end_points = inception_resnet_v2.inception_resnet_v2(images,
                                                                     num_classes,
                                                                     is_training=(mode==tf.estimator.ModeKeys.TRAIN))
    predictions = {
        'classes': tf.argmax(input=logits, axis=1),
        'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
    }
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
    exclude = ['InceptionResnetV2/Logits', 'InceptionResnetV2/AuxLogits']
    variables_to_restore = tf.contrib.slim.get_variables_to_restore(exclude=exclude)
    scopes = { os.path.dirname(v.name) for v in variables_to_restore }
    tf.train.init_from_checkpoint('inception_resnet_v2_2016_08_30.ckpt',
                                  {s+'/':s+'/' for s in scopes})
    tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)
    total_loss = tf.losses.get_total_loss() #obtain the regularization losses as well
    # Configure the training op
    if mode == tf.estimator.ModeKeys.TRAIN:
        global_step = tf.train.get_or_create_global_step()
        optimizer = tf.train.AdamOptimizer(learning_rate=0.00002)
        train_op = optimizer.minimize(total_loss, global_step)
    else:
        train_op = None
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        loss=total_loss,
        train_op=train_op)
def main(unused_argv):
    # Create the Estimator
    classifier = tf.estimator.Estimator(
        model_fn=lambda features, labels, mode: model_fn(features, labels, NUM_CLASSES, mode),
        model_dir='model/MCVE')
    # Train the model
    classifier.train(
        input_fn=lambda: input_fn(tf.estimator.ModeKeys.TRAIN, NUM_CLASSES, batch_size=1),
        steps=1000)
    # Evaluate the model and print results
    eval_results = classifier.evaluate(
        input_fn=lambda: input_fn(tf.estimator.ModeKeys.EVAL, NUM_CLASSES, batch_size=1))
    print()
    print('Evaluation results:\n %s' % eval_results)
if __name__ == '__main__':
    tf.app.run(main=main, argv=[sys.argv[0]])
where inception_resnet_v2 is the model implementation in Tensorflow's models repository.
If I run this script, I get a bunch of info log from init_from_checkpoint, but then, at session creation time, it seems it attempts to load the Logits weights from the checkpoint and fails because of incompatible shapes. This is the full traceback:
Traceback (most recent call last):
File "<ipython-input-6-06fadd69ae8f>", line 1, in <module>
runfile('C:/Users/1/Desktop/transfer_learning_tutorial-master/MCVE.py', wdir='C:/Users/1/Desktop/transfer_learning_tutorial-master')
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 710, in runfile
execfile(filename, namespace)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/Users/1/Desktop/transfer_learning_tutorial-master/MCVE.py", line 77, in <module>
tf.app.run(main=main, argv=[sys.argv[0]])
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "C:/Users/1/Desktop/transfer_learning_tutorial-master/MCVE.py", line 68, in main
steps=1000)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 302, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\estimator.py", line 780, in _train_model
log_step_count_steps=self._config.log_step_count_steps) as mon_sess:
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\monitored_session.py", line 368, in MonitoredTrainingSession
stop_grace_period_secs=stop_grace_period_secs)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\monitored_session.py", line 673, in __init__
stop_grace_period_secs=stop_grace_period_secs)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\monitored_session.py", line 493, in __init__
self._sess = _RecoverableSession(self._coordinated_creator)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\monitored_session.py", line 851, in __init__
_WrappedSession.__init__(self, self._create_session())
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\monitored_session.py", line 856, in _create_session
return self._sess_creator.create_session()
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\monitored_session.py", line 554, in create_session
self.tf_sess = self._session_creator.create_session()
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\monitored_session.py", line 428, in create_session
init_fn=self._scaffold.init_fn)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\training\session_manager.py", line 279, in prepare_session
sess.run(init_op, feed_dict=init_feed_dict)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
run_metadata_ptr)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
options, run_metadata)
File "C:\Users\1\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [900] rhs shape= [1001] [[Node: Assign_1145 = Assign[T=DT_FLOAT,
_class=["loc:@InceptionResnetV2/Logits/Logits/biases"], use_locking=true, validate_shape=true,
_device="/job:localhost/replica:0/task:0/device:CPU:0"](InceptionResnetV2/Logits/Logits/biases, checkpoint_initializer_1145)]]
What am I doing wrong when using init_from_checkpoint? How exactly are we supposed to "use" it in our model_fn? And why is the estimator trying to load the Logits' weights from the checkpoint when I'm explicitly telling it not to?
Update:
After the suggestion in the comments, I tried alternative ways to call tf.train.init_from_checkpoint.
Using {v.name: v.name}
If, as suggested in the comment, I replace the call with {v.name:v.name for v in variables_to_restore}, I get this error:
ValueError: Assignment map with scope only name InceptionResnetV2/Conv2d_2a_3x3 should map
to scope only InceptionResnetV2/Conv2d_2a_3x3/weights:0. Should be 'scope/': 'other_scope/'.
Using {v.name: v}
If, instead, I try using the name:variable mapping, I get the following error:
ValueError: Tensor InceptionResnetV2/Conv2d_2a_3x3/weights:0 is not found in
inception_resnet_v2_2016_08_30.ckpt checkpoint
{'InceptionResnetV2/Repeat_2/block8_4/Branch_1/Conv2d_0c_3x1/BatchNorm/moving_mean': [256],
'InceptionResnetV2/Repeat/block35_9/Branch_0/Conv2d_1x1/BatchNorm/beta': [32], ...
The error continues listing what I think are all the variable names in the checkpoint (or could it be the scopes instead?).
Update (2)
After inspecting the latest error here above, I see that InceptionResnetV2/Conv2d_2a_3x3/weights is in the list of the checkpointed variables. The problem is that :0 at the end!
I'll now verify if this does indeed solve the problem and post an answer if that's the case.
Thanks to @KathyWu's comment, I got on the right track and found the problem.
Indeed, the way I was computing the scopes would include the InceptionResnetV2/ scope, which would trigger loading all variables "under" that scope (i.e., all variables in the network). Replacing this with the correct dictionary, however, was not trivial.
Of the possible scope modes init_from_checkpoint accepts, the one I had to use was the 'scope_variable_name': variable one, but without using the actual variable.name attribute.
The variable.name looks like: 'some_scope/variable_name:0'. That :0 is not part of the checkpointed variable's name, so using {v.name: v for v in variables_to_restore} raises a "not found in checkpoint" error.
The trick to make it work was stripping the tensor index from the name:
tf.train.init_from_checkpoint('inception_resnet_v2_2016_08_30.ckpt',
                              {v.name.split(':')[0]: v for v in variables_to_restore})
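To make the name handling concrete without needing TensorFlow or the checkpoint, here is a small sketch of the same dictionary construction. FakeVariable is a hypothetical stand-in for a TensorFlow variable (only its .name attribute is used), and checkpoint_name is a helper name I made up for the example.

```python
class FakeVariable:
    """Hypothetical stand-in for a TensorFlow variable; only .name is used."""

    def __init__(self, name):
        self.name = name


def checkpoint_name(variable_name):
    """Drop the ':0' output index so the name matches the checkpoint entry."""
    return variable_name.split(':')[0]


variables_to_restore = [
    FakeVariable('InceptionResnetV2/Conv2d_2a_3x3/weights:0'),
    FakeVariable('InceptionResnetV2/Repeat/block35_9/Branch_0/Conv2d_1x1/BatchNorm/beta:0'),
]

# Same shape as the mapping passed to init_from_checkpoint above:
# checkpoint variable name -> variable object.
assignment_map = {checkpoint_name(v.name): v for v in variables_to_restore}

for name in assignment_map:
    print(name)
```

The keys no longer carry the :0 suffix, which is the form the checkpoint listing in the error message uses.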
I found that {s+'/':s+'/' for s in scopes} didn't work simply because variables_to_restore includes something like "global_step", so scopes ends up containing the global (root) scope, which can match everything. You need to print variables_to_restore, find the "global_step" entry, and add it to "exclude".
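A quick way to see this effect, assuming a top-level variable such as global_step really is in the list, is to compute the scopes on plain strings. The sketch uses posixpath.dirname in place of os.path.dirname so the result does not depend on the operating system, and the two names are just examples.

```python
import posixpath

# Example variable names; 'global_step:0' stands for any top-level variable.
names = [
    'InceptionResnetV2/Conv2d_2a_3x3/weights:0',
    'global_step:0',
]

# Same construction as in the question: one scope per variable.
scopes = {posixpath.dirname(n) for n in names}
scope_map = {s + '/': s + '/' for s in scopes}

# A top-level variable has an empty dirname, which becomes the root scope '/'.
top_level = [n for n in names if posixpath.dirname(n) == '']

print(sorted(scope_map))
print(top_level)
```

Any name listed in top_level is a candidate for the exclude list, since it is what introduces the '/' entry into the scope mapping.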
I'm trying to load the inception_resnet_v2_2016_08_30.ckpt file and run testing.
The code works well with a single image (calling the oneFile() function only once).
If I call the oneFile() function twice, the following error occurs:
ValueError: Variable InceptionResnetV2/Conv2d_1a_3x3/weights already
exists, disallowed. Did you mean to set reuse=True in VarScope?
Originally defined at:
I found a related solution in Sharing Variables:
with tf.variable_scope, you can call scope.reuse_variables() to resolve this problem.
But I can't find an equivalent for slim.arg_scope that reuses the scope.
def oneFile(filepath):
    imgPath = filepath
    testImage_string = tf.gfile.FastGFile(imgPath, 'rb').read()
    testImage = tf.image.decode_jpeg(testImage_string, channels=3)
    processed_image = inception_preprocessing.preprocess_image(testImage, image_size, image_size, is_training=False)
    processed_images = tf.expand_dims(processed_image, 0)
    # Create the model, use the default arg scope to configure the batch norm parameters.
    with slim.arg_scope(inception_resnet_v2_arg_scope()):
        #logits, end_points = inception_resnet_v2(images, num_classes = dataset.num_classes, is_training = False)
        logits, _ = inception_resnet_v2(processed_images, num_classes=16, is_training=False)
    probabilities = tf.nn.softmax(logits)
    init_fn = slim.assign_from_checkpoint_fn(
        checkpoint_file,
        slim.get_model_variables(model_name))
    with tf.Session() as sess:
        init_fn(sess)
        np_image, probabilities = sess.run([processed_images, probabilities])
        probabilities = probabilities[0, 0:]
        sorted_inds = [i[0] for i in sorted(enumerate(-probabilities), key=lambda x: x[1])]
        #print(probabilities)
        print(probabilities.argmax(axis=0))
        #names = imagenet.create_readable_names_for_imagenet_labels()
        #for i in range(15):
        #    index = sorted_inds[i]
        #    print((probabilities[index], names[index]))
def main():
    for image_file in os.listdir(dataset_dir):
        try:
            image_type = imghdr.what(os.path.join(dataset_dir, image_file))
            if not image_type:
                continue
        except IsADirectoryError:
            continue
        #image = Image.open(os.path.join(dataset_dir, image_file))
        filepath = os.path.join(dataset_dir, image_file)
        oneFile(filepath)
inception_resnet_v2_arg_scope
def inception_resnet_v2_arg_scope(weight_decay=0.00004,
                                  batch_norm_decay=0.9997,
                                  batch_norm_epsilon=0.001):
    """Yields the scope with the default parameters for inception_resnet_v2.
    Args:
      weight_decay: the weight decay for weights variables.
      batch_norm_decay: decay for the moving average of batch_norm momentums.
      batch_norm_epsilon: small float added to variance to avoid dividing by zero.
    Returns:
      a arg_scope with the parameters needed for inception_resnet_v2.
    """
    # Set weight_decay for weights in conv2d and fully_connected layers.
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        weights_regularizer=slim.l2_regularizer(weight_decay),
                        biases_regularizer=slim.l2_regularizer(weight_decay)):
        batch_norm_params = {
            'decay': batch_norm_decay,
            'epsilon': batch_norm_epsilon,
        }
        # Set activation_fn and parameters for batch_norm.
        with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu,
                            normalizer_fn=slim.batch_norm,
                            normalizer_params=batch_norm_params) as scope:
            return scope
Complete error message:
./data/test/teeth/1/7070.jpg Traceback (most recent call last): File
"testing.py", line 111, in
main() File "testing.py", line 106, in main
cal(processed_images) File "testing.py", line 67, in cal
logits, _ = inception_resnet_v2(processed_images, num_classes=16, is_training=False) File
"/notebooks/transfer_learning_tutorial/inception_resnet_v2.py", line
123, in inception_resnet_v2
scope='Conv2d_1a_3x3') File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py",
line 181, in func_with_args
return func(*args, **current_args) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py",
line 918, in convolution
outputs = layer.apply(inputs) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py",
line 320, in apply
return self.call(inputs, **kwargs) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py",
line 286, in call
self.build(input_shapes[0]) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/convolutional.py",
line 138, in build
dtype=self.dtype) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py",
line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py",
line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py",
line 349, in get_variable
validate_shape=validate_shape, use_resource=use_resource) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py",
line 1389, in wrapped_custom_getter
*args, **kwargs) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py",
line 275, in variable_getter
variable_getter=functools.partial(getter, **kwargs)) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/layers/base.py",
line 228, in _add_variable
trainable=trainable and self.trainable) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py",
line 1334, in layer_variable_getter
return _model_variable_getter(getter, *args, **kwargs) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/layers/python/layers/layers.py",
line 1326, in _model_variable_getter
custom_getter=getter, use_resource=use_resource) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py",
line 181, in func_with_args
return func(*args, **current_args) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py",
line 262, in model_variable
use_resource=use_resource) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py",
line 181, in func_with_args
return func(*args, **current_args) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py",
line 217, in variable
use_resource=use_resource) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py",
line 341, in _true_getter
use_resource=use_resource) File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/variable_scope.py",
line 653, in _get_single_variable
name, "".join(traceback.format_list(tb)))) ValueError: Variable InceptionResnetV2/Conv2d_1a_3x3/weights already exists, disallowed.
Did you mean to set reuse=True in VarScope? Originally defined at:
File
"/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py",
line 217, in variable
use_resource=use_resource) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py",
line 181, in func_with_args
return func(*args, **current_args) File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/framework/python/ops/variables.py",
line 262, in model_variable
use_resource=use_resource)
It seems like tf.reset_default_graph() before processing each image in your oneFile() function will solve this problem, as I encountered the same issue on a very similar example code. My understanding is that once you feed the image to the neural network (NN), because of the variable scope concept TensorFlow uses, it needs to be told that the variables can be reused before you can apply the NN to another image.
My guess would be that you specified the same scope for multiple variables in the graph. This error occurs when TensorFlow finds multiple variables with the same name under the same scope, and it has nothing to do with the next image or the next batch. When you create the graph, you should build it with only one image or batch in mind. If everything works well with the first batch or first image, TensorFlow will take care of the next iterations, including the scoping.
So check all the scopes in your model file. I am pretty sure you used the same name twice.
tf.contrib.crf doesn't seem to support sequences of length 1.
For example, if I run the example on https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/crf (mirror) and replace num_words = 20 by num_words = 1:
import numpy as np
import tensorflow as tf
# Data settings.
num_examples = 10
num_words = 1
num_features = 100
num_tags = 5
# Random features.
x = np.random.rand(num_examples, num_words, num_features).astype(np.float32)
# Random tag indices representing the gold sequence.
y = np.random.randint(num_tags, size=[num_examples, num_words]).astype(np.int32)
# All sequences in this example have the same length, but they can be variable in a real model.
sequence_lengths = np.full(num_examples, num_words - 1, dtype=np.int32)
# Train and evaluate the model.
with tf.Graph().as_default():
    with tf.Session() as session:
        # Add the data to the TensorFlow graph.
        x_t = tf.constant(x)
        y_t = tf.constant(y)
        sequence_lengths_t = tf.constant(sequence_lengths)
        # Compute unary scores from a linear layer.
        weights = tf.get_variable("weights", [num_features, num_tags])
        matricized_x_t = tf.reshape(x_t, [-1, num_features])
        matricized_unary_scores = tf.matmul(matricized_x_t, weights)
        unary_scores = tf.reshape(matricized_unary_scores,
                                  [num_examples, num_words, num_tags])
        # Compute the log-likelihood of the gold sequences and keep the transition
        # params for inference at test time.
        log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
            unary_scores, y_t, sequence_lengths_t)
        # Add a training op to tune the parameters.
        loss = tf.reduce_mean(-log_likelihood)
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
        # Train for a fixed number of iterations.
        session.run(tf.global_variables_initializer())
        for i in range(1000):
            tf_unary_scores, tf_transition_params, _ = session.run(
                [unary_scores, transition_params, train_op])
            if i % 100 == 0:
                correct_labels = 0
                total_labels = 0
                for tf_unary_scores_, y_, sequence_length_ in zip(tf_unary_scores, y,
                                                                  sequence_lengths):
                    # Remove padding from the scores and tag sequence.
                    tf_unary_scores_ = tf_unary_scores_[:sequence_length_]
                    y_ = y_[:sequence_length_]
                    # Compute the highest scoring sequence.
                    viterbi_sequence, _ = tf.contrib.crf.viterbi_decode(
                        tf_unary_scores_, tf_transition_params)
                    # Evaluate word-level accuracy.
                    correct_labels += np.sum(np.equal(viterbi_sequence, y_))
                    total_labels += sequence_length_
                accuracy = 100.0 * correct_labels / float(total_labels)
                print("Accuracy: %.2f%%" % accuracy)
I get the error message:
Traceback (most recent call last):
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1022, in _do_call
return fn(*args)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1004, in _run_fn
status, run_metadata)
File "C:\Anaconda\envs\py35\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 469, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.UnimplementedError: TensorArray has size zero, but element shape <unknown> is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#rnn/TensorArray_1"], dtype=DT_FLOAT, element_shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/TensorArrayGradV3, rnn/TensorArrayUnstack/range, gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/gradient_flow)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Francky\Documents\GitHub\nlp\neurodeid\test\CRF_v2.py", line 47, in <module>
[unary_scores, transition_params, train_op])
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 767, in run
run_metadata_ptr)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\client\session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnimplementedError: TensorArray has size zero, but element shape <unknown> is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#rnn/TensorArray_1"], dtype=DT_FLOAT, element_shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/TensorArrayGradV3, rnn/TensorArrayUnstack/range, gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/gradient_flow)]]
Caused by op 'gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3', defined at:
File "C:\Users\Francky\Documents\GitHub\nlp\neurodeid\test\CRF_v2.py", line 41, in <module>
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\training\optimizer.py", line 288, in minimize
grad_loss=grad_loss)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\training\optimizer.py", line 354, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\gradients_impl.py", line 482, in gradients
in_grads = grad_fn(op, *out_grads)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\tensor_array_grad.py", line 186, in _TensorArrayScatterGrad
grad = g.gather(indices)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\tensor_array_ops.py", line 348, in gather
element_shape=element_shape)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\gen_data_flow_ops.py", line 2226, in _tensor_array_gather_v3
element_shape=element_shape, name=name)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 763, in apply_op
op_def=op_def)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\framework\ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\framework\ops.py", line 1264, in __init__
self._traceback = _extract_stack()
...which was originally created as op 'rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3', defined at:
File "C:\Users\Francky\Documents\GitHub\nlp\neurodeid\test\CRF_v2.py", line 37, in <module>
unary_scores, y_t, sequence_lengths_t)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 156, in crf_log_likelihood
log_norm = crf_log_norm(inputs, sequence_lengths, transition_params)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\contrib\crf\python\ops\crf.py", line 123, in crf_log_norm
dtype=dtypes.float32)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\rnn.py", line 545, in dynamic_rnn
dtype=dtype)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\rnn.py", line 663, in _dynamic_rnn_loop
for ta, input_ in zip(input_ta, flat_input))
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\rnn.py", line 663, in <genexpr>
for ta, input_ in zip(input_ta, flat_input))
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\tensor_array_ops.py", line 400, in unstack
indices=math_ops.range(0, num_elements), value=value, name=name)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\tensor_array_ops.py", line 428, in scatter
name=name)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\ops\gen_data_flow_ops.py", line 2492, in _tensor_array_scatter_v3
name=name)
File "C:\Anaconda\envs\py35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 763, in apply_op
op_def=op_def)
UnimplementedError (see above for traceback): TensorArray has size zero, but element shape <unknown> is not fully defined. Currently only static shapes are supported when packing zero-size TensorArrays.
[[Node: gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGatherV3 = TensorArrayGatherV3[_class=["loc:#rnn/TensorArray_1"], dtype=DT_FLOAT, element_shape=<unknown>, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/TensorArrayGradV3, rnn/TensorArrayUnstack/range, gradients/rnn/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3_grad/TensorArrayGrad/gradient_flow)]]
Setting num_words = 2 or num_words = 5 works. I see that crf_log_norm at
https://github.com/tensorflow/tensorflow/blob/e121667dc609de978a223c56ee906368d2c4ceef/tensorflow/contrib/crf/python/ops/crf.py#L121 already decrements sequence_lengths by 1:
# Compute the alpha values in the forward algorithm in order to get the
# partition function.
forward_cell = CrfForwardRnnCell(transition_params)
_, alphas = rnn.dynamic_rnn(
    cell=forward_cell,
    inputs=rest_of_input,
    sequence_length=sequence_lengths - 1,
    initial_state=first_input,
    dtype=dtypes.float32)
log_norm = math_ops.reduce_logsumexp(alphas, [1])
return log_norm
However, changing sequence_lengths = np.full(num_examples, num_words - 1, dtype=np.int32) to sequence_lengths = np.full(num_examples, num_words, dtype=np.int32) does not solve the issue when num_words = 1.
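To illustrate what I suspect is happening, here is a small NumPy-only sketch of the shapes involved. It assumes that crf_log_norm uses the first time step as the initial state (first_input) and passes the remaining time steps (rest_of_input) to dynamic_rnn, which is my reading of the names in the excerpt above. The helper split_for_forward_pass is hypothetical and not part of TensorFlow.

```python
import numpy as np


def split_for_forward_pass(unary_scores):
    """Hypothetical helper: split scores the way I assume crf_log_norm does."""
    # First time step, used as the initial state of the forward recursion.
    first_input = unary_scores[:, :1, :]
    # Remaining time steps, which would be fed to dynamic_rnn.
    rest_of_input = unary_scores[:, 1:, :]
    return first_input, rest_of_input


num_examples, num_tags = 10, 5
for num_words in (1, 2, 5):
    scores = np.zeros((num_examples, num_words, num_tags), dtype=np.float32)
    first_input, rest_of_input = split_for_forward_pass(scores)
    # The time dimension of rest_of_input is num_words - 1.
    print(num_words, first_input.shape, rest_of_input.shape)
```

If that assumption holds, num_words = 1 leaves zero time steps for dynamic_rnn regardless of the value of sequence_lengths, which would be consistent with the "TensorArray has size zero" message and with the fact that changing sequence_lengths alone does not help.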
How can I fix this issue so that the CRF layer supports sequences of length 1?
Tested with TensorFlow 1.0.0 on Windows 7 SP1 x64 Ultimate and TensorFlow-GPU 1.0.0 on Ubuntu 14.04.4 LTS x64. I created an issue in the TensorFlow repository, but it was closed due to inactivity: https://github.com/tensorflow/tensorflow/issues/7751