Invalid argument: Incompatible shapes: [4883,1] vs. [68,1] - python

To optimize my code, I changed following:
view1ID_train_data_sparse = load_sample(batch_size_view1, f_view1ID_train_data_sparse)
row_view1ID = range(len(view1ID_train_data_sparse[:,0].astype(int)))
col_view1ID = view1ID_train_data_sparse[:,1]
value_view1ID = view1ID_train_data_sparse[:,2]
view1ID_train_data = coo_matrix( ( value_view1ID, (row_view1ID, col_view1ID.astype(int)) ), shape=( len(row_view1ID), View1Number ) ).toarray()
to:
View1ID_x_temp = tf.placeholder(tf.int32, shape = [None, None], name = 'View1ID_x_temp')
View1ID_x_label = tf.expand_dims(View1ID_x_temp[:,1],1)
View1ID_x_index = tf.expand_dims(tf.range(0, batch_size_view1),1)
concated_1ID = tf.concat([View1ID_x_index, View1ID_x_label],1)
View1ID_x = tf.sparse_to_dense(concated_1ID, [batch_size_view1,View1Number], 1.0, 0.0)
But there is an error:
2018-02-26 17:25:25.665274: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
2018-02-26 17:25:25.666627: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
Traceback (most recent call last):
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/Users/LA_rovski/anaconda/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Transfer_Model_sparse1.py", line 638, in <module>
Pi: pi})
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
Caused by op 'gradients/sub_337_grad/BroadcastGradientArgs', defined at:
File "Transfer_Model_sparse1.py", line 489, in <module>
optimize = optimizer.minimize(objective)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 315, in minimize
grad_loss=grad_loss)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 560, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 368, in _MaybeCompile
return grad_fn() # Exit early
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 560, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/math_grad.py", line 609, in _SubGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 411, in _broadcast_gradient_args
name=name)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
...which was originally created as op 'sub_337', defined at:
File "Transfer_Model_sparse1.py", line 424, in <module>
sample_log_likelihood_view1 = tf.reduce_sum(log_gaussian(Rating_view1, Y_view1, sigma_prior_y))
File "Transfer_Model_sparse1.py", line 38, in log_gaussian
return -0.5 * np.log(2 * np.pi) - tf.log(tf.abs(sigma)) - tf.square(x - mu) / (2 * tf.square(sigma))
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 821, in binary_op_wrapper
return func(x, y, name=name)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2627, in _sub
result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
I have checked the dimension of matrix for a lot of time but couldn't find out the solution. And this problem has disturbed me for a long time, thank you so much if you could help me.
It is so weird that if i turn n_batches(the number of iteration) down to 2, the bug would disappear.

Related

cuDNN launch failure (tensorflow-gpu/CUDA)

Traceback (most recent call last):
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape ([202027,64,1,1])
[[Node: bn_fm_1/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=0.001, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bn_fm_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, bn_fm/gamma/read, bn_fm/beta/read, bn_fm/moving_mean/read, bn_fm/moving_variance/read)]]
[[Node: AddN/_31 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_202_AddN", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "NeuralFM.py", line 350, in <module>
model.train(data.Train_data, data.Validation_data, data.Test_data)
File "NeuralFM.py", line 266, in train
init_train = self.evaluate(Train_data)
File "NeuralFM.py", line 311, in evaluate
predictions = self.sess.run((self.out), feed_dict=feed_dict)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape ([202027,64,1,1])
[[Node: bn_fm_1/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=0.001, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bn_fm_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, bn_fm/gamma/read, bn_fm/beta/read, bn_fm/moving_mean/read, bn_fm/moving_variance/read)]]
[[Node: AddN/_31 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_202_AddN", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'bn_fm_1/FusedBatchNorm', defined at:
File "NeuralFM.py", line 349, in <module>
model = NeuralFM(data.features_M, args.hidden_factor, eval(args.layers), args.loss_type, args.pretrain, args.epoch, args.batch_size, args.lr, args.lamda, eval(args.keep_prob), args.optimizer, args.batch_norm, activation_function, args.verbose, args.early_stop)
File "NeuralFM.py", line 89, in __init__
self._init_graph()
File "NeuralFM.py", line 123, in _init_graph
self.FM = self.batch_norm_layer(self.FM, train_phase=self.train_phase, scope_bn='bn_fm')
File "NeuralFM.py", line 224, in batch_norm_layer
is_training=False, reuse=True, trainable=True, scope=scope_bn)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 596, in batch_norm
scope=scope)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 382, in _fused_batch_norm
is_training, _fused_batch_norm_training, _fused_batch_norm_inference)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/utils.py", line 214, in smart_cond
return static_cond(pred_value, fn1, fn2)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/utils.py", line 194, in static_cond
return fn2()
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 379, in _fused_batch_norm_inference
data_format=data_format)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 906, in fused_batch_norm
name=name)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 3465, in _fused_batch_norm
is_training=is_training, name=name)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): cuDNN launch failure : input shape ([202027,64,1,1])
[[Node: bn_fm_1/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=0.001, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bn_fm_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, bn_fm/gamma/read, bn_fm/beta/read, bn_fm/moving_mean/read, bn_fm/moving_variance/read)]]
[[Node: AddN/_31 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_202_AddN", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
I keep getting this error, I've tried everything from downgrading CUDA, cuDNN, and tensorflow-gpu.
I'm currently on CUDA 9.0, cuDNN v7.4.2 for CUDA 9.0, tensorflow-gpu 1.9 and nothing I do seems to help. I'm running out of ideas, I've got every dependency I could imagine.
I'm trying to run this:
https://github.com/hexiangnan/neural_factorization_machine
EDIT: I have a feeling this is connected to https://github.com/tensorflow/tensorflow/issues/8090 but as I'm a little new to all this, I'm not sure if I'm right or how to address this.
I met the same error. The reason for mine is that my GPU does not have enough memory for the process.
I'm probably a few of years late to be of any help Alex but I've come up on this issue when on Windows with a specific GPU. Don't ask me why but adding
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0'
if you have a single GPU works for me
I solved it by adding after imports this:
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
in the script

How to understand tensorflow error message?

I found that the error message from TensorFlow, especially at run time (i.e. in sess.run()). There'is few document explaining how to understand the error message.
For example, there is a error message:
Traceback (most recent call last):
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 10669 values, but the requested shape has 11172
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Shape)]]
[[Node: cond/getRefinementLoss/posLoss/getPosLoss/Reshape/_1897 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4151_cond/getRefinementLoss/posLoss/getPosLoss/Reshape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hyh/projects/RFCN-tensorflow/main.py", line 155, in <module>
res = runManager.modRun(i)
File "/home/hyh/projects/RFCN-tensorflow/Utils/RunManager.py", line 97, in modRun
return self.runAndMerge(feed_dict, options=options if options is not None else self.options, run_metadata=run_metadata if run_metadata is not None else self.run_metadata)
File "/home/hyh/projects/RFCN-tensorflow/Utils/RunManager.py", line 71, in runAndMerge
res = self.sess.run(self.inputTensors, feed_dict=feed_dict, options=options, run_metadata=run_metadata)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 10669 values, but the requested shape has 11172
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Shape)]]
[[Node: cond/getRefinementLoss/posLoss/getPosLoss/Reshape/_1897 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4151_cond/getRefinementLoss/posLoss/getPosLoss/Reshape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape', defined at:
File "/home/hyh/projects/RFCN-tensorflow/main.py", line 118, in <module>
trainOp = createUpdateOp()
File "/home/hyh/projects/RFCN-tensorflow/main.py", line 104, in createUpdateOp
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 526, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 494, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 636, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 385, in _MaybeCompile
return grad_fn() # Exit early
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 636, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py", line 521, in _ReshapeGrad
return [array_ops.reshape(grad, array_ops.shape(op.inputs[0])), None]
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6113, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op 'RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2', defined at:
File "/home/hyh/projects/RFCN-tensorflow/main.py", line 96, in <module>
tf.losses.add_loss(net.getLoss(boxes, classes))
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/BoxNetwork.py", line 50, in getLoss
return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/RPN.py", line 186, in loss
return tf.cond(tf.shape(refBoxes)[0] > 0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2063, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1913, in BuildCondBranch
original_result = fn()
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/RPN.py", line 186, in <lambda>
return tf.cond(tf.shape(refBoxes)[0] > 0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/RPN.py", line 173, in calcLoss
positiveLosses, negativeLosses = calcAllLosses(inAnchros, inBoxes, inRawSizes, inScores, inBoxSizes)
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/RPN.py", line 145, in calcAllLosses
classificationLoss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=scores, labels=refScores, name="classification_loss")
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1878, in softmax_cross_entropy_with_logits_v2
cost = array_ops.reshape(cost, output_shape)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6113, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 10669 values, but the requested shape has 11172
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Shape)]]
[[Node: cond/getRefinementLoss/posLoss/getPosLoss/Reshape/_1897 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4151_cond/getRefinementLoss/posLoss/getPosLoss/Reshape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Process finished with exit code 1
I have two questions:
Where there is so many calling stack? First is Trackback and then During handling of the above exception, another exception occurred:, and Caused by..., finally ...which was originally created as op. What do they mean respectively?
Why there is so many error node? In the message above, it seems that there are two nodes that have gone wrong. What does it mean? Which node caused this error?
Tensorflow error messages are always quite verbose and this is mainly due to how TF works (because of the Computation Graph it builds).
In your case, it seems that you are reshaping a tensor with the wrong shape:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 10669 values, but the requested shape has 11172
To see if that is the case try printing the shape of the tensor given to reshape op, i.e.:
input = tf.placeholder(tf.float32, [None, 28, 28, 1])
x = tf.layers.dense(input, units=64, activation=tf.nn.relu)
x = tf.Print(x, [x])
x_rs = tf.reshape(x, [-1, 28*28])

OutOfRangeError (see above for traceback): FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 5, current size 0)

i don't know how to solve this problem, this error message is useless for me to locate the problem. Thanks for helping!
here is the data in e.csv, D.csv and F.csv
e.csv: 1,2,3
4,5,6
7,8,9
D.csv: 11,12,13
14,15,16
17,18,19
F.csv: 21,22,23
24,25,26
27,28,29
here is my code
import tensorflow as tf
import os
file_dir = './KDD2'
fileNameQueue = []
for file in os.listdir(file_dir):
fileNameQueue.append(file)
print fileNameQueue
filename_queue = tf.train.string_input_producer(fileNameQueue, shuffle=False)
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
col1,col2,label = tf.decode_csv(value, record_defaults=[[1],[1],[1]])
example = tf.pack([col1,col2])
example_batch, label_batch = tf.train.batch([example, label], batch_size=5)
with tf.Session() as sess:
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
for i in range(10):
print example_batch.eval()
coord.request_stop()
coord.join(threads)
here is the error message
root#ubuntumagiclab:/home/magiclab/SAE# python try.py
['e.csv', 'D.csv', 'F.csv']
Traceback (most recent call last):
File "try.py", line 30, in <module>
print example_batch.eval()
File "/usr/local/lib/python2.7/dist-
packages/tensorflow/python/framework/ops.py", line 575, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-
packages/tensorflow/python/framework/ops.py", line 3633, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-
packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-
packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-
packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-
packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_0_batch/fifo_queue' is closed and has insufficient elements (requested 5, current size 0)
[[Node: batch = QueueDequeueMany[_class=["loc:#batch/fifo_queue"], component_types=[DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
Caused by op u'batch', defined at:
File "try.py", line 24, in <module>
example_batch, label_batch = tf.train.batch([example, label], batch_size=5)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 692, in batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1099, in _queue_dequeue_many
timeout_ms=timeout_ms, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
OutOfRangeError (see above for traceback): FIFOQueue '_0_batch/fifo_queue' is closed and has insufficient elements (requested 5, current size 0)
[[Node: batch = QueueDequeueMany[_class=["loc:#batch/fifo_queue"], component_types=[DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
Problem is with filepaths. Please provide complete paths as shown below to fileName Queue.
This works for me:
fileNameQueue.append('/home/****/Desktop/stackoverflow/data/' +file)
Hope this helps.

Q: Tensorflow: tf.random_crop

I used the tf.random_crop() to crop 56*56 in my patch (80, 80).patch = tf.random_crop(patch, [56, 56, 3]) I get the error like follow:
Caused by op u'random_crop', defined at:
File "train.py", line 73, in <module>
main()
File "train.py", line 26, in main
low_res_batch, high_res_batch = batch_queue_for_training(TRAINING_DATA_PATH)
File "/data/code/super_resolution/anima2x/src/data_inputs.py", line 26, in batch_queue_for_training
high_res_patch = tf.random_crop(patch, [LABEL_SIZE, LABEL_SIZE, NUM_CHENNELS])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/random_ops.py", line 322, in random_crop
return array_ops.slice(value, offset, size, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 484, in slice
return gen_array_ops._slice(input_, begin, size, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 2868, in _slice
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Expected size[0] in [0, 38], but got 56
[[Node: random_crop = Slice[Index=DT_INT32, T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](Reverse, random_crop/mod, random_crop/size)]]
[[Node: case/If_1/ResizeArea/images/_67 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_150_case/If_1/ResizeArea/images", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
I don't know why the expected size must be in [0, some_num smaller than 56], Could anyone help to solve it?

tf.nn.dynamic_rnn raised Attempting to use uninitialized value error

My graph looks like this
with graph.as_default():
train_inputs = tf.placeholder(tf.int32, shape=[None, None])
with tf.device('/cpu:0'):
embeddings = tf.Variable(tf.zeros([vocab_size, options.embed_size]))
restorer = tf.train.Saver({'embeddings': embeddings})
init = tf.variables_initializer([embeddings])
uninit = tf.report_uninitialized_variables()
embed = tf.nn.embedding_lookup(embeddings, train_inputs)
# length() returns a [batch_szie,] tensor of true lengths of sentences (lengths before zero-padding)
sequence_length = length(embed)
lstm = tf.nn.rnn_cell.LSTMCell(options.rnn_size)
output, _ = tf.nn.dynamic_rnn(
lstm,
embed,
dtype=tf.float32,
swequence_length=sequence_length
)
And my session:
with tf.Session(graph=graph) as session:
restorer.restore(session, options.restore_path)
# tf.global_variables_initializer.run()
init.run()
print session.run([uninit])
while len(data.ids):
# data.generate_batch returns a list of size [batch_size, max_length], and zero-padding is used, when the sentences are shorter than max_length. For example, batch_inputs = [[1,2,3,4], [3,2,1,0], [1,2,0,0]]
batch_inputs, _ = data.generate_batch(options.batch_size)
feed_dict = {train_inputs: batch_inputs}
test = session.run([tf.shape(output)], feed_dict=feed_dict)
print test
Function length():
def length(self, sequence):
length = tf.sign(sequence)
length = tf.reduce_sum(length, reduction_indices=1)
length = tf.cast(length, tf.int32)
return length
The error i got:
Traceback (most recent call last):
File "rnn.py", line 103, in <module>
test = session.run([tf.shape(output)], feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.FailedPreconditionError: Attempting to use uninitialized value RNN/LSTMCell/W_0
[[Node: RNN/LSTMCell/W_0/read = Identity[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](RNN/LSTMCell/W_0)]]
Caused by op u'RNN/LSTMCell/W_0/read', defined at:
File "rnn.py", line 75, in <module>
sequence_length=sequence_length,
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 845, in dynamic_rnn
dtype=dtype)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 1012, in _dynamic_rnn_loop
swap_memory=swap_memory)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2636, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2469, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2419, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 995, in _time_step
skip_conditionals=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 403, in _rnn_step
new_output, new_state = call_cell()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 983, in <lambda>
call_cell = lambda: cell(input_t, state)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 496, in __call__
dtype, self._num_unit_shards)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 329, in _get_concat_variable
sharded_variable = _get_sharded_variable(name, shape, dtype, num_shards)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn_cell.py", line 359, in _get_sharded_variable
dtype=dtype))
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variable_scope.py", line 677, in _get_single_variable
expected_shape=shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 224, in __init__
expected_shape=expected_shape)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/variables.py", line 367, in _init_from_args
self._snapshot = array_ops.identity(self._variable, name="read")
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1424, in identity
result = _op_def_lib.apply_op("Identity", input=input, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
FailedPreconditionError (see above for traceback): Attempting to use uninitialized value RNN/LSTMCell/W_0
[[Node: RNN/LSTMCell/W_0/read = Identity[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](RNN/LSTMCell/W_0)]]
However when I printed out the uninitialized variables, i got [array([], dtype=object)]
When i replaced init.run() with tf.global_variables_initializer.run(), it worked.
Any idea why init.run() doesn't work?
You defined init as follows:
init = tf.variables_initializer([embeddings])
This definition means that init initializes only the embeddings variable. Calling the tf.nn.dynamic_rnn() function creates more variables, representing the various internal weights in the LSTM, and these are not initialized by init.
By contrast, tf.global_variables_initializer() returns an operation that, when run, will initialize all of the (global) variables in your model, including those created for the LSTM.

Categories