cuDNN launch failure (tensorflow-gpu/CUDA)

Traceback (most recent call last):
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape ([202027,64,1,1])
[[Node: bn_fm_1/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=0.001, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bn_fm_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, bn_fm/gamma/read, bn_fm/beta/read, bn_fm/moving_mean/read, bn_fm/moving_variance/read)]]
[[Node: AddN/_31 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_202_AddN", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "NeuralFM.py", line 350, in <module>
model.train(data.Train_data, data.Validation_data, data.Test_data)
File "NeuralFM.py", line 266, in train
init_train = self.evaluate(Train_data)
File "NeuralFM.py", line 311, in evaluate
predictions = self.sess.run((self.out), feed_dict=feed_dict)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape ([202027,64,1,1])
[[Node: bn_fm_1/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=0.001, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bn_fm_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, bn_fm/gamma/read, bn_fm/beta/read, bn_fm/moving_mean/read, bn_fm/moving_variance/read)]]
[[Node: AddN/_31 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_202_AddN", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'bn_fm_1/FusedBatchNorm', defined at:
File "NeuralFM.py", line 349, in <module>
model = NeuralFM(data.features_M, args.hidden_factor, eval(args.layers), args.loss_type, args.pretrain, args.epoch, args.batch_size, args.lr, args.lamda, eval(args.keep_prob), args.optimizer, args.batch_norm, activation_function, args.verbose, args.early_stop)
File "NeuralFM.py", line 89, in __init__
self._init_graph()
File "NeuralFM.py", line 123, in _init_graph
self.FM = self.batch_norm_layer(self.FM, train_phase=self.train_phase, scope_bn='bn_fm')
File "NeuralFM.py", line 224, in batch_norm_layer
is_training=False, reuse=True, trainable=True, scope=scope_bn)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 596, in batch_norm
scope=scope)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 382, in _fused_batch_norm
is_training, _fused_batch_norm_training, _fused_batch_norm_inference)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/utils.py", line 214, in smart_cond
return static_cond(pred_value, fn1, fn2)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/utils.py", line 194, in static_cond
return fn2()
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 379, in _fused_batch_norm_inference
data_format=data_format)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/nn_impl.py", line 906, in fused_batch_norm
name=name)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 3465, in _fused_batch_norm
is_training=is_training, name=name)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/home/alex/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): cuDNN launch failure : input shape ([202027,64,1,1])
[[Node: bn_fm_1/FusedBatchNorm = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=0.001, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](bn_fm_1/FusedBatchNorm-0-TransposeNHWCToNCHW-LayoutOptimizer, bn_fm/gamma/read, bn_fm/beta/read, bn_fm/moving_mean/read, bn_fm/moving_variance/read)]]
[[Node: AddN/_31 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_202_AddN", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
I keep getting this error. I've tried everything, including downgrading CUDA, cuDNN, and tensorflow-gpu.
I'm currently on CUDA 9.0, cuDNN v7.4.2 for CUDA 9.0, and tensorflow-gpu 1.9, and nothing I do seems to help. I'm running out of ideas; I've installed every dependency I could think of.
I'm trying to run this:
https://github.com/hexiangnan/neural_factorization_machine
EDIT: I have a feeling this is connected to https://github.com/tensorflow/tensorflow/issues/8090 but as I'm a little new to all this, I'm not sure if I'm right or how to address this.

I hit the same error. In my case the cause was that my GPU did not have enough memory for the process.
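If memory is the cause, one thing to try is not pushing the whole training set through sess.run in a single call (the traceback shows a batch of 202027 rows reaching the batch-norm op from evaluate). Below is a minimal sketch of chunked evaluation. The names predict_in_chunks and fake_run_batch are hypothetical and not part of NeuralFM.py; fake_run_batch only stands in for the real sess.run call, and the chunk size is an arbitrary starting point.

```python
def predict_in_chunks(run_batch, rows, chunk_size=4096):
    """Call run_batch on consecutive slices of rows and concatenate the results."""
    outputs = []
    for start in range(0, len(rows), chunk_size):
        # Each call only sees chunk_size rows, so the per-call batch stays small.
        outputs.extend(run_batch(rows[start:start + chunk_size]))
    return outputs


# Hypothetical stand-in for something like:
#   lambda chunk: sess.run(self.out, feed_dict={...: chunk})
def fake_run_batch(chunk):
    return [2 * value for value in chunk]


predictions = predict_in_chunks(fake_run_batch, list(range(10)), chunk_size=4)
print(predictions)
```

Because the failing op runs with is_training=false, it uses the stored moving statistics, so splitting the rows should not change the per-row predictions.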

I'm probably a few years late to be of any help, Alex, but I ran into this issue on Windows with a specific GPU. Don't ask me why, but adding
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0'
(if you have a single GPU) works for me.

I solved it by adding this to the script, right after the imports:
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
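Both environment-variable suggestions above only take effect if the variables are set before TensorFlow first initializes the GPU. Here is a minimal sketch with its assumptions flagged: configure_gpu_env is a hypothetical helper name, and CUDA_VISIBLE_DEVICES is given a bare device index ('0'), the format CUDA documents, instead of '/gpu:0'. A value CUDA cannot parse may hide every GPU, in which case TensorFlow would fall back to the CPU. TF_FORCE_GPU_ALLOW_GROWTH is read by newer TensorFlow releases; whether tensorflow-gpu 1.9 honours it is not confirmed here.

```python
import os


def configure_gpu_env(device_index=0, allow_growth=True):
    """Set GPU-related environment variables; call before TensorFlow touches the GPU."""
    # Comma-separated device indices are the documented format, e.g. "0" or "0,1".
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_index)
    if allow_growth:
        # Asks TensorFlow to grow GPU memory on demand instead of reserving it all.
        os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    keys = ("CUDA_VISIBLE_DEVICES", "TF_FORCE_GPU_ALLOW_GROWTH")
    return {key: os.environ[key] for key in keys if key in os.environ}


env = configure_gpu_env()
print(env)
# import tensorflow as tf  # import (and create sessions) only after this point
```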

How to understand tensorflow error message?

I find TensorFlow's error messages hard to understand, especially those raised at run time (i.e. in sess.run()). There is little documentation explaining how to read them.
For example, here is an error message:
Traceback (most recent call last):
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 10669 values, but the requested shape has 11172
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Shape)]]
[[Node: cond/getRefinementLoss/posLoss/getPosLoss/Reshape/_1897 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4151_cond/getRefinementLoss/posLoss/getPosLoss/Reshape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hyh/projects/RFCN-tensorflow/main.py", line 155, in <module>
res = runManager.modRun(i)
File "/home/hyh/projects/RFCN-tensorflow/Utils/RunManager.py", line 97, in modRun
return self.runAndMerge(feed_dict, options=options if options is not None else self.options, run_metadata=run_metadata if run_metadata is not None else self.run_metadata)
File "/home/hyh/projects/RFCN-tensorflow/Utils/RunManager.py", line 71, in runAndMerge
res = self.sess.run(self.inputTensors, feed_dict=feed_dict, options=options, run_metadata=run_metadata)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 10669 values, but the requested shape has 11172
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Shape)]]
[[Node: cond/getRefinementLoss/posLoss/getPosLoss/Reshape/_1897 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4151_cond/getRefinementLoss/posLoss/getPosLoss/Reshape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape', defined at:
File "/home/hyh/projects/RFCN-tensorflow/main.py", line 118, in <module>
trainOp = createUpdateOp()
File "/home/hyh/projects/RFCN-tensorflow/main.py", line 104, in createUpdateOp
grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/optimizer.py", line 526, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 494, in gradients
gate_gradients, aggregation_method, stop_gradients)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 636, in _GradientsHelper
lambda: grad_fn(op, *out_grads))
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 385, in _MaybeCompile
return grad_fn() # Exit early
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py", line 636, in <lambda>
lambda: grad_fn(op, *out_grads))
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/array_grad.py", line 521, in _ReshapeGrad
return [array_ops.reshape(grad, array_ops.shape(op.inputs[0])), None]
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6113, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
op_def=op_def)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
...which was originally created as op 'RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2', defined at:
File "/home/hyh/projects/RFCN-tensorflow/main.py", line 96, in <module>
tf.losses.add_loss(net.getLoss(boxes, classes))
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/BoxNetwork.py", line 50, in getLoss
return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/RPN.py", line 186, in loss
return tf.cond(tf.shape(refBoxes)[0] > 0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
return func(*args, **kwargs)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2063, in cond
orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1913, in BuildCondBranch
original_result = fn()
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/RPN.py", line 186, in <lambda>
return tf.cond(tf.shape(refBoxes)[0] > 0, lambda: calcLoss(), lambda: tf.constant(0.0))
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/RPN.py", line 173, in calcLoss
positiveLosses, negativeLosses = calcAllLosses(inAnchros, inBoxes, inRawSizes, inScores, inBoxSizes)
File "/home/hyh/projects/RFCN-tensorflow/BoxEngine/RPN.py", line 145, in calcAllLosses
classificationLoss = tf.nn.softmax_cross_entropy_with_logits_v2(logits=scores, labels=refScores, name="classification_loss")
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 1878, in softmax_cross_entropy_with_logits_v2
cost = array_ops.reshape(cost, output_shape)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 6113, in reshape
"Reshape", tensor=tensor, shape=shape, name=name)
File "/home/hyh/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 10669 values, but the requested shape has 11172
[[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/classification_loss/Reshape_2_grad/Shape)]]
[[Node: cond/getRefinementLoss/posLoss/getPosLoss/Reshape/_1897 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4151_cond/getRefinementLoss/posLoss/getPosLoss/Reshape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Process finished with exit code 1
I have two questions:
Why are there so many call stacks? First there is Traceback, then During handling of the above exception, another exception occurred:, then Caused by op ..., and finally ...which was originally created as op. What does each of them mean?
Why are there so many error nodes? In the message above, it seems that two nodes have gone wrong. What does that mean? Which node caused the error?

TensorFlow error messages are always quite verbose, mainly because of how TF works (it builds a computation graph).
In your case, it seems that you are reshaping a tensor with the wrong shape:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 10669 values, but the requested shape has 11172
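To see if that is the case, try printing the shape of the tensor given to the reshape op, e.g.:
import tensorflow as tf  # TF 1.x API
inputs = tf.placeholder(tf.float32, [None, 28, 28, 1])  # renamed from `input`, which shadows the built-in
x = tf.layers.dense(inputs, units=64, activation=tf.nn.relu)
# Print the runtime shape (not the values) every time x is evaluated
x = tf.Print(x, [tf.shape(x)], message="shape of x: ")
x_rs = tf.reshape(x, [-1, 28*28])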
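On the question of which node failed: as far as I understand it, the first [[Node: ...]] entry is the op where the error was raised, and a later _Recv entry is a transfer op that was waiting on a result when the step was aborted. Here is a small sketch for pulling the node names and op types out of a TF 1.x message. NODE_RE and failing_nodes are hypothetical helper names, not a TensorFlow API, and the sample string is an abbreviated version of the message in the question.

```python
import re

# Matches "[[Node: name = Op[" as well as the later "[[{{node name}} = Op[" form.
NODE_RE = re.compile(r"\[\[(?:Node: |\{\{node )([^\s}]+)(?:\}\})? = (\w+)\[")


def failing_nodes(message):
    """Return (node_name, op_type) pairs in the order they appear in the message."""
    return NODE_RE.findall(message)


sample = (
    "InvalidArgumentError: Input to reshape is a tensor with 10669 values, "
    "but the requested shape has 11172\n"
    "[[Node: loss/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT](a, b)]]\n"
    "[[Node: cond/Reshape/_1897 = _Recv[client_terminated=false]()]]\n"
)

for name, op in failing_nodes(sample):
    print(op, name)
```

The node name also matches the 'Caused by op ...' line, whose stack shows where that op was defined in the graph-building code.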

tf.keras - Importing model with batchnormalization layers

I've been stuck on this issue for a little while. I'm trying to run the code below with the tf_cnnvis package (https://github.com/InFoCusp/tf_cnnvis), which visualises learnt features in a network. I import my protobuf model and then try to give it a tensor containing some image data (which I believe is provided as a feed_dict, although I could be mistaken).
import numpy as np
import tensorflow as tf
import keras as k
import cv2
import tf_cnnvis as tfv
from tensorflow.python.platform import gfile
from keras import backend as K
model_filename = "saved_model.pb"
image = "test.jpg"
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8, allow_growth=False)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
K.set_session(sess)
K._LEARNING_PHASE = tf.constant(0)
K.set_learning_phase(0)
with gfile.FastGFile(model_filename, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def)
X = tf.placeholder(tf.float32, shape = [None, 48, 64, 3],name = "input") # placeholder for input images
y = tf.placeholder(tf.float32, shape = [None, 8])
im = np.array(cv2.imread(image))
im = np.expand_dims(im, 0)
layers = ['r', 'p', 'c']
init_op = tf.group(tf.global_variables_initializer(), tf.local_variables_initializer())
sess.run(init_op)
with sess.as_default():
    is_success = tfv.activation_visualization(sess_graph_path=tf.get_default_graph(), value_feed_dict = {X : im}, layers=layers)
sess.close()
When I run my code, I get an "InvalidArgumentError" with this traceback:
Traceback (most recent call last):
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
return fn(*args)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'import/batch_normalization_1_input' with dtype float and shape [?,48,64,3]
[[{{node import/batch_normalization_1_input}} = Placeholder[_class=["loc:#import/batch_normalization/cond/FusedBatchNorm_1/Switch"], dtype=DT_FLOAT, shape=[?,48,64,3], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[{{node import/conv2d/Relu/_5}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_50_import/conv2d/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "vis2.py", line 36, in <module>
is_success = tfv.activation_visualization(sess_graph_path=tf.get_default_graph(), value_feed_dict = {X : im}, layers=layers)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 406, in activation_visualization
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 169, in _get_visualization
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 227, in _visualization_by_layer_type
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 288, in _visualization_by_layer_name
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 315, in _activation
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 887, in run
run_metadata_ptr)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1110, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run
run_metadata)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'import/batch_normalization_1_input' with dtype float and shape [?,48,64,3]
[[{{node import/batch_normalization_1_input}} = Placeholder[_class=["loc:#import/batch_normalization/cond/FusedBatchNorm_1/Switch"], dtype=DT_FLOAT, shape=[?,48,64,3], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[{{node import/conv2d/Relu/_5}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_50_import/conv2d/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'import/batch_normalization_1_input', defined at:
File "vis2.py", line 36, in <module>
is_success = tfv.activation_visualization(sess_graph_path=tf.get_default_graph(), value_feed_dict = {X : im}, layers=layers)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 406, in activation_visualization
path_logdir = path_logdir, path_outdir = path_outdir)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 159, in _get_visualization
s = _graph_import_function(PATH,s)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 177, in _graph_import_function
new_saver = tf.train.import_meta_graph(PATH) # Import graph
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1650, in import_meta_graph
meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1672, in _import_meta_graph_with_return_elements
**kwargs))
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3426, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3426, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3285, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'import/batch_normalization_1_input' with dtype float and shape [?,48,64,3]
[[{{node import/batch_normalization_1_input}} = Placeholder[_class=["loc:#import/batch_normalization/cond/FusedBatchNorm_1/Switch"], dtype=DT_FLOAT, shape=[?,48,64,3], _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]
[[{{node import/conv2d/Relu/_5}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_50_import/conv2d/Relu", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Now, I've looked around and I've arrived (tentatively) at the conclusion that this is due to a learning phase variable that's set in the BatchNormalization layer that I have in the model. I'm unclear as to how to set the learning phase when you've imported the model. Some people set the learning phase before initializing the model (which as you can see, I have attempted), but in most examples of this they're using one of the large, pre-provided models (such as MNIST). Others provide the learning phase in the feed_dict, which I have also tried, like so:
with sess.as_default():
    is_success = tfv.activation_visualization(sess_graph_path=tf.get_default_graph(), value_feed_dict = {X : im, K.learning_phase(): 0}, layers=layers)
But this gives me a different error message:
Traceback (most recent call last):
File "vis2.py", line 36, in <module>
is_success = tfv.activation_visualization(sess_graph_path=tf.get_default_graph(), value_feed_dict = {X : im, K.learning_phase(): 0}, layers=layers)
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 406, in activation_visualization
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 169, in _get_visualization
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 227, in _visualization_by_layer_type
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/tf_cnnvis.py", line 270, in _visualization_by_layer_name
File "/usr/local/anaconda3/lib/python3.6/site-packages/tf_cnnvis-1.0.0-py3.6.egg/tf_cnnvis/utils.py", line 79, in parse_tensors_dict
AttributeError: 'int' object has no attribute 'name'
At this stage, seeing as I'm still not completely sure if the problem I'm trying to fix is even the right one, I would very much appreciate some input. If there's anything else you need me to provide, please ask.
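One observation on the second error. The AttributeError is raised inside tf_cnnvis's parse_tensors_dict, which suggests (an assumption based only on the traceback) that it reads .name from every key of value_feed_dict. In Keras 2.x, as far as I know, K.learning_phase() returns the plain integer 0 once K.set_learning_phase(0) has been called, not a tensor, so that key has no .name. The sketch below reproduces the situation without TensorFlow; FakeTensor and keys_without_name are hypothetical stand-ins, not part of Keras or tf_cnnvis.

```python
class FakeTensor:
    """Hypothetical stand-in for a tf.Tensor: only the .name attribute matters here."""

    def __init__(self, name):
        self.name = name


def keys_without_name(feed_dict):
    """Return the feed_dict keys that have no .name attribute."""
    return [key for key in feed_dict if not hasattr(key, "name")]


X = FakeTensor("input:0")
learning_phase = 0  # assumed return value of K.learning_phase() after K.set_learning_phase(0)
feed = {X: "image batch", learning_phase: 0}

print(keys_without_name(feed))
```

A check like this, run just before the feed dictionary is handed to the library, shows which key is not a tensor.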

Invalid argument: Incompatible shapes: [4883,1] vs. [68,1]

To optimize my code, I changed the following:
view1ID_train_data_sparse = load_sample(batch_size_view1, f_view1ID_train_data_sparse)
row_view1ID = range(len(view1ID_train_data_sparse[:,0].astype(int)))
col_view1ID = view1ID_train_data_sparse[:,1]
value_view1ID = view1ID_train_data_sparse[:,2]
view1ID_train_data = coo_matrix( ( value_view1ID, (row_view1ID, col_view1ID.astype(int)) ), shape=( len(row_view1ID), View1Number ) ).toarray()
to:
View1ID_x_temp = tf.placeholder(tf.int32, shape = [None, None], name = 'View1ID_x_temp')
View1ID_x_label = tf.expand_dims(View1ID_x_temp[:,1],1)
View1ID_x_index = tf.expand_dims(tf.range(0, batch_size_view1),1)
concated_1ID = tf.concat([View1ID_x_index, View1ID_x_label],1)
View1ID_x = tf.sparse_to_dense(concated_1ID, [batch_size_view1,View1Number], 1.0, 0.0)
But there is an error:
2018-02-26 17:25:25.665274: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
2018-02-26 17:25:25.666627: W tensorflow/core/framework/op_kernel.cc:1152] Invalid argument: Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
Traceback (most recent call last):
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1039, in _do_call
return fn(*args)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1021, in _run_fn
status, run_metadata)
File "/Users/LA_rovski/anaconda/lib/python3.5/contextlib.py", line 66, in __exit__
next(self.gen)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "Transfer_Model_sparse1.py", line 638, in <module>
Pi: pi})
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 778, in run
run_metadata_ptr)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 982, in _run
feed_dict_string, options, run_metadata)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1032, in _do_run
target_list, options, run_metadata)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1052, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
Caused by op 'gradients/sub_337_grad/BroadcastGradientArgs', defined at:
File "Transfer_Model_sparse1.py", line 489, in <module>
optimize = optimizer.minimize(objective)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 315, in minimize
grad_loss=grad_loss)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/training/optimizer.py", line 386, in compute_gradients
colocate_gradients_with_ops=colocate_gradients_with_ops)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 560, in gradients
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 368, in _MaybeCompile
return grad_fn() # Exit early
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 560, in <lambda>
grad_scope, op, func_call, lambda: grad_fn(op, *out_grads))
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/math_grad.py", line 609, in _SubGrad
rx, ry = gen_array_ops._broadcast_gradient_args(sx, sy)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 411, in _broadcast_gradient_args
name=name)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
...which was originally created as op 'sub_337', defined at:
File "Transfer_Model_sparse1.py", line 424, in <module>
sample_log_likelihood_view1 = tf.reduce_sum(log_gaussian(Rating_view1, Y_view1, sigma_prior_y))
File "Transfer_Model_sparse1.py", line 38, in log_gaussian
return -0.5 * np.log(2 * np.pi) - tf.log(tf.abs(sigma)) - tf.square(x - mu) / (2 * tf.square(sigma))
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 821, in binary_op_wrapper
return func(x, y, name=name)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2627, in _sub
result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
op_def=op_def)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/LA_rovski/anaconda/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Incompatible shapes: [4883,1] vs. [68,1]
[[Node: gradients/sub_337_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/sub_337_grad/Shape, gradients/sub_337_grad/Shape_1)]]
I have checked the matrix dimensions many times but cannot find the cause. This problem has been bothering me for a long time, so I would be very grateful for any help.
Strangely, if I reduce n_batches (the number of iterations) to 2, the error disappears.
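For reference, the failing op is the subtraction x - mu inside log_gaussian, and elementwise subtraction follows broadcasting rules: each pair of dimensions must be equal, or one of them must be 1. The following is a minimal sketch of that rule using NumPy instead of TensorFlow (an assumption made to keep it runnable without a session); can_broadcast is a hypothetical helper, not part of the original script.

```python
import numpy as np


def can_broadcast(shape_a, shape_b):
    """Return True if arrays of these two shapes can be subtracted elementwise."""
    try:
        # NumPy raises ValueError when the shapes cannot be broadcast together.
        np.zeros(shape_a) - np.zeros(shape_b)
        return True
    except ValueError:
        return False


# Shapes taken from the error message: 4883 rows vs. 68 rows.
print(can_broadcast((4883, 1), (68, 1)))   # False
# A single row broadcasts against any number of rows.
print(can_broadcast((4883, 1), (1, 1)))    # True
# Identical shapes are always compatible.
print(can_broadcast((68, 1), (68, 1)))     # True
```

Printing the shapes of the arrays fed for Rating_view1 and Y_view1 on each iteration would show at which batch the row counts stop matching.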

Getting "PermissionDeniedError" when running the example program on Tensorflow

Sorry for my lack of knowledge, but I am trying to run the example on Tensorflow:
import numpy as np
import tensorflow as tf
feature_columns = [tf.feature_column.numeric_column("x", shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=feature_columns)
x_train = np.array([1., 2., 3., 4.])
y_train = np.array([0., -1., -2., -3.])
x_eval = np.array([2., 5., 8., 1.])
y_eval = np.array([-1.01, -4.1, -7, 0.])
input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=None, shuffle=True)
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_train}, y_train, batch_size=4, num_epochs=1000, shuffle=False)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    {"x": x_eval}, y_eval, batch_size=4, num_epochs=1000, shuffle=False)
estimator.train(input_fn=input_fn, steps=1000)
train_metrics = estimator.evaluate(input_fn=train_input_fn)
eval_metrics = estimator.evaluate(input_fn=eval_input_fn)
print("train metrics: %r"% train_metrics)
print("eval metrics: %r"% eval_metrics)
I got the following error message:
PermissionDeniedError: Failed to delete a file: C:\Users\Jeff\AppData\Local\Temp\tmpgpmjek44\graph.pbtxt.tmpe31b9f4677cb426fbaef32dadeaf1a4d; Permission denied
I found that the error comes from the line "estimator.train(input_fn=input_fn, steps=1000)". I checked the folder and the file, and both already grant full control. This may be a stupid question, but what could the cause be, and how can I fix it? Thank you so much in advance!
UPDATE:
I ran it from the root and got the following:
(C:\Users\Jeff\Anaconda3) C:\Users\Jeff>python test.py
WARNING:tensorflow:Using temporary folder as model directory: C:\Users\Jeff\AppData\Local\Temp\tmp0yywjv30
2017-11-10 22:54:59.808636: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
2017-11-10 22:55:00.096842: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1030] Found device 0 with properties: name: GeForce GTX 1060 major: 6 minor: 1 memoryClockRate(GHz): 1.6705 pciBusID: 0000:01:00.0 totalMemory: 6.00GiB freeMemory: 4.99GiB
2017-11-10 22:55:00.096927: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1060, pci bus id: 0000:01:00.0, compute capability: 6.1)
2017-11-10 22:55:02.512317: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2017-11-10 22:55:02.513461: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2017-11-10 22:55:02.513601: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2017-11-10 22:55:02.514975: E C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\cuda\cuda_blas.cc:366] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2017-11-10 22:55:02.515067: W C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\stream_executor\stream.cc:1901] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1323, in _do_call
return fn(*args)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1302, in _run_fn
status, run_metadata)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: Blas GEMV launch failed: m=1, n=4
[[Node: linear/linear_model/x/weighted_sum = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](linear/linear_model/x/Reshape, linear/linear_model/x/weights)]]
[[Node: linear/gradients/linear/linear_model/x/weighted_sum_grad/tuple/control_dependency_1/_85 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_184_linear/gradients/linear/linear_model/x/weighted_sum_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 39, in <module>
estimator.train(input_fn=input_fn, steps=1000)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 302, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 783, in _train_model
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 521, in run
run_metadata=run_metadata)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 892, in run
run_metadata=run_metadata)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 967, in run
raise six.reraise(*original_exc_info)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\six.py", line 693, in reraise
raise value
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 952, in run
return self._sess.run(*args, **kwargs)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1024, in run
run_metadata=run_metadata)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\training\monitored_session.py", line 827, in run
return self._sess.run(*args, **kwargs)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 889, in run
run_metadata_ptr)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1317, in _do_run
options, run_metadata)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\client\session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMV launch failed: m=1, n=4
[[Node: linear/linear_model/x/weighted_sum = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](linear/linear_model/x/Reshape, linear/linear_model/x/weights)]]
[[Node: linear/gradients/linear/linear_model/x/weighted_sum_grad/tuple/control_dependency_1/_85 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_184_linear/gradients/linear/linear_model/x/weighted_sum_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'linear/linear_model/x/weighted_sum', defined at:
File "test.py", line 39, in <module>
estimator.train(input_fn=input_fn, steps=1000)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 302, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 711, in _train_model
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\estimator\estimator.py", line 694, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\estimator\canned\linear.py", line 348, in _model_fn
config=config)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\estimator\canned\linear.py", line 118, in _linear_model_fn
logits = logit_fn(features=features)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\estimator\canned\linear.py", line 70, in linear_logit_fn
features=features, feature_columns=feature_columns, units=units)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\feature_column\feature_column.py", line 321, in linear_model
column, builder, units, weight_collections, trainable))
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\feature_column\feature_column.py", line 1376, in _create_dense_column_weighted_sum
return math_ops.matmul(tensor, weight, name='weighted_sum')
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1891, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2436, in _mat_mul
name=name)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 2956, in create_op
op_def=op_def)
File "C:\Users\Jeff\Anaconda3\lib\site-packages\tensorflow\python\framework\ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): Blas GEMV launch failed: m=1, n=4
[[Node: linear/linear_model/x/weighted_sum = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](linear/linear_model/x/Reshape, linear/linear_model/x/weights)]]
[[Node: linear/gradients/linear/linear_model/x/weighted_sum_grad/tuple/control_dependency_1/_85 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_184_linear/gradients/linear/linear_model/x/weighted_sum_grad/tuple/control_dependency_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

This is a PermissionDeniedError.
As far as I can see, you should run this script from the root.
Try it and post an update.
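Another way to rule out the temporary folder is to give the estimator a directory that is known to be writable. The sketch below uses only the standard library to create such a directory and to confirm that files can be created and deleted in it; make_writable_model_dir and the "tf_model_" prefix are hypothetical names chosen for illustration. tf.estimator.LinearRegressor accepts a model_dir argument, so the returned path could be passed there; whether that removes the error on this machine is an assumption that has not been verified.

```python
import os
import tempfile


def make_writable_model_dir(base_dir=None):
    """Create a fresh directory and confirm files can be created and deleted in it."""
    model_dir = tempfile.mkdtemp(prefix="tf_model_", dir=base_dir)
    probe = os.path.join(model_dir, "probe.tmp")
    with open(probe, "w") as handle:
        handle.write("ok")
    # os.remove raises an OSError subclass if deleting files is not permitted here.
    os.remove(probe)
    return model_dir


model_dir = make_writable_model_dir()
print(model_dir)

# The path could then be handed to the estimator from the question, for example:
# estimator = tf.estimator.LinearRegressor(
#     feature_columns=feature_columns, model_dir=model_dir)
```

Note that the log in the update shows CUBLAS_STATUS_ALLOC_FAILED rather than a permission problem, so the second failure may be unrelated to the model directory.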

OutOfRangeError (see above for traceback): FIFOQueue '_1_batch/fifo_queue' is closed and has insufficient elements (requested 5, current size 0)

I don't know how to solve this problem; the error message does not help me locate it. Thanks for helping!
Here is the data in e.csv, D.csv and F.csv.
e.csv:
1,2,3
4,5,6
7,8,9
D.csv:
11,12,13
14,15,16
17,18,19
F.csv:
21,22,23
24,25,26
27,28,29
Here is my code:
import tensorflow as tf
import os
file_dir = './KDD2'
fileNameQueue = []
for file in os.listdir(file_dir):
    fileNameQueue.append(file)
print fileNameQueue
filename_queue = tf.train.string_input_producer(fileNameQueue, shuffle=False)
reader = tf.TextLineReader()
key, value = reader.read(filename_queue)
col1,col2,label = tf.decode_csv(value, record_defaults=[[1],[1],[1]])
example = tf.pack([col1,col2])
example_batch, label_batch = tf.train.batch([example, label], batch_size=5)
with tf.Session() as sess:
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)
    for i in range(10):
        print example_batch.eval()
    coord.request_stop()
    coord.join(threads)
Here is the error message:
root@ubuntumagiclab:/home/magiclab/SAE# python try.py
['e.csv', 'D.csv', 'F.csv']
Traceback (most recent call last):
File "try.py", line 30, in <module>
print example_batch.eval()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 575, in eval
return _eval_using_default_session(self, feed_dict, self.graph, session)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3633, in _eval_using_default_session
return session.run(tensors, feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 964, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1014, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1034, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: FIFOQueue '_0_batch/fifo_queue' is closed and has insufficient elements (requested 5, current size 0)
[[Node: batch = QueueDequeueMany[_class=["loc:@batch/fifo_queue"], component_types=[DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]
Caused by op u'batch', defined at:
File "try.py", line 24, in <module>
example_batch, label_batch = tf.train.batch([example, label], batch_size=5)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 692, in batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1099, in _queue_dequeue_many
timeout_ms=timeout_ms, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
self._traceback = _extract_stack()
OutOfRangeError (see above for traceback): FIFOQueue '_0_batch/fifo_queue' is closed and has insufficient elements (requested 5, current size 0)
[[Node: batch = QueueDequeueMany[_class=["loc:@batch/fifo_queue"], component_types=[DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](batch/fifo_queue, batch/n)]]

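The problem is with the file paths. os.listdir returns bare file names, so the entries in fileNameQueue are resolved relative to the working directory rather than to file_dir, and the reader finds nothing to read. Provide complete paths to fileNameQueue, as shown below.
This works for me:
fileNameQueue.append('/home/****/Desktop/stackoverflow/data/' + file)
Hope this helps.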
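The same fix can be written without hard-coding the directory. This is a minimal sketch, assuming Python 3 and only the standard library; list_full_paths and its suffix filter are hypothetical additions for illustration, and the demo directory stands in for ./KDD2 from the question.

```python
import os
import tempfile


def list_full_paths(file_dir, suffix=".csv"):
    """Return sorted absolute paths of the files in file_dir that end with suffix."""
    full_dir = os.path.abspath(file_dir)
    return sorted(
        os.path.join(full_dir, name)
        for name in os.listdir(full_dir)
        if name.endswith(suffix)
    )


# Demo with a throwaway directory holding the three file names from the question.
demo_dir = tempfile.mkdtemp()
for name in ("e.csv", "D.csv", "F.csv"):
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write("1,2,3\n")

paths = list_full_paths(demo_dir)
print(paths)
```

The resulting list can be passed to tf.train.string_input_producer in place of the bare file names.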

cuDNN launch failure (tensorflow-gpu/CUDA) - python

I hit the same error. In my case, the cause was that my GPU did not have enough memory for the process.

I'm probably a few years late to be of any help, Alex, but I ran into this issue on Windows with a specific GPU. Don't ask me why, but adding the following works for me if you have a single GPU:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '/gpu:0'

I solved it by adding this after the imports in the script:
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'
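The two environment-variable suggestions above can be combined into one place. This is a minimal sketch, not a confirmed fix: configure_gpu_env is a hypothetical helper, and it assumes the variables are set before TensorFlow initializes the GPU, so the safest place to call it is before importing TensorFlow. CUDA_VISIBLE_DEVICES conventionally takes a comma-separated list of device indices such as "0", so the sketch uses that form instead of '/gpu:0'; that substitution is an assumption about what was intended.

```python
import os


def configure_gpu_env(device_index="0", allow_growth=True):
    """Set GPU-related environment variables for the current process."""
    # Restrict the process to one GPU, identified by its index.
    os.environ["CUDA_VISIBLE_DEVICES"] = device_index
    # Ask TensorFlow to grow GPU memory on demand instead of reserving it all.
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true" if allow_growth else "false"


configure_gpu_env()

# Import TensorFlow only after the variables are set, for example:
# import tensorflow as tf
```

If the failure is caused by the GPU running out of memory, as the first answer suggests, allowing memory growth or reducing the size of the batch being evaluated may help.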
